High Bandwidth Memory – what is it and why do AI engineers love it?

HBM, or High Bandwidth Memory, is a technology that in recent years has become an indispensable component of hardware used in AI, HPC, and data centers. It isn't just another generation of memory - it's a different architecture that lets accelerators run faster, more smoothly, and more energy-efficiently. In this post, you'll see why AI engineers love HBM, how the technology is evolving, how much it really costs, and why, despite those costs, no one is looking back at classic DRAM anymore.

HBM Is Not Just a New Type of Memory - It's a Memory Architecture Built for AI

Let's start with the basics. High Bandwidth Memory (HBM) is a DRAM technology designed for maximum throughput and minimal power consumption, used mainly in graphics cards and AI accelerators. Unlike classic DDR or GDDR modules, HBM is based on vertically stacked DRAM dies connected with TSVs (Through-Silicon Vias) and integrated with the processor over a silicon interposer. This isn't just a new type of chip - it's a completely new way of thinking about memory access.

In practice, this architecture means one thing: significantly shorter signal paths, higher packing density, and an unprecedented data bus width. A single HBM stack can have a bus as wide as 1024 bits, whereas a single GDDR chip typically uses a 32-bit interface. HBM3E, introduced in 2023, achieves up to 1229 GB/s of throughput with 48 GB of capacity per stack, which already lets GPUs work through huge datasets without the memory becoming a choke point.

And that's just the beginning - HBM4 (planned for 2026) is expected to reach 1.6 TB/s with 64 GB of capacity. In the context of AI or HPC infrastructure, this is a game-changer that enables scaling workloads without shifting bottlenecks to the RAM level.
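
If you want to sanity-check these headline numbers yourself, the arithmetic is simple: interface width times per-pin data rate. The sketch below is a minimal back-of-the-envelope calculation; the per-pin rates and the 2048-bit HBM4 bus are assumptions based on publicly discussed specifications, not figures taken from this article.

```python
# Peak bandwidth = bus width (bits) x per-pin data rate (Gb/s) / 8 bits per byte.
# The per-pin rates and the 2048-bit HBM4 bus are assumptions based on publicly
# discussed specifications, not figures taken from this article.

def peak_bandwidth_gb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth of one stack or chip in GB/s."""
    return bus_width_bits * pin_rate_gbps / 8

configs = {
    "GDDR6 chip, 32-bit @ 16 Gb/s": (32, 16.0),
    "HBM3E stack, 1024-bit @ 9.6 Gb/s": (1024, 9.6),
    "HBM4 stack, 2048-bit @ 6.4 Gb/s": (2048, 6.4),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gb_s(width, rate):.0f} GB/s")
```

The HBM3E line comes out at roughly 1229 GB/s, matching the figure above, and the HBM4 line at roughly 1.6 TB/s.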

Why HBM Wins Against DDR and GDDR: Throughput, Latency, and Power Consumption Leave No Doubt

Traditional DRAM - whether we're talking about DDR5, GDDR6, or LPDDR - simply can't keep up with the throughput required by modern AI chips. HBM not only offers a much wider interface but does so at a lower voltage and with lower power consumption. In concrete numbers: for the same throughput, HBM consumes 3.5 to 4.5 times less energy in the PHY layer than GDDR6. That means not only energy savings but also lower thermal requirements and more efficient cooling - crucial for operation in server rooms and data centers.
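
A rough way to picture that gap is to express it as energy per transferred bit. In the sketch below, the pJ/bit values are purely illustrative assumptions; only the roughly 4x ratio between GDDR6 and HBM reflects the claim above.

```python
# Rough PHY power estimate: power (W) = energy per bit (pJ/bit) x bits per second x 1e-12.
# The pJ/bit values are illustrative assumptions; only the ~4x ratio reflects the claim above.

def phy_power_watts(pj_per_bit: float, throughput_gb_s: float) -> float:
    bits_per_second = throughput_gb_s * 8 * 1e9
    return pj_per_bit * 1e-12 * bits_per_second

THROUGHPUT_GB_S = 1000                   # the same 1 TB/s of traffic for both interfaces
GDDR6_PJ_PER_BIT = 8.0                   # assumed
HBM_PJ_PER_BIT = GDDR6_PJ_PER_BIT / 4.0  # mid-point of the 3.5-4.5x advantage

print(f"GDDR6 PHY at 1 TB/s: {phy_power_watts(GDDR6_PJ_PER_BIT, THROUGHPUT_GB_S):.0f} W")
print(f"HBM PHY at 1 TB/s:   {phy_power_watts(HBM_PJ_PER_BIT, THROUGHPUT_GB_S):.0f} W")
```

With these assumed figures, the same 1 TB/s of traffic costs tens of watts on a GDDR6 PHY but only a fraction of that on HBM - heat that never has to be cooled away.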

The differences in throughput are just as significant. A typical GDDR6 memory subsystem tops out at around 600 GB/s, while a single HBM3E stack can deliver up to 1.2 TB/s. Access latencies are lower too, not only thanks to the physical proximity of the dies but also because some of the delays inherent in a classic motherboard layout disappear. As a result, an AI model not only trains faster but also infers faster - without stalls spent waiting on memory. That's why, if your infrastructure runs intensive computational workloads, switching to HBM isn't optional - it's a prerequisite for stable growth.
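
One way to quantify why this matters is a quick roofline-style check: if a workload's arithmetic intensity (FLOPs per byte moved) is lower than the ratio of peak compute to memory bandwidth, the accelerator spends its time waiting on memory. The peak-compute figure and the batch-1 intensity below are illustrative assumptions, not specifications from this article.

```python
# Roofline-style check: a kernel is memory-bound when its arithmetic intensity
# (FLOPs per byte of memory traffic) falls below peak_flops / peak_bandwidth.
# Peak compute, bandwidths, and the batch-1 intensity are illustrative assumptions.

def attainable_tflops(arith_intensity: float, peak_tflops: float, bandwidth_tb_s: float) -> float:
    """Attainable performance in TFLOP/s under the roofline model."""
    # FLOP/byte * TB/s (= 1e12 byte/s) gives TFLOP/s directly.
    return min(peak_tflops, arith_intensity * bandwidth_tb_s)

PEAK_TFLOPS = 1000.0  # assumed dense FP16 peak of the accelerator
INTENSITY = 1.0       # ~1 FLOP/byte: batch-1 transformer inference reads ~2 bytes per 2 FLOPs

for name, bandwidth_tb_s in [("GDDR6 subsystem (~0.6 TB/s)", 0.6),
                             ("single HBM3E stack (~1.2 TB/s)", 1.2)]:
    perf = attainable_tflops(INTENSITY, PEAK_TFLOPS, bandwidth_tb_s)
    print(f"{name}: ~{perf:.1f} TFLOP/s attainable out of {PEAK_TFLOPS:.0f} TFLOP/s peak")
```

At this intensity both configurations leave the chip deeply memory-bound, which is one reason real accelerators combine several HBM stacks per GPU.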

AI Loves HBM Because It Has No Choice - Traditional DRAM Can't Keep Up with the Models

It's not just that HBM is faster. It's that today's artificial intelligence models leave no other choice. When you're dealing with hundreds of billions of parameters, tens of thousands of tokens, and daily data inputs measured in petabytes, every microsecond of memory access becomes a bottleneck. Traditional DRAM - even in its DDR5 version - cannot provide the same throughput and density as HBM. As a result, the machine waits. And it's not waiting for the GPU, but for the RAM.
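
A back-of-the-envelope way to feel this: at batch size 1, generating each token means streaming essentially all model weights through the chip, so memory bandwidth sets a hard floor on per-token latency. The model size and bandwidth figures below are illustrative assumptions, not specifications from this article.

```python
# Lower bound on per-token latency for single-stream (batch size 1) inference:
# every token requires streaming roughly all weights, so
# time_per_token >= model_bytes / memory_bandwidth.
# Model size and bandwidth figures are illustrative assumptions.

def min_ms_per_token(params_billions: float, bytes_per_param: float, bandwidth_tb_s: float) -> float:
    model_bytes = params_billions * 1e9 * bytes_per_param
    return model_bytes / (bandwidth_tb_s * 1e12) * 1e3

PARAMS_B = 70        # assumed 70B-parameter model
BYTES_PER_PARAM = 2  # FP16/BF16 weights

for name, bandwidth_tb_s in [("DDR5 server platform (~0.4 TB/s)", 0.4),
                             ("HBM-based accelerator (~3.3 TB/s)", 3.3)]:
    floor_ms = min_ms_per_token(PARAMS_B, BYTES_PER_PARAM, bandwidth_tb_s)
    print(f"{name}: >= {floor_ms:.0f} ms per token")
```

No amount of extra compute closes that gap - only more memory bandwidth does.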

High Bandwidth Memory was created precisely to eliminate this problem. GPUs and AI accelerators like the NVIDIA H100, AMD MI300, or Intel Gaudi 3 have such an enormous appetite for bandwidth that their potential would be wasted without HBM. As Jim Handy of Objective Analysis put it, without HBM you'd need several processors instead of one, and even then you wouldn't get a comparable effect. HBM simplifies the system architecture, reduces the number of components, and at the same time scales linearly with the growing demands of AI models. Thanks to this, you don't have to expand the entire platform - you just need a better memory interface.

How Much Does HBM Cost and Why Is It Worth Every Penny Anyway?

The high price of HBM is no secret - and it can indeed be surprising. One gigabyte of HBM costs about $10.60, while conventional DRAM costs around $2.90, and DDR5 even less. But with HBM, you're not paying for capacity. You're paying for throughput, energy efficiency, and the absence of bottlenecks, which in AI systems translate directly into model training time, and thus into operational costs. According to cost analyses, the manufacturing cost of an NVIDIA H100 GPU is estimated at about $3,000, with roughly half of that being the HBM memory supplied by SK Hynix.
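
To put those per-gigabyte prices in perspective, here is a quick comparison for a single accelerator's worth of memory. The 80 GB capacity is an assumption chosen to resemble a typical high-end AI GPU, and the gap between the raw per-GB figure and the "half of ~$3,000" quoted above presumably reflects stacking, packaging, and test costs.

```python
# Raw memory cost for one accelerator's worth of capacity,
# using the per-GB prices quoted in the text.
# The 80 GB capacity is an assumption (a typical high-end AI GPU).

PRICE_PER_GB = {
    "HBM": 10.60,
    "conventional DRAM": 2.90,
}

CAPACITY_GB = 80

for kind, price in PRICE_PER_GB.items():
    print(f"{CAPACITY_GB} GB of {kind}: ~${CAPACITY_GB * price:,.0f}")
# -> ~$848 for HBM vs ~$232 for conventional DRAM at the raw per-GB prices
```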

Despite this price, the manufacturer's operating margin on such accelerators reaches as high as 87% - which means corporate clients are willing to pay, because they gain predictability and performance. It simply adds up. You train the model faster, the system uses less energy, and more effective load management lets you better utilize hardware resources. Even if you pay more at the start, the real ROI (return on investment) on an annual basis turns out to be more favorable than with cheaper systems built on classic DRAM. Especially if you're building infrastructure for years, not for a quarter.

Custom HBM Is the Future - Full Control Over Throughput and Power Consumption

Standard HBM is already a powerful tool, but more and more companies are going a step further. Custom HBM (cHBM) allows for designing memory tailored to specific tasks—such as AI inference, 3D graphics, or quantum simulations. Thanks to the 2.5D architecture, silicon interposers, and proprietary communication channels, companies like Marvell are creating dedicated cHBM chips that exceed the limitations of classic interfaces. In practice, this means greater data density, a better power-to-performance ratio, and better control over the system topology.

This is no longer just memory - it's part of the entire platform design. You can decide how many channels you use, how you distribute data, and what latencies you accept. In a world where performance differences are measured not in percentages but in training times, this flexibility provides a real advantage. According to Samsung, the share of custom HBM is expected to exceed 50% of the HBM market in the coming years, because more and more companies need to precisely match hardware to their workload. And contrary to appearances, this isn't an option just for giants - it's enough to operate in a niche where the standard configuration holds you back.
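
To make the trade-off concrete, here's a minimal sizing sketch for a hypothetical design: given a target aggregate bandwidth and a memory power budget, how many stacks do you need? All per-stack figures are illustrative assumptions, not vendor specifications.

```python
import math

# Sizing sketch for a hypothetical design: how many stacks are needed to reach a
# target aggregate bandwidth, and does that fit the memory power budget?
# All per-stack figures are illustrative assumptions, not vendor specifications.

STACK_BANDWIDTH_TB_S = 1.2    # assumed HBM3E-class stack
STACK_POWER_W = 30            # assumed per-stack power
TARGET_BANDWIDTH_TB_S = 5.0   # what the accelerator design calls for
MEMORY_POWER_BUDGET_W = 180

stacks = math.ceil(TARGET_BANDWIDTH_TB_S / STACK_BANDWIDTH_TB_S)
aggregate = stacks * STACK_BANDWIDTH_TB_S
power = stacks * STACK_POWER_W

print(f"Stacks needed: {stacks} ({aggregate:.1f} TB/s aggregate)")
print(f"Estimated memory power: {power} W "
      f"({'within' if power <= MEMORY_POWER_BUDGET_W else 'over'} the {MEMORY_POWER_BUDGET_W} W budget)")
```

Custom HBM is essentially about turning these knobs - channel count, per-stack speed, power - per workload instead of accepting a standard configuration.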

Watch Out for Availability - HBM Could Run Out Faster Than You Can Plan Your Project

Currently, there are only three HBM suppliers on the market - SK Hynix, Samsung, and Micron - and production capacity is constrained by 2.5D packaging technology and demanding test processes. TSMC dominates the 2.5D packaging (CoWoS) side of this market, but it too is signaling capacity constraints in its factories. Today, HBM prices are already rising by 5–10% quarterly, and market reports suggest that availability will remain a strategic problem for the next two years.
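
It's worth translating that rate of increase into a planning horizon by compounding it quarterly. Only the 5–10% quarterly figure comes from the reports mentioned above; the two-year horizon is an assumption.

```python
# Compounding a 5-10% quarterly price increase over a two-year planning horizon.
# Only the 5-10% quarterly figure comes from the reports cited above; the horizon is an assumption.

QUARTERS = 8  # two years

for quarterly_rise in (0.05, 0.10):
    factor = (1 + quarterly_rise) ** QUARTERS
    print(f"{quarterly_rise:.0%} per quarter -> ~{factor:.2f}x after {QUARTERS} quarters")
```

In other words, memory budgeted at today's list price could cost half as much again, or more than double, by the time a two-year rollout completes.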

So, if you're thinking about deploying AI infrastructure soon, treat HBM availability as a key planning parameter. It's not just about how many units you need, but about when to contract for them, whom to negotiate supply with, and how to manage inventory. Many companies are already signing multi-year agreements with suppliers to secure the continuity of their supply chain. If memory is treated as the last "to be selected" item in the project schedule, you risk not being able to buy it by the time everything else is ready for production. And then even the best GPU won't make a difference.