
Why OpenAI's New 'Jalapeño' ASIC Chip Is Nvidia's Worst Nightmare

OpenAI and Broadcom have co-designed Jalapeño, a custom ASIC inference chip built in a record nine months that OpenAI projects will cut its inference costs by roughly 50%.

Published on 6/28/2026

Paying Nvidia a 75% gross margin for the privilege of running ChatGPT is no longer a viable corporate strategy for Sam Altman. On June 24, 2026, OpenAI and Broadcom ended the speculation by introducing Jalapeño, a custom-designed ASIC optimized exclusively for LLM inference workloads. This targeted strike on Nvidia’s dominance went from design to tape-out in a record nine months.

What Is OpenAI’s Custom Chip Jalapeño?

OpenAI’s Jalapeño chip is a custom application-specific integrated circuit (ASIC) designed specifically to handle large language model (LLM) inference. Co-developed with Broadcom, the chip optimizes memory bandwidth and tensor mathematics to run the models behind ChatGPT at twice the speed of standard merchant silicon.

For the past three years, the tech industry has operated on a simple assumption: if you want to run frontier models, you must buy Nvidia H100 or Blackwell GPUs. This hardware dependency represents a massive bottleneck. General-purpose graphics processing units (GPUs) excel at the training phase of machine learning because of their raw parallel processing capabilities, but they are comparatively inefficient at running already-trained models. Inference, and especially its token-by-token generation phase, is largely a memory-bandwidth-bound problem rather than a compute-bound one: when a user asks a question, the server spends most of its time and energy shuffling model weights from memory chips to processing cores rather than doing arithmetic.
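To make the memory-bound argument concrete, here is a minimal roofline-style sketch. It estimates the arithmetic intensity (FLOPs per byte of weight traffic) of one weight-matrix multiply during token generation and compares it with a hypothetical accelerator’s ridge point, the intensity above which a kernel becomes compute-bound. The 2 PFLOP/s and 8 TB/s figures, the 8192-wide layer and the FP8 weight format are illustrative assumptions, not published Jalapeño or Nvidia specifications.

```python
def matmul_intensity(batch: int, d_in: int, d_out: int,
                     bytes_per_weight: float) -> float:
    """FLOPs per byte of weight traffic for (batch x d_in) @ (d_in x d_out).

    Counts 2 FLOPs (multiply + add) per weight per token and assumes weight
    reads dominate memory traffic; activation traffic is ignored.
    """
    flops = 2 * batch * d_in * d_out
    weight_bytes = d_in * d_out * bytes_per_weight
    return flops / weight_bytes


def ridge_point(peak_flops: float, mem_bandwidth: float) -> float:
    """Intensity (FLOPs/byte) above which a kernel is compute-bound."""
    return peak_flops / mem_bandwidth


# Hypothetical accelerator: 2 PFLOP/s dense compute, 8 TB/s HBM bandwidth.
PEAK_FLOPS = 2e15
HBM_BW = 8e12
ridge = ridge_point(PEAK_FLOPS, HBM_BW)

for batch in (1, 8, 64, 512):
    # FP8 weights (1 byte each) are an assumption for illustration.
    ai = matmul_intensity(batch, 8192, 8192, bytes_per_weight=1.0)
    regime = "compute-bound" if ai >= ridge else "memory-bound"
    print(f"batch={batch:4d}  intensity={ai:7.1f} FLOP/B  -> {regime}")
```

At batch size 1, each weight byte fetched supports only about two FLOPs, far below the ridge point, so the processor spends most of its time waiting on memory. Larger batches reuse each weight across more tokens, which is why serving systems batch requests, although latency targets limit how far batching can go.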

Jalapeño strips away the graphics rendering components, display pipelines, and general computing overhead that populate standard GPUs. Instead, it places large high-bandwidth memory (HBM) stacks directly adjacent to a specialized logic die designed for matrix multiplication, so that model weights can stream to the compute units at the highest possible bandwidth. Laboratory samples are already running active workloads in San Francisco, executing tests on OpenAI’s GPT-5.3-Codex-Spark model.
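Memory placement matters because, at small batch sizes, every generated token requires streaming the full set of weights from memory. A back-of-the-envelope lower bound on per-token latency is therefore total weight bytes divided by memory bandwidth. The sketch below assumes a hypothetical 70-billion-parameter dense model stored in FP8 and a few illustrative bandwidth values; it ignores compute time, KV-cache reads and multi-chip sharding.

```python
def min_decode_latency_s(params: float, bytes_per_param: float,
                         mem_bw_bytes_per_s: float) -> float:
    """Lower bound on seconds per generated token at batch size 1.

    Every weight is read from HBM once per decode step, so a step cannot
    finish faster than total weight bytes / memory bandwidth. Compute time,
    KV-cache traffic and interconnect overhead are ignored.
    """
    return params * bytes_per_param / mem_bw_bytes_per_s


def max_tokens_per_s(params: float, bytes_per_param: float,
                     mem_bw_bytes_per_s: float) -> float:
    """Upper bound on single-stream generation speed implied by that floor."""
    return 1.0 / min_decode_latency_s(params, bytes_per_param, mem_bw_bytes_per_s)


PARAMS = 70e9          # hypothetical dense model size
BYTES_PER_PARAM = 1.0  # FP8 weights (assumed)
for bw_tb_s in (4, 8, 16):  # illustrative HBM bandwidths in TB/s
    bw = bw_tb_s * 1e12
    t = min_decode_latency_s(PARAMS, BYTES_PER_PARAM, bw)
    print(f"{bw_tb_s:2d} TB/s: >= {t * 1e3:.2f} ms/token, "
          f"<= {max_tokens_per_s(PARAMS, BYTES_PER_PARAM, bw):.0f} tokens/s")
```

Doubling bandwidth halves the floor on per-token latency, which is why a design that prioritizes memory bandwidth over raw FLOPs can pay off for chat workloads.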

Who Manufactured the OpenAI Jalapeño Chip?

Broadcom co-designed the logic die and high-speed input/output systems for the Jalapeño chip, which is fabricated on TSMC’s 3nm process node. The physical server boards, liquid-cooled power delivery systems, and data center racks are assembled by electronics manufacturer Celestica.

Building a chip from scratch usually takes two to three years. OpenAI bypassed this timeline by partnering with Broadcom, utilizing the chip giant’s XPU custom design platform. Broadcom is the hidden architect of the custom silicon trend, having previously co-designed Google’s TPU series and Meta’s MTIA v2 chips. By licensing Broadcom’s established intellectual property for high-speed networking and memory interfaces, OpenAI completed the design-to-tape-out pipeline in just nine months.

However, silicon is useless without a system to house it. This is where Celestica enters the supply chain. As the primary systems integrator for the project, Celestica designs and assembles the custom server blades, multi-rack liquid cooling loops, and high-voltage power shelves required to run Jalapeño at scale. These server systems are built to slot directly into existing data center infrastructure, including Microsoft’s cloud clusters.

Can OpenAI’s Custom Chip Replace Nvidia GPUs?

The Jalapeño chip cannot replace Nvidia GPUs for training massive AI models, as it is built exclusively for LLM inference. OpenAI will continue using Nvidia’s Blackwell architecture to train its frontier models, while migrating its live consumer chat traffic to Jalapeño hardware to optimize operational efficiency.

It is a common misconception that custom silicon represents an immediate replacement for Nvidia. In reality, the hardware stack is split in two. Training a model like GPT-5 requires thousands of interconnected processors running for months, exchanging massive amounts of gradient data. In this training domain, Nvidia’s proprietary NVLink interconnect technology and CUDA software ecosystem remain the dominant choice.

However, once a model is trained, it must be run millions of times a day for users. This is where the massive capital investments in frontier models face a harsh reality check: running inference on general-purpose GPUs burns money. By using custom ASICs for inference, OpenAI can offload its daily consumer traffic from expensive Nvidia processors, freeing up those GPUs to run training loops.

The table below compares Jalapeño’s hardware specifications with merchant and custom silicon alternatives:

Specification | OpenAI Jalapeño ASIC (2026) | Nvidia Blackwell B200 (Merchant) | Google TPU v5p (Custom)
Primary Workload | Dedicated LLM Inference | General Training & Inference | General Training & Inference
Architecture | Custom ASIC (Broadcom XPU) | General-Purpose GPU | Custom ASIC
Fabrication Node | TSMC 3nm | TSMC 4NP | TSMC 4nm
Memory Configuration | 192GB HBM4 | 192GB HBM3e | 95GB HBM2e
Target Deployment | Microsoft Azure / Late 2026 | Public Cloud / 2025 | Google Cloud
Software Stack | Custom OpenAI Runtime | Nvidia CUDA | Google JAX / XLA
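The 192GB memory figure listed for both Jalapeño and the B200 bounds how large a model a single chip can hold. A rough capacity sketch, counting weights only and assuming a hypothetical 20% of HBM is held back for the KV cache and activations (real deployments shard large models across many chips):

```python
def weights_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def max_params_that_fit(hbm_gb: float, bytes_per_param: float,
                        reserve_fraction: float = 0.2) -> float:
    """Largest parameter count whose weights fit in one chip's HBM.

    reserve_fraction is an assumed share of memory kept free for the
    KV cache and activations; it is not a published figure.
    """
    usable_bytes = hbm_gb * 1e9 * (1.0 - reserve_fraction)
    return usable_bytes / bytes_per_param


HBM_GB = 192  # per-chip capacity from the table above
for fmt, bytes_per_param in (("BF16", 2.0), ("FP8", 1.0), ("INT4", 0.5)):
    n = max_params_that_fit(HBM_GB, bytes_per_param)
    print(f"{fmt}: ~{n / 1e9:.0f}B parameters per chip")

# Example: a 70B-parameter model in BF16 needs 140 GB for weights alone.
print(f"70B @ BF16: {weights_gb(70e9, 2.0):.0f} GB")
```

Lower-precision formats let the same HBM hold proportionally more parameters, which is one reason inference-focused hardware leans on FP8 and smaller number formats.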

How Much Does the Jalapeño Chip Reduce OpenAI’s Server Costs?

OpenAI projects that deploying the Jalapeño ASIC at scale will reduce inference costs by approximately 50%. By optimizing memory-to-core transfer speeds and cutting power draw (and therefore heat output), the chip significantly lowers the per-query electricity and hardware depreciation costs of running ChatGPT.

Computing cost is the ultimate gatekeeper of the hardware infrastructure race. In 2024, industry estimates placed the cost of serving a single ChatGPT query at roughly $0.003, with hardware depreciation and power consumption making up the bulk of that figure. Scaled to hundreds of millions of daily active users, OpenAI’s infrastructure bill can exceed several billion dollars annually.
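The scaling arithmetic is simple enough to sketch. The snippet below multiplies the ~$0.003 per-query estimate by hypothetical daily query volumes and applies the projected 50% reduction. The query volumes are illustrative assumptions, and the model assumes a flat cost per query, which real workloads with varying prompt and response lengths will not have.

```python
def annual_inference_cost(queries_per_day: float, cost_per_query: float) -> float:
    """Yearly serving cost in dollars, assuming a flat cost per query."""
    return queries_per_day * cost_per_query * 365


COST_PER_QUERY = 0.003  # 2024 industry estimate cited in the text (USD)
REDUCTION = 0.50        # OpenAI's projected saving from Jalapeño

for queries_per_day in (1e9, 2.5e9):  # hypothetical daily query volumes
    baseline = annual_inference_cost(queries_per_day, COST_PER_QUERY)
    projected = baseline * (1 - REDUCTION)
    print(f"{queries_per_day / 1e9:.1f}B queries/day: "
          f"${baseline / 1e9:.2f}B/yr today, "
          f"${projected / 1e9:.2f}B/yr at 50% lower cost")
```

At one billion queries a day, the baseline comes to roughly $1.1 billion a year, which the projected reduction would halve.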

Jalapeño addresses this margin crisis directly. Because the chip is designed to execute only the specific matrix operations used in transformer architectures, it draws substantially less power than a general-purpose GPU. Broadcom’s high-speed Ethernet networking technology also allows OpenAI to cluster these chips in high-density racks without the typical networking latency bottlenecks that plague larger data centers. Deployment of these systems is scheduled to begin in late 2026, starting with gigawatt-scale clusters inside Microsoft Azure data centers.
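Power efficiency for inference hardware is often expressed as energy per generated token: sustained power draw divided by token throughput. The sketch below compares two hypothetical accelerators serving the same throughput at different power levels and converts the result into an electricity cost per million tokens. Every number in it (wattages, throughput, electricity price) is an illustrative assumption, not a measured Jalapeño or GPU figure.

```python
JOULES_PER_KWH = 3.6e6


def joules_per_token(power_watts: float, tokens_per_s: float) -> float:
    """Energy per generated token (a watt is one joule per second)."""
    return power_watts / tokens_per_s


def electricity_cost_per_million_tokens(power_watts: float, tokens_per_s: float,
                                        usd_per_kwh: float) -> float:
    """Electricity cost in USD to generate one million tokens."""
    kwh = joules_per_token(power_watts, tokens_per_s) * 1e6 / JOULES_PER_KWH
    return kwh * usd_per_kwh


# Hypothetical chips serving the same 10,000 tokens/s at different power draws.
chips = {"general-purpose GPU": 1000.0, "inference ASIC": 600.0}
for name, watts in chips.items():
    jpt = joules_per_token(watts, 10_000)
    cost = electricity_cost_per_million_tokens(watts, 10_000, usd_per_kwh=0.08)
    print(f"{name}: {jpt:.3f} J/token, ${cost:.4f} per 1M tokens")
```

This covers chip electricity only; cooling overhead and hardware depreciation, the other cost components named above, are excluded.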

Key Takeaways

  • OpenAI and Broadcom co-designed the Jalapeño chip in nine months using OpenAI’s internal models to accelerate the layout design.
  • Jalapeño is a custom ASIC chip designed exclusively for LLM inference, meaning it cannot be used for training frontier models.
  • Celestica serves as the primary system integrator, building the server boards, power systems, and liquid-cooled racks.
  • The chip is fabricated on TSMC’s 3nm process node and features 192GB of HBM4 memory.
  • Deployments are scheduled to begin in late 2026 across Microsoft Azure data centers, where OpenAI projects the chip will cut its inference costs by roughly 50%.

FAQ

What is OpenAI’s custom chip Jalapeño?

OpenAI’s Jalapeño chip is a custom-designed application-specific integrated circuit (ASIC) built to optimize large language model (LLM) inference. Co-developed with Broadcom, the hardware strips away the graphics overhead of standard GPUs to focus entirely on memory bandwidth and matrix multiplication, delivering faster and cheaper ChatGPT queries.

Who manufactured the OpenAI Jalapeño chip?

The logic and memory interfaces of the Jalapeño chip were co-designed by OpenAI and Broadcom, with fabrication handled by TSMC on its 3nm process node. The physical server boards, liquid-cooling loops, and server rack assemblies are manufactured and integrated by Celestica.

Can OpenAI’s custom chip replace Nvidia GPUs?

No, the Jalapeño chip is not a training processor and cannot replace Nvidia GPUs for training frontier AI models. OpenAI will continue using Nvidia’s Blackwell GPU architecture for model training, while migrating its active chat traffic to Jalapeño to reduce live inference costs.

How much does the Jalapeño chip reduce OpenAI’s server costs?

OpenAI expects the Jalapeño ASIC to reduce the operational cost of running its models by roughly 50%. The chip achieves this through high energy efficiency, optimized memory layouts that reduce power draw, and high-speed networking that allows dense server clustering.

When will OpenAI deploy the Jalapeño chip?

Initial gigawatt-scale deployments of the Jalapeño chip are scheduled to begin in late 2026. The hardware will be integrated directly into Microsoft Azure data center facilities to support OpenAI’s consumer and enterprise API services.

How does the Jalapeño design process differ from traditional chip development?

The co-design process between OpenAI and Broadcom was completed in nine months, a fraction of the typical two-to-three-year semiconductor development cycle. This design acceleration was achieved by using OpenAI’s own AI models to optimize the logic layout and verify signal paths before manufacturing.
