OpenAI Broadcom Jalapeño Chip Targets AI Inference at Scale

OpenAI has announced a partnership with Broadcom to design Jalapeño, its first custom AI inference ASIC — a silicon bet on reducing the cost of serving ChatGPT and its API at hyperscale, manufactured by TSMC.

Published: June 28, 2026 | By Sarah Chen, AI & Automotive Technology Editor | Category: AI

Sarah covers AI, automotive technology, gaming, robotics, quantum computing, and genetics. Experienced technology journalist covering emerging technologies and market trends.

SAN FRANCISCO, June 28, 2026 - OpenAI has unveiled a partnership with Broadcom to design its first custom AI inference chip, codenamed Jalapeño. According to OpenAI's official announcement, the application-specific integrated circuit (ASIC) is purpose-built to run OpenAI's frontier models at scale and will be manufactured by TSMC on an advanced process node. The move is the most significant step yet in OpenAI's push to control its own compute infrastructure and reduce its dependence on commercially available NVIDIA GPUs sourced via Microsoft Azure.

Why OpenAI Is Building Its Own Chip

OpenAI's operating costs are driven overwhelmingly by inference — serving millions of queries across ChatGPT, the API, and enterprise products. General-purpose GPUs are optimised for flexibility and training throughput; inference at the scale OpenAI operates demands a different profile: lower latency per token, higher throughput per watt, and dramatically lower cost per query. Custom silicon built around a known workload can deliver all three.
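
The two headline metrics here, cost per query and throughput per watt, reduce to simple arithmetic. The sketch below is illustrative only: the function names and every figure for the two accelerators are hypothetical placeholders, not numbers from OpenAI, Broadcom, or NVIDIA.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Serving cost in USD per one million generated tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def tokens_per_joule(tokens_per_second: float, power_watts: float) -> float:
    """Throughput per watt: (tokens/s) / W is tokens per joule."""
    return tokens_per_second / power_watts


# Hypothetical accelerators. All values are placeholders, not measurements.
chips = {
    "general-purpose GPU": {"hourly_cost_usd": 4.0, "tokens_per_second": 2000.0, "power_watts": 700.0},
    "inference ASIC": {"hourly_cost_usd": 2.0, "tokens_per_second": 3000.0, "power_watts": 400.0},
}

for name, chip in chips.items():
    cost = cost_per_million_tokens(chip["hourly_cost_usd"], chip["tokens_per_second"])
    efficiency = tokens_per_joule(chip["tokens_per_second"], chip["power_watts"])
    print(f"{name}: ${cost:.3f} per 1M tokens, {efficiency:.2f} tokens per joule")
```

With these made-up inputs the ASIC comes out cheaper per token and more efficient per joule, but the conclusion depends entirely on the assumed figures.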

Broadcom is the logical partner for this effort. The company has previously co-designed Google's Tensor Processing Units (TPUs), now in their sixth generation, and Meta's MTIA inference chip. It is one of a handful of companies globally with the networking, packaging, and chip co-design expertise required to bring a hyperscale ASIC from specification to volume production.

Technical Context: What an Inference ASIC Gains Over GPUs

A purpose-built inference ASIC can be stripped of the programmability overhead that makes GPUs versatile but power-hungry for fixed workloads. Transistor area previously dedicated to general shader cores, large register files, and flexible memory hierarchies can instead be reallocated to the multiply-accumulate (MAC) units and on-chip SRAM that dominate transformer inference. The result is a chip that runs a specific class of model faster and with fewer watts — but cannot easily be repurposed if the model architecture changes substantially.
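
A common back-of-envelope way to see why MAC units and nearby memory matter is a roofline-style estimate of a single decode step. The sketch below is not drawn from the announcement. It assumes roughly two floating-point operations per weight per sequence and that every weight is read once per step, it ignores KV-cache and activation traffic, and the chip figures are hypothetical.

```python
def decode_arithmetic_intensity(batch_size: int, bytes_per_weight: float) -> float:
    """FLOPs performed per byte of weight traffic in one decode step.

    Assumes ~2 FLOPs (one multiply, one add) per weight per sequence and
    that each weight is fetched from memory once per step.
    """
    flops_per_weight = 2.0 * batch_size
    return flops_per_weight / bytes_per_weight


def is_memory_bound(intensity: float, peak_flops: float, mem_bandwidth: float) -> bool:
    """Roofline test: memory-bound when intensity is below the ridge point."""
    ridge_point = peak_flops / mem_bandwidth  # FLOPs per byte
    return intensity < ridge_point


# Hypothetical accelerator: 1e15 FLOP/s peak compute, 3e12 bytes/s memory bandwidth.
PEAK_FLOPS = 1e15
MEM_BANDWIDTH = 3e12

for batch in (1, 64, 512):
    intensity = decode_arithmetic_intensity(batch, bytes_per_weight=2.0)
    bound = "memory" if is_memory_bound(intensity, PEAK_FLOPS, MEM_BANDWIDTH) else "compute"
    print(f"batch={batch}: {intensity:.0f} FLOPs/byte, {bound}-bound")
```

Under these assumptions small-batch decoding is limited by how fast weights reach the MAC units rather than by raw compute, which is one reason an inference design might spend area on SRAM and memory bandwidth.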

This tradeoff has historically made ASICs a bet on architectural stability. OpenAI's public commitment to the transformer architecture, combined with the predictable inference patterns of autoregressive language generation, makes that bet credible. Reuters and Bloomberg have both noted that the Jalapeño chip is expected to handle OpenAI's current generation of models, with design flexibility for near-term successors.

Competitive Landscape

OpenAI is not the first AI lab to pursue custom silicon. Google has operated its own TPU fleet for a decade and uses it across Search, Workspace, and Gemini inference. Meta's MTIA chip is deployed at scale for recommendation model inference. Amazon operates its own Inferentia and Trainium ASICs across AWS. Microsoft, OpenAI's primary compute partner, has developed the Azure Maia 100 accelerator for its own data centre workloads. OpenAI enters this field later than its infrastructure-owning peers, but with the advantage of designing around a known, high-volume workload from day one.

Company | Custom Chip | Designer | Primary Use
Google | TPU v6 (Trillium) | Google / Broadcom | Training + Inference
Meta | MTIA | Meta / Broadcom | Recommendation Inference
Amazon | Inferentia 3 / Trainium 2 | AWS | Inference + Training
Microsoft | Azure Maia 100 | Microsoft | Azure AI Workloads
OpenAI | Jalapeño (announced) | Broadcom / TSMC | Inference

What It Means for NVIDIA

NVIDIA's dominance in AI compute has rested partly on the absence of credible alternatives. Each major lab that ships its own inference silicon reduces the total addressable market for H100 and H200 sales at the margin. The announcement does not threaten NVIDIA's training business — training large foundation models remains GPU-dominant — but inference is where volume and operating leverage accumulate. If Jalapeño proves out at scale, it signals that OpenAI will purchase fewer H-series GPUs for inference expansion over the next two to three years, a category that represents meaningful revenue concentration for NVIDIA.

NVIDIA's gross margins on data centre GPUs exceed 75 percent, according to figures reported by the Financial Times. Custom silicon buyers are, in effect, internalising that margin. For OpenAI, a company that has disclosed operating losses in prior years, the long-run unit economics of owning inference infrastructure are material to its path to profitability.
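
To make the margin point concrete: gross margin is (price - cost) / price, so a 75 percent margin implies the seller's cost is a quarter of the price. The sketch below works through that arithmetic. The unit price is a hypothetical placeholder, and the calculation ignores the design fees, engineering costs, and partner margin that a custom-silicon buyer would still pay.

```python
def unit_cost_from_margin(price: float, gross_margin: float) -> float:
    """Seller's cost of goods implied by a price and a gross margin (0 to 1)."""
    return price * (1.0 - gross_margin)


def markup_multiple(gross_margin: float) -> float:
    """How many times the seller's cost the buyer pays."""
    return 1.0 / (1.0 - gross_margin)


HYPOTHETICAL_PRICE_USD = 30_000.0  # placeholder, not a quoted GPU price
GROSS_MARGIN = 0.75                # the 75 percent figure cited in the article

cost = unit_cost_from_margin(HYPOTHETICAL_PRICE_USD, GROSS_MARGIN)
print(f"implied cost of goods: ${cost:,.0f}")
print(f"buyer pays {markup_multiple(GROSS_MARGIN):.1f}x the seller's cost")
```

The gap between price and implied cost is the upper bound on what a buyer could recover by building its own chip, before its own development and partner costs are subtracted.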

Timeline and Next Steps

OpenAI has not disclosed a production timeline for Jalapeño. Bringing a new ASIC from tape-out to volume deployment typically takes 18 to 24 months. If the chip entered tape-out in early 2026, volume deployment could begin between mid-2027 and early 2028. OpenAI will continue to source NVIDIA hardware for training and for inference capacity during the ramp period.
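
The window implied by that rule of thumb can be checked with a few lines of date arithmetic. The tape-out date below is a hypothetical stand-in for "early 2026"; OpenAI has not published one, and `add_months` is a helper written for this sketch.

```python
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, pinning the day to the 1st for simplicity."""
    month_index = start.year * 12 + (start.month - 1) + months
    return date(month_index // 12, month_index % 12 + 1, 1)


tape_out = date(2026, 1, 1)  # hypothetical; no tape-out date has been disclosed

earliest = add_months(tape_out, 18)
latest = add_months(tape_out, 24)
print(f"volume deployment window: {earliest:%B %Y} to {latest:%B %Y}")
```

A later tape-out shifts the whole window by the same amount.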

The Broadcom partnership also positions OpenAI within the broader TSMC advanced-node allocation ecosystem. TSMC's N2 and N3 process nodes are heavily subscribed by Apple, NVIDIA, AMD, and the hyperscalers. Securing allocation alongside Broadcom — which already has an established TSMC relationship — gives OpenAI a more credible path to volume production than a cold approach to the foundry would.


Source: OpenAI — Broadcom Jalapeño Inference Chip announcement. Additional context: Broadcom Investor Relations; TSMC Advanced Logic; NVIDIA Newsroom.

Sources include company disclosures, regulatory filings, analyst reports, and industry briefings.


Frequently Asked Questions

What is OpenAI's Jalapeño chip?

Jalapeño is OpenAI's first custom AI inference ASIC, designed in partnership with Broadcom and manufactured by TSMC, purpose-built to run OpenAI's frontier language models at lower cost and higher throughput than general-purpose GPUs.

Who is building the Jalapeño chip with OpenAI?

Broadcom is co-designing the chip alongside OpenAI. Broadcom has previously designed Google's TPU and Meta's MTIA inference chip, making it one of the most experienced custom AI ASIC partners in the industry.

Why is OpenAI building its own chip?

Inference — serving queries from ChatGPT and the API — is OpenAI's largest operating cost. A purpose-built ASIC can deliver lower cost per query, lower latency, and better energy efficiency than general-purpose NVIDIA GPUs for fixed model workloads.

Does this mean OpenAI will stop using NVIDIA GPUs?

Not in the near term. NVIDIA GPUs will continue to be used for training and for inference capacity during the ramp period. Jalapeño targets inference specifically, and volume production would likely be 18–24 months from tape-out.

Who manufactures the Jalapeño chip?

TSMC will manufacture the chip using its advanced process node, leveraging Broadcom's existing TSMC foundry relationship for allocation and production ramp.