OpenAI Unveils Top Inference Chip for LLMs

OpenAI has unveiled its first custom-designed chip for artificial intelligence workloads, a processor called Jalapeño built specifically for large language model inference. According to the company, the silicon, produced by chipmaking partner Broadcom, is designed to make systems like ChatGPT run faster and handle the most demanding workloads with greater efficiency.

The Jalapeño Intelligence Processor was shown by OpenAI CEO Sam Altman and Broadcom CEO Hock Tan. It represents a move by AI labs to design their own specialized hardware, following a path already taken by firms like Google with its Tensor Processing Units. The broader goal is to reduce heavy reliance on general-purpose chips sourced from a single dominant supplier in the market.

This is not a modified existing design.

The device was designed from a blank slate for the computational patterns of modern large language model inference. The entire development process, from initial concept to manufacturing tape-out, took just nine months, a compressed timeline the company pointed to as evidence of close collaboration with the chipmaker.

The chip’s architecture draws on data from systems OpenAI operates daily, including those powering ChatGPT, Codex, and the API. The design aims to combine the high throughput of leading accelerators with substantially lower latency, which is critical for interactive, real-time applications where response speed matters. OpenAI positions it as the start of a multi-generation platform built in partnership with Broadcom and manufacturer Celestica.

GPT-5.3-Codex-Spark is among the models being tested on the silicon.

First engineering samples are already up and running machine learning workloads at the device’s target power envelope and operating frequency. Published visuals of the die show eight memory sites surrounding the central compute area, although detailed technical specifications have not been disclosed.

Deployment is scheduled to begin before the end of 2026. The first wave of platforms featuring the processor will be followed by expanded rollouts across subsequent years, gradually extending into more data centers. Over the long term, the firm is committing to building a stable, diversified computing foundation for a growing fleet of services.

The announcement reflects a wider industry shift toward custom-designed application-specific circuits.

Last year, OpenAI secured a major partnership for a large-scale NVIDIA deployment, reflecting a deliberate effort to diversify sourcing. However, persistent supply chain pressures and a desire for architectural control are driving other companies to develop their own processors, creating a more heterogeneous and resilient compute ecosystem.