Nvidia Offers Arm-Based Vera Server CPUs to Chinese Clients
Nvidia has told its Chinese clients that its Arm-based Vera server CPUs could be available for shipment as early as August, and is encouraging them to place orders for the upcoming shipments. The offer comes as sales of the company's H200 GPUs in China reportedly remain frozen.

New Research Enables 16x LLM Context Compression with Speed and Efficiency Gains
A research team from NYU, Columbia, Princeton, the University of Maryland, Harvard, and Lawrence Livermore National Laboratory has developed Latent Context Language Models (LCLMs) to address computational bottlenecks in large language models (LLMs). LCLMs are encoder-decoder models: the encoder compresses the input before it reaches the decoder, directly reducing compute and memory demands. The team reports up to 16x input context compression without significant accuracy degradation, with output produced 8.8x faster than previous methods on its benchmarks. The models and code are open source.

Google Unveils DiffusionGemma for Parallel Text Generation and Self-Correction
Google has released DiffusionGemma, an experimental open-source model that applies the diffusion process, typically used in image generation, to text generation at production scale. Built on the Gemma 4 backbone, DiffusionGemma generates blocks of 256 tokens in parallel, refining them iteratively and self-correcting along the way, unlike traditional language models that generate text one token at a time. Google reports speed improvements of up to 4x on GPUs compared with standard models, particularly for local inference and low-concurrency deployments. Google acknowledges, however, that the model's overall output quality is currently lower than that of standard Gemma 4.