Inference is a distributed GPU cluster for LLM inference built on Solana. Inference.net is a global network of data centers serving fast, scalable, pay-per-token APIs for models like DeepSeek V3 and Llama 3.3.
Sources say NVIDIA has begun pitching its first independent central processing unit (CPU) product, Vera, to Chinese clients. Designed specifically for agentic AI systems, the chip has entered mass production, marking NVIDIA's attempt to further expand its presence in the Chinese market with a CPU offering. According to sources, some Chinese clients have already shown interest in Vera. One major Chinese cloud computing company plans to procure over 300 servers equipped with dual Vera CPUs for testing, and will decide whether to expand procurement after the tests are completed. Built on the Arm architecture, Vera is NVIDIA's first independent CPU product. NVIDIA has previously stated that Vera's performance in AI agent-related computing tasks is 1.8 times that of comparable competitor products, and expects the product to contribute approximately $20 billion in revenue by the end of this fiscal year (ending January next year). The report notes that as the AI industry's focus gradually shifts from model training to inference computing, CPUs and custom chips are gaining more attention. Vera also positions NVIDIA to compete directly with Intel and Advanced Micro Devices (AMD), which have long dominated the server CPU market. Sources indicate that, because strict U.S. export restrictions target high-end GPUs, CPUs face lower regulatory hurdles in the Chinese market than GPU products do. Currently, some Chinese clients plan to first deploy Vera chips for testing in overseas data centers. Meanwhile, software ecosystem compatibility and existing domestic AI chip deployment frameworks may still affect the subsequent large-scale adoption of Vera. (Reuters)
According to Tech Funding News, AMD CEO Lisa Su announced at London Tech Week that the company will invest up to £2 billion in UK AI infrastructure over the next five years, covering national supercomputing infrastructure development and university research collaborations. Meanwhile, AMD is partnering with Oriole Networks—a startup spun out from University College London (UCL)—to deploy the world’s first large-scale, all-photonic network AI system under the UK government’s £50 million ARIA Inference Scaling Lab initiative. This system integrates Oriole’s PRISM photonic networking platform with AMD Instinct GPUs and EPYC CPUs; by completely eliminating electronic switches from the network core, it reduces core network energy consumption by 81% and cuts GPU idle time from 60% to under 1%.
NVIDIA announced on X that it launched the open-source multimodal model Nemotron 3 Nano Omni today. The model adopts a 30B-A3B mixture-of-experts (MoE) architecture, supports a 256K context window, and can process video, audio, image, and text inputs in a single model. Compared to open-source omnimodal models at a similar interaction level, this model achieves up to a 9x increase in throughput, significantly reducing inference costs and improving scalability. Nemotron 3 Nano Omni is now available on Hugging Face, OpenRouter, and NVIDIA NIM, and has been adopted by enterprises including Aible, Applied Scientific Intelligence, and H Company.
According to Yonhap News Agency, SK Telecom announced the signing of a trilateral memorandum of understanding (MOU) with UK-based chip design company Arm and Korean AI chip startup Rebellions to jointly develop AI data center inference server solutions. Under the agreement, the three parties will integrate Arm’s newly launched AGI CPU with Rebellions’ AI acceleration chip—RebelCard, scheduled for launch in Q3 this year—to jointly develop AI inference servers, which will be tested and validated at SK Telecom’s AI data centers. The Arm AGI CPU is optimized for high-density inference environments and large-scale AI deployments, while the RebelCard is specifically designed for large-scale AI inference.