GetChain News
Inference

Active

Distributed GPU cluster for LLM inference

Project Overview

Inference is a distributed GPU cluster for LLM inference built on Solana. Inference.net is a global network of data centers serving fast, scalable, pay-per-token APIs for models like DeepSeek V3 and Llama 3.3.

Event-related news

Sources: NVIDIA plans to pitch Vera AI CPU to Chinese clients, some cloud providers eyeing test deployment

Sources say NVIDIA has begun pitching Vera, its first standalone central processing unit (CPU) product, to Chinese clients. Designed specifically for agentic AI systems, the chip has entered mass production and marks NVIDIA's attempt to expand its presence in the Chinese market with a CPU offering. According to the sources, some Chinese clients have already shown interest in Vera. One major Chinese cloud computing company plans to procure more than 300 servers equipped with dual Vera CPUs for testing and will decide whether to expand procurement once the tests are complete. Vera is built on the Arm architecture. NVIDIA has previously stated that Vera's performance on AI agent-related computing tasks is 1.8 times that of comparable competitor products, and it expects the product to contribute approximately $20 billion in revenue by the end of this fiscal year (ending January next year). The report notes that as the AI industry's focus gradually shifts from model training to inference, CPUs and custom chips are gaining more attention. Vera also puts NVIDIA in direct competition with Intel and Advanced Micro Devices (AMD), which have long dominated the server CPU market. Sources indicate that, because U.S. export restrictions target high-end GPUs, CPUs face fewer regulatory hurdles in the Chinese market than GPU products do. For now, some Chinese clients plan to deploy Vera chips first for testing in overseas data centers. Software ecosystem compatibility and existing domestic AI chip deployment frameworks may still affect any subsequent large-scale adoption of Vera. (Reuters)

AMD Announces $2.5 Billion Investment in UK AI Infrastructure, Collaborating with Startup Oriole to Deploy the World’s First All-Photonic-Network AI System

According to Tech Funding News, AMD CEO Lisa Su announced at London Tech Week that the company will invest up to £2 billion in UK AI infrastructure over the next five years, covering national supercomputing infrastructure development and university research collaborations. Meanwhile, AMD is partnering with Oriole Networks—a startup spun out from University College London (UCL)—to deploy the world’s first large-scale, all-photonic network AI system under the UK government’s £50 million ARIA Inference Scaling Lab initiative. This system integrates Oriole’s PRISM photonic networking platform with AMD Instinct GPUs and EPYC CPUs; by completely eliminating electronic switches from the network core, it reduces core network energy consumption by 81% and cuts GPU idle time from 60% to under 1%.

NVIDIA Launches Nemotron 3 Nano Omni Model, Boosting Multimodal Inference Efficiency by 9x

NVIDIA announced on X that it has launched the open-source multimodal model Nemotron 3 Nano Omni today. The model uses a 30B-A3B mixture-of-experts (MoE) architecture, supports a 256K context window, and can process video, audio, image, and text inputs in a single model. Compared to open-source omnimodal models at a similar interaction level, it achieves up to 9x higher throughput, significantly reducing inference costs and improving scalability. Nemotron 3 Nano Omni is now available on Hugging Face, OpenRouter, and NVIDIA NIM, and has been adopted by enterprises including Aible, Applied Scientific Intelligence, and H Company.

SK Telecom Collaborates with Arm and Rebellions to Jointly Develop AI Data Center Inference Solutions

According to Yonhap News Agency, SK Telecom announced the signing of a trilateral memorandum of understanding (MOU) with UK-based chip design company Arm and Korean AI chip startup Rebellions to jointly develop AI data center inference server solutions. Under the agreement, the three parties will integrate Arm’s newly launched AGI CPU with Rebellions’ AI acceleration chip—RebelCard, scheduled for launch in Q3 this year—to jointly develop AI inference servers, which will be tested and validated at SK Telecom’s AI data centers. The Arm AGI CPU is optimized for high-density inference environments and large-scale AI deployments, while the RebelCard is specifically designed for large-scale AI inference.

Related news

Ornn AI Launches Token Price Index to Measure the Actual Cost of Inference Tokens from OpenAI and Anthropic

According to PR Newswire, Ornn AI has launched the Ornn Token Price Index (OTPI), which measures the actual cost of tokens generated by AI model developers such as OpenAI and Anthropic. Weighted by the volume of tokens traded for each model, the OTPI is a daily metric expressed in “USD per million tokens,” reflecting how factors—including model usage patterns, input-to-output ratios, and caching—impact actual costs.
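To make the idea of a volume-weighted price in USD per million tokens concrete, here is a minimal sketch. It is not Ornn AI's methodology, which the report does not detail; the `ModelDay` record, the function names, and all figures are hypothetical and chosen only for illustration. It assumes each model's effective price is total spend divided by tokens traded, so effects such as input-to-output mix and caching are already reflected in the spend figure.

```python
from dataclasses import dataclass


@dataclass
class ModelDay:
    """One model's activity for a single day (hypothetical record layout)."""
    name: str         # illustrative model label, not a real product name
    tokens: float     # tokens traded that day
    spend_usd: float  # USD actually paid for those tokens


def price_per_million(spend_usd: float, tokens: float) -> float:
    """Effective price for one model, in USD per million tokens."""
    return spend_usd / tokens * 1_000_000


def volume_weighted_index(rows: list[ModelDay]) -> float:
    """Average the per-model prices, weighting each by its token volume."""
    total_tokens = sum(r.tokens for r in rows)
    if total_tokens <= 0:
        raise ValueError("no token volume to weight by")
    weighted_sum = sum(
        price_per_million(r.spend_usd, r.tokens) * r.tokens for r in rows
    )
    return weighted_sum / total_tokens


# Hypothetical day: model-a is pricier but has lower volume than model-b.
sample = [
    ModelDay("model-a", tokens=2_000_000, spend_usd=10.0),
    ModelDay("model-b", tokens=6_000_000, spend_usd=6.0),
]
index_value = volume_weighted_index(sample)
print(f"{index_value:.2f} USD per million tokens")
```

Because the weights are token volumes, a high-volume, low-price model pulls the index down more than a simple average of the two prices would.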
