Technology
Source: VentureBeat

Weibo's VibeThinker-3B Challenges AI Scaling Laws, Sparks Benchmark Debate

Researchers at Sina Weibo have unveiled VibeThinker-3B, a 3-billion-parameter language model that reportedly matches or exceeds the reasoning capabilities of AI systems hundreds of times larger, including those from Google DeepMind, OpenAI, and DeepSeek. The model achieved high scores on demanding math and coding benchmarks, such as AIME 2026 and unseen LeetCode contests. This unexpected performance from a compact model, capable of running on a consumer laptop, has ignited skepticism within the AI community regarding the reliability of current benchmarks and the industry's focus on ever-larger models.

By Fainaron · Jun 17, 2026

A team of nine researchers at Chinese social media giant Sina Weibo recently published a technical report on arXiv introducing VibeThinker-3B. The authors claim that the model, with only 3 billion parameters, achieves reasoning performance comparable to or exceeding that of flagship AI systems hundreds of times larger, including offerings from Google DeepMind, OpenAI, Anthropic, and DeepSeek.

VibeThinker-3B scored 94.3 on the American Invitational Mathematics Examination (AIME) 2026, a demanding standardized math competition. This score places it alongside DeepSeek V3.2 (a 671-billion-parameter model) and ahead of Google's Gemini 3 Pro, which scored 91.7. With a test-time scaling technique called Claim-Level Reliability Assessment, its score reportedly rises to 97.1. The model also performed strongly on other math benchmarks, including AIME 2025, HMMT 2025, BruMO 2025, and IMO-AnswerBench. In coding, it achieved a Pass@1 of 80.2 on LiveCodeBench v6 and a 96.1% acceptance rate on unseen LeetCode contests held from late April through late May 2026.
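For readers unfamiliar with the Pass@1 figure quoted above, the sketch below shows how pass@k is commonly estimated for code benchmarks, using the unbiased estimator popularized by OpenAI's Codex paper. The article does not say how many samples per problem were drawn or which estimator the authors used, so the function names and the sample counts here are illustrative assumptions, not the report's method.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n - c, k) / C(n, k).

    n: samples generated, c: samples that passed all tests, k: budget.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, reported on a 0-100 scale."""
    return 100 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical per-problem results as (samples, passing samples) pairs.
example_results = [(4, 4), (4, 2), (4, 0)]
example_score = benchmark_pass_at_k(example_results, k=1)
```

With k = 1 the estimator reduces to the fraction of passing samples per problem, so a score of 80.2 means that, averaged over problems, roughly four in five single attempts passed.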

The researchers propose the "Parametric Compression-Coverage Hypothesis," which holds that verifiable reasoning capabilities, like those tested in math and coding, are "parameter-dense" and can be compressed into a compact core, whereas open-domain knowledge is "parameter-expansive" and requires more parameters. They cite VibeThinker-3B's lower score (70.2) on GPQA-Diamond, a graduate-level science knowledge benchmark, compared with Gemini 3 Pro (91.9) and Claude Opus 4.5 (87.0), as support for this distinction.

The model was developed through a multi-stage post-training pipeline, building upon Alibaba's Qwen2.5-Coder-3B. This process includes supervised fine-tuning, reinforcement learning using the MaxEnt-Guided Policy Optimization algorithm, distillation of high-quality reasoning trajectories, and Instruct RL for instruction-following tasks.

The AI research community's reaction has been mixed. While the paper quickly gained traction online, many expressed skepticism, questioning whether the benchmarks are genuinely reflective of real-world utility or if they have become "gameable." Some users who tested the model reported it struggled with common developer tools, suggesting a gap between benchmark scores and practical performance. The authors, however, state that training sets underwent "strict benchmark decontamination," and the LeetCode evaluation used contests from dates postdating any plausible training data cutoff, aiming to address concerns about data contamination.
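The date-based safeguard the authors describe can be illustrated with a minimal sketch: keep only evaluation problems released after the training data cutoff. The `Problem` type, the `held_out` helper, the cutoff date, and the contest entries below are all hypothetical. The article gives neither the actual cutoff nor the evaluation code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    slug: str
    released: date


def held_out(problems: list[Problem], training_cutoff: date) -> list[Problem]:
    """Keep only problems released strictly after the training cutoff."""
    return [p for p in problems if p.released > training_cutoff]


# Hypothetical values for illustration only; not taken from the report.
ASSUMED_CUTOFF = date(2026, 3, 31)
problems = [
    Problem("older-contest-q1", date(2025, 11, 2)),
    Problem("spring-contest-q3", date(2026, 4, 26)),
    Problem("spring-contest-q4", date(2026, 5, 24)),
]

eval_set = held_out(problems, ASSUMED_CUTOFF)
print([p.slug for p in eval_set])
```

A filter like this rules out verbatim leakage of the evaluated problems, but it does not address the separate criticism that benchmark-style tasks may differ from everyday developer work.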

VibeThinker-3B's emergence challenges the prevailing "scaling hypothesis" that larger models inherently perform better. The paper suggests that compact models offer a "promising research trajectory" for specific verifiable reasoning tasks, potentially complementing larger general-purpose models. This could lead to hybrid AI architectures and significantly reduce the cost and hardware requirements for deploying advanced AI reasoning capabilities. The model's weights and code are openly available under the MIT License.
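To put the hardware claim in rough numbers, the sketch below estimates the memory needed just to hold model weights at a few common numeric precisions. It covers weights only and ignores activations, the KV cache, and runtime overhead. The precision choices are assumptions, since the article does not say how the model is served.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Parameter counts as quoted in the article.
models = {"VibeThinker-3B": 3e9, "671B-parameter model": 671e9}

for name, params in models.items():
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(params, precision)
        print(f"{name:>22} @ {precision}: {gb:,.1f} GB")
```

At 16-bit precision a 3-billion-parameter model needs about 6 GB for its weights, which fits within the memory of many consumer laptops, while a 671-billion-parameter model needs about 1,342 GB at the same precision.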

Source attribution: This article was AI-curated and rewritten by Fainaron from a piece originally published by VentureBeat.
