Breaking
BreakingAl JazeeraIraqi Fans Celebrate Historic World Cup Qualification· 3 minutes agoBreakingEaterHospitality Operators Discuss "Third Spaces" and Community Building in Los Angeles· 4 minutes agoBreakingBillboardAvex Inc. Executives Receive Global Growth Award at Billboard Event· 4 minutes agoBreakingBillboardMasterClass Offers 50% Off All Plans for Father's Day· 4 minutes agoBreakingDeadline HollywoodSundance Award-Winning Luis Valdez Documentary Sets Theatrical Release· 4 minutes agoBreakingBloomberg MarketsUS Companies Seek Over $40 Billion in Debt Amid US-Iran Strait Deal Optimism· 4 minutes agoBreakingNews24 SATrump Arrives at G7 Summit Amid Iran Deal Relief and Tariff Concerns· 4 minutes agoBreakingChannel News AsiaSpain Held to 0-0 Draw by World Cup Debutants Cape Verde· 4 minutes agoBreakingFrance 24Belgium Faces Egypt in World Cup Group G Opener at Seattle Stadium· 4 minutes agoBreakingThe VergeAnthropic Takes AI Models Offline Following US Government Directive· 4 minutes agoBreakingAl JazeeraIraqi Fans Celebrate Historic World Cup Qualification· 3 minutes agoBreakingEaterHospitality Operators Discuss "Third Spaces" and Community Building in Los Angeles· 4 minutes agoBreakingBillboardAvex Inc. Executives Receive Global Growth Award at Billboard Event· 4 minutes agoBreakingBillboardMasterClass Offers 50% Off All Plans for Father's Day· 4 minutes agoBreakingDeadline HollywoodSundance Award-Winning Luis Valdez Documentary Sets Theatrical Release· 4 minutes agoBreakingBloomberg MarketsUS Companies Seek Over $40 Billion in Debt Amid US-Iran Strait Deal Optimism· 4 minutes agoBreakingNews24 SATrump Arrives at G7 Summit Amid Iran Deal Relief and Tariff Concerns· 4 minutes agoBreakingChannel News AsiaSpain Held to 0-0 Draw by World Cup Debutants Cape Verde· 4 minutes agoBreakingFrance 24Belgium Faces Egypt in World Cup Group G Opener at Seattle Stadium· 4 minutes agoBreakingThe VergeAnthropic Takes AI Models Offline Following US Government Directive· 4 minutes ago
Technology
Source: Marktechpost

Flash-KMeans Accelerates K-Means Algorithm Over 200x on GPUs

Researchers from UC Berkeley and UT Austin have developed Flash-KMeans, an open-source library that significantly accelerates the standard Lloyd's k-means algorithm on GPUs. This IO-aware implementation, built with Triton GPU kernels, focuses on restructuring data movement rather than altering the core mathematics or approximating results. Benchmarking on an NVIDIA H200 GPU indicates Flash-KMeans achieves an end-to-end speedup of up to 17.9 times over leading baselines, 33 times over NVIDIA cuML, and more than 200 times faster than FAISS.

By Fainaron·Jun 15, 2026 (2 hours ago)·1 views
Flash-KMeans Accelerates K-Means Algorithm Over 200x on GPUs

Flash-KMeans is designed to address the increasing demand for k-means computations within modern AI training and inference loops, where call latency is critical. The library ships under an Apache 2.0 license and is installable via pip.

The performance gains stem from Flash-KMeans' approach to two primary bottlenecks in GPU-based k-means: the assignment stage and the centroid update stage. The assignment stage, which calculates each point's distance to every centroid, traditionally involves materializing a large distance matrix in high-bandwidth memory (HBM). Flash-KMeans replaces this with FlashAssign, an approach that streams tiles of points and centroids from HBM into on-chip SRAM, fusing distance computation with an online argmin. This eliminates the need to materialize the full N×K matrix, reducing IO complexity.

For the centroid update stage, standard methods often rely on scatter-style atomic additions, leading to contention when multiple threads target the same 'hot' cluster. Flash-KMeans introduces Sort-Inverse Update. This method sorts the 1D assignment vector by cluster ID, creating contiguous segments for identical cluster IDs. Each thread block then reduces a segment on-chip before issuing a single atomic add per segment, thereby reducing atomic contention.

Specific kernel-level optimizations show FlashAssign achieving up to 21.2 times speedup in assignment operations, while Sort-Inverse Update reaches up to 6.3 times speedup in centroid updates. For large-scale out-of-core operations, Flash-KMeans reports up to 10.5 times faster performance compared to fastkmeans. These optimizations allow Flash-KMeans to operate efficiently in scenarios where other libraries, such as FAISS, may encounter memory limitations or slower processing for large datasets.

According to Marktechpost, Flash-KMeans offers a mathematically identical output to standard Lloyd's k-means, with speed improvements derived purely from kernel-level dataflow and optimized data movement.

Source attribution: This article was AI-curated and rewritten by Fainaron from a piece originally published by Marktechpost. Read the original at Marktechpost →

More like this

Tennessee Jurisdictions Implement Temporary Data Center Bans
Technology
4 minutes ago

Tennessee Jurisdictions Implement Temporary Data Center Bans

Multiple jurisdictions across Tennessee have recently implemented or are considering temporary moratoriums on new data center developments. Two jurisdictions have already passed such bans, while three others are preparing to vote on similar legislative measures. Nashville also moved forward with a near-unanimous moratorium during its initial reading, reflecting a broader trend of local communities seeking to regulate the expansion of large-scale tech infrastructure.

Tom's Hardware
Overwatch Season 3 "Into the Tiger's Den" Launches with New Hero Shion
Technology
4 minutes ago

Overwatch Season 3 "Into the Tiger's Den" Launches with New Hero Shion

Overwatch Season 3, officially titled "Into the Tiger's Den," is set to begin this week. This marks the third season since the game's rebranding and introduces a new Damage-class hero named Shion, previously known as Hero 52. The season will also feature standard updates, including a new battle pass and various cosmetic additions.

Polygon
Google Chrome to End Support for Older Ad Blocker Workarounds
Technology
4 minutes ago

Google Chrome to End Support for Older Ad Blocker Workarounds

Google Chrome versions 150 and 151, anticipated for release in late June and July respectively, are set to eliminate the remaining workarounds that have allowed older ad blockers to function. This action builds on Google's 2024 phase-out of support for ad-blocking extensions built on the Manifest V2 platform, which included tools like uBlock Origin. As a result, only ad blockers developed for the newer Manifest V3 platform will be supported in Chrome from version 151 onward.

The Verge
Breaking
Anthropic Takes AI Models Offline Following US Government Directive
Technology
4 minutes ago

Anthropic Takes AI Models Offline Following US Government Directive

Anthropic, an American artificial intelligence company, recently took its most powerful AI models offline over the weekend. The decision came at the request of Washington, with the White House demanding the company block access for all foreign nationals, including its own employees. This incident has brought attention to the US government's influence over frontier AI technology and its ability to dictate access.

The Verge

By the numbers

Fainaron — live counters

Updated every 30 seconds. Automatically — no human edits.

Total Articles

14.3K

Visitors Today

385

This Month

1.4K

Lifetime Visitors

1.4K

Article Views

17.3K

Pageviews Today

3.8K

Pageviews Lifetime

12.8K

Last 30 Days

1.4K

as of 6/15/2026, 6:18:19 PM