Business
Source: Business Insider

Crosby Launches Redline Bench to Evaluate AI Models in Contract Review

Crosby, a tech-driven law firm, has introduced Redline Bench, a new benchmark designed to assess the performance of artificial intelligence models in real-world legal tasks, specifically contract review. The tool aims to help lawyers determine the trustworthiness and quality of AI-generated legal work, addressing the inherent ambiguity in defining 'good' or 'bad' legal outcomes. The Redline Bench was developed by Crosby's Intelligence unit and involves a methodology where senior lawyers simulate software deals to create weighted criteria for contract changes. Initial tests using this benchmark placed ChatGPT 5.5 at the top with a score of 50.5%, followed by Gemini 3.5 Flash and Claude Opus.

By Fainaron · Jun 17, 2026

Crosby, a tech-driven law firm, has released Redline Bench, a new benchmark intended to measure the efficacy of artificial intelligence models in legal contract negotiations. The initiative aims to provide lawyers with a standardized method to evaluate whether they can rely on AI technology for complex legal work.

Billions of dollars ride on the promise that AI can absorb routine legal tasks, but defining the quality of AI's legal output has been a challenge. Ryan Daniels, a former in-house lawyer and Crosby's founder, noted that unlike software coding, where functionality is clear, legal work can be subjective. A single contract edit, or 'redline,' might be judged differently by different legal professionals.

To tackle this ambiguity, Crosby formed its Intelligence unit, which includes engineers such as Sharan Ramjee, known for work on transformer models at Stripe, and lawyers such as Ross Weiser, formerly of Sullivan & Cromwell. This team developed Redline Bench. Crosby also collaborated with Micro1, a company that helps model-makers recruit expert workers, to refine the criteria for 'good' legal work.

The benchmark's development involved senior lawyers simulating software deals and identifying the most crucial contract changes at each negotiation stage. These changes were then converted into weighted criteria. During testing, AI models are provided with the same contracts and tasked with making their own edits. A panel of three judges subsequently compares these AI-generated redlines against the lawyer-built rubric, voting pass or fail on each item to generate a final score.
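The scoring step described above can be illustrated with a minimal sketch. The article does not say how the three judges' votes are combined or how the weights enter the final score, so this sketch assumes a simple majority vote per item and a weighted share of passed items. The names Criterion, criterion_passes, and rubric_score, and the sample rubric entries, are hypothetical and are not taken from Crosby's benchmark.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric item: a contract change the lawyers considered important."""
    name: str
    weight: float                     # assumed: a higher weight means a more important edit
    votes: tuple[bool, bool, bool]    # pass/fail vote from each of the three judges


def criterion_passes(criterion: Criterion) -> bool:
    # Assumption: an item passes when at least two of the three judges vote pass.
    return sum(criterion.votes) >= 2


def rubric_score(criteria: list[Criterion]) -> float:
    """Return the weighted share of passed rubric items, as a percentage."""
    total = sum(c.weight for c in criteria)
    if total == 0:
        raise ValueError("rubric has no weight")
    earned = sum(c.weight for c in criteria if criterion_passes(c))
    return 100.0 * earned / total


# Hypothetical rubric for one negotiation stage (illustrative values only).
rubric = [
    Criterion("cap liability", 3.0, (True, True, False)),
    Criterion("narrow indemnity", 2.0, (False, False, True)),
    Criterion("fix notice period", 1.0, (True, True, True)),
]
print(f"{rubric_score(rubric):.1f}%")  # prints 66.7%
```

Under these assumptions, a model that misses one heavily weighted edit loses more than one that misses several minor ones, which is one plausible reason to weight criteria rather than count them.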

Crosby plans to make Redline Bench publicly accessible, allowing any lab to test its models. The company also intends to publish regular reports comparing major AI models. Initial findings from Redline Bench showed ChatGPT 5.5 leading with a score of 50.5%, indicating its redlines matched roughly half of the lawyers' prioritized edits. Gemini 3.5 Flash scored 45.1%, and Claude Opus scored 44.4%. An early, limited test of Anthropic's Fable 5 showed promising results at 47.3% before the model was withdrawn.

According to Business Insider, Crosby isn't.

Source attribution: This article was AI-curated and rewritten by Fainaron from a piece originally published by Business Insider.
