Anthropic Apologizes for Claude AI Censorship, Promises Visible Safeguards

Anthropic has apologized after the AI community raised concerns about "invisible performance sabotage" or "secret censorship" within its Claude AI model. The reversal came one day after the community raised issues with the model's performance. Anthropic plans to introduce visible safeguards, but said the change is expected to increase "false positives."

By Fainaron · Jun 12, 2026

Anthropic has apologized for what was described as "secret censorship" within its Claude AI model. The apology follows an outcry from the AI community regarding "invisible performance sabotage."

One day after these concerns surfaced, the company reversed course.

Anthropic said it will introduce visible safeguards as part of its updated approach. However, the company also noted that the new safeguards are expected to increase "false positives."

(Source: Decrypt Crypto)

#anthropic #claude #ai #censorship #safeguards #apology #artificial intelligence

Source attribution: This article was AI-curated and rewritten by Fainaron from a piece originally published by Decrypt Crypto.

More like this

Google Researchers Introduce 'Faithful Uncertainty' to Combat LLM Hallucinations

Technology

11 minutes ago

Google researchers have introduced "faithful uncertainty," a metacognitive technique designed to allow large language models (LLMs) to express their internal confidence and offer qualified responses instead of generating confident but incorrect information, known as hallucinations. This approach aims to address the "utility tax" often incurred when developers try to eliminate factual errors, which can lead models to suppress valid answers. By aligning a model's linguistic uncertainty with its intrinsic confidence, the system can provide nuanced answers, such as "My best guess is," preserving utility while maintaining user trust. The method has significant implications for agentic AI applications, where models need to discern when to use external tools or rely on their internal knowledge.

VentureBeat

Google Rolls Out AI Mode Information Agents to Ultra Subscribers

Technology

14 minutes ago

Google has made its information agents available to AI Ultra subscribers. This launch includes all AI Mode languages and markets globally. The company plans to expand access to these agents for a wider audience later this summer.

Search Engine Journal

YouTube Enhances Platform with Expanded In-App Messaging

Technology

14 minutes ago

YouTube has expanded its in-app messaging capabilities, introducing the YouTube Chat feature. This update allows eligible users to send and receive private messages directly within the platform. The integration is designed to facilitate seamless communication among users without requiring them to leave the current stream or the YouTube environment.

Social Media Today

Technology

14 minutes ago

ShinyHunters Exploits Oracle PeopleSoft Zero-Day, Claims Over 100 Organizations Breached

The cybercrime group ShinyHunters claims to have exploited a critical zero-day vulnerability in Oracle PeopleSoft, leading to the compromise of more than 100 organizations. Among the alleged victims is the University of Nottingham, from which the group states it stole 40GB of student and billing data. Google's threat intelligence has corroborated ShinyHunters' claims of widespread compromise, identifying malicious activity consistent with the exploitation of CVE-2026-35273.

Slashdot
