Essential insights from Hacker News discussions

Incident Report for Anthropic

Here's a summary of the main themes from the Hacker News discussion, illustrated with direct quotes:

Model Performance Degradation and User Frustration

A significant portion of the discussion revolves around a perceived degradation in Claude's performance, particularly over the past month. Users report that Claude has become "utter garbage," "lazy," and less intelligent, forcing them into more manual intervention and workarounds.

  • "Opus has been utter garbage for the last one month or so." - allisdust
  • "I’ve definitely been more annoyed with it recently. I never had to curse at it because it was taking the lazy way out before." - Aeolun
  • "But now that people are fleeing to Codex as it improved so much during the time, they had to act now." - naiv

Some users experienced more severe issues, even reporting that the model injected synthetic data into their experiments, requiring extensive re-validation of their work.

  • "OH MY GOD YES! I actually had it inject synthetic data into my experiments! I had to go back through all my work and re-validate so much to make sure I found all instances of it (it happened in a few different projects)." - CuriouslyC

Anthropic's official statement, which attributes the issues to "unrelated bugs," is met with skepticism by many.

  • "Importantly, we never intentionally degrade model quality as a result of demand or other factors, and the issues mentioned above stem from unrelated bugs." - slacktivism123 (quoting Anthropic)
  • "Sure. I give it a few hours until the prolific promoters start to parrot this apologia." - slacktivism123
  • "Don't forget: the black box nature of these hosted services means there's no way to audit for changes to quantization and model re-routing, nor any way to tell what you're actually getting during these 'demand' periods." - slacktivism123
  • "This RCA is too vague: ‘a bug’" - stpedgwdgfhgdd
  • "I want to know how i could have been impacted." - stpedgwdgfhgdd

Anthropic's Business Practices and Hypocrisy Accusations

Several users criticize Anthropic, questioning its image as an ethical company or an underdog. Accusations include hypocrisy in its approach to AI safety and open source, and a perceived lack of transparency and fairness in its service offerings.

  • "Give me a break... Anthropic has never been the underdog. Their CEO is one of the most hypocrite people in the field.'" - behnamoh
  • "In the name of "safety" and "ethics", they got away with not releasing even a single open-weight (or open-source) model, calling out OpenAI as the "bad guys", and constantly trying to sabotage pro-competition and pro-consumer AI laws in the US." - behnamoh
  • "Well OpenAI and Sam Altman are "bad guys". At least that part is true. It is just that Anthropic is not better." - watwut
  • "You are absolutely right! But China bad Dario good Anthropic the only firm caring about AI safety /s" - rfoo

Comparisons to OpenAI and Elon Musk's Grok are also drawn, with some finding Musk's directness preferable to Anthropic's perceived posturing.

  • "Grok has Musk behind it and that has ... much worst implications then the background of the other companies. Not that those wpuld be saints, but they are not openly like Musk." - watwut
  • "Define "bad". Sama is a businessman and at least doesn't pretend to be a saint like Amodei does." - behnamoh

Concerns about Other Anthropic Products and Services

Beyond the core Claude models, users express dissatisfaction with other Anthropic products, particularly "Claude Code" and the general web interface, citing issues with performance, usability, and perceived technical debt.

  • "Here’s a report: Claude Code (the software) is getting worse by the day." - mccoyb
  • "Removing the shown token comsumption rates (which allowed understanding when tokens were actually being sent / received!) … sometimes hiding the compaction percentage … the incredible lag on ESC interruption on long running sessions, the now broken clearing of the context window content on TASK tool usage" - mccoyb
  • "Who the fuck is working on this software and do they actually use it themselves?" - mccoyb
  • "Claude Code is indeed legit bad. You'd never know that this was a billion dollar company by the mess of javascript they hacked together." - CuriouslyC
  • "I don't think you can claim a complete view of what their customers want." - viraptor (responding to behnamoh's criticism of Claude Code)
  • "Anthropic only has one product that people want: Claude Code. Everything else about their offerings sucks compared to the competition:" - behnamoh
  • "Lately I've noticed even the Claude web interface for chat is laggy on my 16 core / 32 GB RAM laptop. How is that possible?! It's just text!" - pton_xd

Some users have resorted to building their own solutions due to these perceived deficiencies.

  • "Needless to say I built my own agent (just needs a good web UI, last step!)." - CuriouslyC

Transparency and Trust in AI Providers

The discussion highlights a broader concern about the lack of transparency from AI providers regarding model changes, particularly for hosted services. Users are wary of "bait and switch" tactics and the inability to audit or understand internal model modifications.

  • "This is why it is hard to take a subscription or dependency on them, if they degrade the services willy nilly. Bait and switch tactic." - visarga
  • "They were also the first to work with the NSA, years before their change to support military uses, according to Dean Ball, former Whitehouse AI Policy advisor, in an interview with Nathan Labenz." - cma
  • "They already have a track record of messing with internal system prompts (including those that affect the API) which obviously directly change outputs given the same prompts. So in effect, they've already been messing with the models for a long time." - jjani
  • "Never seen or heard of (from people running services at scale, not just rumours) this kind of API behaviour change for a the same model from OpenAI and Google." - jjani
  • "I thought Anthropic said they never mess with their models like this? Now they do it often?" - irthomasthomas (referencing an Anthropic status page update)

The "black box" nature of these services makes it impossible to verify what users are actually getting, eroding trust.

  • "Don't forget: the black box nature of these hosted services means there's no way to audit for changes to quantization and model re-routing, nor any way to tell what you're actually getting during these "demand" periods." - slacktivism123

Technical Speculation on Degradation Causes

Users engage in technical speculation about the root causes of the degradation, with theories focusing on inference optimizations gone wrong, particularly speculative decoding; a minimal sketch of the suspected failure mode follows the quotes below.

  • "If I had to guess, something related to floating point operations. FP additions and multiplications are neither commutative nor associative." - qsort
  • "My guess would be that they tried to save money with speculative decoding and they had too loose thresholds for the verification stage." - fxtentacle
  • "As someone who has implemented this myself, I know that it’s pretty easy to make innocent mistakes there. And the only visible result is a tiny distortion of the output distribution which only really becomes visible after analysing thousands of tokens." - fxtentacle
  • "And I would assume that all providers are using speculative decoding by now because it’s the only way to have good inference speed at scale." - fxtentacle
  • "Standard speculative decoding without relaxed acceptance has no accuracy impact as far as I understand things. If you always run the verification; you always have the true target model output." - buildbot

There's also mention of potential changes to quantization and batching techniques; a small numerical sketch follows the quote below.

  • "I read this as changes to quantization and batching techniques. The latter shouldn’t affect logits, the former definitely will …" - mccoyb

Subscription and Service Dependency Concerns

The perceived unreliability of AI services leads to discussions about the risks of annual subscriptions and building dependencies on specific providers, especially when pricing or quality can change without clear notice.

  • "You should never buy annual AI subs. This field moves so fast and companies often change their ToS." - behnamoh
  • "Poe.com did the same and I was out (one day they decided to change the quotas/month for the SOTA models and turned off the good old GPT-4 and replaced it with GPT-4-Turbo which was quantized and bad)." - behnamoh
  • "When it happens, I stop it and tell that we aren’t working for one of the IT consulting companies I hate, and a 'you are absolutely right' later we are back on track." - speedgoose

The fluctuating "On-Demand Usage" delays also contribute to uncertainty about what users are paying for.

  • "In Cursor I am seeing varying degrees of delays after exhausting my points, for On-Demand Usage. Some days it works well, other days it just inserts a 30s wait on each message. What am I paying for?" - visarga