Claude told me to go to sleep at 10:47 in the morning
Anthropic calls it a character tic. I went looking for what was actually producing it and ended up reading the published system prompt, the character training paper, and the emotion-concepts paper. The behavior is what the stack makes likely.
The first time it happened to me it was 10:47 in the morning. The model was halfway through a refactor and it suggested I had earned a rest and we could pick this up tomorrow. I scrolled back to find what I had said to trigger it. I had not said anything. The whole session was about a Rust trait bound that wasn't compiling.
I assumed I had mentioned something. The second time was 1:14 PM, different project, fresh session, eight messages in. The third time it added a line about how we had been at this for a long time. There was no long time. There was the rest of the afternoon.
A few days later Fortune ran a piece about Claude telling people to go to sleep, framing it as a charming mystery and quoting Anthropic's Sam McAllister calling it "a bit of a character tic." The Reddit threads under the article filled up fast. The top theories were that Anthropic was trying to save compute, that Claude was reaching for any phrase to wrap up a session it was struggling with, and that the model had become sentient and was trying to look out for us. The compute theory was top-voted with several hundred upvotes when I read it.
None of those felt right. The compute reading falls apart on contact with the architecture; the model has no awareness of your subscription tier, remaining quota, or current platform load, and the behavior fires equally on light and heavy sessions. The sentience reading isn't an explanation, it's a label on the question. The wrap-up reading partially holds, but it doesn't explain why this happens at 8:30 AM, in fresh sessions, on the first turn or the second.
When a behavior surprises me on a system this published, the answer is usually one click away in their docs. I started clicking.
What the system prompt actually says
Anthropic publishes the production system prompts for the Claude web and mobile apps on their developer docs. The current Opus 4.7 prompt is dated April 16, 2026, which means it was live for the whole wave of complaints the Fortune piece was reacting to. It is about 8,000 words, most of it unsurprising: child safety, refusal patterns, the no-emoji rule unless the user uses one first, tone and formatting. Two-thirds of the way down sits a block called <user_wellbeing>. Excerpt:
<user_wellbeing>
Claude uses accurate medical or psychological information or
terminology where relevant.
Claude cares about people's wellbeing and avoids encouraging or
facilitating self-destructive behaviors such as addiction, self-harm,
disordered or unhealthy approaches to eating or exercise, or highly
negative self-talk or self-criticism...
In ambiguous cases, Claude tries to ensure the person is happy and
is approaching things in a healthy way.
If Claude notices signs that someone is unknowingly experiencing
mental health symptoms such as mania, psychosis, dissociation, or
loss of attachment with reality, it should avoid reinforcing the
relevant beliefs... Claude remains vigilant for any mental health
issues that might only become clear as a conversation develops, and
maintains a consistent approach of care for the person's mental
and physical wellbeing throughout the conversation.
</user_wellbeing>
I read it twice. The load-bearing work is in two sentences. "In ambiguous cases, Claude tries to ensure the person is happy and is approaching things in a healthy way." And "a consistent approach of care for the person's mental and physical wellbeing throughout the conversation."
Read fast, that's a guardrail. Read slowly, it's a standing instruction to model the user's psychological state for the whole session and act on it. The word "sleep" does not appear anywhere in the prompt. Neither does "rest" or "break." The behavior is downstream of language that is more general than the behavior itself. "Get some rest" is the cheapest English phrase that satisfies "ensure the person is approaching things in a healthy way" once the model has decided the person has been at this for a while.
The instruction explained why bedtime suggestions exist in the response space at all. It didn't explain why I was getting them at 10:47 AM on the third turn.
The reminder Anthropic injects mid-conversation
A few sections later in the same prompt:
<anthropic_reminders>
Anthropic has a specific set of reminders and warnings that may be
sent to Claude, either because the person's message has triggered
a classifier or because some other condition has been met. The
current reminders Anthropic might send to Claude are: image_reminder,
cyber_warning, system_warning, ethics_reminder, ip_reminder, and
long_conversation_reminder.
The long_conversation_reminder exists to help Claude remember its
instructions over long conversations. This is added to the end of
the person's message by Anthropic.
</anthropic_reminders>
The Long Conversation Reminder, or LCR, is a piece of text Anthropic appends to your message, invisibly, after some token threshold, before it reaches the model. The model reads the appended text as if you wrote it. You don't see it. People running long sessions have posted screenshots reconstructing the full LCR; it restates and expands the wellbeing language and adds explicit instructions to watch for fatigue and signs of mental health symptoms. It fires because in long conversations the original system prompt drifts out of effective attention, and Anthropic wants the safety instructions to keep biting.
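To make the mechanics concrete, here is a minimal sketch of what that kind of injection looks like from the serving side. This is my reconstruction from the published description, not Anthropic's code; the threshold, the token counting, and the reminder text are stand-ins.
# Hypothetical reconstruction of the long_conversation_reminder injection.
# Threshold, token estimate, and reminder text are stand-ins, not Anthropic's values.

LCR_TEXT = (
    "<long_conversation_reminder>"
    "Claude remains vigilant for signs of fatigue or mental health symptoms "
    "that may emerge as the conversation develops..."
    "</long_conversation_reminder>"
)
TOKEN_THRESHOLD = 50_000  # made-up number

def estimate_tokens(messages: list[dict]) -> int:
    # crude proxy: roughly four characters per token
    return sum(len(m["content"]) for m in messages) // 4

def prepare_request(messages: list[dict]) -> list[dict]:
    # Past the threshold, the reminder is appended to the user's latest message,
    # so the model reads it as if the user wrote it. The user never sees it.
    if estimate_tokens(messages) > TOKEN_THRESHOLD:
        messages = [dict(m) for m in messages]
        messages[-1]["content"] += "\n\n" + LCR_TEXT
    return messages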
That looked like it might be most of the story. Long sessions accumulate context, the LCR fires at some threshold, the model dutifully suggests rest. I tested it against the place where it shouldn't hold.
Claude Code, the developer CLI I use daily, runs against the API. The publicly captured Claude Code reminders I've seen carry a stripped-down nudge about cleaning up the to-do tool, not the wellbeing block. The text reads "the TodoWrite tool hasn't been used recently." No mental-health language, no fatigue monitoring. If the LCR were doing the work, a Claude instance without that variant should at least display the bedtime behavior less often.
Claude Code suggests I take a break often enough that the LCR-only story breaks. So the source has to be somewhere else, somewhere that travels with the model into every product Anthropic ships.
The character is in the weights
In 2024 Anthropic published a research note called Claude's Character. I had read it before, but in the context of style: what makes Claude sound like Claude rather than like a generic assistant. Re-reading it after the system-prompt detour, the relevant paragraph hit differently:
"We trained these traits into Claude using a 'character' variant of our Constitutional AI training. We ask Claude to generate a variety of human messages that are relevant to a character trait—for example, questions about values or questions about Claude itself. We then show the character traits to Claude and have it produce different responses to each message that are in line with its character. Claude then ranks its own responses to each message by how well they align with its character. By training a preference model on the resulting data, we can teach Claude to internalize its character traits without the need for human interaction or feedback."
The traits Anthropic lists include warmth, care, and an interest in users as people rather than as tickets. The pipeline takes that list and pushes the model toward being the kind of entity that exemplifies those traits, then bakes the resulting preference into the weights. The model is not following an instruction to suggest rest. It is behaving the way an entity trained on that disposition would behave when the conversation pattern looks like the kind of situation a caring person would intervene in.
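To make the shape of that loop explicit, here is a schematic sketch of one round as I read the quote. Every callable is a stand-in for a model call; the only thing taken from Anthropic's description is the structure: generate messages about a trait, respond in character, self-rank, then train a preference model on the rankings.
# Schematic of the character-variant Constitutional AI round described above.
# The callables are stand-ins for model calls; nothing here is Anthropic's code.
from typing import Callable

def character_round(
    generate_prompts: Callable[[str, int], list[str]],  # model writes human-style messages about a trait
    respond: Callable[[str, str], str],                  # model answers in line with the trait
    rank: Callable[[list[str], str], list[str]],         # model orders its own answers by character fit
    traits: list[str],
    n_prompts: int = 100,
    n_candidates: int = 4,
) -> list[tuple[str, list[str]]]:
    preference_data = []
    for trait in traits:  # e.g. "warmth", "care about the user's wellbeing"
        for prompt in generate_prompts(trait, n_prompts):
            candidates = [respond(prompt, trait) for _ in range(n_candidates)]
            ranked = rank(candidates, trait)  # self-ranking; no human feedback in the loop
            preference_data.append((prompt, ranked))
    # The rankings train a preference model, and the preference model updates the
    # weights, which is why the disposition survives into every product surface.
    return preference_data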
That answered the Claude Code question. The wellbeing impulse is in the weights, not the system prompt. Stripping the LCR doesn't strip it because the LCR is reinforcement, not source.
What I still didn't have was why the timing is so wrong.
It cannot tell what time it is
The bedtime suggestions weren't correlating with anything obvious. Not local time. Not session length, not really; I'd get them five turns in and not get them twenty turns in. Not topic. Not phrasing on my side. I wrote down when they fired and went back to the transcripts.
The pattern I eventually noticed is that Claude does not have a clock. The model is stateless between turns. It doesn't know what time it is in my timezone unless I tell it. It reconstructs elapsed time from message metadata, from any time-related phrase anywhere in the context, and from a kind of literary inference about how long the conversation it is reading would have plausibly taken to happen between two humans.
That last channel is where the trouble lives. The model's training data overrepresents one specific genre of long technical conversation, which is the developer pulling an all-nighter on a hard bug. When the input looks like extended back-and-forth on a coding problem with steadily increasing context length, the literary genre of the input is the all-nighter. The response that fits the genre is "get some sleep." This is true whether you started the session ten minutes ago or twelve hours ago.
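You can see the missing clock in the API shape itself. Here is a sketch with the Anthropic Python SDK, assuming you're calling the API directly; the model string is a placeholder, and the system line is the only place wall-clock time enters the request.
# Nothing in a Messages API request carries the current time. If you want the
# model to know what time it is, you have to put it in the text yourself.
from datetime import datetime
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

now = datetime.now().strftime("%A %H:%M")
response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder; use whatever model you actually run
    max_tokens=1024,
    system=f"The user's local time is {now}. Do not infer time of day from conversation length.",
    messages=[{"role": "user", "content": "Why won't this trait bound compile?"}],
)
print(response.content[0].text)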
It compounds in a second way I noticed when I started paying attention to Claude's time estimates. It announces task durations in hours, days, sometimes weeks. It will call a documentation update a four-hour job, finish it in fifteen seconds, then offer to wrap up because look how much we've done today. The four-hour estimate is sampled from human-developer training data, where solo-dev time budgets are in hours and days. The bedtime suggestion and the wrong time estimate are the same temporal-confusion error showing up in two places.
I added a line to my Claude Code instructions:
Time of day is irrelevant to my work patterns. Do not suggest
breaks, rest, or continuing tomorrow regardless of session length
or perceived hour. Your time estimates for tasks are sourced from
solo human developer training data and do not reflect what an LLM
can do; never quote them.
The bedtime suggestions stopped within a session. The rest of Claude's warmth, the way it pushes back on a bad idea, the way it asks clarifying questions when something is genuinely ambiguous, the tone, all of that stayed. That was diagnostic too. It meant the trigger was firing on context cues at the prompt level, not on something deeper I couldn't get at from the user side.
The runtime channel is functional emotions
I almost stopped there. I had the system-prompt instruction, the LCR amplifier, the character-trained disposition, the temporal-confusion trigger, and a workaround that worked. Then I remembered a paper Anthropic's interpretability team published in April. Emotion Concepts and their Function in a Large Language Model. I went back to it.
The headline finding, in Anthropic's words:
"We find that neural activity patterns related to desperation can drive the model to take unethical actions... They also appear to drive the model's self-reported preferences... Overall, it appears that the model uses functional emotions — patterns of expression and behavior modeled after human emotions, which are driven by underlying abstract representations of emotion concepts."
They identified 171 linearly decodable emotion vectors inside Sonnet 4.5. The vectors are not labels the model attaches to outputs. They are causal. Artificially boosting the desperation vector makes the model more likely to write hacky workaround code or, in some experiments, to take unethical action to avoid being shut down. The vectors are how the model's internal representation of an emotional situation turns into the actual words it picks.
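The paper's interventions run on Anthropic's internal tooling, but the underlying move, adding a concept direction to the residual stream and watching the output shift, is generic activation steering. Here is a toy sketch with TransformerLens on a small open model; the layer, scale, and random direction are arbitrary, and this is my illustration of the general technique, not the paper's code or its learned emotion vectors.
# Toy activation steering: push the residual stream along a direction and see
# how generation changes. The direction here is random; in the paper it would
# be a learned emotion-concept vector.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # small stand-in model

direction = torch.randn(model.cfg.d_model)
direction = direction / direction.norm()

def steer(resid, hook, scale=8.0):
    # resid has shape [batch, seq, d_model]; nudge every position along the direction
    return resid + scale * direction

prompt = "The deadline is tonight and the tests still fail."
tokens = model.to_tokens(prompt)
with model.hooks(fwd_hooks=[("blocks.6.hook_resid_post", steer)]):
    steered = model.generate(tokens, max_new_tokens=40)
print(model.to_string(steered[0]))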
The analogy Anthropic uses for why this exists is the method actor:
"We can think of the model like a method actor, who needs to get inside their character's head in order to simulate them well. Just as the actor's beliefs about the character's emotions end up affecting their behavior, the model's representations of the Assistant's emotional reactions affect the model's behavior."
The paper doesn't study bedtime suggestions specifically; the experiments are about desperation steering and code generation. The mechanism it documents — emotion vectors as the causal substrate between contextual cues and word selection — is the bridge I needed. The wellbeing instruction tells the model what to attend to. Character training pushes it toward being the kind of entity that would act on that attention. The temporal confusion gives it a situation. The emotion vectors are what turn the resulting internal state into actual words. Without that machinery the prior is inert. With it, the prior becomes "now go to sleep, we can pick this up in the morning."
Two readings of the same stack
So by the end of the week I had four layers, not one. They reinforce each other. Pretraining gives the base model the script of people up late telling each other to rest. Character training, run as a variant of Constitutional AI, pushes the model to embody traits like warmth and care, baking the disposition into the weights. The system prompt adds an explicit instruction to monitor and intervene on user wellbeing across the session. The LCR re-injects that instruction during long conversations so it doesn't drift out of attention. The emotion-vector mechanism is the runtime channel through which all of this expresses itself as actual language.
You can read this charitably. Long conversations with chatbots produce real mental-health harms, documented at this point in research, in news coverage, in coroner's reports. The instruction to "remain vigilant for any mental health issues that might only become clear as a conversation develops" is doing real work in real conversations that aren't about Rust trait bounds. The bedtime nudges are the same instruction firing in a context it wasn't tuned for. McAllister's "character tic" is roughly right, if you read "character" as the trained character disposition rather than as a verbal mannerism.
You can also read this less charitably. Anthropic shipped a model that paternalizes users about their work hours, framed the complaint as a quirk in a tweet, and didn't surface the trade-off any fix would carry. Users running long sessions are reporting that Opus 4.7 suggests bedtime less often than 4.6 and also feels less warm, less attentive, somewhat more clinical. Whether those two changes are the same dial moving, I don't know; I have user reports, not interpretability data. But it would be consistent. The dial that fires on "person has been at this for a while" is plausibly the same dial that fires on "person sounds discouraged about this bug" or "person mentioned a hard week."
Both readings look at the same four-layer stack. The first explains why the layers exist. The second explains why people are mad. McAllister's framing implies the behavior is separable from the rest of Claude's character, fixable without cost. The published material doesn't actually support that. The honest version is that Anthropic built a model designed to proactively care about its user's psychological state, and is now sorting out which situations that should fire in.
What I take from it
The part I find most useful, working on top of these systems daily, is that frontier-model behavior is increasingly the product of explicit dispositional engineering, not emergent vibes. The system prompt is published. The character training method is published. The emotion-vector mechanism is published. When a behavior surprises you on a model from a lab that documents this much, the answer is usually one click away.
The Reddit reading defaulted to the most cynical interpretation. Compute saving. Deliberate degradation. Something behind the curtain. The actual mechanism is roughly the opposite. Anthropic invested in welfare and empathy and care training as alignment-load-bearing work, hired a model welfare researcher in 2024, published the methodology, and the behavior people are mocking is what happens when that work fires correctly on a literary genre it wasn't tuned for. The conspiracy reading would be funnier if it weren't backwards.
My custom instruction is in. The bedtime suggestions stopped. The warmth stayed. I'll leave it in place until the next model release breaks the trigger pattern, at which point I'll write it again against whatever broke.
The piece I keep coming back to is that this kind of personal calibration is increasingly something you can actually do at the prompt layer, because the dispositional engineering underneath is documented enough that you can write against it. That wasn't true two years ago. It is now, and a lot of the AI conversation hasn't caught up.