You wake up, reach for your phone, and the first thing you check is your sleep score. It says 74. Yesterday it was 81. You slept the same number of hours, you feel roughly the same. So what changed? And more to the point: what does that number actually mean?
If you have been wearing an Apple Watch for more than a few weeks, you have probably noticed that the score does not always match how you feel. Sometimes you feel sharp after a night the watch considers mediocre. Sometimes you feel foggy despite a high score. That gap is not a bug in your perception. It is a limitation of what wearable devices can and cannot measure. Understanding the limitation makes the data more useful.
What the Apple Watch sleep score is actually based on
The score and your sleep stage breakdown are two different things. This article covers both, but it is worth being clear on what drives which.
According to Apple's documentation, the sleep score is built from three inputs: how long you slept relative to your sleep goal, how consistent your bedtime was compared to your recent history, and how many interruptions the watch detected. It is primarily a behavioural consistency metric. Duration and regularity drive it more than anything else.
That is a reasonable thing to measure. Consistent sleep timing is genuinely associated with better sleep quality in the research literature. But it also means the score can look good on a night when your actual sleep architecture — the distribution of sleep stages across the night — was disrupted, as long as you were in bed at your usual time and stayed there. And it can look poor on a night when you slept deeply but shorter than usual.
This matters because most people assume the score is summarising sleep quality in a richer sense than it is.
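To make the distinction concrete, here is a minimal sketch of a score built from the same three kinds of input. The weights, the two-hour bedtime tolerance, and the per-interruption penalty are invented for illustration and are not Apple's values; the point is only that sleep stages never enter the calculation.

```python
from dataclasses import dataclass

# Hypothetical weights for illustration only; these are NOT Apple's values.
WEIGHTS = {"duration": 0.5, "bedtime": 0.3, "interruptions": 0.2}


@dataclass
class Night:
    hours_asleep: float
    bedtime_shift_min: float  # minutes away from the usual bedtime
    interruptions: int        # awakenings the device detected


def toy_sleep_score(night: Night, goal_hours: float = 8.0) -> int:
    """Combine three behavioural inputs into a 0-100 score (toy model)."""
    # Duration is measured against the sleep goal and capped at full credit.
    duration = min(night.hours_asleep / goal_hours, 1.0)
    # Full credit at the usual bedtime, falling to zero at a two-hour shift.
    bedtime = max(0.0, 1.0 - abs(night.bedtime_shift_min) / 120.0)
    # Each detected interruption costs a tenth of this component.
    interruptions = max(0.0, 1.0 - 0.1 * night.interruptions)
    total = (WEIGHTS["duration"] * duration
             + WEIGHTS["bedtime"] * bedtime
             + WEIGHTS["interruptions"] * interruptions)
    return round(100 * total)


# In bed on time for the full goal: scores high whatever the stage mix was.
regular_night = Night(hours_asleep=8.0, bedtime_shift_min=0, interruptions=1)
# Shorter sleep that started 90 minutes late: scores lower, however deep it was.
short_night = Night(hours_asleep=6.0, bedtime_shift_min=90, interruptions=0)

print(toy_sleep_score(regular_night))  # 98
print(toy_sleep_score(short_night))    # 65
```

Nothing in the `Night` record describes deep or REM sleep, which is why a score of this shape can stay high on a night when the stage breakdown looks poor.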
What sleep stages are, and why they are different
Sleep staging is a separate concept. Sleep architecture refers to the way your brain cycles through distinct stages across a night: three phases of non-REM (NREM) sleep, which get progressively deeper, and REM (rapid eye movement) sleep. A typical cycle lasts roughly 90 minutes. Most people move through four to six per night.
The deeper NREM stages — slow-wave or deep sleep — are associated with physical restoration. REM sleep is tied more to cognitive processing, emotional regulation, and memory consolidation.
These stages are defined by brain activity. The gold standard for measuring them is polysomnography, a clinical sleep study using electrodes on the scalp to record electrical signals, alongside sensors for eye movement, muscle tone, heart rate, and breathing. It is conducted in a lab. Nothing about it resembles what your watch does.
What the watch can and cannot infer
Apple Watch does not read your brain. It reads your wrist.
It uses two main signals: an accelerometer, which detects movement, and photoplethysmography (PPG), the optical technique behind the green light sensor on the back of the watch, which estimates heart rate from changes in blood flow under the skin. From these two signals, the watch infers what stage of sleep you are likely in.
The logic is not unreasonable. Movement tends to decrease as sleep deepens. Heart rate falls during NREM and fluctuates during REM. Heart rate variability (HRV), which the watch also estimates, shifts across sleep stages in ways that correlate with the underlying physiology. Algorithms trained on large datasets can use these signals to make probabilistic guesses about which stage you are in at any given moment.
The key word is "guesses." Sophisticated, well-calibrated guesses, but guesses. The signal available at the wrist is a meaningful proxy, not a direct measurement.
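The direction of that reasoning can be sketched with a few hand-written rules. This is a toy heuristic, not how Apple's model works: real staging algorithms are trained statistically and output probabilities, and every threshold and input name below (`movement`, `hr_variation`, the 0.9 multiplier) is an assumption made up for the example.

```python
def guess_stage(movement: float, heart_rate: float, hr_variation: float,
                resting_hr: float = 60.0) -> str:
    """Return a rough stage label for one short epoch (toy heuristic).

    movement     -- normalised wrist activity, 0.0 (still) to 1.0 (very active)
    heart_rate   -- average beats per minute in the epoch
    hr_variation -- how much the heart rate fluctuated in the epoch, in bpm
    All thresholds are invented for illustration.
    """
    if movement > 0.5:
        # A lot of wrist movement suggests the wearer is awake.
        return "awake"
    if hr_variation > 8.0:
        # A still body with a fluctuating heart rate looks like REM.
        return "rem"
    if heart_rate < resting_hr * 0.9:
        # A still body with a low, steady heart rate looks like deep NREM.
        return "deep"
    # Everything else falls into the broad light-sleep bucket.
    return "light"


print(guess_stage(movement=0.8, heart_rate=70, hr_variation=3))
print(guess_stage(movement=0.05, heart_rate=62, hr_variation=12))
print(guess_stage(movement=0.02, heart_rate=50, hr_variation=2))
print(guess_stage(movement=0.1, heart_rate=58, hr_variation=3))
```

Note how little separates "deep" from "light" here: a few beats per minute. That is the same boundary where wrist-based estimates are least dependable.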
Research comparing consumer wearables to polysomnography consistently shows these devices perform reasonably well at detecting overall sleep duration and distinguishing broad categories (asleep versus awake). They are less reliable at assigning sleep to specific stages, particularly at distinguishing light NREM from deep NREM. REM detection is better than chance but still imperfect. Accuracy varies by device, individual physiology, and night-to-night conditions.
This is not a criticism of Apple specifically. It applies to the entire category of consumer wrist-worn sleep trackers.
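One common way to summarise a comparison like this is epoch-by-epoch agreement, often reported alongside Cohen's kappa, which discounts the agreement two raters would reach by chance. The sketch below uses eight made-up epochs purely to show the arithmetic; the labels are not from any real study or device.

```python
from collections import Counter


def agreement_and_kappa(reference, device):
    """Return (raw agreement, Cohen's kappa) for two equal-length label lists."""
    if not reference or len(reference) != len(device):
        raise ValueError("need two non-empty sequences of equal length")
    n = len(reference)
    # Share of epochs where the device matches the reference label.
    observed = sum(r == d for r, d in zip(reference, device)) / n
    # Agreement expected by chance, from how often each rater uses each label.
    ref_counts = Counter(reference)
    dev_counts = Counter(device)
    expected = sum(ref_counts[label] * dev_counts[label]
                   for label in ref_counts) / (n * n)
    if expected == 1.0:
        # Only one label ever occurs, so the two sequences agree perfectly.
        return observed, 1.0
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Made-up labels: lab reference versus wrist estimate for the same epochs.
lab = ["wake", "light", "light", "deep", "deep", "rem", "rem", "light"]
wrist = ["wake", "light", "deep", "deep", "light", "rem", "light", "light"]

observed, kappa = agreement_and_kappa(lab, wrist)
print(f"raw agreement: {observed:.3f}")  # 0.625
print(f"Cohen's kappa: {kappa:.3f}")     # 0.467
```

In this toy data every disagreement involves the light-sleep label, which mirrors the pattern described above: wake and sleep are separated cleanly, while the boundaries between stages are where the estimate slips.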
Why the score fluctuates, and what drives it
Understanding the measurement method makes the score's volatility easier to interpret. A single-night fluctuation of five to ten points on most sleep metrics is often within the range of normal variability, both in your actual sleep and in what the watch can detect.
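A quick way to check whether last night falls inside your own normal range is to compare it with the spread of your recent scores. This is a rough sketch: the two-standard-deviation cutoff is a common rule of thumb rather than anything the watch uses, and the scores are made up.

```python
from statistics import mean, stdev


def within_normal_range(recent_scores, last_night, k=2.0):
    """True if last_night is within k standard deviations of the recent mean."""
    if len(recent_scores) < 2:
        raise ValueError("need at least two recent scores to estimate spread")
    baseline = mean(recent_scores)
    spread = stdev(recent_scores)  # sample standard deviation
    return abs(last_night - baseline) <= k * spread


# Made-up scores from the previous seven nights.
recent = [81, 77, 84, 72, 79, 83, 76]

print(within_normal_range(recent, 74))  # True: an ordinary dip
print(within_normal_range(recent, 60))  # False: far outside the usual spread
```

With this history, a drop from 81 to 74 sits comfortably inside the usual night-to-night spread, while a 60 would stand out.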
Several things can shift your sleep data on a given night without pointing to a sleep problem in the clinical sense:
- Alcohol. Suppresses REM sleep in the first half of the night and can cause a rebound effect in the second half. Even one or two drinks several hours before bed can alter your sleep architecture in ways the watch will detect: more restlessness, elevated heart rate, reduced HRV.
- Late meals. Influence core body temperature and digestion, both of which affect the cardiac signals the watch reads.
- Stress. Whether from a difficult day, an anticipatory worry, or a subtly elevated cortisol level, it keeps heart rate and sympathetic nervous system activity higher during sleep. This registers as reduced recovery quality in the data even if you slept a reasonable number of hours.
- Room temperature. Affects thermoregulation, which in turn affects heart rate during sleep. A room that is too warm tends to push resting heart rate up slightly through the night.
- Watch fit and position. Matters more than most people realise. A loose-fitting band, a watch worn differently than usual, or sleeping on the wrist can introduce noise into the PPG signal that distorts stage estimates.
None of this means the watch is wrong about what it detected. It means the data reflects a complex system (your body, your environment, your behaviour) and cannot, on its own, tell you whether you slept "well" in some absolute sense.
Sleep architecture changes naturally with age
One pattern worth knowing: the proportion of deep slow-wave sleep tends to decrease with age, often beginning in the mid-thirties and continuing gradually from there. This is a well-documented and normal part of how sleep architecture shifts over a lifetime.
If your deep sleep percentage looks lower than benchmarks you have read about online, it is worth knowing that most of those benchmarks are population averages that do not always account well for age. A deep sleep percentage that might look low compared to a 25-year-old's profile may be entirely typical for your age and health status.
The same applies to total sleep time and continuity. Sleep tends to become lighter and less consolidated with age, so it is less likely to arrive in one continuous block. More nighttime awakenings are common and do not necessarily indicate a sleep disorder.
If daytime fatigue is persistent and affecting your quality of life, or if trends in your data feel genuinely significant, those are reasonable things to discuss with a healthcare provider. But a single night with less deep sleep than usual, or a week of lower scores during a stressful period, is not a signal to treat as a clinical finding.
How to use the score well
Given what the score does and does not measure, a few habits make it more useful:
- Watch the weekly trend, not last night. A single score is mostly noise. Your rolling seven-day average, or whether scores have been trending up or down over the past few weeks, carries actual signal. A consistent pattern of lower scores during a high-stress period or after a run of late nights is informative. A single 72 is not.
- Check bedtime consistency first. Because the score weights regularity heavily, erratic bedtimes will drag it down regardless of how deeply you slept. If your score dropped, check whether your bedtime shifted before drawing any other conclusions.
- Compare with how you feel. The data is most interesting when it diverges from your subjective experience. A week of decent scores during which you feel flat is worth paying attention to. A modest score after which you feel sharp is equally informative. Divergences are where the useful questions live.
- Note alcohol and late meals before interpreting dips. Both reliably affect the cardiac signals the watch reads — elevated resting heart rate, suppressed HRV, increased restlessness — without affecting how long you were in bed. If you had a late dinner or a drink and your score dropped, you likely have your explanation.
- Treat the score as a prompt, not a verdict. A score that prompts you to recall what you did the night before is doing its job. A score that makes you anxious before you have had coffee is not being read correctly.
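The first two habits are easy to check by hand or with a few lines of code. The sketch below assumes you have copied your nightly scores and bedtimes out of the Health app yourself; the function names and the sample data are made up for illustration.

```python
from statistics import mean


def rolling_average(scores, window=7):
    """Return the mean of each full `window`-night span, oldest first."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [mean(scores[i - window:i]) for i in range(window, len(scores) + 1)]


def bedtime_shift_minutes(usual, actual):
    """Signed shift in minutes between two "HH:MM" bedtimes.

    Positive means later than usual. The difference wraps at midnight,
    so 23:00 -> 00:30 counts as 90 minutes late, not 22.5 hours early.
    """
    def to_minutes(hhmm):
        hours, minutes = hhmm.split(":")
        return int(hours) * 60 + int(minutes)

    diff = (to_minutes(actual) - to_minutes(usual)) % 1440
    return diff - 1440 if diff > 720 else diff


# Two made-up weeks of nightly scores, oldest first.
scores = [80, 78, 82, 79, 81, 77, 83, 76, 74, 72, 75, 73, 71, 70]
averages = rolling_average(scores)
change = averages[-1] - averages[0]
print(f"7-day average moved by {change:+.1f} points")  # -7.0

print(bedtime_shift_minutes("23:00", "00:30"))  # 90
print(bedtime_shift_minutes("23:00", "22:15"))  # -45
```

A seven-point slide in the weekly average is worth a second look in a way that a single low night is not, and a bedtime that drifted 90 minutes late is often the first explanation to rule out.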
What the score does not capture
There are meaningful dimensions of sleep quality that consumer wearables cannot measure directly: sleep continuity (how many brief micro-arousals you had that were too short to register as full awakenings), breathing irregularities (shallow breathing or mild apnoeic events that disrupt sleep architecture without waking you fully), sleep latency (how long it actually took you to fall asleep, as opposed to the watch's estimate), and subjective sleep quality itself, which has its own clinical weight that no sensor can replicate.
If you have concerns about your breathing during sleep, snoring that is disruptive to you or a partner, or persistent excessive daytime sleepiness, those are questions for a clinician, not a wearable. A proper sleep study captures data that no consumer device currently approximates.
What to make of the score
Your Apple Watch sleep score is a useful approximation built from indirect signals. The score itself — built from duration, bedtime consistency, and interruptions — is a behavioural metric. The sleep stage breakdown is a separate layer on top of that: the watch's inference about which phase you were likely in, drawn from movement and heart rate. These two outputs have different methods and different accuracy profiles. Neither is measuring your brain. Both are measuring what is available at the wrist and drawing conclusions from it. When either is wrong, it is usually wrong at the margins — minor duration miscalculations, imprecise stage boundaries — rather than wildly wrong about whether you slept or how long.
Used over time, the data is genuinely informative. Used as a daily grade on how you slept, it will frequently mislead you.
The number on your screen this morning is one observation from a night shaped by dozens of variables. It is worth noticing, not worth treating as a verdict.
Vitanzo is built around exactly this kind of pattern — connecting your Apple Health data, including sleep, HRV, and heart rate, with your own daily observations, and generating a plain-language picture of what the numbers are actually saying about you specifically.