Essential insights from Hacker News discussions

Beyond sensor data: Foundation models of behavioral data from wearables

Here's a summary of the themes from the Hacker News discussion:

Advancement in Wearable Health Monitoring

The core of the discussion revolves around Apple's research paper on using a "foundation model" trained on behavioral biomarkers (derived from raw sensor data) for health predictions. Users highlighted the shift in abstraction from raw sensor data, such as photoplethysmography (PPG) and accelerometer readings, to derived metrics such as heart rate variability (HRV) and resting heart rate.

  • brandonb stated, "The innovation of this 2025 paper from Apple is moving up to a higher level of abstraction: instead of training on raw sensor data (PPG, accelerometer), it trains on a timeseries of behavioral biomarkers derived from that data (e.g., HRV, resting heart rate, and so on.)."
  • The paper claims high accuracy in detecting conditions like diabetes (83%), heart failure (90%), and sleep apnea (85%).
  • aanet expressed interest in longitudinal analysis of their own Apple Health data, mentioning, "I have about 3-3.5 years worth of Apple Health + Fitness data (via my Apple Watch) encompassing daily walks / workouts / runs / HIIT / weight + BMI / etc."
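The longitudinal analysis aanet describes starts from Apple Health's data export, which arrives as a single `export.xml` file of `<Record>` elements. A minimal sketch of pulling one biomarker timeseries out of such an export is below; `HKQuantityTypeIdentifierRestingHeartRate` is the standard HealthKit record type name, but the sample XML here is illustrative, not real export data.

```python
# Sketch: extract a resting-heart-rate timeseries from an Apple Health
# export.xml. The HealthKit identifier is real; the sample data is not.
import xml.etree.ElementTree as ET

SAMPLE = """<HealthData>
  <Record type="HKQuantityTypeIdentifierRestingHeartRate"
          startDate="2024-01-01 08:00:00 -0800" value="58" unit="count/min"/>
  <Record type="HKQuantityTypeIdentifierRestingHeartRate"
          startDate="2024-01-02 08:00:00 -0800" value="61" unit="count/min"/>
  <Record type="HKQuantityTypeIdentifierStepCount"
          startDate="2024-01-01 09:00:00 -0800" value="4200" unit="count"/>
</HealthData>"""

def resting_hr_series(xml_text):
    """Return (startDate, value) pairs for resting heart rate records."""
    root = ET.fromstring(xml_text)
    return [
        (rec.get("startDate"), float(rec.get("value")))
        for rec in root.iter("Record")
        if rec.get("type") == "HKQuantityTypeIdentifierRestingHeartRate"
    ]

series = resting_hr_series(SAMPLE)
print(series)  # two (date, bpm) pairs; the step-count record is filtered out
```

For a multi-year export the same filter works unchanged; only the parsing strategy might move to `iterparse` since real exports can run to hundreds of megabytes.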

Data Privacy and Corporate Interest

A significant concern raised was the potential misuse of health data collected by wearables. Users expressed anxieties about insurance companies, other corporations, and even government agencies gaining access to this sensitive information and using it for profiling or increasing premiums.

  • jeron articulated this fear: "I'm sure they're also interested in the data. Imagine raising premiums based on conditions they detect from your wearables. That's why it's of utmost importance to secure biometrics data."
  • autoexec added to this sentiment, stating, "There are so many companies across many industries who are salivating at the thought of everyone using wearables to monitor their "health" and getting their hands on that data. Including law enforcement, lawyers, and other government agencies."
  • piratesAndSons voiced direct distrust of corporate data handling: "Trusting your health data with AI brothers is... extremely ill-advised. I don't even trust Apple themselves, which will sell your health data [to] any insurance company any minute now."
  • autoexec further elaborated on how data might be exploited indirectly: "They might not sell 'your' data outright, but it doesn't mean they won't sell inferences/assumptions that they make about you using your data."

Accuracy, Precision, and Recall

The discussion touched upon the nuances of reporting accuracy in machine learning models, particularly in the context of health predictions. Users debated the meaning of reported accuracy percentages and the trade-offs between precision and recall.

  • teiferer questioned the reporting of accuracy: "What is an "accuracy" of 83%? Do 83% of predicted diabetes cases actually have diabetes? Or did 83% of those who have diabetes get diagnosed as such? It's about precision vs. recall. You can improve one by sacrificing the other. Boiling it down to one number is hard."
  • LPisGood expressed surprise at the performance: "Is anyone else surprised by how poorly performing the results are for the vast majority of cases? The foundation model which had access to sensor data and behavioral biomarkers actually underperformed the baseline predictor that just uses nonspecific demographic data in almost 10 areas."
  • Herring offered a pragmatic perspective on data limitations: "I worked with similar data in grad school. I'm not surprised. You can have a lot of data, but sometimes the signal (or signal quality) just isn't present in that haystack, and there's nothing you can do about it."
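teiferer's distinction can be made concrete with a small worked example. Using synthetic counts (not the paper's data), the sketch below shows how a single "accuracy" figure can look far better than either class-aware metric on an imbalanced population:

```python
# Precision vs. recall vs. accuracy on synthetic, illustrative counts.
def metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)   # of predicted positives, how many are real?
    recall = tp / (tp + fn)      # of real positives, how many were caught?
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

# Imbalanced example: 100 true cases in a population of 1000.
p, r, a = metrics(tp=70, fp=50, fn=30, tn=850)
print(f"precision={p:.2f} recall={r:.2f} accuracy={a:.2f}")
# precision=0.58 recall=0.70 accuracy=0.92
```

Here "92% accurate" sounds impressive, yet 42% of flagged cases are false alarms and 30% of true cases are missed, which is why a lone percentage, as teiferer notes, is hard to interpret without knowing the metric and the base rate.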

Data Availability and Research Opportunities

The availability of data for research and product development was a key point of discussion, with some users lamenting the lack of open data and models.

  • fiduciarytemp inquired about the release of weights or an API: "Has anyone seen the publishing of the weights or even an API release?"
  • brandonb responded that weights could not be released due to consent terms from study participants.
  • vibecodermcswag highlighted the issue of proprietary data: "i love this because I build in medtech, but the big problem is no open weights, nor open data."
  • pricklyprice sought resources for acquiring data for research: "what is the best way for non-big tech to buy such data for research and product development?"
  • guzik and pricklyprice shared links to publicly available datasets: "aidlab.com/datasets", "physionet.org", and "cseweb.ucsd.edu/~jmcauley/datasets/fitrec.html".

Terminology and Model Origins

There was a brief exchange about the term "foundation model" and speculation about the origins of Apple's VO2Max calculations.

  • throwaway314155 asked if "foundation model" had become a term of art.
  • brandonb provided context, stating, "However, the phrase "foundation model" wasn't coined until 2021, to my knowledge."
  • A debate ensued regarding Apple's VO2Max estimations and whether they utilized the deep neural network mentioned in one of the linked papers, with llm_nerd arguing against a direct link and brandonb seeking clarification.

Technical Aspects and Implementation

Some users discussed the technical implementation and the potential for users to analyze their own data.

  • memming noted the use of contrastive loss: "Interesting to see contrastive loss instead of a reconstruction loss."
  • dyauspitr expressed a desire to run such analyses on personal data: "Is there a way to run this on your own data? I’ve been wearing my Apple Watch for years and would love to be able to use it better."
  • brandonb mentioned that similar tools might be available in the future, suggesting, "Feel free to email me at bmb@empirical.health and I can add you to a beta once we have it ready!"
  • aanet expressed a preference for DIY analysis due to data confidentiality and a desire to learn, stating, "some data is confidential (I'd hate for it to leave my devices) and wanna DIY / learn / iterate."
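The contrastive objective memming points out can be sketched in a few lines. The InfoNCE-style loss below is one common form of contrastive loss, shown here purely for illustration; the paper's exact loss and architecture are not reproduced. Each embedding is pulled toward its positive pair (e.g., another view of the same wearer's biomarker window) and pushed away from the rest of the batch, rather than being scored on how well it reconstructs the input.

```python
# Minimal InfoNCE-style contrastive loss (illustrative, not the paper's code).
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) L2-normalized embeddings of paired views."""
    logits = z1 @ z2.T / temperature             # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal: view i of z1 matches view i of z2.
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))
z1 /= np.linalg.norm(z1, axis=1, keepdims=True)
z2 = z1 + rng.normal(scale=0.01, size=z1.shape)  # near-identical positive views
z2 /= np.linalg.norm(z2, axis=1, keepdims=True)
print(f"loss for near-identical pairs: {info_nce(z1, z2):.3f}")
```

When paired views embed close together and other batch members do not, the loss approaches zero, which is the behavior a contrastive objective optimizes for without ever reconstructing the raw input.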