To develop schemas, we first identify widely used clinical measures and consult clinical experts to learn which distinctions matter most in clinical use, and then apply these insights to the design of the corresponding schemas. We then define use cases that include these measures in new models of care enabled by mHealth technology. While there isn’t one “correct” schema for any given measure, our schemas aim to offer an ideal format and description of digital health data for supporting clinical care and self-care. Here are the design principles we follow for Open mHealth schemas, based in part on this Presidential health IT report.

1. Atomicity

Our schemas represent data at a granularity shown by clinical use cases to be most useful for preventive and chronic disease management, self-care and clinical care. Our driving use cases include new models of care enabled by mHealth technology, and are not restricted to traditional assumptions about clinical care models and roles. In many cases, this granularity is more atomic than in EHR data standards. For quantitative clinical measures, the schemas can be used to describe either a single measurement (e.g., one blood glucose value) or a descriptive statistic of aggregate measurements (e.g., an average, minimum or maximum blood glucose value over a period of time).
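As a rough illustration of the two granularities, the sketch below shows one single reading and a helper that collapses several readings into one aggregate data point. The property names (`blood_glucose`, `descriptive_statistic`, `effective_time_frame`) and the `summarize` helper are assumptions made for this example, not a copy of the published schema.

```python
from statistics import mean

# Illustrative single measurement; property names are assumptions.
single_reading = {
    "blood_glucose": {"value": 95, "unit": "mg/dL"},
    "effective_time_frame": {"date_time": "2015-03-07T07:32:00-08:00"},
}


def summarize(readings, start, end):
    """Collapse several single readings into one aggregate (average) data point."""
    units = {r["blood_glucose"]["unit"] for r in readings}
    if len(units) != 1:
        raise ValueError("readings must share one unit before aggregation")
    values = [r["blood_glucose"]["value"] for r in readings]
    return {
        "blood_glucose": {"value": mean(values), "unit": units.pop()},
        # Marks the value as a statistic rather than a single measurement.
        "descriptive_statistic": "average",
        "effective_time_frame": {
            "time_interval": {"start_date_time": start, "end_date_time": end}
        },
    }
```

The same shape carries both cases; only the statistic marker and the interval-based time frame distinguish the aggregate from the single reading.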

2. Balancing parsimony and complexity

Simplicity is good, but only as simple as the world is complex. Health data can be tremendously complex. We follow a rough 80/20 rule, designing schemas to be as comprehensive as needed for the majority of anticipated mHealth use cases, but not requiring complexity where it isn’t appropriate. For example, our blood glucose schema includes the timing of a glucose measurement in relationship to meals (e.g., fasting) or sleep (e.g., at bedtime), because these values are important for many diabetes use cases, but the schema does not include timing in relationship to physical activity (less commonly useful). The schemas follow the closed-world assumption, meaning that what is stated is true and what is not stated is false.

3. Balancing permissiveness and constraints

There is a tension between wanting all data to be clean, accurate and standardized, and the work that is required to get data into that state. Schemas that are too constrained and difficult to use won’t be adopted, but permissive, easy-to-use schemas may lead to clinically meaningless data. We try to be pragmatic in striking a balance between permissiveness and constraint. We enforce only the most direct integrity constraints at the schema level, e.g., correct units of measurement (such as mg/dL for blood glucose and mmHg for blood pressure) through permissible value sets, or cardinality (e.g., each blood pressure reading has exactly one systolic and one diastolic value). Following JSON Schema conventions, schema properties are optional unless they are listed in the “required” array at the bottom of the schema definition. In this way, schemas give toolmakers and data providers the flexibility to choose how much of the data schema they want to use while still ensuring data interoperability, so long as the most important constraints are met. We do not enforce the majority of clinically useful constraints at the schema level (e.g., that the end date of a time interval must be the same as or later than the start date). We assume that these constraints will be enforced programmatically by the system generating or consuming the data.
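The split between schema-level and application-level checks might look like the sketch below. The schema fragment is hypothetical and heavily simplified (the `unit_enum` key is invented for this example), and `schema_errors` is a hand-rolled stand-in for a real JSON Schema validator, shown only to make the division of responsibilities concrete.

```python
from datetime import datetime

# Hypothetical, simplified schema fragment (not the published schema).
BLOOD_PRESSURE_SCHEMA = {
    "properties": {
        "systolic_blood_pressure": {"unit_enum": ["mmHg"]},
        "diastolic_blood_pressure": {"unit_enum": ["mmHg"]},
        "effective_time_frame": {},  # optional: not listed under "required"
    },
    "required": ["systolic_blood_pressure", "diastolic_blood_pressure"],
}


def schema_errors(data, schema):
    """Schema-level checks only: required properties and permissible units."""
    errors = []
    for name in schema["required"]:
        if name not in data:
            errors.append(f"missing required property: {name}")
    for name, rules in schema["properties"].items():
        allowed = rules.get("unit_enum")
        if allowed and name in data and data[name].get("unit") not in allowed:
            errors.append(f"{name}: unit must be one of {allowed}")
    return errors


def interval_is_ordered(start_iso, end_iso):
    """Application-level check: the end must be the same as or later than the start."""
    return datetime.fromisoformat(end_iso) >= datetime.fromisoformat(start_iso)
```

A producing or consuming system would run both kinds of check; only the first kind is expressed in the schema itself.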

4. Designing for data liquidity

Open mHealth’s vision for the digital health ecosystem is that data will flow far and wide to be recombined and used for many different purposes. For data transport, data just needs to get from sender to receiver. For data interchange, in contrast, the meaning of the data needs to get from sender to receiver. Open mHealth schemas aim to preserve the most important clinical meaning as mHealth data is exchanged. To support proper interpretation by data recipients, the context of the original data point must be available along with the data. This context data, or metadata, is both operational and clinical. Operational metadata relates to the data payload and is captured in the header schema. Clinical metadata relates to context for clinical interpretation and is captured in the measure schema. We define our metadata schema against clinical data interchange use cases such as search, filtering, transport, identification, etc.

Header schema

All Open mHealth data points include information carried in the header schema. The header currently includes two broad categories of metadata:
  • Data point creation and identification: the schema for this data point, when the data point was created, and a unique data point ID.
  • Acquisition provenance: where the data came from (e.g., the name of the device or app), when the data was collected at the source, and how it was collected (via a sensor or reported by a person).

Future categories of metadata that we are working on include specific device information (e.g., the Unique Device Identifier or serial number), processing provenance (or data lineage), quality of information (QOI) provenance (e.g., measurement accuracy), and privacy metadata. For processing and QOI provenance, our approach will be to point to a data structure that stores all the “hops” that a data point has traversed and its QOI (e.g., from its source to a cloud aggregator to an analytic platform to transformation by an algorithm to a user-facing app). This work is being done in partnership with the MD2K project and other relevant partners, starting later in 2015.
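To make the two current header categories concrete, here is a minimal sketch that wraps a measure body in a header. Every field name (`schema_id`, `acquisition_provenance`, `modality`, and so on), the `omh` namespace value, and the `make_data_point` helper are assumptions for illustration; consult the published header schema for the actual property names.

```python
import uuid
from datetime import datetime, timezone


def make_data_point(body, schema_name, schema_version, source_name, modality, collected_at):
    """Wrap a measure body in an illustrative header (all field names assumed)."""
    if modality not in ("sensed", "self-reported"):
        raise ValueError("modality must be 'sensed' or 'self-reported'")
    header = {
        # Data point creation and identification
        "id": str(uuid.uuid4()),
        "creation_date_time": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "schema_id": {"namespace": "omh", "name": schema_name, "version": schema_version},
        # Acquisition provenance: where, when and how the data was collected
        "acquisition_provenance": {
            "source_name": source_name,
            "source_creation_date_time": collected_at,
            "modality": modality,
        },
    }
    return {"header": header, "body": body}
```

Keeping operational metadata in the header and clinical content in the body lets a receiver search, filter, and route data points without parsing each measure.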

5. Alignment with clinical data standards

One of the greatest challenges to using health-related data is its semantic complexity. Developers new to digital health often underestimate the complexity of digital health data, and can easily be overwhelmed by the need to use standard clinical vocabularies like SNOMED and RxNORM in order to interface with the electronic health record (EHR) and other health information technologies. We are supporting developers by handling the work of indexing mHealth schemas and their instantiated data points to existing clinical vocabularies.


Clinical measure schemas

We have associated each clinical measure schema with the most precise code we can identify from the standard clinical vocabularies used for EHR data (e.g., SNOMED for diagnoses, LOINC or SNOMED for lab tests, and RxNORM for medications). For example, the blood glucose schema is annotated with the SNOMED code 365812005, which the schema references through a persistent URL hosted by the BioPortal terminology server at Stanford. BioPortal provides a stable source of documentation and other terminology services for the most commonly used biomedical vocabularies.

We look for an exact match between the schema’s meaning and the definition in a standard vocabulary. If none can be found, we choose the nearest parent term (i.e., the closest more generic term, so that the annotation is always correct, if imprecise, rather than overly specific or simply wrong). Schemas that can describe either a single or an aggregate measurement are annotated with the code for the measurement (e.g., body weight is annotated with SNOMED code 363808001, which describes body weight measure). For digital health data that are not yet represented in standard vocabularies, we do not reference a code.

Various tools are available for searching standard vocabularies, either singly or as a set:
  • NLM Terminology Services (registration with NLM and login required):
    1. SNOMED browser (guide to using SNOMED)
    2. UMLS Metathesaurus browser: this allows a search of all the vocabularies included in UMLS, which is useful to get a sense of how the concept is described in general
  • LOINC browser
  • NCBO BioPortal allows browsing of various vocabularies, including SNOMED and LOINC
  • RxNORM browser is an app that can be downloaded from the NLM site: it allows search of drugs by trade name and generic name and provides a visual representation of the relationships between a drug, its active ingredient and the various forms in which it is made available
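The matching rule described above (an exact match where possible, otherwise the nearest parent term, otherwise no code) can be sketched as a walk up an is-a hierarchy. The hierarchy and the set of coded terms below are toy data with made-up term names, and `annotate` is a hypothetical helper; real vocabularies identify concepts by code and are far larger.

```python
# Toy is-a hierarchy (child -> parent); term names are made up for illustration.
PARENT = {
    "fasting capillary blood glucose": "capillary blood glucose",
    "capillary blood glucose": "blood glucose",
    "blood glucose": "laboratory measurement",
}

# Terms assumed, for this example only, to exist in a standard vocabulary.
CODED_TERMS = {"blood glucose", "laboratory measurement"}


def annotate(term):
    """Return (matched_term, is_exact); (None, False) when no code applies."""
    current = term
    while current is not None:
        if current in CODED_TERMS:
            # Either the term itself or its nearest more generic ancestor.
            return current, current == term
        current = PARENT.get(current)
    return None, False
```

Walking upward guarantees that any term returned is more generic than the schema’s meaning, so the annotation stays correct even when it is not exact.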

Permissible Value Sets

Some schemas enumerate a list of permissible values for certain data attributes. For example, the unit in which a medication’s strength is expressed is restricted to a set of units of measurement. To strike a balance between imposing inappropriate constraints and facilitating ease of reuse, we enumerate permissible values only where there is high consensus and where the values are unlikely to change (e.g., units of measurement). We draw from standard vocabularies or value lists where possible, rather than reinventing sets. For example, almost all of the units of measure used in the schemas come from the Common Synonym column of the Commonly Used UCUM Codes for Healthcare Units (an exception is the unit of measure of BMI, for which we use the UCUM code). A permissible value set that is used by multiple schemas is declared as an independent set and referenced from each schema; a value set that we expect to be used by only one schema is defined within that schema.
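The difference between a shared value set and one defined inline might be organized as in the following sketch. The set name `mass_unit`, its members, the `lean_body_mass` schema, and the `ref`/`enum` keys are all hypothetical; only the idea of referencing one shared set from several schemas comes from the text above.

```python
# Hypothetical shared value set, declared once and referenced by name.
SHARED_VALUE_SETS = {"mass_unit": ["g", "kg", "lb"]}

# Hypothetical schema fragments: two reference the shared set, one defines its own.
SCHEMAS = {
    "body_weight": {"unit": {"ref": "mass_unit"}},
    "lean_body_mass": {"unit": {"ref": "mass_unit"}},
    "body_mass_index": {"unit": {"enum": ["kg/m2"]}},  # used by this schema only
}


def permissible_units(schema_name):
    """Resolve a schema's unit rule to its list of permissible values."""
    unit_rule = SCHEMAS[schema_name]["unit"]
    if "ref" in unit_rule:
        return SHARED_VALUE_SETS[unit_rule["ref"]]
    return unit_rule["enum"]
```

Because both weight-related schemas resolve to the same list object, a change to the shared set applies to every schema that references it.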

6. Modeling of Time

Timing is one of the most important things to know about a piece of clinical data. Based on various use cases, we have defined a set of schemas that accommodate either point-based or interval-based descriptions of time. By providing a clean and expressive set of time description schemas, we hope to promote more careful and thorough descriptions of temporal context for mHealth data. Our modeling principles include:

Point and Interval Representations

Measurements (e.g., heart rate, blood glucose) are associated with either a single time point (e.g., a blood glucose reading taken at 7:32 AM Pacific Standard Time on March 7, 2015) or a time interval (e.g., the average blood glucose from March 1 to March 31, 2015). Clinical schemas for such measures include a time frame data element that allows either a point in time or a time interval to be specified.
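One way to model “either a point or an interval” in application code is a small type that rejects mixed or empty values. The `TimeFrame` class and its field names are assumptions for this sketch; they do not claim to mirror the published time frame schema, which may allow other interval forms.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class TimeFrame:
    """Either a single point in time or a start/end interval, never both."""
    date_time: Optional[datetime] = None
    start: Optional[datetime] = None
    end: Optional[datetime] = None

    def __post_init__(self):
        has_point = self.date_time is not None
        has_bounds = self.start is not None or self.end is not None
        if has_point == has_bounds:
            raise ValueError("specify either a point or an interval")
        if has_bounds and (self.start is None or self.end is None):
            raise ValueError("an interval needs both a start and an end")

    @property
    def is_point(self):
        return self.date_time is not None


# The two examples from the text, with the time zone left out for brevity.
reading_time = TimeFrame(date_time=datetime(2015, 3, 7, 7, 32))
monthly_average = TimeFrame(start=datetime(2015, 3, 1), end=datetime(2015, 3, 31))
```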

Effective Time Frame

Open mHealth clinical measure schemas include a property called “effective time frame,” which describes the time at which the reported value applies. Consider this example: Sara measures her body weight with a regular scale at 8 AM on March 5, 2015. Clinically, the weight is “effective” at 8 AM on March 5, 2015. She enters the value into her app two days later, on March 7, at which time the data is sent to a cloud aggregator. Her physical activity app downloads that weight reading on March 10, so the download timestamp is five days later than the actual reading and its “effective” time. A body weight’s “effective” date-time can be specified in the body weight schema using this time frame property.

The effective time frame is different from the acquisition time frame, which is the time the data is acquired by a sensor (device or app) and is specified in the header schema as metadata. The two are the same when the data is recorded in real time (e.g., a device tracking a user’s heart rate). They may differ when the data is self-reported by the user (e.g., a user reporting an episode of chest pain after it has occurred). Schemas include the option to describe the effective time frame whenever relevant. If a particular instance of data does not include an effective time, developers can use the acquisition time from the header schema as an approximation.
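The fallback described above, using the acquisition time when no effective time is present, might be implemented along these lines. The dictionary layout and property names are assumptions carried over for illustration, the UTC offsets in the sample data are invented (the example in the text gives no time zone), and the function flags the result as an approximation so that callers can treat it accordingly.

```python
def effective_or_acquired_time(data_point):
    """Return (time_frame, is_approximation) for a data point.

    Assumes an illustrative layout: body.effective_time_frame and
    header.acquisition_provenance.source_creation_date_time.
    """
    frame = data_point.get("body", {}).get("effective_time_frame")
    if frame:
        return frame, False
    acquired = data_point["header"]["acquisition_provenance"]["source_creation_date_time"]
    # No effective time stated: fall back to when the source recorded the data.
    return {"date_time": acquired}, True


# Sara's weight: effective on March 5, entered into the app on March 7.
sara_weight = {
    "header": {"acquisition_provenance": {
        "source_creation_date_time": "2015-03-07T09:00:00-08:00"}},
    "body": {
        "body_weight": {"value": 60, "unit": "kg"},  # value invented for the example
        "effective_time_frame": {"date_time": "2015-03-05T08:00:00-08:00"},
    },
}
```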

Date, Time, and Time Zones

We distinguish between a date, where the granularity of representation is a calendar day, and a date-time timestamp, where the granularity of representation is milliseconds in UTC.
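As a small illustration of the two granularities, the sketch below formats a calendar date and converts a local timestamp to a millisecond-precision UTC timestamp using Python’s standard library. The `to_utc_timestamp` helper is hypothetical, and the ISO 8601 text form is an assumption about serialization rather than something stated above.

```python
from datetime import date, datetime, timezone


def to_utc_timestamp(local_iso):
    """Convert an ISO 8601 timestamp with a UTC offset to milliseconds in UTC."""
    parsed = datetime.fromisoformat(local_iso)
    if parsed.tzinfo is None:
        raise ValueError("timestamp needs an explicit UTC offset")
    return parsed.astimezone(timezone.utc).isoformat(timespec="milliseconds")


# Date: granularity of a calendar day.
calendar_day = date(2015, 3, 7).isoformat()

# Date-time: 7:32 AM Pacific Standard Time (UTC-08:00) expressed in UTC.
timestamp = to_utc_timestamp("2015-03-07T07:32:00-08:00")
```

Rejecting timestamps without an offset avoids silently guessing the sender’s time zone.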

More information