Schema design principles
To develop schemas, we first identify widely used clinical measures and consult with clinical experts to identify the most important distinctions for their clinical use, and then apply these insights to the design of corresponding schemas. Then we define use cases that include these measures in new models of care enabled by mHealth technology. While there isn’t one “correct” schema for any given measure, our schemas aim to offer an ideal format and description of digital health data for supporting clinical and self care. Here are the design principles we follow for Open mHealth schemas, based in part on this Presidential health IT report.
1. Atomicity
Our schemas represent data at a granularity shown by clinical use cases to be most useful for preventive and chronic disease management, self-care and clinical care. Our driving use cases include new models of care enabled by mHealth technology, and are not restricted to traditional assumptions about clinical care models and roles. In many cases, this granularity is more atomic than in EHR data standards. For quantitative clinical measures, the schemas can be used to describe either a single measurement (e.g., one blood glucose value) or a descriptive statistic of aggregate measurements (e.g., an average, minimum or maximum blood glucose value over a period of time).
2. Balancing parsimony and complexity
Simplicity is good, but only as simple as the world is complex. Health data can be tremendously complex. We follow a rough 80/20 rule, designing schemas to be as comprehensive as needed for the majority of anticipated mHealth use cases, but not requiring complexity where it isn’t appropriate. For example, our blood glucose schema includes the timing of a glucose measurement in relationship to meals (e.g., fasting) or sleep (e.g., at bedtime), because these values are important for many diabetes use cases, but the schema does not include timing in relationship to physical activity (less commonly useful). The schemas follow the closed-world assumption, meaning that what is stated is true and what is not stated is false.
3. Balancing permissiveness and constraints
There is a tension between wanting all data to be clean, accurate and standardized, and the work that is required to get data into that state. Schemas that are too constrained and difficult to use won’t be adopted, but permissive and easy to use schemas may lead to clinically meaningless data. We try to be pragmatic in striking a balance between permissiveness and constraint. We enforce only the most direct integrity constraints at the schema level, e.g., correct units of measurement (such as mg/dL for blood glucose and mmHg for blood pressure) through permissible value sets, or cardinality (e.g., each blood pressure reading has exactly one systolic and exactly one diastolic value). Following JSON Schema conventions, schema properties are optional unless they are listed in the “required” array at the bottom of the schema definition. In this way, schemas allow toolmakers and data providers the flexibility to choose how much of the data schema they want to use while still ensuring data interoperability, so long as the most important constraints are met. We do not enforce the majority of clinically useful constraints at the schema level (e.g., that the end date of a time interval must be the same as or later than the start date). We assume that these constraints will be enforced programmatically by the system generating or consuming the data.
4. Designing for data liquidity
Open mHealth’s vision for the digital health ecosystem is that data will flow far and wide to be recombined and used for many different purposes. For data transport, data just needs to get from sender to receiver. For data interchange, in contrast, the meaning of the data needs to get from sender to receiver. Open mHealth schemas aim to preserve the most important clinical meaning as mHealth data is exchanged. To support proper interpretation by data recipients, the context of the original data point must be available along with the data. This context data, or metadata, is both operational and clinical. Operational metadata relates to the data payload and is captured in the header schema. Clinical metadata relates to context for clinical interpretation and is captured in the measure schema. We define our metadata schema against clinical data interchange use cases such as search, filtering, transport, identification, etc.
Header schema
All Open mHealth data points include information carried in the header schema. The header currently includes two broad categories of metadata:
- Data point creation and identification: this includes information on the schema for this data point, when the data point was created, and a unique data point ID.
- Acquisition provenance: this includes information about where the data came from (e.g., the name of the device or app), when the data was collected at the source, and how it was collected (via a sensor or reported by a person).
Future categories of metadata that we are working on include specific device information (e.g., the Unique Device Identifier or serial number), processing provenance (or data lineage), quality of information (QOI) provenance (e.g., measurement accuracy), and privacy metadata. For processing and QOI provenance, our approach will be to point to a data structure that stores all the “hops” that a data point has traversed (e.g., from its source to a cloud aggregator to an analytic platform to transformation by an algorithm to a user-facing app), along with its QOI. This work is being done in partnership with the MD2K project and other relevant partners, starting later in 2015.
5. Alignment with clinical data standards
One of the greatest challenges to using health-related data is its semantic complexity. Developers new to digital health often underestimate the complexity of digital health data, and can easily be overwhelmed by the need to use standard clinical vocabularies like SNOMED and RxNORM in order to interface with the electronic health record (EHR) and other health information technologies. We are supporting developers by handling the work of indexing mHealth schemas and their instantiated data points to existing clinical vocabularies.
Clinical measure schemas
We have associated each clinical measure schema with the most precise code we can identify from standard clinical vocabularies that are being used for EHR data (e.g., SNOMED for diagnoses, LOINC or SNOMED for lab tests, and RxNORM for medications). For example, the blood glucose schema is annotated with the SNOMED code 365812005, which is linked within the schema to a persistent URL hosted by the BioPortal terminology server at Stanford. BioPortal provides a stable source of documentation and other terminology services for the most commonly used biomedical vocabularies. We look for an exact match between the schema’s meaning and the definition in a standard vocabulary. If none can be found, we choose the nearest parent term, i.e., the term with the closest more generic meaning. A parent term is always correct, even if it is less precise, whereas a more specific term risks being wrong. Schemas that can describe either a single or an aggregate measurement are annotated with the code for the measurement (e.g., body weight is annotated with SNOMED code 363808001, which describes body weight measure). For digital health data that are not yet represented in standard vocabularies, we do not reference a code. Various tools are available for searching standard vocabularies, either singly or as a set.
- NLM Terminology Services (registration with NLM and login required):
- SNOMED browser (guide to using SNOMED)
- UMLS Metathesaurus browser: this allows a search of all the vocabularies included in UMLS, which is useful to get a sense of how the concept is described in general
- LOINC browser
- NCBO BioPortal allows browsing of various vocabularies, including SNOMED and LOINC
- RxNORM browser is an app that can be downloaded from the NLM site: it allows search of drugs by trade name and generic name and provides a visual representation of the relationships between a drug, its active ingredient and the various forms in which it is made available
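To make the split between schema-level and programmatic constraints more concrete, here is a minimal Python sketch. It is not the published Open mHealth blood glucose schema: the property names (`blood_glucose`, `meal_timing`, `descriptive_statistic`, `time_interval`), the permitted value lists, and both helper functions are hypothetical and chosen only for illustration. The sketch assumes that required properties and permissible units are checked against the schema, while the rule that an interval's end must not precede its start is left to the application.

```python
from datetime import datetime

# Hypothetical, simplified schema fragment (NOT the published Open mHealth schema).
BLOOD_GLUCOSE_SCHEMA = {
    "required": ["blood_glucose"],                     # everything else is optional
    "allowed_units": ["mg/dL", "mmol/L"],              # assumed permissible value set
    "allowed_meal_timing": ["fasting", "before meal", "after meal"],
}


def schema_errors(body, schema=BLOOD_GLUCOSE_SCHEMA):
    """Schema-level checks only: required properties and permissible values."""
    errors = []
    for name in schema["required"]:
        if name not in body:
            errors.append(f"missing required property: {name}")
    measure = body.get("blood_glucose", {})
    if measure and measure.get("unit") not in schema["allowed_units"]:
        errors.append(f"unit not permitted: {measure.get('unit')}")
    timing = body.get("meal_timing")
    if timing is not None and timing not in schema["allowed_meal_timing"]:
        errors.append(f"meal timing not permitted: {timing}")
    return errors


def interval_errors(body):
    """Application-level check that the schema itself does not enforce."""
    frame = body.get("time_interval")
    if frame is None:
        return []
    start = datetime.fromisoformat(frame["start"])
    end = datetime.fromisoformat(frame["end"])
    return [] if end >= start else ["end of time interval precedes start"]


# A single measurement and an aggregate (average) share the same shape.
single = {"blood_glucose": {"value": 95, "unit": "mg/dL"}, "meal_timing": "fasting"}
aggregate = {
    "blood_glucose": {"value": 110, "unit": "mg/dL"},
    "descriptive_statistic": "average",
    # Deliberately reversed interval: passes the schema, fails the application check.
    "time_interval": {"start": "2015-03-08T00:00:00", "end": "2015-03-01T00:00:00"},
}

print(schema_errors(single))       # no schema-level problems
print(schema_errors(aggregate))    # also passes the schema-level checks
print(interval_errors(aggregate))  # caught only by the programmatic check
```

In this sketch the reversed interval passes `schema_errors` and is caught only by `interval_errors`, which mirrors the division of responsibility described under "Balancing permissiveness and constraints".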