A team of researchers from four universities – UCLA, UC San Francisco, University of Memphis and University of Pennsylvania – has been awarded a new data cyberinfrastructure grant by the National Science Foundation (NSF). The team will develop a new cyberinfrastructure called mProv to annotate high-frequency mobile sensor data with data source, quality, validity, and semantics to facilitate sharing of such data with the wider research community.
The project, mProv: Provenance-based Data Analytics Cyberinfrastructure for High-frequency Mobile Sensor Data, will be led by Dr. Santosh Kumar, a professor and Moss Chair of Excellence in Computer Science at the University of Memphis. Dr. Zachary Ives, another computer scientist, will lead the University of Pennsylvania team; Dr. Ida Sim, a professor of medicine and medical informaticist, will lead the UCSF site; and Dr. Mani Srivastava, an electrical engineer and computer scientist, will lead the UCLA team. Other collaborators on the project include Open mHealth, Open Humans, and Quantified Self.
Mobile sensors (embedded in phones, vehicles, wearables, and the environment) continuously capture data in great detail and hold tremendous potential to advance science and to directly impact health, wellness, mobility/transportation, and energy. The open-source software developed by the NIH-funded Center of Excellence for Mobile Sensor Data-to-Knowledge (MD2K), also led by Kumar and involving UCLA and UCSF, allows any researcher to collect, analyze, and interpret high-frequency sensor data from the natural environment. However, research with such sensor data is still out of reach for most researchers: it takes significant resources and expertise to acquire sensors, obtain study approval, and recruit human subjects before any data can be collected.
By developing a provenance cyberinfrastructure that will integrate metadata with streaming sensor data, the mProv project will enable sharing of mobile sensor data with third-party researchers. It will accelerate research by tapping into the growing interest within the research community, and it will unleash the potential of mobile sensor data to improve health and wellness on an individual level through the development of computational models of human health and behavior.
Said Kumar: “With the mProv provenance cyberinfrastructure complementing MD2K’s software, investigators can collect, curate, analyze, and interpret mobile sensor data, as well as share data. Doing so can amplify the research utility of their data and, most importantly, help establish benchmarks and bring reproducibility, which are key to scientific rigor.”
“The mProv tools will make it convenient to generate and propagate metadata for streaming sensor data. This will let us ‘snapshot and replay’ such data and the outputs of streaming algorithms, and to compare alternatives on an apples-to-apples basis,” said Ives.
Said Sim: “The infrastructure will accommodate a wide variety of data types and will enable data discovery, analytics, and visualization from third parties, including researchers and industry.”
Srivastava also noted: “To address the privacy concerns associated with data from mobile personal sensors, our team will also investigate privacy mechanisms to ensure privacy of data contributors, while allowing research using their data.”
The mProv: Provenance-based Data Analytics Cyberinfrastructure for High-frequency Mobile Sensor Data project is part of the National Science Foundation (NSF) Data Infrastructure Building Blocks (DIBBs) program and the Cyberinfrastructure Framework for 21st Century Science and Engineering (CIF21) initiative. For additional information on the mProv project, visit http://mprov.md2k.org/. For additional information on DIBBs, visit https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504776. For additional information on MD2K research, visit www.MD2K.org or email info@md2k.org.