KU researcher proposes data exchange model


LAWRENCE — It's no secret that economic growth and quality-of-life improvements are fueled by scientific research and development. Of course, advancing R&D requires that researchers and policymakers be able to access scientific data such as grants, patents, publications and data sets.

But accessing and exchanging data is no easy task, thanks to the tangled mess of R&D-related data across countries, disciplines, data providers and industrial sectors. Burdened with data that are inconsistently specified, researchers and policymakers have few mechanisms or incentives to share or interlink data. And on top of this, access to data is limited by a patchwork of laws, regulations and practices that are unevenly applied and interpreted.

"Data are king," said Donna Ginther, professor of economics at the University of Kansas, "but sharing and exchanging them can be incredibly difficult. And when we're not able to efficiently share scientific data, it impedes our understanding of the determinants of economic growth, technological advancement and all kinds of societal benefits."

The solution? According to Ginther, a Web-based infrastructure for data sharing and analysis could help, and data exchange standards are a first step. Ginther and her colleagues outline these prescriptions in the October edition of Science magazine, the world's leading peer-reviewed general science journal.

To address the challenge of data exchange, Ginther said, you need to start with a distributed data infrastructure.

"There's no single database solution," said Ginther, who serves as director of the Center for Science Technology & Economic Policy at the Institute for Policy & Social Research at KU. "Data sets are too big, there are confidentiality issues, and parties with proprietary components are unlikely to participate in a single-provider solution. Security and licensing require flexible access. Users must be able to attach and integrate new information."

Unified standards for exchanging data could enable a web-based distributed network, she said, combining local and cloud storage and providing public-access data and tools, private workspace "sandboxes" and versions of data to support parallel analysis. This infrastructure would likely concentrate existing resources, attract new ones and maximize benefits from coordination and interoperability while minimizing resource drain and top-down control.

Another key to addressing the data exchange challenge is to encourage broad-based participation. Of course, this is easier said than done.

"There are many players with different objectives – for example, multinational corporations, nonprofits, government agencies, academic researchers –but they have potentially complementary roles," she said. "What is lacking is coordination in establishing and adopting standards and a data exchange platform."

Ginther suggests that government funding agencies – through power of the purse – can encourage the development of coordinated standards in grant and reporting systems and can require that data produced with their support mesh with the infrastructure. Many corporations are already creating and curating data that support research on research. And participation can also be encouraged by tweaking the incentives for individual researchers – for example, by sharing data in a way that ensures a researcher is cited or attributed more often and across various publications.

Open data standards are also part of the solution, Ginther said. The nonprofit sector is well-positioned for defining data interoperability standards because it can bring players together with minimal conflicts of interest. In fact, several exchange standards are already in use, particularly in Europe and the United States. With these underlying exchange standards, the first step is to create a web-based registry for data, or expand an existing one, to meet the needs of a global, multidisciplinary community.

And then there's the issue of data security and privacy. As Ginther points out, data in any proposed infrastructure should be categorized by level of sensitivity. For example, nonsensitive data can be made public. But it can get tricky from there: privacy laws vary between countries, and person-level data have the potential to become sensitive if linked or otherwise enhanced. Processes for managing such data need to be implemented.

Despite all the potential hurdles, Ginther said she's optimistic the challenge of data exchange can be met, if for no other reason than the cost of not exchanging data is simply too high.

"Most stakeholders recognize the necessity of sharing data and recognize that we don't currently have a good infrastructure for doing so," she said. "Our proposed model offers potential benefits from combining and mining the vast data already available. The first step is to coordinate existing data exchange efforts, the foundation on which the entire effort relies."


Mon, 10/22/2012

Author: Joe Monaco

Media Contact:

Joe Monaco
KU Office of Public Affairs
785-864-7100