In science, a deluge of data
Updated: 2013-08-25 09:20
By John Markoff (The New York Times)
Vinton Cerf of Google calls for sharing the costs of making scientific data widely available. Andrew Federman for Google
The torrents of digital data from scientific research have spawned a debate over who should have access to the data, how it can be stored and who will pay to store it.
Vinton Cerf, a vice president of Google, said the issue has become crucial for public and private institutions.
And Alan Blatecky, the director of advanced cyberinfrastructure at the National Science Foundation in Virginia, said: "Data is the new currency for research. The question is how do you address the cost issues, because there is no new money."
There is a growing international recognition of the scope of the problem. The Research Data Alliance, begun last August with just eight researchers, now has more than 750 academic, corporate and government scientists and information technology specialists in 50 countries.
Agencies in the United States are proposing to "support increased public access to the results of research funded by the federal government."
Dr. Cerf and Francine Berman, a computer scientist at Rensselaer Polytechnic Institute in Troy, New York, argue in a paper published in the journal Science that companies and colleges must invest in new computer data centers so that crucial research data is not irretrievably lost.
"There is no economic 'magic bullet' that does not require someone, somewhere, to pay," they wrote.
Dr. Berman leads the United States branch of the Research Data Alliance, an organization of academic, government and corporate researchers attempting to build new storage systems. "Publicly accessible data requires a stable home and someone to pay the mortgage," she said.
Google initially promised to host large data sets for scientists for free, then killed the program in 2008 after just a year, for unspecified business reasons. It may have been that the company was taken aback by the size of scientific data sets.
Dr. Berman and Dr. Cerf argue that coping with the explosion of data will require a cultural shift on the part of individual scientists.
"The casual approach for many scientists has been to 'stick it on my disk drive and make it available to anyone who wants to use it,' " Dr. Cerf said.
They argued that the costs need not be prohibitive. "If you want to download a song from iTunes, it's not free, but it doesn't break the bank," Dr. Berman said.
Dr. Berman said there were models that could provide ideas for the new infrastructures needed to store the data and make it accessible. The social science database Longitudinal Study of American Youth, which is maintained by the Inter-University Consortium for Political and Social Research at the University of Michigan, charges users a subscription fee.
Some scientists argue that there would be advantages to charging for data. Bernardo A. Huberman, a physicist at Hewlett-Packard Laboratories, said, "Paying a small fee for downloads in the aggregate would also act as an incentive for providing the needed infrastructure."
(China Daily 08/25/2013 page11)