Publications

Creating Thriving Data-Centric Communities from Basic Research to Commercial Applications

Abstract

The ability to accumulate and analyze large quantities of data is rapidly becoming a competitive advantage not only in science but in the broader economy as well. Advances such as AlphaFold, the AI-based protein prediction tool, and ChatGPT, the large language model-based chatbot, have ignited enormous excitement in science and industry for leveraging data and computational techniques to solve important problems. However, what is typically lost in all the excitement is the fact that such startling achievements were possible only after a critical mass of high-quality data existed to train models using machine learning algorithms. Both examples relied on open data sources that were generated painstakingly by user communities over the course of decades. We argue that unlocking future high-impact data science achievements like these will require a culture of and skill set for data management, sharing …

Metadata

publication: 2024 IEEE 20th International Conference on e-Science (e-Science), 1-3, 2024
year: 2024
publication date: 2024/9/16
authors: Robert Schuler, Carl Kesselman
link: https://ieeexplore.ieee.org/abstract/document/10678730/
conference: 2024 IEEE 20th International Conference on e-Science (e-Science)
pages: 1-3
publisher: IEEE