Publications
Creating Thriving Data-Centric Communities from Basic Research to Commercial Applications
Abstract
The ability to accumulate and analyze large quantities of data is rapidly becoming a competitive advantage not only in science but in the broader economy as well. Advances such as AlphaFold, the AI-based protein prediction tool, and ChatGPT, the large language model-based chatbot, have ignited enormous excitement in science and industry for leveraging data and computational techniques to solve important problems. However, what is typically lost in all the excitement is the fact that such startling achievements were only possible after a critical mass of high-quality data existed to train models using machine learning algorithms. Both examples relied on open data sources that were generated painstakingly by user communities over the course of decades. We argue that unlocking future high-impact data science achievements like these will require a culture of and skill set for data management, sharing …
Metadata
- publication: 2024 IEEE 20th International Conference on e-Science (e-Science), 1-3, 2024
- year: 2024
- publication date: 2024/9/16
- authors: Robert Schuler, Carl Kesselman
- link: https://ieeexplore.ieee.org/abstract/document/10678730/
- conference: 2024 IEEE 20th International Conference on e-Science (e-Science)
- pages: 1-3
- publisher: IEEE