-
Decades of Transformation: Evolution of the NASA Astrophysics Data System's Infrastructure
Authors:
Alberto Accomazzi
Abstract:
The NASA Astrophysics Data System (ADS) is the primary Digital Library portal for researchers in astronomy and astrophysics. Over the past 30 years, the ADS has gone from being an astronomy-focused bibliographic database to an open digital library system supporting research in space and (soon) earth sciences. This paper describes the evolution of the ADS system, its capabilities, and the technological infrastructure underpinning it.
We give an overview of the ADS's original architecture, constructed primarily around simple database models. This bespoke system allowed for the efficient indexing of metadata and citations, the digitization and archival of full-text articles, and the rapid development of discipline-specific capabilities running on commodity hardware. The move towards a cloud-based microservices architecture and an open-source search engine in the late 2010s marked a significant shift, bringing full-text search capabilities, a modern API, higher uptime, more reliable data retrieval, and integration of advanced visualizations and analytics.
Another crucial evolution came with the gradual and ongoing incorporation of Machine Learning and Natural Language Processing algorithms in our data pipelines. Originally used for information extraction and classification tasks, NLP and ML techniques are now being developed to improve metadata enrichment, search, notifications, and recommendations. We describe how these computational techniques are being embedded into our software infrastructure, the challenges faced, and the benefits reaped.
Finally, we conclude by describing the future prospects of ADS and its ongoing expansion, discussing the challenges of managing an interdisciplinary information system in the era of AI and Open Science, where information is abundant, technology is transformative, but their trustworthiness can be elusive.
Submitted 17 January, 2024;
originally announced January 2024.
-
Improving the visibility and citability of exoplanet research software
Authors:
Alice Allen,
Alberto Accomazzi,
Joe P. Renaud
Abstract:
The Astrophysics Source Code Library (ASCL) is a free online registry for source codes of interest to astronomers, astrophysicists, and planetary scientists. It lists, and in some cases houses, software that has been used in research appearing in or submitted to peer-reviewed publications. As of December 2023, it has over 3300 software entries and is indexed by NASA's Astrophysics Data System (ADS) and Clarivate's Web of Science.
In 2020, NASA created the Exoplanet Modeling and Analysis Center (EMAC). Housed at the Goddard Space Flight Center, EMAC serves, in part, as a catalog and repository for exoplanet research resources. EMAC has 240 entries (as of December 2023), 78% of which are for downloadable software.
This oral presentation covered the collaborative work the ASCL, EMAC, and ADS are doing to increase the discoverability and citability of EMAC's software entries and to strengthen the ASCL's ability to serve the planetary science community. It also introduced two new projects, Virtual Astronomy Software Talks (VAST) and Exoplanet Virtual Astronomy Software Talks (exoVAST), that provide additional opportunities for discoverability of EMAC software resources.
Submitted 28 December, 2023;
originally announced December 2023.
-
Experimenting with Large Language Models and vector embeddings in NASA SciX
Authors:
Sergi Blanco-Cuaresma,
Ioana Ciucă,
Alberto Accomazzi,
Michael J. Kurtz,
Edwin A. Henneken,
Kelly E. Lockhart,
Felix Grezes,
Thomas Allen,
Golnaz Shapurian,
Carolyn S. Grant,
Donna M. Thompson,
Timothy W. Hostetler,
Matthew R. Templeton,
Shinyi Chen,
Jennifer Koch,
Taylor Jacovich,
Daniel Chivvis,
Fernanda de Macedo Alves,
Jean-Claude Paquin,
Jennifer Bartlett,
Mugdha Polimera,
Stephanie Jarmak
Abstract:
Open-source Large Language Models enable projects such as NASA SciX (i.e., NASA ADS) to think out of the box and try alternative approaches for information retrieval and data augmentation, while respecting data copyright and users' privacy. However, when large language models are directly prompted with questions without any context, they are prone to hallucination. At NASA SciX we have developed an experiment where we created semantic vectors for our large collection of abstracts and full-text content, and we designed a prompt system to ask questions using contextual chunks from our system. Based on a non-systematic human evaluation, the experiment shows a lower degree of hallucination and better responses when using Retrieval Augmented Generation. Further exploration is required to design new features and data augmentation processes at NASA SciX that leverage this technology while respecting the high level of trust and quality that the project holds.
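The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SciX implementation. A bag-of-words count vector stands in for a learned semantic embedding, and `embed`, `retrieve`, and `build_prompt` are hypothetical helpers. The underlying idea is the same: rank stored chunks by cosine similarity to the question, then place the best ones in the prompt so the model answers from supplied context.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a semantic embedding (assumption): a term-count vector.
    A real system would use a neural sentence-embedding model instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k stored chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Ground the model in the retrieved chunks and ask it not to go beyond them."""
    ctx = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer using only the context below; say 'unknown' if it is not there.\n"
        f"Context:\n{ctx}\n\nQuestion: {question}\nAnswer:"
    )
```

In practice, the toy `embed` would be replaced by a neural embedding model, and the linear scan by a vector index. The resulting prompt would then be sent to a locally hosted LLM.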
Submitted 21 December, 2023;
originally announced December 2023.
-
Identifying Planetary Names in Astronomy Papers: A Multi-Step Approach
Authors:
Golnaz Shapurian,
Michael J Kurtz,
Alberto Accomazzi
Abstract:
The automatic identification of planetary feature names in astronomy publications presents numerous challenges. These features include craters, defined as roughly circular depressions resulting from impact or volcanic activity; dorsa, which are elongate raised structures or wrinkle ridges; and lacus, small irregular patches of dark, smooth material on the Moon, referred to as "lake" (Planetary Names Working Group, n.d.). Many feature names overlap with the names of the places or people they are named after, for example Syria, Tempe, Einstein, and Sagan (U.S. Geological Survey, n.d.). Some feature names are used in many contexts: Apollo, for instance, can qualify terms such as mission, program, sample, astronaut, seismic, seismometers, core, era, data, collection, instrument, and station, in addition to naming a crater on the Moon. Some feature names can appear in the text as adjectives, like the lunar craters Black, Green, and White. Others serve as directions in other contexts, like the craters West and South on the Moon. Additionally, some features share identical names across different celestial bodies and require disambiguation, such as the Adams crater, which exists on both the Moon and Mars. We present a multi-step pipeline combining rule-based filtering, statistical relevance analysis, part-of-speech (POS) tagging, a named entity recognition (NER) model, hybrid keyword harvesting, knowledge graph (KG) matching, and inference with a locally installed large language model (LLM) to reliably identify planetary names despite these challenges. When evaluated on a dataset of astronomy papers from the Astrophysics Data System (ADS), this methodology achieves an F1-score over 0.97 in disambiguating planetary feature names.
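The hypothetical Python sketch below illustrates the first two kinds of stage in such a pipeline: rule-based filtering followed by a context-relevance check. It accepts a candidate feature name only if two conditions hold. First, the name must not be immediately followed by a word that marks a non-feature sense (as in "Apollo mission"). Second, its sentence must contain planetary-context vocabulary. The word lists and threshold are invented for illustration. The POS, NER, knowledge-graph, and LLM stages of the authors' pipeline are not reproduced.

```python
import re

# Hypothetical word lists, for illustration only (not the authors' lists).
NON_FEATURE_FOLLOWERS = {"mission", "program", "astronaut", "sample", "era", "data", "station"}
FEATURE_CONTEXT = {"crater", "craters", "dorsum", "lacus", "lunar", "moon",
                   "mars", "martian", "impact", "ejecta"}

def tokens(text):
    """Lower-cased alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def passes_rules(name, sentence):
    """Stage 1 (rule-based): reject a mention whose next word marks a
    non-feature sense, e.g. 'Apollo mission'; otherwise require the name to occur."""
    words = tokens(sentence)
    target = name.lower()
    for i, w in enumerate(words[:-1]):
        if w == target and words[i + 1] in NON_FEATURE_FOLLOWERS:
            return False
    return target in words

def context_score(sentence):
    """Stage 2 (relevance): count planetary-context words in the sentence."""
    return sum(1 for w in tokens(sentence) if w in FEATURE_CONTEXT)

def is_planetary_feature(name, sentence, min_score=1):
    """Keep a candidate only if it survives the rules and has enough context."""
    return passes_rules(name, sentence) and context_score(sentence) >= min_score
```

A real pipeline would pass surviving candidates to the later, more expensive stages. Cheap rules run first so that the LLM only sees the genuinely ambiguous cases.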
Submitted 17 December, 2023; v1 submitted 13 December, 2023;
originally announced December 2023.
-
The Future of Astronomical Data Infrastructure: Meeting Report
Authors:
Michael R. Blanton,
Janet D. Evans,
Dara Norman,
William O'Mullane,
Adrian Price-Whelan,
Luca Rizzi,
Alberto Accomazzi,
Megan Ansdell,
Stephen Bailey,
Paul Barrett,
Steven Berukoff,
Adam Bolton,
Julian Borrill,
Kelle Cruz,
Julianne Dalcanton,
Vandana Desai,
Gregory P. Dubois-Felsmann,
Frossie Economou,
Henry Ferguson,
Bryan Field,
Dan Foreman-Mackey,
Jaime Forero-Romero,
Niall Gaffney,
Kim Gillies,
Matthew J. Graham
, et al. (47 additional authors not shown)
Abstract:
The astronomical community is grappling with the increasing volume and complexity of data produced by modern telescopes, which make it difficult to reduce, access, analyze, and combine archives of data. To address this challenge, we propose the establishment of a coordinating body, an "entity," with the specific mission of enhancing the interoperability, archiving, distribution, and production of both astronomical data and software. This report is the culmination of a workshop held in February 2023 on the Future of Astronomical Data Infrastructure. Attended by 70 scientists and software professionals from ground-based and space-based missions and archives spanning the entire spectrum of astronomical research, the group deliberated on the prevailing state of software and data infrastructure in astronomy, identified pressing issues, and explored potential solutions. In this report, we describe the ecosystem of astronomical data, its existing flaws, and the many gaps, duplications, inconsistencies, barriers to access, drags on productivity, missed opportunities, and risks to the long-term integrity of essential data sets. We also highlight the successes and failures in a set of deep dives into several different illustrative components of the ecosystem, included as an appendix.
Submitted 7 November, 2023;
originally announced November 2023.
-
AstroLLaMA: Towards Specialized Foundation Models in Astronomy
Authors:
Tuan Dung Nguyen,
Yuan-Sen Ting,
Ioana Ciucă,
Charlie O'Neill,
Ze-Chang Sun,
Maja Jabłońska,
Sandor Kruk,
Ernest Perkowski,
Jack Miller,
Jason Li,
Josh Peek,
Kartheik Iyer,
Tomasz Różański,
Pranav Khetarpal,
Sharaf Zaman,
David Brodrick,
Sergio J. Rodríguez Méndez,
Thang Bui,
Alyssa Goodman,
Alberto Accomazzi,
Jill Naiman,
Jesse Cranney,
Kevin Schawinski,
UniverseTBD
Abstract:
Large language models excel in many human-language tasks but often falter in highly specialized domains like scholarly astronomy. To bridge this gap, we introduce AstroLLaMA, a 7-billion-parameter model fine-tuned from LLaMA-2 using over 300,000 astronomy abstracts from arXiv. Optimized for traditional causal language modeling, AstroLLaMA achieves a 30% lower perplexity than LLaMA-2, showing marked domain adaptation. Our model generates more insightful and scientifically relevant text completions and embeddings than state-of-the-art foundation models, despite having significantly fewer parameters. AstroLLaMA serves as a robust, domain-specific model with broad fine-tuning potential. Its public release aims to spur astronomy-focused research, including automatic paper summarization and conversational agent development.
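Perplexity, the metric behind the 30% figure, is the exponential of the mean negative log-likelihood a model assigns to each token given the preceding ones: PPL = exp(-(1/N) · Σ log p(x_i | x_<i)). The short Python sketch below computes it from per-token log-probabilities and expresses a reduction as a fraction. The helper names and the numbers in the usage are illustrative assumptions, not values from the paper. A 30% reduction means the fine-tuned model's perplexity is about 0.7 times the base model's on the same evaluation text.

```python
import math

def perplexity(token_logprobs):
    """Perplexity of a sequence from per-token natural-log probabilities
    log p(x_i | x_<i): exp of the mean negative log-likelihood."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

def relative_reduction(ppl_new, ppl_base):
    """Fractional reduction of ppl_new relative to ppl_base (0.30 means 30% lower)."""
    return 1.0 - ppl_new / ppl_base
```

For example, a model that assigns probability 1/4 to every token has perplexity 4. It is as uncertain as a uniform choice among four options.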
Submitted 12 September, 2023;
originally announced September 2023.
-
Improving astroBERT using Semantic Textual Similarity
Authors:
Felix Grezes,
Thomas Allen,
Sergi Blanco-Cuaresma,
Alberto Accomazzi,
Michael J. Kurtz,
Golnaz Shapurian,
Edwin Henneken,
Carolyn S. Grant,
Donna M. Thompson,
Timothy W. Hostetler,
Matthew R. Templeton,
Kelly E. Lockhart,
Shinyi Chen,
Jennifer Koch,
Taylor Jacovich,
Pavlos Protopapas
Abstract:
The NASA Astrophysics Data System (ADS) is an essential tool for researchers that allows them to explore the astronomy and astrophysics scientific literature, but it has yet to exploit recent advances in natural language processing. At ADASS 2021, we introduced astroBERT, a machine learning language model tailored to the text used in astronomy papers in ADS. In this work we:
- announce the first public release of the astroBERT language model;
- show how astroBERT improves over existing public language models on astrophysics-specific tasks;
- and detail how ADS plans to harness the unique structure of scientific papers, the citation graph and citation context, to further improve astroBERT.
Submitted 29 November, 2022;
originally announced December 2022.
-
Web accessibility trends and implementation in dynamic web applications
Authors:
Timothy W. Hostetler,
Shinyi Chen,
Sergi Blanco-Cuaresma,
Alberto Accomazzi,
Michael J. Kurtz,
Carolyn S. Grant,
Edwin Henneken,
Donna M. Thompson,
Roman Chyla,
Golnaz Shapurian,
Matthew R. Templeton,
Kelly E. Lockhart,
Nemanja Martinovic,
Stephen McDonald,
Felix Grezes
Abstract:
The NASA Astrophysics Data System (ADS), a critical research service for the astrophysics community, strives to provide the most accessible and inclusive environment for the discovery and exploration of the astronomical literature. Part of this goal involves creating a digital platform that can accommodate everybody, including people with disabilities who would benefit from alternative ways of presenting the information provided by the website. NASA ADS follows the official Web Content Accessibility Guidelines (WCAG) standard to ensure the accessibility of all its applications, striving to exceed this standard where possible. Through both internal audits and external expert reviews based on these guidelines, we have identified many areas for improving accessibility in our current web application and have implemented a number of UI updates as a result. We present an overview of some current web accessibility trends, discuss our experience incorporating these trends into our web application, and share the lessons learned and recommendations for future projects.
Submitted 1 February, 2022;
originally announced February 2022.
-
Building astroBERT, a language model for Astronomy & Astrophysics
Authors:
Felix Grezes,
Sergi Blanco-Cuaresma,
Alberto Accomazzi,
Michael J. Kurtz,
Golnaz Shapurian,
Edwin Henneken,
Carolyn S. Grant,
Donna M. Thompson,
Roman Chyla,
Stephen McDonald,
Timothy W. Hostetler,
Matthew R. Templeton,
Kelly E. Lockhart,
Nemanja Martinovic,
Shinyi Chen,
Chris Tanner,
Pavlos Protopapas
Abstract:
The existing search tools for exploring the NASA Astrophysics Data System (ADS) can be quite rich and empowering (e.g., similar and trending operators), but researchers cannot yet fully leverage semantic search. For example, a query for "results from the Planck mission" should be able to distinguish between the various meanings of Planck (person, mission, constant, institutions, and more) without further clarification from the user. At ADS, we are applying modern machine learning and natural language processing techniques to our dataset of recent astronomy publications to train astroBERT, a deeply contextual language model based on research at Google. Using astroBERT, we aim to enrich the ADS dataset and improve its discoverability; in particular, we are developing our own named entity recognition tool. We present here our preliminary results and lessons learned.
Submitted 1 December, 2021;
originally announced December 2021.
-
Best Practices for Data Publication in the Astronomical Literature
Authors:
Tracy X. Chen,
Marion Schmitz,
Joseph M. Mazzarella,
Xiuqin Wu,
Julian C. van Eyken,
Alberto Accomazzi,
Rachel L. Akeson,
Mark Allen,
Rachael Beaton,
G. Bruce Berriman,
Andrew W. Boyle,
Marianne Brouty,
Ben Chan,
Jessie L. Christiansen,
David R. Ciardi,
David Cook,
Raffaele D'Abrusco,
Rick Ebert,
Cren Frayer,
Benjamin J. Fulton,
Christopher Gelino,
George Helou,
Calen B. Henderson,
Justin Howell,
Joyce Kim
, et al. (20 additional authors not shown)
Abstract:
We present an overview of best practices for publishing data in astronomy and astrophysics journals. These recommendations are intended as a reference for authors to help prepare and publish data in a way that will better represent and support science results, enable better data sharing, improve reproducibility, and enhance the reusability of data. Observance of these guidelines will also help to streamline the extraction, preservation, integration, and cross-linking of valuable data from the astrophysics literature into major astronomical databases, and consequently facilitate new modes of science discovery that will better exploit the vast quantities of panchromatic and multi-dimensional data associated with the literature. We encourage authors, journal editors, referees, and publishers to implement the best practices reviewed here, as well as related recommendations from international astronomical organizations such as the International Astronomical Union (IAU) for publication of nomenclature, data, and metadata. A convenient Checklist of Recommendations for Publishing Data in the Literature is included for authors to consult before the submission of the final version of their journal articles and associated data files. We recommend that publishers of journals in astronomy and astrophysics incorporate a link to this document in their Instructions to Authors.
Submitted 16 April, 2022; v1 submitted 2 June, 2021;
originally announced June 2021.
-
Enabling Synergy: Improving the Information Infrastructure for Planetary Science
Authors:
Michael J. Kurtz,
Alberto Accomazzi,
Edwin A. Henneken
Abstract:
In this whitepaper we advocate that the Planetary Science (PS) community build a discipline-specific digital library, in collaboration with the existing astronomy digital library, ADS. We suggest that the PS data archives increase their level of curation to allow for direct linking between the archival data and the derived journal articles. And we suggest that a new component of the PS information infrastructure be created to collate and curate information on features and objects in our solar system, beginning with the USGS/IAU Gazetteer of Planetary Nomenclature.
Submitted 29 September, 2020;
originally announced September 2020.
-
Agile methodologies in teams with highly creative and autonomous members
Authors:
Sergi Blanco-Cuaresma,
Alberto Accomazzi,
Michael J. Kurtz,
Edwin Henneken,
Carolyn S. Grant,
Donna M. Thompson,
Roman Chyla,
Stephen McDonald,
Golnaz Shapurian,
Timothy W. Hostetler,
Matthew R. Templeton,
Kelly E. Lockhart,
Kris Bukovi
Abstract:
The Agile manifesto encourages us to value individuals and interactions over processes and tools, while Scrum, the most widely adopted Agile development methodology, is essentially based on roles, events, artifacts, and the rules that bind them together (i.e., processes). Moreover, it is generally claimed that whenever a Scrum project does not succeed, the reason is that Scrum was not implemented correctly, not that Scrum may have its own flaws. This makes the methodology effectively irrefutable and discourages deviations that would fit the actual needs and peculiarities of the developers. In particular, the members of the NASA ADS team are highly creative and autonomous individuals whose motivation can suffer if their freedom is too strongly constrained. We present our experience following Agile principles, reusing certain Scrum elements, and seeking the satisfaction of the team members. At the same time, we react rapidly to our stakeholders' expectations and keep the project in line with them.
Submitted 10 September, 2020;
originally announced September 2020.
-
Enabling Effective Exoplanet / Planetary Collaborative Science
Authors:
Mark S. Marley,
Chester Harman,
Heidi B. Hammel,
Paul Byrne,
Jonathan Fortney,
Alberto Accomazzi,
Sarah E. Moran,
Michael Way,
Jessie Christiansen,
Noam Izenberg,
Timothy Holt,
Sanaz Vahidinia,
Erika Kohler,
Karalee Brugman
Abstract:
The field of exoplanetary science has emerged over the past two decades, rising up alongside traditional solar system planetary science. Both fields focus on understanding the processes which form and sculpt planets through time, yet there has been less scientific exchange between the two communities than is ideal. This white paper explores some of the institutional and cultural barriers which impede cross-discipline collaborations and suggests solutions that would foster greater collaboration. Some solutions require structural or policy changes within NASA itself, while others are directed towards other institutions, including academic publishers, that can also facilitate greater interdisciplinarity.
Submitted 20 July, 2020;
originally announced July 2020.
-
Increasing the Discovery Space in Astrophysics - A Collation of Six Submitted White Papers
Authors:
G. Fabbiano,
M. Elvis,
A. Accomazzi,
G. B. Berriman,
N. Brickhouse,
S. Bose,
D. Carrera,
I. Chilingarian,
F. Civano,
B. Czerny,
R. D'Abrusco,
B. Diemer,
J. Drake,
R. Emami Meibody,
J. R. Farah,
G. G. Fazio,
E. Feigelson,
F. Fornasini,
Jay Gallagher,
J. Grindlay,
L. Hernquist,
D. J. James,
M. Karovska,
V. Kashyap,
D. -W. Kim
, et al. (24 additional authors not shown)
Abstract:
We write in response to the call from the 2020 Decadal Survey to submit white papers illustrating the most pressing scientific questions in astrophysics for the coming decade. We propose exploration as the central question for the Decadal Committee's discussions. The history of astronomy shows that paradigm-changing discoveries were not driven by well-formulated scientific questions based on the knowledge of the time. They were instead the result of the increase in discovery space fostered by new telescopes and instruments. An additional tool for increasing the discovery space is the analysis and mining of the ever-larger amount of archival data available to astronomers. Revolutionary observing facilities, and the state-of-the-art astronomy archives needed to support them, will open up the universe to new discovery. Here we focus on exploration for compact objects and multi-messenger science. This white paper includes science examples of the power of the discovery approach, encompassing all the areas of astrophysics covered by the 2020 Decadal Survey.
Submitted 18 March, 2019; v1 submitted 15 March, 2019;
originally announced March 2019.
-
From Dark Energy to Exolife: Improving the Digital Information Infrastructure for Astrophysics
Authors:
Michael J. Kurtz,
Alberto Accomazzi
Abstract:
Some of the most exciting and promising areas of Astronomy research today are found at the boundaries of the discipline: the search for Exoplanets and Multi-Messenger Astronomy. In order to achieve breakthroughs in these research fields over the next decade, innovation and expansion of the digital information infrastructure which supports this research is required. Astronomy has been well-served by the existence of an open, distributed network of data centers and archives. However, institutional barriers and differing research cultures have prevented cross-disciplinary collaborations, creating fragmented knowledge and stove-piped research activities. This must change in order for the broader community of scientists to work together and solve our most ambitious decadal challenges. Interdisciplinary inquiry is best supported by bringing researchers together at the information discovery level. In order to cross the traditional disciplinary silos, we must allow scientists both to explore new ideas and to gain access to new data and knowledge. This is best enabled by providing discovery platforms which allow them to explore and connect different research threads in the literature, identify communities of experts, and access and analyze the related published datasets, measurements, and catalogs.
Submitted 1 March, 2019;
originally announced March 2019.
-
Fundamentals of effective cloud management for the new NASA Astrophysics Data System
Authors:
Sergi Blanco-Cuaresma,
Alberto Accomazzi,
Michael J. Kurtz,
Edwin Henneken,
Carolyn S. Grant,
Donna M. Thompson,
Roman Chyla,
Stephen McDonald,
Golnaz Shapurian,
Timothy W. Hostetler,
Matthew R. Templeton,
Kelly E. Lockhart,
Kris Bukovi,
Nathan Rapport
Abstract:
The new NASA Astrophysics Data System (ADS) is designed with a service-oriented architecture (SOA). It consists of multiple customized Apache Solr search engine instances plus a collection of microservices, containerized using Docker and deployed in Amazon Web Services (AWS). For complex systems like the ADS, this loosely coupled architecture can lead to a more scalable, reliable, and resilient system if some fundamental questions are addressed. After experimenting with different AWS environments and deployment methods, we decided in December 2017 to adopt Kubernetes for container orchestration. Defining the best strategy to properly set up Kubernetes has proven challenging for three reasons. Automatic scaling of services and load balancing of traffic can lead to errors whose origin is difficult to identify. Monitoring and logging the activity that happens across multiple layers for a single request needs to be carefully addressed. Finally, the best workflow for a Continuous Integration and Delivery (CI/CD) system is not self-evident. We present here how we tackle these challenges and our plans for the future.
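The autoscaling concern can be made concrete with the documented core rule of the Kubernetes Horizontal Pod Autoscaler: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The autoscaler skips changes when the ratio is within a tolerance of 1.0 (0.1 by default) and respects the configured minimum and maximum replica counts. The Python function below is a simplified model of that rule for reasoning about scaling behavior. It is not the ADS configuration, which the abstract does not describe. It also ignores pod readiness, missing metrics, and stabilization windows.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified Kubernetes HPA rule:
    desired = ceil(current * current_metric / target_metric), left unchanged
    when the ratio is within `tolerance` of 1.0, then clamped to
    [min_replicas, max_replicas]. Readiness, missing metrics and
    stabilization windows are deliberately ignored in this sketch."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target scale to 6. A small overshoot (63% against 60%) falls inside the tolerance and changes nothing. Tolerance and clamping interact with load balancing, which is one reason scaling-related errors can be hard to trace.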
Submitted 16 January, 2019;
originally announced January 2019.
-
Merging the Astrophysics and Planetary Science Information Systems
Authors:
Michael J. Kurtz,
Alberto Accomazzi,
Edwin A. Henneken
Abstract:
Conceptually, exoplanet research has one foot in the discipline of Astrophysics and the other foot in Planetary Science. Research strategies for exoplanets will require efficient access to data and information from both realms. Astrophysics has a sophisticated, well-integrated, distributed information system with archives and data centers which are interlinked with the technical literature via the Astrophysics Data System (ADS). The information system for Planetary Science does not have a central component linking the literature with the observational and theoretical data. Here we propose that the Committee on an Exoplanet Science Strategy recommend that this linkage be built, with the ADS playing the role in Planetary Science which it already plays in Astrophysics. This will require additional resources for the ADS, the Planetary Data System (PDS), and other international collaborators.
Submitted 9 March, 2018;
originally announced March 2018.
-
The Unified Astronomy Thesaurus: Semantic Metadata for Astronomy and Astrophysics
Authors:
Katie Frey,
Alberto Accomazzi
Abstract:
Several different controlled vocabularies have been developed and used by the astronomical community, each designed to serve a specific need and a specific group. The Unified Astronomy Thesaurus (UAT) attempts to provide a highly structured controlled vocabulary that will be relevant and useful across the entire discipline, regardless of content or platform. Since two major use cases for the UAT are the classification of articles and of data, we compare the UAT with the Astronomical Subject Keywords used by major publications and with the JWST Science Keywords used by STScI's Astronomer's Proposal Tool.
Submitted 3 January, 2018;
originally announced January 2018.
-
New ADS Functionality for the Curator
Authors:
Alberto Accomazzi,
Michael J. Kurtz,
Edwin A. Henneken,
Carolyn S. Grant,
Donna M. Thompson,
Roman Chyla,
Steven McDonald,
Taylor J. Shaulis,
Sergi Blanco-Cuaresma,
Golnaz Shapurian,
Timothy W. Hostetler,
Matthew R. Templeton
Abstract:
In this paper we provide an update concerning the operations of the NASA Astrophysics Data System (ADS), its services and user interface, and the content currently indexed in its database. As the primary information system used by researchers in Astronomy, the ADS aims to provide a comprehensive index of all scholarly resources appearing in the literature. With the current effort in our community to support data and software citations, we discuss what steps the ADS is taking to provide the needed infrastructure in collaboration with publishers and data providers. A new API provides access to the ADS search interface, metrics, and libraries allowing users to programmatically automate discovery and curation tasks. The new ADS interface supports a greater integration of content and services with a variety of partners, including ORCID claiming, indexing of SIMBAD objects, and article graphics from a variety of publishers. Finally, we highlight how librarians can facilitate the ingest of gray literature that they curate into our system.
Submitted 23 October, 2017;
originally announced October 2017.
-
NASA's Long-Term Astrophysics Data Archives
Authors:
L. M. Rebull,
V. Desai,
H. Teplitz,
S. Groom,
R. Akeson,
G. B. Berriman,
G. Helou,
D. Imel,
J. M. Mazzarella,
A. Accomazzi,
T. McGlynn,
A. Smale,
R. White
Abstract:
NASA regards data handling and archiving as an integral part of space missions, and has a strong track record of serving astrophysics data to the public, beginning with the IRAS satellite in 1983. Archives enable a major science return on the significant investment required to develop a space mission. In fact, the presence and accessibility of an archive can more than double the number of papers resulting from the data. In order for the community to be able to use the data, they have to be able to find the data (ease of access) and interpret the data (ease of use). Funding of archival research (e.g., the ADAP program) is also important not only for making scientific progress, but also for encouraging authors to deliver data products back to the archives to be used in future studies. NASA has also enabled a robust system that can be maintained over the long term, through technical innovation and careful attention to resource allocation. This article provides a brief overview of some of NASA's major astrophysics archive systems, including IRSA, MAST, HEASARC, KOA, NED, the Exoplanet Archive, and ADS.
Submitted 27 September, 2017;
originally announced September 2017.
-
Aggregation and Linking of Observational Metadata in the ADS
Authors:
Alberto Accomazzi,
Michael J. Kurtz,
Edwin A. Henneken,
Carolyn S. Grant,
Donna M. Thompson,
Roman Chyla,
Alexandra Holachek,
Jonathan Elliott
Abstract:
We discuss current efforts behind the curation of observing proposals, archive bibliographies, and data links in the NASA Astrophysics Data System (ADS). The primary data in the ADS is the bibliographic content from scholarly articles in Astronomy and Physics, which ADS aggregates from publishers, arXiv and conference proceeding sites. This core bibliographic information is then further enriched by ADS via the generation of citations and usage data, and through the aggregation of external resources from astronomy data archives and libraries. Important sources of such additional information are the metadata describing observing proposals and high level data products, which, once ingested in ADS, become easily discoverable and citeable by the science community. Bibliographic studies have shown that the integration of links between data archives and the ADS provides greater visibility to data products and increased citations to the literature associated with them.
Submitted 28 January, 2016;
originally announced January 2016.
-
ADS: The Next Generation Search Platform
Authors:
Alberto Accomazzi,
Michael J. Kurtz,
Edwin A. Henneken,
Roman Chyla,
James Luker,
Carolyn S. Grant,
Donna M. Thompson,
Alexandra Holachek,
Rahul Dave,
Stephen S. Murray
Abstract:
Four years after the last LISA meeting, the NASA Astrophysics Data System (ADS) finds itself in the middle of major changes to the infrastructure and contents of its database. In this paper we highlight a number of features of great importance to librarians and discuss the additional functionality that we are currently developing. In 2011 the ADS began to systematically collect, parse and index full-text documents for all the major publications in Physics and Astronomy as well as many smaller Astronomy journals and arXiv e-prints, for a total of over 3.5 million papers. Our citation coverage has doubled since 2010 and now consists of over 70 million citations. We are normalizing the affiliation information in our records and, in collaboration with the CfA library and NASA, we have started collecting and linking funding sources with papers in our system. At the same time, we are undergoing major technology changes in the ADS platform which affect all aspects of the system and its operations. We have rolled out and are now enhancing a new high-performance search engine capable of performing full-text as well as metadata searches using an intuitive query language which supports fielded, unfielded and functional searches. We are currently able to index acknowledgments, affiliations, citations and funding sources; to the extent that these metadata are available to us, they are now searchable under our new platform. The ADS private library system is being enhanced to support reading groups, collaborative editing of lists of papers, tagging, and a variety of privacy settings when managing one's paper collection. While this effort is still ongoing, some of its benefits are already available through the ADS Labs user interface and API at http://adslabs.org/adsabs/
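To make the distinction between fielded and functional searches concrete, here is a minimal sketch that only builds a query string and request URL. It assumes the syntax and endpoint of the current public ADS search API (`https://api.adsabs.harvard.edu/v1/search/query`, with `q`, `fl` and `rows` parameters), not the 2015 ADS Labs URL quoted above; the helper names `fielded`, `functional` and `build_search_url` are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the current public ADS API (not the ADS Labs URL above).
ADS_SEARCH_URL = "https://api.adsabs.harvard.edu/v1/search/query"


def fielded(field: str, value: str) -> str:
    """Hypothetical helper: a fielded clause such as author:"Accomazzi, A".

    Quoting keeps multi-word values together as a single phrase.
    """
    return f'{field}:"{value}"'


def functional(operator: str, inner: str) -> str:
    """Hypothetical helper: wrap a sub-query in an operator such as citations(...)."""
    return f"{operator}({inner})"


def build_search_url(query: str, fields: list[str], rows: int = 10) -> str:
    """Build (but do not send) a GET URL for the assumed search endpoint."""
    params = {"q": query, "fl": ",".join(fields), "rows": rows}
    return f"{ADS_SEARCH_URL}?{urlencode(params)}"


# Functional search wrapping a fielded one: papers that cite
# any 2015 paper by the given author.
q = functional("citations", fielded("author", "Accomazzi, A") + " year:2015")
url = build_search_url(q, ["bibcode", "title"], rows=5)
print(url)
```

Actually sending the request requires a personal API token in an `Authorization: Bearer` header; keeping URL construction separate from the network call makes the query logic easy to inspect and test.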
Submitted 13 March, 2015;
originally announced March 2015.
-
Computing and Using Metrics in the ADS
Authors:
Edwin A. Henneken,
Alberto Accomazzi,
Michael J. Kurtz,
Carolyn S. Grant,
Donna Thompson,
Jay Luker,
Roman Chyla,
Alexandra Holachek,
Stephen S. Murray
Abstract:
Finding measures for research impact, be it for individuals, institutions, instruments or projects, has become increasingly popular. More papers than ever are being written on new impact measures, and problems with existing measures are being pointed out on a regular basis. Funding agencies require impact statistics in their reports, job candidates incorporate them in their resumes, and publication metrics have even been used in at least one recent court case. To support this need for research impact indicators, the SAO/NASA Astrophysics Data System (ADS) has developed a service which provides a broad overview of various impact measures. In this presentation we discuss how the ADS can be used to quench the thirst for impact measures. We will also discuss a couple of the lesser-known indicators in the metrics overview and the main issues to be aware of when compiling publication-based metrics in the ADS, namely author name ambiguity and citation incompleteness.
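As an illustration of the kind of indicators such an overview reports, the sketch below computes three common ones from a list of per-paper citation counts using their standard textbook definitions: the h-index, the g-index (here capped at the number of papers) and the i10-index. It is not the ADS implementation, and the citation counts are hypothetical. Note that both issues named above act on the input: name ambiguity changes which papers appear in the list, and incompleteness lowers the counts.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations: list[int]) -> int:
    """Largest g such that the top g papers have at least g**2 citations in total.

    This variant never exceeds the number of papers in the list.
    """
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


def i10_index(citations: list[int]) -> int:
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical citation counts for one author's eight papers.
counts = [25, 18, 12, 10, 6, 3, 1, 0]
print(h_index(counts), g_index(counts), i10_index(counts))  # 5 8 4
```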
Submitted 17 June, 2014;
originally announced June 2014.
-
The Unified Astronomy Thesaurus
Authors:
Alberto Accomazzi,
Norman Gray,
Chris Erdmann,
Chris Biemesderfer,
Katie Frey,
Justin Soles
Abstract:
The Unified Astronomy Thesaurus (UAT) is an open, interoperable and community-supported thesaurus which unifies the existing divergent and isolated Astronomy & Astrophysics vocabularies into a single high-quality, freely-available open thesaurus formalizing astronomical concepts and their inter-relationships. The UAT builds upon the existing IAU Thesaurus with major contributions from the astronomy portions of the thesauri developed by the Institute of Physics Publishing, the American Institute of Physics, and SPIE. We describe the effort behind the creation of the UAT and the process through which we plan to keep the document up to date through broad community participation.
Submitted 26 March, 2014;
originally announced March 2014.
-
Astronomy and Computing: a New Journal for the Astronomical Computing Community
Authors:
Alberto Accomazzi,
Tamás Budavári,
Christopher Fluke,
Norman Gray,
Robert G Mann,
William O'Mullane,
Andreas Wicenec,
Michael Wise
Abstract:
We introduce Astronomy and Computing, a new journal for the growing population of people working in the domain where astronomy overlaps with computer science and information technology. The journal aims to provide a new communication channel within that community, which is not well served by current journals, and to help secure recognition of its true importance within modern astronomy. In this inaugural editorial, we describe the rationale for creating the journal, outline its scope and ambitions, and seek input from the community in defining in detail how the journal should work towards its high-level goals.
Submitted 30 October, 2012;
originally announced October 2012.
-
Telescope Bibliographies: an Essential Component of Archival Data Management and Operations
Authors:
Alberto Accomazzi,
Edwin Henneken,
Christopher Erdmann,
Arnold Rots
Abstract:
Assessing the impact of astronomical facilities rests upon an evaluation of the scientific discoveries which their data have enabled. Telescope bibliographies, which link data products with the literature, provide a way to use bibliometrics as an impact measure for the underlying data. In this paper we argue that the creation and maintenance of telescope bibliographies should be considered an integral part of an observatory's operations. We review the existing tools, services, and workflows which support these curation activities, giving an estimate of the effort and expertise required to maintain an archive-based telescope bibliography.
Submitted 30 July, 2012; v1 submitted 27 June, 2012;
originally announced June 2012.
-
Why don't we already have an Integrated Framework for the Publication and Preservation of all Data Products?
Authors:
Alberto Accomazzi,
Sebastien Derriere,
Chris Biemesderfer,
Norman Gray
Abstract:
Astronomy has long had a working network of archives supporting the curation of publications and data. The discipline has already created many of the features which perplex other areas of science: (1) data repositories: (supra)national institutes, dedicated to large projects; a culture of user-contributed data; practical experience of long-term data preservation; (2) dataset identifiers: the community has already piloted experiments, knows what can undermine these efforts, and is participating in the development of next-generation standards; (3) citation of datasets in papers: the community has an innovative and expanding infrastructure for the curation of data and bibliographic resources, and through them a community of authors and editors familiar with such electronic publication efforts; it has also experimented with next-generation web standards (e.g. the Semantic Web); (4) publisher buy-in: publishers in this area have been willing to innovate within the constraints of their commercial imperatives. What can possibly be missing? Why don't we have an integrated framework for the publication and preservation of all data products already? Are there technical barriers? We don't believe so. Are there cultural or commercial forces inhibiting this? We aren't aware of any. This Birds of a Feather session (BoF) attempted to identify existing barriers to the creation of such a framework, and attempted to identify the parties or groups which can contribute to the creation of a VO-powered data-publishing framework.
Submitted 7 December, 2011;
originally announced December 2011.
-
Linking to Data - Effect on Citation Rates in Astronomy
Authors:
Edwin A. Henneken,
Alberto Accomazzi
Abstract:
Is there a difference in citation rates between articles that were published with links to data and articles that were not? Besides being interesting from a purely academic point of view, this question is also highly relevant for the process of furthering science. Data sharing helps not only the verification of claims, but also the discovery of new findings in archival data. However, linking to data is still far from being established practice, especially when it comes to authors providing these links during the writing and submission process. Creating such a practice requires both willingness on the part of authors and a publication mechanism. Showing that articles with links to data get higher citation rates might increase the willingness of scientists to take the extra steps of linking data sources to their publications. In this presentation we show that this is indeed the case: articles with links to data result in higher citation rates than articles without such links. The ADS is funded by NASA Grant NNX09AB39G.
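One simple way to set up such a comparison is sketched below: because older papers have had more time to accumulate citations, articles are grouped into publication-year cohorts and the mean citation count of linked articles is divided by that of unlinked articles within each cohort. This is an illustrative design under stated assumptions, not the method used in the presentation, and the records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (publication year, total citations, has a data link).
articles = [
    (2000, 40, True), (2000, 22, False),
    (2002, 30, True), (2002, 18, False),
    (2004, 20, True), (2004, 12, False),
]


def cohort_means(records):
    """Average citation count for each (year, has_link) group."""
    groups = defaultdict(list)
    for year, cites, has_link in records:
        groups[(year, has_link)].append(cites)
    return {key: mean(values) for key, values in groups.items()}


def linked_to_unlinked_ratios(records):
    """Per-year ratio of mean citations (linked / unlinked).

    Only years with both kinds of article are reported, so every ratio
    compares papers of the same age.
    """
    means = cohort_means(records)
    years = sorted({year for year, _ in means})
    return {
        year: means[(year, True)] / means[(year, False)]
        for year in years
        if (year, True) in means and (year, False) in means
    }


ratios = linked_to_unlinked_ratios(articles)
print(ratios)
```

A ratio above 1 in every cohort is the pattern the abstract reports; a real study would also have to account for differences in journal, subfield and article type between the two groups.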
Submitted 15 November, 2011;
originally announced November 2011.
-
The ADS in the Information Age - Impact on Discovery
Authors:
Edwin A. Henneken,
Michael J. Kurtz,
Alberto Accomazzi
Abstract:
The SAO/NASA Astrophysics Data System (ADS) grew up with and has been riding the waves of the Information Age, closely monitoring and anticipating the needs of its end-users. By now, all professional astronomers are using the ADS on a daily basis, and a substantial fraction have been using it for their entire professional career. In addition to being an indispensable tool for professional scientists, the ADS also moved into the public domain, as a tool for science education. In this paper we will highlight and discuss some aspects indicative of the impact the ADS has had on research and the access to scholarly publications.
The ADS is funded by NASA Grant NNX09AB39G.
Submitted 28 June, 2011;
originally announced June 2011.
-
Semantic Interlinking of Resources in the Virtual Observatory Era
Authors:
Alberto Accomazzi,
Rahul Dave
Abstract:
In the coming era of data-intensive science, it will be increasingly important to be able to seamlessly move between scientific results, the data analyzed in them, and the processes used to produce them. As observations, derived data products, publications, and object metadata are curated by different projects and archived in different locations, establishing the proper linkages between these resources and describing their relationships becomes an essential activity in their curation and preservation. In this paper we describe initial efforts to create a semantic knowledge base allowing easier integration and linking of the body of heterogeneous astronomical resources which we call the Virtual Observatory (VO). The ultimate goal of this effort is the creation of a semantic layer over existing resources, allowing applications to cross boundaries between archives. The proposed approach follows the current best practices in Semantic Computing and the architecture of the web, allowing the use of off-the-shelf technologies and providing a path for VO resources to become part of the global web of linked data.
Submitted 30 March, 2011;
originally announced March 2011.
-
Linking Literature and Data: Status Report and Future Efforts
Authors:
Alberto Accomazzi
Abstract:
In the current era of data-intensive science, it is increasingly important for researchers to be able to have access to published results, the supporting data, and the processes used to produce them. Six years ago, recognizing this need, the American Astronomical Society and the Astrophysics Data Centers Executive Committee (ADEC) sponsored an effort to facilitate the annotation and linking of datasets during the publishing process, with limited success. I will review the status of this effort and describe a new, more general one now being considered in the context of the Virtual Astronomical Observatory.
Submitted 22 March, 2011;
originally announced March 2011.
-
Astronomy 3.0 Style
Authors:
Alberto Accomazzi
Abstract:
Over the next decade we will witness the development of a new infrastructure in support of data-intensive scientific research, which includes Astronomy. This new networked environment will offer both challenges and opportunities to our community and has the potential to transform the way data are described, curated and preserved. Based on the lessons learned during the development and management of the ADS, a case is made for adopting the emerging technologies and practices of the Semantic Web to support the way Astronomy research will be conducted. Examples of how small, incremental steps can, in the aggregate, make a significant difference in the provision and repurposing of astronomical data are provided.
Submitted 3 June, 2010;
originally announced June 2010.
-
Towards a Resource-Centric Data Network for Astronomy
Authors:
Alberto Accomazzi,
Michael J. Kurtz,
Stephen S. Murray
Abstract:
Over the past decade, astronomers have been using an increasing number of web-based applications and archives to conduct their research. However, despite the early success in creating links across projects and data centers, the promise of a single integrated digital library environment supporting e-science in astronomy has proven elusive. While some of the issues hampering progress in this area are of a technical nature, others are rooted in existing policies which should be re-analyzed if further rapid progress is to be made in this area. This paper describes a proposal that the NASA Astrophysics Data System project has put forth in order to improve its role as one of the primary discovery portals for astronomers, focusing on those aspects which could benefit from an increased level of involvement from the community, namely the effort to expose astronomy resources as linked data, and the harvesting of observational metadata.
Submitted 11 May, 2010;
originally announced May 2010.
-
Using Multipartite Graphs for Recommendation and Discovery
Authors:
Michael J. Kurtz,
Alberto Accomazzi,
Edwin Henneken,
Giovanni Di Milia,
Carolyn S. Grant
Abstract:
The Smithsonian/NASA Astrophysics Data System exists at the nexus of a dense system of interacting and interlinked information networks. The syntactic and the semantic content of this multipartite graph structure can be combined to provide very specific research recommendations to the scientist/user.
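A very small example of how a recommendation can fall out of one layer of such a graph: treating reads as a bipartite reader/paper graph, a two-step walk from a seed paper (seed paper, then its readers, then the other papers those readers read) ranks candidates by how many readers they share with the seed. This is a generic "also read" heuristic shown as an assumption, not the combined syntactic and semantic method of the paper, and the read log is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical reader -> papers-read log (one layer of a multipartite graph).
reads = {
    "u1": {"P1", "P2", "P3"},
    "u2": {"P1", "P3"},
    "u3": {"P1", "P4"},
    "u4": {"P2", "P4"},
}


def invert(edges):
    """Build the paper -> readers side of the bipartite reader/paper graph."""
    readers = defaultdict(set)
    for user, papers in edges.items():
        for paper in papers:
            readers[paper].add(user)
    return readers


def also_read(seed, edges, top_n=3):
    """Rank papers by how many readers of `seed` also read them."""
    readers = invert(edges)
    scores = Counter()
    for user in readers.get(seed, ()):
        for paper in edges[user]:
            if paper != seed:
                scores[paper] += 1
    return scores.most_common(top_n)


print(also_read("P1", reads))
```

Here P3 ranks first because two of the three readers of P1 also read it; ties (P2 and P4) come back in arbitrary order, so a production system would add a tie-breaker such as recency or citation count.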
Submitted 30 December, 2009;
originally announced December 2009.
-
The Smithsonian/NASA Astrophysics Data System (ADS) Decennial Report
Authors:
Michael J. Kurtz,
Alberto Accomazzi,
Stephen S. Murray
Abstract:
Eight years after the ADS first appeared, the last decadal survey wrote: "NASA's initiative for the Astrophysics Data System has vastly increased the accessibility of the scientific literature for astronomers. NASA deserves credit for this valuable initiative and is urged to continue it." Here we summarize some of the changes concerning the ADS which have occurred in the past ten years, and we describe the current status of the ADS. We then point out two areas where the ADS is building an improved capability which could benefit from a policy statement of support in the ASTRO2010 report. These are: The Semantic Interlinking of Astronomy Observations and Datasets and The Indexing of the Full Text of Astronomy Research Publications.
Submitted 18 March, 2009;
originally announced March 2009.
-
Use of Astronomical Literature - A Report on Usage Patterns
Authors:
Edwin A. Henneken,
Michael J. Kurtz,
Alberto Accomazzi,
Carolyn S. Grant,
Donna Thompson,
Elizabeth Bohlen,
Stephen S. Murray
Abstract:
In this paper we present a number of metrics for usage of the SAO/NASA Astrophysics Data System (ADS). Since the ADS is used by the entire astronomical community, these are indicative of how the astronomical literature is used. We will show how the use of the ADS has changed both quantitatively and qualitatively. We will also show that different types of users access the system in different ways. Finally, we show how use of the ADS has evolved over the years in various regions of the world.
The ADS is funded by NASA Grant NNG06GG68G.
Submitted 3 October, 2008; v1 submitted 1 August, 2008;
originally announced August 2008.
-
Finding Astronomical Communities Through Co-readership Analysis
Authors:
Edwin A. Henneken,
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn S. Grant,
Donna Thompson,
Elizabeth Bohlen,
Stephen S. Murray
Abstract:
Whenever a large group of people are engaged in an activity, communities will form. The nature of these communities depends on the relationship considered. In the group of people who regularly use scholarly literature, a relationship like "person i and person j have cited the same paper" might reveal communities of people working in a particular field. On this poster, we will investigate the relationship "person i and person j have read the same paper". Using the data logs of the NASA/Smithsonian Astrophysics Data System (ADS), we first determine the population that will participate by requiring that a user queries the ADS at a certain rate. Next, we apply the relationship to this population. The result of this will be an abstract "relationship space", which we will describe in terms of various "representations". Examples of such "representations" are the projection of co-read vectors onto Principal Components and the spectral density of the co-read network. We will show that the co-read relationship results in structure; we will describe this structure and provide a first attempt at classifying it in terms of astronomical communities.
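The first two steps described above (restricting the population by query rate, then applying the co-read relationship) can be sketched as follows. The log summary, the threshold value `MIN_QUERIES_PER_MONTH` and the per-month unit are hypothetical; the result is the symmetric co-read matrix on which representations such as a Principal Component projection or a spectral density could then be computed with a linear-algebra library (not shown).

```python
from itertools import combinations

# Hypothetical log summary: user -> (queries per month, set of papers read).
log = {
    "a": (40, {"P1", "P2", "P3"}),
    "b": (25, {"P2", "P3"}),
    "c": (30, {"P3", "P4"}),
    "d": (2, {"P1", "P2"}),  # below the activity threshold, excluded
}

MIN_QUERIES_PER_MONTH = 10  # hypothetical cut-off for the active population


def active_population(log, threshold=MIN_QUERIES_PER_MONTH):
    """Keep only users who query at least `threshold` times per month."""
    return {user: papers for user, (rate, papers) in log.items() if rate >= threshold}


def co_read_matrix(population):
    """Symmetric matrix whose (i, j) entry counts papers read by both i and j.

    The diagonal is left at zero, so only relationships between
    distinct people are recorded.
    """
    users = sorted(population)
    index = {user: k for k, user in enumerate(users)}
    matrix = [[0] * len(users) for _ in users]
    for u, v in combinations(users, 2):
        shared = len(population[u] & population[v])
        matrix[index[u]][index[v]] = matrix[index[v]][index[u]] = shared
    return users, matrix


users, m = co_read_matrix(active_population(log))
print(users, m)
```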
The ADS is funded by NASA Grant NNG06GG68G.
Submitted 5 January, 2007;
originally announced January 2007.
-
Closing the loop: Linking Datasets to Publications and Back
Authors:
Alberto Accomazzi,
Guenther Eichhorn,
Arnold Rots
Abstract:
With the mainstream adoption of references to datasets in astronomical manuscripts, researchers today are able to provide direct links from their papers to the original data that were used in their study. Following a process similar to the verification of references in manuscripts, publishers have been working with the NASA Astrophysics Data System (ADS) to validate and maintain links to these datasets.
Similarly, many astronomical data centers have been tracking publications based on the observations that they archive, and have been working with the ADS to maintain links between their datasets and the bibliographic records in question. In addition to providing a valuable service to ADS users, maintaining these correlations allows the data centers to evaluate the scientific impact of their missions.
Until recently, these two activities evolved in parallel on independent tracks, with the ADS playing a central role in bridging publishers and data centers. However, the ADS is now implementing the capability for all parties involved to find out which data links have been published with which manuscripts, and vice versa. This will allow data centers to periodically harvest the ADS to find out whether there are new papers that reference datasets available in their archives. In this paper we summarize the state of the dataset linking project and describe the new harvesting interface.
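A data center's periodic harvest could, in principle, reduce to filtering published paper-to-dataset links by archive and by date. The sketch below assumes a hypothetical `DataLink` record and a hypothetical archive prefix on dataset identifiers; it does not describe the actual ADS harvesting interface.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataLink:
    """Hypothetical record: one link published between a paper and a dataset."""
    bibcode: str      # identifier of the paper
    dataset_id: str   # identifier of the dataset, prefixed by its archive
    published: date   # when the link became available


def harvest_new_links(links, archive_prefix, since):
    """Return links to this archive's datasets that appeared after `since`."""
    return [
        link for link in links
        if link.dataset_id.startswith(archive_prefix) and link.published > since
    ]


links = [
    DataLink("paper1", "archiveA/obs1", date(2006, 10, 1)),
    DataLink("paper2", "archiveB/obs7", date(2006, 10, 2)),
    DataLink("paper3", "archiveA/obs9", date(2006, 8, 1)),
]
new = harvest_new_links(links, "archiveA/", since=date(2006, 9, 1))
print([link.bibcode for link in new])  # ['paper1']
```

In this sketch a data center would record the date of its last harvest and pass it as `since` on the next run.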
Submitted 17 November, 2006;
originally announced November 2006.
-
Creation and use of Citations in the ADS
Authors:
Alberto Accomazzi,
Gunther Eichhorn,
Michael J. Kurtz,
Carolyn S. Grant,
Edwin Henneken,
Markus Demleitner,
Donna Thompson,
Elizabeth Bohlen,
Stephen S. Murray
Abstract:
With over 20 million records, the ADS citation database is regularly used by researchers and librarians to measure the scientific impact of individuals, groups, and institutions. In addition to the traditional sources of citations, the ADS has recently added references extracted from the arXiv e-prints on a nightly basis. We review the procedures used to harvest and identify the reference data used to create citations, as well as the policies and procedures we follow to avoid double-counting and to eliminate contributions that may not be scholarly in nature. Finally, we describe how users and institutions can easily obtain quantitative citation data from the ADS, both interactively and via web-based programming tools.
The ADS is available at http://ads.harvard.edu.
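One plausible way to avoid double-counting when references arrive both from an e-print and from its published version is to map every e-print to its published record and count each (citing, cited) pair only once. The sketch below illustrates that idea with hypothetical identifiers; it is not the ADS's documented policy and does not address filtering of non-scholarly contributions.

```python
from collections import Counter


def citation_counts(reference_lists, eprint_to_journal):
    """Count citations per cited record without double-counting.

    reference_lists: iterable of (citing_id, [cited_id, ...]) gathered from any
        source, e.g. publisher reference lists and references extracted from
        e-prints. Identifiers here are hypothetical.
    eprint_to_journal: maps an e-print identifier to the identifier of its
        published version, so both versions collapse to a single record.
    """
    edges = set()  # unique (citing, cited) pairs
    for citing, cited_list in reference_lists:
        citing = eprint_to_journal.get(citing, citing)
        for cited in cited_list:
            cited = eprint_to_journal.get(cited, cited)
            if cited != citing:  # skip links that now point a record at itself
                edges.add((citing, cited))
    return Counter(cited for _, cited in edges)


refs = [
    ("eprint:1", ["J:A", "J:B"]),  # references from the e-print version
    ("J:X", ["J:A", "eprint:2"]),  # publisher references for the same paper
    ("J:Y", ["J:A"]),
]
mapping = {"eprint:1": "J:X", "eprint:2": "J:B"}
counts = citation_counts(refs, mapping)
print(counts["J:A"], counts["J:B"])  # 2 1
```

Here "eprint:1" and "J:X" are the same paper, so its two reference lists contribute one citation each to "J:A" and "J:B" rather than two.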
Submitted 3 October, 2006;
originally announced October 2006.
-
Connectivity in the Astronomy Digital Library
Authors:
Günther Eichhorn,
Alberto Accomazzi,
Carolyn S. Grant,
Edwin A. Henneken,
Donna M. Thompson,
Michael J. Kurtz,
Stephen S. Murray
Abstract:
The Astrophysics Data System (ADS) provides an extensive system of links between the literature and other on-line information. Recently, the journals of the American Astronomical Society (AAS) and a group of NASA data centers have collaborated to provide more links between on-line data obtained by space missions and the on-line journals. Authors can now specify which data sets they have used in their article. This information is used by the participants to provide the links between the literature and the data.
The ADS is available at: http://ads.harvard.edu
Submitted 2 October, 2006;
originally announced October 2006.
-
Full Text Searching in the Astrophysics Data System
Authors:
Günther Eichhorn,
Alberto Accomazzi,
Carolyn S. Grant,
Edwin A. Henneken,
Donna M. Thompson,
Michael J. Kurtz,
Stephen S. Murray
Abstract:
The Smithsonian/NASA Astrophysics Data System (ADS) provides a search system for the astronomy and physics scholarly literature. All major and many smaller astronomy journals that were published on paper have been scanned back to volume 1 and are available through the ADS free of charge. All scanned pages have been converted to text and can be searched through the ADS Full Text Search System. In addition, searches can be fanned out to several external search systems to include the literature published in electronic form. Results from the different search systems are combined into one results list.
The ADS Full Text Search System is available at: http://adsabs.harvard.edu/fulltext_service.html
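Fanning a query out to several search systems and combining the answers into one list can be sketched as below. The backends are hypothetical callables standing in for the internal and external systems, and merging by keeping each record's highest score is one simple policy chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out_search(query, backends):
    """Send `query` to every backend concurrently and merge the hits.

    backends: mapping of name -> callable(query) returning a list of
        (record_id, score) tuples; these callables are hypothetical
        stand-ins for the internal and external search systems.
    A record returned by several backends keeps its highest score; the
    merged list is sorted by descending score, then by record_id.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        futures = [pool.submit(search, query) for search in backends.values()]
        results = [future.result() for future in futures]
    best = {}
    for hits in results:
        for record_id, score in hits:
            if score > best.get(record_id, float("-inf")):
                best[record_id] = score
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


backends = {
    "scanned_fulltext": lambda q: [("a", 0.9), ("b", 0.5)],
    "publisher_site": lambda q: [("b", 0.7), ("c", 0.6)],
}
print(fan_out_search("dark matter", backends))
# [('a', 0.9), ('b', 0.7), ('c', 0.6)]
```

Comparing raw scores across systems assumes they share a scale; real external systems may need their scores normalized, or their results interleaved by rank, before merging.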
Submitted 5 October, 2006; v1 submitted 2 October, 2006;
originally announced October 2006.
-
E-prints and Journal Articles in Astronomy: a Productive Co-existence
Authors:
Edwin A. Henneken,
Michael J. Kurtz,
Simeon Warner,
Paul Ginsparg,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn S. Grant,
Donna Thompson,
Elizabeth Bohlen,
Stephen S. Murray
Abstract:
Are the e-prints (electronic preprints) from the arXiv repository being used instead of the journal articles? In this paper we show that the e-prints have not undermined the usage of journal papers in the astrophysics community. As soon as the journal article is published, the astronomical community prefers to read the journal article, and the use of e-prints through the NASA Astrophysics Data System drops to zero. This suggests that the majority of astronomers have access to institutional subscriptions and that they choose to read the journal article when given the choice. Within the NASA Astrophysics Data System they are given this choice, because the e-print and the journal article are treated equally: both are just one click away. In other words, the e-prints have not undermined journal use in the astrophysics community and thus currently do not pose a financial threat to the publishers. We present readership data for the arXiv category "astro-ph" and the 4 core journals in astronomy (Astrophysical Journal, Astronomical Journal, Monthly Notices of the Royal Astronomical Society, and Astronomy & Astrophysics). Furthermore, we show that the half-life (the point where the use of an article drops to half the use of a newly published article) is shorter for an e-print than for a journal paper.
The ADS is funded by NASA Grant NNG06GG68G. arXiv receives funding from NSF award #0404553.
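The half-life defined above can be read off a usage time series as the first period in which reads fall to half of the newly published level. The sketch below uses a discrete, per-period series without interpolation; the series are toy numbers for illustration, not ADS data.

```python
def half_life(usage):
    """Return the first period index at which usage falls to half (or less)
    of the usage in period 0, or None if that never happens in the series.

    usage: reads per period (e.g. per month) since publication, with
        index 0 being the newly published article.
    """
    if not usage:
        return None
    threshold = usage[0] / 2
    for period, reads in enumerate(usage):
        if reads <= threshold:
            return period
    return None


eprint_reads = [100, 40, 20, 10]    # toy series, not ADS data
journal_reads = [100, 80, 55, 50]   # toy series, not ADS data
print(half_life(eprint_reads), half_life(journal_reads))  # 1 3
```

A finer estimate could interpolate between the two periods that bracket the threshold.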
Submitted 22 September, 2006;
originally announced September 2006.
-
The Future of Technical Libraries
Authors:
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn Grant,
Edwin Henneken,
Donna Thompson,
Elizabeth Bohlen,
Stephen S. Murray
Abstract:
Technical libraries are currently experiencing very rapid change. In the near future their mission will change, their physical nature will change, and the skills of their employees will change. While some will not be able to make these changes, and will fail, others will lead us into a new era.
Submitted 28 September, 2006;
originally announced September 2006.
-
myADS-arXiv - a Tailor-Made, Open Access, Virtual Journal
Authors:
E. Henneken,
M. J. Kurtz,
G. Eichhorn,
A. Accomazzi,
C. S. Grant,
D. Thompson,
E. Bohlen,
S. S. Murray
Abstract:
The myADS-arXiv service provides the scientific community with a one-stop shop for staying up-to-date with a researcher's field of interest. The service provides a powerful and unique filter on the enormous amount of bibliographic information added to the ADS on a daily basis. It also provides a complete view of the most relevant papers available in the subscriber's field of interest. With this service, the subscriber will get to know the latest developments, popular trends, and the most important papers. This makes the service unique not only from a technical point of view, but also from a content point of view. On this poster we will argue why myADS-arXiv is a tailor-made, open access, virtual journal and we will illustrate its unique character.
Submitted 4 August, 2006;
originally announced August 2006.
-
Effect of E-printing on Citation Rates in Astronomy and Physics
Authors:
Edwin A. Henneken,
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn Grant,
Donna Thompson,
Stephen S. Murray
Abstract:
In this report we examine the change in citation behavior since the introduction of the arXiv e-print repository (Ginsparg, 2001). It has been observed that papers that initially appear as arXiv e-prints get cited more than papers that do not (Lawrence, 2001; Brody et al., 2004; Schwarz & Kennicutt, 2004; Kurtz et al., 2005a; Metcalfe, 2005). Using the citation statistics from the NASA-Smithsonian Astrophysics Data System (ADS; Kurtz et al., 1993, 2000), we confirm the findings from other studies, we examine the average citation rate to e-printed papers in the Astrophysical Journal, and we show that for a number of major astronomy and physics journals the most important papers are submitted to the arXiv e-print repository first.
Submitted 5 June, 2006; v1 submitted 13 April, 2006;
originally announced April 2006.
-
Intelligent Information Retrieval
Authors:
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn Grant,
Edwin Henneken,
Stephen S. Murray
Abstract:
Since it was first announced at ADASS 2, the Smithsonian/NASA Astrophysics Data System (ADS) Abstract Service has played a central role in the information-seeking behavior of astronomers. Central to the ability of the ADS to act as a search and discovery tool is its role as a metadata aggregator. Over the past 13 years the ADS has introduced many new techniques to facilitate information retrieval, broadly defined. We discuss some of these developments, with particular attention to how the ADS might interact with the virtual observatory, and to the new myADS-arXiv customized open access virtual journal.
The ADS is at http://ads.harvard.edu
Submitted 31 October, 2005;
originally announced October 2005.
-
The NASA Astrophysics Data System: Architecture
Authors:
A. Accomazzi,
G. Eichhorn,
M. J. Kurtz,
C. S. Grant,
S. S. Murray
Abstract:
The powerful discovery capabilities available in the ADS bibliographic services are possible thanks to the design of a flexible search and retrieval system based on a relational database model. Bibliographic records are stored as a corpus of structured documents containing fielded data and metadata, while discipline-specific knowledge is segregated in a set of files independent of the bibliographic data itself.
The creation and management of links to both internal and external resources associated with each bibliographic record in the database is made possible by representing them as a set of document properties and their attributes.
To improve global access to the ADS data holdings, a number of mirror sites have been created by cloning the database contents and software on a variety of hardware and software platforms.
The procedures used to create and manage the database and its mirrors have been written as a set of scripts that can be run in either an interactive or unsupervised fashion.
The ADS can be accessed at http://adswww.harvard.edu
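Representing links as document properties with attributes can be pictured with a small data structure like the one below. The class names, property names, and attributes are hypothetical and only illustrate the idea; they are not the ADS's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class LinkProperty:
    """One kind of link attached to a record, plus its attributes (hypothetical schema)."""
    kind: str                                       # e.g. "full_text", "data"
    attributes: dict = field(default_factory=dict)  # e.g. {"url": ..., "count": ...}


@dataclass
class BibRecord:
    """A bibliographic record whose links are stored as named properties."""
    identifier: str
    properties: dict = field(default_factory=dict)  # kind -> LinkProperty

    def set_link(self, kind, **attributes):
        """Create or replace the property describing links of this kind."""
        self.properties[kind] = LinkProperty(kind, dict(attributes))

    def has_link(self, kind):
        return kind in self.properties


record = BibRecord("record-0001")
record.set_link("data", url="https://example.org/datasets/1", count=3)
record.set_link("full_text", url="https://example.org/article/1")
print(record.has_link("data"), record.properties["data"].attributes["count"])  # True 3
print(record.has_link("references"))  # False
```

In this sketch, supporting a new kind of resource means adding a new property name rather than changing the record format itself.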
Submitted 4 February, 2000;
originally announced February 2000.
-
The NASA Astrophysics Data System: Overview
Authors:
M. J. Kurtz,
G. Eichhorn,
A. Accomazzi,
C. Grant,
S. S. Murray,
J. M. Watson
Abstract:
The NASA Astrophysics Data System Abstract Service has become a key component of astronomical research. It provides bibliographic information daily, or near daily, to a majority of astronomical researchers worldwide.
We describe the history of the development of the system and its current status.
We show several examples of how to use the ADS, and we show how ADS use has increased as a function of time. Use is currently still increasing exponentially, with the number of queries doubling every 17 months.
Using the ADS logs we make the first detailed model of how scientific journals are read as a function of time since publication.
The impact of the ADS on astronomy can be calculated after making some simple assumptions. We find that the ADS increases the efficiency of astronomical research by 333 Full Time Equivalent (2000 hour) research years per year, and that the value of the early development of the ADS for astronomy, compared with waiting for mature technologies to be adopted, is 2332 FTE research years.
The ADS is available at http://adswww.harvard.edu/.
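To make the 17-month doubling time concrete, the short calculation below converts it into a yearly growth factor and an equivalent continuous monthly rate, assuming pure exponential growth N(t) = N0 · 2^(t/17) with t in months.

```python
import math


def growth_factor(months, doubling_time=17.0):
    """Factor by which query volume grows over `months`, assuming pure
    exponential growth with the given doubling time (17 months per the abstract)."""
    return 2 ** (months / doubling_time)


def monthly_growth_rate(doubling_time=17.0):
    """Equivalent continuous growth rate per month, ln(2) / doubling_time."""
    return math.log(2) / doubling_time


print(round(growth_factor(12), 2))      # 1.63, i.e. about 63% more queries per year
print(round(monthly_growth_rate(), 4))  # 0.0408
```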
Submitted 4 February, 2000;
originally announced February 2000.
-
The NASA Astrophysics Data System: Data Holdings
Authors:
C. Grant,
A. Accomazzi,
G. Eichhorn,
M. J. Kurtz,
S. S. Murray
Abstract:
Since its inception in 1993, the ADS Abstract Service has become an indispensable research tool for astronomers and astrophysicists worldwide. In those seven years, much effort has been directed toward improving both the quantity and the quality of references in the database. From the original database of approximately 160,000 astronomy abstracts, our dataset has grown almost tenfold to approximately 1.5 million references covering astronomy, astrophysics, planetary sciences, physics, optics, and engineering. We collect and standardize data from approximately 200 journals and present the resulting information in a uniform, coherent manner. With the cooperation of journal publishers worldwide, we have been able to place scans of full journal articles on-line back to the first volumes of many astronomical journals, and we are able to link to current versions of articles, abstracts, and datasets for essentially all of the current astronomy literature. The trend toward electronic publishing in the field, the use of electronic submission of abstracts for journal articles and conference proceedings, and the increasingly prominent use of the World Wide Web to disseminate information have enabled the ADS to build a database unparalleled in other disciplines.
The ADS can be accessed at http://adswww.harvard.edu
Submitted 4 February, 2000;
originally announced February 2000.
-
The NASA Astrophysics Data System: The Search Engine and its User Interface
Authors:
G. Eichhorn,
M. J. Kurtz,
A. Accomazzi,
C. S. Grant,
S. S. Murray
Abstract:
The ADS Abstract and Article Services provide access to the astronomical literature through the World Wide Web (WWW). The forms-based user interface provides access to sophisticated searching capabilities that allow our users to find references in the fields of Astronomy, Physics/Geophysics, and astronomical Instrumentation and Engineering. The returned information includes links to other on-line information sources, creating an extensive astronomical digital library. Other interfaces to the ADS databases provide direct access to the ADS data to allow developers of other data systems to integrate our data into their systems.
The search engine is a custom-built software system that is specifically tailored to search astronomical references. It includes an extensive synonym list that contains discipline-specific knowledge about search term equivalences.
Search request logs show the usage pattern of the various search system capabilities. Access logs show the world-wide distribution of ADS users.
The ADS can be accessed at http://adswww.harvard.edu
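A synonym list of search term equivalences can be applied by expanding each query term into its group of equivalent words before matching. The sketch below uses a tiny illustrative synonym list invented for the example, not the ADS list, and a simple "every term must match one of its equivalents" rule.

```python
def expand_query(terms, synonym_groups):
    """Expand each query term into the set of its equivalent words.

    synonym_groups: list of sets of words treated as equivalent (a toy
        stand-in for a discipline-specific synonym list).
    Returns one lowercase set per input term; terms with no synonyms
    map to a set containing only themselves.
    """
    index = {}
    for group in synonym_groups:
        lowered = {word.lower() for word in group}
        for word in lowered:
            index.setdefault(word, set()).update(lowered)
    return [index.get(term.lower(), {term.lower()}) for term in terms]


def matches(document_words, expanded):
    """True if the document contains at least one equivalent of every term."""
    words = {word.lower() for word in document_words}
    return all(words & alternatives for alternatives in expanded)


groups = [{"supernova", "supernovae", "SN"}, {"galaxy", "galaxies"}]
expanded = expand_query(["SN", "galaxy", "redshift"], groups)
doc = ["Supernovae", "in", "distant", "galaxies", "at", "high", "redshift"]
print(matches(doc, expanded))                       # True
print(matches(["galaxies", "redshift"], expanded))  # False
```

A production system would also need to handle multi-word phrases and context-dependent equivalences, which this sketch ignores.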
Submitted 4 February, 2000;
originally announced February 2000.