Advancing Ontology Development and Use in the Behavioral and Social Sciences

This blog is co-authored with Dr. Janine Simmons, Chief, Individual Behavioral Processes Branch, National Institute on Aging

In this month’s blog, we dive into the case for ontology building and use in the behavioral and social sciences and describe some NIH activities in this area. OBSSR’s Strategic Plan emphasizes the need to “Enhance and promote the research infrastructure, methods, and measures needed to support a more cumulative and integrated approach to behavioral and social sciences research” and calls out ontologies as one way forward. As noted in the January blog, OBSSR’s priorities continue to include building a cumulative knowledge base across the behavioral and social sciences. This includes facilitating better integration of behavioral and social sciences research (BSSR) into biomedical disciplines through the development and use of ontologies.

What is an ontology? At a practical (rather than philosophical) level, an ontology is “… a systematic method for articulating a ‘controlled vocabulary’ of agreed-upon terms and their inter-relationships. It involves three core elements: (1) a controlled vocabulary specifying and defining existing classes; (2) specification of the inter-relationships between classes; and (3) codification in a computer-readable format to enable knowledge generation, organization, reuse, integration, and analysis” (Larsen et al., 2016).
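To make these three elements concrete, here is a minimal Python sketch, assuming a toy in-memory representation: each class gets a stable identifier, a label, a definition, and optional synonyms (the controlled vocabulary); relations are stored as subject-relation-object triples; and the whole structure is serialized to JSON as a stand-in for a computer-readable format. The `EX:` identifiers, the `is_a` relation, and the placeholder definitions are hypothetical. Production ontologies are usually expressed in standards such as OWL or RDF rather than ad hoc JSON.

```python
from dataclasses import asdict, dataclass, field
import json


@dataclass
class OntologyClass:
    """Element 1: one entry in the controlled vocabulary (hypothetical structure)."""
    class_id: str       # stable identifier, e.g. "EX:0001" (placeholder prefix)
    label: str          # preferred term
    definition: str     # agreed-upon definition
    synonyms: list[str] = field(default_factory=list)


@dataclass
class Ontology:
    classes: dict[str, OntologyClass] = field(default_factory=dict)
    # Element 2: inter-relationships stored as (subject_id, relation, object_id) triples
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add_class(self, cls: OntologyClass) -> None:
        self.classes[cls.class_id] = cls

    def relate(self, subject_id: str, relation: str, object_id: str) -> None:
        # Only allow relations between classes that are already defined.
        for cid in (subject_id, object_id):
            if cid not in self.classes:
                raise KeyError(f"undefined class: {cid}")
        self.relations.append((subject_id, relation, object_id))

    def to_json(self) -> str:
        # Element 3: a computer-readable serialization (JSON here; OWL/RDF in practice)
        return json.dumps(
            {
                "classes": [asdict(c) for c in self.classes.values()],
                "relations": [list(r) for r in self.relations],
            },
            indent=2,
        )


onto = Ontology()
onto.add_class(OntologyClass("EX:0001", "behavior", "Placeholder definition agreed on by experts."))
onto.add_class(OntologyClass("EX:0002", "health behavior", "Placeholder definition.",
                             ["health-related behavior"]))
onto.relate("EX:0002", "is_a", "EX:0001")
print(onto.to_json())
```

Refusing to relate undefined classes mirrors the idea that relationships are specified only between terms already in the agreed-upon vocabulary.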

How can ontology development be useful for the behavioral sciences? OBSSR has taken on the multi-year challenge of ontology building for three primary reasons: (1) Rigorous and reproducible behavioral science requires clear and consistent definitions of social and behavioral phenotypes, outcomes, and intervention components to allow effective communication between scientists and across scientific disciplines; (2) Understanding the biological, psychological, and social mechanisms underlying any given behavior requires compatible levels of precision in measurement of these factors; (3) Identifying targets for behavioral interventions requires the ability to measure and describe a wide range of social and behavioral factors and outcomes across disciplines and diseases.

Additionally, the biomedical sciences increasingly use controlled vocabularies, taxonomies, and ontologies to reflect and represent the current state of knowledge. In fact, the National Library of Medicine is home to: the Medical Subject Headings (MeSH) thesaurus, which is used to index, catalog, and search biomedical and health-related information in MEDLINE/PubMed; the U.S. edition of the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT®), a clinical terminology used as the standard for electronic health information exchange; and the Unified Medical Language System (UMLS) Metathesaurus, which integrates over 200 commonly used medical vocabularies. The National Human Genome Research Institute (NHGRI) has been funding the now-indispensable Gene Ontology for 17 years (grant number HG002273).

If we search within these critical biomedical information repositories, how much behavioral and social science research (BSSR) information has been included? How up-to-date are the terms and definitions that are included? The answers, respectively, are shockingly little and surprisingly dated. If we want to move BSSR fields forward and ensure that the behavioral and social sciences are more fully integrated into the larger biomedical enterprise, we must ensure that our terminology is more fully and accurately represented. Not only do the ontologies we build need to be linkable across BSSR domains, but they also need to be integrated with existing biomedical ontologies.

What are the challenges we face? As with all biomedical research, BSSR findings have exploded over the past 25 years, and the challenge of assessing the quality of the evidence is exacerbated by its quantity and complexity. In the behavioral sciences, we also face some relatively distinctive challenges. Both within and across sub-fields, these include professional incentives for investigators to create new theories, the resulting proliferation of separate theories, constructs, and measures, and a significant jingle/jangle problem. That is, constructs with the same name may actually be different (jingle), constructs with different names may actually be the same (jangle), or, most often, the relationships between constructs and terms are difficult to disentangle. For example, in some fields, the terms self-regulation, emotion regulation, and cognitive control may be used interchangeably; in other areas, distinctions between these constructs may be critical.
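The jingle/jangle distinction becomes precise once each use of a term is linked to a construct identifier. The minimal Python sketch below assumes hypothetical data in which the three terms from the example above share one construct in one field (`field_A`) and name three separate constructs in another (`field_B`); the field names and the IDs `C1` through `C4` are invented for illustration. It groups the mappings two ways: a term linked to more than one construct signals jingle, and a construct reached by more than one term signals jangle.

```python
from collections import defaultdict

# Hypothetical data: each (field, term) pair has already been mapped to a construct ID.
# Fields, IDs, and mappings are illustrative only; assigning them is the hard,
# expert-driven part of ontology work.
USAGE = [
    ("field_A", "self-regulation", "C1"),
    ("field_A", "emotion regulation", "C1"),
    ("field_A", "cognitive control", "C1"),
    ("field_B", "self-regulation", "C2"),
    ("field_B", "emotion regulation", "C3"),
    ("field_B", "cognitive control", "C4"),
]


def find_jingle(usage):
    """Jingle: one term that refers to more than one construct."""
    ids_by_term = defaultdict(set)
    for _field, term, construct_id in usage:
        ids_by_term[term].add(construct_id)
    return {term: ids for term, ids in ids_by_term.items() if len(ids) > 1}


def find_jangle(usage):
    """Jangle: one construct that goes by more than one term."""
    terms_by_id = defaultdict(set)
    for _field, term, construct_id in usage:
        terms_by_id[construct_id].add(term)
    return {cid: terms for cid, terms in terms_by_id.items() if len(terms) > 1}


print(find_jingle(USAGE))
print(find_jangle(USAGE))
```

The code only surfaces these patterns; deciding whether two constructs should actually be merged or kept distinct still requires subject matter expert judgment.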

How has OBSSR taken on these challenges so far? In 2016, the BSSR-Coordinating Committee established a Behavioral Ontology Development working group with members from OBSSR, NCI, NIA, NIMH, and NLM. First, with the NIH Office of Portfolio Analysis, we explored the possibility of building taxonomies by applying natural language processing tools to the corpus of BSSR literature using key terminology searches within a given domain (e.g., self-regulation). We have found that this bottom-up, relatively unsupervised approach is useful but cannot, in and of itself, refine and define terminology structures. Subject matter experts (SMEs) must be included at each iteration for the eventual ontology to have real utility for end users (researchers).

Therefore, with the goals of more fully engaging SMEs in the consideration of terminology and potential consensus-building for controlled vocabularies, we began to integrate calls for ontology development into Funding Opportunity Announcements (FOAs). These included FOAs focused on Social Connection and Isolation (PAR-19-373, PAR-19-384), Emotional Well Being (RFA-AT-20-003), and the Science of Behavior Change (SOBC) (RFA-AG-20-024). We have also engaged the National Academies of Sciences, Engineering, and Medicine, along with multiple other sponsors, to complete a Consensus Study on “Accelerating Behavioral Science Through Ontology Development and Use.” Their work began in March 2021, and a full report is expected later this year. We are hopeful that this report will help inform next steps for NIH to consider in moving this area forward.

Additionally, OBSSR has initiated a project to enhance the integration of up-to-date BSSR terms into MeSH, which, as noted above, serves as a critical resource for cumulative and integrated biomedical knowledge extraction and reuse. Through a contract with Lexical Intelligence and with teams of NIH-based SMEs, we have identified and prioritized missing terms, sourced standard definitions and synonyms, and proposed additions to the MeSH database. As of its latest 2021 update, MeSH now includes 261 additional terms/synonyms within the domain of attention, learning, and memory and 46 for social determinants of health (SDoH). Thirty more SDoH terms will be added in 2022 and 2023. Later this year, we plan to request that MeSH add mental health terminology that will align with the National Institute of Mental Health’s Research Domain Criteria (RDoC) initiative, as well as a list of more than 1,000 names for ethnic and racial groups within and outside of the U.S. We are also working on approaches to evaluate the effect of these terminology additions on the quality and quantity of literature retrieval.

These efforts serve as an important foundation for future work and enhanced engagement in ontology development and use in BSSR. Stay tuned for further updates about our progress and the new areas of ontology development we are pursuing.