Secondary data for mental health research: A Primer

Secondary data are a valuable resource for researchers in general, and they offer particular value in the field of mental health. Yet they are often underutilized. Here, we offer a primer on what secondary data are and the value they offer, and we address some misconceptions about their utility.

Secondary data: What is it?

Within the Data & Design Core, our goal is to connect researchers to valuable secondary sources of mental health data and to increase the use of such data sources in depression and mental health research. But what do we mean by secondary data? Secondary data are data collected by someone other than the researcher, often on an ongoing basis, and often with the goal of encouraging broad use by multiple research teams to answer many different questions. They differ from primary data, which are collected directly by the research team to answer a more specific, narrow research question.

While there are several forms of secondary data, our team primarily works with two: large, population-based datasets that are nationally representative and/or longitudinal (such as the Health and Retirement Study, the National Survey on Drug Use and Health, and additional examples on our website), and clinical and administrative data (such as Michigan Medicine’s DataDirect). We often use the terms “secondary data” and “existing data” interchangeably.

Why bother?

Researchers without hands-on experience may overlook secondary data because it can seem too complicated or time-consuming. In fact, using secondary data avoids most of the cost and time involved in primary data collection. It is not unusual for primary data collection to take five years or longer, including time spent securing funding; secondary data projects can often be completed in one to two years or less, depending on scope. Secondary data are especially valuable because:

  • The work of collecting data has already been done! This eliminates several years of work and significant costs from a project’s timeline and budget.
  • Many data sources are easily accessible and downloadable online at no cost.
  • There is a huge breadth and depth of secondary data available for mental health research on a diverse range of topics, including mental health co-morbidities, physical health co-morbidities, social determinants of health, disease prevention, and health across the lifespan. Explore available data by topic using the filters on our website.
  • Most secondary data sources have very robust sample sizes, often in the tens or hundreds of thousands of participants or more.
  • Because many secondary data sources are nationally representative, longitudinal, or both, they allow the researcher to gain insight into national and longitudinal trends, which is often not possible with primary data collection.
  • Working with secondary data does not typically require full IRB review or newly required data sharing plans, reducing start-up time.
  • Due to the reduced cost and time burdens, secondary data offer a lower-risk way to test preliminary hypotheses and identify areas of need for additional research.
  • Secondary data are particularly valuable for trainees and early-career faculty, who often face many obstacles in getting research completed, including limited funding, protected time, bandwidth, research staff, and collaborators.

Dispelling misconceptions

Despite the value that secondary data offers to researchers, it is underutilized, especially in the mental health field, and that may be due in part to some commonly held misconceptions. These might include:

  • Secondary data is too complicated to figure out: Researchers who have not used secondary data before may feel overwhelmed or intimidated by the prospect. While data sources vary in their readiness and ease of use, there are many high-quality sources that have excellent documentation and are very user-friendly. If you have questions about which data sources to use, or about the data cleaning process, we encourage you to contact our team.
  • Secondary data is easy: On the other hand, some may consider using secondary data to be taking the “easy way out” or not “real” research. While secondary data certainly reduces many of the barriers and challenges related to original data collection, it still requires skill and knowledge to use well.
  • You can’t build an academic career without collecting original data: Many researchers would say that you must collect original data in order to have a successful academic career, due to the need to get funding and publish. Some may think that they won’t be able to get funding, publish, or find collaborators when working with secondary data. This is a misconception! There are many funding opportunities available through federal agencies to support secondary data analysis, opening the door to finding collaborators and publishing extensively. There are many examples of prolific researchers who have made significant and innovative advancements in the field of mental health using secondary data.
  • Secondary data isn’t precise enough: Some may hesitate to use secondary data because they don’t think the dataset will have exactly the variables they are looking for, or because they think secondary data is just for “fishing expeditions”. It is true that secondary data analysis limits you to the data that are available, so at times it may require some creativity and flexibility. It can also illuminate the need for additional primary data collection. While there is potential to use secondary data for fishing expeditions, our team avoids this by publicly pre-registering research questions and analysis plans, and we recommend that others do the same.

We hope this article provided a helpful overview for working with secondary data. If you have questions or need help getting started, please contact our team at efdc-datadesign@umich.edu.

This article was inspired by multiple sources, including this article and this article, and a presentation given by Amy Byers, PhD

About the Author

Meghan Seewald, MA is the manager for the Data & Design Core. She oversees the daily operations and programming of the Core, which seeks to increase the use of secondary and existing data in mental health research. The Core provides hands-on staffing for mental health research projects utilizing secondary data, as well as consultation and guidance on secondary data sources and applications. Meghan has over ten years’ experience in research project management and administration.

