Three ways to learn about text and data mining at the University Libraries

January 24, 2022


Text and data mining uses computer algorithms to detect patterns and trends in large volumes of text. This powerful technique can bring new insight to research in the social sciences and humanities.
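To make the idea concrete, here is a minimal sketch in Python of one very simple kind of pattern detection: counting how often a word appears in documents from different years. The four-sentence corpus and the function names (`tokenize`, `term_counts_by_year`) are invented for illustration and do not come from any Library resource. Real projects typically work with far larger collections and more sophisticated methods.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: (year, text) pairs standing in for a real collection.
documents = [
    (1980, "Inflation worries dominate the budget debate."),
    (1980, "Congress debates inflation and energy prices."),
    (2020, "Pandemic relief dominates the budget debate."),
    (2020, "Congress debates pandemic aid and vaccine funding."),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts_by_year(docs):
    """Return a dict mapping each year to a Counter of word frequencies."""
    counts = {}
    for year, text in docs:
        counts.setdefault(year, Counter()).update(tokenize(text))
    return counts


counts = term_counts_by_year(documents)
for year in sorted(counts):
    # A Counter returns 0 for words it has never seen, so missing terms are safe.
    print(year, "inflation:", counts[year]["inflation"],
          "pandemic:", counts[year]["pandemic"])
```

Comparing the counts across years shows, in miniature, how a shift in vocabulary can point to a trend in the underlying text.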

The University Libraries is your source of guidance, assistance and content for text and data mining. Here are three ways we can partner with you:

  1. The Library’s text and data mining guide is a good way to get started. The guide describes best practices for obtaining clean data. It provides information about licensing and vendor requirements, and it includes contact information for Library experts.
  2. Many data sources are available through the University Libraries. One of the newest additions is the full corpus of Washington Post daily editions from 1977 through 2021. (ONYEN required for off-campus access.) Other available sources include Congressional hearings and the Congressional Record, movie texts and even the transcriptions of soap operas.
  3. Library-sponsored short courses can help you learn the coding and programs you need to conduct text and data mining research. Individual courses are available throughout the year. A workshop dedicated to text and data mining is being developed for fall 2022.

Still have questions? The University Libraries team of experts is here to work with you on all aspects of your learning and research – from understanding how text and data mining can help you, to identifying and obtaining the source data you need. The subject liaison librarian for your discipline is a great place to start, or you can contact Michele Hayslett, librarian for numeric data services and data management (michele_hayslett@unc.edu).
