At the bus station in Durham, North Carolina, with segregated waiting rooms. May 1940. By Jack Delano. Public Domain.
How do we identify racist language in legal documents? Instead of proliferating racist ideas, can algorithms help us better study the history of race and advocate for justice? An interdisciplinary team of UNC researchers, scholars, and experts, including several Library Data Services staff, developed a text mining project to answer these questions.
On the Books: Jim Crow and Algorithms of Resistance is a project of the University of North Carolina at Chapel Hill Libraries that used text mining and machine learning to discover Jim Crow and racially based legislation signed into law in North Carolina between Reconstruction and the Civil Rights Movement. The team developed:
- A publicly accessible, plain-text corpus of North Carolina Session Laws from 1866-1967 for general legal and historical research, and a list of Jim Crow laws discovered.
- A public git repository containing general scripts, open source software, and documentation for the benefit of similar projects.
- A short white paper describing the team's methods and workflows for accurate, large-scale OCR text conversion and text analysis, written for future teams seeking to create large digital corpora and/or experiment with data-driven discovery.
- A website for educators and researchers interested in Southern and African American History that lists and contextualizes the North Carolina segregation laws identified.
Phase 1 of On the Books: Jim Crow and Algorithms of Resistance was funded by the Andrew W. Mellon Foundation as part of the first cohort for Collections as Data: Part to Whole. Phase 2 is being funded by the ARL Venture Fund and an IDEA Action Grant, part of The Reckoning Initiative at the University Libraries. For more information about the project, its contributors, and the inspiration behind it, visit the project website above.