The Criminal Justice Administrative Records System: A next-generation research data platform



K. Finlay, M. Mueller-Smith, et al.

The Criminal Justice Administrative Records System (CJARS), developed by Keith Finlay, Michael Mueller-Smith, and Jordan Papp, merges longitudinal criminal justice data with social, demographic, and economic records to support research and policymaking on the U.S. criminal justice system.
Introduction

The United States bears substantial social costs from crime, affecting victims, individuals involved in the justice system, families, and communities. However, no unified national data infrastructure exists to measure system performance, evaluate policies, or understand populations interacting with the criminal justice system due to decentralized data across thousands of jurisdictions. Existing national programs (e.g., NIBRS, NCRP) do not connect processes across stages of justice involvement. To address these gaps, CJARS was founded in 2016 as a partnership between the University of Michigan and the U.S. Census Bureau to collect, harmonize, and integrate administrative data across five primary domains of the justice system—arrest, adjudication, incarceration, probation, and parole—and to enable linkages to non-justice socioeconomic outcomes. CJARS organizes data in a relational schema with a person-level identifier (cjars_id) and event identifiers for each domain, supporting linkage across stages of a criminal episode. As of publication, CJARS contains over 2 billion lines of raw data identifying approximately 175 million events involving 37 million individuals across 30 states, with some series extending back four decades. Data access for qualified researchers occurs through the secure Federal Statistical Research Data Centers (FSRDCs), enabling person-level linkage with Census survey and administrative data (e.g., decennial census, ACS, employment, benefits, mortality). CJARS releases new vintages approximately bi-annually and is expanding geographic and procedural coverage.

Methodology

The CJARS platform integrates disparate administrative records into a unified, linkable research dataset through coordinated data collection, processing, and secure dissemination.

  • Data acquisition: CJARS obtains data from police, sheriffs, prosecutors, criminal courts, departments of corrections, and state criminal history repositories via prioritized data-use agreements, supplemented by public records requests, web scraping, bulk downloads, and data donations. Acquisition emphasizes statewide systems to efficiently expand coverage, with a cost benchmark of ≤$0.01 per acquired row and opportunistic collection from local agencies where feasible.
  • Data schema: A relational database structure includes a roster file with a unique, anonymous person identifier (cjars_id) and five procedural databases for arrest, adjudication, incarceration, probation, and parole. Each procedural table includes cjars_id and an event-level primary key (e.g., arr_id, adj_id) to support both person-level and episode/event-level linkage. Associative tables map events across domains to reconstruct sequences for a criminal episode.
  • Processing at the University of Michigan: a six-step pipeline.
      (1) Localization of native formats into a common database format.
      (2) Standardization and harmonization of personally identifying information (PII), including imputation of gender and race/ethnicity when needed.
      (3) Entity resolution using a biometrically trained probabilistic matching model to assign a unique cjars_id and track individuals across jurisdictions and time. PII is then removed and the data are moved to an anonymized partition.
      (4) Harmonization to a national schema with standardized variables and codes; a notable component is the machine-learning-based Text-based Offense Classification (TOC) tool, which translates millions of textual offense descriptions into unified offense codes.
      (5) Event deduplication to handle overlapping sources and repeated extracts, which prevents over-counting of contacts.
      (6) Episode resolution to create crosswalks linking procedural stages using timing, offense, and sentencing information.
  • Integration at the U.S. Census Bureau: The CJARS roster is processed through the Person Identification Validation System (PVS) to assign Protected Identification Keys (PIKs), enabling confidential individual-level linkage to non-CJARS Census survey and administrative data within the FSRDC secure environment. All research access occurs in FSRDCs (with many projects having virtual access) to protect confidentiality.
  • Ethics and governance: The University of Michigan IRB approved the CJARS repository (REP00000094) with prisoner advocate oversight; informed consent was waived due to use of existing records. Validation analyses were separately approved (HUM00208278). Data are disseminated only through FSRDCs, with extensive documentation, proposal guidance, and code resources available publicly.
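
The relational layout described in the data schema item can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions, not the CJARS schema itself: `cjars_id`, `arr_id`, and `adj_id` come from the text above, while the table names, the `arrest_date` and `disposition` columns, the `arrest_adjudication` associative table, and all sample rows are hypothetical.

```python
import sqlite3

# In-memory database; every row below is made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE roster (cjars_id TEXT PRIMARY KEY);

CREATE TABLE arrest (
    arr_id      TEXT PRIMARY KEY,                       -- event-level key
    cjars_id    TEXT NOT NULL REFERENCES roster(cjars_id),
    arrest_date TEXT                                    -- hypothetical column
);

CREATE TABLE adjudication (
    adj_id      TEXT PRIMARY KEY,                       -- event-level key
    cjars_id    TEXT NOT NULL REFERENCES roster(cjars_id),
    disposition TEXT                                    -- hypothetical column
);

-- Hypothetical associative table mapping arrests to adjudications
CREATE TABLE arrest_adjudication (
    arr_id TEXT REFERENCES arrest(arr_id),
    adj_id TEXT REFERENCES adjudication(adj_id),
    PRIMARY KEY (arr_id, adj_id)
);
""")

conn.executemany("INSERT INTO roster VALUES (?)", [("P1",), ("P2",)])
conn.executemany(
    "INSERT INTO arrest VALUES (?, ?, ?)",
    [("A1", "P1", "2015-03-01"), ("A2", "P1", "2017-06-10"), ("A3", "P2", "2016-01-20")],
)
conn.executemany(
    "INSERT INTO adjudication VALUES (?, ?, ?)",
    [("J1", "P1", "convicted"), ("J2", "P2", "dismissed")],
)
conn.executemany(
    "INSERT INTO arrest_adjudication VALUES (?, ?)",
    [("A1", "J1"), ("A3", "J2")],
)


def episodes_for(person_id):
    """Return each arrest for a person with its linked adjudication, if any."""
    return conn.execute(
        """
        SELECT a.arr_id, a.arrest_date, j.adj_id, j.disposition
        FROM arrest AS a
        LEFT JOIN arrest_adjudication AS x ON x.arr_id = a.arr_id
        LEFT JOIN adjudication AS j ON j.adj_id = x.adj_id
        WHERE a.cjars_id = ?
        ORDER BY a.arrest_date
        """,
        (person_id,),
    ).fetchall()


p1_episodes = episodes_for("P1")
for row in p1_episodes:
    print(row)
```

The `LEFT JOIN` keeps arrests that have no linked adjudication, so person-level queries (filter on `cjars_id`) and episode-level queries (follow the associative table) can share the same tables.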
Key Findings
  • Validation against federal statistical series:
      • Adjudication vs. SCPS (felony defendants in large urban counties): Across 609 jurisdiction-year cells, differences in means between CJARS and SCPS for demographics, offense types, processing times, and sentencing outcomes are not statistically significant at p<0.05 in weighted tests; one metric (violent offenses) shows marginal significance at p≈0.059 (weighted) and p≈0.029 (unweighted). Scatter plots of comparable statistics cluster around the 45-degree line, indicating close alignment.
      • Incarceration entries vs. NPS and NCRP: State-level annual prison entry counts from CJARS closely track NPS and/or NCRP series across states. Average absolute differences: CJARS vs. NPS = 15.8%; CJARS vs. NCRP = 11.9%. NPS vs. NCRP differ by 16.1%, indicating CJARS aligns at least as well with each series as they align with each other.
      • Probation entries vs. Annual Probation Survey: State-level annual probation entry counts show good alignment, with CJARS often exhibiting more stable year-to-year patterns; average absolute difference = 14.4%.
      • Parole entries vs. Annual Parole Survey: Strong alignment across nearly all CJARS-covered states; average absolute difference = 15.7%. Where differences exist (e.g., Nebraska), gaps are small and consistent over time, preserving trends.
  • Representativeness despite incomplete coverage: Comparing CJARS-covered vs. non-covered states (2000–2018 averages) shows no statistically significant differences in weighted means for violent crime rate, property crime rate, or imprisonment rate. Mean differences are modest: -2.2% (violent crime), +6.6% (property crime), +5.6% (imprisonment). CJARS-covered states span the distributions of these measures, supporting representativeness for national estimation while coverage expands.
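
The "average absolute difference" figures above compare CJARS counts with a benchmark series. The summary does not give the exact formula, so the sketch below assumes one plausible definition: the mean, over state-year cells present in both series, of the absolute difference expressed as a percentage of the benchmark count. The function name and all counts are hypothetical.

```python
def avg_abs_pct_diff(cjars_counts, benchmark_counts):
    """Mean absolute percent difference over cells present in both series.

    Assumed definition (not taken from the source):
        mean over cells of 100 * |cjars - benchmark| / benchmark
    """
    shared = sorted(cjars_counts.keys() & benchmark_counts.keys())
    if not shared:
        raise ValueError("no overlapping state-year cells")
    diffs = []
    for cell in shared:
        bench = benchmark_counts[cell]
        if bench == 0:
            raise ValueError(f"benchmark count is zero for {cell}")
        diffs.append(100 * abs(cjars_counts[cell] - bench) / bench)
    return sum(diffs) / len(diffs)


# Made-up annual prison entry counts keyed by (state, year).
cjars = {("MI", 2010): 9000, ("MI", 2011): 11000, ("TX", 2010): 52000}
benchmark = {
    ("MI", 2010): 10000,
    ("MI", 2011): 10000,
    ("TX", 2010): 50000,
    ("TX", 2011): 48000,  # no CJARS counterpart, so this cell is skipped
}

result = avg_abs_pct_diff(cjars, benchmark)
print(f"{result:.1f}%")
```

Restricting the comparison to overlapping cells matters here because coverage differs by state and year; a cell reported by only one series would otherwise be treated as a 100% gap.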
Discussion

The CJARS platform successfully reproduces key aggregate statistics from established federal series across adjudication, incarceration, probation, and parole, supporting the validity of its underlying person-level microdata and linkage procedures. By integrating records across all major criminal justice domains and enabling secure linkage to rich socioeconomic data within the FSRDCs, CJARS allows researchers to examine the full pathway from arrest to sanction and to quantify spillovers on employment, benefits, family structure, mobility, and mortality. Despite not yet achieving nationwide coverage, benchmarking indicates that CJARS-covered states are representative along key crime and incarceration metrics, suggesting the platform can support credible national insights while continuing to grow. The demonstrated alignment with SCPS, NPS, NCRP, and annual probation/parole surveys underscores CJARS’s capacity to inform policy by providing detailed microdata for dynamic cost-benefit analyses and system performance evaluations.

Conclusion

CJARS delivers a next-generation, linkable national infrastructure for U.S. criminal justice research by harmonizing multi-jurisdictional administrative records across arrest, adjudication, incarceration, probation, and parole and enabling confidential person-level linkage to socioeconomic data through the FSRDC network. Validation shows close correspondence with federal statistical series, supporting the platform’s accuracy and utility. Ongoing work will expand geographic and procedural coverage, release new data vintages approximately bi-annually, and provide public tools and documentation to facilitate coverage assessment and research use. Future research can leverage CJARS to study causal effects of justice contact, heterogeneity across jurisdictions and populations, intergenerational and community spillovers, and the efficacy of policy reforms across the full justice pipeline.

Limitations
  • Incomplete national coverage at present may limit some population-level inferences and create variation in coverage across time and states.
  • Administrative data are not designed for research; records can be updated, overwritten, or deleted, and documentation of operational changes may be limited.
  • Heterogeneity in privacy rules and data access across jurisdictions complicates acquisition and may lead to uneven data quality or gaps.
  • Necessity of deduplication and episode resolution introduces potential for linkage or classification error, despite robust probabilistic methods and ML tools.
  • Some state-level differences persist in comparisons with federal series (e.g., alignment varies between NPS and NCRP across states), and certain metrics show marginal discrepancies.
  • Access constraints (FSRDC secure environment) may limit researcher access relative to open data, though they are essential for confidentiality.