Understanding limitations and quality issues in our data
The NJFOG Court Data project aggregates data from multiple government sources, each with its own formatting conventions and limitations. This page documents known data quality issues to help users interpret the data correctly.
We display data as-is from source systems rather than attempting to guess or reconstruct missing information. This preserves accuracy while making limitations visible.
Court case data comes from the NJ Judiciary's Automated Case Management System (ACMS) via the PAB0231 fixed-width export format. This mainframe-era format has significant limitations.
Case captions and party names are stored in fixed-width fields, so long names are cut off at the field boundary. For example, "TOWNSHIP OF MOUNT LAUREL" might appear as "TOWNSHIP OF MOUNT LA". Truncation affects approximately 15-20% of case titles.
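One way to spot values that may have been cut off is to check whether they fill the entire field. The sketch below is illustrative only: the 20-character width is inferred from the example above rather than taken from the PAB0231 layout, and `may_be_truncated` is a hypothetical helper, not part of the project's code.

```python
# Assumed field width, inferred from the 20-character example above.
# The real PAB0231 column widths may differ.
ASSUMED_FIELD_WIDTH = 20


def may_be_truncated(raw_field: str, width: int = ASSUMED_FIELD_WIDTH) -> bool:
    """Return True when the value fills the whole field, so it may be cut off.

    This is a heuristic: a name that is exactly `width` characters long is
    flagged too, and a name cut right at a space is missed.
    """
    return len(raw_field[:width].rstrip()) == width


print(may_be_truncated("TOWNSHIP OF MOUNT LA"))   # True
print(may_be_truncated("CITY OF NEWARK      "))   # False (padded with spaces)
```

A flag like this can only mark a title as suspect. It cannot recover the missing characters.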
Party names may appear in different formats (e.g., "JOHN SMITH" vs "SMITH, JOHN"). Government agency names may use abbreviations or full names inconsistently.
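To illustrate the ordering problem, here is a minimal sketch that rewrites "LAST, FIRST" as "FIRST LAST". The function name `to_first_last` is hypothetical, and the rule is deliberately narrow: it only handles names with exactly one comma.

```python
def to_first_last(name: str) -> str:
    """Rewrite 'LAST, FIRST' as 'FIRST LAST'; leave other names unchanged."""
    last, sep, first = name.partition(",")
    # Only reorder when there is exactly one comma and text on both sides.
    if sep and last.strip() and first.strip() and "," not in first:
        return f"{first.strip()} {last.strip()}"
    return name.strip()


print(to_first_last("SMITH, JOHN"))  # JOHN SMITH
print(to_first_last("JOHN SMITH"))   # JOHN SMITH
```

A rule this simple would also reorder organization names that contain a comma, so in practice it should only be applied to names already known to be individuals.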
Some fields may be blank or contain placeholder values. Disposition dates are only available for closed cases.
GRC complaint data is scraped from the Government Records Council's public decision search system. Names and details are extracted from HTML pages and PDF documents.
The same person or agency may appear with different spellings or abbreviations across complaints (e.g., "Newark PD" vs "Newark Police Department" vs "City of Newark Police").
Automated extraction from PDF documents may occasionally miss or misparse information, particularly for older complaints with poor OCR quality.
Some older GRC complaints have broken PDF links. We have identified and flagged the affected complaints (e.g., 2005-69 and 2005-125).
GRC data is scraped daily by an automated GitHub Actions workflow. New decisions are typically added within 24 hours of publication.
Office of Administrative Law data was obtained via OPRA request and covers matters closed since January 1, 2019.
Party names in OAL cases may be abbreviated or redacted for privacy, particularly in education, child welfare, and medical cases.
Only cases closed since January 1, 2019 are included. Pending cases and older closed cases are not in this dataset.
We extract and normalize entity names from case titles and GRC complaints to enable cross-referencing and analysis. This is an imperfect process.
Party names are extracted from case titles using pattern matching (e.g., splitting on "VS"). Government entities are identified using keyword patterns.
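The extraction step can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the separator pattern (which also accepts "VS.") and the keyword list are hypothetical, as are the function names.

```python
import re
from typing import Optional, Tuple

# Assumed separator: "VS" or "VS." surrounded by whitespace.
VS_PATTERN = re.compile(r"\s+VS\.?\s+", re.IGNORECASE)

# Hypothetical keyword list for spotting government entities.
GOV_KEYWORDS = ("TOWNSHIP", "CITY OF", "BOROUGH", "COUNTY", "STATE OF", "BOARD OF")


def split_caption(caption: str) -> Optional[Tuple[str, str]]:
    """Split a case title into (first party, second party) on the first 'VS'.

    Returns None when the title contains no 'VS' separator.
    """
    parts = VS_PATTERN.split(caption.strip(), maxsplit=1)
    if len(parts) != 2:
        return None
    return parts[0].strip(), parts[1].strip()


def looks_governmental(name: str) -> bool:
    """Return True when the name contains one of the assumed keywords."""
    upper = name.upper()
    return any(keyword in upper for keyword in GOV_KEYWORDS)


parties = split_caption("JOHN SMITH VS TOWNSHIP OF MOUNT LA")
print(parties)                         # ('JOHN SMITH', 'TOWNSHIP OF MOUNT LA')
print(looks_governmental(parties[1]))  # True
```

Titles without a "VS" separator return None here, so they can be reviewed separately rather than guessed at.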
Names are cleaned by removing titles, standardizing spacing, and handling common abbreviations.
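A minimal sketch of this kind of cleaning is shown below. The title set and abbreviation map are hypothetical examples chosen for illustration. The real rules may be longer and differ in detail.

```python
import re

# Hypothetical examples of titles to drop and abbreviations to expand.
TITLES = {"MR", "MRS", "MS", "DR", "HON"}
ABBREVIATIONS = {
    "PD": "POLICE DEPARTMENT",
    "TWP": "TOWNSHIP",
    "BD": "BOARD",
    "DEPT": "DEPARTMENT",
}


def clean_name(name: str) -> str:
    """Uppercase, collapse whitespace, drop titles, expand abbreviations."""
    collapsed = re.sub(r"\s+", " ", name.strip().upper())
    cleaned = []
    for token in collapsed.split(" "):
        bare = token.rstrip(".")  # "DR." and "DR" are treated alike
        if bare in TITLES:
            continue
        cleaned.append(ABBREVIATIONS.get(bare, bare))
    return " ".join(cleaned)


print(clean_name("  Dr.  John   Smith "))  # JOHN SMITH
print(clean_name("Newark PD"))             # NEWARK POLICE DEPARTMENT
```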
Similar names are merged using fuzzy matching and manual rules. Known truncated names are mapped to complete versions where possible.
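The merging step might look something like the sketch below, which checks a manual rule table first and then falls back to a similarity score from Python's standard `difflib` module. The 0.85 threshold, the single manual rule (taken from the truncation example earlier on this page), and the `canonical_name` function are assumptions for illustration. The project's actual matching method is not specified here.

```python
from difflib import SequenceMatcher
from typing import Iterable

# Manual rules map known truncated or variant names to a complete form.
# This single entry reuses the truncation example from this page.
MANUAL_RULES = {"TOWNSHIP OF MOUNT LA": "TOWNSHIP OF MOUNT LAUREL"}

# Assumed similarity cutoff; a real pipeline would tune this value.
ASSUMED_THRESHOLD = 0.85


def canonical_name(name: str, known_names: Iterable[str],
                   threshold: float = ASSUMED_THRESHOLD) -> str:
    """Return the manual mapping, else the closest known name, else the input."""
    if name in MANUAL_RULES:
        return MANUAL_RULES[name]
    best, best_score = name, 0.0
    for candidate in known_names:
        score = SequenceMatcher(None, name, candidate).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else name


known = ["NEWARK POLICE DEPARTMENT", "TOWNSHIP OF MOUNT LAUREL"]
print(canonical_name("TOWNSHIP OF MOUNT LA", known))     # TOWNSHIP OF MOUNT LAUREL
print(canonical_name("NEWARK POLICE DEPARTMNT", known))  # NEWARK POLICE DEPARTMENT
```

Character-level similarity catches misspellings but not abbreviations such as "Newark PD", which is why manual rules or an abbreviation-expansion step are still needed alongside it.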
If you notice data quality issues, incorrect information, or have corrections to suggest, please let us know: