Data sources

All findings are derived from publicly available government records. We do not use anonymous tips, leaked documents, or unverifiable sources.

Our primary data sources include:

  • NYS Authorities Budget Office – IDA project reports covering 34,348 projects across 104 agencies
  • NYS Board of Elections – 12.49 million campaign contribution records
  • NYS Department of State – Corporate filings, LLC registrations
  • FEC Individual Contributions – Federal campaign donations (2020-2024 cycles)
  • Senate Lobbying Disclosure Act (LDA) – Federal lobbying filings
  • SEC EDGAR – Public company filings and officer data
  • Census Bureau – County population and business pattern data for per-household calculations
  • NYS Comptroller – Audit reports and PARIS procurement data
  • Good Jobs First – Subsidy Tracker database of economic development awards

Matching methodology

Cross-referencing requires matching entities across datasets that use different naming conventions. Our approach:

  1. Entity normalization – Strip business suffixes (LLC, Inc, Corp), standardize punctuation, convert to uppercase for comparison
  2. Multi-tier matching – Exact match, then word-overlap (75% threshold), with manual verification of ambiguous cases
  3. False positive control – Geographic spread analysis, document frequency scoring, confidence tiers (HIGH, MEDIUM, REVIEW, LOW)
  4. Conservative counts – When reporting totals, we use exact and LLC-parent matches only (the “conservative core”) unless otherwise noted
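
The first two steps can be sketched in Python as follows. This is a minimal illustration and not our production script. The suffix list contains only the three suffixes named above, punctuation is replaced with spaces, and the overlap score is defined here as shared words divided by the larger word count. Those three choices, the function names, and the company names are assumptions made for the example.

```python
import re

# Assumption: only the three suffixes named in the text; a real list would be longer.
BUSINESS_SUFFIXES = {"LLC", "INC", "CORP"}


def normalize(name: str) -> str:
    """Uppercase, turn punctuation into spaces, and drop trailing business suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.upper()).split()
    # Keep at least one token so a name like "Corp" does not normalize to nothing.
    while len(tokens) > 1 and tokens[-1] in BUSINESS_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def word_overlap(a: str, b: str) -> float:
    """Shared distinct words divided by the larger word count (an assumed definition)."""
    words_a, words_b = set(a.split()), set(b.split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / max(len(words_a), len(words_b))


def match_tier(a: str, b: str, threshold: float = 0.75) -> str:
    """Return "exact", "overlap", or "none" for two raw entity names."""
    na, nb = normalize(a), normalize(b)
    if not na or not nb:
        return "none"
    if na == nb:
        return "exact"
    if word_overlap(na, nb) >= threshold:
        return "overlap"
    return "none"


# Hypothetical names, for illustration only.
print(match_tier("Acme Widgets, Inc.", "ACME WIDGETS LLC"))
print(match_tier("Hudson Valley Solar Partners LLC", "Hudson Valley Solar Partners East Corp"))
print(match_tier("Acme Widgets Inc", "Acme Holdings Corp"))
```

Dividing by the larger word count is the stricter of the obvious options: a short name cannot reach the threshold just because all of its words appear inside a much longer one.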

Statistical caveats

  • Correlation is not causation. A company that donates to a politician and receives a tax break may have received that break regardless of the donation. We document the financial relationships; we do not prove that one caused the other.
  • Name matching has limits. Common names produce false positives. We flag these with confidence tiers and exclude LOW-confidence matches from headline numbers.
  • Timing analysis. 54% of IDA beneficiary donations came before project approval. A Monte Carlo simulation (10,000 iterations) puts the random baseline at 76.7%, so the observed pre-approval rate is lower than chance would predict, not higher. Individual suspicious timing patterns are flagged separately.
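
A random baseline of this kind can be estimated along the following lines. This is a sketch under stated assumptions, not a description of the exact null model behind the 76.7% figure: it assumes each donation date is redrawn uniformly at random within a fixed data window while approval dates stay fixed, and it represents dates as integer day numbers. The function names and the toy data are hypothetical.

```python
import random


def pre_approval_share(pairs):
    """Share of (donation_day, approval_day) pairs where the donation came first."""
    if not pairs:
        raise ValueError("need at least one donation/approval pair")
    return sum(donation < approval for donation, approval in pairs) / len(pairs)


def random_baseline(pairs, window_start, window_end, iterations=10_000, seed=0):
    """Average pre-approval share when donation days are redrawn at random.

    Assumed null model: each donation day is uniform on [window_start, window_end],
    independent of the approval day it is paired with.
    """
    rng = random.Random(seed)
    approvals = [approval for _, approval in pairs]
    total = 0.0
    for _ in range(iterations):
        simulated = [(rng.randint(window_start, window_end), a) for a in approvals]
        total += pre_approval_share(simulated)
    return total / iterations


# Toy data: four donations, all tied to projects approved on day 75 of a 100-day window.
toy_pairs = [(10, 75), (80, 75), (50, 75), (90, 75)]
print("observed:", pre_approval_share(toy_pairs))
print("baseline:", random_baseline(toy_pairs, 0, 99))
```

Under this assumed model the baseline rises above 50% whenever approvals fall late in the data window, because a randomly placed donation then has more time before approval than after it. That is one way a chance baseline can exceed the observed rate.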

Reproducibility

All analysis is conducted using Python scripts operating on publicly downloadable datasets. Scripts are available on request. No proprietary tools or paywalled data sources are required to reproduce our findings.

Corrections

If we get something wrong, we fix it. Corrections are noted at the top of the affected article with the date and nature of the correction. To report an error, contact editor@thepublicledgers.org.