Add South Carolina dataset exploration#120
DTrim99 wants to merge 12 commits into PolicyEngine:main from
Adds data exploration notebook and summary CSV for the South Carolina (SC) dataset:
- Household and person counts (weighted)
- AGI distribution (median, average, percentiles) at household and person level
- Households with children breakdown
- Children by age group demographics
- Income bracket analysis

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
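The weighted statistics above depend on survey weights rather than raw record counts. Below is a minimal sketch of a weighted percentile (the lower, inverted-CDF style quantile) and a weighted mean. It assumes plain lists of AGI values and weights, and the function names are hypothetical, not taken from the notebook:

```python
# Minimal sketch: weighted median/percentiles and mean of AGI from microdata.
# Assumption: each record (household or person) carries a survey weight;
# function and variable names are hypothetical, not the notebook's code.
from typing import Sequence


def weighted_percentile(values: Sequence[float], weights: Sequence[float], q: float) -> float:
    """Smallest value whose cumulative weight share reaches q percent (0-100)."""
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    threshold = total * q / 100
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of weight * value divided by the total weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical example: three households with AGI and survey weights.
agi = [20_000, 50_000, 200_000]
weights = [3.0, 1.0, 1.0]
print(weighted_percentile(agi, weights, 50))  # 20000
print(weighted_mean(agi, weights))            # 62000.0
```

With equal weights the median of this toy sample would be $50,000. The heavier weight on the first household pulls the weighted median down to $20,000, which is why unweighted record statistics can mislead on survey microdata.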
- Add H.4216 reform analysis notebook using PolicyEngine microsimulation
- Include RFA official analysis data for comparison
- Add detailed comparison markdown explaining the $159M difference:
  - PE shows +$40M revenue vs RFA's -$119M
  - Key difference: SCIAD phase-out treatment for upper-middle incomes
  - Implementation uses AGI minus SCIAD vs federal taxable income
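The $159M figure is the gap between the two headline estimates. Counting revenue gains as positive, the worked arithmetic ($M) is:

```latex
\Delta = B_{\text{PE}} - B_{\text{RFA}} = (+40) - (-119) = 159
```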
Key findings:
- PE has 7.85x as many $0-income returns as RFA
- PE has ~50% fewer returns in the $100k-$300k brackets
- PE has 1.9x as many millionaire returns, paying 78% higher average tax
- Total baseline revenue is similar ($6.52B vs $6.40B), but its composition differs
- PE derives 48% of SC income tax from millionaires vs RFA's 15%
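Findings like these reduce to two small computations: per-bracket return ratios and one bracket's share of total tax. Here is a minimal sketch, assuming both bracket tables are dicts keyed by bracket label. The labels and example numbers are hypothetical and are not the PR's data:

```python
# Sketch: compare PE and RFA bracket tables and compute one bracket's tax share.
# Assumption: tables are dicts keyed by bracket label; labels and numbers in the
# example are hypothetical illustrations, not the PR's data.


def count_ratios(pe_returns: dict[str, float], rfa_returns: dict[str, float]) -> dict[str, float]:
    """PE returns divided by RFA returns for each bracket present in both tables."""
    return {
        bracket: pe_returns[bracket] / rfa_returns[bracket]
        for bracket in pe_returns
        if bracket in rfa_returns and rfa_returns[bracket] != 0
    }


def revenue_share(tax_by_bracket: dict[str, float], bracket: str) -> float:
    """Fraction of total tax liability attributable to one bracket."""
    total = sum(tax_by_bracket.values())
    return tax_by_bracket[bracket] / total


# Hypothetical illustration (tax in $B by bracket).
pe_tax = {"<$100k": 1.0, "$100k-$1M": 2.0, "$1M+": 1.0}
print(revenue_share(pe_tax, "$1M+"))  # 0.25
```

Brackets with zero RFA returns are skipped rather than producing an infinite ratio.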
PE includes non-filers, which explains the 540k extra returns in the $0 bracket.
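One way to check this explanation is to count the $0 bracket with and without non-filers. The sketch below assumes that each tax unit carries a boolean filer flag and that the "$0 bracket" means AGI at or below zero. Both the `TaxUnit` fields and that bracket definition are hypothetical:

```python
# Sketch: weighted $0-bracket count with and without non-filers.
# Assumptions: each unit has a hypothetical boolean `files_return` flag, and the
# $0 bracket is taken to mean AGI <= 0; neither is confirmed by the PR.
from dataclasses import dataclass


@dataclass
class TaxUnit:
    agi: float
    weight: float
    files_return: bool


def weighted_count(units, include_nonfilers=True, agi_max=0.0):
    """Weighted number of units with AGI <= agi_max, optionally filers only."""
    return sum(
        u.weight
        for u in units
        if u.agi <= agi_max and (include_nonfilers or u.files_return)
    )


units = [
    TaxUnit(0.0, 100.0, False),   # non-filer in the $0 bracket
    TaxUnit(0.0, 20.0, True),     # filer in the $0 bracket
    TaxUnit(5_000.0, 50.0, True), # filer outside the $0 bracket
]
print(weighted_count(units), weighted_count(units, include_nonfilers=False))  # 120.0 20.0
```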
- Add implementation note about the sc_additions bug fix
- Add RFA comparison section to notebook
- Update comparison markdown with post-fix accuracy (~93%)
- Add data_exploration_staging.ipynb for the staging SC dataset
- Add sc_h4216_budget_impact.py for quick budget impact calculation
- Add staging dataset summary CSV
- Update reform analysis notebook with RFA comparison fixes
- Update tax impact CSV with corrected results (staging data)

Staging vs Production dataset comparison:
- Staging has 17% fewer households (more focused on filers)
- Staging median AGI is 39% higher (0k vs 3k)
- Budget impact with staging: -46.6M (5.21%) / -$110.9M (5.39%)
- RFA estimate: -$119.1M (93% accuracy with 5.39% rate)
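A quick budget impact calculation boils down to a weighted sum of per-unit tax changes. The sketch below assumes baseline and reform liabilities plus weights are already available as lists, for example from two microsimulation runs. The names are hypothetical, and reading "accuracy" as the PE/RFA ratio is an assumption:

```python
# Sketch: static budget impact as the weighted change in SC income tax.
# Assumptions: per-unit baseline/reform liabilities and survey weights are
# already computed; names are hypothetical; "accuracy" is read as PE / RFA.


def budget_impact(baseline_tax, reform_tax, weights):
    """Weighted sum of (reform - baseline); negative means a revenue loss."""
    if not (len(baseline_tax) == len(reform_tax) == len(weights)):
        raise ValueError("inputs must have equal length")
    return sum(w * (r - b) for b, r, w in zip(baseline_tax, reform_tax, weights))


def accuracy_vs_rfa(pe_estimate, rfa_estimate):
    """Ratio of the PE estimate to RFA's; 1.0 would be an exact match."""
    return pe_estimate / rfa_estimate


# Hypothetical two-unit example: each unit's tax falls under the reform.
print(budget_impact([100, 200], [90, 150], [10, 2]))  # -200
```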
Update: Staging Dataset Analysis & PR #7514 Fix
Changes
Results with PR #7514 Fix
Staging vs Production Dataset
The staging dataset better represents actual tax filers (fewer zero- and low-income units), which explains the improved alignment with RFA estimates.
- Remove staging dataset files (broken data)
- Add data_exploration_test.ipynb for the test dataset (hf://policyengine/test/mar/SC.h5)
- Update all notebooks to use .values for raw arrays (avoids double-weighting)
- Update sc_h4216_budget_impact.py to use the test dataset and the corrected RFA estimate
- Update sc_h4216_reform_analysis.ipynb to use the test dataset
- Add sc_h4216_dataset_comparison.py comparing production vs test datasets

RFA estimates:
- 5.21% rate: -$309M
- 5.39% rate: -$119.1M
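Here is a sketch of the double-weighting pitfall behind the `.values` change. It assumes simulation outputs behave like a weighted series, whose `.sum()` already applies survey weights while `.values` exposes the raw per-record array. `ToyWeightedSeries` is a hypothetical stand-in, not PolicyEngine's actual class:

```python
# Sketch of double-weighting. Assumption: outputs act like a weighted series whose
# .sum() applies survey weights, while .values is the raw array.
# ToyWeightedSeries is a hypothetical stand-in, not PolicyEngine's class.


class ToyWeightedSeries:
    def __init__(self, values, weights):
        self.values = list(values)   # raw, unweighted per-record values
        self.weights = list(weights)

    def sum(self):
        # Weighted total, as a weighted series would report it.
        return sum(v * w for v, w in zip(self.values, self.weights))

    def __mul__(self, other):
        # Elementwise product that keeps the same weights attached.
        return ToyWeightedSeries([v * o for v, o in zip(self.values, other)], self.weights)


weights = [2.0, 3.0]
tax = ToyWeightedSeries([100.0, 10.0], weights)

correct = tax.sum()                                        # 2*100 + 3*10 = 230
double_weighted = (tax * weights).sum()                    # weights applied twice: 4*100 + 9*10 = 490
manual = sum(v * w for v, w in zip(tax.values, weights))   # raw array + explicit weights = 230
```

Multiplying an already-weighted series by the weights squares them. Working on the raw array and weighting once explicitly avoids this.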
- Produce output in the exact RFA format for direct comparison
- Use the test dataset (hf://policyengine/test/mar/SC.h5)
- Use the 5.39% top rate (RFA version)
- Export to pe_h4216_test_analysis.csv
- Include side-by-side comparison with RFA data
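A side-by-side export can be sketched with the standard `csv` module. The sketch assumes both inputs map a bracket label to a budget impact in $M. The column names are hypothetical and do not reproduce the actual RFA table layout:

```python
# Sketch: write PE and RFA bracket results side by side as CSV.
# Assumption: inputs map bracket label -> budget impact in $M; column names are
# hypothetical, not the actual RFA table layout.
import csv
import io


def write_side_by_side(pe, rfa, handle):
    writer = csv.writer(handle)
    writer.writerow(["bracket", "pe_millions", "rfa_millions", "difference_millions"])
    for bracket in rfa:  # keep RFA's bracket order
        pe_value = pe.get(bracket, 0.0)  # brackets missing from PE count as 0
        writer.writerow([bracket, pe_value, rfa[bracket], pe_value - rfa[bracket]])


buffer = io.StringIO()
write_side_by_side({"$1M+": -50.0}, {"$0": 0.0, "$1M+": -20.0}, buffer)
print(buffer.getvalue())
```

To write a real file, pass a handle from `open("pe_h4216_test_analysis.csv", "w", newline="")` instead of the in-memory buffer.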
- Add detailed analysis explaining why Production overestimates and Test underestimates the revenue loss relative to RFA
- Core issue: baseline revenue calibration ($6.5B Production vs $4.0B Test vs $6.4B RFA)
- Add test dataset exploration notebook and summary CSV
- Update comparison markdown with recommendations
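The calibration gap can be expressed as each dataset's baseline revenue relative to RFA's. This sketch uses the rounded totals from the commit message above ($B); the helper name is illustrative:

```python
# Sketch: baseline SC income tax revenue relative to RFA's, using the rounded
# totals quoted in the commit message ($B). Helper name is illustrative.
baseline_revenue = {"Production": 6.5, "Test": 4.0}
RFA_BASELINE = 6.4


def relative_gap(value, reference):
    """Signed fractional difference from the reference (negative = shortfall)."""
    return (value - reference) / reference


gaps = {name: relative_gap(v, RFA_BASELINE) for name, v in baseline_revenue.items()}
# Test: (4.0 - 6.4) / 6.4 = -0.375, roughly 37% below RFA
# Production: (6.5 - 6.4) / 6.4, about +1.6%
```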
Dataset Comparison Findings
Updated analysis comparing Production and Test datasets against RFA fiscal notes for SC H.4216.
Budget Impact Results
Key Finding: Baseline Revenue Calibration
Production overestimates the revenue loss because it has higher average incomes ($104k vs $74k in Test) and more tax units affected by the rate cuts. Test underestimates it because its baseline revenue is 37% lower than RFA's, despite better return counts (2.71M vs 2.76M).
Ideal Dataset Would Have:
See
- Restructure into h4216_analysis/ folder with rate-specific subfolders
- Add analysis notebooks for both State and Test datasets at each rate
- Add comprehensive comparison markdown with bracket-by-bracket analysis
- Remove unused intermediate scripts and notebooks

Key findings:
- 5.21% rate: State -$393M, Test -$212M vs RFA -$309M
- 5.39% rate: State -$198M, Test -$93M vs RFA -$119M
- Primary driver: millionaire distribution (State has 90% more millionaire returns than RFA; Test has 41% fewer)
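Expressing each estimate as a ratio to RFA makes the over- and undershoot easier to compare across rates. The budget impacts below ($M) are the ones listed in the key findings above:

```python
# Budget impacts ($M) from the key findings above, expressed as ratios to RFA.
estimates = {
    "5.21%": {"State": -393, "Test": -212, "RFA": -309},
    "5.39%": {"State": -198, "Test": -93, "RFA": -119},
}


def ratio_to_rfa(rate):
    """Each dataset's estimate divided by RFA's estimate at the given top rate."""
    row = estimates[rate]
    return {name: row[name] / row["RFA"] for name in ("State", "Test")}


for rate in estimates:
    ratios = ratio_to_rfa(rate)
    print(rate, {name: round(r, 2) for name, r in ratios.items()})
# 5.21% {'State': 1.27, 'Test': 0.69}
# 5.39% {'State': 1.66, 'Test': 0.78}
```

State overshoots RFA by roughly 27% and 66%, while Test reaches about 69% and 78% of RFA's figure.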
SC H.4216 Analysis Update - Comprehensive RFA Comparison
Reorganized analysis with full bracket-by-bracket comparison against RFA fiscal notes.
RFA Fiscal Notes
5.21% Rate: https://legiscan.com/SC/supplement/H4216/id/682946/South_Carolina-2025-H4216-H4216_2026-02-24_Amended.pdf
Budget Impact Summary
Key Finding: Millionaire Distribution
The primary driver of discrepancies is the millionaire bracket:
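The millionaire comparison comes down to a weighted count above an AGI threshold and its percent gap from RFA's count. Here is a minimal sketch, assuming "millionaire" means AGI of at least $1,000,000. The records and the RFA count are hypothetical illustrations:

```python
# Sketch: weighted count of millionaire returns and its percent gap from RFA.
# Assumptions: "millionaire" means AGI >= $1,000,000; records and the RFA count
# below are hypothetical illustrations, not the PR's data.
MILLIONAIRE_AGI = 1_000_000


def weighted_millionaires(agi, weights, threshold=MILLIONAIRE_AGI):
    """Sum of weights for records at or above the AGI threshold."""
    return sum(w for a, w in zip(agi, weights) if a >= threshold)


def pct_difference(pe_count, rfa_count):
    """Percent by which the PE count exceeds (or falls short of) RFA's."""
    return 100 * (pe_count - rfa_count) / rfa_count


pe_count = weighted_millionaires([2_500_000, 1_000_000, 400_000], [6.0, 13.0, 500.0])
print(pe_count, pct_difference(pe_count, 10.0))  # 19.0 90.0
```

Top-income records are few in a survey sample, so the weights attached to them can move this count, and the revenue share it drives, substantially.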
File Structure
Conclusion
Policy encoding is correct. All discrepancies stem from dataset characteristics, primarily millionaire weighting.
See
Summary
Key SC Statistics
H.4216 Tax Reform Analysis
Compares PolicyEngine microsimulation results against official RFA (Revenue & Fiscal Affairs) analysis.
Key Differences
The $159M discrepancy is primarily due to:
See h4216_analysis_comparison.md for detailed analysis.
Files Added
- us/states/sc/data_exploration.ipynb - SC dataset exploration
- us/states/sc/sc_dataset_summary_weighted.csv - Dataset summary
- us/states/sc/sc_h4216_reform_analysis.ipynb - H.4216 reform analysis
- us/states/sc/sc_h4216_tax_impact_analysis.csv - PE analysis results
- us/states/sc/rfa_h4216_analysis.csv - RFA official analysis
- us/states/sc/h4216_analysis_comparison.md - Comparison analysis

Test plan
🤖 Generated with Claude Code