15 Free Datasets Every Hospital CFO Should Know About
Consultants typically charge $15K per benchmarking report. Turns out most of the data they use is free.
The same cost-per-case comparisons, the same payer mix analyses, the same readmission penalty lookups — all of it comes from public datasets that CMS publishes on a regular schedule. The consultants download CSVs, run them through Excel, and charge for the privilege.
There are about 15 relevant datasets. Here are the eight that matter most.
1. Hospital Cost Reports (HCRIS)
This is the benchmarking gold standard. Every Medicare-participating hospital files an annual cost report with CMS. It is public. Anyone can download every hospital's operating costs, revenue, bed count, FTEs, cost-to-charge ratios, and margins.
The dataset is called the Healthcare Cost Report Information System, and it lives on data.cms.gov. It covers fiscal years going back decades. Comparing cost-per-adjusted-discharge against a peer 200-bed community hospital two counties over starts here. It is the same data the consultants use. The difference is they charge to look at it.
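Here is a minimal sketch of the cost-per-adjusted-discharge comparison. It assumes the inputs have already been extracted from each hospital's cost report. The CostReportSummary fields are hypothetical names, and mapping them to specific HCRIS worksheet lines is not shown. The adjustment scales inpatient discharges by the ratio of total to inpatient gross revenue, a common convention for counting outpatient volume. The peer benchmark uses the median so one unusual filer does not skew it.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class CostReportSummary:
    # Hypothetical fields; map each one to the relevant HCRIS worksheet line yourself.
    ccn: str                       # CMS Certification Number
    operating_expenses: float      # total operating expenses, dollars
    gross_inpatient_revenue: float
    gross_patient_revenue: float   # inpatient + outpatient gross charges
    inpatient_discharges: int


def adjusted_discharges(h: CostReportSummary) -> float:
    """Scale inpatient discharges by the total-to-inpatient gross revenue ratio."""
    return h.inpatient_discharges * (h.gross_patient_revenue / h.gross_inpatient_revenue)


def cost_per_adjusted_discharge(h: CostReportSummary) -> float:
    return h.operating_expenses / adjusted_discharges(h)


def compare_to_peers(target: CostReportSummary, peers: list[CostReportSummary]) -> float:
    """Percent difference from the peer median (positive means costlier than peers)."""
    peer_median = median(cost_per_adjusted_discharge(p) for p in peers)
    return 100 * (cost_per_adjusted_discharge(target) / peer_median - 1)
```

With a target at $7,200 per adjusted discharge and peers at $6,000, $7,000, and $8,000, compare_to_peers returns about 2.9, meaning 2.9% above the peer median.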
2. Medicare Inpatient by Provider and Service
CMS publishes DRG-level data for every Medicare-participating hospital — average charges, average Medicare payments, and discharge counts, broken down by diagnosis group. It is on data.cms.gov and updated annually.
This is how to answer the question: "Is a hospital charging more or less than peer hospitals for the same procedure?" Pick a DRG, filter to a state or bed-size cohort, and compare. Bed counts are not in this file, so a bed-size cohort means joining to the cost report on the CMS Certification Number (CCN). Useful for service line profitability analysis and for contract negotiations with payers. If every other 300-bed hospital in the region is getting paid 20% more for the same DRG, that is useful to know before sitting down at the table.
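The DRG comparison can be sketched in a few lines of Python. The column names (Rndrng_Prvdr_CCN, Rndrng_Prvdr_State_Abrvtn, DRG_Cd, Avg_Mdcr_Pymt_Amt) are assumed to match recent releases of the Medicare Inpatient Hospitals by Provider and Service file; verify them against the data dictionary for the year downloaded. Rows can come straight from csv.DictReader. CMS leaves out provider-DRG combinations with fewer than 11 discharges, so a missing row does not mean zero cases.

```python
from statistics import median

# Assumed column names; check them against the data dictionary for your download year.
CCN = "Rndrng_Prvdr_CCN"
STATE = "Rndrng_Prvdr_State_Abrvtn"
DRG = "DRG_Cd"
PAYMENT = "Avg_Mdcr_Pymt_Amt"


def drg_payment_gap(rows, target_ccn: str, drg: str, state: str) -> dict:
    """Compare one hospital's average Medicare payment for a DRG with in-state peers."""
    cohort = [r for r in rows if r[DRG] == drg and r[STATE] == state]
    target = next((r for r in cohort if r[CCN] == target_ccn), None)
    if target is None:
        raise ValueError(f"no row for {target_ccn} / DRG {drg} (low volumes are suppressed)")
    peer_payments = [float(r[PAYMENT]) for r in cohort if r[CCN] != target_ccn]
    if not peer_payments:
        raise ValueError("no peer hospitals in this cohort")
    peer_median = median(peer_payments)
    own = float(target[PAYMENT])
    return {
        "own_payment": own,
        "peer_median": peer_median,
        "pct_vs_peers": 100 * (own / peer_median - 1),
        "peer_count": len(peer_payments),
    }
```

A bed-size cohort would add one more filter after the CCN join described above; the comparison logic stays the same.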
3. DOGE Medicaid Provider Spending
This one is new. In early 2026, the Department of Government Efficiency released a large Medicaid claims dataset — outpatient and professional claims aggregated by provider NPI, with service counts and payment amounts. The data was published on Hugging Face and has been mirrored across several repositories.
A few caveats. This is aggregated data, not raw claims. It is missing inpatient claims, which represent roughly 37% of Medicaid spending. It does not include pharmacy. And because it is aggregated by NPI, episode-level analysis is not possible. What it is useful for: comparing Medicaid billing patterns to peers, identifying service categories where volumes look unusual, and getting a rough picture of Medicaid payment rates in a given market. What it is not useful for: fraud detection, despite what some of the headlines suggested.
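As a sketch of the peer-comparison use case, assume the file has been loaded into dicts with hypothetical keys npi, category, services, and paid (the real column names may differ). The function below flags categories where one provider's payment per service is at least twice, or at most half, the peer median. In keeping with the caveats above, it is a screening step that raises questions, not evidence of anything on its own.

```python
from collections import defaultdict
from statistics import median


def unusual_categories(rows, npi: str, threshold: float = 2.0) -> list[tuple[str, float]]:
    """Flag categories where one NPI's payment per service is far from its peers' median.

    `rows` are dicts with hypothetical keys: npi, category, services, paid.
    A ratio of 2.0 means twice the peer median; 0.5 means half.
    """
    # category -> npi -> [total services, total paid], summed across rows
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0.0]))
    for r in rows:
        t = totals[r["category"]][r["npi"]]
        t[0] += r["services"]
        t[1] += r["paid"]

    flagged = []
    for category, by_npi in totals.items():
        rates = {k: paid / svc for k, (svc, paid) in by_npi.items() if svc > 0}
        peers = [rate for k, rate in rates.items() if k != npi]
        if npi not in rates or not peers:
            continue
        peer_median = median(peers)
        if peer_median == 0:
            continue  # ratio is undefined against a zero-paid peer group
        ratio = rates[npi] / peer_median
        if ratio >= threshold or ratio <= 1 / threshold:
            flagged.append((category, round(ratio, 2)))
    return sorted(flagged)
```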
4. Hospital Price Transparency Machine-Readable Files
Since 2021, CMS has required hospitals to publish their standard charges for all items and services in a machine-readable file: gross charges, payer-specific negotiated rates, discounted cash prices, and de-identified minimum and maximum negotiated rates. That means competitor rates are theoretically public.
The files are posted on each hospital's own website, which means there is no single central download. Quality varies wildly. Some hospitals post clean, well-structured JSON. Others post 500MB files that crash a browser. A few are still not compliant at all, though CMS enforcement has tightened: under the rules in effect since 2022, civil monetary penalties start at $300 per day for hospitals with 30 or fewer beds and run $10 per bed per day for larger hospitals, capped at $5,500 per day (a little over $2 million a year). Compliance guidance and requirements are at cms.gov/hospital-price-transparency. When the data is usable, it is incredibly valuable for competitive positioning and payer contract negotiations.
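For the oversized CSV files, streaming row by row avoids loading the whole file into memory. This sketch assumes a tall CSV layout. The default column names are modeled loosely on the CMS template and are assumptions, and skip_rows handles the metadata lines some files place above the real header. JSON files need a streaming JSON parser instead (for example, the third-party ijson package).

```python
import csv
from typing import Iterator, TextIO


def negotiated_rates(
    f: TextIO,
    codes: set[str],
    code_col: str = "code|1",
    payer_col: str = "payer_name",
    rate_col: str = "standard_charge|negotiated_dollar",
    skip_rows: int = 0,
) -> Iterator[tuple[str, str, float]]:
    """Stream (code, payer, negotiated rate) tuples for selected billing codes.

    Rows are read one at a time, so a 500MB file never has to fit in memory.
    The default column names are assumptions; check the header of each file.
    """
    for _ in range(skip_rows):
        next(f)  # skip metadata lines that sit above the real header
    for row in csv.DictReader(f):
        code = (row.get(code_col) or "").strip()
        raw = (row.get(rate_col) or "").strip().replace(",", "").lstrip("$")
        if code not in codes or not raw:
            continue
        try:
            rate = float(raw)
        except ValueError:
            continue  # placeholders such as "N/A"
        yield code, row.get(payer_col) or "", rate
```

Because it is a generator, the caller decides what to keep: a list for one procedure, or a running aggregate across thousands of codes.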
5. Hospital Readmissions Reduction Program (HRRP) Data
CMS penalizes hospitals with excess readmissions for six conditions: heart attack, heart failure, pneumonia, COPD, elective hip/knee replacement, and coronary artery bypass graft. The penalty can be up to 3% of a hospital's base operating DRG payments, and it applies to all Medicare fee-for-service discharges, not just the six conditions. That is real money.
The HRRP data — including each hospital's excess readmission ratios and payment adjustment factors — is public. Program details and data are at cms.gov. Monitoring penalty exposure and benchmarking against peers is straightforward with this data. For readmission reduction initiatives, the baseline data shows where to focus.
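Penalty exposure is straightforward arithmetic once the payment adjustment factor is known: the reduction equals one minus the factor, applied to base operating DRG payments. The sketch below assumes the hospital supplies its own estimate of annual base operating DRG payments; the function names are illustrative.

```python
def hrrp_penalty(adjustment_factor: float, base_operating_drg_payments: float) -> float:
    """Estimated annual HRRP reduction in dollars.

    adjustment_factor comes from CMS's published HRRP data (1.0 means no penalty);
    the floor is 0.97, i.e. a maximum 3% reduction.
    base_operating_drg_payments is the hospital's own estimate for the fiscal year.
    """
    if not 0.97 <= adjustment_factor <= 1.0:
        raise ValueError("HRRP adjustment factors fall between 0.97 and 1.0")
    return (1.0 - adjustment_factor) * base_operating_drg_payments


def penalized_conditions(err_by_condition: dict[str, float]) -> list[str]:
    """Conditions with an excess readmission ratio above 1.0 (more readmissions than expected)."""
    return sorted(c for c, err in err_by_condition.items() if err > 1.0)
```

A factor of 0.9925 on $80 million of base operating DRG payments works out to roughly $600,000.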
6. Hospital Value-Based Purchasing (VBP) Program Data
VBP adjusts Medicare payments based on clinical outcomes, patient experience (HCAHPS), safety, and efficiency. CMS withholds 2% of participating hospitals' base operating DRG payments and redistributes it based on performance scores. Hospitals either get a bonus or take a cut.
The program data — including domain scores and total performance scores for every participating hospital — is published by CMS at cms.gov/medicare/quality/value-based-programs/hospital-purchasing. Cross-referencing this with cost report data reveals which hospitals are earning bonuses while keeping costs low, and which are getting penalized on both quality and margins.
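A minimal sketch of the cross-reference follows. It assumes each hospital's VBP adjustment factor (from the CMS program data) and cost per adjusted discharge (from the cost report) are already joined by a hospital ID. The IDs and the median cost split are illustrative choices, not a CMS method.

```python
from statistics import median


def vbp_net_dollars(adjustment_factor: float, base_operating_drg_payments: float) -> float:
    """Net VBP effect after the 2% withhold is redistributed (positive means a bonus)."""
    return (adjustment_factor - 1.0) * base_operating_drg_payments


def quality_cost_quadrants(hospitals: dict[str, tuple[float, float]]) -> dict[str, str]:
    """Label each hospital by VBP result and cost position relative to the group median.

    hospitals maps a hypothetical hospital ID to
    (VBP adjustment factor, cost per adjusted discharge).
    """
    cost_median = median(cost for _, cost in hospitals.values())
    labels = {}
    for hid, (factor, cost) in hospitals.items():
        quality = "bonus" if factor > 1.0 else "penalty" if factor < 1.0 else "neutral"
        cost_pos = "low-cost" if cost <= cost_median else "high-cost"
        labels[hid] = f"{quality}/{cost_pos}"
    return labels
```

Hospitals labeled "penalty/high-cost" are the ones losing on both quality and margins.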
7. IPPS Impact File
Every year when CMS finalizes the Inpatient Prospective Payment System rule, it publishes an impact file that shows the estimated payment change for every hospital. Case mix index, wage index, disproportionate share percentage, and projected payment rate change all sit in one file.
The FY 2026 IPPS final rule and associated data files are available at cms.gov/medicare/payment/prospective-payment-systems/acute-inpatient-pps. This is essential for budget season. If CMS is projecting a 2.9% rate increase but a hospital's wage index is dropping, the net effect might be flat or negative. The impact file shows CMS's own estimate of that net effect for each hospital.
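The interaction between the rate update and the wage index can be approximated directly. In the IPPS, only the labor-related share of the standardized amount is multiplied by the wage index, so a wage index drop offsets part of the update. This sketch ignores DSH, IME, outlier payments, and case mix, and its default labor share of 67.6% is an assumption to check against the current rule (CMS uses 62% for hospitals whose wage index is 1.0 or below).

```python
def net_operating_rate_change(update: float, wage_index_old: float, wage_index_new: float,
                              labor_share: float = 0.676) -> float:
    """Approximate year-over-year change in one hospital's wage-adjusted base operating rate.

    Only the labor-related share is multiplied by the wage index.
    Ignores DSH, IME, outlier payments, and case mix.
    labor_share is an assumption; confirm it in the current final rule.
    """
    def wage_adjusted(wi: float) -> float:
        return labor_share * wi + (1.0 - labor_share)

    return (1.0 + update) * wage_adjusted(wage_index_new) / wage_adjusted(wage_index_old) - 1.0
```

With a 2.9% update and a wage index falling from 1.05 to 1.00, the function returns about -0.0046: a small net cut, not an increase.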
8. Geographic Variation Public Use File
CMS publishes per-capita Medicare spending data broken down by county and state, covering inpatient, outpatient, physician, post-acute, and other service categories. It is on data.cms.gov.
This is a market assessment tool. For evaluating whether to expand a service line or open a new outpatient facility, this data shows where Medicare spending is concentrated and where utilization patterns look different from national averages. Paired with Census data on population growth and aging demographics, it provides the foundation for a service line business case.
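A simple index against the national figure is one way to start the market scan. The sketch assumes county per-capita values for a single service category have already been pulled from the file (the field mapping is not shown). The file reports both actual and standardized per-capita costs; the standardized figures, which remove geographic differences in payment rates, are the better basis for comparing utilization.

```python
def spending_index(county_per_capita: dict[str, float], national_per_capita: float,
                   min_index: float = 1.10) -> list[tuple[str, float]]:
    """Rank counties whose per-capita Medicare spending is well above the national figure.

    An index of 1.10 means 10% above national. County names and values are
    hypothetical inputs taken from one service category of the file.
    """
    indexed = {c: v / national_per_capita for c, v in county_per_capita.items()}
    hits = [(c, round(i, 3)) for c, i in indexed.items() if i >= min_index]
    return sorted(hits, key=lambda x: x[1], reverse=True)
```

The same function run with min_index below 1.0 (and the comparison flipped) would surface counties with unusually low utilization, which can be just as interesting for a new outpatient site.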
How the platform handles this
The Curate-Me platform can run scheduled checks against these datasets. When new data drops on data.cms.gov, an agent downloads it, loads it into a structured format, and runs a standard set of comparisons against configured benchmarks. A separate agent handles the Medicaid data. Another monitors price transparency files from competitor hospitals in a given market.
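One way to implement the "new data drops" trigger, sketched here as a generic pattern and not the platform's actual implementation, is to compare each dataset's modification stamp with the one recorded on the previous run. The dataset IDs, stamps, and state file are hypothetical. Recording a stamp only after that dataset's refresh succeeds means a failed download is retried on the next scheduled run.

```python
import json
from pathlib import Path


def pending_refreshes(latest: dict[str, str], seen: dict[str, str]) -> list[str]:
    """Dataset IDs whose modification stamp differs from the one recorded last run."""
    return sorted(d for d, stamp in latest.items() if seen.get(d) != stamp)


def load_state(path: Path) -> dict[str, str]:
    """Read the stamps recorded by earlier runs (empty on the first run)."""
    return json.loads(path.read_text()) if path.exists() else {}


def record_refresh(path: Path, dataset_id: str, stamp: str) -> None:
    """Persist a stamp only after that dataset's download and analysis succeed."""
    state = load_state(path)
    state[dataset_id] = stamp
    path.write_text(json.dumps(state, indent=2, sort_keys=True))
```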
All of it runs through Curate-Me as a governed agent fleet — one agent per dataset, cross-referencing results, with cost tracking on every AI call and sensitive data protection on anything that touches provider-linked data. The Medicaid analysis agent costs $0.47 per run. Consultants typically charge $5,000 for the same analysis annually.
The total cost for a full benchmarking refresh across all eight datasets: under $4 in API calls. It runs on a schedule. Results arrive in an inbox. Nobody sends a $15,000 invoice.
What the experiments revealed
Cost report data requires skepticism. Hospitals file these reports themselves, and the accounting can be creative. Depreciation schedules, overhead allocation methods, and cost center definitions vary. Always cross-reference cost reports with other sources before drawing conclusions. A hospital that looks wildly efficient on paper might just be allocating costs differently.
The DOGE data is not what the headlines said. It is not raw claims. It is aggregated. It is missing inpatient and pharmacy. It is useful for Medicaid billing pattern analysis, but it is not the fraud-detection tool some commentators made it out to be. Treat it as one input, not the answer.
Price transparency files are a mess. The requirement is clear. The implementation is uneven. Some hospitals post well-structured data. Others post files so large and poorly formatted that they are effectively unusable without custom parsing. Budget time for data cleaning when going down this path.
data.cms.gov has a free API. No need to manually download CSVs. The API supports filtering, pagination, and direct data access. It is not well-documented, but it works. For automated pipelines, the API is preferable to scraping download pages.
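A paging loop for the API might look like the sketch below. The URL pattern, the offset and size parameter names, and the 5,000-row page size are assumptions; confirm them on the dataset's API page. The fetch function is injected so the loop can be tested without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

# Assumed URL pattern for the data.cms.gov data API; confirm it on the dataset's API page.
BASE_URL = "https://data.cms.gov/data-api/v1/dataset/{dataset_id}/data"

Fetcher = Callable[[int, int], list]


def http_fetcher(dataset_id: str, extra_params: Optional[dict] = None) -> Fetcher:
    """Build a fetch(offset, size) function that returns one page of rows as a list of dicts."""
    def fetch(offset: int, size: int) -> list:
        params = {"offset": offset, "size": size, **(extra_params or {})}
        url = BASE_URL.format(dataset_id=dataset_id) + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url, timeout=60) as resp:
            return json.load(resp)
    return fetch


def iter_rows(fetch: Fetcher, size: int = 5000) -> Iterator[dict]:
    """Yield every row, requesting pages until the API returns a short or empty page."""
    offset = 0
    while True:
        page = fetch(offset, size)
        yield from page
        if len(page) < size:
            return
        offset += size
```

`for row in iter_rows(http_fetcher("<dataset-id>")): ...` then streams every row, where `<dataset-id>` is a placeholder for the identifier shown on the dataset's page.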
The bottom line
The data is free. The analysis used to cost a fortune. The platform now runs the Medicaid analysis for $0.47 per scheduled check and a full eight-dataset refresh for under $4.
Every dataset listed above is public, maintained by CMS, and updated on a regular cadence. The barrier was never access — it was the tooling to make sense of it. That barrier is gone.
This post was researched and written with AI assistance through the Curate-Me platform. Total cost: tracked and auditable.
More from Margin Mandy