How I Loaded 8 Million Property Records from 12 Texas Counties for Free
Two months ago I had property data for four Texas counties. Today I have 8.1 million parcels across twelve, covering the state's major metros from El Paso to the Gulf Coast. The total cost was four dollars — and that was for a single GIS shapefile I didn't strictly need. Everything else was free.
I want to show you exactly what the pipeline looks like, county by county, because the details are where the value hides. The thesis is simple — public data, practically free, ugly on arrival, valuable when cleaned — but executing it means dealing with twelve different file formats, four different download methods, and a few counties that make you work for it. That's the part nobody talks about, and it's the part that matters.
The county-by-county breakdown
Here's every county in my dataset, how I got the data, what format it arrived in, and how many parcels it produced.
The free downloads (no email, no waiting):
| County | Pop. | Parcels | Format | Source |
|---|---|---|---|---|
| Harris (Houston) | 4.7M | 1,600,000 | Tab-delimited flat files (owners.txt + real_acct.txt) | hcad.org free public download |
| Tarrant (Fort Worth) | 2.1M | 2,300,000 | Standard TDCA appraisal export | tad.org bulk data page |
| Collin (Plano) | 1.1M | 351,000 | CSV export | collincad.org data downloads |
| Fort Bend (Sugar Land) | 850K | ~400,000 | Standard appraisal file | fbcad.org public data |
These four you can download tonight. No person involved. Click, download, unzip. Harris is the third most populous county in America, and its 1.6 million parcels are sitting on a public web server.
The one-email counties:
| County | Pop. | Parcels | Format | How |
|---|---|---|---|---|
| Travis (Austin) | 1.3M | 485,000 | Fixed-width PROP.TXT (528MB ZIP, 17.7GB uncompressed) | Public information request emailed to the appraisal district. Free. |
| Dallas | 2.6M | 860,000 | Quoted CSV (ACCOUNT_INFO.CSV + ACCOUNT_APPRL_YEAR.CSV) | Email request to the appraisal district. Free. |
| Bexar (San Antonio) | 2.0M | 801,000 | Standard appraisal export | Email request to the appraisal district. Free. |
One email per county. The template from my earlier post works verbatim. Travis responded in three days; Dallas in two; Bexar in five. Cost of the appraisal rolls: $0. The four dollars went to Travis for an optional GIS shapefile I asked for on top of the appraisal roll.
The dig-for-it counties (no bulk page, but the data's there):
| County | Pop. | Parcels | Format | Story |
|---|---|---|---|---|
| El Paso | 870K | 452,000 | Tilde-delimited SQL Server dump (three joined files) | Found on the CAD's data page after navigating a JS-rendered SPA. The format looked like nothing I'd seen before — tildes as delimiters, three separate files (Properties, Owners, Values) joined on a database ID. |
| Galveston | 350K | 210,000 | Fixed-width TDCA APPRAISAL_INFO (9,716-character lines) | Published as part of the state's TDCA standard format. Each line is nearly ten thousand characters wide, with field positions mapped in a layout document. Owner name starts at position 608; situs street at 1038; market value at 1920, stored in cents. |
| Nueces (Corpus Christi) | 360K | 219,000 | Fixed-width PACS format | Free download from nuecescad.net. Another fixed-width format with its own field positions — prop_id, owner, situs, market_value all at different offsets than Galveston. |
| Cameron (Brownsville) | 420K | 222,000 | XLSX spreadsheet | Free download from cameroncad.org. The only county that handed me a clean, columnar spreadsheet. Column headers, structured data, no parsing needed. I almost didn't trust it. |
These four were the interesting ones. Three of them required a custom parser; Cameron needed none. None of the parsers was hard, exactly, but each one was a different kind of annoying, which is why most people stop at the easy downloads and never get here.
What "different format" actually means
Let me make the parsing concrete, because "I wrote a parser" sounds trivial until you're staring at a 9,716-character line with no headers.
Travis (fixed-width): The PROP.TXT file is 17.7 gigabytes uncompressed. No delimiters, no headers. You get a layout document — a PDF that says "owner name starts at byte 608, length 40; situs number starts at byte 1038, length 6" — and you write code that slices each line at those exact positions. Off by one byte and you're reading half an owner name and half a legal description. I pulled 422,000 real-property records from 492,000 total (filtering out personal property, mineral rights, and utility accounts).
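A minimal sketch of that kind of slicing in Python. The two layout entries echo the example positions quoted above; treating them as 1-based and the field names themselves are assumptions, and a real layout document would supply the full list. Reading the file as `latin-1` is also an assumption, chosen so that one byte always equals one character and the offsets stay aligned.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical layout: field name -> (1-based start position, length).
# Real positions come from the county's layout document.
LAYOUT: Dict[str, Tuple[int, int]] = {
    "owner_name": (608, 40),
    "situs_number": (1038, 6),
}

def parse_fixed_width(line: str, layout: Dict[str, Tuple[int, int]] = LAYOUT) -> Dict[str, str]:
    """Slice one fixed-width line into named, whitespace-trimmed fields."""
    record = {}
    for name, (start, length) in layout.items():
        begin = start - 1  # assumed: layout counts from 1, Python counts from 0
        record[name] = line[begin:begin + length].strip()
    return record

def iter_records(path: str, layout: Dict[str, Tuple[int, int]] = LAYOUT) -> Iterator[Dict[str, str]]:
    """Stream a large file line by line instead of loading it all into memory."""
    with open(path, encoding="latin-1", newline="") as handle:
        for line in handle:
            yield parse_fixed_width(line.rstrip("\r\n"), layout)

def build_sample_line() -> str:
    """Build a fake line so the sketch runs without the real file."""
    chars = [" "] * 1100
    chars[607:607 + 40] = "SMITH JOHN A".ljust(40)
    chars[1037:1037 + 6] = "001204"
    return "".join(chars)

sample = parse_fixed_width(build_sample_line())
print(sample)
```

Keeping the offsets in one dictionary means an off-by-one fix is a one-line change, and a second county with different positions only needs a second layout.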
El Paso (tilde-delimited, multi-file join): Three files — Properties, Owners, Values — each delimited by tildes, each keyed on a Property_dbId. To get one complete parcel record (owner + address + value) you join all three on that ID. It's a SQL Server dump reformatted as flat files. The first batch load hit a server error at 360,000 records; I added row-level retry with idempotent upserts and the full 452,000 loaded clean on the second pass.
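The join and the idempotent load can be sketched like this. Everything except `Property_dbId` is an assumption: the other column names, the sample rows, and the `parcels` table are made up for illustration, and SQLite stands in for whatever database you actually use. The sketch also assumes one owner row and one value row per property; real files may have several.

```python
import csv
import io
import sqlite3

# Hypothetical file contents, inlined so the sketch runs on its own.
PROPERTIES = "Property_dbId~situs_address\n101~123 MAIN ST\n102~9 OAK AVE\n"
OWNERS = "Property_dbId~owner_name\n101~GARCIA MARIA\n102~LEE DAVID\n"
VALUES = "Property_dbId~market_value\n101~185000\n102~240000\n"

def read_tilde(text):
    """Index a tilde-delimited file by Property_dbId."""
    reader = csv.DictReader(io.StringIO(text), delimiter="~", quoting=csv.QUOTE_NONE)
    return {row["Property_dbId"]: row for row in reader}

def join_parcels(properties, owners, values):
    """Yield one (id, owner, address, value) tuple per property."""
    props, owns, vals = read_tilde(properties), read_tilde(owners), read_tilde(values)
    for prop_id, prop in props.items():
        owner = owns.get(prop_id)
        value = vals.get(prop_id)
        yield (
            prop_id,
            owner["owner_name"] if owner else None,
            prop["situs_address"],
            int(value["market_value"]) if value else None,
        )

def upsert(conn, rows):
    """Insert rows, overwriting any row that already has the same prop_id."""
    conn.executemany(
        """INSERT INTO parcels (prop_id, owner_name, situs_address, market_value)
           VALUES (?, ?, ?, ?)
           ON CONFLICT(prop_id) DO UPDATE SET
               owner_name = excluded.owner_name,
               situs_address = excluded.situs_address,
               market_value = excluded.market_value""",
        rows,
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels (prop_id TEXT PRIMARY KEY, owner_name TEXT, "
    "situs_address TEXT, market_value INTEGER)"
)
rows = list(join_parcels(PROPERTIES, OWNERS, VALUES))
upsert(conn, rows)
upsert(conn, rows)  # a retry after a failed batch: same rows, no duplicates
count = conn.execute("SELECT COUNT(*) FROM parcels").fetchone()[0]
print(count)
```

Because the write is keyed on the property ID, rerunning the whole load after a mid-batch failure is safe: rows that already landed are overwritten with identical values rather than duplicated. The `ON CONFLICT` clause needs SQLite 3.24 or newer.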
Galveston (extreme fixed-width): TDCA standard format, but the lines are 9,716 characters wide. The layout document maps twelve fields I care about across those ten thousand characters. Market value is stored at position 1920 in cents — so the raw value 00000025000000 means $250,000. Miss the "in cents" note in the layout doc and every property in the county looks like it's worth a hundred times what it is.
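A small sketch of the cents conversion, using the position and raw value quoted above. The 14-character field width is an assumption taken from the length of that example value, and the position is assumed to be 1-based; the layout document is the authority on both.

```python
from decimal import Decimal
from typing import Optional

LINE_WIDTH = 9716
MARKET_VALUE_START = 1920   # position quoted in the text; assumed 1-based
MARKET_VALUE_LENGTH = 14    # assumed width, matching the 14-digit example

def market_value_dollars(line: str) -> Optional[Decimal]:
    """Read the market-value field and convert it from cents to dollars."""
    begin = MARKET_VALUE_START - 1
    raw = line[begin:begin + MARKET_VALUE_LENGTH].strip()
    if not raw:
        return None  # blank field: no value on record
    return Decimal(int(raw)) / 100

# Fake line padded to the full record width, with only the value field filled in.
prefix = " " * (MARKET_VALUE_START - 1)
raw_value = "00000025000000"
line = (prefix + raw_value).ljust(LINE_WIDTH)

value = market_value_dollars(line)
print(value)
```

Using `Decimal` rather than a float keeps the cents exact, which matters once values get summed across a few hundred thousand parcels.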
Cameron (XLSX): Clean columns, real headers, no parsing drama. I loaded 222,000 records in about forty seconds. Every other county made me earn it; Cameron just handed me a spreadsheet. If every CAD worked like Cameron, this post would be three paragraphs.
The cross-reference — where free data becomes valuable data
Eight million parcels in a database is a nice number. But the parcels alone aren't the product. The product is what happens when you cross-reference the parcels against other public signals.
Take pre-foreclosures. A foreclosure notice filed with the county clerk gives you a case number, an owner name, maybe a legal description — but often no street address. The address is the thing an investor actually needs. Without it, the notice is a name floating in space.
But I have the appraisal roll. I have every owner name tied to every situs address in the county.
So the pipeline is: normalize the owner name from the foreclosure notice, normalize every owner name in the appraisal roll, match them, and pull the situs address across. One cross-reference, and a foreclosure record that was useless becomes a lead with a street address, a property value, and an owner you can find.
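A rough sketch of that normalize-and-match step, under stated assumptions: the normalization rules (uppercase, strip punctuation, drop "ET AL"-style tails, sort the name tokens), the field names, and the sample records are all illustrative, not the exact rules behind the numbers in this post. The sketch only accepts a match when a normalized name maps to exactly one parcel, since a common name could otherwise attach the wrong address.

```python
import re
from collections import defaultdict

def normalize_owner(name):
    """Reduce an owner name to an order-insensitive, punctuation-free key."""
    text = re.sub(r"[^A-Z0-9 ]", " ", name.upper())
    text = re.sub(r"\bET\s+(AL|UX|VIR)\b", " ", text)  # assumed noise phrases
    return " ".join(sorted(text.split()))

def build_index(roll):
    """Map each normalized owner name to the parcels that carry it."""
    index = defaultdict(list)
    for parcel in roll:
        key = normalize_owner(parcel["owner_name"])
        if key:
            index[key].append(parcel)
    return index

def enrich(notices, index):
    """Copy address and value onto a notice only when the match is unambiguous."""
    for notice in notices:
        key = normalize_owner(notice.get("owner_name") or "")
        candidates = index.get(key, []) if key else []
        match = candidates[0] if len(candidates) == 1 else None
        yield {
            **notice,
            "situs_address": match["situs_address"] if match else None,
            "market_value": match["market_value"] if match else None,
        }

# Hypothetical sample data.
roll = [
    {"owner_name": "SMITH, JOHN A", "situs_address": "123 MAIN ST", "market_value": 250000},
    {"owner_name": "LEE DAVID", "situs_address": "9 OAK AVE", "market_value": 180000},
    {"owner_name": "LEE DAVID", "situs_address": "77 ELM DR", "market_value": 310000},
]
notices = [
    {"case": "A-1", "owner_name": "John A. Smith"},
    {"case": "A-2", "owner_name": "David Lee"},   # two parcels share this name
    {"case": "A-3", "owner_name": ""},            # no owner name to match on
]

enriched = list(enrich(notices, build_index(roll)))
for row in enriched:
    print(row["case"], row["situs_address"])
```

Sorting the tokens lets "SMITH, JOHN A" and "John A. Smith" land on the same key. The cost is that it can also merge names that genuinely differ in order, so a stricter rule may suit some counties better.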
Here's what that looked like in practice:
- Houston (Harris): 2,160 pre-foreclosure records enriched with real addresses. Address coverage went from 71.6% to 85.0%.
- Austin (Travis): 88 records enriched. Null addresses dropped from 218 to 131 — a 40% reduction.
- Dallas: 24 records enriched. Small number, but Dallas pre-foreclosures are genuinely data-limited (84% of records have no owner name at all, just case numbers).
- Cash buyers across three counties: 354 records enriched with addresses. Dallas cash-buyer address coverage jumped from 63% to 92%.
Total: 2,626 records turned from incomplete scraps into actionable leads, using nothing but free public data cross-referenced against other free public data.
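The arithmetic behind those figures is simple enough to check in a few lines. The numbers are the ones listed above; the dictionary labels are just for readability.

```python
# Enriched record counts from the list above.
enriched_counts = {
    "Harris pre-foreclosures": 2160,
    "Travis pre-foreclosures": 88,
    "Dallas pre-foreclosures": 24,
    "Cash buyers, three counties": 354,
}
total_enriched = sum(enriched_counts.values())

# Travis: share of null addresses eliminated.
travis_null_before, travis_null_after = 218, 131
travis_reduction = (travis_null_before - travis_null_after) / travis_null_before

# Harris: coverage gain in percentage points.
harris_gain_points = round(85.0 - 71.6, 1)

print(total_enriched)
print(f"{travis_reduction:.1%}")
print(harris_gain_points)
```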
The enrichment isn't magic. It's a join. But nobody does it because it requires having both datasets locally, cleaned and normalized, with a matching key. That's the work. That's the moat.
What it costs, honestly
Let me lay out the real numbers so you can see the economics:
- Data acquisition: $4 (one GIS file from Travis). Everything else: $0.
- Storage: The raw files total about 25 GB uncompressed. Fits on any laptop.
- Compute: A standard database. I use Supabase (free tier handles it; paid tier for production).
- Time: The first county (Travis) took a full day — writing the parser, debugging the fixed-width offsets, building the pipeline. County twelve (Cameron) took forty minutes. The work compounds. Each new format gets easier because you've already solved most of the problems in a slightly different shape.
A data vendor selling property records for these twelve counties would charge you roughly $200–$500 per month per county, depending on the vendor and the refresh rate. That's $2,400 to $6,000 a month for the same data I'm holding for four dollars.
The vendor's value is real — they've already done the parsing, the cleaning, the normalization, the cross-referencing. I'm not pretending that work is trivial. I just did it. And now that it's done, the marginal cost of adding county thirteen is close to zero. The economics get better with every county, not worse.
What's next (and what's capped)
I'm not going to pretend every county is free and easy. Denton County runs a JavaScript SPA with no bulk download; I filed a public information request (ticket #632780) and I'm waiting. Some smaller counties charge $50–$100 for their data. A few have no digital export at all — you'd have to scan paper records, and that's a different business.
And the cross-reference has limits. Dallas pre-foreclosures grade an F not because my data is bad, but because 84% of the source records — the foreclosure notices themselves — contain no owner name. Just a case number and a court date. You can't match what doesn't exist. Knowing where the ceiling is matters as much as knowing where the floor is.
But twelve counties covering Houston, Dallas, Austin, San Antonio, Fort Worth, El Paso, and the Gulf Coast — that's the bulk of the Texas real estate market in one database, refreshable for free, cross-referenced against every public signal I can find.
The data was always sitting there. It just needed someone willing to parse a 9,716-character line and figure out that the value was stored in cents.
If you're building a data business — or thinking about it — the newsletter is where I share the playbooks, the real numbers, and the formats that made me question my career choices. County thirteen is coming.