What Data Arbitrage Actually Is
In Homo Deus, Yuval Noah Harari makes an argument that stuck with me for years. Every era of human history, he says, runs on a key resource, and power belongs to whoever controls it. For most of history that resource was land. Then, with the industrial revolution, it became machines and capital. And now, he argues, it's data. Whoever owns the data owns the leverage.
I read the book while I was a member of the World Future Society, surrounded by people who thought about this stuff for a living. It felt big and abstract at the time. Important, but far away: the kind of thing you nod at over coffee.
Then a friend showed me a website, and it stopped being abstract.
The realization
My friend Andy Martinec pulled up a site called AlcoholSalesTracker. It takes alcohol sales data — which is publicly available, filed with state agencies, sitting in databases nobody enjoys using — and it does three things to it: collects it, cleans it up, and puts a genuinely good interface on top. That's the whole product.
And people pay for it.
I remember the exact thought: wait — that's it? You take data that's already public, market it better, and customers buy it?
Yes. That's it. That's the business.
It's not a trick and it's not new. It's arbitrage, the oldest business there is, pointed at the one resource Harari says is still being repriced in real time. Buy low, sell high. Except here you're barely buying the data at all, because it's effectively free. What you spend is access and effort, and what you sell is clarity.
Why the gap exists (and why it isn't closing)
Here's the part that makes this a real opportunity instead of a clever observation: the gap is structural, and the people who could close it won't.
The government is slow. County records, court filings, appraisal rolls, permits, licenses — it's all public by law, and almost all of it is trapped behind portals built in 2004, PDFs that fight you, and "data products" priced and formatted like it's still 1998. The agencies that hold this data have no incentive and no urgency to make it usable. That's not a criticism. It's just what bureaucracies are.
Private individuals with a little vision are the opposite. Fast, motivated, allergic to friction. If you can see that a clean interface on top of an ugly public dataset is worth money to someone, you can build it before a government committee finishes scheduling its first meeting.
That speed-and-packaging gap is the arbitrage. It's wide open precisely because closing it requires the one combination most people don't have: the patience to go get ugly data, and the taste to make it not-ugly.
What it looks like in practice
Let me make it concrete, because abstraction is where these ideas go to die.
A Texas appraisal district will email you its entire property roll — every owner, every property address, every assessed value, the legal description, all of it — for about four dollars. Some of them hand it over for free. I've done it. One short, polite public-information request and a county's worth of property data lands in your inbox.
On its own, that file is close to useless to a normal person. It's a pipe-delimited dump with cryptic column names and no interface. That ugliness is exactly the point. Clean it, join it to the other public signals around it — foreclosure notices, permits, probate filings, tax delinquencies — make it searchable, put a real interface on it, and suddenly investors, lenders, and agents will pay every month to use it. They're not paying for the data. They could technically get it themselves. They're paying because you went and got it, and made it make sense.
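To show what "clean it and join it" means at the smallest possible scale, here is a rough Python sketch. Everything in it is an assumption made for illustration: the column names (prop_id, own_nm, situs_addr, appr_val), the sample rows, and the foreclosure-notice layout are invented, and real appraisal rolls vary from district to district. It reads a pipe-delimited dump, renames the cryptic columns, builds a normalized address key, and flags properties that also appear in a notice list.

```python
import csv
import io

# Hypothetical sample of a pipe-delimited appraisal roll. Real column names
# and layouts vary by district; these are invented for illustration.
RAW_ROLL = """prop_id|own_nm|situs_addr|appr_val
1001|SMITH JOHN|123 Main St.|250000
1002|DOE JANE|45 OAK AVE|310000
1003|LEE SAM| 9 elm  street |180000
"""

# Hypothetical foreclosure notices, keyed only by a free-text address.
RAW_NOTICES = """address|sale_date
123 MAIN ST|2024-03-05
9 Elm Street|2024-03-05
"""

# Assumed mapping from cryptic source columns to readable names.
COLUMN_NAMES = {
    "prop_id": "property_id",
    "own_nm": "owner_name",
    "situs_addr": "property_address",
    "appr_val": "assessed_value",
}


def normalize_address(address: str) -> str:
    """Uppercase, drop periods, and collapse whitespace so addresses compare equal."""
    return " ".join(address.upper().replace(".", "").split())


def read_pipe_delimited(text: str) -> list[dict[str, str]]:
    """Parse pipe-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))


def clean_roll(text: str) -> list[dict]:
    """Rename columns, trim values, convert the value to int, add a join key."""
    rows = []
    for raw in read_pipe_delimited(text):
        row = {COLUMN_NAMES.get(key, key): value.strip() for key, value in raw.items()}
        row["assessed_value"] = int(row["assessed_value"])
        row["address_key"] = normalize_address(row["property_address"])
        rows.append(row)
    return rows


def flag_foreclosures(roll: list[dict], notices_text: str) -> list[dict]:
    """Attach a sale date to each property whose address appears in the notices."""
    sale_dates = {
        normalize_address(notice["address"]): notice["sale_date"]
        for notice in read_pipe_delimited(notices_text)
    }
    for row in roll:
        # None means no matching notice was found for this address.
        row["foreclosure_sale_date"] = sale_dates.get(row["address_key"])
    return roll


records = flag_foreclosures(clean_roll(RAW_ROLL), RAW_NOTICES)
for record in records:
    print(record["property_id"], record["owner_name"], record["foreclosure_sale_date"])
```

The address normalization here is deliberately naive. It would not match "ST" against "STREET", for example. Real matching across county systems takes far more care, and that care is a large part of the unglamorous work customers are paying for.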
That's the entire model: public data that's technically free but practically locked up, turned into something people will gladly pay for.
Why now
A few years ago, doing this at scale was genuinely hard. You needed to write and maintain a small army of scrapers, wrangle a dozen incompatible county systems, clean millions of messy rows, and build the software to serve it. It was possible, but it was a slog that filtered out most people.
That filter is dissolving. The tools to collect, clean, and serve data have gotten dramatically better and cheaper — and AI has collapsed the cost of the boring middle: the parsing, the matching, the wrangling that used to eat all your time. The moat was never the data. Anyone can get the data. The moat was the willingness to do the unglamorous work — and that work just got a lot smaller.
So you have a rare alignment: a resource Harari calls the most important of our age, sitting in plain sight, mispriced because the people holding it are slow — and the cost of arbitraging that gap has never been lower.
The data is just sitting there. Most people won't go get it.
You could.
This is the thesis behind everything I'm building and writing here. I run a Texas property-data company, and I'm going to show you exactly how this works — the real files, the real costs, the real numbers, what's possible and what's structurally capped. If that's your kind of thing, the newsletter is where the playbooks land first.