How we built a free Zillow-tier real estate data platform

Every serious real estate investor has hit the same wall: the data you need to underwrite a deal costs thousands of dollars a month, and even then it comes locked inside a platform that wants to sell you leads.

We built Scouq on a different premise. Every piece of property data that matters for deal analysis is already public. County assessors publish it. GIS portals host it. The federal government funds the collection. The only thing missing was the engineering to normalize it and make it searchable.

Here is how we did it.

The data sources

County assessor files are the foundation. Nearly every county in the US maintains a public record of assessed property values and ownership information, and most also record sales history, though some states do not require sale prices to be disclosed. The catch is that the format differs by county. Some publish CSVs updated daily. Others post PDFs annually. A few still require a written request and a check.

We wrote connectors for the most common formats first: fixed-width text files from legacy CAMA systems, CSV dumps from modern county portals, and JSON feeds from a handful of counties that have modernized. Each connector normalizes the output into a consistent schema: parcel ID, address, owner, assessed value, land use code, last sale price, and last sale date.
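As a rough illustration of what a connector's output could look like, here is a minimal sketch of the normalized schema and a parser for one fixed-width line. The struct mirrors the seven fields listed above; the field types, the column offsets, the `YYYYMMDD` date format, and the names `ParcelRecord` and `parse_fixed_width` are assumptions made for the example, not the actual Scouq layout.

```rust
/// Normalized parcel record with the seven schema fields named above.
#[derive(Debug, Clone, PartialEq)]
pub struct ParcelRecord {
    pub parcel_id: String,
    pub address: String,
    pub owner: String,
    pub assessed_value: u64,            // whole dollars
    pub land_use_code: String,
    pub last_sale_price: Option<u64>,   // None when no sale is on file
    pub last_sale_date: Option<String>, // ISO 8601, e.g. "2009-06-30"
}

// Hypothetical column layout as (start, end) byte offsets.
// Real CAMA exports differ by county; these numbers are invented.
const PARCEL_ID: (usize, usize) = (0, 12);
const ADDRESS: (usize, usize) = (12, 42);
const OWNER: (usize, usize) = (42, 67);
const ASSESSED: (usize, usize) = (67, 76);
const LAND_USE: (usize, usize) = (76, 80);
const SALE_PRICE: (usize, usize) = (80, 89);
const SALE_DATE: (usize, usize) = (89, 97); // assumed YYYYMMDD

/// Slice one column out of a line and trim its padding.
/// Returns None if the line is too short for the column.
fn field(line: &str, (start, end): (usize, usize)) -> Option<&str> {
    line.get(start..end).map(str::trim)
}

/// Convert eight digits "YYYYMMDD" to "YYYY-MM-DD"; anything else becomes None.
fn iso_date(raw: &str) -> Option<String> {
    if raw.len() == 8 && raw.bytes().all(|b| b.is_ascii_digit()) {
        Some(format!("{}-{}-{}", &raw[0..4], &raw[4..6], &raw[6..8]))
    } else {
        None
    }
}

/// Parse one fixed-width line; returns None for short or malformed lines.
pub fn parse_fixed_width(line: &str) -> Option<ParcelRecord> {
    Some(ParcelRecord {
        parcel_id: field(line, PARCEL_ID)?.to_string(),
        address: field(line, ADDRESS)?.to_string(),
        owner: field(line, OWNER)?.to_string(),
        assessed_value: field(line, ASSESSED)?.parse::<u64>().ok()?,
        land_use_code: field(line, LAND_USE)?.to_string(),
        // A blank or zero price is treated as "no sale on file".
        last_sale_price: field(line, SALE_PRICE)?
            .parse::<u64>()
            .ok()
            .filter(|p| *p > 0),
        last_sale_date: iso_date(field(line, SALE_DATE)?),
    })
}
```

Keeping the sale fields optional lets a connector represent parcels with no recorded sale without inventing placeholder values.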

OpenAddresses solved the address normalization problem. It is an open-source project that collects and standardizes address data from governments worldwide. For the US, it covers the majority of the country with geocoded address points. We use it to resolve raw assessor addresses to lat/lng coordinates and to correct common transcription errors in county records.
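The matching step can be pictured as "canonicalize both sides, then look up". The sketch below is a simplified stand-in, not the production resolver: `normalize_address`, `AddressIndex`, and the short suffix table are hypothetical, and it only does exact matching after canonicalization, so it would not catch genuine misspellings.

```rust
use std::collections::HashMap;

/// Canonicalize a raw street address: uppercase, drop periods and commas,
/// collapse whitespace, and abbreviate a few common street suffixes.
pub fn normalize_address(raw: &str) -> String {
    raw.to_uppercase()
        .replace(|c: char| c == '.' || c == ',', " ")
        .split_whitespace()
        .map(|token| match token {
            "STREET" => "ST",
            "AVENUE" => "AVE",
            "ROAD" => "RD",
            "DRIVE" => "DR",
            "BOULEVARD" => "BLVD",
            other => other,
        })
        .collect::<Vec<_>>()
        .join(" ")
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LatLng {
    pub lat: f64,
    pub lng: f64,
}

/// In-memory index from canonical address to coordinates.
/// In practice the rows would come from an OpenAddresses export.
pub struct AddressIndex {
    points: HashMap<String, LatLng>,
}

impl AddressIndex {
    /// Build the index from (address, lat, lng) rows.
    pub fn from_points(rows: &[(&str, f64, f64)]) -> Self {
        let points = rows
            .iter()
            .map(|&(addr, lat, lng)| (normalize_address(addr), LatLng { lat, lng }))
            .collect();
        Self { points }
    }

    /// Resolve a raw assessor address to coordinates, if a match exists.
    pub fn geocode(&self, raw: &str) -> Option<LatLng> {
        self.points.get(&normalize_address(raw)).copied()
    }
}
```

With this approach, "123  main st." and "123 Main Street" both canonicalize to "123 MAIN ST" and resolve to the same point. A real resolver would also need city, state, and ZIP in the key, plus some form of fuzzy matching.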

HUD and USPS vacancy data gives us a signal for motivated sellers. HUD publishes quarterly vacancy estimates at the census tract level. USPS publishes address-level vacancy flags based on mail delivery status. Neither is perfect, but combined they identify neighborhoods with above-average vacancy rates, and high vacancy correlates with off-market deal density.
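One simple way to combine the two signals is to average them per tract and flag the tracts that land above the mean. The sketch below assumes exactly that; the equal weighting, the `TractVacancy` fields, and the "above the mean" threshold are illustrative choices, not a description of the scoring Scouq actually uses.

```rust
/// Vacancy inputs for one census tract (all field names are hypothetical).
pub struct TractVacancy {
    pub tract_id: String,
    pub hud_vacancy_rate: f64, // HUD estimate as a fraction, 0.0..=1.0
    pub usps_vacant: u32,      // addresses flagged vacant by USPS
    pub usps_total: u32,       // all addresses with a USPS flag
}

impl TractVacancy {
    /// Average the HUD rate with the share of USPS-flagged addresses.
    /// Falls back to the HUD rate alone when no USPS data is present.
    pub fn combined_rate(&self) -> f64 {
        if self.usps_total == 0 {
            return self.hud_vacancy_rate;
        }
        let usps_rate = f64::from(self.usps_vacant) / f64::from(self.usps_total);
        (self.hud_vacancy_rate + usps_rate) / 2.0
    }
}

/// Return the IDs of tracts whose combined rate exceeds the mean
/// combined rate across all supplied tracts.
pub fn above_average_tracts(tracts: &[TractVacancy]) -> Vec<&str> {
    if tracts.is_empty() {
        return Vec::new();
    }
    let mean = tracts
        .iter()
        .map(TractVacancy::combined_rate)
        .sum::<f64>()
        / tracts.len() as f64;
    tracts
        .iter()
        .filter(|t| t.combined_rate() > mean)
        .map(|t| t.tract_id.as_str())
        .collect()
}
```

Averaging means a tract only scores high when both sources agree, which dampens the noise in either one taken alone.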

Foreclosure notices are public records filed in county courts. The format varies widely. We parse PACER feeds, state court XML exports, and county recorder feeds where available. Coverage is partial but expanding.

The engineering

The data pipeline runs in Rust. We chose Rust for three reasons: memory safety without a garbage collector, deterministic performance for parsing millions of records, and native concurrency primitives that let us saturate network bandwidth without managing thread pools manually.

Each county connector is a separate Rust crate that implements a shared trait. The trait has three methods: fetch (downloads the latest data), parse (returns an iterator of normalized records), and diff (compares the new batch against the previous one and returns only changed records). The diff step keeps ingestion cheap: on most days, fewer than 0.1% of records change.
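To make the shape of that trait concrete, here is a minimal sketch with the three methods described above. The signatures, the trimmed-down `Record` type, the default `diff` implementation, and the `DemoCsvConnector` are all assumptions for illustration; the real crates may differ, and a real `fetch` would download over the network rather than return an in-memory payload.

```rust
use std::collections::HashMap;

/// Trimmed-down normalized record, just enough to show diffing.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub parcel_id: String,
    pub owner: String,
    pub assessed_value: u64,
}

pub trait CountyConnector {
    /// Download the latest raw data for this county.
    fn fetch(&self) -> Result<Vec<u8>, String>;

    /// Turn raw bytes into an iterator of normalized records.
    fn parse<'a>(&self, raw: &'a [u8]) -> Box<dyn Iterator<Item = Record> + 'a>;

    /// Keep only records that are new or changed relative to `previous`.
    /// Deleted parcels are not reported in this sketch.
    fn diff(&self, previous: &[Record], current: Vec<Record>) -> Vec<Record> {
        let seen: HashMap<&str, &Record> = previous
            .iter()
            .map(|r| (r.parcel_id.as_str(), r))
            .collect();
        current
            .into_iter()
            .filter(|r| seen.get(r.parcel_id.as_str()) != Some(&r))
            .collect()
    }
}

/// Toy connector that "fetches" a CSV payload held in memory.
pub struct DemoCsvConnector {
    pub payload: &'static str,
}

impl CountyConnector for DemoCsvConnector {
    fn fetch(&self) -> Result<Vec<u8>, String> {
        Ok(self.payload.as_bytes().to_vec())
    }

    fn parse<'a>(&self, raw: &'a [u8]) -> Box<dyn Iterator<Item = Record> + 'a> {
        // Invalid UTF-8 yields no records; malformed rows are skipped.
        let text = std::str::from_utf8(raw).unwrap_or("");
        Box::new(text.lines().filter_map(|line| {
            let mut cols = line.split(',');
            Some(Record {
                parcel_id: cols.next()?.trim().to_string(),
                owner: cols.next()?.trim().to_string(),
                assessed_value: cols.next()?.trim().parse::<u64>().ok()?,
            })
        }))
    }
}
```

Because `diff` compares whole records keyed by parcel ID, an unchanged parcel produces no output, so downstream writes stay proportional to the number of changes rather than the size of the county file.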

Normalized records land in ClickHouse. We chose ClickHouse over Postgres for this layer because property queries are almost always analytical, for example: "find me all 3-bedroom single-family homes in this ZIP code with assessed value under $200k and no sale in the last 10 years." ClickHouse's columnar storage handles those filters over hundreds of millions of rows in milliseconds. Postgres would need careful indexing and would likely still be 10 to 100 times slower for the full-scan queries that power the deal feed.
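To pin down what that example query means, here is the same filter written as a plain Rust predicate over in-memory rows. This only illustrates the query's semantics; it says nothing about how ClickHouse executes it. The `ParcelRow` and `DealFilter` types, the `bedrooms` column, the "SFR" land use code, and the year-based sale cutoff are hypothetical simplifications.

```rust
/// One row of the analytical table (columns are hypothetical).
#[derive(Debug, Clone)]
pub struct ParcelRow {
    pub parcel_id: String,
    pub zip: String,
    pub bedrooms: u8,
    pub land_use_code: String,
    pub assessed_value: u64,         // whole dollars
    pub last_sale_year: Option<u16>, // None when no sale is on file
}

/// Parameters of the deal-feed style filter from the example query.
pub struct DealFilter<'a> {
    pub zip: &'a str,
    pub bedrooms: u8,
    pub land_use_code: &'a str,
    pub max_assessed_value: u64,
    /// Rows match only if their last sale happened before this year
    /// (or if they have no recorded sale at all).
    pub no_sale_since_year: u16,
}

impl DealFilter<'_> {
    /// True when the row satisfies every condition of the filter.
    pub fn matches(&self, row: &ParcelRow) -> bool {
        row.zip == self.zip
            && row.bedrooms == self.bedrooms
            && row.land_use_code == self.land_use_code
            && row.assessed_value < self.max_assessed_value
            && row
                .last_sale_year
                .map_or(true, |year| year < self.no_sale_since_year)
    }
}

/// Collect the parcel IDs of all matching rows.
pub fn find_deals<'r>(filter: &DealFilter<'_>, rows: &'r [ParcelRow]) -> Vec<&'r str> {
    rows.iter()
        .filter(|row| filter.matches(row))
        .map(|row| row.parcel_id.as_str())
        .collect()
}
```

Every condition touches a single column and every row has to be checked, which is the access pattern a column-oriented store is built for.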

The Supabase Postgres layer handles everything user-facing: accounts, saved deals, portfolio entries, and watch lists. That data is small, relational, and benefits from row-level security and real-time subscriptions. We use the right database for each job.

What we did not do

We did not scrape Zillow, Redfin, Realtor.com, or any MLS feed. Beyond the legal exposure, scraped listing data is a snapshot, not a record. It goes stale the moment the listing is updated. Public county records are the authoritative source for ownership and transaction history. We build on the original, not the copy.

We did not pay for a data license. The going rate for a national parcel dataset from a commercial provider is between $5,000 and $50,000 per year depending on coverage and update frequency. We cover our operational costs on the compute side, not the data side. That cost structure lets us offer a genuinely free tier.

What is next

We are adding more counties every week. The connectors are open-source. If your county is not covered yet, you can file an issue or submit a pull request. We review every contribution.

The next major data layer is public building permit records, which give us renovation signals, and public code enforcement records, which identify properties with deferred maintenance. Both are public. Both require custom connectors. Both are on the roadmap.

If you are a county data officer reading this: please publish a machine-readable feed. You are already collecting the data. Making it accessible is a public service.

Miles Dirmann

Miles is the founder of Scouq. He spent a decade in data engineering before turning his attention to real estate investing and building the tools he wished existed.
