Zillow is one of the most popular real-estate listing websites thanks to its broad coverage, verified information, and user-friendliness. So, if you need up-to-date information on real-estate properties, Zillow.com is one of the best places to get it.
In this tutorial, you'll learn how to scrape the Zillow website with two different approaches: building a web scraper in Python from scratch and using an easy-to-use API to access Zillow's data. This tutorial requires no prior experience in web scraping; as long as you follow the steps, you can build a web scraper for Zillow.
How to scrape Zillow in Python
Follow these steps to scrape Zillow using Python:
Set up your environment
Understand Zillow’s web structure
Write the scraper code
Deploy on Apify
1. Setting up your environment
Here’s what you'll need to scrape the Zillow website:
Python 3.8+: Make sure you have Python 3.8 or higher installed and your environment set up (the script below relies on asyncio.run(), and recent httpx releases require Python 3.8+).
Install required libraries: Run the command below in the terminal to install the required libraries: httpx, lxml, and pandas.
pip install httpx pandas lxml
2. Understanding Zillow’s web structure
📌 Note: The Zillow website has two sections dedicated to houses for sale and houses for rent. For this part of the tutorial (scraping Zillow with Python), we'll focus on houses for sale.
Before scraping any website, especially dynamic ones like Zillow, it’s important to understand the web structure.
To inspect the website's HTML structure, right-click on any element and click “Inspect”, or press the F12 key to open Developer Tools.
Fortunately, Zillow's markup follows a fairly simple structure:
Property Price is within a span element, with the data-test attribute of "property-card-price".
Property details, such as the number of bedrooms, baths, and square feet, sit inside a single unordered list (<ul>) element with the class StyledPropertyCardHomeDetailsList-c11n-8-105-0__sc-1j0som5-0 ldtVy. Each detail is a list item (<li>) containing bold text and an abbreviation. Note that because all of these details live in one element, you'll need to split them to get separate columns in the dataset.
The address or location of the property is also within a span element, with the data-test attribute of property-card-addr.
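To see how these selectors behave before writing the full scraper, here's a minimal, standalone sketch that runs the same XPath expressions against a simplified, hand-written property card (the markup below is illustrative, not Zillow's exact HTML):

from lxml import html

# Illustrative markup mimicking the structure of a Zillow property card
sample_html = """
<ul>
  <li class="ListItem-c11n-8-105-0">
    <a href="#"><address>123 Main St, New York, NY 10001</address></a>
    <span data-test="property-card-price">$550,000</span>
    <ul class="StyledPropertyCardHomeDetailsList-c11n-8-105-0__sc-1j0som5-0 ldtVy">
      <li><b>2</b> <abbr>bds</abbr></li>
      <li><b>1</b> <abbr>ba</abbr></li>
      <li><b>850</b> <abbr>sqft</abbr></li>
    </ul>
  </li>
</ul>
"""

tree = html.fromstring(sample_html)
card = tree.xpath('//li[contains(@class, "ListItem-c11n-8-105-0")]')[0]
print(card.xpath('.//span[@data-test="property-card-price"]/text()')[0])  # $550,000
print(card.xpath('.//a/address/text()')[0])  # 123 Main St, New York, NY 10001
print(card.xpath('.//ul[contains(@class, "StyledPropertyCardHomeDetailsList")]//li/b/text()'))  # ['2', '1', '850']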
3. Writing the scraper code
Now that you understand how the Zillow website is structured, you can start writing the Python code.
Begin by importing the installed libraries, then write a function to fetch the HTML content of Zillow property listings. Zillow paginates its results, so you'll pass the page URL dynamically to fetch multiple pages.
# Importing the libraries
import httpx
from lxml import html
import pandas as pd
import asyncio

async def fetch_properties(url, headers):
    async with httpx.AsyncClient() as client:
        response = await client.get(url, headers=headers)

    if response.status_code != 200:
        print(f"Failed to fetch the HTML content. Status code: {response.status_code}")
        return []

    # Parsing the response content using lxml
    tree = html.fromstring(response.content)
    properties = []

    # Using XPath to select property cards
    property_cards = tree.xpath('//li[contains(@class, "ListItem-c11n-8-105-0")]')

    for card in property_cards:
        obj = {}
        try:
            # Address
            obj["Address"] = card.xpath('.//a/address/text()')[0].strip()
        except IndexError:
            obj["Address"] = None

        try:
            # Price
            obj["Price"] = card.xpath('.//span[@data-test="property-card-price"]/text()')[0].strip()
        except IndexError:
            obj["Price"] = None

        # Extracting and splitting Bds, Baths, and Sqft data
        try:
            details = card.xpath('.//ul[contains(@class, "StyledPropertyCardHomeDetailsList-c11n-8-105-0__sc-1j0som5-0")]')
            if details:
                details_list = details[0].xpath('.//li/b/text()')
                obj["Bds"] = details_list[0].strip() if len(details_list) > 0 else None
                obj["Baths"] = details_list[1].strip() if len(details_list) > 1 else None
                obj["Sqft"] = details_list[2].strip() if len(details_list) > 2 else None
            else:
                obj["Bds"] = obj["Baths"] = obj["Sqft"] = None
        except IndexError:
            obj["Bds"] = obj["Baths"] = obj["Sqft"] = None

        properties.append(obj)

    return properties
The above fetch_properties function handles sending HTTP requests (to URLs using the headers that will be declared in the next function) and parsing the HTML response to extract property details using the relevant classes and attributes.
It also separates the <ul> list of property details of bedrooms, baths, and square footage into three separate items.
However, the above code isn't executable on its own. To run it, you need the next part of the script, which declares the variables and drives the scraping process.
async def main():
    base_url = "https://www.zillow.com/new-york-ny/"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "upgrade-insecure-requests": "1"
    }

    all_properties = []
    page_number = 1
    properties_to_collect = 20  # You can change this value to the min. number of listings needed

    while True:
        url = f"{base_url}?page={page_number}"
        print(f"Fetching page {page_number}...")
        properties = await fetch_properties(url, headers)

        if not properties:  # If properties is empty, break the loop
            print("No more properties found or unable to fetch page.")
            break

        # Keep only listings that have all five fields populated
        valid_properties = [p for p in properties if
                            p["Address"] and p["Price"] and p["Bds"] and p["Baths"] and p["Sqft"]]
        all_properties.extend(valid_properties)

        # Exit if enough properties are collected
        if len(all_properties) >= properties_to_collect:
            break

        page_number += 1
        await asyncio.sleep(2)  # Short delay between requests

    # Converting to DataFrame and saving to CSV
    df = pd.DataFrame(all_properties)
    df.to_csv('zillow_properties.csv', index=False)
    print(f"Successfully scraped {len(all_properties)} properties and saved to 'zillow_properties.csv'.")

# Running the script
if __name__ == "__main__":
    asyncio.run(main())
The main function starts by defining a base URL (https://www.zillow.com/new-york-ny/) and HTTP headers to simulate a real browser request. It continues scraping pages until 20 valid listings are gathered, with a short delay between requests to avoid overwhelming the server.
Once run, the script produces a DataFrame and a zillow_properties.csv file with Address, Price, Bds, Baths, and Sqft columns.
4. Deploying to Apify
There are several reasons to deploy your scraper code to a platform like Apify. In this case, deploying to Apify lets you automate and schedule the data extraction process, and it gives you a convenient way to store and download your data in various formats.
When you sign up for a free account, you get immediate access to all the features of the Apify platform and $5 worth of credit every month. There's no time limit on the free plan.
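The deployment flow below uses the Apify CLI. The exact commands depend on your setup, but assuming you have Node.js installed, it typically looks like this (my-zillow-actor is just an example name):
#1. Install the Apify CLI and log in
npm install -g apify-cli
apify login
#2. Create a new Actor
apify create my-zillow-actor
When prompted, choose the Python template.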
The above command creates a boilerplate Actor with main.py (found in the my-zillow-actor folder under src) and the other required files, so you don't have to set up the Actor from scratch manually.
#3. Edit main.py
Note that you would have to make some changes to the previous script to make it Apify-friendly. Such changes are:
Removing or replacing print statements with logging (e.g., Actor.log) to handle output properly in the Apify environment.
Storing results in the Actor's dataset with Actor.push_data() instead of writing a local CSV file, as shown in the sketch below.
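Here's a condensed sketch of what an Apify-friendly main.py could look like, assuming you keep the fetch_properties function from earlier in this tutorial unchanged:

from apify import Actor

async def main():
    async with Actor:  # Initializes the Actor and ensures a clean shutdown
        base_url = "https://www.zillow.com/new-york-ny/"
        headers = {"User-Agent": "Mozilla/5.0 ..."}  # Same headers as in the earlier script
        all_properties = []
        page_number = 1
        while len(all_properties) < 20:
            Actor.log.info(f"Fetching page {page_number}...")  # Logging instead of print
            properties = await fetch_properties(f"{base_url}?page={page_number}", headers)
            if not properties:
                break
            all_properties.extend(properties)
            page_number += 1
        # Store results in the Actor's default dataset instead of a local CSV file
        await Actor.push_data(all_properties)

After editing main.py, running apify push from the project folder uploads and builds the Actor on the Apify platform, where you can start a run from Apify Console.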
To view and download output, click “Export Output”.
Depending on your needs, you can include or omit fields and download the data in formats such as CSV, JSON, or Excel.
And that’s all! You have successfully built and deployed a Zillow web scraper on Apify.
How to use a Zillow scraper with the Apify client for Python
Here's how you can run the ready-made Zillow scraping Actor using the Apify API client for Python.
You can find your API token in Apify Console > Settings > Integrations.
Install the Apify API client for Python, apify-client (if you haven't already).
pip install apify-client
Import Apify SDK and initialize the client.
from apify_client import ApifyClient
client = ApifyClient('YOUR_API_TOKEN')
Prepare the Actor input and create a new run for the Actor.
# Preparing the Actor input with max results set to 100
run_input = {
    "searchUrls": [
        {
            # Feel free to change the target URL
            "url": "https://www.zillow.com/homes/for_sale/?searchQueryState=%7B%22isMapVisible%22%3Atrue%2C%22mapBounds%22%3A%7B%22west%22%3A-124.61572460426518%2C%22east%22%3A-120.37225536598393%2C%22south%22%3A36.71199595991113%2C%22north%22%3A38.74934086729303%7D%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22days%22%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22customRegionId%22%3A%227d43965436X1-CRmxlqyi837u11_1fi65c%22%7D"
        }
    ],
    "maxResults": 100  # Limit to 100 results
}
Run the Actor and print the results
# Run the Actor and wait for it to finish
run = client.actor("maxcopell/zillow-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
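If you'd rather have a CSV like the one produced by the from-scratch scraper, you could load the dataset items into pandas, for example (the output filename is arbitrary):

import pandas as pd

# Collect the run's dataset items and save them to a CSV file
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
pd.DataFrame(items).to_csv('zillow_actor_results.csv', index=False)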
When run, the above code scrapes the Zillow listings at the given URL and prints each result as a dictionary.
Conclusion
This tutorial guided you through scraping Zillow in two ways: a) writing Python code from scratch, and b) using an Apify Actor's API. Either method lets you scrape and store the extracted information easily and accurately.
Can you scrape Zillow?
Yes, when the right tools and methods are used. You can write Python code to extract information like the price, number of bedrooms, square footage, and seller information from the Zillow website, then build a script that navigates the site, finds the information you need, and saves the data for later analysis.
Does Zillow have a Python API?
Zillow provides an API that you can call from Python, but it comes with some restrictions. Notable limitations are rate limits, restrictions on commercial use, and outdated information. Using a ready-made Apify Actor, on the other hand, lets you scrape data easily, without those access limits or stale data.