How to scrape Expedia with Python (2025 guide)

Endless network calls, lazy-loaded content, frequent layout changes: scraping Expedia isn’t a beginner-level task. We show you how to build an Expedia scraping script in Python using the Apify CLI, and unlock cloud execution, API access, and scheduling capabilities.

Want to extract hotel listings from Expedia without hitting a wall of JavaScript and anti-scraping protections? If you’ve tried scraping dynamic travel sites before, you know how challenging it can be. Between endless network calls, lazy-loaded content, and frequent layout changes, scraping Expedia isn’t a beginner-level task.

In this guide, you’ll learn how to build an Expedia scraping script in Python using the Apify CLI. We’ll walk you through the entire process, from setting up the project to deploying your Expedia scraper on the Apify platform!

Deploying your Expedia web scraping script to Apify unlocks cloud execution, API access, and scheduling capabilities. This lets you run your scraper reliably at scale, without having to manage any infrastructure yourself.

In the end, we'll also cover an alternative approach: using a ready-made Expedia scraper. This doesn't allow as much customization as building your own, but all the hard work has been done for you.

How to scrape Expedia using Python

In this step-by-step tutorial, you’ll see how to scrape the following Expedia hotel search page for Madrid Centro:

The target Expedia search page

We'll guide you through the Expedia scraping process with these steps:

  1. Select the scraping libraries
  2. Create the Python project
  3. Configure input reading
  4. Initialize PlaywrightCrawler
  5. Complete the scraper infrastructure
  6. Prepare to scrape hotel listings
  7. Implement the hotel listing scraping logic
  8. Store the scraped data
  9. Integrate residential proxies
  10. Put it all together
  11. Bonus step: Deploy to Apify

We'll then show you the no-code approach using an off-the-shelf Expedia Scraper on Apify Store.

Prerequisites

To follow along with this tutorial, make sure you meet the following prerequisites:

  • A solid grasp of how the web works, including HTTP requests, status codes, and JavaScript-driven pages
  • Familiarity with the DOM, HTML, and CSS selectors
  • An understanding of the difference between static and dynamic sites

Since Python is the programming language used for this guide, you’ll need:

  • Python 3.9 or later installed locally
  • Basic knowledge of Python and asynchronous programming

Finally, you'll deploy your Expedia scraper to Apify. So, you'll also require:

  • Node.js set up locally (to install the Apify CLI)
  • An Apify account
  • A basic understanding of how Apify works

Step 1: Select the scraping libraries

Before jumping into coding, take a moment to familiarize yourself with the target Expedia page. The goal here is to understand how the page loads its data so you can choose the right web scraping tools.

Open the target page in your browser (preferably in incognito mode to ensure a fresh session), and you'll see something like this:

The target page loading in the browser

That initial loading screen already suggests that hotel data may be loaded dynamically. Now, scroll down to the end of the hotel listings. There, you should see a “Show more” button. Click it to see what happens:

The interaction with the “Show more” button

At this point, it’s clear that the page loads data directly in the browser. You can confirm that by right-clicking anywhere on the page and selecting "Inspect" to open your browser’s Developer Tools. Then, reload the page and watch the network activity:

The AJAX calls made by the Expedia target page

You’ll notice several GraphQL requests being made during page rendering to fetch hotel data. This confirms that you need a browser automation tool (like Selenium or Playwright) to scrape Expedia. That’s because the scraper must be able to fully render the page (JavaScript included), not just parse the static HTML.

However, using a vanilla Playwright approach to scraping can be challenging. The reason is that it requires you to manually handle things like proxy rotation, retries, concurrency, and more. That’s why using Playwright through Crawlee is faster and easier.

Crawlee is a web scraping framework for Python (and JavaScript) that handles blocking detection, crawling logic, proxies, and browser orchestration out of the box. That way, you can focus on scraping, not infrastructure. Plus, it's deeply integrated with Apify.

Step 2: Create the Python project

Before building your Expedia scraper, you need to set up your environment. Since the script will be deployed on Apify, start by installing the Apify CLI globally using the following command:

npm install -g apify-cli

This installs the Apify CLI globally on your system so you can create and manage Actors easily.

Run the following command to create a new Apify Actor:

apify create

You’ll be prompted to answer a few configuration questions. Use the following answers:

✔ Name of your new Actor: expedia-scraper
✔ Choose the programming language of your new Actor: Python
✔ Choose a template for your new Actor. Detailed information about the template will be shown in the next step. Crawlee + Playwright + Chrome
✔ Do you want to install the following template? Y

This will generate a Python-based Apify Actor inside the expedia-scraper folder, using the “Crawlee + Playwright + Chrome” template. This template sets you up with everything you need to scrape dynamic websites using Playwright.

📌
Note: The template installation step may take a few moments to complete, so be patient.

Once created, your expedia-scraper folder should contain the following file structure:

The file structure generated by the Apify CLI

The Apify CLI automatically creates a .venv folder containing a Python virtual environment. Activate it with:

source .venv/bin/activate

Or, equivalently, on Windows, run:

.venv\Scripts\activate

Once the virtual environment is active, launch the command below to download the browsers required by Playwright:

python -m playwright install

This downloads the necessary browser binaries so Playwright can automate browser interactions.

Now, navigate to the src/ folder and open the main.py file. You’ll find a section that starts with async with Actor, which currently contains sample Playwright automation logic. Clear that section and prepare to replace it with your custom logic to scrape Expedia:

# src/main.py

from __future__ import annotations

from apify import Actor
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext

async def main() -> None:
    async with Actor:
        # Your Expedia Actor scraping logic...

Don’t forget that you can launch your Actor locally for testing purposes with:

apify run

Step 3: Configure input reading

To read the target URL from the Actor’s input arguments, add the following code under the async with Actor block:

# Retrieve the Actor input, and use default values if not provided
actor_input = await Actor.get_input() or {}

# Extract URLs from the start_urls input array
urls = [
    url_object.get("url") for url_object in actor_input.get("start_urls", [])
]

# Exit if no URLs are provided
if not urls:
    Actor.log.info("No URLs specified in Actor input, exiting...")
    await Actor.exit()

If the start_urls field is missing or empty, the Actor exits immediately using Actor.exit(). Otherwise, it retrieves and extracts the list of provided URLs.
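To make this step easy to test in isolation, the same extraction can be written as a small standalone function (extract_urls is our own name, not an Apify SDK helper) that tolerates a missing or malformed start_urls field:

```python
def extract_urls(actor_input: dict) -> list[str]:
    # Return the "url" values from start_urls, or an empty list when
    # the field is missing, so the caller can exit cleanly instead of crashing
    url_objects = actor_input.get("start_urls") or []
    return [url_object.get("url") for url_object in url_objects if url_object.get("url")]
```

An empty input then yields an empty list, and the "exit if no URLs" check works without a separate None guard.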

To set the target Expedia scraping page, update the INPUT.json file located at ./storage/key_value_stores/default/ with the following JSON content:

{
  "start_urls": [
    {
      "url": "https://www.expedia.com/Hotel-Search?destination=Madrid%20Centro%2C%20Madrid%2C%20Community%20of%20Madrid%2C%20Spain&regionId=553248631822337314&latLong&d1=2025-07-25&startDate=2025-07-25&d2=2025-07-28&endDate=2025-07-28&adults=2&rooms=1&children=&pwaDialog=&daysInFuture&stayLength&sort=RECOMMENDED&upsellingNumNightsAdded=&theme=&userIntent=&semdtl=&upsellingDiscountTypeAdded=&categorySearch=&useRewards=false"
    }
  ]
}

With this configuration, the urls array will contain a single URL (the URL that points to the desired Expedia hotel search page for Madrid).

Step 4: Initialize PlaywrightCrawler

Now that you have access to the target scraping URLs in your code, initialize a PlaywrightCrawler instance as below:

crawler = PlaywrightCrawler(
    headless=True,  # Set to False during development to observe scraper behavior
    browser_launch_options={
        "args": ["--disable-gpu", "--no-sandbox"],  # Required for Playwright on Apify
    },
    max_request_retries=10,
    concurrency_settings=ConcurrencySettings(
        min_concurrency=1,
        max_concurrency=1,
    ),
)

This sets up a browser-based Crawlee crawler that runs in headless mode, retries failed requests up to 10 times, and enforces a concurrency of 1 (which is suitable in this example, since there's only one page to scrape). The remaining args passed to the browser are necessary for running Playwright in Apify’s cloud environment.


📌
Note: While developing locally, you should set headless=False to visually debug and better understand how your Expedia scraper interacts with the target page.

Don’t forget to import ConcurrencySettings with:

from crawlee import ConcurrencySettings

Step 5: Complete the scraper infrastructure

Crawlee relies on request handlers, which are user-defined functions responsible for processing individual requests and their corresponding responses. To scrape the target Expedia page, add a handler like this:

@crawler.router.default_handler
async def request_handler(context: PlaywrightCrawlingContext) -> None:
    # Retrieve the page URL from the Playwright context
    url = context.request.url
    Actor.log.info(f"Scraping {url}...")

    # Expedia scraping logic...

Then, after defining the function but still inside the async with Actor block, write the following line to start the crawler:

await crawler.run(urls)

The run() function launches the crawler and processes all the URLs provided. In other words, it passes the target URLs to the Playwright crawler, which handles them as specified in request_handler().

Step 6: Prepare to scrape hotel listings

It's time to start writing the actual scraping logic in request_handler(). Before diving into code, take a moment to understand the behavior and structure of the target Expedia page by opening it in your browser.

If that's your first visit, you'll likely encounter a modal like this:

The Expedia blocking modal

This modal blocks interaction with the rest of the page, interfering with your Playwright scraper. If it shows up, you need to close it. Right-click on the “×" (close button) and select the “Inspect” option:

Inspecting the “×” button

You'll see that the close button can be selected using the following CSS selector:

#close-fee-inclusive-pricing-sheet

So, in your request_handler(), add logic to close the modal if it appears:

try:
    # Wait up to 10 seconds for the modal to appear and close it if it shows up
    modal_close_button_element = context.page.locator("#close-fee-inclusive-pricing-sheet")
    await modal_close_button_element.click(timeout=10000)
except TimeoutError:
    Actor.log.info("Modal close button not found or not clickable, continuing...")

When using click() or similar actions, Playwright applies an auto-wait mechanism by default (with a 30-second timeout). In this case, you can reduce the timeout to 10 seconds so the scraper doesn't wait too long for a modal that might not appear.

If the modal appears, it’ll be closed by clicking the close button. If it doesn’t, Playwright will throw a TimeoutError, which is caught to avoid stopping your scraper.

Don’t forget to import the TimeoutError class with:

from playwright.async_api import TimeoutError

With the modal out of the way, you can now access the hotel listing cards beneath it. Right-click and inspect one of the hotel cards:

Inspecting one of the hotel HTML cards on the page

From inspection, you’ll notice that each hotel card can be selected using the following CSS selector:

[data-stid="property-listing-results"] [data-stid="lodging-card-responsive"]

Targeting data-* attributes is a best practice in scraping because they tend to remain stable across deployments. These attributes are often added specifically for testing and automation, unlike CSS class names that may change frequently.

Select the hotel listing cards and prepare them for scraping:

# Select all HTML hotel card elements on the page
card_elements = context.page.locator('[data-stid="property-listing-results"] [data-stid="lodging-card-responsive"]')
# Wait for the elements to be rendered on the page
await card_elements.nth(5).wait_for(state="attached", timeout=10000)

# Iterate over all card elements and apply the scraping logic
for card_element in await card_elements.all():
    # Hotel card scraping logic...

Since the hotel listing cards are loaded dynamically, you must wait for them to be attached to the DOM. Expedia initially shows around 3 hotel listings and then renders more shortly afterward. Waiting for the sixth card (index 5 in nth()) is enough to confirm that the full list is being attached to the DOM before scraping.

Currently, Expedia loads up to 100 listings by default, with an option to load more using the “Show more” button at the bottom. So, the expected result is a dataset of hotel listing data with 100 entries.

Step 7: Implement the hotel listing scraping logic

Inside the for loop, you now have to select all relevant HTML elements from each hotel listing card and extract the data.

Start by inspecting a single hotel card, focusing on the link element:

Inspecting the hotel card link element

You can select the link and extract the hotel listing URL with:

url_element = card_element.locator('[data-stid="open-hotel-information"]')
hotel_url = "https://expedia.com" + await url_element.get_attribute("href")

Note that the <a> tag contains a relative URL, so you must prepend the base domain (https://expedia.com) to convert it into an absolute URL.
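If you'd rather not hard-code the concatenation, Python's standard urllib.parse.urljoin resolves relative href values against a base URL and leaves already-absolute URLs untouched. A small sketch (the helper name is ours):

```python
from urllib.parse import urljoin

BASE_URL = "https://expedia.com"

def absolutize(href: str) -> str:
    # Resolves relative paths against BASE_URL; absolute URLs pass through unchanged
    return urljoin(BASE_URL, href)

# Usage: hotel_url = absolutize(await url_element.get_attribute("href"))
```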

Next, inspect the hotel name and location:

Inspecting the hotel name and location elements

Collect them both with:

name_element = card_element.locator("h3.uitk-heading")
name = await name_element.first.text_content()

location_element = card_element.locator("h3.uitk-heading + div")
location = await location_element.first.text_content()

Since there's no unique selector for the location, use the + div CSS selector to grab the next sibling <div> following the hotel name heading.

Then, extract the review score and number of reviews:

Inspecting the review elements

Selecting the element that contains the number of reviews is a bit challenging, as there isn’t a direct and reliable CSS selector. So, you can select all .uitk-type-200 elements and filter the one that contains the word “reviews” like this:

review_score_element = card_element.locator('.uitk-badge-base-text[aria-hidden="true"]')
review_score = await review_score_element.first.text_content()

review_number = None
# Locate all elements that might contain the number of reviews
possible_review_number_elements = card_element.locator(".uitk-type-200")
for possible_review_number_element in await possible_review_number_elements.all():
    possible_review_number_element_text = await possible_review_number_element.first.text_content()
    # If the text contains "reviews", extract the number
    if "reviews" in possible_review_number_element_text:
        # Extract the number, remove commas, and convert to int
        review_number = int(possible_review_number_element_text.split()[0].replace(",", ""))
        break

Finally, tackle the price elements:

Inspecting the nightly price element
Inspecting the total price element

Again, you can apply a similar scraping logic as we did before:

price_summary_element = card_element.locator('[data-test-id="price-summary-message-line"]')
price_summary = await price_summary_element.first.text_content()

total_price = None
# Locate all elements that might contain the total price
possible_total_price_elements = card_element.locator(".uitk-type-400")
for possible_total_price_element in await possible_total_price_elements.all():
    possible_total_price_element_text = await possible_total_price_element.first.text_content()
    # If the text contains "total", extract the price
    if "total" in possible_total_price_element_text:
        total_price = possible_total_price_element_text.split()[0]
        break
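Both loops follow the same pattern: scan candidate elements and keep the first text that matches a keyword. The parsing itself is plain string handling, so it can be factored into small helpers that are easy to unit-test (the function names are ours):

```python
def parse_review_count(text: str):
    # e.g. "1,566 reviews" -> 1566; unrelated text -> None
    if "reviews" not in text:
        return None
    return int(text.split()[0].replace(",", ""))

def parse_total_price(text: str):
    # e.g. "$1,002 total" -> "$1,002"; unrelated text -> None
    if "total" not in text:
        return None
    return text.split()[0]
```

Keeping the parsing separate from the Playwright calls also makes it easier to adapt when Expedia tweaks its copy.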

Step 8: Store the scraped data

Inside the for loop, create a new hotel object with the scraped data and append it to the Actor dataset through the push_data() method:

hotel = {
    "url": hotel_url,
    "name": name,
    "location": location,
    "review_score": review_score,
    "review_number": review_number,
    "price_summary": price_summary,
    "total_price": total_price,
}
await context.push_data(hotel)

Once the for loop completes, the Actor dataset should contain all 100 hotel listings from the target page. After deploying the Actor to Apify, you’ll be able to explore the scraped data directly from your account. Additionally, you’ll be able to access it via the API.
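Optionally, you can guard the dataset against partially scraped entries before calling push_data(). A minimal sketch (is_valid_hotel and the field list are our own choices):

```python
REQUIRED_FIELDS = ("url", "name", "location")

def is_valid_hotel(hotel: dict) -> bool:
    # Keep an entry only if all core fields were actually scraped
    return all(hotel.get(field) for field in REQUIRED_FIELDS)

# Usage inside the loop (sketch):
# if is_valid_hotel(hotel):
#     await context.push_data(hotel)
```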

Step 9: Integrate residential proxies

If you deploy your Expedia scraper on a server and run it, you’ll likely encounter an Expedia error page. Alternatively, after just a few requests, you might start receiving 429 Too Many Requests errors.

To overcome this limitation, you can route your requests through residential proxies. These allow you to make your traffic appear as if it’s coming from real users, while also providing IP rotation capabilities to avoid rate limits and blocks.

Keep in mind that Apify offers residential proxies, even on the free plan. Alternatively, you can also bring your own proxies, as explained in the documentation.

To use Apify's residential proxies, create a proxy configuration with create_proxy_configuration() as below:

proxy_configuration = await Actor.create_proxy_configuration(
    groups=["RESIDENTIAL"],
    country_code="US",
)

This sets up a proxy integration using US-based residential IPs. Based on our tests, US residential proxies tend to work best when scraping Expedia.

If you’re wondering, the "RESIDENTIAL" group name corresponds to the residential proxy group in your Apify dashboard. You can verify it under the “Proxy > Groups” page in the Apify dashboard:

Note the group name of the “Residential” proxy section

Next, pass the proxy_configuration object in the PlaywrightCrawler constructor:

crawler = PlaywrightCrawler(
    # other configs...
    proxy_configuration=proxy_configuration,
)

PlaywrightCrawler will now route requests through the configured residential proxies.

Step 10: Put it all together

The main.py file should contain this code:

# src/main.py

from __future__ import annotations

from apify import Actor
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
from crawlee import ConcurrencySettings
from playwright.async_api import TimeoutError

async def main() -> None:
    async with Actor:
        # Retrieve the Actor input, and use default values if not provided
        actor_input = await Actor.get_input() or {}

        # Extract URLs from the start_urls input array
        urls = [
            url_object.get("url") for url_object in actor_input.get("start_urls", [])
        ]

        # Exit if no URLs are provided
        if not urls:
            Actor.log.info("No URLs specified in Actor input, exiting...")
            await Actor.exit()

        # Residential proxy integration
        proxy_configuration = await Actor.create_proxy_configuration(
            groups=["RESIDENTIAL"],
            country_code="US",
        )

        # Create a Playwright crawler through Crawlee
        crawler = PlaywrightCrawler(
            headless=True,  # Set to False during development to observe scraper behavior
            browser_launch_options={
                "args": ["--disable-gpu", "--no-sandbox"],  # Required for Playwright on Apify
            },
            max_request_retries=10,
            concurrency_settings=ConcurrencySettings(
                min_concurrency=1,
                max_concurrency=1,
            ),
            proxy_configuration=proxy_configuration,
        )

        # Define a request handler, which will be called for every URL
        @crawler.router.default_handler
        async def request_handler(context: PlaywrightCrawlingContext) -> None:
            # Retrieve the page URL from the Playwright context
            url = context.request.url
            Actor.log.info(f"Scraping {url}...")

            try:
                # Wait up to 10 seconds for the modal to appear and close it if it shows up
                modal_close_button_element = context.page.locator("#close-fee-inclusive-pricing-sheet")
                await modal_close_button_element.click(timeout=10000)
            except TimeoutError:
                Actor.log.info("Modal close button not found or not clickable, continuing...")

            # Select all HTML hotel card elements on the page
            card_elements = context.page.locator('[data-stid="property-listing-results"] [data-stid="lodging-card-responsive"]')
            # Wait for the elements to be rendered on the page
            await card_elements.nth(5).wait_for(state="attached", timeout=10000)

            # Iterate over all card elements and apply the scraping logic
            for card_element in await card_elements.all():
                url_element = card_element.locator('[data-stid="open-hotel-information"]')
                hotel_url = "https://expedia.com" + await url_element.get_attribute("href")

                name_element = card_element.locator("h3.uitk-heading")
                name = await name_element.first.text_content()

                location_element = card_element.locator("h3.uitk-heading + div")
                location = await location_element.first.text_content()

                review_score_element = card_element.locator('.uitk-badge-base-text[aria-hidden="true"]')
                review_score = await review_score_element.first.text_content()

                review_number = None
                # Locate all elements that might contain the number of reviews
                possible_review_number_elements = card_element.locator(".uitk-type-200")
                for possible_review_number_element in await possible_review_number_elements.all():
                    possible_review_number_element_text = await possible_review_number_element.first.text_content()
                    # If the text contains "reviews", extract the number
                    if "reviews" in possible_review_number_element_text:
                        # Extract the number, remove commas, and convert to int
                        review_number = int(possible_review_number_element_text.split()[0].replace(",", ""))
                        break

                price_summary_element = card_element.locator('[data-test-id="price-summary-message-line"]')
                price_summary = await price_summary_element.first.text_content()

                total_price = None
                # Locate all elements that might contain the total price
                possible_total_price_elements = card_element.locator(".uitk-type-400")
                for possible_total_price_element in await possible_total_price_elements.all():
                    possible_total_price_element_text = await possible_total_price_element.first.text_content()
                    # If the text contains "total", extract the price
                    if "total" in possible_total_price_element_text:
                        total_price = possible_total_price_element_text.split()[0]
                        break

                # Initialize a new hotel entry with the scraped data
                hotel = {
                    "url": hotel_url,
                    "name": name,
                    "location": location,
                    "review_score": review_score,
                    "review_number": review_number,
                    "price_summary": price_summary,
                    "total_price": total_price,
                }
                # Store the extracted data in the Actor dataset
                await context.push_data(hotel)

        # Run the Playwright crawler on the input URLs
        await crawler.run(urls)

In around 100 lines, you've built a Python-based hotel listing Actor to scrape Expedia.

Run it locally to verify that it works with:

apify run

Note that on a free Apify plan, you can only use residential proxies on deployed Actors. If you try to run the Actor locally, the connection to the proxy server will fail, and you’ll get this error:

The "Proxy external access" feature is not enabled for your account. Please upgrade your plan or contact support@apify.com

So, if you're on a free plan, you need to deploy your Actor to test it (we’ll show you how in the next step).


If you configure the Playwright crawler to start in headed mode, you’ll see:

The Chromium instance controlled by Playwright

In the terminal, you’ll notice an output like this:

[apify] INFO  Initializing Actor...
[crawlee.crawlers._playwright._playwright_crawler] INFO  Current request statistics:
┌───────────────────────────────┬──────────┐
│ requests_finished             │ 0        │
│ requests_failed               │ 0        │
│ retry_histogram               │ [0]      │
│ request_avg_failed_duration   │ None     │
│ request_avg_finished_duration │ None     │
│ requests_finished_per_minute  │ 0        │
│ requests_failed_per_minute    │ 0        │
│ request_total_duration        │ 0.0      │
│ requests_total                │ 0        │
│ crawler_runtime               │ 0.011538 │
└───────────────────────────────┴──────────┘
[crawlee._autoscaling.autoscaled_pool] INFO  current_concurrency = 0; desired_concurrency = 2; cpu = 0; mem = 0; event_loop = 0.0; client_info = 0.0
[apify] INFO  Scraping https://www.expedia.com/Hotel-Search?destination=Madrid%20Centro%2C%20Madrid%2C%20Community%20of%20Madrid%2C%20Spain&regionId=553248631822337314&latLong&d1=2025-07-25&startDate=2025-07-25&d2=2025-07-28&endDate=2025-07-28&adults=2&rooms=1&children=&pwaDialog=&daysInFuture&stayLength&sort=RECOMMENDED&upsellingNumNightsAdded=&theme=&userIntent=&semdtl=&upsellingDiscountTypeAdded=&categorySearch=&useRewards=false...
[crawlee.crawlers._playwright._playwright_crawler] INFO  Final request statistics:
┌───────────────────────────────┬────────────┐
│ requests_finished             │ 1          │
│ requests_failed               │ 0          │
│ retry_histogram               │ [0, 1]     │
│ request_avg_failed_duration   │ None       │
│ request_avg_finished_duration │ 30.639641  │
│ requests_finished_per_minute  │ 1          │
│ requests_failed_per_minute    │ 0          │
│ request_total_duration        │ 30.639641  │
│ requests_total                │ 1          │
│ crawler_runtime               │ 75.402857  │
└───────────────────────────────┴────────────┘
[apify] INFO  Exiting Actor ({"exit_code": 0})

These logs confirm that your Expedia scraper ran successfully. In the datasets/default/ folder, you should find 100 JSON files:

The output produced by the Expedia scraper

Each contains the data scraped from an individual hotel listing.
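If you'd rather have a single file than 100 separate JSON documents, a short standalone script can merge the local dataset into one CSV. This is a sketch assuming the default local storage layout created by apify run (storage/datasets/default/):

```python
import csv
import json
from pathlib import Path

def merge_dataset_to_csv(dataset_dir: str, output_csv: str) -> int:
    """Merge the per-item JSON files written by a local run into one CSV.

    Returns the number of rows written.
    """
    items = []
    for path in sorted(Path(dataset_dir).glob("*.json")):
        items.append(json.loads(path.read_text(encoding="utf-8")))
    if not items:
        return 0
    # Use the keys of the first item as the CSV header
    fieldnames = list(items[0].keys())
    with open(output_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(items)
    return len(items)

# e.g. merge_dataset_to_csv("storage/datasets/default", "hotels.csv")
```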

Great! It only remains to deploy your Expedia scraping script to Apify and run it in the cloud.

Bonus step: Deploy to Apify

Now we'll show you how to deploy your scraper to the Apify platform. This unlocks cloud execution, API access, and scheduling capabilities, so you can run the scraper at scale without managing infrastructure.

First, log in to your Apify account in the CLI with this command:

apify login

Then, you can deploy your Actor to Apify with:

apify push

Access your Apify dashboard and navigate to the Actors page. You’ll see your newly deployed “Expedia Scraper“ Actor:

Note the “Expedia Scraper” Actor

Click on it, and in the Actor page, open the “Input” tab to configure it. In the “Start URLs” field, replace the default URL with the target Expedia URL:

Setting the input URL

Once done, press the “Save & Start” button to run your Expedia scraping Actor in the cloud. After a few seconds of running, that’s what you should see:

The output produced by the “Expedia Scraper” Actor
📌
Note: The output shown here may differ from the screenshots because Expedia updates frequently. In particular, hotel listings change constantly.

The “Output” tab now contains 100 hotel listings in structured format, just as expected. To download the scraped data, click the “Export” button in the top right corner. You can export the dataset in various formats, including:

  • JSON
  • CSV
  • XML
  • Excel
  • HTML Table
  • RSS
  • JSONL

For example, export it as a CSV file, open it, and you’ll see the extracted hotel listings neatly organized:

The output produced by your Actor, in tabular form

Et voilà! You’ve successfully scraped hotel data from Expedia.

Next steps

This tutorial covered the basics of scraping Expedia. To level up your script, explore these advanced techniques:

  • Automated interaction: Use Playwright to interact with optional filters (e.g., star ratings or amenities) to refine hotel listings before scraping.
  • Load More: Automate clicking the "Show more" button to load additional listings and ensure you're capturing the full dataset.

Using a ready-made Expedia scraper

Building a production-ready travel data scraper for Expedia is no easy task. One of the main challenges is that the Expedia site frequently changes its pages, forcing you to update your parsing logic. Also, too many requests can trigger anti-scraping mechanisms.

The easiest way to overcome these challenges is by using a pre-made scraper that handles everything for you. Apify offers over 6,000 web scraping tools (Actors) for various websites, including around 40 specifically for Expedia.

If you want to scrape Expedia listings without building a scraper from scratch, just go to Apify Store and search for the "Expedia” keyword:

The Expedia Actors on Apify Store

Select the “Expedia Hotels 3.0a" Actor and click "Try for free" on its public page:

Clicking the “Try for free” button on the “Expedia Hotels 3.0a” Actor

The Actor will be added to your Apify dashboard. Configure it based on your needs and click "Start" to launch it:

Clicking the “Start” button

To test it for free, you must rent the Actor by clicking the “Rent” button:

Clicking the “Rent Actor” button

Then, press the “Save & Start” button:

Clicking the “Save & Start” button

Within seconds, you’ll see hotel listing data like this:

The output produced by the “Expedia Hotels 3.0a” Actor

And that's it! You’ve successfully scraped Expedia hotel listings in just a few clicks. Note that you can also call any Actor programmatically via API integration.

Conclusion

In this tutorial, you used Apify’s “Crawlee + Playwright + Chrome” template to build an Expedia web scraper. With this, you extracted hotel listings from the platform and deployed the scraper on Apify.

This project demonstrated how Apify enables efficient, scalable web scraping while saving development time. You can explore other templates and SDKs to try additional scraping and automation features.

If you prefer to skip development entirely, a pre-built Expedia Actor is the quickest way to simplify travel data extraction.

Frequently asked questions

Why scrape Expedia?

Expedia hosts millions of hotel listings worldwide, along with support for car rentals and flight bookings. This makes it a treasure trove of travel data that can power market analysis on pricing, availability, reviews, and more.

Can you scrape Expedia?

Yes, you can scrape Expedia using browser automation tools like Playwright, Selenium, and Puppeteer. That's required because Expedia loads data dynamically via JavaScript, so you must render its pages in a browser. You also need tools that support simulating user interactions.

Is it legal to scrape Expedia?

Yes, it is legal to scrape Expedia as long as you focus only on public data (data that isn’t behind a login wall). You can learn more in this detailed guide to the legality of web scraping.

How to scrape Expedia?

To scrape Expedia, use browser automation tools like Playwright. This helps you handle dynamic content and JavaScript rendering. Navigate to the target page, handle potential pop-ups, and apply the parsing logic to collect hotel listings. Integrate proxies to avoid blocks.
