Scraping web data from Yahoo Finance can be challenging for three reasons:
- It has a complex HTML structure
- It's updated frequently
- It requires precise CSS or XPath selectors
This guide will show you how to overcome these challenges using Python, step by step. You'll find this tutorial easy to follow, and by the end, you'll have fully functional code ready to extract the financial data you need from Yahoo Finance.
Does Yahoo Finance allow scraping?
You can generally scrape Yahoo Finance, and most of the data available on the website is publicly accessible. However, if you want to practice ethical web scraping, it's important to respect their terms of service and avoid overwhelming their servers.
How to scrape Yahoo Finance using Python
Follow this step-by-step tutorial to learn how to create a web scraper for Yahoo Finance using Python.
1. Setup and prerequisites
Before you start scraping Yahoo Finance with Python, make sure your development environment is ready:
- Install Python: Download and install the latest version of Python from the official Python website.
- Choose an IDE: Use an IDE like PyCharm, Visual Studio Code, or Jupyter Notebook for your development work.
- Basic knowledge: Make sure you understand CSS selectors and are comfortable using browser DevTools to inspect page elements.
Next, create a new project using Poetry:
This command will generate the following project structure:
Navigate into the project directory and install Playwright:
Yahoo Finance uses JavaScript to load content dynamically. Playwright can render JavaScript, making it suitable for scraping dynamic content from Yahoo Finance.
Open the pyproject.toml file to check your project's dependencies, which should include:
Finally, create a main.py file within the yahoo_finance_scraper folder to write your scraping logic.
Your updated project structure should look like this:
Your environment is now set up, and you're ready to start writing the Python Playwright code to scrape Yahoo Finance.
Note: If you prefer not to set up all this on your local machine, you can deploy your code directly on Apify. Later in this tutorial, I'll show you how to deploy and run your scraper on Apify.
2. Connect to the target Yahoo Finance page
To begin, let's launch a Chromium browser instance using Playwright. While Playwright supports various browser engines, we'll use Chromium for this tutorial:
To run this script, you'll need to execute the main() function using an event loop at the end of your script.
Next, navigate to the Yahoo Finance page for the stock you want to scrape. The URL for a Yahoo Finance stock page looks like this:
A ticker symbol is a unique code that identifies a publicly traded company on a stock exchange, such as AAPL for Apple Inc. or TSLA for Tesla, Inc. When the ticker symbol changes, the URL also changes. Therefore, you should replace {ticker_symbol} with the specific stock ticker you want to scrape.
Here's the complete script so far:
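A minimal sketch of the script at this stage, assuming the quote page lives at https://finance.yahoo.com/quote/{ticker_symbol} and that Playwright and its Chromium build are installed. The names build_quote_url, open_quote_page, and the 5-second pause are illustrative choices, not part of Playwright.

```python
import asyncio

# Assumed URL pattern for a Yahoo Finance quote page.
QUOTE_URL_TEMPLATE = "https://finance.yahoo.com/quote/{ticker_symbol}"


def build_quote_url(ticker_symbol: str) -> str:
    """Return the quote page URL for a ticker such as 'AAPL'."""
    return QUOTE_URL_TEMPLATE.format(ticker_symbol=ticker_symbol.strip().upper())


async def open_quote_page(ticker_symbol: str, headless: bool = False) -> str:
    """Open the quote page in Chromium, pause briefly, and return the page title."""
    # Imported here so build_quote_url stays usable without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as playwright:
        browser = await playwright.chromium.launch(headless=headless)
        page = await browser.new_page()
        await page.goto(build_quote_url(ticker_symbol))
        await page.wait_for_timeout(5000)  # keep the window open for 5 seconds
        title = await page.title()
        await browser.close()
        return title


def main() -> None:
    """Start the event loop and open the AAPL quote page."""
    print(asyncio.run(open_quote_page("AAPL")))
```

Calling main() at the end of the script starts the event loop. Passing headless=True to open_quote_page runs the same code without a visible browser window.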
When you run this script, it will open the Yahoo Finance page for a few seconds before terminating.
Great! Now, you just have to change the ticker symbol to scrape data for any stock of your choice.
Note: Launching the browser with the UI (headless=False) is perfect for testing and debugging. If you want to save resources and run the browser in the background, switch to headless mode:
3. Bypass cookies modal
When accessing Yahoo Finance from a European IP address, you may encounter a cookies consent modal that needs to be addressed before you can proceed with scraping.
To continue to the desired page, you'll need to interact with the modal by clicking "Accept all" or "Reject all." To do this, right-click on the "Accept All" button and select "Inspect" to open your browser's DevTools:
In the DevTools, you can see that the button can be selected using the following CSS selector:
To automate clicking this button in Playwright, you can use the following script:
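One way to sketch this step is a small helper that clicks the button only when it exists. The selector button.accept-all is an assumption about the consent page's markup, so confirm it in DevTools first. The helper name dismiss_cookie_modal is also illustrative.

```python
# Assumed selector for the consent dialog's "Accept all" button; verify it in DevTools.
ACCEPT_ALL_SELECTOR = "button.accept-all"


async def dismiss_cookie_modal(page, selector: str = ACCEPT_ALL_SELECTOR) -> bool:
    """Click the consent button if it is present; return True when a click happened."""
    button = page.locator(selector)
    # count() does not wait for elements, so call this after page.goto() has returned.
    if await button.count() > 0:
        await button.first.click()
        return True
    return False
```

Because the helper returns False instead of raising when the modal is absent, the same script can run from both European and non-European IP addresses.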
This script will attempt to click the "Accept All" button in case the cookies consent modal shows up. This lets you continue scraping without interruption.
4. Inspect the page to select elements to scrape
To effectively scrape data, you first need to understand the DOM structure of the webpage. Suppose you want to extract the regular market price (224.72), change (+3.00), and change percent (+1.35%). These values are all contained within a div element. Inside this div, you'll find three fin-streamer elements, which hold the market price, change, and change percent, respectively.
To target these elements precisely, you can use the following CSS selectors:
Great! Next, let's look at how to extract the market close time, which is displayed as "4 PM EDT" on the page.
To select the market close time, use this CSS selector:
Now, let’s move on to extract critical company data like market cap, previous close, and volume from the table:
As you can see, the data is structured as a table with multiple li tags representing each field, starting from "Previous Close" and ending at "1y Target Est".
To extract specific fields like "Previous Close" and "Open", you can use the data-field attribute, which uniquely identifies each element:
The data-field attribute provides a straightforward way to select the elements. However, there may be cases where such an attribute isn't present. For instance, the "Bid" value lacks a data-field attribute or any other unique identifier. In this case, we’ll first locate the "Bid" label using its text content and then move to the next sibling element to extract the corresponding value.
Here's a combined selector you can use:
5. Scrape the stock data
Now that you've identified the elements you need, it's time to write the Playwright script to extract the data from Yahoo Finance.
Let’s define a new function named scrape_data that will handle the scraping process. This function takes a ticker symbol, navigates to the Yahoo Finance page, and returns a dictionary containing the extracted financial data.
Here's how it works:
The code extracts data using the identified CSS selectors, utilizes the locator method to fetch each element, and then applies the text_content() method to retrieve the text from these elements. The extracted metrics are stored in a dictionary, where each key represents a financial metric, and the corresponding value is the extracted text.
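The locator and text_content() pattern described above can be sketched as follows. The selector values in DEFAULT_SELECTORS (the data-testid and data-field names, and the "Bid" sibling selector) are assumptions about Yahoo Finance's current markup and should be checked in DevTools. This version also takes an already opened Playwright page as its first argument, which is a design choice made for the sketch.

```python
from typing import Dict, Optional

# Assumed selectors; Yahoo Finance changes its markup, so confirm each one in DevTools.
DEFAULT_SELECTORS: Dict[str, str] = {
    "market_price": '[data-testid="qsp-price"]',
    "previous_close": '[data-field="regularMarketPreviousClose"]',
    "open": '[data-field="regularMarketOpen"]',
    # Find the "Bid" label by its text, then take the value in the next sibling.
    "bid": "span:has-text('Bid') + span.value",
}


async def scrape_data(
    page,
    ticker_symbol: str,
    selectors: Optional[Dict[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Open the quote page for a ticker and return the selected fields as text."""
    selectors = DEFAULT_SELECTORS if selectors is None else selectors
    await page.goto(f"https://finance.yahoo.com/quote/{ticker_symbol}")

    data: Dict[str, Optional[str]] = {"ticker": ticker_symbol}
    for field, selector in selectors.items():
        # .first avoids a strict-mode error when a selector matches several elements.
        text = await page.locator(selector).first.text_content()
        data[field] = text.strip() if text else None
    return data
```

Keeping the selectors in one dictionary means a markup change on the site only requires editing that mapping, not the scraping loop.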
Finally, define the main function that orchestrates the entire process by iterating over each ticker and collecting data:
At the end of the scraping process, the following data will be printed in the console:
6. Scrape historical stock data
After getting real-time data, let's look at Yahoo Finance's historical stock information. This data shows how a stock performed in the past, which helps with investment choices. You can get daily, weekly, or monthly data for different periods — last month, last year, or the stock's entire history.
To access historical stock data on Yahoo Finance, you need to customize the URL by modifying specific parameters:
- frequency: Specifies the data interval, such as daily (1d), weekly (1wk), or monthly (1mo).
- period1 and period2: Set the start and end dates for the data in Unix timestamp format.
For example, the following URL retrieves weekly historical data for Amazon (AMZN) from August 16, 2023, to August 16, 2024:
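As a rough illustration, the URL can be assembled from the three parameters like this. The /quote/{ticker}/history/ path and the choice of midnight UTC for both dates are assumptions, and build_history_url is a hypothetical helper name.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_history_url(ticker_symbol: str, frequency: str, period1: int, period2: int) -> str:
    """Build a historical-data URL from a frequency and two Unix timestamps."""
    query = urlencode({"frequency": frequency, "period1": period1, "period2": period2})
    # Assumed path for the historical data page.
    return f"https://finance.yahoo.com/quote/{ticker_symbol}/history/?{query}"


# Weekly AMZN data from August 16, 2023 to August 16, 2024 (midnight UTC assumed).
start = int(datetime(2023, 8, 16, tzinfo=timezone.utc).timestamp())
end = int(datetime(2024, 8, 16, tzinfo=timezone.utc).timestamp())
amzn_weekly_url = build_history_url("AMZN", "1wk", start, end)
print(amzn_weekly_url)
```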
When you navigate to this URL, you’ll see a table containing the historical data. In our case, the data shown is for the last year with a weekly interval.
To extract this data, use the query_selector_all method in Playwright with the CSS selector .table tbody tr:
Each row contains multiple cells (td tags) that hold the data. Here's how to extract the text content from each cell:
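A minimal sketch of the row and cell extraction, assuming page is a Playwright page that has already loaded the historical data table. The helper name extract_table_rows is illustrative.

```python
from typing import List


async def extract_table_rows(page, row_selector: str = ".table tbody tr") -> List[List[str]]:
    """Return the table as a list of rows, each row being a list of cell texts."""
    rows = await page.query_selector_all(row_selector)
    table: List[List[str]] = []
    for row in rows:
        cells = await row.query_selector_all("td")
        # text_content() can return None, so fall back to an empty string.
        table.append([(await cell.text_content() or "").strip() for cell in cells])
    return table
```

Returning plain lists keeps the result easy to print or to pass on to a CSV writer later.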
Next, create a function to generate Unix timestamps, which we'll use to define the start (period1) and end (period2) dates for the data:
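One possible shape for such a function is shown below. It assumes the range is expressed as a number of days back from the current moment in UTC. The names to_unix_timestamp and period_bounds are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def to_unix_timestamp(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    return int(moment.timestamp())


def period_bounds(days_back: int, now: Optional[datetime] = None) -> Tuple[int, int]:
    """Return (period1, period2) covering the last `days_back` days."""
    # Drop microseconds so both bounds are whole seconds.
    end = (now or datetime.now(timezone.utc)).replace(microsecond=0)
    start = end - timedelta(days=days_back)
    return to_unix_timestamp(start), to_unix_timestamp(end)
```

For example, period_bounds(30) covers the last 30 days, while passing an explicit now makes the result reproducible in tests.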
Now, let's write a function to scrape historical data:
The scrape_historical_data function constructs the Yahoo Finance URL using the given parameters, navigates to the page while managing any cookie prompts, waits for the historical data table to load fully, and then extracts and prints the relevant data to the console.
Finally, here's how to run the script with different configurations:
Customize the data period and frequency by adjusting the parameters:
Here’s the complete script so far to scrape the historical data from Yahoo Finance:
Run this script to print all the historical stock data to the console based on your specified parameters.
7. Scrape multiple stocks
So far, we've scraped data for a single stock. To gather data for multiple stocks at once, we can modify the script to accept ticker symbols as command-line arguments and process each one.
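Reading the tickers from the command line could look like the sketch below. The helper name parse_tickers and the usage message are illustrative, and the script name main.py is taken from the project setup earlier.

```python
import sys
from typing import List, Optional


def parse_tickers(argv: Optional[List[str]] = None) -> List[str]:
    """Return upper-cased ticker symbols from the command-line arguments."""
    # sys.argv[0] is the script name, so the tickers start at index 1.
    args = sys.argv[1:] if argv is None else list(argv)
    tickers = [arg.strip().upper() for arg in args if arg.strip()]
    if not tickers:
        raise SystemExit("Usage: python main.py TICKER [TICKER ...]")
    return tickers
```

Inside main, calling parse_tickers() with no arguments reads sys.argv, and the resulting list can be looped over to scrape each stock in turn.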
To run the script, pass the ticker symbols as arguments:
This will scrape and display data for Apple Inc. (AAPL), Microsoft Corporation (MSFT), and Tesla Inc. (TSLA).
8. Avoid getting blocked
Websites often detect and block automated scraping using rate limits, IP blocks, and checks on browsing patterns. Here are some effective ways to stay undetected when web scraping:
1. Random Intervals Between Requests
Adding random delays between requests is a simple way to avoid detection. This basic method can make your scraping less obvious to websites.
Here's how to add random delays in your Playwright script:
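A small sketch of the idea, using asyncio.sleep so the pause does not block the event loop that Playwright's async API runs on. The helper name random_delay is illustrative.

```python
import asyncio
import random


async def random_delay(min_seconds: float = 2.0, max_seconds: float = 5.0) -> float:
    """Sleep for a random duration in [min_seconds, max_seconds] and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    await asyncio.sleep(delay)
    return delay
```

Awaiting random_delay() between two page.goto() calls spaces out the requests.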
This script introduces a random delay of 2 to 5 seconds between requests, making the actions less predictable and reducing the likelihood of being flagged as a bot.
2. Setting and Switching User-Agents
Websites often use User-Agent strings to identify the browser and device behind each request. By rotating User-Agent strings, you can make your scraping requests appear to come from different browsers and devices, helping you avoid detection.
Here's how to implement User-Agent rotation in Playwright:
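A minimal sketch, assuming browser is a Playwright Browser object. The three User-Agent strings are illustrative examples and the helper name is hypothetical. In practice you would maintain a longer, up-to-date list.

```python
import random

# Illustrative User-Agent strings; replace them with a current list of your own.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.5 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0",
]


async def new_page_with_random_user_agent(browser):
    """Open a page in a fresh browser context that uses a randomly chosen User-Agent."""
    user_agent = random.choice(USER_AGENTS)
    # The User-Agent is set per context, so each call gets its own context.
    context = await browser.new_context(user_agent=user_agent)
    page = await context.new_page()
    return page, user_agent
```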
This method uses a list of User-Agent strings and randomly selects one for each request. This technique helps mask your scraper's identity and reduces the likelihood of being blocked.
Note: You can refer to websites like useragentstring.com to get a comprehensive list of User-Agent strings.
3. Using Playwright-Stealth
To further minimize detection and enhance your scraping efforts, you can use the playwright-stealth library, which applies various techniques to make your scraping activities look like a real user.
First, install playwright-stealth:
Then, modify your script:
These techniques can help avoid blocking, but you might still face issues. If so, try more advanced methods like using proxies, rotating IP addresses, or implementing CAPTCHA solvers. You can check out the detailed guide 21 tips to crawl websites without getting blocked. It’s your go-to guide on choosing proxies wisely, fighting Cloudflare, solving CAPTCHAs, avoiding honeytraps, and more.

9. Export scraped stock data to CSV
After scraping the desired stock data, the next step is to export it into a CSV file to make it easy to analyze, share with others, or import into other data processing tools.
Here's how you can save the extracted data to a CSV file:
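A minimal sketch of the export step, assuming the scraped results are a list of dictionaries that all share the same keys. The function name save_to_csv is illustrative.

```python
import csv
from typing import Dict, List


def save_to_csv(rows: List[Dict[str, str]], filename: str = "stock_data.csv") -> int:
    """Write the scraped rows to a CSV file and return how many rows were written."""
    if not rows:
        return 0  # nothing to write, so no file is created
    fieldnames = list(rows[0].keys())  # column headers come from the first row
    with open(filename, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Opening the file with newline="" follows the csv module's recommendation and avoids blank lines between rows on Windows.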
The code starts by gathering data for each ticker symbol. After that, it creates a CSV file named stock_data.csv. It then uses Python's csv.DictWriter class to write the data, starting with the column headers using the writeheader() method, and adding each row of data using the writerows() method.
10. Putting everything together
Let’s pull everything together into a single script. This final code snippet includes all the steps from scraping data from Yahoo Finance to exporting it to a CSV file.
You can run the script from the terminal by providing one or more stock ticker symbols as command-line arguments.
After running the script, the CSV file named stock_data.csv will be created in the same directory. This file will contain all the data in an organized way. The CSV file will look like this:
11. Deploying the code to Apify
With your scraper ready, it’s time to deploy it to the cloud using Apify. This will allow you to run your scraper on a schedule and utilize Apify’s powerful features. For this task, we’ll use the Python Playwright template for a quick setup. On Apify, scrapers are called Actors.
Start by cloning the Playwright + Chrome template from the Apify Python template repository.
To get started, you'll need to install the Apify CLI, which will help you manage your Actor. On macOS or Linux, you can do this using Homebrew:
Or, via NPM:
With the CLI installed, create a new Actor using the Python Playwright + Chrome template:
This command will set up a project named yf-scraper in your directory. It installs all the necessary dependencies and provides some boilerplate code to get you started.
Navigate to your new project folder and open it with your favorite code editor. In this example, I’m using VS Code:
The template comes with a fully functional scraper. You can test it by running the command apify run to see it in action. The results will be saved in storage/datasets.
Next, modify the code in src/main.py to tailor it for scraping Yahoo Finance.
Here’s the modified code:
Before running the code, update the input_schema.json file in the .actor/ directory to include the Yahoo Finance quote page URL and also add a tickers field.
Here's the updated input_schema.json file:
Also, update the input.json file by changing the URL to the Yahoo Finance page to prevent conflicts during execution. Alternatively, you can simply delete this file.
To run your Actor, run this command in your terminal:
The scraped results will be saved in storage/datasets, where each ticker will have its own JSON file as shown below:
To deploy your Actor, first create an Apify account if you don’t already have one. Then, get your API Token from Apify Console under Settings → Integrations, and finally log in with your token using the following command:
Finally, push your Actor to Apify with:
After a few moments, your Actor should appear in the Apify Console under Actors → My actors.
Your scraper is now ready to run on the Apify platform. Click the "Start" button to begin. Once the run is complete, you can preview and download your data in various formats from the "Storage" tab.
Bonus: A key advantage of running your scrapers on Apify is the option to save different configurations for the same Actor and set up automatic scheduling. Let's set this up for our Playwright Actor.
On the Actor page, click on Create empty task.
Next, click on Actions and then Schedule.
Finally, select how often you want the Actor to run and click Create.
Perfect! Your Actor is now set to run automatically at the time you specified. You can view and manage all your scheduled runs in the "Schedules" tab of the Apify platform.
To begin scraping with Python on the Apify platform, you can utilize Python code templates. These templates are available for popular libraries such as Requests, Beautiful Soup, Scrapy, Playwright, and Selenium. Using these templates allows you to quickly build scrapers for various web scraping tasks.
Does Yahoo Finance have an API?
Yahoo Finance provides a free API that gives users access to a wealth of financial information. This includes real-time stock quotes, historical market data, and the latest financial news. The API offers various endpoints, allowing you to retrieve information in different formats like JSON, CSV, and XML. You can easily integrate the data into your projects, using it in whatever way best suits your needs.
Yahoo Finance is open for business
You've built a practical system to extract financial data from Yahoo Finance using Playwright. This code handles multiple ticker symbols and saves the results to a CSV file. You've learned how to navigate around blocking mechanisms, keeping your scrapers up and running.
The Apify platform and its Actor framework now allow you to scale your web scraping efforts. You can schedule your scraper to run when it's most useful for you. We hope you can put these tools to work for you and your understanding of the markets.
