Scraping web data from Google Finance can be challenging for three reasons:
It has a complex HTML structure
It's updated frequently
It requires precise CSS or XPath selectors
This guide will show you how to overcome these challenges using Python, step by step. You'll find this tutorial easy to follow, and by the end, you'll have fully functional code ready to extract the financial data you need from Google Finance.
Does Google Finance allow scraping?
Yes, you can generally scrape Google Finance. Most of the data available on the Google Finance website is publicly accessible. However, you should respect their terms of service and avoid overwhelming their servers.
How to scrape Google Finance using Python
Follow this step-by-step tutorial to learn how to create a web scraper for Google Finance using Python.
1. Setup and prerequisites
Before you start, make sure your development environment is ready:
Install Python: Download and install the latest version of Python from the official Python website.
Choose an IDE: Use an IDE like PyCharm, Visual Studio Code, or Jupyter Notebook for your development work.
Navigate into the project directory and install Playwright:
cd google-finance-scraper
poetry add playwright
poetry run playwright install
Google Finance uses JavaScript to load content dynamically. Playwright can render JavaScript, making it suitable for scraping dynamic content from Google Finance.
Open the pyproject.toml file to check your project's dependencies, which should include playwright.
Note: At the time of writing, the version of playwright is 1.46.0, but it may change. Check for the latest version and update your pyproject.toml if necessary.
Finally, create a main.py file within the google_finance_scraper folder to write your scraping logic.
Your environment is now set up, and you're ready to start writing the Python Playwright code to scrape Google Finance.
2. Connect to the target Google Finance page
To begin, let's launch a Chromium browser instance using Playwright. While Playwright supports various browser engines, we'll use Chromium for this tutorial:
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as playwright:
        # Launch a Chromium browser
        browser = await playwright.chromium.launch(headless=False)
        context = await browser.new_context()
        page = await context.new_page()

if __name__ == "__main__":
    asyncio.run(main())
To run this script, you'll need to execute the main() function using an event loop at the end of your script.
Next, navigate to the Google Finance page for the stock you want to scrape. The URL format for a Google Finance stock page looks like this:
https://www.google.com/finance/quote/{ticker_symbol}
A ticker symbol is a unique code that identifies a publicly traded company on a stock exchange, such as AAPL for Apple Inc. or TSLA for Tesla, Inc. When the ticker symbol changes, the URL also changes. Therefore, you should replace {ticker_symbol} with the specific stock ticker you want to scrape.
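Building the quote URL from a ticker is a simple f-string. Here's a small helper (the function name is illustrative) that constructs the URL for any ticker:

```python
def quote_url(ticker_symbol: str) -> str:
    """Build the Google Finance quote URL for a ticker such as 'AAPL:NASDAQ'."""
    return f"https://www.google.com/finance/quote/{ticker_symbol}"

print(quote_url("AAPL:NASDAQ"))  # https://www.google.com/finance/quote/AAPL:NASDAQ
print(quote_url("TSLA:NASDAQ"))  # https://www.google.com/finance/quote/TSLA:NASDAQ
```

Note that the exchange suffix (e.g., :NASDAQ) is part of the ticker path, not a query parameter.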
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as playwright:
        # ...
        ticker_symbol = "AAPL:NASDAQ"  # Replace with the desired ticker symbol
        google_finance_url = f"https://www.google.com/finance/quote/{ticker_symbol}"
        await page.goto(google_finance_url)  # Navigate to the Google Finance page

if __name__ == "__main__":
    asyncio.run(main())
Here's the complete script so far:
import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as playwright:
        # Launch a Chromium browser
        browser = await playwright.chromium.launch(headless=False)
        context = await browser.new_context()
        page = await context.new_page()

        ticker_symbol = "AAPL:NASDAQ"  # Replace with the desired ticker symbol
        google_finance_url = f"https://www.google.com/finance/quote/{ticker_symbol}"

        # Navigate to the Google Finance page
        await page.goto(google_finance_url)

        # Wait for a few seconds
        await asyncio.sleep(3)

        # Close the browser
        await browser.close()

if __name__ == "__main__":
    asyncio.run(main())
When you run this script, it will open the Google Finance page, wait a few seconds, and then close the browser.
Great! Now, you just have to change the ticker symbol to scrape data for any stock of your choice.
Note that launching the browser with the UI (headless=False) is ideal for testing and debugging. If you want to save resources and run the browser in the background, switch to headless mode:
browser = await playwright.chromium.launch(headless=True)
3. Inspect the page to identify selectors
To effectively scrape data, you first need to understand the DOM structure of the webpage. Suppose you want to extract the regular market price ($229.79), change (+1.46), and change percent (+3.30%). These values are all contained within a single div element.
You can use selectors div.YMlKec.fxKbKc to extract the price, div.enJeMd div.JwB6zf for the percentage change, and span.P2Luy.ZYVHBb for the value change from Google Finance.
Great! Next, let's look at how to extract the market close time, which is displayed as "06:02:19 UTC-4" on the page.
To select the market close time, use this XPath expression:
//div[contains(text(), "Closed:")]
Now, let's move on to extracting critical company data like market cap, previous close, and volume from the table:
As you can see, the data is structured in a table, with multiple div tags representing each field, from "Previous Close" to "Primary Exchange".
You can use the selectors .mfs7Fc to extract labels and .P6K39c to extract corresponding values from the Google Finance table. These selectors target elements by their class names, allowing you to retrieve and process the table's data in pairs.
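Since the labels and values come back as two parallel lists, you can zip them together into a dictionary. Here's a minimal sketch of that pairing step using plain strings (the sample values are illustrative, not live market data):

```python
# Text as it might be extracted from the .mfs7Fc (label) and .P6K39c (value) elements
labels = ["Previous close", "Market cap", "Avg Volume"]
values = ["$228.33", "3.46T USD", "64.93M"]

table_data = {}
for label, value in zip(labels, values):
    # Normalize labels to snake_case keys, e.g. "Previous close" -> "previous_close"
    key = label.strip().lower().replace(" ", "_")
    table_data[key] = value.strip()

print(table_data)
# {'previous_close': '$228.33', 'market_cap': '3.46T USD', 'avg_volume': '64.93M'}
```

The same normalization is used later in the full script so that the scraped labels line up with the dictionary keys.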
4. Scrape the stock data
Now that you've identified the elements you need, it's time to write the Playwright script to extract the data from Google Finance.
Let’s define a new function named scrape_data that will handle the scraping process. This function takes a ticker symbol, navigates to the Google Finance page, and returns a dictionary containing the extracted financial data.
The code first navigates to the stock's page and extracts various metrics, like price and market cap, using query_selector and query_selector_all — the standard Playwright methods for selecting elements via CSS selectors or XPath expressions.
The text of each element is then extracted using inner_text() and stored in a dictionary, where each key represents a financial metric (e.g., price, market cap) and each value is the corresponding extracted text. Finally, the browser session is closed to free up resources.
Now, define the main function that orchestrates the entire process by iterating over each ticker and collecting data.
async def main():
    # Define the ticker symbol
    ticker = "AAPL"
    # Append ":NASDAQ" to the ticker for the Google Finance URL
    ticker = f"{ticker}:NASDAQ"
    async with async_playwright() as playwright:
        # Collect data for the ticker
        data = await scrape_data(playwright, ticker)
        print(data)

# Run the main function
if __name__ == "__main__":
    asyncio.run(main())
At the end of the scraping process, the following data will be printed in the console:
5. Scrape multiple stocks
So far, we've scraped data for a single stock. To gather data for multiple stocks at once from Google Finance, we can modify the script to accept ticker symbols as command-line arguments and process each one. Make sure to import the sys module.
import sys

async def main():
    # Get ticker symbols from command-line arguments
    if len(sys.argv) < 2:
        print("Please provide at least one ticker symbol as a command-line argument.")
        sys.exit(1)
    tickers = sys.argv[1:]
    async with async_playwright() as playwright:
        results = []
        for ticker in tickers:
            data = await scrape_data(playwright, f"{ticker}:NASDAQ")
            results.append(data)
        print(results)

# Run the main function
if __name__ == "__main__":
    asyncio.run(main())
To run the script, pass the ticker symbols as arguments:
python google_finance_scraper/main.py aapl meta amzn
This will scrape and display data for Apple, Meta, and Amazon.
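The loop above processes tickers one after another. Because scrape_data is a coroutine, you could also run all tickers concurrently with asyncio.gather — a sketch, assuming the scrape_data function defined earlier (note that concurrent requests increase the risk of being rate-limited):

```python
import asyncio

async def scrape_all(playwright, tickers):
    # Launch one scrape_data coroutine per ticker and await them all concurrently
    tasks = [scrape_data(playwright, f"{ticker}:NASDAQ") for ticker in tickers]
    return await asyncio.gather(*tasks)
```

Each scrape_data call opens its own browser, so the tasks don't share state; still, sequential scraping with delays is the safer default (see the next section).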
6. Avoid getting blocked
Websites often detect and prevent automated scraping using techniques such as rate limiting, IP blocking, and analyzing browsing patterns. When scraping data from websites, it's crucial to employ strategies to avoid detection. Here are some effective ways to stay undetected:
1. Random intervals between requests
A simple method to reduce the risk of detection is to introduce random delays between requests. This straightforward technique can significantly lower the chances of being identified as an automated scraper.
Here's how to add random delays in your Playwright script:
import asyncio
import random
from playwright.async_api import Playwright, async_playwright

async def scrape_data(playwright: Playwright, ticker: str):
    browser = await playwright.chromium.launch()
    context = await browser.new_context()
    page = await context.new_page()
    url = f"https://www.google.com/finance/quote/{ticker}"
    await page.goto(url)

    # Random delay to mimic human-like behavior
    await asyncio.sleep(random.uniform(2, 5))

    # Your scraping logic here...

    await context.close()
    await browser.close()

async def main():
    async with async_playwright() as playwright:
        await scrape_data(playwright, "AAPL:NASDAQ")

if __name__ == "__main__":
    asyncio.run(main())
This script introduces a random delay of 2 to 5 seconds between requests, making the actions less predictable and reducing the likelihood of being flagged as a bot.
2. Setting and switching user-agents
Websites often use User-Agent strings to identify the browser and device behind each request. By rotating User-Agent strings, you can make your scraping requests appear to come from different browsers and devices, helping you avoid detection.
Here's how to implement User-Agent rotation in Playwright:
import asyncio
import random
from playwright.async_api import Playwright, async_playwright

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0",
]

async def scrape_data(playwright: Playwright, ticker: str) -> None:
    browser = await playwright.chromium.launch(headless=True)
    context = await browser.new_context(user_agent=random.choice(user_agents))
    page = await context.new_page()
    url = f"https://www.google.com/finance/quote/{ticker}"
    await page.goto(url)

    # Your scraping logic goes here...

    await context.close()
    await browser.close()

async def main():
    async with async_playwright() as playwright:
        await scrape_data(playwright, "AAPL:NASDAQ")

if __name__ == "__main__":
    asyncio.run(main())
This method uses a list of User-Agent strings and randomly selects one for each request. This technique helps mask your scraper's identity and reduces the likelihood of being blocked.
📌
Note: You can refer to websites like useragentstring.com to get a comprehensive list of User-Agent strings.
3. Using playwright-stealth
To further minimize detection and enhance your scraping efforts, you can use the playwright-stealth library, which applies various techniques to make your scraping activities look like a real user.
First, install playwright-stealth:
poetry add playwright-stealth
If you encounter a ModuleNotFoundError for pkg_resources, it’s likely because the setuptools package is not installed. To resolve this, also install setuptools:
poetry add setuptools
Then, modify your script:
import asyncio
from playwright.async_api import Playwright, async_playwright
from playwright_stealth import stealth_async

async def scrape_data(playwright: Playwright, ticker: str) -> None:
    browser = await playwright.chromium.launch(headless=True)
    context = await browser.new_context()
    page = await context.new_page()
    # Apply stealth techniques to the page to avoid detection
    await stealth_async(page)
    url = f"https://www.google.com/finance/quote/{ticker}"
    await page.goto(url)

    # Your scraping logic here...

    await context.close()
    await browser.close()

async def main():
    async with async_playwright() as playwright:
        await scrape_data(playwright, "AAPL:NASDAQ")

if __name__ == "__main__":
    asyncio.run(main())
These techniques can help avoid blocking, but you might still face issues. If so, try more advanced methods like using proxies, rotating IP addresses, or implementing CAPTCHA solvers. You can check out our tips for crawling websites without getting blocked. It’s a go-to guide on choosing proxies wisely, fighting Cloudflare, solving CAPTCHAs, avoiding honeytraps, and more.
Blocked again? Apify Proxy will get you through
Improve the performance of your scrapers by smartly rotating datacenter and residential IP addresses.
7. Export scraped data to CSV
After scraping the desired stock data, the next step is to export it to a CSV file to make it easy to analyze, share with others, or import into other data processing tools.
Here's how you can save the extracted data to a CSV file:
# ...
import csv

async def main() -> None:
    # ...
    async with async_playwright() as playwright:
        # Collect data for all tickers
        results = []
        for ticker in tickers:
            data = await scrape_data(playwright, ticker)
            results.append(data)

    # Define the CSV file name
    csv_file = "financial_data.csv"

    # Write data to CSV
    with open(csv_file, mode="w", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=results[0].keys())
        writer.writeheader()
        writer.writerows(results)

if __name__ == "__main__":
    asyncio.run(main())
The code starts by gathering data for each ticker symbol. It then creates a CSV file named financial_data.csv and uses Python's csv.DictWriter to write the data: writeheader() writes the column headers, and writerows() writes each row of data.
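The same DictWriter pattern can be seen in miniature with the standard library alone — here with two illustrative result dictionaries shaped like scrape_data's output (the values are sample data, not live quotes):

```python
import csv

# Two illustrative result dictionaries, shaped like scrape_data's output
results = [
    {"ticker": "AAPL", "price": "$229.79", "market_cap": "3.46T USD"},
    {"ticker": "META", "price": "$519.10", "market_cap": "1.32T USD"},
]

with open("financial_data.csv", mode="w", newline="") as file:
    # fieldnames defines the column order; each dict key becomes a column header
    writer = csv.DictWriter(file, fieldnames=results[0].keys())
    writer.writeheader()        # first row: ticker,price,market_cap
    writer.writerows(results)   # one CSV row per dictionary
```

Passing newline="" to open() is important on Windows, where omitting it produces blank lines between rows.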
8. Putting everything together
Let’s pull everything together into a single script. This final code snippet includes all the steps from scraping data from Google Finance to exporting it to a CSV file.
import asyncio
import sys
import csv
from playwright.async_api import async_playwright, Playwright

async def scrape_data(playwright: Playwright, ticker: str) -> dict:
    """
    Scrape financial data for a given stock ticker from Google Finance.

    Args:
        playwright (Playwright): The Playwright instance.
        ticker (str): The stock ticker symbol.

    Returns:
        dict: A dictionary containing the scraped financial data.
    """
    financial_data = {
        "ticker": ticker.split(":")[0],
        "price": None,
        "price_change_value": None,
        "price_change_percentage": None,
        "close_time": None,
        "previous_close": None,
        "day_range": None,
        "year_range": None,
        "market_cap": None,
        "avg_volume": None,
        "p/e_ratio": None,
        "dividend_yield": None,
        "primary_exchange": None,
    }
    try:
        # Launch the browser and navigate to the Google Finance page for the ticker
        browser = await playwright.chromium.launch(headless=True)
        context = await browser.new_context()
        page = await context.new_page()
        await page.goto(f"https://www.google.com/finance/quote/{ticker}")

        # Scrape current price
        price_element = await page.query_selector("div.YMlKec.fxKbKc")
        if price_element:
            price_text = await price_element.inner_text()
            financial_data["price"] = price_text.replace(",", "")

        # Scrape price change percentage
        percentage_element = await page.query_selector("div.enJeMd div.JwB6zf")
        if percentage_element:
            percentage_text = await percentage_element.inner_text()
            financial_data["price_change_percentage"] = percentage_text.strip()

        # Scrape price change value
        value_element = await page.query_selector("span.P2Luy.ZYVHBb")
        if value_element:
            value_text = await value_element.inner_text()
            value_parts = value_text.split()
            if value_parts:
                financial_data["price_change_value"] = value_parts[0].replace("$", "")

        # Scrape close time
        close_time_element = await page.query_selector('//div[contains(text(), "Closed:")]')
        if close_time_element:
            close_time_text = await close_time_element.inner_text()
            close_time = close_time_text.split("·")[0].replace("Closed:", "").strip()
            # Replace the narrow no-break space (U+202F) with a regular space
            clean_close_time = close_time.replace("\u202f", " ")
            financial_data["close_time"] = clean_close_time

        # Scrape additional financial data from the table
        label_elements = await page.query_selector_all(".mfs7Fc")
        value_elements = await page.query_selector_all(".P6K39c")
        for label_element, value_element in zip(label_elements, value_elements):
            label = await label_element.inner_text()
            value = await value_element.inner_text()
            label = label.strip().lower().replace(" ", "_")
            if label in financial_data:
                financial_data[label] = value.strip()
    except Exception as e:
        print(f"An error occurred for {ticker}: {str(e)}")
    finally:
        # Ensure the browser is closed even if an exception occurs
        await context.close()
        await browser.close()
    return financial_data

async def main():
    """
    Main function to scrape financial data for multiple stock tickers and save to CSV.
    """
    # Get ticker symbols from command-line arguments
    if len(sys.argv) < 2:
        print("Please provide at least one ticker symbol as a command-line argument.")
        sys.exit(1)
    tickers = sys.argv[1:]
    async with async_playwright() as playwright:
        results = []
        for ticker in tickers:
            data = await scrape_data(playwright, f"{ticker}:NASDAQ")
            results.append(data)

    # Define CSV file name
    csv_file = "financial_data.csv"

    # Write data to CSV
    with open(csv_file, mode="w", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=results[0].keys())
        writer.writeheader()
        writer.writerows(results)

    print(f"Data exported to {csv_file}")

# Run the main function
if __name__ == "__main__":
    asyncio.run(main())
You can run the script from the terminal by providing one or more stock ticker symbols as command-line arguments.
python google_finance_scraper/main.py AAPL META AMZN TSLA
After running the script, the CSV file named financial_data.csv will be created in the same directory. This file will contain all the data in an organized way. The CSV file will look like this:
9. Deploying the code to Apify
With your scraper ready, it’s time to deploy it to the cloud using Apify. This will allow you to run your scraper on a schedule and utilize Apify’s powerful features. For this task, we’ll use the Python Playwright template for a quick setup. On Apify, scrapers are called Actors.
To get started, you'll need to install the Apify CLI, which will help you manage your Actor. On macOS or Linux, you can do this using Homebrew:
brew install apify-cli
Or, via NPM:
npm -g install apify-cli
With the CLI installed, create a new Actor using the Python Playwright + Chrome template:
apify create gf-scraper -t python-playwright
This command will set up a project named gf-scraper in your directory. It installs all the necessary dependencies and provides some boilerplate code to get you started.
Navigate to your new project folder and open it with your favorite code editor. In this example, I’m using VS Code:
cd gf-scraper
code .
The template comes with a fully functional scraper. You can test it by running the command apify run to see it in action. The results will be saved in storage/datasets.
Next, modify the code in src/main.py to tailor it for scraping Google Finance.
Here’s the modified code:
from playwright.async_api import async_playwright
from apify import Actor

async def extract_stock_data(page, ticker):
    financial_data = {
        "ticker": ticker.split(":")[0],
        "price": None,
        "price_change_value": None,
        "price_change_percentage": None,
        "close_time": None,
        "previous_close": None,
        "day_range": None,
        "year_range": None,
        "market_cap": None,
        "avg_volume": None,
        "p/e_ratio": None,
        "dividend_yield": None,
        "primary_exchange": None,
    }

    # Scrape current price
    price_element = await page.query_selector("div.YMlKec.fxKbKc")
    if price_element:
        price_text = await price_element.inner_text()
        financial_data["price"] = price_text.replace(",", "")

    # Scrape price change percentage
    percentage_element = await page.query_selector("div.enJeMd div.JwB6zf")
    if percentage_element:
        percentage_text = await percentage_element.inner_text()
        financial_data["price_change_percentage"] = percentage_text.strip()

    # Scrape price change value
    value_element = await page.query_selector("span.P2Luy.ZYVHBb")
    if value_element:
        value_text = await value_element.inner_text()
        value_parts = value_text.split()
        if value_parts:
            financial_data["price_change_value"] = value_parts[0].replace("$", "")

    # Scrape close time
    close_time_element = await page.query_selector('//div[contains(text(), "Closed:")]')
    if close_time_element:
        close_time_text = await close_time_element.inner_text()
        close_time = close_time_text.split("·")[0].replace("Closed:", "").strip()
        # Replace the narrow no-break space (U+202F) with a regular space
        clean_close_time = close_time.replace("\u202f", " ")
        financial_data["close_time"] = clean_close_time

    # Scrape additional financial data from the table
    label_elements = await page.query_selector_all(".mfs7Fc")
    value_elements = await page.query_selector_all(".P6K39c")
    for label_element, value_element in zip(label_elements, value_elements):
        label = await label_element.inner_text()
        value = await value_element.inner_text()
        label = label.strip().lower().replace(" ", "_")
        if label in financial_data:
            financial_data[label] = value.strip()

    return financial_data

async def main() -> None:
    """
    Main function to run the Apify Actor and extract stock data using Playwright.

    Reads input configuration from the Actor, enqueues URLs for scraping,
    launches Playwright to process requests, and extracts stock data.
    """
    async with Actor:
        # Retrieve input parameters
        actor_input = await Actor.get_input() or {}
        start_urls = actor_input.get("start_urls", [])
        tickers = actor_input.get("tickers", [])
        if not start_urls:
            Actor.log.info("No start URLs specified in actor input. Exiting...")
            await Actor.exit()
        base_url = start_urls[0].get("url", "")

        # Enqueue requests for each ticker
        default_queue = await Actor.open_request_queue()
        for ticker in tickers:
            url = f"{base_url}{ticker}:NASDAQ"
            await default_queue.add_request(url)

        # Launch Playwright and open a new browser context
        Actor.log.info("Launching Playwright...")
        async with async_playwright() as playwright:
            browser = await playwright.chromium.launch(headless=Actor.config.headless)
            context = await browser.new_context()

            # Process requests from the queue
            while request := await default_queue.fetch_next_request():
                url = request.url  # Use attribute access instead of dictionary-style access
                Actor.log.info(f"Scraping {url} ...")
                try:
                    # Open the URL in a new Playwright page
                    page = await context.new_page()
                    await page.goto(url, wait_until="domcontentloaded")
                    # Extract the ticker symbol from the URL
                    ticker = url.rsplit("/", 1)[-1]
                    data = await extract_stock_data(page, ticker)
                    # Push the extracted data to Apify
                    await Actor.push_data(data)
                except Exception as e:
                    Actor.log.exception(f"Error extracting data from {url}: {e}")
                finally:
                    # Ensure the page is closed and the request is marked as handled
                    await page.close()
                    await default_queue.mark_request_as_handled(request)
Before running the code, update the input_schema.json file in the .actor/ directory to include the Google Finance quote page URL and also add a tickers field.
Also, update the input.json file by changing the URL to the Google Finance page to prevent conflicts during execution, or simply delete the file.
To run your Actor, run this command in your terminal:
apify run
The scraped results will be saved in storage/datasets, where each ticker has its own JSON file.
To deploy your Actor, first create an Apify account if you don’t already have one. Then, get your API Token from Apify Console under Settings → Integrations, and finally log in with your token using the following command:
apify login -t YOUR_APIFY_TOKEN
Finally, push your Actor to Apify with:
apify push
After a few moments, your Actor should appear in the Apify Console under Actors → My actors.
Your scraper is now ready to run on the Apify platform. Click the "Start" button to begin. Once the run is complete, you can preview and download your data in various formats from the "Storage" tab.
Bonus: A key advantage of running your scrapers on Apify is the option to save different configurations for the same Actor and set up automatic scheduling. Let's set this up for our Playwright Actor.
On the Actor page, click on Create empty task.
Next, click on Actions and then Schedule.
Finally, select how often you want the Actor to run and click Create.
Perfect! Your Actor is now set to run automatically at the time you specified. You can view and manage all your scheduled runs in the "Schedules" tab of the Apify platform.
How do you scrape with Python on Apify?
To begin scraping with Python on the Apify platform, you can use Python code templates. These templates are available for popular libraries such as Requests, Beautiful Soup, Scrapy, Playwright, and Selenium, and they let you quickly build scrapers for a variety of web scraping tasks.
Does Google Finance have an API?
No, Google Finance does not have a publicly accessible API. It used to have one, but it was deprecated in 2012, and Google has not released a new public API for accessing financial data through Google Finance since.
Conclusion
You've learned how to use Playwright to interact with Google Finance and extract valuable financial data. You've also explored methods to avoid getting blocked and built a solution where you simply pass one or more ticker symbols and all the desired data is stored in a CSV file. Additionally, you now have a solid understanding of how to use the Apify platform and its Actor framework to build scalable web scrapers, and how to schedule your scraper to run at the times most convenient for you.
Deploy your scraping code to the cloud
From code to cloud and everything in between. Apify is a full-stack platform built to handle it all, so you don’t have to.
I am a freelance technical writer based in India. I write quality user guides and blog posts for software development startups. I have worked with more than 10 startups across the globe.