How to scrape Google Finance in Python

This guide will walk you through the process of extracting Google Finance data step by step.

Scraping web data from Google Finance can be challenging for three reasons:

  • It has a complex HTML structure
  • It's updated frequently
  • It requires precise CSS or XPath selectors

This guide will show you how to overcome these challenges using Python, step by step. You'll find this tutorial easy to follow, and by the end, you'll have fully functional code ready to extract the financial data you need from Google Finance.

Does Google Finance allow scraping?

Yes, you can generally scrape Google Finance. Most of the data available on the Google Finance website is publicly accessible. However, you should respect their terms of service and avoid overwhelming their servers.
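
If you want to check programmatically which paths Google allows, Python's standard library can parse robots.txt for you. A minimal sketch:

from urllib.robotparser import RobotFileParser

# Download and parse Google's robots.txt, then check a quote page URL
parser = RobotFileParser("https://www.google.com/robots.txt")
parser.read()
print(parser.can_fetch("*", "https://www.google.com/finance/quote/AAPL:NASDAQ"))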

How to scrape Google Finance using Python

Follow this step-by-step tutorial to learn how to create a web scraper for Google Finance using Python.

1. Setup and prerequisites

Before you start, make sure your development environment is ready:

  • Install Python: Download and install the latest version of Python from the official Python website.
  • Choose an IDE: Use an IDE like PyCharm, Visual Studio Code, or Jupyter Notebook for your development work.
  • Basic Knowledge: Ensure you understand CSS selectors and are comfortable using browser DevTools to inspect page elements.

Next, create a new project using Poetry:

poetry new google-finance-scraper

This command will generate the following project structure:

google-finance-scraper/
├── pyproject.toml
├── README.md
├── google_finance_scraper/
│   └── __init__.py
└── tests/
    └── __init__.py

Navigate into the project directory and install Playwright:

cd google-finance-scraper
poetry add playwright
poetry run playwright install

Google Finance uses JavaScript to load content dynamically. Playwright can render JavaScript, making it suitable for scraping dynamic content from Google Finance.

Open the pyproject.toml file to check your project's dependencies, which should include:

[tool.poetry.dependencies]
python = "^3.12"
playwright = "^1.46.0"

📌 Note: At the time of writing, the latest version of playwright is 1.46.0, but it may change. Check for the latest version and update your pyproject.toml if necessary.

Finally, create a main.py file within the google_finance_scraper folder to write your scraping logic.

Your updated project structure should look like this:

google-finance-scraper/
├── pyproject.toml
├── README.md
├── google_finance_scraper/
│   ├── __init__.py
│   └── main.py
└── tests/
    └── __init__.py

Your environment is now set up, and you're ready to start writing the Python Playwright code to scrape Google Finance.

2. Connect to the target Google Finance page

To begin, let's launch a Chromium browser instance using Playwright. While Playwright supports various browser engines, we'll use Chromium for this tutorial:

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as playwright:
        browser = await playwright.chromium.launch(headless=False)  # Launch a Chromium browser
        context = await browser.new_context()
        page = await context.new_page()

if __name__ == "__main__":
    asyncio.run(main())

Since main() is a coroutine, it has to be executed through an event loop, which is what asyncio.run(main()) does at the end of the script.

Next, navigate to the Google Finance page for the stock you want to scrape. The URL format for a Google Finance stock page looks like this:

https://www.google.com/finance/quote/{ticker_symbol}

A ticker symbol is a unique code that identifies a publicly traded company on a stock exchange, such as AAPL for Apple Inc. or TSLA for Tesla, Inc. Each ticker has its own URL, so replace {ticker_symbol} with the ticker of the stock you want to scrape. Note that Google Finance also includes the exchange in the path, which is why the code below uses AAPL:NASDAQ.

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as playwright:
        # ...

        ticker_symbol = "AAPL:NASDAQ"  # Replace with the desired ticker symbol
        google_finance_url = f"https://www.google.com/finance/quote/{ticker_symbol}"

        await page.goto(google_finance_url)  # Navigate to the Google Finance page


if __name__ == "__main__":
    asyncio.run(main())

Here's the complete script so far:

import asyncio
from playwright.async_api import async_playwright

async def main():
    async with async_playwright() as playwright:
        # Launch a Chromium browser
        browser = await playwright.chromium.launch(headless=False)
        context = await browser.new_context()
        page = await context.new_page()

        ticker_symbol = "AAPL:NASDAQ"  # Replace with the desired ticker symbol
        google_finance_url = f"https://www.google.com/finance/quote/{ticker_symbol}"

        # Navigate to the Google Finance page
        await page.goto(google_finance_url)

        # Wait for a few seconds
        await asyncio.sleep(3)

        # Close the browser
        await browser.close()

if __name__ == "__main__":
    asyncio.run(main())

When you run this script, it will open the Google Finance page for a few seconds before closing the browser.

Google Finance quote page for AAPL

Great! Now, you just have to change the ticker symbol to scrape data for any stock of your choice.


Note that launching the browser with the UI (headless=False) is perfect for testing and debugging. If you want to save resources and run the browser in the background, switch to headless mode:

browser = await playwright.chromium.launch(headless=True)
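
One more tip: the script above pauses with asyncio.sleep(3) just so you can watch the page. When actually scraping, it's more reliable to wait for the page to load than to sleep for a fixed interval, for example:

# Wait until the DOM is ready instead of sleeping for a fixed time
await page.goto(google_finance_url, wait_until="domcontentloaded")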

3. Inspect the page to select elements to scrape

To effectively scrape data, you first need to understand the DOM structure of the webpage. Suppose you want to extract the regular market price ($229.79), change (+1.46), and change percent (+3.30%). These values are all contained within a div element.

scraping Google Finance for market prices

You can use the selector div.YMlKec.fxKbKc to extract the price, div.enJeMd div.JwB6zf for the percentage change, and span.P2Luy.ZYVHBb for the value change.

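Before building the full script, you can quickly sanity-check a selector from the Playwright session you already have open. A minimal sketch, assuming page is on the AAPL quote page as in the earlier snippet:

# Grab the text content of the price element and print it
price = await page.inner_text("div.YMlKec.fxKbKc")
print(price)  # should print the current price, e.g. "$229.79"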

Great! Next, let's look at how to extract the market close time, which is displayed as "06:02:19 UTC-4" on the page.

scraping Google Finance for close times

To select the market close time, use this XPath expression (Playwright accepts XPath expressions as well as CSS selectors):

//div[contains(text(), "Closed:")]

Now, let's move on to extracting key company data like market cap, previous close, and volume from the table:

scraping Google Finance for company data

As you can see, the data is structured as a table, with a div for each field, starting from "Previous Close" and ending with "Primary Exchange".

You can use the selectors .mfs7Fc to extract labels and .P6K39c to extract corresponding values from the Google Finance table. These selectors target elements by their class names, allowing you to retrieve and process the table's data in pairs.

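To confirm the pairing works, you can print each label next to its value. A quick sketch, again assuming page is already on a quote page:

# Fetch all label and value elements, then pair them up positionally
labels = await page.query_selector_all(".mfs7Fc")
values = await page.query_selector_all(".P6K39c")
for label, value in zip(labels, values):
    print(await label.inner_text(), "->", await value.inner_text())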

4. Scrape the stock data

Now that you've identified the elements you need, it's time to write the Playwright script to extract the data from Google Finance.

Let’s define a new function named scrape_data that will handle the scraping process. This function takes a ticker symbol, navigates to the Google Finance page, and returns a dictionary containing the extracted financial data.

Here's how it works:

import asyncio
from playwright.async_api import async_playwright, Playwright

async def scrape_data(playwright: Playwright, ticker: str) -> dict:
    financial_data = {
        "ticker": ticker.split(":")[0],
        "price": None,
        "price_change_value": None,
        "price_change_percentage": None,
        "close_time": None,
        "previous_close": None,
        "day_range": None,
        "year_range": None,
        "market_cap": None,
        "avg_volume": None,
        "p/e_ratio": None,
        "dividend_yield": None,
        "primary_exchange": None,
    }

    try:
        browser = await playwright.chromium.launch(headless=True)
        context = await browser.new_context()
        page = await context.new_page()
        await page.goto(f"https://www.google.com/finance/quote/{ticker}")

        price_element = await page.query_selector("div.YMlKec.fxKbKc")
        if price_element:
            price_text = await price_element.inner_text()
            financial_data["price"] = price_text.replace(",", "")

        percentage_element = await page.query_selector("div.enJeMd div.JwB6zf")
        if percentage_element:
            percentage_text = await percentage_element.inner_text()
            financial_data["price_change_percentage"] = percentage_text.strip()

        value_element = await page.query_selector("span.P2Luy.ZYVHBb")
        if value_element:
            value_text = await value_element.inner_text()
            value_parts = value_text.split()
            if value_parts:
                financial_data["price_change_value"] = value_parts[0].replace(
                    "$", "")

        close_time_element = await page.query_selector('//div[contains(text(), "Closed:")]')
        if close_time_element:
            close_time_text = await close_time_element.inner_text()
            close_time = close_time_text.split(
                "·")[0].replace("Closed:", "").strip()
            clean_close_time = close_time.replace("\u202f", " ")  # replace the narrow no-break space
            financial_data["close_time"] = clean_close_time

        label_elements = await page.query_selector_all(".mfs7Fc")
        value_elements = await page.query_selector_all(".P6K39c")

        for label_element, value_element in zip(label_elements, value_elements):
            label = await label_element.inner_text()
            value = await value_element.inner_text()
            label = label.strip().lower().replace(" ", "_")
            if label in financial_data:
                financial_data[label] = value.strip()

    except Exception as e:
        print(f"An error occurred for {ticker}: {str(e)}")
    finally:
        await context.close()
        await browser.close()

    return financial_data

The code first navigates to the stock's page and extracts metrics like price and market cap using query_selector and query_selector_all, the standard Playwright methods for selecting one or all elements matching a CSS selector or XPath expression.

After that, the text of each element is extracted using inner_text() and stored in a dictionary, where each key represents a financial metric (e.g., price, market cap) and each value is the corresponding extracted text. Finally, the browser session is closed to free up resources.

Now, define the main function that drives the process: it builds the full ticker identifier, calls scrape_data, and prints the result.

async def main():
    # Define the ticker symbol
    ticker = "AAPL"

    # Append ":NASDAQ" to the ticker for the Google Finance URL
    ticker = f"{ticker}:NASDAQ"

    async with async_playwright() as playwright:
        # Collect data for the ticker
        data = await scrape_data(playwright, ticker)
        print(data)

# Run the main function
if __name__ == "__main__":
    asyncio.run(main())

At the end of the scraping process, the following data will be printed in the console:

scraping Google Finance. Extracted data

5. Scrape multiple stocks

So far, we've scraped data for a single stock. To gather data for multiple stocks at once from Google Finance, we can modify the script to accept ticker symbols as command-line arguments and process each one. Make sure to import the sys module.

import sys

async def main():
    # Get ticker symbols from command line arguments
    if len(sys.argv) < 2:
        print("Please provide at least one ticker symbol as a command-line argument.")
        sys.exit(1)

    tickers = sys.argv[1:]
    async with async_playwright() as playwright:
        results = []
        for ticker in tickers:
            data = await scrape_data(playwright, f"{ticker}:NASDAQ")
            results.append(data)
        print(results)

# Run the main function
if __name__ == "__main__":
    asyncio.run(main())

To run the script, pass the ticker symbols as arguments:

python google_finance_scraper/main.py aapl meta amzn

This will scrape and display data for Apple, Meta, and Amazon.

scraping Google Finance. Multiple tickers
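
The loop above scrapes tickers sequentially. If you have many tickers, you could fetch them concurrently with asyncio.gather instead. A hedged sketch of an alternative main() body (each scrape_data() call launches its own browser, so keep the number of concurrent tickers modest):

tickers = sys.argv[1:]
async with async_playwright() as playwright:
    # Schedule one scraping task per ticker and run them concurrently
    tasks = [scrape_data(playwright, f"{ticker}:NASDAQ") for ticker in tickers]
    results = await asyncio.gather(*tasks)
    print(results)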

6. Avoid getting blocked

Websites often detect and prevent automated scraping using techniques such as rate limiting, IP blocking, and analyzing browsing patterns. When scraping data from websites, it's crucial to employ strategies to avoid detection. Here are some effective ways to stay undetected:

1. Random intervals between requests

A simple method to reduce the risk of detection is to introduce random delays between requests. This straightforward technique can significantly lower the chances of being identified as an automated scraper.

Here's how to add random delays in your Playwright script:

import asyncio
import random
from playwright.async_api import Playwright, async_playwright

async def scrape_data(playwright: Playwright, ticker: str):
    browser = await playwright.chromium.launch()
    context = await browser.new_context()
    page = await context.new_page()

    url = f"https://www.google.com/finance/quote/{ticker}"
    await page.goto(url)

    # Random delay to mimic human-like behavior
    await asyncio.sleep(random.uniform(2, 5))

    # Your scraping logic here...

    await context.close()
    await browser.close()

async def main():
    async with async_playwright() as playwright:
        await scrape_data(playwright, "AAPL:NASDAQ")

if __name__ == "__main__":
    asyncio.run(main())

This script introduces a random delay of 2 to 5 seconds between requests, making the actions less predictable and reducing the likelihood of being flagged as a bot.

2. Setting and switching user-agents

Websites often use User-Agent strings to identify the browser and device behind each request. By rotating User-Agent strings, you can make your scraping requests appear to come from different browsers and devices, helping you avoid detection.

Here's how to implement User-Agent rotation in Playwright:

import asyncio
import random
from playwright.async_api import Playwright, async_playwright

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0",
]

async def scrape_data(playwright: Playwright, ticker: str) -> None:
    browser = await playwright.chromium.launch(headless=True)

    context = await browser.new_context(user_agent=random.choice(user_agents))

    page = await context.new_page()

    url = f"https://www.google.com/finance/quote/{ticker}"
    await page.goto(url)

    # Your scraping logic goes here...

    await context.close()
    await browser.close()

async def main():
    async with async_playwright() as playwright:
        await scrape_data(playwright, "AAPL:NASDAQ")

if __name__ == "__main__":
    asyncio.run(main())

This method uses a list of User-Agent strings and randomly selects one for each request. This technique helps mask your scraper's identity and reduces the likelihood of being blocked.

📌 Note: You can refer to websites like useragentstring.com to get a comprehensive list of User-Agent strings.

3. Using playwright-stealth

To further minimize detection and enhance your scraping efforts, you can use the playwright-stealth library, which applies various techniques to make your scraping activities look like a real user.

First, install playwright-stealth:

poetry add playwright-stealth

If you encounter a ModuleNotFoundError for pkg_resources, it’s likely because the setuptools package is not installed. To resolve this, also install setuptools:

poetry add setuptools

Then, modify your script:

import asyncio
from playwright.async_api import Playwright, async_playwright
from playwright_stealth import stealth_async

async def scrape_data(playwright: Playwright, ticker: str) -> None:
    browser = await playwright.chromium.launch(headless=True)
    context = await browser.new_context()
    page = await context.new_page()

    # Apply stealth techniques to the page to reduce detection
    await stealth_async(page)

    url = f"https://www.google.com/finance/quote/{ticker}"
    await page.goto(url)

    # Your scraping logic here...

    await context.close()
    await browser.close()

async def main():
    async with async_playwright() as playwright:
        await scrape_data(playwright, "AAPL:NASDAQ")

if __name__ == "__main__":
    asyncio.run(main())

These techniques can help avoid blocking, but you might still face issues. If so, try more advanced methods like using proxies, rotating IP addresses, or implementing CAPTCHA solvers. You can check out our tips for crawling websites without getting blocked. It’s a go-to guide on choosing proxies wisely, fighting Cloudflare, solving CAPTCHAs, avoiding honeytraps, and more.
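
For example, Playwright can route traffic through a proxy configured at launch. A minimal sketch; the server address and credentials below are placeholders, not a real endpoint:

browser = await playwright.chromium.launch(
    headless=True,
    proxy={
        "server": "http://proxy.example.com:8000",  # hypothetical proxy server
        "username": "user",  # placeholder credentials
        "password": "pass",
    },
)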

7. Export scraped stock data to CSV

After scraping the desired stock data, the next step is to export it into a CSV file to make it easy to analyze, share with others, or import into other data processing tools.

Here's how you can save the extracted data to a CSV file:

# ...

import csv

async def main() -> None:
    # ...

    async with async_playwright() as playwright:

        # Collect data for all tickers
        results = []
        for ticker in tickers:
            data = await scrape_data(playwright, ticker)
            results.append(data)

        # Define the CSV file name
        csv_file = "financial_data.csv"

        # Write data to CSV
        with open(csv_file, mode="w", newline="") as file:
            writer = csv.DictWriter(file, fieldnames=results[0].keys())
            writer.writeheader()
            writer.writerows(results)

if __name__ == "__main__":
    asyncio.run(main())

The code starts by gathering data for each ticker symbol. After that, it creates a CSV file named financial_data.csv and uses Python's csv.DictWriter to write the data, adding the column headers with writeheader() and each row of data with writerows().
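
To verify the export, you can read the file back with csv.DictReader. A quick sketch:

import csv

# Read the exported CSV and print a couple of columns per row
with open("financial_data.csv", newline="") as file:
    for row in csv.DictReader(file):
        print(row["ticker"], row["price"])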

8. Putting everything together

Let’s pull everything together into a single script. This final code snippet includes all the steps from scraping data from Google Finance to exporting it to a CSV file.

import asyncio
import sys
import csv
from playwright.async_api import async_playwright, Playwright

async def scrape_data(playwright: Playwright, ticker: str) -> dict:
    """
    Scrape financial data for a given stock ticker from Google Finance.

    Args:
        playwright (Playwright): The Playwright instance.
        ticker (str): The stock ticker symbol.

    Returns:
        dict: A dictionary containing the scraped financial data.
    """
    financial_data = {
        "ticker": ticker.split(":")[0],
        "price": None,
        "price_change_value": None,
        "price_change_percentage": None,
        "close_time": None,
        "previous_close": None,
        "day_range": None,
        "year_range": None,
        "market_cap": None,
        "avg_volume": None,
        "p/e_ratio": None,
        "dividend_yield": None,
        "primary_exchange": None,
    }

    try:
        # Launch the browser and navigate to the Google Finance page for the ticker
        browser = await playwright.chromium.launch(headless=True)
        context = await browser.new_context()
        page = await context.new_page()
        await page.goto(f"https://www.google.com/finance/quote/{ticker}")

        # Scrape current price
        price_element = await page.query_selector("div.YMlKec.fxKbKc")
        if price_element:
            price_text = await price_element.inner_text()
            financial_data["price"] = price_text.replace(",", "")

        # Scrape price change percentage
        percentage_element = await page.query_selector("div.enJeMd div.JwB6zf")
        if percentage_element:
            percentage_text = await percentage_element.inner_text()
            financial_data["price_change_percentage"] = percentage_text.strip()

        # Scrape price change value
        value_element = await page.query_selector("span.P2Luy.ZYVHBb")
        if value_element:
            value_text = await value_element.inner_text()
            value_parts = value_text.split()
            if value_parts:
                financial_data["price_change_value"] = value_parts[0].replace(
                    "$", "")

        # Scrape close time
        close_time_element = await page.query_selector('//div[contains(text(), "Closed:")]')
        if close_time_element:
            close_time_text = await close_time_element.inner_text()
            close_time = close_time_text.split(
                "·")[0].replace("Closed:", "").strip()
            clean_close_time = close_time.replace("\u202f", " ")  # replace the narrow no-break space
            financial_data["close_time"] = clean_close_time

        # Scrape additional financial data
        label_elements = await page.query_selector_all(".mfs7Fc")
        value_elements = await page.query_selector_all(".P6K39c")

        for label_element, value_element in zip(label_elements, value_elements):
            label = await label_element.inner_text()
            value = await value_element.inner_text()
            label = label.strip().lower().replace(" ", "_")
            if label in financial_data:
                financial_data[label] = value.strip()

    except Exception as e:
        print(f"An error occurred for {ticker}: {str(e)}")
    finally:
        # Ensure browser is closed even if an exception occurs
        await context.close()
        await browser.close()

    return financial_data

async def main():
    """
    Main function to scrape financial data for multiple stock tickers and save to CSV.
    """
    # Get ticker symbols from command line arguments
    if len(sys.argv) < 2:
        print("Please provide at least one ticker symbol as a command-line argument.")
        sys.exit(1)

    tickers = sys.argv[1:]
    async with async_playwright() as playwright:
        results = []
        for ticker in tickers:
            data = await scrape_data(playwright, f"{ticker}:NASDAQ")
            results.append(data)

        # Define CSV file name
        csv_file = "financial_data.csv"

        # Write data to CSV
        with open(csv_file, mode="w", newline="") as file:
            writer = csv.DictWriter(file, fieldnames=results[0].keys())
            writer.writeheader()
            writer.writerows(results)

        print(f"Data exported to {csv_file}")

# Run the main function
if __name__ == "__main__":
    asyncio.run(main())

You can run the script from the terminal by providing one or more stock ticker symbols as command-line arguments.

python google_finance_scraper/main.py AAPL META AMZN TSLA

After running the script, the CSV file named financial_data.csv will be created in the same directory. This file will contain all the data in an organized way. The CSV file will look like this:

scraping Google Finance. Data in CSV

9. Deploying the code to Apify

With your scraper ready, it’s time to deploy it to the cloud using Apify. This will allow you to run your scraper on a schedule and utilize Apify’s powerful features. For this task, we’ll use the Python Playwright template for a quick setup. On Apify, scrapers are called Actors.

We'll start from the Playwright + Chrome template in the Apify Python template repository.

To get started, you'll need to install the Apify CLI, which will help you manage your Actor. On macOS or Linux, you can do this using Homebrew:

brew install apify-cli

Or, via NPM:

npm -g install apify-cli

With the CLI installed, create a new Actor using the Python Playwright + Chrome template:

apify create gf-scraper -t python-playwright

This command will set up a project named gf-scraper in your directory. It installs all the necessary dependencies and provides some boilerplate code to get you started.

Navigate to your new project folder and open it with your favorite code editor. In this example, I’m using VS Code:

cd gf-scraper
code .

The template comes with a fully functional scraper. You can test it by running the command apify run to see it in action. The results will be saved in storage/datasets.

Next, modify the code in src/main.py to tailor it for scraping Google Finance.

Here’s the modified code:

from playwright.async_api import async_playwright
from apify import Actor

async def extract_stock_data(page, ticker):
    financial_data = {
        "ticker": ticker.split(":")[0],
        "price": None,
        "price_change_value": None,
        "price_change_percentage": None,
        "close_time": None,
        "previous_close": None,
        "day_range": None,
        "year_range": None,
        "market_cap": None,
        "avg_volume": None,
        "p/e_ratio": None,
        "dividend_yield": None,
        "primary_exchange": None,
    }

    # Scrape current price
    price_element = await page.query_selector("div.YMlKec.fxKbKc")
    if price_element:
        price_text = await price_element.inner_text()
        financial_data["price"] = price_text.replace(",", "")

    # Scrape price change percentage
    percentage_element = await page.query_selector("div.enJeMd div.JwB6zf")
    if percentage_element:
        percentage_text = await percentage_element.inner_text()
        financial_data["price_change_percentage"] = percentage_text.strip()

    # Scrape price change value
    value_element = await page.query_selector("span.P2Luy.ZYVHBb")
    if value_element:
        value_text = await value_element.inner_text()
        value_parts = value_text.split()
        if value_parts:
            financial_data["price_change_value"] = value_parts[0].replace(
                "$", "")

    # Scrape close time
    close_time_element = await page.query_selector('//div[contains(text(), "Closed:")]')
    if close_time_element:
        close_time_text = await close_time_element.inner_text()
        close_time = close_time_text.split(
            "·")[0].replace("Closed:", "").strip()
        clean_close_time = close_time.replace("\u202f", " ")  # replace the narrow no-break space
        financial_data["close_time"] = clean_close_time

    # Scrape additional financial data
    label_elements = await page.query_selector_all(".mfs7Fc")
    value_elements = await page.query_selector_all(".P6K39c")

    for label_element, value_element in zip(label_elements, value_elements):
        label = await label_element.inner_text()
        value = await value_element.inner_text()
        label = label.strip().lower().replace(" ", "_")
        if label in financial_data:
            financial_data[label] = value.strip()
    return financial_data

async def main() -> None:
    """
    Main function to run the Apify Actor and extract stock data using Playwright.

    Reads input configuration from the Actor, enqueues URLs for scraping,
    launches Playwright to process requests, and extracts stock data.
    """
    async with Actor:

        # Retrieve input parameters
        actor_input = await Actor.get_input() or {}
        start_urls = actor_input.get("start_urls", [])
        tickers = actor_input.get("tickers", [])

        if not start_urls:
            Actor.log.info(
                "No start URLs specified in actor input. Exiting...")
            await Actor.exit()
        base_url = start_urls[0].get("url", "")

        # Enqueue requests for each ticker
        default_queue = await Actor.open_request_queue()
        for ticker in tickers:
            url = f"{base_url}{ticker}:NASDAQ"
            await default_queue.add_request(url)

        # Launch Playwright and open a new browser context
        Actor.log.info("Launching Playwright...")
        async with async_playwright() as playwright:
            browser = await playwright.chromium.launch(headless=Actor.config.headless)
            context = await browser.new_context()

            # Process requests from the queue
            while request := await default_queue.fetch_next_request():
                url = request.url
                Actor.log.info(f"Scraping {url} ...")

                try:
                    # Open the URL in a new Playwright page
                    page = await context.new_page()
                    await page.goto(url, wait_until="domcontentloaded")

                    # Extract the ticker symbol from the URL
                    ticker = url.rsplit("/", 1)[-1]
                    data = await extract_stock_data(page, ticker)

                    # Push the extracted data to Apify
                    await Actor.push_data(data)
                except Exception as e:
                    Actor.log.exception(
                        f"Error extracting data from {url}: {e}")
                finally:
                    # Ensure the page is closed and the request is marked as handled
                    await page.close()
                    await default_queue.mark_request_as_handled(request)

Before running the code, update the input_schema.json file in the .actor/ directory to include the Google Finance quote page URL and also add a tickers field.

Here's the updated input_schema.json file:

{
    "title": "Python Playwright Scraper",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "start_urls": {
            "title": "Start URLs",
            "type": "array",
            "description": "URLs to start with",
            "prefill": [
                {
                    "url": "https://www.google.com/finance/quote/"
                }
            ],
            "editor": "requestListSources"
        },
        "tickers": {
            "title": "Tickers",
            "type": "array",
            "description": "List of stock ticker symbols to scrape data for",
            "items": {
                "type": "string"
            },
            "prefill": [
                "AAPL",
                "GOOGL",
                "AMZN"
            ],
            "editor": "stringList"
        },
        "max_depth": {
            "title": "Maximum depth",
            "type": "integer",
            "description": "Depth to which to scrape to",
            "default": 1
        }
    },
    "required": [
        "start_urls",
        "tickers"
    ]
}

Also, update the input.json file by changing the URL to the Google Finance page to prevent conflicts during execution, or simply delete the file.

scraping Google Finance. INPUT-json
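
For reference, an input matching the schema above might look like this (the exact file location depends on the template, so treat this as a sketch):

{
    "start_urls": [{ "url": "https://www.google.com/finance/quote/" }],
    "tickers": ["AAPL", "META", "AMZN"]
}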

To run your Actor, run this command in your terminal:

apify run

The scraped results will be saved in storage/datasets, where each ticker will have its own JSON file, as shown below:

scraping Google Finance. Stored data

To deploy your Actor, first create an Apify account if you don’t already have one. Then, get your API Token from Apify Console under Settings → Integrations, and finally log in with your token using the following command:

apify login -t YOUR_APIFY_TOKEN

Finally, push your Actor to Apify with:

apify push

After a few moments, your Actor should appear in the Apify Console under Actors → My actors.

scraping Google Finance with Gf Scraper

Your scraper is now ready to run on the Apify platform. Click the "Start" button to begin. Once the run is complete, you can preview and download your data in various formats from the "Storage" tab.

scraping Google Finance with Gf Scraper. Apify storage.

Bonus: A key advantage of running your scrapers on Apify is the option to save different configurations for the same Actor and set up automatic scheduling. Let's set this up for our Playwright Actor.

On the Actor page, click on Create empty task.

scraping Google Finance with Gf Scraper. Creating a task

Next, click on Actions and then Schedule.

scraping Google Finance with Gf Scraper. Scheduling a task

Finally, select how often you want the Actor to run and click Create.

scraping Google Finance with Gf Scraper. Scheduled runs

Perfect! Your Actor is now set to run automatically at the time you specified. You can view and manage all your scheduled runs in the "Schedules" tab of the Apify platform.


To begin scraping with Python on the Apify platform, you can use Python code templates. These are available for popular libraries such as Requests, Beautiful Soup, Scrapy, Playwright, and Selenium, and they let you quickly build scrapers for various web scraping tasks.


Does Google Finance have an API?

No, Google Finance does not have a publicly accessible API. Although it used to have one, it was deprecated in 2012. Since then, Google has not released a new public API for accessing financial data through Google Finance.

Conclusion

You've learned how to use Playwright to interact with Google Finance and extract financial data, explored methods to avoid getting blocked, and built a solution where you simply pass one or more ticker symbols and all the desired data is stored in a CSV file. You also now have a solid understanding of how to use the Apify platform and its Actor framework to build scalable web scrapers, and how to schedule your scraper to run at the times most convenient for you.
