Ask any developer who has built bots—whether for automation or web scraping—and they'll tell you that Cloudflare is the biggest challenge. That's because their anti-bot solutions are highly effective and continuously updated to thwart most automated scripts.
Plus, with the rise of AI agents, more and more sites are adopting Cloudflare to protect against bots. So it's becoming an increasingly critical hurdle.
In this guide, you'll learn how to build a solution to bypass Cloudflare efficiently and consistently across any protected site. We'll walk you through the entire process, from setting up your project to testing your Cloudflare-ready scraper in the cloud!
What are Cloudflare's anti-scraping defenses?
Cloudflare is a global network that many websites rely on for performance and web security services. It's best known for protecting sites from malicious bots and abuse.
Specifically, its anti-scraping solutions pose a serious challenge through a mix of defenses: JavaScript-based challenges, IP reputation filtering, TLS fingerprinting, behavior analysis, rate limiting, and even an AI Labyrinth designed to stop AI crawlers.
If your scraper triggers any of these protections, you might receive an HTTP `403 Forbidden` or `429 Too Many Requests` error—or be denied access entirely.
How does Cloudflare detect bots?
To identify bots, Cloudflare relies on both server-side and client-side methods. If a request comes from an untrusted IP or looks suspicious, it gets blocked immediately. Otherwise, Cloudflare serves a client-side protection page to the browser.
If the client passes the background checks performed in the browser, access is granted automatically. If not, or if verification is required, a one-click Turnstile challenge appears:
Data about the challenge interaction is collected, analyzed locally, and sent to Cloudflare's servers to determine whether the visitor is a bot or a legitimate user.
Server-side detection techniques:
- IP reputation and ASN checks
- Header anomalies and malformed requests
- TLS fingerprint mismatches and JA3/JA4 analysis
Client-side detection techniques:
- JavaScript challenges and browser fingerprinting checks (e.g., canvas, fonts, WebGL)
- Mouse movement and interaction tracking when clicking on the Turnstile checkbox
- Cookie and local storage verification
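To give an intuition for the server-side "header anomalies" check, here's a simplified sketch. This is not Cloudflare's actual logic—real systems combine far more signals (TLS fingerprints, IP reputation, behavior)—just an assumption-level illustration of how obviously bot-like headers can be flagged:

```javascript
// Simplified illustration of a server-side header-anomaly check.
// Real anti-bot systems are far more sophisticated; this only
// flags headers that look obviously automated.
function looksAutomated(headers) {
    const ua = headers['user-agent'] ?? '';
    if (ua === '') return true;                             // browsers always send a User-Agent
    if (/curl|wget|python-requests/i.test(ua)) return true; // known HTTP client signatures
    if (!('accept-language' in headers)) return true;       // real browsers send Accept-Language
    return false;
}
```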
How to bypass Cloudflare
In this guided section, you'll learn how to systematically bypass Cloudflare using Crawlee, Playwright, and Camoufox.
And what better site to test this setup on than Cloudflare itself? If we can bypass their anti-bot protection on their own domain, we can bypass it anywhere. Specifically, we'll target the Top Developer Discussions page from the Cloudflare Community forum:
As you'll notice, this site is protected by Cloudflare's anti-bot solution. Once we successfully bypass the bot protection, we'll also scrape some data from the page.
The entire process will involve the following steps:
- Prerequisites and project setup
- Get familiar with the "Crawlee + Playwright + Camoufox" template
- Residential proxy configuration
- Connect to the target page
- Implement the scraping logic
- Collect the scraped data
- Complete code
- Run the scraper
Let's dive in!
1. Prerequisites and project setup
The easiest way to bypass Cloudflare protections is by using the "Crawlee + Playwright + Camoufox" template available on Apify. With this approach, you'll build a cloud-based scraper that can bypass Cloudflare every time—without worrying about local setup.
Everything—from configuration to coding and deployment—will be handled entirely in the Apify online platform.
To follow this approach, make sure you have:
- An Apify account
- A basic understanding of how Apify works
- Knowledge of what Crawlee is and how it works, especially of `PlaywrightCrawler`
- Familiarity with how data parsing works in Playwright
- A basic understanding of what Camoufox is and how it works
- Knowledge of async programming in JavaScript
Now, to initialize your Cloudflare-ready scraping project on the Apify platform:
- Log in
- Reach the Console
- Under the "Actors" dropdown, select "Development" and click the "Develop new" button:
Next, select the "View all templates" option:
In the "JavaScript" section, click on the "Crawlee + Playwright + Camoufox" card:
Inspect the starter project code and select "Use this template" to fork it:
Wait while Apify creates a new Actor based on the selected template.
You'll then be redirected to the Apify Web IDE, where you can customize your Actor. For example, name it "Cloudflare Community Scraper." Also, you can write your scraping logic directly in the browser—no need to install libraries or set up a local environment:
Under the hood, the "Crawlee + Playwright + Camoufox" template automatically sets up and integrates the following libraries for you:
- Crawlee: A web scraping and automation library for Node.js and Python, simplifying the process of building reliable crawlers and scrapers with features like request management and task scheduling.
- Playwright: A Node.js library developed by Microsoft for reliable cross-browser automation, enabling programmatic interaction with web pages.
- Camoufox: An open-source anti-detect browser built for robust fingerprint injection and advanced anti-bot evasion. It's a stealthy, minimalistic, custom build of Firefox, purpose-built for web scraping, and it integrates natively with Playwright.
2. Get familiar with the "Crawlee + Playwright + Camoufox" template
Before jumping into coding, you should first get familiar with the existing code. In the Web IDE, you'll notice that the `src/` folder contains two key files:

- `main.js`: Handles the Crawlee initialization logic, including integration with Playwright and Camoufox
- `routes.js`: Defines the Crawlee route-handling logic, where the actual scraping behavior is implemented

Here's what `main.js` should look like:
As you can see, the template has already configured `PlaywrightCrawler` to work with Camoufox for you. This means your Actor will automatically render pages using a Camoufox instance controlled by Playwright.
And this is `routes.js`:
3. Residential proxy configuration
In `main.js`, notice how the Crawlee instance is already set up to work with Apify proxies:

```javascript
const proxyConfiguration = await Actor.createProxyConfiguration();
```

If you prefer to supply your own proxies instead, you can pass them via the `ProxyConfigurationOptions.proxyUrls` option.

In the Crawlee + Playwright + Camoufox setup, connecting through a residential proxy is key to bypassing Cloudflare. The reason is that, if you deploy your code on a VPS or in a data center, your scraper will trigger Cloudflare's server-side bot detection. That's because the IPs from such servers typically have a low trust score and are commonly flagged as suspicious.
Thus, Cloudflare will block your request. After all, while Camoufox provides advanced anti-bot capabilities at the browser level, it cannot compensate for a low-trust IP.
If you don't operate on trusted IPs, your Camoufox-based Cloudflare bypass script will likely fail with the following 403 error:

```
ERROR PlaywrightCrawler: Request failed and reached maximum retries. Error: Request blocked - received 403 status code.
```
To avoid that, your scraper must rely on reliable exit IPs, such as those offered by residential proxies. Apify provides residential proxies even on the free plan, so you do not have to pay to use them.
To view the available proxies in your Apify account, click on the "Proxy" link and switch to the "Groups" section:
You'll see a group named "RESIDENTIAL" available by default:
You can configure your scraper to use that group like so:
```javascript
const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'], // connect to residential proxies
    countryCode: 'US', // optional: restrict to U.S. IPs
});
```
In this case, we also specified that we only want IPs from the United States.
With this setup, your scraper will now:
- Use Camoufox to evade browser-based detection
- Operate over trusted residential IPs, improving your chances of bypassing Cloudflare every time
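Under the hood, `createProxyConfiguration()` resolves these options into proxy URLs whose username encodes the selected groups and country (per Apify's proxy documentation). Here's a rough illustrative sketch of that format—the password value below is a placeholder, not a real credential:

```javascript
// Rough sketch of how an Apify proxy URL is composed from the
// configuration options (see Apify's proxy docs). The password
// value is a placeholder, not a real credential.
function buildApifyProxyUrl({ groups = [], countryCode, password }) {
    const parts = [];
    if (groups.length > 0) parts.push(`groups-${groups.join('+')}`);
    if (countryCode) parts.push(`country-${countryCode}`);
    const username = parts.length > 0 ? parts.join(',') : 'auto';
    return `http://${username}:${password}@proxy.apify.com:8000`;
}

console.log(buildApifyProxyUrl({
    groups: ['RESIDENTIAL'],
    countryCode: 'US',
    password: '<YOUR_PROXY_PASSWORD>',
}));
// → http://groups-RESIDENTIAL,country-US:<YOUR_PROXY_PASSWORD>@proxy.apify.com:8000
```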
4. Connect to the target page
In `main.js`, you'll notice that if no start URLs are defined in the Actor's input, it defaults to the Apify homepage. Change that to your actual target URL, which is the Cloudflare Community site:

```javascript
const {
    startUrls = ['https://community.cloudflare.com/c/developers/39/l/top?period=yearly'],
} = await Actor.getInput() ?? {};
```
Next, in `routes.js`, remove the `addHandler()` method for `'detail'` pages. Also, simplify the logic by replacing the callback in `addDefaultHandler()` with a function that logs the raw HTML content of the page. Your `routes.js` file should now look like this:
```javascript
import { Dataset, createPlaywrightRouter } from 'crawlee';

export const router = createPlaywrightRouter();

router.addDefaultHandler(async ({ request, page, log }) => {
    // retrieve the page HTML content and log it
    const html = await page.content();
    log.info(html);
});
```
Click "Save & Build" to build your Actor for the first time:
Reach the "Input" tab and paste the Cloudflare-protected URL you want to scrape in the "Start URLs" field:
Now, press "Save & Start" to run your Cloudflare Community Scraper Actor.
If everything works as intended, your Actor should bypass Cloudflare and log the full HTML of the target page. If not, you'll see a `403 Forbidden` error instead.
Below is what you should see in the full log of your run:
As you can tell, the HTML of the Cloudflare-protected page was successfully retrieved and logged. That confirms that the setup works perfectly for bypassing Cloudflare.
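If you'd rather verify the bypass programmatically than eyeball the log, one simple heuristic (an illustrative assumption, not something the template ships with) is to check the retrieved HTML for the well-known "Just a moment..." title that Cloudflare challenge pages typically carry:

```javascript
// Heuristic check: Cloudflare's interstitial challenge pages
// typically use the title "Just a moment...". If it appears in
// the retrieved HTML, the bypass did not succeed.
function isChallengePage(html) {
    return /<title>\s*Just a moment/i.test(html);
}

console.log(isChallengePage('<html><head><title>Just a moment...</title></head></html>')); // → true
console.log(isChallengePage('<html><head><title>Top Developer Discussions</title></head></html>')); // → false
```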
5. Implement the scraping logic
Now that you've confirmed you can connect to the target page without issues, it's time to scrape some real content from it.
Specifically, we'll extract discussion topics from the Cloudflare Community target page. These are visible after scrolling a bit down the page:
First, open the target site in Incognito mode in your browser (to ensure a clean session), right-click on the discussion table, and choose the "Inspect" option:
If you're not familiar with how to use browser DevTools, read our guide on inspecting elements.
You'll see that each discussion thread is an HTML element matched by the `.topic-list .topic-list-item` CSS selector.
In the `addDefaultHandler()` method, use a Playwright locator to select all topic rows and loop through them:

```javascript
const topicElements = page.locator('.topic-list .topic-list-item');
for (const topicElement of await topicElements.all()) {
    // scraping logic...
}
```
Now, focus on the content inside each row of the discussion table. Start by analyzing the cells on the left:
Next, take a look at the cells on the right:
Note that, from each thread, you can extract:

- The discussion title from the text of the `.main-link a.raw-topic-link` node
- The relative URL to the discussion page from the `href` attribute of the same element
- The number of replies from `.posts`
- The number of views from the `.views` node
- The last activity date (as a UNIX timestamp) from the `data-time` HTML attribute of the `.activity span` element
Implement the scraping logic with the following code:
```javascript
const titleElement = topicElement.locator('.main-link a.raw-topic-link');
const title = (await titleElement.textContent())?.trim();
const url = `https://community.cloudflare.com${await titleElement.getAttribute('href')}`;

const repliesElement = topicElement.locator('.posts');
const replies = (await repliesElement.textContent())?.trim();

const viewsElement = topicElement.locator('.views');
const views = (await viewsElement.textContent())?.trim();

// convert the UNIX timestamp to an ISO date
const activityElement = topicElement.locator('.activity span');
const unixTime = await activityElement.getAttribute('data-time');
const date = unixTime ? new Date(Number(unixTime)).toISOString() : null;
```
These few lines are enough to extract the key discussion data from the Cloudflare Community forum. Adapt this logic to match the structure of your own Cloudflare-protected target site and the specific data points you're interested in.
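As a side note, string concatenation works for building the absolute URL here, but the standard `URL` constructor is a more robust way to resolve relative hrefs—it also leaves already-absolute URLs untouched. A minimal sketch (the example paths are hypothetical):

```javascript
// Resolve a (possibly relative) href against the forum's origin.
// Unlike plain string concatenation, new URL() also handles hrefs
// that are already absolute.
const BASE_URL = 'https://community.cloudflare.com';
const toAbsoluteUrl = (href) => new URL(href, BASE_URL).toString();

console.log(toAbsoluteUrl('/t/example-topic/123'));
// → https://community.cloudflare.com/t/example-topic/123
```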
6. Collect the scraped data
Right now, your scraped data is stored in a JavaScript object. To save it to the Dataset of your Apify Actor, use the `Dataset.pushData()` method:

```javascript
await Dataset.pushData(topic);
```
This way, the scraped data will become available via API or downloadable in several formats (JSON, CSV, Excel) from the Apify Console.
Now, the target page might contain a large number of discussion threads. Since this is just a test to show how to bypass Cloudflare, it's a good idea to limit the number of topics you scrape:
```javascript
if ((await Dataset.getData()).total >= 30) {
    return;
}
```
This limit isn't strictly required, but it helps keep test runs fast and manageable.
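Also note that calling `Dataset.getData()` on every iteration performs a storage read each time. As a sketch of a cheaper alternative (an assumption-level optimization, not the template's own code), you could enforce the same cap with a local counter—here `MAX_TOPICS` and the `topics` array are illustrative placeholders:

```javascript
// Sketch: cap the number of scraped items with a local counter,
// avoiding a dataset read on every loop iteration.
const MAX_TOPICS = 30;
let scrapedCount = 0;

// placeholder stand-in for the scraped topic objects
const topics = Array.from({ length: 100 }, (_, i) => ({ title: `Topic ${i}` }));

const collected = [];
for (const topic of topics) {
    if (scrapedCount >= MAX_TOPICS) break; // stop once the cap is reached
    collected.push(topic); // in the real handler: await Dataset.pushData(topic)
    scrapedCount += 1;
}

console.log(collected.length); // → 30
```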
Great! The Cloudflare-bypass and scraping logic is complete.
7. Complete code
This is what your `main.js` file should contain:
```javascript
import { Actor } from 'apify';
import { PlaywrightCrawler } from 'crawlee';
import { router } from './routes.js';
import { firefox } from 'playwright';
import { launchOptions as camoufoxLaunchOptions } from 'camoufox-js';

// Initialize the Apify SDK
await Actor.init();

const {
    startUrls = ['https://community.cloudflare.com/c/developers/39/l/top?period=yearly'],
} = await Actor.getInput() ?? {};

const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'], // connect to residential proxies
    countryCode: 'US', // optional: restrict to U.S. IPs
});

const crawler = new PlaywrightCrawler({
    proxyConfiguration,
    requestHandler: router,
    launchContext: {
        launcher: firefox,
        launchOptions: await camoufoxLaunchOptions({
            headless: true,
            // custom Camoufox options...
        }),
    },
});

// launch the Apify crawler
await crawler.run(startUrls);

// exit successfully
await Actor.exit();
```
And this is what the `routes.js` file should hold:
```javascript
import { Dataset, createPlaywrightRouter } from 'crawlee';

export const router = createPlaywrightRouter();

router.addDefaultHandler(async ({ request, page, log }) => {
    // select all topic elements in the topic table
    const topicElements = page.locator('.topic-list .topic-list-item');

    // iterate over them and apply the data parsing logic
    for (const topicElement of await topicElements.all()) {
        // scraping logic
        const titleElement = topicElement.locator('.main-link a.raw-topic-link');
        const title = (await titleElement.textContent())?.trim();
        const url = `https://community.cloudflare.com${await titleElement.getAttribute('href')}`;

        const repliesElement = topicElement.locator('.posts');
        const replies = (await repliesElement.textContent())?.trim();

        const viewsElement = topicElement.locator('.views');
        const views = (await viewsElement.textContent())?.trim();

        // convert the UNIX timestamp to an ISO string
        const activityElement = topicElement.locator('.activity span');
        const unixTime = await activityElement.getAttribute('data-time');
        const date = unixTime ? new Date(Number(unixTime)).toISOString() : null;

        // populate a new topic object with the scraped data
        const topic = {
            title,
            url,
            replies,
            views,
            date,
        };

        // append the scraped data to the Apify dataset
        await Dataset.pushData(topic);

        // avoid scraping more than 30 topics, as this is just an example
        if ((await Dataset.getData()).total >= 30) {
            return;
        }
    }
});
```
8. Run the scraper
In the Apify Console, run your Actor by pressing the "Save, Build & Start" button:
Once the run is complete, move to the "Last run" tab, and you should be able to see the results as follows:
This contains the desired Cloudflare-protected data scraped from the Community forum.
Switch to the "Storage" tab to export the scraped data:
From here, you can export your scraped data in various formats—such as JSON, CSV, XML, Excel, HTML Table, RSS, and JSONL.
And that's it! You've successfully bypassed Cloudflare and scraped your target site.
Conclusion
In this tutorial, you learned how to bypass Cloudflare using a setup based on Crawlee, Playwright, and Camoufox—a relatively new open-source anti-detect browser that's quickly gaining popularity.
As shown here, deploying your Cloudflare-bypass scraper to Apify simplifies the setup process and makes it easier to integrate your script with residential proxies—which are required for consistent results. To explore more web scraping and automation capabilities, check out the available code templates.
Frequently asked questions
Can you bypass Cloudflare?
Yes, you can bypass Cloudflare by using open-source, free tools like Crawlee, Playwright, and Camoufox. This setup mimics real user behavior to evade detection by anti-bot systems. For consistent results when deploying on a VPS or in the cloud, residential proxies may be required.
Why does Cloudflare block my IP?
Cloudflare blocks your IP when it either has a history of suspicious activity or a low reputation. This often happens when deploying scrapers on data center servers or VPSs, since their IP ranges are publicly known and easy to identify through ASN checks. Thus, Cloudflare's anti-bot defenses can easily flag them as non-human traffic.
Is it easy to bypass Cloudflare?
No, it isn't easy to bypass Cloudflare because it uses advanced anti-bot techniques. Still, with the right tools, it's definitely possible. With solutions like Camoufox and Apify's Crawlee, you can achieve your goal. Remember that Apify supports startups with 30% off the Scale plan to help them grow using web data.