Did you know that basic web crawling and SEO monitoring are, technically, built on the same foundation? The reason is simple: to identify broken links, you first need to crawl and parse the website, because there is no way to detect bad links without visiting the pages that contain them. That makes smart SEO moves a logical extension of web scraping. In this tutorial, we'll show you how a basic link checking tool can use web scraping to take your search engine optimization to a whole new level.
Why is it bad to have broken links on a website?
Let's get the obvious things out of the way: there are two main reasons why something as seemingly small as a few broken links damages the SEO health of a website. For users, broken links are frustrating; for website owners, they can drag down all their work in the mysterious Google rankings. This is why fixing 404 errors and similar issues is a priority for any website owner.
🧑‍💻 User experience. The web is constantly in flux, so as time passes, there's an increasing chance that the pages your links point to get moved or removed. The same goes for your own internal URLs as your catalog of services, product descriptions, and content in general expands and transforms. So it's natural that as a domain ages, dead links will appear more and more often. But should they stay there? Users don't care about the inner workings of your URLs, and coming across dead links suggests poor maintenance and gives the whole site a dated feel. So users will most likely just vote with their feet (or cursors).
⚙️ Search engine algorithms. In a perfect internet world, a website visitor would never see a Page Not Found message. An all-seeing search engine deserves the same experience, since it's what helps your website show up in the rankings. While a human visitor may just shrug and leave (bad enough as it is), the Googlebot crawler takes digital notes and can lower the website's rating, reducing its chances of being found in the first place. There's a whole art to keeping the Google algorithms happy, and running regular broken link checks is part of mastering it.
These days, nobody has time to manually check every link every other week just to see whether they all still work. Using a tool to monitor broken links is one of the easiest ways to maintain a good user experience, keep SEO indicators at a decent level, and generally provide the value users expect from the website. There are many tools for checking links and tracking their performance: some are better than others, some are more expensive than others, and some are free! Here's our free Broken Links Checker and instructions on how to use it.
What a basic broken links checker should be able to do:
- Carry out a basic SEO audit of the website.
- Scrape the main domain and subdomains.
- Monitor both inbound and outbound links.
- Check all links and identify broken fragments.
- Automate link monitoring by crawling the website daily, weekly, or monthly.
- Present link checking results in various formats.
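To monitor links at all, a checker first has to pull them out of every page it crawls. Here's a minimal Python sketch of that step, using only the standard library: it collects the href of every <a> tag, resolves relative links against the page URL, and sorts the results into internal links (same host) and outbound links (other hosts). The names LinkCollector and classify_links are hypothetical, and this is an illustration of the idea, not how Broken Links Checker is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(page_url: str, html_text: str) -> dict[str, list[str]]:
    """Resolve hrefs against page_url and split them into internal and outbound links."""
    collector = LinkCollector()
    collector.feed(html_text)
    host = urlparse(page_url).netloc
    result: dict[str, list[str]] = {"internal": [], "outbound": []}
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # "/about" -> "https://host/about"
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar links
        key = "internal" if parsed.netloc == host else "outbound"
        result[key].append(absolute)
    return result
```

Comparing the full host name means links to subdomains count as outbound here; a checker that also crawls subdomains would need a looser comparison.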
How a link checker works
This tool starts the link inspection at a starting URL and then moves down the URL hierarchy. For instance, if the crawler starts at https://www.example.com/something, you can expect it to also check all the pages below it, such as:
- https://www.example.com/something/index.html
- https://www.example.com/something/else
- https://www.example.com/something/even/more/deeper/file.html
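The "down the hierarchy" rule can be sketched as a breadth-first crawl that only follows links on the same host whose path sits at or below the starting path. This is a minimal illustration under our own assumptions, not the actor's actual code: in_scope and crawl are hypothetical names, get_links stands in for fetching a page and extracting its absolute link URLs, and max_pages mimics a page limit.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urlparse


def in_scope(start_url: str, candidate: str) -> bool:
    """True if candidate is on the same scheme and host, at or below start_url's path."""
    start, cand = urlparse(start_url), urlparse(candidate)
    if (start.scheme, start.netloc) != (cand.scheme, cand.netloc):
        return False
    base = start.path.rstrip("/")
    # Compare whole path segments so /somethingelse is NOT treated as below /something.
    return cand.path == base or cand.path.startswith(base + "/")


def crawl(start_url: str,
          get_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> list[str]:
    """Breadth-first crawl of the pages below start_url.

    Returns the crawled pages in visiting order, at most max_pages of them.
    """
    crawled: list[str] = []
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(crawled) < max_pages:
        url = queue.popleft()
        crawled.append(url)
        for link in get_links(url):
            page, _fragment = urldefrag(link)  # "#anchor" links target the same page
            if page not in seen and in_scope(start_url, page):
                seen.add(page)
                queue.append(page)
    return crawled
```

Out-of-scope links are never queued for crawling, but a full link checker would still verify that they respond; this sketch only decides which pages to walk through.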
Another thing the crawler does is check whether links to other pages actually work. For example, if the page contains a link to .../another/page#anchor, the crawler runs a triple check: it opens the page .../another/page, confirms that it loads, and makes sure the page contains the element that the #anchor part points to. If the link fails any of these checks, the link checker flags it and includes it in the report.
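Here's a hedged Python sketch of that triple check. check_link splits off the #anchor part, fetches the page through a pluggable fetch function (so it can be tested without a network), and then looks for an element whose id (or an <a> whose name) matches the anchor. All names are hypothetical; urllib_fetch is one possible real fetcher built on the standard library, not part of Broken Links Checker.

```python
import urllib.request
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag

# A fetcher takes a URL and returns (HTTP status code, HTML body).
Fetch = Callable[[str], tuple[int, str]]


class AnchorFinder(HTMLParser):
    """Sets found=True if any element has id=fragment, or an <a> has name=fragment."""

    def __init__(self, fragment: str) -> None:
        super().__init__()
        self.fragment = fragment
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (name == "id" or (tag == "a" and name == "name")) and value == self.fragment:
                self.found = True


def check_link(url: str, fetch: Fetch) -> Optional[str]:
    """Return None for a healthy link, otherwise a short reason why it was flagged."""
    page_url, fragment = urldefrag(url)
    try:
        status, body = fetch(page_url)  # check 1: open the page
    except OSError as exc:
        return f"request failed: {exc}"
    if status >= 400:  # check 2: the page must load
        return f"HTTP {status}"
    if fragment:  # check 3: the anchor must exist on the page
        finder = AnchorFinder(fragment)
        finder.feed(body)
        if not finder.found:
            return f"missing anchor #{fragment}"
    return None


def urllib_fetch(url: str) -> tuple[int, str]:
    """Real fetcher; urllib raises 4xx/5xx responses as HTTPError, an OSError subclass."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

Injecting the fetcher keeps the checking logic separate from networking, so the same function works with urllib, a headless browser, or a fake fetcher in tests.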
Armed with all that SEO knowledge, here's a small action plan that you can carry out:
- Make a basic SEO audit or dig deeper into the subdomains.
- Choose to check the whole website or particular pages.
- Get a report containing an assessment of all links or just the broken ones.
- Automate the link checks or launch the link inspection manually, on-demand.
- Quickly zero in on broken URL fragments and fix them.
- Prevent link rot at early stages.
- Find broken links in your competitor's web resources and use them to your advantage by creating link-building opportunities.
And that's just for starters! You can easily customize our Broken Links Checker to suit your SEO-related needs. There's no need to install a plugin or browser extension or to activate downloaded software, and no credit card is required. Just create a free Apify account, and let's walk through how this simple SEO tool works, step by step.
How to use Apify Broken Links Checker
Find your actor
1. Go to the Broken Links Checker page among the SEO tools in Apify Store and click the ▶️ Try for free button.
2. If you’re not signed in, you’ll land on the sign-up page (otherwise, feel free to skip to Step 3👇). Sign up using your email account, Gmail, or GitHub, and get your free Apify account.
3. You will be redirected to Apify Console, your workspace for creating tasks for your scrapers, crawlers, actors, and integrations. Click the Start your free trial button again, and let's do some SEO checks.
4. Broken Links Checker only requires a few input fields:
- URL of the website to be checked for bad links. In our example, we're crawling the Apify Blog.
- Your email address, to receive the report with the links that need your attention.
- Two toggle buttons to keep in mind: Save only broken links, to exclude thousands of healthy links from the report, and Crawl subdomains, to dive deeper into the website's subdomains if needed.
- You can also limit the number of crawled pages in the Max pages field if you don't need a thorough check of the whole website.
5. Once you're all set, click the Start button. Your task's status will change to Running; wait for the crawler's run to finish. Going through all the links can take a while before the status turns to Succeeded. For instance, our Apify Blog check took 59 minutes to complete.
6. Now click on the Key-value store tab to see how many broken links you've got. You can download the list of problematic links either as a machine-readable JSON report or an easy-to-read HTML table. You can also preview the data by clicking the ⤴️ View button.
7. You can download the SEO report onto your computer for further use.
8. Don't forget to check your inbox! If you also included your email address in the input field, the full broken links report will appear there as soon as the run finishes 🚀
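If you'd rather post-process the downloaded JSON report yourself, a few lines of Python can turn it into your own HTML table. We don't spell out the report's exact structure here, so this sketch assumes a hypothetical shape: a list of page records, each with a url and a brokenLinks list of objects with url and reason keys. Adjust the keys to match the file you actually download.

```python
import html
import json


def report_to_html(report_json: str) -> str:
    """Render a broken-links report (hypothetical JSON shape) as an HTML table."""
    records = json.loads(report_json)
    rows = []
    for record in records:
        for link in record.get("brokenLinks", []):
            # Escape every value so URLs containing & or < don't break the markup.
            rows.append(
                "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
                    html.escape(record["url"]),
                    html.escape(link["url"]),
                    html.escape(str(link["reason"])),
                )
            )
    header = "<tr><th>Page</th><th>Broken link</th><th>Reason</th></tr>"
    return "<table>\n" + "\n".join([header, *rows]) + "\n</table>"
```

Pages without broken links contribute no rows, so the table lists only the links that need fixing.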
Schedule broken link crawls
9. Bonus: schedule the scraper to run on a regular basis (daily, weekly, or monthly). You will receive the broken links notification, together with the report, in your inbox once each run is done.
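Recurring runs like these are typically described with cron expressions, the standard five-field syntax that Apify schedules also use. The expressions below are examples for the three periods; the 03:00 start time is an arbitrary assumption, so pick whatever hour suits your site.

```python
# Standard five-field cron syntax: minute hour day-of-month month day-of-week.
# The 03:00 run time is an arbitrary example value.
CRON_SCHEDULES = {
    "daily": "0 3 * * *",    # every day at 03:00
    "weekly": "0 3 * * 1",   # every Monday at 03:00
    "monthly": "0 3 1 * *",  # on the 1st of every month at 03:00
}

for period, expression in CRON_SCHEDULES.items():
    print(f"{period:>8}: {expression}")
```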
Other SEO tools to track website performance
With regular monitoring, purging, and link replacements, the day may soon come when our link checking tool finds no broken links on your website at all. Automation tools are the foundation of a healthy SEO score, and combining a few of them works even better. Here are several other free SEO monitoring tools on Apify:
1. SEO Audit Tool 🔗 for a more comprehensive SEO inspection.
2. Google Search Results Scraper 🔍 for monitoring the position on Google SERPs.
3. Content Checker ☑️ for notifying you when website content changes (any part of the website).
All of these tools are free to use and make a great combo for an easier SEO journey. They crawl website data and deliver it to you in machine-readable formats such as Excel, JSON, CSV, or XML, which plug easily into apps, data analysis tools, and other data projects.