Email might seem like old hat these days, but with an estimated 3.9 billion email users worldwide, building an email database remains critical for any modern business.
What's the most efficient way to build such a database? Email scraping: a form of web scraping, which automates data collection from websites.
In this article, I'll show you how to build an email address database for almost any niche using Apify, a highly versatile platform for web scraping. It provides pre-built tools called Actors: serverless cloud programs that you can use for web data collection and automation, including email scraping. With these Actors, you can quickly set up a data-extraction pipeline.
Building an email address database with Apify takes three steps:
1. Find relevant leads for your niche
2. Scrape email addresses
3. Validate email addresses
To make this article more interesting, I'll use a concrete example. Let's say I want to boost my personal brand by appearing as a guest on podcasts. As my expertise is growth hacking and lead generation, I need to find podcasts focused on this niche. I also need a list of email addresses so I can automate my outreach.
Step 1: Find relevant leads for your niche
This is the most complex step, and it will require some research.
To find a list of podcasts, I googled "best website to host a podcast" and ended up with a few results.
Now, I could develop a web scraper for each of those websites to get a list of podcasts focusing on growth hacking, but that would be highly time-consuming. Thankfully, there's a faster way: use an off-the-shelf web scraper from Apify Store.
First, I checked Apify Store for ready-made Actors for those websites.
No Actors are available for those specific websites, but no worries: there's a workaround. Let me introduce you to one of the most powerful Apify Actors: Google Search Results Scraper. It's an Actor you can use to scrape any Google search results.
You can also use this Actor to get data from specific websites. Google offers a wonderful search operator, “site:”. If you search “site:https://pod.fan/ growth”, you'll get results only from the pod.fan website.
This approach won’t be 100% exhaustive, as you're limited to links indexed by Google. However, it lets you quickly get data about a specific website without investing in a custom scraper.
Pro tip: If you're looking for an email specifically, add “@” to your search. It will limit the search to pages that are the most likely to contain an email address.
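To make the “site:” trick concrete, here's a minimal Python sketch that builds one search query per website, optionally adding the “@” from the pro tip. The function name and the second domain are my own placeholders, and stripping the "https://" prefix is a choice I made because the operator is usually written with a bare domain. You would paste the resulting queries into the Actor's search input.

```python
def build_site_queries(sites, keyword, require_at=False):
    """Build one Google query per site using the site: operator."""
    queries = []
    for site in sites:
        # Drop the scheme and any leading/trailing slashes so only the domain
        # (or domain plus path) follows the site: operator.
        domain = site.split("://", 1)[-1].strip("/")
        parts = [f"site:{domain}", keyword]
        if require_at:
            # Pro tip from the article: bias results toward pages showing an "@".
            parts.append('"@"')
        queries.append(" ".join(parts))
    return queries


# pod.fan comes from the article; example.com is a placeholder domain.
for query in build_site_queries(["https://pod.fan/", "example.com"], "growth", require_at=True):
    print(query)
```

Keeping the query builder separate from the scraping run makes it easy to try several keywords or websites without retyping each query by hand.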
Step 2: Scrape email addresses
Once you build a list of relevant web pages with the Google search export, you can try to harvest email addresses from those.
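If you export the search results as JSON, turning them into a clean list of pages might look like the sketch below. I'm assuming each exported item holds a list under "organicResults" whose entries have a "url" field; check the field names in your own export, because they may differ. The sample data is made up for illustration.

```python
def collect_urls(search_items):
    """Return unique result URLs, in first-seen order, from exported search items."""
    seen = set()
    urls = []
    for item in search_items:
        # Assumed layout: one item per results page, with its hits in "organicResults".
        for result in item.get("organicResults", []):
            url = result.get("url")
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Hypothetical export with a duplicate link across two result pages.
sample_items = [
    {"organicResults": [{"url": "https://pod.fan/show-a"}, {"url": "https://pod.fan/show-b"}]},
    {"organicResults": [{"url": "https://pod.fan/show-a"}]},
]
print(collect_urls(sample_items))
```

De-duplicating at this stage means the next Actor doesn't visit the same page twice.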
There's another free-to-use Apify Actor for that: Contact Details Scraper. This will extract any email address present on a web page.
These two Actors also provide useful information like LinkedIn and Twitter profiles. People tend to be reluctant to share their email addresses online, so if you limit yourself to email addresses, you'll greatly restrict your database. You might want to combine your email outreach with some LinkedIn or Twitter outreach to reach more leads.
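To give a feel for what email extraction involves, here's a simplified Python sketch that pulls addresses out of a page's text with a regular expression. This is not how Contact Details Scraper is implemented (I don't know its internals); the pattern is a deliberately loose approximation and the sample text is invented.

```python
import re

# Loose pattern: local part, "@", then a domain ending in a 2+ letter TLD.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def extract_emails(text):
    """Return the unique email-like strings found in text, lowercased and sorted."""
    return sorted({match.group(0).lower() for match in EMAIL_RE.finditer(text)})


page_text = (
    "Want to be a guest? Write to Host@Example.com or booking@pod.example.org. "
    "Follow @growthshow on Twitter."
)
print(extract_emails(page_text))
```

Note that the Twitter handle is ignored because nothing precedes its “@”. A real scraper also has to deal with obfuscated addresses such as "name [at] domain", which this sketch does not attempt.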
Step 3: Validate email addresses
By now, you should have a small database of email addresses. However, you should NOT use those emails straight away.
Some email addresses that you'll find online are spam traps: addresses deliberately published by mailbox providers and anti-spam organizations to identify senders who harvest addresses. If you send an email to a spam trap, your emails may start landing in spam.
Addresses scraped from the internet also have a high chance of bouncing. A high bounce rate in your email outreach can hurt your email deliverability.
Whenever you're collecting email addresses from the internet, you want to check their validity before using them. Thankfully, there's also an Apify Actor for that.
The EmailListVerify Actor will check any email and tell you if it's safe to use for outreach. You'll need to buy credits from EmailListVerify, but it will allow you to avoid spam traps and hard bounces.
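Since verification costs credits, it can be worth cleaning the list locally first. The Python sketch below normalizes addresses, drops obviously malformed entries, and removes duplicates. It's only a pre-filter under my own assumptions about what counts as well-formed: it cannot detect spam traps or predict bounces, so it doesn't replace the verification step.

```python
import re

# Basic shape check only: something@domain.tld, with no spaces.
SYNTAX_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$")


def prefilter(emails):
    """Normalize, drop malformed entries, and de-duplicate, keeping first-seen order."""
    seen = set()
    kept = []
    for raw in emails:
        email = raw.strip().lower()
        if not SYNTAX_RE.match(email):
            continue  # not shaped like an email address
        if email in seen:
            continue  # already kept; don't pay to verify it twice
        seen.add(email)
        kept.append(email)
    return kept


# Invented sample list with a duplicate and two malformed entries.
scraped = [" Host@Example.com ", "host@example.com", "not-an-email", "a@b", "booking@pod.example.org"]
print(prefilter(scraped))
```

Whatever survives this pass is what you would send on for proper verification.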
Pro tip: If you plan on wide-scale email outreach, you should use an email warmup service like Warmup Inbox. This tool generates activity in your inbox to make it more trustworthy for email providers like Gmail.