Email might seem like old hat these days, but with an estimated 3.9 billion email users worldwide, building an email database remains critical for any modern business.
What is the most efficient way to build such a database? With email scraping, or web scraping – a method of automating data collection from websites.
In this article, I'll show you how to build an email address database for almost any niche using Apify, a highly versatile platform for web scraping. It provides pre-built tools (called Actors), which are serverless cloud programs that you can utilize for web data collection and automation, including email scraping. You can use these Actors to quickly set up a data-extraction pipeline.
The process of building an email address database with Apify requires 3 steps:
- Find relevant leads for your niche
- Scrape email addresses
- Validate the email addresses
To make this article more interesting, I'll use a concrete example. Let's say I want to boost my personal brand by appearing as a guest on a podcast. As my expertise is growth hacking and lead generation, I need to find podcasts focused on this niche. I also need a list of email addresses so that I can automate my outreach.
Step 1: Find relevant leads for your niche
This is the most complex step, and it will require some research.
To find a list of podcasts, I just googled the best website to host a podcast and ended up with a few results:
Now, I could develop a web scraper for each of those websites to get a list of podcasts focusing on growth hacking, but that would be highly time-consuming. Thankfully, there's a faster way: use an off-the-shelf web scraper from Apify Store.
First of all, I want to check if there are any ready-made Actors for those websites:
No Actors are available for those specific websites, but no worries: there's a workaround. Let me introduce one of the most powerful Apify Actors: Google Search Results Scraper. It's an Actor you can use to scrape any Google search results.
You can also use this Actor to get data from specific websites. Google offers a wonderful search operator, “site:”. If you search “site:https://pod.fan/ growth”, you'll get results only from the pod.fan website.
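If you have several websites to cover, you can generate one “site:” query per website instead of typing them by hand. The snippet below is a minimal sketch: `build_site_queries` is a hypothetical helper, the second site in the list is a made-up placeholder, and it uses the bare domain form of the operator (`site:pod.fan`). I'm assuming you then paste the resulting lines into the scraper's search queries input, one query per line.

```python
from urllib.parse import urlparse


def build_site_queries(sites, keyword):
    """Return one 'site:<domain> <keyword>' query per website."""
    queries = []
    for site in sites:
        # Accept either a bare domain or a full URL and keep only the host.
        parsed = urlparse(site if "//" in site else "//" + site)
        domain = parsed.netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        if domain:
            queries.append(f"site:{domain} {keyword}")
    return queries


# Only pod.fan comes from this article; the second entry is a placeholder.
sites = ["https://pod.fan/", "www.example-podcasts.com"]
queries = build_site_queries(sites, "growth")

# One query per line, ready to paste into a search-queries field.
print("\n".join(queries))
```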
Learn other Google search and scraping tricks in How to scrape Google like a pro
This approach won't be 100% exhaustive, as you're limited to links indexed by Google. However, it lets you quickly get data about a specific website without investing in a custom scraper. You can store the results in a spreadsheet so that everything you need is in one place. If you export them as PDF files, merging and compressing those into a single file will save space.
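To get the search export into a spreadsheet, you can flatten it into a CSV file of unique URLs. This is only a sketch: the sample data is invented, and the `organicResults`, `title`, and `url` field names are my assumption about the shape of the export, so check them against your own dataset before relying on this.

```python
import csv
import io


def results_to_csv(search_pages):
    """Flatten search result pages into a de-duplicated CSV of title and URL."""
    seen = set()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "url"])
    for page in search_pages:
        # "organicResults" is an assumed field name, not confirmed by this article.
        for result in page.get("organicResults", []):
            url = result.get("url", "").strip()
            if url and url not in seen:
                seen.add(url)
                writer.writerow([result.get("title", ""), url])
    return buffer.getvalue()


# Invented sample data standing in for a real search export.
sample_pages = [
    {"organicResults": [
        {"title": "Growth Show", "url": "https://pod.fan/growth-show"},
        {"title": "Lead Gen Weekly", "url": "https://pod.fan/lead-gen-weekly"},
    ]},
    {"organicResults": [
        {"title": "Growth Show", "url": "https://pod.fan/growth-show"},
    ]},
]

csv_text = results_to_csv(sample_pages)
print(csv_text)
```

Writing `csv_text` to a `.csv` file gives you something you can open directly in any spreadsheet tool.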
Step 2: Scrape email addresses
Once you've built a list of relevant web pages from the Google search export, you can try to harvest email addresses from those pages.
There's another free-to-use Apify Actor for that: Contact Details Scraper. This will extract any email address present on a web page.
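To give you an idea of what extracting email addresses from a page involves, here's a simplified illustration using a regular expression. It is not how Contact Details Scraper is implemented (I don't know its internals); `extract_emails` and the sample HTML are made up for this example, and the pattern is deliberately loose.

```python
import re

# A loose pattern for illustration only; real-world address syntax is broader.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def extract_emails(text):
    """Return unique email addresses found in text, lowercased, in order seen."""
    found = []
    seen = set()
    for match in EMAIL_PATTERN.findall(text):
        email = match.lower()
        if email not in seen:
            seen.add(email)
            found.append(email)
    return found


# Invented page content with a duplicate address and a trailing full stop.
page = (
    'Contact: <a href="mailto:Host@example.com">host@example.com</a> '
    "or booking@podcast.example.org."
)
emails = extract_emails(page)
print(emails)
```

The same address often appears several times on a page (in a `mailto:` link and in the visible text, for example), which is why the sketch removes duplicates.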
The two Actors used so far (Google Search Results Scraper and Contact Details Scraper) can also return useful information such as LinkedIn and Twitter profiles. People tend to be reluctant to share their email addresses online, so if you limit yourself to email addresses, you'll greatly restrict the size of your database. Combining your email outreach with some LinkedIn or Twitter outreach lets you reach more leads.
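If you ever need to pull social profile links out of raw page HTML yourself, a rough sketch could look like the one below. The patterns and the `extract_social_profiles` helper are my own simplifications, not what the Actors do, and the Twitter pattern will also match non-profile URLs such as share links, so treat the output as candidates to review.

```python
import re

# Simplified patterns for illustration; they will miss some valid profile URLs.
SOCIAL_PATTERNS = {
    "linkedin": re.compile(
        r"https?://(?:www\.)?linkedin\.com/(?:in|company)/[A-Za-z0-9_%-]+"
    ),
    "twitter": re.compile(r"https?://(?:www\.)?twitter\.com/[A-Za-z0-9_]{1,15}"),
}


def extract_social_profiles(html):
    """Return a dict mapping each network to a sorted list of unique profile URLs."""
    profiles = {}
    for network, pattern in SOCIAL_PATTERNS.items():
        # set() removes duplicates; sorted() gives a stable order.
        profiles[network] = sorted(set(pattern.findall(html)))
    return profiles


# Invented HTML with one LinkedIn link and a repeated Twitter link.
html = (
    '<a href="https://www.linkedin.com/in/jane-doe">LinkedIn</a> '
    '<a href="https://twitter.com/janedoe">Twitter</a> '
    '<a href="https://twitter.com/janedoe">Twitter again</a>'
)
profiles = extract_social_profiles(html)
print(profiles)
```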
Step 3: Validate email addresses
By now, you should have a small database of email addresses. However, you should NOT use those emails straight away. Before reaching out, run your list through a phishing link checker to ensure no malicious links are present.
Many email addresses that you'll find online are spam traps. They're email addresses purposefully shared by email providers like Gmail to identify spammers. If you send an email to those spam traps, your emails will start landing in spam.
Addresses scraped from the internet also have a high chance of bouncing, and a high bounce rate in your email outreach can hurt your email deliverability. A LinkedIn email finder can help here: it lets you personalize your outreach and reduce bounces, which makes your campaigns more effective.
Whenever you're collecting email addresses from the internet, you want to check their validity before using them. Thankfully, there's also an Apify Actor for that.
The EmailListVerify Actor will check any email and tell you if it's safe to use for outreach. You'll need to buy credits from EmailListVerify, but it will allow you to avoid spam traps and hard bounces.
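Since verification costs credits, it can be worth cleaning the list before you submit it. The sketch below normalizes addresses, removes duplicates, and drops anything that is obviously not an email address. `prefilter` is a hypothetical helper and the sample list is invented. This step only catches malformed entries: it can't detect spam traps or dead mailboxes, so it doesn't replace the verification itself.

```python
import re

# A basic shape check: something@domain.tld. It says nothing about deliverability.
SYNTAX_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def prefilter(emails):
    """Normalize, de-duplicate, and drop malformed addresses, keeping input order."""
    kept = []
    seen = set()
    for raw in emails:
        email = raw.strip().lower()
        if email in seen or not SYNTAX_PATTERN.fullmatch(email):
            continue
        seen.add(email)
        kept.append(email)
    return kept


# Invented scraped list with a duplicate and two malformed entries.
scraped = [
    " Host@Example.com ",
    "host@example.com",
    "not-an-email",
    "booking@podcast.example.org",
    "broken@nodot",
]
to_verify = prefilter(scraped)
print(to_verify)
```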
Summary of the whole process
Let's end with a quick recap:
Once you've done some research to find leads for your niche, use an Apify Actor such as Google Search Results Scraper or Contact Details Scraper to automate email data collection. You can also use these to collect data from LinkedIn and Twitter to expand your database, or you can use any of the social media scrapers on Apify Store.
Once you've got your email contacts, use EmailListVerify to validate the addresses to avoid spam traps and hard bounces.