How to extract emails, phone numbers, and social profiles from websites

Natasha Lekh

Searching for contact information on the web can be painful. If you’re lucky, finding the right email address may be a matter of a few clicks. But what if you also want to find phone numbers, Facebook pages, LinkedIn profiles, Twitter handles, and Instagram profiles? And what if you need to find hundreds of them? Today, everyone has different habits when it comes to their online presence, so getting in touch with someone often entails finding all possible ways of contacting them. Doing this job manually is a nightmare, especially if multiple web pages or websites need to be inspected. Luckily, you can automate this job using a technique called web scraping, which lets you automatically extract meaningful data from websites.

Apify's Contact Details Scraper will do the job of scraping contact information for you. In general, scrapers are a great tool for web scraping, automation, and data extraction tasks, including but not limited to scraping contact details. The scrapers we make are called actors, cloud programs running on the Apify platform, and you can find lots of them in our Apify Store. The job of these actors is to automatically crawl the web pages of your choice, scrape the contact information from them, and then save it so that you can download it in Excel, CSV, JSON, or HTML formats.

What can you do with all that contact details data?

So why would you scrape contact details data? Here are just some compelling reasons to use web scraping to collect contact information:

Data collection made simple, cost-effective and time-efficient

Scraping contact details is a much-requested solution from marketing teams in every line of business. Why? Because web scraping emails, names, and phone numbers means no more manual work to get the data. Web scraping tools free up marketing and sales teams for tasks that truly matter and have the potential to make an impact on the success of their business. Contact Details Scraper can harvest contact data in minutes, a fraction of the time it would take a human. As a result, lead generation becomes simpler, cheaper, and faster than it would be if the data were gathered by hand.

Make data the cornerstone of sales and marketing work

You might be surprised, but good sales work is always about data first. An automated approach to harvesting contact details can easily be considered the first big step in organizing the whole workflow of building and maintaining a database of contacts, leads, and prospective customers. And if not the first big step, then at least a very important element of that workflow. Moreover, if the data the scraper collects is in a universal format and can therefore be integrated with other data processing and classification tools, the lead generation process is well on its way to being automated.

Streamline your sales funnel with lead generation

Last but not least, contact scrapers connect all lead generation channels into one: that can include anything from the company website and landing pages to event-specific content and social media. As your team gets comfortable with the automated processes already in place, you can add more or adjust them to new business challenges. And if that isn't enough to get you thinking about what you can do with the data, check out our industry pages for use cases and more inspiration.

Disclaimer: the privacy of personal data on the web is our highest priority, and we are fully committed to adhering to personal data protection legislation. Luckily, there are many ways to collect data ethically, legitimately, and for a good cause. For instance, we’ve cooperated with NGOs and law enforcement institutions on investigations of juvenile sex trafficking. You can find much more information on data privacy on the web and how to create ethical scrapers in our recent post: Is web scraping legal?

Step-by-step guide to scraping contact details

Our Contact Details Scraper enables you to automatically scrape emails, phone numbers, as well as Facebook, Twitter, LinkedIn and Instagram profiles from web pages. In this short tutorial, we'll show you how to do that by scraping our very own website apify.com for contact details. Let's get started!

Sidenote: tech-savvy users will find the technical details of how the extraction works at the end of the article.

1. Go to Apify’s website: https://apify.com

2. Sign in at the top-right corner using your email account, Google, or GitHub.

3. When you log in, you’ll be redirected to your Apify Console. Find the Store button and click on it. You'll be redirected to Apify Store.

4. On Apify Store, type Contact Details in the search bar. You'll see the scraper show up in the drop-down list. Click on it. You can also find the scraper among others in the marketing category.

5. Now you're on the page of the Contact Details Scraper. Scroll down or explore tabs to get familiar with its parameters and possibilities.

6. Now, on this same Contact Details Scraper page, click the Try me button.

7. You will be redirected back to the Apify Console and see the scraper's page there. At the top of the page, click the blue Create new task button. It will take you to the input settings of a new task for your scraper. Note that your scraper can be found in the Actors tab on the left, because that's what we call our customized scrapers. You can read more about those terms here.

8. The task is now ready to be set up. Configure the parameters in the fields you see, such as the website URL. You can even enter multiple website URLs and the actor will automatically scrape all of them. Let's scrape apify.com for all the emails and phone numbers the website contains.

9. Under input, the actor has several options that let you specify which pages should be crawled:

  • Start URLs — A list of URLs of web pages where the crawler should start. You can enter multiple URLs, a text file with URLs, or even a Google Sheets document.
  • Maximum link depth — Specifies how many links away from the web pages specified in Start URLs the crawler will go. If zero, the actor ignores the links and only crawls the Start URLs.
  • Stay within the domain — If enabled, the actor will only follow links that are on the same domain as the referring page. For example, if this setting is enabled and the actor finds a link to http://www.another-domain.com/ on the page http://www.example.com/some-page, it will not crawl the second page, since www.example.com is not the same domain as www.another-domain.com.

Note that the actor accepts additional input options to specify proxy servers, limit the number of pages, etc. See Actor input for details. Once you’re all set with the input settings, click the Run button.
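
For reference, the resulting input is just a simple object. Here is a minimal sketch; the field names below are assumptions based on the options described above rather than the actor's exact input schema, so check the Input tab for the real field names:

```javascript
// Illustrative input only - the field names are assumptions based on the
// options described above; see the actor's Input tab for the exact schema.
const input = {
    startUrls: [{ url: 'https://apify.com' }], // where the crawler starts
    maxDepth: 2,        // how many links away from the start URLs to follow
    sameDomain: true,   // skip links pointing to other domains
};
```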

10. The actor will start running, and you will see a log where you can monitor its progress.

11. After the actor finishes the run, you can preview your contact scraping results by clicking on the Dataset tab.


12. You can download the results in various formats such as Excel, CSV, XML, HTML or JSON and then upload them into a database.
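
If you'd rather skip the manual download, the results can also be fetched over the Apify API. Below is a small sketch that pulls a run's dataset as CSV with Node.js; the dataset ID and token are placeholders you would replace with your own values:

```javascript
// Sketch: fetch a finished run's dataset as CSV via the Apify API
// (Node.js 18+ with the built-in fetch, run inside an async function
// or an ES module). Replace the placeholders with your own values.
const datasetId = 'YOUR_DATASET_ID'; // shown on the run's Dataset tab
const token = 'YOUR_APIFY_TOKEN';    // from your Apify account settings

const url = `https://api.apify.com/v2/datasets/${datasetId}/items?format=csv&token=${token}`;
const response = await fetch(url);
console.log(await response.text()); // CSV rows with emails, phones, profiles
```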

The technology behind the actor

The actor is built in Node.js and uses Apify SDK — an open-source web scraping and automation library. The full source code of the actor is available on GitHub.

When started, the actor loads the web pages provided in Start URLs. It does so using Google’s headless Chrome browser, with the help of the Puppeteer library. Through Puppeteer, the actor is able to simulate user inputs on the web page, such as clicks and scrolling. The actor looks for any links to other pages on the website and crawls them recursively, using the PuppeteerCrawler class provided by Apify SDK.

Once the web page has loaded, the actor downloads the web page’s HTML source code. By using headless Chrome, the downloaded HTML represents the actual content of the web page that the user would see, including dynamic content loaded using AJAX. This allows the actor to extract all contact details as they are presented on the pages.
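
To give a rough idea of that crawling pattern, here is a minimal sketch written against the classic Apify SDK (the apify npm package). It is only an approximation of what the actor does; the actual source code on GitHub is more involved, and newer versions of the SDK use a slightly different API:

```javascript
const Apify = require('apify');

Apify.main(async () => {
    // Queue of pages to visit, seeded with a start URL.
    const requestQueue = await Apify.openRequestQueue();
    await requestQueue.addRequest({ url: 'https://apify.com' });

    const crawler = new Apify.PuppeteerCrawler({
        requestQueue,
        handlePageFunction: async ({ request, page }) => {
            // The rendered HTML, including content loaded via AJAX.
            const html = await page.content();
            // ...extract contact details from `html` here...

            // Recursively enqueue links found on the page.
            await Apify.utils.enqueueLinks({ page, requestQueue });
        },
    });

    await crawler.run();
});
```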

To extract contact details from the HTML, the actor harnesses the power of regular expressions (in ECMAScript / JavaScript format), which you can test on regex101.com. All the expressions the actor uses are provided by the Social Utils in Apify SDK (see the source code).
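
The exact patterns live in the SDK's social utilities and are fairly strict; the simplified examples below are only meant to illustrate the general approach and are not the production regexes:

```javascript
// Simplified, illustrative patterns only - the real expressions in the
// Apify SDK's social utils are considerably stricter.
const EMAIL_REGEX = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/gi;
const PHONE_REGEX = /\+?\(?\d{1,4}\)?[\s.-]?\d{2,4}[\s.-]?\d{2,4}[\s.-]?\d{0,4}/g;
const TWITTER_REGEX = /twitter\.com\/([A-Z0-9_]{1,15})/gi;

const html = '<a href="mailto:info@example.com">Write to us</a>';
console.log(html.match(EMAIL_REGEX)); // [ 'info@example.com' ]
```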

Final points

If you need to have more control over the crawling and data extraction process, you can fork the actor on GitHub and build your own version, or go for a customized version from us. For more details on building your own scraper, see our Actors documentation and our Apify SDK library.

Just to reiterate: it's important to stay informed and vigilant about how we share our personal data on the web, and that includes scraping contact details data. Personal data on the internet is a sensitive and fast-evolving subject that should be discussed more in the IT and legal communities, which is why we’ve done our research and are fully committed to the goal of ethical web scraping.

And that’s everything you need to know to get started with this actor. Be sure to check out other actors in Apify Store - they're just as great. And if you come up with anything fascinating to do with the scraped data, we’ll be happy to hear from you on Twitter :)




