Using a man-in-the-middle proxy to scrape data from a mobile app API

In this technical post, you’ll learn how to set up a man-in-the-middle proxy and install a self-signed certificate on your mobile phone in order to intercept HTTPS communication between any mobile app and its backend API.

Man in the middle

Data drives today's world: you need it to make business decisions, plan marketing campaigns, or design new products. One of the most popular ways to collect data from the outside world automatically is to extract it from websites, a technique called web scraping. It involves methods such as crawling, HTML parsing, snooping on XHR requests, or executing JavaScript. However, these methods are becoming less efficient as websites grow more complicated or employ anti-scraping protections such as CAPTCHAs.

Fortunately, as more and more mobile apps supplement or even replace traditional websites, a new, highly efficient way to collect data is emerging: tapping directly into mobile app APIs, also known as mobile API scraping. With this technique, you can get a list of participants from your favorite meetup app, automate a food delivery order, or extract a list of hotels and their prices from a hotel booking app.

The great thing about mobile APIs is that they are very concise and efficient, and typically employ far fewer anti-scraping protections than websites. Many mobile apps do not require a login to fetch and show data and only use IP address rate limiting to block access for bots, which can be easily circumvented using proxies. In other words, scraping data from mobile APIs is extremely efficient.

In the following sections, you’ll learn how to get started scraping mobile APIs. All the steps are demonstrated on an Apple iPhone running iOS 12.3 and a MacBook Pro running macOS 10.14.1. If you are using Windows or Android, or have a different OS version, you might need slightly different tooling, but the principles are the same.

1) Set up an HTTP proxy server on a computer

To intercept requests going from the phone app to an external backend API, we’ll need to set up an HTTP proxy on a computer to which the phone will connect. Most mobile apps use HTTPS encryption to communicate with their backend APIs, so we’ll effectively need to perform a man-in-the-middle attack on your own phone to be able to intercept the traffic. No worries, this is safe as long as you follow the precautions in step 3 😎 There are many tools that can do this job, but here we’ll use mitmproxy. It can be easily installed using Homebrew by running the following command in the macOS terminal:

$ brew install mitmproxy

You can check whether mitmproxy was correctly installed by running:

$ mitmproxy

If everything goes well, you should see a window like the following. Don't worry if you don't see any requests yet.

mitmproxy after startup

2) Connect your phone to the HTTP proxy

Now it’s time to set up the phone. First of all, ensure your phone is connected to the same Wi-Fi network as your computer so that they can see each other. Then on your iOS device, go to Settings > Wi-Fi and tap the information icon next to the current Wi-Fi network:

Wi-Fi network details

Then click Configure Proxy at the very bottom:

Configure HTTP proxy for the Wi-Fi network

Select the Manual option so that you can enter the Server and Port properties:

Set a manual HTTP proxy for the Wi-Fi network

The Server property is the internal IP address of your computer, where the HTTP proxy is running. You can obtain it by running the following command in your macOS terminal:

$ ifconfig

Look for the en0 network adapter, which is usually assigned an internal IP address in the 192.168.x.x range.
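If you'd rather not read through the ifconfig output, the following is a minimal Python sketch that asks the operating system which local IPv4 address it would use for outbound traffic. The helper name `local_ip` and the probe address are our own assumptions; on a machine with several network adapters, the result may not be the Wi-Fi address, so double-check it against ifconfig.

```python
import ipaddress
import socket


def local_ip(probe_host: str = "192.0.2.1", probe_port: int = 80) -> str:
    """Return the IPv4 address this machine would use for outbound traffic.

    Connecting a UDP socket sends no packets; it only asks the OS which
    local interface it would route through. probe_host is an arbitrary
    documentation address (an assumption, not a real server).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        # No route available (e.g. offline): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


if __name__ == "__main__":
    ip = local_ip()
    private = ipaddress.ip_address(ip).is_private
    print(f"Server: {ip} (private range: {private})")
```

The printed address is the value to enter as the Server property on the phone.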

The Port property is the TCP port number on which mitmproxy is listening; by default, it is 8080.

Once you enter the Server and Port settings, you should start seeing requests going through the proxy in the mitmproxy app on your computer. If you don’t see any requests, make sure your phone can access your computer, e.g. that there is no firewall blocking the access.
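When no requests show up, a quick first check is whether anything is listening on the proxy port at all. Here is a small sketch using only the Python standard library; the function name `proxy_is_listening` is hypothetical. Note that running it on the computer itself only confirms that mitmproxy is up. It does not prove that a firewall lets the phone through.

```python
import socket


def proxy_is_listening(host: str, port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 127.0.0.1 checks the local machine; swap in the Server address
    # from the previous step to test the Wi-Fi interface instead.
    print(proxy_is_listening("127.0.0.1"))
```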

While you can see the requests flowing through the proxy, most likely they use the HTTPS protocol with SSL/TLS encryption, so you won’t be able to see the content of the requests. We will fix that in the next step.

3) Install a self-signed root SSL/TLS certificate on the phone

Now, on your phone, open http://mitm.it/ in a browser. You should see a list of operating systems. Just click on Apple:

Downloading a self-signed TLS/SSL certificate from http://mitm.it/

The certificate should download to your phone.

To install the certificate, open Settings on your iPhone. Below your account name, there should be a row labeled Profile Downloaded:

Where to find the installation of the downloaded profile

Click on it and follow the installation process. You can find more details about how to install the downloaded profile here.

In the next step, you’ll need to enable it. On your iOS device, go to Settings > General > About > Certificate Trust Settings:

Enabling self-signed root SSL/TLS certificate on iOS

WORD OF CAUTION: Once you complete the following step, all traffic from your phone can be intercepted and monitored from the computer running mitmproxy, including login credentials to all your apps. Only do this if you trust the computer, and close all the mobile apps that you don’t want to be monitored. Once you’re done, make sure to disable the certificate again. Never share the logs from mitmproxy with untrusted parties!

Now the only thing you have to do is to enable the mitmproxy certificate:

Enabling self-signed root SSL/TLS certificate from mitmproxy

4) Scraping the mobile app API

Now it’s time for the exciting hacking part! First, install and open the Swiggy app on your phone. When you open the app, you should see its HTTPS requests, now decrypted, flowing through the mitmproxy tool:

Swiggy requests from the iOS mobile app

From this initial screen, you can already get a good idea of which API endpoints the mobile app uses. You can probably see a few REST endpoints. Just select one and view its details. You can now see everything you need to replicate that request.

Restaurant list from Swiggy mobile app
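Browsing flows by hand gets tedious once an app makes dozens of requests. As a rough sketch, mitmproxy can load a Python script and call a module-level `response(flow)` function for every completed response. The script below prints a one-line summary of each JSON response from one host. The host filter `example.com` is a placeholder, and the helper `summarize` is our own invention, not part of mitmproxy.

```python
import json
from typing import Optional

# Hypothetical filter: replace with the API host you see in mitmproxy.
TARGET_HOST_SUFFIX = "example.com"


def summarize(flow) -> Optional[dict]:
    """Return method, URL, status and top-level JSON keys, or None to skip."""
    if not flow.request.pretty_host.endswith(TARGET_HOST_SUFFIX):
        return None
    content_type = flow.response.headers.get("content-type", "")
    if "json" not in content_type:
        return None
    try:
        body = json.loads(flow.response.text or "")
    except ValueError:
        # Body was empty or not valid JSON despite the content type.
        return None
    keys = sorted(body) if isinstance(body, dict) else []
    return {
        "method": flow.request.method,
        "url": flow.request.pretty_url,
        "status": flow.response.status_code,
        "keys": keys,
    }


def response(flow) -> None:
    """mitmproxy event hook, called once per completed response."""
    summary = summarize(flow)
    if summary is not None:
        print(json.dumps(summary))
```

Assuming you save this as log_json.py (any filename works), you can run it with mitmdump -s log_json.py, which prints to the terminal instead of opening the interactive window.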

Now comes a bit of trial and error as you reverse-engineer the mobile API. You have to figure out what specific API endpoints do, which query parameters and HTTP headers they require, and which can be omitted. For this endpoint, all you need to do is pass the correct User-Agent header.
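The trial and error over headers can be partly automated. The sketch below removes one captured header at a time and keeps the removal whenever the request still succeeds. It assumes you supply a `request_ok` callback that replays the request and reports success; both that callback and the function name `minimal_headers` are hypothetical, and the demo uses a stand-in instead of a real HTTP call.

```python
from typing import Callable, Dict


def minimal_headers(
    headers: Dict[str, str],
    request_ok: Callable[[Dict[str, str]], bool],
) -> Dict[str, str]:
    """Drop each header in turn; keep the removal if the request still works."""
    needed = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in needed.items() if k != name}
        if request_ok(trial):
            needed = trial
    return needed


if __name__ == "__main__":
    # Stand-in for a real HTTP call: pretends the API only checks User-Agent.
    def fake_request_ok(h: Dict[str, str]) -> bool:
        return "User-Agent" in h

    captured = {
        "User-Agent": "ExampleApp/1.0",  # hypothetical values
        "Accept": "application/json",
        "Accept-Language": "en-US",
    }
    print(minimal_headers(captured, fake_request_ok))
```

A single pass like this can miss cases where two headers are only required together, so treat the result as a starting point and verify it by hand.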

And that's it: you can now get all restaurants for a particular location!
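To give an idea of what replicating a captured request looks like outside the app, here is a minimal sketch using Python's standard library. Everything specific in it is a placeholder: the URL, the `lat` and `lng` parameter names, and the User-Agent string are assumptions for illustration, not the real API. Copy the actual values from the request details in mitmproxy.

```python
import json
import urllib.parse
import urllib.request

# Every value below is a placeholder; copy the real ones from mitmproxy.
BASE_URL = "https://api.example.com/restaurants"
USER_AGENT = "ExampleApp/1.0 (iPhone; iOS 12.3)"


def build_request(lat: float, lng: float) -> urllib.request.Request:
    """Build a GET request with the query string and User-Agent header set."""
    query = urllib.parse.urlencode({"lat": lat, "lng": lng})
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        headers={"User-Agent": USER_AGENT},
        method="GET",
    )


def fetch_json(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the response body as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Once the placeholders are replaced, calling fetch_json(build_request(lat, lng)) with your coordinates should return the decoded response. Building the request separately from sending it makes it easy to inspect the URL and headers first.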

If you get stuck, find something unclear, or run into any other problem, please let us know so that we can improve this tutorial. Any feedback would be greatly appreciated!

You can reach us at https://twitter.com/apify

Petr Pátek
Full-stack developer at Apify.
