Hey, we're Apify. The Apify platform gives you access to 2,000+ data extraction tools and unofficial APIs. Check us out.
Google Lens is an image recognition tool that combines image search, object identification, and OCR technologies. Turned into an API, it opens up exciting possibilities, from simple document digitization to machine learning.
📸 What is Google Lens for?
Google Lens is an image recognition tool able to find information about objects using nothing but visual input. As the web is firmly on the visual-first track, being able to search using an image reference and pull up contextual search results around it is more than convenient: it's expected, both on our phones and on our laptops.
Google Lens is your best companion for the following tasks related to image data:
- Text detection and OCR: recognize the writing on an image and extract its data.
- Language detection and translation: identify the language of the text on the image, and then translate it.
- Accessibility and alt text: find the alt text of the image.
- Recognizing image type: identify what the image is about even with no text on it.
- Image search and product search: find images and items similar to the ones you’ve provided.
🗿 Google Lens alternatives: comparison with Rosetta
There are a few alternatives to Google Lens, with Meta’s Rosetta firmly established as one of them. As many pictures shared across Instagram and Facebook contain text, having a text recognition AI of its own was a sensible idea for Meta. Since simply recognizing characters across different languages wasn’t enough, Meta needed to pair text detection with a capable object recognition system. Thus, a large-scale learning system, Rosetta, was created. However, even though Rosetta's results in context reading are quite impressive, they can’t outperform Google Lens.
Let's compare the two models based on a few randomly scraped Instagram posts of a Korean restaurant named 033:
- Rosetta: Photo by 033 in 033. May be an image of drink and indoor.
  Google Lens: Jack Daniels
- Rosetta: Photo by 033 in 033.
  Google Lens: image type: Link https://www.knobcreek.com/our-products; OCR text: KNOB CREEK,KENTUCKY STRAIGHT BOURBON WHISKEY,GMALL MAYEN,MAAL 100 PROOF,Manl,W HI,CLERMONT, KENTUCKY
- Rosetta: Photo by 033 in 033. May be an image of drink and indoor.
  Google Lens: Gin and tonic
  Even a photo without text or a brand label was correctly recognized by Google Lens as an alcoholic drink (this doesn't always happen, of course, but it's amazing nonetheless, right?)
- Rosetta: Photo by 033 in 033.
  Google Lens: Woodford Reserve Kentucky Straight Bourbon Whiskey
In a few cases, Rosetta's output was comparable, though less accurate:
- Rosetta: Photo by 033 in 033. May be an image of text that says RFID KOVAL SINGLE BARREL Bourbon WHISKEY KOVAL SINGLE BARREL Bourbon WHISKEY CHICAGO DISTILLED 500ML CHICAGO DISTILLED 00NIL, PREMIUM ORGANIC PREMIUM
  Google Lens: image type: Koval Single Barrel Whiskey; OCR text: RFID,KOVAL,SINGLE BARREL,Bourbon,WHISKEY,DISTILLED IN CHICAGO,47% Alc. by Vol. 500ML,KOVAL,SINGLE BARREL,Bourbon,WHISKEY,DISTILLED IN CHICAGO 47% Alc by Vol 500ML,PREMIUM ORGANIC,PREMIUM ORGANIC
Rosetta's results are available live on Instagram: after a mandatory login, you can see Rosetta's output for every image in its alt attribute, which can be pretty handy. But after a quick comparison with the Google Lens results in our dataset, it became clear to us that if you really want an accurate representation of objects in images, there is no real alternative to Google Lens.
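As a rough illustration of reading that alt text, here is a minimal sketch using Python's standard-library HTML parser. The HTML snippet is a made-up stand-in for a saved page (real Instagram markup is more complex and sits behind a login), and `AltTextCollector` and `extract_alt_texts` are hypothetical helper names, not part of any Apify or Meta tool.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collects the non-empty alt attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        alt = dict(attrs).get("alt")
        if alt:
            self.alt_texts.append(alt)


def extract_alt_texts(html):
    """Return the alt texts of all images found in an HTML string."""
    parser = AltTextCollector()
    parser.feed(html)
    parser.close()
    return parser.alt_texts


# Hypothetical markup; the alt text is one of the Rosetta outputs quoted above.
sample_html = (
    '<div><img src="post.jpg" '
    'alt="Photo by 033 in 033. May be an image of drink and indoor."></div>'
)
print(extract_alt_texts(sample_html))
```

Images with a missing or empty alt attribute are simply skipped, so the result only contains descriptions that were actually present in the markup.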
🖼️ How to use Google Lens API for image scraping
As Google Lens increases its accuracy and proficiency, more developers are interested in using this Google tool in their projects and applications, so it would be nice to have programmatic access to it via an API. Google doesn't offer a public API for Google Lens itself; the closest official option is the Cloud Vision API, which supports image labeling, face detection, OCR, landmark recognition, and explicit content tagging.
But what about scraping and finding similar images? We've developed our own Google Lens API capable of recognizing text on the image, finding alt text, identifying language, recognizing image type, and finding similar products and visuals by image URL. Here's how you can use it:
Step 1. Go to Google Lens Actor
On the Google Lens Actor page, click the try for free button to sign up for a free plan or sign in to Apify Console.
You can create an Apify account using your email or GitHub account. No credit card details are required. After you create an account, you’ll be redirected to Apify Console — your workspace for web crawlers and other web automation tools.
Step 2. Select the image URL you want OCR text from
Now head over to Google and find the image you want to extract OCR data from or find visual matches for. Find the direct image link (not the Google one), copy its URL, and paste it into the Image URLs field.
❗️ The URL must contain an image file extension such as .png or .jpeg at the end.
You can add as many images as you want and indicate whether you want the Actor to find websites with similar images.
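If you prepare the list of image URLs in a script, it may help to check them against the rule above before pasting them in. The following is a minimal sketch: the input keys `imageUrls` and `findSimilarImages` are hypothetical placeholders (check the Actor's input tab for the real field names), and the extension list beyond .png and .jpeg is an assumption.

```python
from urllib.parse import urlparse

# The article only names .png and .jpeg; the other extensions are an assumption.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp")


def is_direct_image_url(url):
    """True if the URL is http(s) and ends with an image file extension."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Strict reading of the rule: the extension must be at the very end,
    # so URLs with a trailing query string are rejected too.
    return url.lower().endswith(IMAGE_EXTENSIONS)


def build_run_input(urls, find_similar=False):
    """Split URLs into accepted and rejected, and build a hypothetical input."""
    valid = [u for u in urls if is_direct_image_url(u)]
    rejected = [u for u in urls if not is_direct_image_url(u)]
    run_input = {
        "imageUrls": valid,                 # hypothetical field name
        "findSimilarImages": find_similar,  # hypothetical field name
    }
    return run_input, rejected
```

Returning the rejected URLs separately makes it easy to spot links copied from a Google results page instead of the image itself.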
Step 3. Click Start ▶️
The Google Lens API will now visit each image you’ve chosen and extract the image data from it. Once the scraper’s status changes from Running 🏃🏻♀️ to Succeeded 🏁, you’re one step away from downloading the image data.
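The same run can also be started from code instead of the Console. Below is a minimal sketch using the `apify-client` Python package; the Actor ID and the input fields are placeholders you would need to look up yourself, and the code assumes the client returns the run as a dictionary with `status` and `defaultDatasetId` keys.

```python
def dataset_id_from_run(run):
    """Return the default dataset ID of a finished run, or raise if it failed."""
    if not run or run.get("status") != "SUCCEEDED":
        raise RuntimeError(f"Actor run did not succeed: {run}")
    return run["defaultDatasetId"]


def run_actor_and_fetch_items(actor_id, run_input, token):
    """Start the Actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so the helper above works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    # call() starts the run and blocks until it reaches a terminal status.
    run = client.actor(actor_id).call(run_input=run_input)
    dataset_id = dataset_id_from_run(run)
    return list(client.dataset(dataset_id).iterate_items())
```

You would call `run_actor_and_fetch_items` with the Actor's ID from its page, your own input, and your API token from Apify Console; keeping the token in an environment variable rather than in the script is the safer habit.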
Here's an example of getting just the image type and OCR text:
Alternatively, here is an example of getting not only the image data but also matching images and the URLs where they can be found.
Step 4. Download image data
You can Preview 👁 the extracted data as a table, spreadsheet, CSV, or JSON file. You can always find it in the Storage tab and download it in any format. You can also filter your results before extracting them so you only download the fields that you need.
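If you download the results as JSON and want to trim them yourself, the field filtering can be sketched in a few lines of Python. The keys `imageType` and `ocrText` are hypothetical stand-ins for whatever the dataset actually calls those fields, and the sample row reuses a shortened version of the Koval result from the comparison earlier.

```python
import csv
import io


def items_to_csv(items, fields):
    """Render only the chosen fields of each item as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        extrasaction="ignore",  # silently drop fields we did not ask for
        lineterminator="\n",
    )
    writer.writeheader()
    for item in items:
        # Missing fields become empty cells instead of raising an error.
        writer.writerow({field: item.get(field, "") for field in fields})
    return buffer.getvalue()


# Hypothetical item shape; the OCR text is shortened from the example above.
sample_items = [
    {
        "imageType": "Koval Single Barrel Whiskey",
        "ocrText": "RFID,KOVAL,SINGLE BARREL",
        "visualMatches": [],
    }
]
print(items_to_csv(sample_items, ["imageType", "ocrText"]))
```

Because OCR text often contains commas, the `csv` module is used rather than joining strings by hand, so such values are quoted correctly.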
👁 Need more Google scraping tools?
If you have a specific scraping case for Google data extraction, check out these simple scrapers. They're designed to handle Google scraping, extracting data from Google Maps, News, and even Google Search. Take a peek and see if any of them fit the bill.
🦾 Google Lens and machine learning
Google Lens image search can be used for early training of AI models. With its computer vision algorithms able to identify objects, text, and other visual information in images and videos, this technology is an easy choice for building datasets for training AI models.
For instance, suppose you are building an AI model to recognize different types of plants. While Google Lens is not a real substitute for proper data labeling and annotation, you could use Google Lens API to scrape pictures of various plants and identify the plant species in each image. You could then use this information to build a dataset for training your AI model. And another one. And another one.
As labeled data is as valuable as gold for machine learning, Google Lens can definitely come in handy when gathering visual data. You can use this Google Lens API to automate image labeling with varying degrees of accuracy:
- identify exactly what the image shows using image type,
- use OCR text fragments, which provide more insight into the image content, and, if neither of the above works,
- fall back on a fairly broad guess from visual matches on other websites.
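The fallback order in the list above can be sketched as a small labeling function. This is only an illustration under assumptions: the keys `imageType`, `ocrText`, `visualMatches`, and `title` are hypothetical names for the corresponding result fields, and `pick_label` is not part of the Actor.

```python
def pick_label(item):
    """Return (label, source), trying image type, then OCR text, then matches."""
    image_type = (item.get("imageType") or "").strip()
    if image_type:
        return image_type, "image type"

    ocr_text = (item.get("ocrText") or "").strip()
    if ocr_text:
        return ocr_text, "ocr text"

    # Least reliable option: the title of the first visual match that has one.
    for match in item.get("visualMatches") or []:
        title = (match.get("title") or "").strip()
        if title:
            return title, "visual match"

    return None, "unlabeled"


# Hypothetical results illustrating each branch.
print(pick_label({"imageType": "Gin and tonic"}))
print(pick_label({"imageType": "", "ocrText": "KNOB CREEK"}))
print(pick_label({"visualMatches": [{"title": "Bourbon bottle"}]}))
```

Keeping the source alongside the label lets you weight or review the less reliable labels separately when you assemble the training set.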
In praise of scraping
With scrapers, you can make the training of your own custom machine learning models slightly more (bearable?) automated. Web scraping in general is a way to bootstrap AI training. To that end, you might find our other scrapers and AI integrations, such as 🦜🔗LangChain or LLaMA🦙, useful. Of course, those scrapers are focused on text and LLMs rather than images, but rest assured, they will get the data collection part done for you.