What is machine learning doing for us?

Machine learning is becoming crucial for industries that deal with big data. In this article, you’ll discover why companies are turning to machine learning, understand how machines learn, and find out what machine learning can do for your business.

Rise of the machines

The American rock band Rage Against the Machine never specified which machine they were raging against, but it was probably an IBM PS/1. This PC was released in 1990 (a year before the band formed), and it was terrible. Its power supply sat inside the monitor, which made swapping out the display very difficult, and it couldn’t accept standard ISA cards, which meant upgrades were not possible.

Though it didn’t have anything to do with RATM’s vitriolic music, machines came a long way in the next couple of years. Around the time the band released their debut album in 1992, Gerald Tesauro developed TD-Gammon. This computer backgammon program used an artificial neural network trained by temporal-difference learning. TD-Gammon could rival top human backgammon players, but it could not consistently surpass them.

Very old computer.
The IBM PS/1 (model 2011) - Image from https://en.wikipedia.org/wiki/IBM_PS/1

Only five years later, in 1997, IBM’s Deep Blue beat the world chess champion, Garry Kasparov.

In 2016, Google’s AlphaGo program, which combined neural networks with tree search techniques, beat top professional Go player Lee Sedol. A year later, a more general successor called AlphaZero extended the achievement to other two-player games, including chess and shogi.

How did programs manage to do this? Two words: machine learning.

What is machine learning?

Machine learning (ML for short) is a subset of artificial intelligence (AI). The two terms are often used interchangeably. While AI and ML are closely connected, they are not exactly the same. So, what is the difference between artificial intelligence and machine learning?

Artificial intelligence is the science of training machines to perform human tasks. Machine learning is the specific subset of AI that teaches machines to carry out those tasks without being explicitly programmed.

Machine learning uses algorithms to parse data, learn from it, and make informed decisions based on that data. It is often connected with deep learning, which is a subfield of machine learning based on artificial neural networks and representation learning.

The goal of machine learning is to independently predict results based on incoming data. In other words, machine learning is the method by which machines learn from data.

How do machines learn?

Machine learning models look for patterns in data and draw conclusions, just as we do. We do not explicitly program them. Instead, they are given samples and learn what to do based on those samples.

Let’s say you want the machine to learn how to distinguish cats from dogs. You give the machine some pictures of dogs and let it know they are dogs. You then give the machine some images of cats and let it know they are cats. You then feed it thousands of pictures of cats and dogs and let it distinguish them based on the examples it has already learned.

Once the algorithm gets good at drawing the correct conclusions, it applies that knowledge to new data sets. The more data and the greater the variety in the data samples, the better the machine will recognize images, identify patterns, and forecast outcomes. This means we need three things to teach computers: data, features, and algorithms.
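To make the cats-and-dogs example concrete, here is a minimal Python sketch of learning from labeled samples. It is only an illustration under stated assumptions: the two features (weight and ear length), the sample values, and the function names `train` and `predict` are all made up for this example, and the nearest-centroid method stands in for the far more complex models that real image classifiers use.

```python
from math import dist
from statistics import mean

# Hypothetical labeled samples: (weight in kg, ear length in cm) -> label.
# Real image classifiers learn from pixels; two hand-picked numbers keep the sketch small.
TRAINING_DATA = [
    ((4.0, 6.5), "cat"),
    ((3.5, 6.0), "cat"),
    ((5.0, 7.0), "cat"),
    ((20.0, 11.0), "dog"),
    ((30.0, 12.5), "dog"),
    ((12.0, 9.5), "dog"),
]


def train(samples):
    """Learn one average point (centroid) per label from the labeled samples."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Return the label whose centroid is closest to the new sample."""
    return min(model, key=lambda label: dist(model[label], features))


model = train(TRAINING_DATA)
print(predict(model, (4.2, 6.8)))    # a small animal with short ears
print(predict(model, (25.0, 12.0)))  # a large animal with long ears
```

Nobody wrote a rule such as "anything under 8 kg is a cat". The program derived its own decision boundary from the samples, which is why more (and more varied) samples tend to give better results.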

Cartoon of potential energy toy.
Like a perpetuum mobile, machines can work autonomously once the data has set them in motion

How do I get enough data for machine learning?

Whether you want to find out customer preferences, predict trends, forecast stocks, or detect spam, you need data. There are three methods of data collection: manual, automatic, and semi-automatic.

Manual data collection provides the cleanest and most accurate results, but it takes ages. Many thousands of rows of data would be the bare minimum for effective machine learning. Automatic data collection is much faster and brings in a lot more data. That makes automation your best hope for success if you value your time and sanity. A compromise between these two methods is semi-automatic data collection which, as the name suggests, is partly manual and partly automatic. This method aims to strike a balance between speed and accuracy.

What are features in machine learning?

In the context of machine learning, features are what statisticians call variables: the factors that a machine can look at. Such variables could be things like dates, prices, a user’s gender, word frequency in a text, and so on. When these are in the form of structured data (for example, column names in a table), they are pretty easy for a machine to identify. But what about unstructured data, like emails, text messages, photos, and videos? Recognizing that kind of data is the hard part of machine learning (hence CAPTCHAs). This photo demonstrates that beautifully:

CAPTCHA asks for images of a dog's face among muffins with raisins.
Machines may have beaten us at chess, but we are still better at telling the difference between muffins and chihuahuas (Image from https://www.freecodecamp.org)
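Word frequency, one of the features mentioned above, is a simple way to turn unstructured text into numbers a machine can work with. The following Python sketch shows the idea. The vocabulary, the sample email, and the function name `word_frequency_features` are assumptions chosen for illustration, not part of any particular library.

```python
import re
from collections import Counter

# Hypothetical vocabulary: the words we decide to count.
# Choosing which words matter is itself part of designing features.
VOCABULARY = ["free", "winner", "meeting", "invoice"]


def word_frequency_features(text, vocabulary=VOCABULARY):
    """Turn free text into a fixed-length list of word counts, one per vocabulary word."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [counts[word] for word in vocabulary]


email = "You are a WINNER! Claim your free prize now. Free shipping, free gift."
print(word_frequency_features(email))  # prints [3, 1, 0, 0]
```

Each email becomes a row of numbers with the same columns, which is exactly the kind of structured input that a spam detection algorithm can learn from.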

What do algorithms do?

Machine learning uses programmed algorithms that receive and analyze data to predict values. Machine learning algorithms have been around for quite some time, but the ability to automatically apply them to big data repeatedly and rapidly is a relatively new development.

When Spotify recommends songs to you based on the tracks you have already listened to, machine learning is at work. When Amazon tells you about the purchases of people who have bought what you just purchased, that is the result of algorithms. The greater the volume and variety of the data the machine receives, the better it gets at tailoring its recommendations.
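The "people who bought this also bought" idea can be sketched in a few lines of Python. This is a deliberately simple counting approach under stated assumptions, not a description of how Spotify or Amazon actually build their recommendations. The order data and the function names `build_co_purchases` and `recommend` are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase histories, one set of items per customer.
ORDERS = [
    {"guitar", "strings", "tuner"},
    {"guitar", "strings"},
    {"guitar", "amp"},
    {"strings", "tuner"},
]


def build_co_purchases(orders):
    """Count, for every item, how often each other item appears in the same basket."""
    co_purchases = {}
    for basket in orders:
        for item, other in permutations(basket, 2):
            co_purchases.setdefault(item, Counter())[other] += 1
    return co_purchases


def recommend(co_purchases, item, limit=1):
    """Return up to `limit` items most often bought together with `item`."""
    ranked = co_purchases.get(item, Counter()).most_common(limit)
    return [other for other, _count in ranked]


co_purchases = build_co_purchases(ORDERS)
print(recommend(co_purchases, "guitar"))
```

With only four orders the counts are tiny, but the same logic shows why volume matters: every additional basket sharpens the picture of which items belong together.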

AI and its offshoot, machine learning, will be a foundational tool for creating social good as well as business success
– Mark Hurd (CEO of Oracle)

Why is machine learning important?

In case you are wondering why ML is such a big deal right now, take a moment to consider how much you already depend on machines to do all the hard work for you. When you want to know how long your train journey will take, you don’t ask a human; you consult your mobile phone. Having difficulty communicating in a foreign language? Google Translate is there for you. Don’t want to get lost in a place you’ve never been? Google Maps will show you the way. From fraud detection to recommendations on Netflix, from self-driving cars to finding out what your customers are saying about you on Twitter, all of these things are possible because of machine learning.

What’s more, machines do many of our tasks much faster and (in some cases at least) more precisely than we do. Once a machine has learned a task reliably based on the data given, it can perform its task with superhuman speed and then apply what it has learned to other data. That saves us a lot of time and effort. This is why machine learning is such a hot topic in business today.

Meme saying machine learning is so hot right now.
Industries working with big data recognize the value of machine learning technology

5 use cases for machine learning

We have mentioned how machine learning integrates with our daily lives, but how is it used in business and other industries? Here are 5 use cases for machine learning:

1. Healthcare and pharma

Machine learning is a fast-growing trend in the healthcare industry. ML can use data to assess a patient's health in real time and help medics analyze data to improve diagnoses and treatments. ML can also provide extremely useful tools for all stages of drug discovery, from validation to clinical trials.

2. Marketing and media

Companies can use machine learning to build programs that process and analyze large amounts of “natural language data” such as reviews. ML can help to track public sentiment, social media, ad performance, and article popularity. It can also be used to spot fake news and gather up-to-date intelligence.

3. Retail and e-commerce

When websites recommend items to customers based on their previous purchases, they use machine learning to analyze those customers' buying history. Retailers use ML to capture and analyze data for market research, price monitoring, and product tracking.

4. Finance

In the financial industry, machine learning technology is used to prevent fraud and get insights into data for investment opportunities. Data mining can identify clients with high-risk profiles, and cyber surveillance can be used to identify signs of fraud.

5. Web scraping

Web scraping is a term used for automatically retrieving data from the internet and structuring it in a useful manner. Web scraping is a highly efficient method of data collection, especially for industries that work with big data. Machine learning makes it possible to create a robust web scraping algorithm designed to continuously scrape a specific website even if the HTML code is altered.
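As a rough illustration of "retrieving data and structuring it", the Python sketch below turns a fragment of HTML into a list of records using only the standard library. It covers the structuring step only and is purely rule-based. It does not use machine learning, and it would break if the markup changed, which is the weakness that ML-assisted scrapers aim to address. The HTML fragment, the class names, and the `ProductParser` class are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real scraper would download this from a website.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Keyboard</span> <span class="price">49.90</span></li>
  <li class="product"><span class="name">Mouse</span> <span class="price">19.50</span></li>
</ul>
"""


class ProductParser(HTMLParser):
    """Collect the text of <span class="name"> and <span class="price"> into rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            self.rows.append({})  # start a new record for each product
        elif tag == "span" and css_class in ("name", "price") and self.rows:
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


parser = ProductParser()
parser.feed(SAMPLE_HTML)
products = [{"name": row["name"], "price": float(row["price"])} for row in parser.rows]
print(products)
```

The result is a list of dictionaries with consistent keys, ready to be saved as rows in a table or fed to a model as training data.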

Machine learning with Apify

Apify provides some powerful automation and web scraping solutions. If you are looking for automated web scraping tools, there are hundreds of them in Apify Store. If you want to automate the generation of large-scale datasets from the web for machine learning and artificial intelligence models, you can get a complete end-to-end solution from Apify experts for your automation needs.

Theo Vasilis
Writer, Python dabbler, and crafter of web scraping tutorials. Loves to inform, inspire, and illuminate. Interested in human and machine learning alike.
