How web scraping and AI are helping to find missing children
The Missing Children initiative began with a Facebook page. Web scraping Facebook for data labeling has taken it to a whole new level, and the initiative is now reuniting families all over Egypt.
Ever since the 2011 Egyptian revolution, child abduction has been on the rise in Egypt. During the revolution alone, 1,200 people were reported missing, and the problem has not gone away. Children make up the largest proportion of victims, and thousands of them go missing in Egypt every year. The five main motives behind these abductions are adoption, begging, the sex trade, the organ trade, and ransom.
This tragic situation was the motivation behind the Missing Children (Atfal Mafkoda) initiative. Atfal Mafkoda is a community effort and a Facebook page with more than 2 million followers. It works to identify missing and trafficked children, as well as children mistreated in orphanages, and has so far tracked down more than 3,000 people and reunited them with their families.
The Missing Children initiative was launched in 2016 by engineer Rami el-Gebali, beginning with a "No to using children as beggars" campaign. Mr. Gebali asked people to photograph children begging on the streets and send the pictures to the page. The campaign collected tens of thousands of photos, and the page began matching the begging children against reports of missing children.
Although that campaign led to only three children being found, the sheer volume of images gave the page the largest database of photos of missing children in Egypt.
"Our motto is that no family should suffer the pain of missing a living loved one. We want to spread our model across the world. We proved the concept, and we know it works."
- Rami el-Gebali, founder of Atfal Mafkoda
AI face recognition to the rescue
In 2023, an incident convinced Rami el-Gebali that there had to be a faster way to find lost children. A community member pointed out a photo that looked just like a person who had been missing for ten years, and that person has now finally been reunited with his family. But Atfal Mafkoda had received that photo just two months after he went missing. AI could have solved that case in a minute! Relying on people to do in ten years what AI could do in the blink of an eye made no sense.
The problem was that manual matching breaks down when the photos are old, when the picture quality is low, or when the children have been missing so long that their photos no longer match their current faces. Atfal Mafkoda needed AI face recognition technology covering face detection, face enhancement, face comparison, and face aging.
Rami asked software engineer Youssef A. Abukwaik for help. Mr. Abukwaik in turn consulted a former manager who teaches at Boston University and asked whether students could take this on as a semester-long graduation project that would benefit both them and the Missing Children initiative.
What followed was the Spark project, which won the students an Audience Choice Award. The students applied generative deep learning methods to forensic face aging to produce higher-quality aged face photos. Thanks to those brilliant students, Mr. Abukwaik was able to build on their work, which they published in a GitHub repository, and apply it to the Missing Children initiative.
Web scraping Facebook for data labeling
The next step was web scraping. Youssef needed to scrape the Atfal Mafkoda Facebook page to turn its posts into a dataset for data labeling, and this is where he ran into problems.
"Apify allowed me to fully scrape our own Facebook page without the limitations I had experienced with alternative solutions. I was able to drill down up to 5,000 posts without blocks. No other open-source solution or alternative I tried compared to that."
- Youssef A. Abukwaik, Software Engineer
Youssef first tried downloading the Facebook page, but the result wasn't easy to parse. He then turned to an open-source Facebook scraper, and Facebook blocked him after just 30 requests. Next he tried a proxy application, but it was difficult to configure, and after he had scraped 500 posts, problems forced him to start all over again.
So Youssef did what any of us would do: he googled "How to scrape Facebook", and the first result was Apify.
"Apify has incredible potential for AI and machine learning. It was a turnkey solution that let me harvest the data in our Facebook community for data labeling and put it to use without any extra work."
- Youssef A. Abukwaik, Software Engineer
How Apify is helping the Missing Children initiative
Apify allowed Youssef to scrape the entire Atfal Mafkoda Facebook page without the limitations he had run into with the alternatives he tried. He was able to extract up to 5,000 posts without getting blocked.
Having the results stored in a dataset also helps a great deal. The scraped results persist, so he doesn't have to use them immediately: he can query the data and run his business logic on it without re-running the scraper. A single run gives him output that he can re-download and work with as often as he needs.
Mr. Abukwaik now has an administration site where he can pull in the Facebook data. The data is structured and he can see every post, so he knows when to run face matching against the John and Jane Does (unidentified people) and can label each post according to whether the person is still missing or has been reunited with their family.
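To make the "scrape once, query many times" idea concrete, here is a minimal Python sketch. It is not Youssef's actual code. It assumes the scraped posts have been exported as a JSON array of objects with `postUrl` and `text` fields (assumed field names), and it uses a hypothetical keyword rule to pre-sort posts into "missing", "reunited", or "unlabeled" before a person reviews them. Real posts on the page are largely in Arabic, so the keyword lists are placeholders only.

```python
import json
from collections import Counter
from dataclasses import dataclass

# Hypothetical keyword lists. Real posts on the page are largely in Arabic,
# so a production labeler would need its own vocabulary or a human reviewer.
REUNITED_KEYWORDS = ("reunited", "returned home")
MISSING_KEYWORDS = ("missing", "lost", "disappeared")


@dataclass
class LabeledPost:
    url: str
    text: str
    label: str  # "reunited", "missing", or "unlabeled"


def label_text(text: str) -> str:
    """Assign a provisional label to a post based on keyword matches."""
    lowered = text.lower()
    # Check "reunited" first: follow-up posts often mention both words.
    if any(word in lowered for word in REUNITED_KEYWORDS):
        return "reunited"
    if any(word in lowered for word in MISSING_KEYWORDS):
        return "missing"
    return "unlabeled"


def label_posts(raw_json: str) -> list[LabeledPost]:
    """Parse an exported list of post objects and label each one.

    The field names "postUrl" and "text" are assumptions about the export.
    """
    items = json.loads(raw_json)
    return [
        LabeledPost(
            url=item.get("postUrl", ""),
            text=item.get("text", ""),
            label=label_text(item.get("text", "")),
        )
        for item in items
    ]


# Example export with made-up posts (not real data from the page).
SAMPLE_EXPORT = json.dumps([
    {"postUrl": "https://example.com/post/1", "text": "Boy missing since 2013, please share"},
    {"postUrl": "https://example.com/post/2", "text": "Missing for ten years, now reunited with his family"},
    {"postUrl": "https://example.com/post/3", "text": "Thank you all for your support"},
])

if __name__ == "__main__":
    posts = label_posts(SAMPLE_EXPORT)
    # The export is parsed once and can then be queried as often as needed.
    print(Counter(p.label for p in posts))
    # Posts still labeled "missing" are the candidates for face matching.
    needs_matching = [p.url for p in posts if p.label == "missing"]
    print(needs_matching)
```

Because the export lives on disk rather than only in memory during a scraper run, the same file can be re-labeled or re-queried whenever the keyword rules or review queue change, without scraping Facebook again.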
The Missing Children initiative is focused on Egypt for now, but as its founder has said, “Our dream is to have one global database of missing people around the world”.
Apify shares that dream. With this initiative and the Spotlight project, which uses Apify to help find trafficked children in the US, let's hope it becomes a reality very soon!