The Russian invasion of Ukraine in February 2022 has set in motion the grim mission of collecting and archiving digital evidence of war crimes. Read how Mnemonic is using web scraping to collect social media data for further use in international law, journalism, research, and memory preservation.
Mnemonic and what they do
Mnemonic is a Berlin-based international NGO dedicated to preserving digital evidence of human rights violations and international war crimes. Their open-source activities are centered around developing transparent and sustainable archiving strategies to promote advocacy for affected groups, support justice, and foster accountability.
Mnemonic started as a research group working on preserving at-risk digital evidence of human rights violations in Syria: reports, images, videos, articles, and posts. The idea of a centralized archive was born out of the effort to organize and process the material amassed over time, and out of the growing need to preserve and archive it according to established standards. Today, Mnemonic's three archives (the Syrian Archive, the Sudanese Archive, and the Yemeni Archive) preserve over 10 million records that are crucial for investigations, research, and building criminal cases.
Now, with the ongoing Russian invasion of Ukraine, the gears of digital memory preservation are running faster again. With all the experience gained over a decade, the preservation procedures that took years for Mnemonic to develop for Syria took only a few days to set up for Ukraine. Mnemonic is now working tirelessly on enriching the Ukrainian Archive with verified, cross-referenced, and often user-generated content from the web to be further used within the scope of international law, research, and journalism.
Scraping social media to save digital evidence
Over the years, Mnemonic has developed extensive strategies to find and collect the required content, an established procedure to process and label it, a methodology to prove the content has not been tampered with, a secure place to preserve it, and a way to share it transparently with third parties. They also place particular emphasis on the open-source nature of their technology and data collection, a principle that we here at Apify adhere to as well.
Archiving is the final stage of this process; the first stage is collecting the information. Mnemonic gathers data from a wide range of sources on the web, among them:
- Reports and articles from media and international organizations;
- Incident reports shared by Mnemonic's network of reporters via social media;
- Potential evidence posted on social media (TikTok, Twitter, Telegram, and Facebook).
Obviously, it's impossible to keep track of all these data streams manually, let alone collect the data that way every time. On top of that, the metadata of every piece of content has to be preserved: it is incredibly valuable because it makes proving the content's authenticity much easier. This is where automation and web scraping step in.
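To illustrate why keeping metadata next to each item helps with authenticity, here is a minimal Python sketch. It is not Mnemonic's actual pipeline: the function names, record fields, and example URL are hypothetical. It stores a collected item together with its metadata, a collection timestamp, and a SHA-256 fingerprint of the raw bytes; recomputing the hash later shows whether the archived content has changed since it was collected.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_archive_record(content: bytes, source_url: str, platform: str, metadata: dict) -> dict:
    """Hypothetical helper: bundle raw content info with its metadata and a SHA-256 fingerprint."""
    return {
        "source_url": source_url,
        "platform": platform,
        # Time of collection in UTC, stored as an ISO 8601 string.
        "collected_at": datetime.now(timezone.utc).isoformat(),
        # Fingerprint of the exact bytes that were collected.
        "sha256": hashlib.sha256(content).hexdigest(),
        # Platform-provided metadata (author, post time, etc.) kept alongside the content.
        "metadata": metadata,
    }


def is_unmodified(content: bytes, record: dict) -> bool:
    """Hypothetical check: True only if the bytes still match the stored fingerprint."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]


if __name__ == "__main__":
    video = b"raw bytes of a downloaded video"
    record = make_archive_record(
        video,
        "https://example.com/post/123",  # placeholder URL
        "twitter",
        {"author": "example_user", "posted_at": "2022-03-01T10:00:00Z"},
    )
    print(json.dumps(record, indent=2))
    print(is_unmodified(video, record))         # unchanged bytes
    print(is_unmodified(video + b"x", record))  # altered bytes
```

A real archive would also need secure storage and independent timestamping, which this sketch leaves out; it only shows how a fingerprint stored with the metadata makes later tampering detectable.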
What’s Apify’s role here?
- Using ready-made social media scrapers:
Over the years, Mnemonic has developed their own in-house plugins for archiving data from different platforms, including plugins built specifically for scraping social media. However, those don't always work well, as social media is notoriously tricky to scrape. While refactoring their own Facebook scrapers, Mnemonic's Tech Lead came across the Apify Facebook Scraper and decided to give it a try. After getting the datasets, the team soon expanded to other social media scrapers from Apify Store, such as TikTok, Twitter, and YouTube, and has been using them ever since. This partial transition saved the team a lot of precious time, as it reduced the hours usually spent on scraper maintenance.
- Increasing success rates with proxies:
Proxies are part of Apify's scraping ecosystem; without them, it's difficult to imagine reliable scraping of large or complicated websites. Using residential IP proxies was a great solution for Mnemonic and raised their scraping success rate to 65% almost right away. They still use Apify proxies for several archiving projects, including the Ukrainian one.
- Acquiring flexibility:
Mnemonic's technical team can now rely on both their internal tools and Apify tools. If one solution fails in any respect, they can switch to the other. And with a backup scraper available, instead of urgently fixing their own scrapers and plugins, they can dedicate time and effort to other aspects of data collection.
How web scraping helps NGOs
The current use of Apify scrapers and proxies allows Mnemonic to be more efficient, flexible, and, most importantly, timely, because collecting evidence during an international conflict is a time-sensitive matter. Social media platforms often flag graphic or disturbing content as unsuitable or as violating their terms and conditions, which puts it at risk of being removed and, by extension, at least partially lost to researchers and human rights defenders.
With not only efficient scrapers but also an environment supporting them, it is now easier for the Mnemonic team to collect and provide potential evidence for legal case building and memory research. We are humbled to contribute to investigative journalism and to facilitate data collection with our technology, as well as by providing free credits and extra support.
Mnemonic's use of web scraping is a case in point of data being used for the ultimate good of society. We hope that Mnemonic's work today brings closer the day when such archives no longer need to be built, only maintained for future generations.
Discover other cases where Apify web scraping technology has helped strengthen the work of NGOs:
- Helping the planet with data with Omdena 🌳
- Bringing IT opportunities closer to women with Czechitas 👩‍💻
- Fighting child traffickers with technology with Thorn ⚖️
If you represent an NGO interested in collaboration, don't hesitate to reach out to us at firstname.lastname@example.org.