How e-shops perform some tricky sleight of hand with discounts
TL;DR: Using the Apify platform to scrape prices, we ran an extensive analysis of three leading Czech e-shops (Alza.cz, Mall.cz and CZC.cz) for the month leading up to Black Friday 2017 and during the sale itself. We found that only about 2% of products were discounted, by an average of 12%, while original prices were artificially inflated so that “discounts of up to 80%” could be advertised online.
Note: this is a translation of the original Czech version, which was covered extensively by the Czech press in November 2017.
You could be forgiven for having trouble believing the 80% discounts that most Czech e-shops offer during Black Friday, especially as you can often find instances where the seller has magically changed the original price just before the sale begins. But are these discounts really that exceptional? Or is Black Friday just a chance for sellers to publicize discounts before Christmas that they in fact offer throughout the year?
This year, Apify decided to run a small internal hackathon and study this properly. We chose three of the largest Czech e-shops (Alza.cz, Mall.cz and CZC.cz) and tracked the daily price fluctuations of all their products for the month before Black Friday. During the Black Friday sale (which ran for almost the full week up to Friday, November 24), we tracked all prices four times every day. So what did we find out?
Magical discounts
Did Mall.cz tempt you with a Black & Decker portable drill? It was impressively discounted by 45% for Black Friday, with a price of CZK 2,650 vs. an original price of CZK 4,899. A month before, it was only discounted by 2%. However, at that time it also cost only about CZK 100 more: CZK 2,749, reduced from CZK 2,799. The magic lies in how Mall.cz increased the original price from CZK 2,799 to CZK 4,899 on November 19 (the beginning of their Black Friday promotions).
We can find a similar example at Alza.cz. The Black Friday price for the Urbanstar Uscooter was CZK 6,290, but its original price had been increased on November 17 from CZK 7,500 to CZK 9,999. Instead of the advertised discount of 37%, the real discount was 16%.
On CZC.cz, we even found products advertised in the Black Friday section that had not only seen increases in the original price, but also in the eventual sale price. If you bought the Philips Brilliance 241P6QPJKES on Black Friday at CZC.cz, you might have enjoyed a great discount, but you would also have ultimately paid a higher price than if you had bought it at the same e-shop 14 days before.
And these are not exceptions. In total, we found 291 products where the original price was increased (comparing the average to November 1 and during Black Friday). The original price was altered most often by CZC.cz (212 cases), followed by Mall.cz (63 cases), and least often by Alza.cz (16 cases). The complete table is available here.
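The arithmetic behind these examples can be sketched in a few lines of Python. This is only an illustration using the prices quoted above; the function name `discount_pct` is our own and is not part of the crawlers or the analysis pipeline.

```python
def discount_pct(sale_price: float, reference_price: float) -> float:
    """Return the percentage discount of sale_price relative to reference_price."""
    if reference_price <= 0:
        raise ValueError("reference_price must be positive")
    return (1 - sale_price / reference_price) * 100


# Urbanstar Uscooter at Alza.cz, prices in CZK as quoted in the text.
advertised = discount_pct(6290, 9999)  # against the inflated original price
real = discount_pct(6290, 7500)        # against the original price before November 17

print(f"advertised: {advertised:.0f}%, real: {real:.0f}%")
# prints "advertised: 37%, real: 16%"
```

The sale price is the same in both calls. Only the reference price changes, which is why raising the original price is enough to make a discount look bigger.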
Attractive discounts, but will you really save money?
Now let’s look at the average discount for products listed in the Black Friday category. There are some interesting figures for the Black Friday period: 37% for Mall.cz, 31% for Alza.cz and 28% for CZC.cz. But if we compare these with the average discount for the same products during the month before Black Friday, they are not so impressive. Mall.cz raised its average discount by 12 percentage points, from 25%; Alza.cz went up by 15 percentage points from 13%; and CZC.cz increased by only 5 percentage points, from 23%. In addition to this, we need to take into account the magic performed with discounts, as in the examples described above.
In order to eliminate the impact of the increased original prices, we took the Black Friday products and found their minimum price in the period before and after Black Friday. The table below shows the average, minimum and overall average share of products with a discount in the Black Friday sale in each e-shop.
On average, Alza.cz reduced prices the most (by 20%), but only on about 1,000 products (which were also sold before Black Friday). Mall.cz reduced prices by 15% on average, but on at least 2.5 times as many products. CZC.cz had even more discounted products, but the average real discount was only 6%. When we calculate the average across all e-shops, weighted by the number of products, we get a real discount of 12%.
If we look at the average discount on all products in each e-shop, we can barely detect that it was Black Friday. At Mall.cz the average discount on November 24 was 1 percentage point higher than during the month before. At Alza.cz and CZC.cz, the average discount was even lower on November 24 than it had been for the duration of the preceding month.
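As a rough sketch of this calculation: the real discount compares the Black Friday price with the lowest price seen earlier, and the overall figure is an average weighted by product count. The function names are our own, and the product counts below are illustrative assumptions only (about 1,000 for Alza.cz and 2.5 times that for Mall.cz follow the text; the CZC.cz count is a placeholder for "even more"). The exact counts are in the dataset.

```python
def real_discount_pct(black_friday_price: float, earlier_prices: list[float]) -> float:
    """Discount relative to the lowest price observed before the sale."""
    baseline = min(earlier_prices)
    return (1 - black_friday_price / baseline) * 100


def weighted_average(values_and_weights: list[tuple[float, float]]) -> float:
    """Average of values, each weighted by its product count."""
    total_weight = sum(weight for _, weight in values_and_weights)
    if total_weight == 0:
        raise ValueError("total weight must be non-zero")
    return sum(value * weight for value, weight in values_and_weights) / total_weight


# Mall.cz drill from the earlier example: CZK 2,650 on Black Friday,
# CZK 2,749 a month before (simplified two-point price history).
print(round(real_discount_pct(2650, [2799, 2749]), 1))  # prints 3.6

# (average real discount in %, ASSUMED number of products) per e-shop.
shops = [
    (20.0, 1000),  # Alza.cz
    (15.0, 2500),  # Mall.cz
    (6.0, 2600),   # CZC.cz, hypothetical count
]
print(round(weighted_average(shops), 1))  # prints 12.0
```

Weighting by product count matters here: a plain average of 20%, 15% and 6% would overstate the result, because the e-shop with the smallest real discount had the most products in the sale.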
The share of products included in the Black Friday sale was only 1.67% for Alza.cz, 1.95% for Mall.cz and 7.14% for CZC.cz.
So if you did not expressly buy products based on their inclusion in the sale, you had only a minimal chance of finding a discount on any given product. And if you were lucky enough to find your chosen product in the Black Friday category, on average you got a 12% discount on the price you might have paid during the month before Black Friday. And you could only hope that the discount had not been artificially inflated…
How we obtained and processed the data
From October 20, 2017, we used Apify to download the complete range of products offered by these e-shops every day. Vaclav set up three crawlers that scraped the basic product parameters (mainly URL, name, current and original price). During Black Friday, we also downloaded all products in the Black Friday category four times every day. In order to avoid overloading the e-shop servers, the crawler did not delve into the details of each product, but downloaded data from listings in pagination in all categories. Some products were listed in multiple categories and we dealt with this by means of deduplication.
In this manner, we processed approx. 30,000 pages per day and collected approx. 733,000 entries:
approx. 18,500 pages and approx. 380,000 entries from Alza.cz
approx. 8,000 pages and approx. 260,000 entries from Mall.cz
approx. 3,800 pages and approx. 93,000 entries from CZC.cz
All our crawlers are publicly available here. Just create an account on Apify, copy the crawler to your account and you can monitor future prices in the same way.
Marek then prepared a MySQL database on AWS and wrote a simple act that went through all our crawler runs and inserted the data into the database. The MySQL crawler results are here (just set the correct credentials for your DB and you can use them).
Next, our SQL wizard Jarda deduplicated the records, cleaned them up a bit (see notes below) and uploaded the resulting data to Google Data Studio. From there we were able to search, calculate and visualize the results described above.
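The deduplication itself was done in SQL. Purely as an illustration of the idea, here is a minimal Python sketch that keeps one record per product URL within a single crawler run. The field names (`url`, `name`, `price`, `original_price`) and the sample records are assumptions for this example and do not reflect the real schema.

```python
def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each product URL, preserving order."""
    unique: dict[str, dict] = {}
    for record in records:
        # setdefault stores the record only if the URL has not been seen yet.
        unique.setdefault(record["url"], record)
    return list(unique.values())


# Hypothetical listing records from one run; the same product appears
# in two category listings and is therefore scraped twice.
run = [
    {"url": "https://shop.example/p/1", "name": "Drill", "price": 2650, "original_price": 4899},
    {"url": "https://shop.example/p/2", "name": "Scooter", "price": 6290, "original_price": 9999},
    {"url": "https://shop.example/p/1", "name": "Drill", "price": 2650, "original_price": 4899},
]

print(len(deduplicate(run)))  # prints 2
```

Using the URL as the key assumes that each product has a single URL regardless of the category it is listed under.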
Technical notes:
Some products were included in multiple categories, forcing us to deduplicate.
We discarded products available on a paid instalment plan.
In the course of our data collection, CZC.cz slightly modified the structure of the site, and for some time we read prices without VAT. We added the VAT to the prices for that period and excluded the run during which the change occurred.
No crawler ran for Alza.cz for one day, and none ran for CZC.cz for three days (the account running these crawlers exceeded the limit on parallel crawlers).
Before Black Friday we loaded the data only once a day, so that we did not have to capture the hourly activity of the “Sale Tornado” on Alza.cz.
Disclaimer: Because of the above-described technical notes, our records may not be 100% accurate; however, any discrepancies should not affect the resulting statistics and examples given.
Publication of data
We did not find any restrictions on the automatic acquisition and publication of data on Alza.cz and CZC.cz. Mall.cz has the following copyright notice in the copyright section of the site:
Any part of the Company’s website (in particular, descriptions and illustrations of the products sold, the description of the purchase and the division of categories and parameters) may not be copied electronically or mechanically and made available to the public without the prior written permission of the copyright holder. (Translated from Czech).
At the same time, the site disclaims responsibility for product and service information:
Internet Mall, Inc. warns that the information on the websites of its stores is partly taken from third parties, may contain material and technical inaccuracies or typographical errors and may be updated without prior notice. Internet Mall, Inc. may at any time without prior notice change the products and services described on its site and does not guarantee the accuracy of its content. (Translated from Czech).
We infer from this that the prohibition concerns the copying of the e-shop itself, not the prices of the products, whose correctness Mall.cz does not guarantee.
The files contain records from these three e-shops from 20.10.2017–26.11.2017. This is an interesting dataset for further analysis and we would be very pleased to see someone discover more useful data in it. If you use the dataset in any publication, please include our logo and a link to www.apify.com
By the way, if you found this analysis interesting and it seems like something you would enjoy doing, we currently have open positions at Apify.
And as we are already in the middle of Black Friday, we have a 20% discount on any Apify plan until 30.11.2017. And it should be noted that we did not increase our prices before Black Friday :-)