Unique ID: 2015065

Division: Deputy Director General
Issue Date: February 13th 2019
Last modified: February 22nd 2019
Collaborative

Price information based on scanner data and website information

Using scanner data and web scraping for price information

For the CPI and other price indices, several sources may be used. Apart from price observation in shops, Statistics Netherlands uses scanner data from retail businesses. In addition, price information may be available on the websites of retail businesses or on websites of third parties that provide price comparisons. Price information for specific products is already collected manually from websites, and increasingly by internet robots. The aim of the project is to systematically collect price information by internet robots for a limited number of retail chains, so that the observation in the shops can be stopped.
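
An internet robot of this kind typically downloads product pages and extracts a product name and a price for each item. The sketch below shows only that extraction step, using Python's standard-library `html.parser`. It is not Statistics Netherlands' robot. The markup it expects (flat `<span class="name">` and `<span class="price">` elements), the class and function names, and the sample page are all hypothetical, because real retail websites each have their own layout.

```python
from dataclasses import dataclass
from decimal import Decimal
from html.parser import HTMLParser
from typing import Dict, List, Optional


@dataclass
class PriceObservation:
    product: str
    price: Decimal


def parse_dutch_price(text: str) -> Decimal:
    """Convert a Dutch-formatted price such as '€ 1.299,95' to Decimal('1299.95')."""
    kept = "".join(ch for ch in text if ch.isdigit() or ch in ",.")
    # Dutch notation: '.' groups thousands, ',' marks the decimals.
    return Decimal(kept.replace(".", "").replace(",", "."))


class ProductPriceParser(HTMLParser):
    """Collects (name, price) pairs from hypothetical, flat product markup:
    <span class="name">...</span> followed by <span class="price">...</span>."""

    def __init__(self) -> None:
        super().__init__()
        self._field: Optional[str] = None   # field whose text is currently being read
        self._current: Dict[str, str] = {}  # text gathered for the product in progress
        self.observations: List[PriceObservation] = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for field in ("name", "price"):
            if field in classes:
                self._field = field

    def handle_data(self, data):
        if self._field is not None:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        # Closing a price element that follows a name completes one observation.
        if self._field == "price" and "name" in self._current and "price" in self._current:
            self.observations.append(
                PriceObservation(
                    product=self._current["name"].strip(),
                    price=parse_dutch_price(self._current["price"]),
                )
            )
            self._current = {}
        self._field = None


if __name__ == "__main__":
    page = """
    <div class="product"><span class="name">Jeans slim fit</span>
      <span class="price">€ 49,95</span></div>
    <div class="product"><span class="name">Winter coat</span>
      <span class="price">€ 1.299,00</span></div>
    """
    parser = ProductPriceParser()
    parser.feed(page)
    for obs in parser.observations:
        print(obs.product, obs.price)  # "Jeans slim fit 49.95", then "Winter coat 1299.00"
```

Keeping the site-specific parsing separate from the price normalisation makes it easier to adapt a robot when one retailer changes its pages.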

Project Objective:

Pilot intended to go into production and replace existing data collection

Project Outcomes:

At the beginning of 2015, a module for price collection by internet robots from clothing retail chains was put into production. It covers websites where products can be ordered as well as websites that provide information on the prices of clothing sold in shops. Depending on the website, observation is daily or weekly.
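
Because the observation frequency differs per website, the robots need a per-site schedule. A minimal sketch follows, assuming hypothetical site names and a simple rule that a site is due once its interval (one day or one week) has elapsed since its last observation; the source does not describe how the schedule is actually configured.

```python
from datetime import date, timedelta
from typing import Dict, Optional

# Hypothetical schedule: the source states only that observation is daily or weekly per website.
INTERVALS: Dict[str, timedelta] = {"daily": timedelta(days=1), "weekly": timedelta(weeks=1)}
SITE_FREQUENCY: Dict[str, str] = {
    "webshop-a.example": "daily",     # hypothetical web shop where clothing can be ordered
    "catalogue-b.example": "weekly",  # hypothetical site listing prices of clothing sold in shops
}


def is_due(site: str, last_run: Optional[date], today: date) -> bool:
    """Return True if the site has never been observed or its interval has elapsed."""
    if last_run is None:
        return True
    return today - last_run >= INTERVALS[SITE_FREQUENCY[site]]


if __name__ == "__main__":
    today = date(2019, 2, 13)
    print(is_due("webshop-a.example", date(2019, 2, 12), today))    # True: one day has passed
    print(is_due("catalogue-b.example", date(2019, 2, 10), today))  # False: only three days
```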

Statistical Area

Price

Project Sources
Type Of Institution: National statistical office
Big Data Source: Web scraping data, Scanner data
Region: Europe & Central Asia
Country Area: Netherlands
Id Country Regional: country
Partnerships
Partnership Comments: None
Accessing Data
Data Access Rights: Only for this project
Data Coverage
Data Coverage: All available data
Coverage Geo Pop: Whole country / low % of market
Cost Implication: Free
Coverage Period: Continuous
Data Quality
Quality Aspects Evaluated: Completeness, Usability, Time Factors, Accessibility, Relevance, Validity, Coherence, including linkability to other sources
Validation Comments: Price collection from websites was already done manually, so this is the quality reference for the use of web robots.
Quality Framework Comments: The existing CPI quality framework applies. The price collection for the production of the CPI is based on modules which can be managed separately. Retail prices are collected manually in shops, from scanner data, and by using internet robots.
Data Quality Concerns Comments: Changes to the websites (for example, in layout or structure) have to be monitored, because they can break the robots' price collection.
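
Two of the quality checks described in this section, validating robot-collected prices against the manually collected reference and monitoring websites for changes, could be automated along the following lines. This is a minimal sketch, not Statistics Netherlands' procedure: the product keys, the `tolerance`, the thresholds `min_yield_ratio` and `min_parse_rate`, and the site name are hypothetical. A sharp drop in the number of products found, or in the share of prices that can be parsed, is used here only as a proxy signal that a site's layout has changed.

```python
from decimal import Decimal
from typing import Dict, List, NamedTuple, Optional


class ValidationReport(NamedTuple):
    coverage: float        # share of manually observed products the robot also found
    agreement: float       # share of matched products whose prices agree
    mismatches: List[str]  # keys of matched products whose prices differ


def compare_with_manual(
    manual: Dict[str, Decimal],
    robot: Dict[str, Decimal],
    tolerance: Decimal = Decimal("0"),
) -> ValidationReport:
    """Compare robot-collected prices with the manually collected reference prices."""
    matched = [key for key in manual if key in robot]
    mismatches = [key for key in matched if abs(manual[key] - robot[key]) > tolerance]
    coverage = len(matched) / len(manual) if manual else 0.0
    agreement = (len(matched) - len(mismatches)) / len(matched) if matched else 0.0
    return ValidationReport(coverage, agreement, sorted(mismatches))


class RunStats(NamedTuple):
    site: str
    products_found: int
    prices_parsed: int


def check_for_site_change(
    current: RunStats,
    previous: Optional[RunStats],
    min_yield_ratio: float = 0.8,
    min_parse_rate: float = 0.95,
) -> List[str]:
    """Return warnings that suggest a website's layout may have changed."""
    if current.products_found == 0:
        return [f"{current.site}: no products found"]
    warnings: List[str] = []
    parse_rate = current.prices_parsed / current.products_found
    if parse_rate < min_parse_rate:
        warnings.append(f"{current.site}: only {parse_rate:.0%} of prices could be parsed")
    if previous is not None and previous.products_found > 0:
        ratio = current.products_found / previous.products_found
        if ratio < min_yield_ratio:
            warnings.append(f"{current.site}: product count fell to {ratio:.0%} of the previous run")
    return warnings


if __name__ == "__main__":
    manual = {"A1": Decimal("49.95"), "B2": Decimal("19.99"), "C3": Decimal("5.00"), "D4": Decimal("12.50")}
    robot = {"A1": Decimal("49.95"), "B2": Decimal("17.99"), "C3": Decimal("5.00")}
    report = compare_with_manual(manual, robot)
    print(report.coverage, report.mismatches)  # 0.75 ['B2']
    print(check_for_site_change(RunStats("shop-a", 300, 150), RunStats("shop-a", 1200, 1195)))
```

Warnings of this kind would only prompt a manual look at the site; they do not establish that the layout has in fact changed.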
Methodology
Methods Used: Traditional statistical methods
Technologies
Technologies Comments: The standard toolset of Statistics Netherlands (SQL, etc.)
Other
Income Level: High-income
Iso: NL
Timeframe To Produce Indicator: NA