Unique ID: 2015065
Division: | Deputy Director General |
---|---|
Issue Date: | February 13th 2019 |
Last modified: | February 22nd 2019 |
Price information based on scanner data and website information
Using scanner data and web scraping for price information
Several sources may be used for the CPI and other price indices. Apart from price observation in shops, Statistics Netherlands uses scanner data from retail businesses. In addition, price information may be available on the websites of retail businesses or on third-party price-comparison websites. Price information for specific products is already collected from websites manually and, increasingly, by internet robots. The aim of the project is to collect price information systematically by internet robots for a limited number of retail chains, so that in-shop price observation can be discontinued.
Project Objective:
Pilot intended to go to production to replace existing data
Project Outcomes:
At the beginning of 2015, a module for price collection by internet robots was put into production for clothing retail chains. It covers websites where products can be ordered as well as websites that list prices of clothing sold in shops. Prices are observed daily or weekly, depending on the website.
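As an illustration of what price collection by an internet robot involves, the following is a minimal sketch that extracts product names and prices from a page fragment. It is not the Statistics Netherlands module: the HTML layout, the class names `product`, `name` and `price`, and the helper names `PriceParser`, `parse_price` and `extract_prices` are all hypothetical, and a real robot would also have to fetch the pages and respect each website's terms.

```python
from decimal import Decimal
from html.parser import HTMLParser

# Hypothetical page fragment; real retailer pages will be structured differently.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">T-shirt</span><span class="price">€ 12,95</span></li>
  <li class="product"><span class="name">Jeans</span><span class="price">€ 49,99</span></li>
</ul>
"""


def parse_price(text):
    """Convert a Dutch-formatted price such as '€ 12,95' to a Decimal."""
    cleaned = text.replace("€", "").strip()
    # Drop thousands separators, then turn the decimal comma into a point.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


class PriceParser(HTMLParser):
    """Collects (name, price) pairs from <span class="name"> / <span class="price">."""

    def __init__(self):
        super().__init__()
        self.field = None   # which field the parser is currently inside, if any
        self.current = {}   # text collected for the product being read
        self.items = []     # finished (name, price) pairs

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self.field = css_class

    def handle_data(self, data):
        if self.field:
            self.current[self.field] = self.current.get(self.field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self.field = None
        elif tag == "li":
            # Only keep products for which both fields were found.
            if "name" in self.current and "price" in self.current:
                name = self.current["name"].strip()
                self.items.append((name, parse_price(self.current["price"])))
            self.current = {}


def extract_prices(html):
    """Return a list of (name, price) pairs found in the given HTML text."""
    parser = PriceParser()
    parser.feed(html)
    return parser.items
```

Calling `extract_prices(SAMPLE_HTML)` returns one pair per product. Prices are kept as `Decimal` rather than `float` so that later comparisons with reference prices are exact.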
Statistical Area
Project Sources
Type Of Institution: | National statistical office |
---|---|
Big Data Source: | Web scraping data, Scanner data |
Region: | Europe & Central Asia |
Country Area: | Netherlands |
Id Country Regional: | country |
Partnerships
Partnership Comments: | None |
---|---|
Accessing Data
Data Access Rights: | Only for this project |
---|---|
Data Coverage
Data Coverage: | All available data |
---|---|
Coverage Geo Pop: | Whole country / low % of market |
Cost Implication: | Free |
Coverage Period: | Continuous |
Data Quality
Quality Aspects Evaluated: | Completeness, Usability, Time Factors, Accessibility, Relevance, Validity, Coherence, including linkability to other sources |
---|---|
Validation Comments: | Price collection from websites was already done manually, so this is the quality reference for the use of web robots. |
Quality Framework Comments: | The existing CPI quality framework applies. The price collection for the production of the CPI is based on modules which can be managed separately. Retail prices are collected manually in shops, from scanner data, and by using internet robots. |
Data Quality Concerns Comments: | Changes to the websites have to be monitored. |
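The validation approach described above, with manual collection as the quality reference for the robots, can be sketched as a simple comparison. This is an assumption about how such a check might look, not the procedure actually used: the function name `compare_with_reference`, the dictionary layout (product name mapped to price) and the `tolerance` parameter are hypothetical. Products missing from the robot output are reported separately, since a sudden gap may point to a change in the website.

```python
from decimal import Decimal


def compare_with_reference(robot, manual, tolerance=Decimal("0.00")):
    """Compare robot-collected prices with manually collected reference prices.

    Both arguments map a product name to its price (hypothetical layout).
    Returns (missing, mismatched):
      missing    - sorted list of reference products absent from the robot data
      mismatched - dict of product -> (robot price, manual price) for products
                   whose prices differ by more than the tolerance
    """
    missing = sorted(set(manual) - set(robot))
    mismatched = {
        product: (robot[product], manual[product])
        for product in manual
        if product in robot
        and abs(robot[product] - manual[product]) > tolerance
    }
    return missing, mismatched


# Illustrative values only, not real observations.
manual_prices = {
    "T-shirt": Decimal("12.95"),
    "Jeans": Decimal("49.99"),
    "Scarf": Decimal("9.50"),
}
robot_prices = {
    "T-shirt": Decimal("12.95"),
    "Jeans": Decimal("44.99"),
}

missing, mismatched = compare_with_reference(robot_prices, manual_prices)
```

In this made-up example the scarf is missing from the robot data and the jeans price differs, so both would be flagged for a manual check.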
Methodology
Methods Used: | Traditional statistical methods |
---|---|
Technologies
Technologies Comments: | The usual toolset of Statistics Netherlands (SQL etc.) |
---|---|
Other
Income Level: | High-income |
---|---|
Iso: | NL |
Timeframe To Produce Indicator: | NA |