Unique ID: 2015045

Division: Unit B204
Issue Date: February 13th 2019
Last modified: February 22nd 2019
Collaborative

Web scraping for price statistics

Using web scraping data for compilation of price indices

Project Objective:

Exploration

Project Outcomes:

Design of a working environment for the permanent use of automated price collection, following an approach that imitates manual price collection on the internet. Automation of further price collections for the HICP.

Publications Comments:

Article in WiSta 4/14: 'Automatisierte Preiserhebung im Internet' (Automated price collection on the internet)

Project Publications:

Statistical Area

Price

Project Sources
Type Of Institution: National statistical office
Big Data Source: Web scraping data
Region: Europe & Central Asia
Country Area: Germany
Id Country Regional: country
Partnerships
Partnership Comments: No external partners are involved.
Accessing Data
Data Access Comments: No access rights are required; only publicly available information on the internet is used.
Data Coverage
Data Coverage: Other
Coverage Geo Pop: Whole country / high % of market
Cost Comments: Access to the data is free, but the current study is determining the costs incurred by the FSO (e.g. IT equipment, staff).
Coverage Geo Comments: At the current stage, market share is not an explicit criterion for the use of web scraping, but only goods and services with a uniform pricing strategy across Germany are explored. A relevant market share therefore has to be assumed.
Project Details
Frequency Comments: Imitation of existing manual internet price collection for selected goods and services
Data Quality
Quality Aspects Evaluated: Completeness, Usability, Time Factors, Other
Validation Comments: At the beginning, the prices from the automated and the manual price collections were compared to check the correctness of the approach.
Quality Framework Comments: No, the process of index production has not changed yet.
Data Quality Concerns Comments: No differences were found between the manually collected prices and the web-scraped prices.
Quality Assessment Comments: The usability of the web scraping tool iMacros was tested in comparison with other tools. The criteria 'efficiency of the price collection' and 'quality of the results' were used to assess whether automated price collection processes are feasible for price statistics (sample size, error rate, comparison with manually collected data).
Validation With Training Data:
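Illustration: the comparison of manually collected and web-scraped prices mentioned under Validation Comments and Quality Assessment Comments could be sketched roughly as follows. This is a minimal sketch and not the FSO's actual procedure; the function name compare_collections, the item identifiers, the sample prices and the 0.005 tolerance are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ComparisonResult:
    sample_size: int    # number of manually collected items checked
    mismatches: list    # item ids whose scraped price is missing or differs
    error_rate: float   # share of mismatching items in the sample


def compare_collections(manual, scraped, tolerance=0.005):
    """Compare manual and scraped prices item by item.

    Both arguments are dicts mapping a (hypothetical) item id to a price.
    The tolerance is an assumed allowance for rounding differences.
    """
    mismatches = []
    for item_id, manual_price in manual.items():
        scraped_price = scraped.get(item_id)
        # A missing scraped price counts as an error, as does a deviation
        # larger than the tolerance.
        if scraped_price is None or abs(scraped_price - manual_price) > tolerance:
            mismatches.append(item_id)
    sample_size = len(manual)
    error_rate = len(mismatches) / sample_size if sample_size else 0.0
    return ComparisonResult(sample_size, mismatches, error_rate)


# Invented sample data: both collections agree on every item.
manual = {"item_a": 19.99, "item_b": 4.50, "item_c": 120.00}
scraped = {"item_a": 19.99, "item_b": 4.50, "item_c": 120.00}
result = compare_collections(manual, scraped)
print(result.sample_size, result.mismatches, result.error_rate)
```

An error rate of zero on such a sample would correspond to the finding reported above that the two collections did not differ.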
Methodology
Methods Used: Traditional statistical methods
Technologies
Technologies: Relational database, Spreadsheet, Other
Other
Income Level: High-income
Iso: DE
Timeframe To Produce Indicator: NA