Unique ID: 2015045
Division: | Unit B204 |
---|---|
Issue Date: | February 13th 2019 |
Last modified: | February 22nd 2019 |
Cost: FREE
Web scraping for price statistics
Using web scraping data for compilation of price indices
Project Objective:
Exploration
Project Outcomes:
Design of a working environment for the permanent use of automated price collection, following the approach of imitating manual price collection on the internet. Automation of further price collections for the HICP.
Publications Comments:
Article in WiSta 4/14, 'Automatisierte Preiserhebung im Internet'
Statistical Area
Project Sources
Type Of Institution: | National statistical office |
---|---|
Big Data Source: | Web scraping data |
Region: | Europe & Central Asia |
Country Area: | Germany |
Id Country Regional: | country |
Partnerships
Partnership Comments: | No external partners are involved. |
---|
Accessing Data
Data Access Comments: | No access rights, only public information on the internet is used. |
---|
Data Coverage
Data Coverage: | Other |
---|---|
Coverage Geo Pop: | Whole country / high % of market |
Cost Comments: | Access to the data is free; the current study determines the costs incurred by the FSO (e.g. IT equipment, staff). |
Coverage Geo Comments: | At the current stage, market share is not an explicit criterion for the use of web scraping, but only goods and services with a single, uniform pricing strategy across Germany are explored. A relevant market share therefore has to be assumed. |
Project Details
Frequency Comments: | Imitation of existing manual internet price collection for selected goods and services |
---|
Data Quality
Quality Aspects Evaluated: | Completeness, Usability, Time Factors, Other |
---|---|
Validation Comments: | At the beginning, prices from the automated and the manual price collections were compared to check the correctness of the approach. |
Quality Framework Comments: | No, the process of index production has not changed yet. |
Data Quality Concerns Comments: | No differences were found between the manually collected prices and the web-scraped prices. |
Quality Assessment Comments: | The usability of the web scraping tool iMacros was tested against other tools. Two criteria, 'efficiency of the price collection' and 'quality of the results', were used to judge whether automated price collection is feasible for price statistics (sample size, error rate, comparison with manually collected data). |
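
The comparison of automated and manual collections described above could be sketched roughly as follows. This is a minimal illustration, not the FSO's actual procedure: the function name `compare_collections`, the `ComparisonResult` class, the price tolerance, the way the error rate is defined, and the item names and prices are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ComparisonResult:
    sample_size: int  # items found in both the manual and the scraped collection
    mismatches: int   # matched items whose prices differ by more than the tolerance
    missing: int      # manually collected items the scraper did not return

    @property
    def error_rate(self) -> float:
        # Assumed definition: mismatched or missing items as a share of all
        # manually collected items.
        checked = self.sample_size + self.missing
        return (self.mismatches + self.missing) / checked if checked else 0.0


def compare_collections(manual, scraped, tolerance=0.005):
    """Compare two {item_id: price} mappings item by item."""
    matched = mismatches = missing = 0
    for item_id, manual_price in manual.items():
        scraped_price = scraped.get(item_id)
        if scraped_price is None:
            missing += 1
            continue
        matched += 1
        if abs(manual_price - scraped_price) > tolerance:
            mismatches += 1
    return ComparisonResult(matched, mismatches, missing)


# Hypothetical illustrative values, not real collected prices.
manual_prices = {"rail_ticket": 29.90, "package_tour": 499.00, "ebook": 9.99}
scraped_prices = {"rail_ticket": 29.90, "package_tour": 489.00}

result = compare_collections(manual_prices, scraped_prices)
print(result.sample_size, result.mismatches, result.missing, round(result.error_rate, 3))
```

Treating a missing item as an error is a design choice in this sketch; a real evaluation might report completeness and price agreement separately.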
Methodology
Methods Used: | Traditional statistical methods |
---|
Technologies
Technologies: | Relational database, Spreadsheet, Other |
---|
Other
Income Level: | High-income |
---|---|
Iso: | DE |
Timeframe To Produce Indicator: | NA |