Unique ID: 2015009
Division: | Analysis/ Quality Management/ Methods |
---|---|
Issue Date: | February 13th 2019 |
Last modified: | February 22nd 2019 |
Cost: FREE
Automatic price collection on the internet: Using web scraping and web crawlers for price index compilations
Development and implementation of automatic price collection procedures using web crawlers for price index compilation. Assessment of automatically collected data in terms of quality and efficiency gains, and of their applicability to other statistics.
Project Objective:
Pilot intended to go to production to supplement existing data; pilot intended to go to production to replace existing data
Project Outcomes:
The automatic price collection project is intended to support the objective of achieving higher-quality (price) statistics while using existing resources more efficiently.
Project Sources
Type Of Institution: | National statistical office |
---|---|
Big Data Source: | Web scraping data |
Region: | Europe & Central Asia |
Country Area: | Austria |
Id Country Regional: | country |
Partnerships
Partnership Comments: | None |
---|
Data Coverage
Data Coverage: | Other |
---|---|
Cost Implication: | Free |
Project Details
Frequency Comments: | Coverage and frequency can be handled flexibly. In theory, all data on a website can be scraped daily. In practice, this is rarely necessary, so coverage and frequency are reduced. However, daily scraping of websites is being tested in the pilot project. |
---|
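The flexible handling of coverage and frequency described above could be expressed as a small per-site plan. The following is only a minimal sketch under stated assumptions: the site names, the `every_n_days` and `categories` settings, and the `sites_due` helper are all hypothetical and are not taken from the project itself.

```python
from datetime import date

# Hypothetical per-site plan: how often to scrape and which categories to cover.
# Site names, intervals and category lists are illustrative assumptions only.
SCRAPE_PLAN = {
    "shop-a.example": {"every_n_days": 1, "categories": ["all"]},
    "shop-b.example": {"every_n_days": 7, "categories": ["electronics"]},
}


def sites_due(plan, start, today):
    """Return the sites whose scraping interval falls on `today`, counted from `start`."""
    elapsed = (today - start).days
    return sorted(
        site for site, cfg in plan.items() if elapsed % cfg["every_n_days"] == 0
    )
```

With this plan, a site tested daily is returned on every date, while a weekly site is returned only on every seventh day after `start`.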
Data Quality
Quality Aspects Evaluated: | Privacy and Security; Completeness; Usability; Time Factors; Accessibility; Relevance; Validity; Accuracy, including selectivity; Coherence, including linkability to other sources |
---|---|
Data Quality Concerns Comments: | Prices posted on the Internet might not be transaction prices. The quantities sold of the products and services offered on the Internet are unknown. Double counting is not easy to avoid when scraping job vacancies. |
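One common way to reduce the double counting mentioned above is to compare records on a normalized key. The sketch below is an assumption for illustration, not the project's documented method: the field names `title`, `employer` and `location`, and the helpers `normalize` and `deduplicate`, are hypothetical.

```python
import re


def normalize(text):
    """Case-fold and collapse punctuation and whitespace so near-identical strings compare equal."""
    return re.sub(r"[\W_]+", " ", text.casefold()).strip()


def deduplicate(records, key_fields=("title", "employer", "location")):
    """Keep the first record for each normalized key; later matches are treated as duplicates."""
    seen = set()
    unique = []
    for record in records:
        # Hypothetical key: the same vacancy posted on several sites should share these fields.
        key = tuple(normalize(record.get(field, "")) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Exact matching on a normalized key only catches near-identical postings; vacancies that are worded differently on different websites would still need fuzzy matching or manual review.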
Methodology
Methods Used: | Traditional statistical methods |
---|
Other
Income Level: | High-income |
---|---|
Iso: | AT |
Timeframe To Produce Indicator: | NA |