Unique ID: 2015066

Issue Date: February 13th 2019
Last modified: February 22nd 2019
Collaborative

The use of online data in the HICP

Exploring the use of web scraping data for compilation of price indices

Identify areas where internet trade is significant and where prices are obtainable. Develop possible automated processes for collecting data, both to replace the present manual work and to expand into new areas. Discuss how to identify scope and how to sample. Analyze the collected online data and evaluate the results.

Project Objective:

Pilot intended to go to production to supplement existing data; pilot intended to go to production to replace existing data

Project Outcomes:

The intended outcome is to implement web-scraped data in regular production in order to increase efficiency, reduce response burden, and improve the quality of the price indices.

Publications Comments:

Ottawa Group paper for the 2015 meeting: 'The use of online prices in the Norwegian CPI' http://www.stat.go.jp/english/info/meetings/og2015/pdf/t1s2p5_pap.pdf

Statistical Area

Price

Project Sources
Type Of Institution: National statistical office
Big Data Source: Web scraping data
Region: Europe & Central Asia
Country Area: Norway
Id Country Regional: country
Partnerships
Partnership Comments: None, but the Eurostat grant is given to several countries and experiences are shared via workshops. There is also general cooperation in the field of CPI with other national statistical institutes.
Accessing Data
Data Access Rights: Broader access rights
Data Coverage
Data Coverage: Only a portion of all data
Coverage Geo Pop: Whole country / low % of market
Cost Implication: Free
Cost Comments: Data is not paid for.
Coverage Geo Comments: Data are scraped from 4 major online retailers within home electronics and personal care products, in addition to airline fares and dental services. Online purchases of goods and services account for less than 10% of total sales in general. Fo
Coverage Period: As of December 2014
Project Details
Frequency Comments: Only scraping most sold products
Data Quality
Quality Framework: Quality of processing/throughput
Quality Aspects Evaluated: Privacy and Security; Completeness; Usability; Time Factors; Accessibility; Relevance; Institutional/Business Environment; Validity; Accuracy, including selectivity; Coherence, including linkability to other sources
Validation Comments: No ideal benchmark, but uses available official statistics/price data for validation.
Quality Framework Comments: All points are relevant, as the existing quality framework (QF) within price statistics needs to be applied to meet international requirements.
Data Quality Concerns Comments: Obtaining satisfactory metadata is a challenge, as the data are by definition big and unstructured. The timeliness and volume of the data are positive factors.
Quality Assessment Comments: All relevant
Methodology
Methods Used: Traditional statistical methods
Methods Comments: Automatic routines have been established to download data. The work is still in progress, and much of the computation is done in SAS.
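Illustrative sketch (not part of the original record): the record does not state which index formula is applied to the scraped prices, and the project's own computations are done in SAS. As an assumption for illustration only, the Python example below computes a Jevons elementary index (the geometric mean of price relatives) over products observed in both a base and a current period. The function name, product identifiers, and prices are hypothetical.

```python
import math


def jevons_index(base_prices, current_prices):
    """Jevons elementary index (base = 100) over products priced in both periods.

    Both arguments map a product identifier to a positive price.
    Products missing from either period are left out of the comparison.
    """
    matched = [
        product
        for product in base_prices
        if product in current_prices
        and base_prices[product] > 0
        and current_prices[product] > 0
    ]
    if not matched:
        raise ValueError("no products are priced in both periods")
    # Average the log price relatives, then exponentiate: a geometric mean.
    log_sum = sum(
        math.log(current_prices[product] / base_prices[product])
        for product in matched
    )
    return 100.0 * math.exp(log_sum / len(matched))


# Hypothetical scraped prices keyed by a hypothetical product identifier.
base = {"tv-a": 100.0, "phone-b": 200.0, "razor-c": 50.0}
current = {"tv-a": 110.0, "phone-b": 200.0, "laptop-d": 900.0}

# Only "tv-a" and "phone-b" appear in both periods and enter the index.
index = jevons_index(base, current)
print(round(index, 2))
```

Matching on product identifiers means that items which disappear or newly appear between periods are simply excluded; a production routine would need an explicit policy for replacements and quality adjustment.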
Technologies
Technologies: Spreadsheet, Data mining tools, Other, Cloud services
Technologies Comments: SAS programming
Other
Income Level: High-income
Iso: NO
Timeframe To Produce Indicator: NA