Unique ID: 2015046

Division: Methodology Department
Issue Date: February 13th 2019
Last modified: February 22nd 2019
Collaborative

Web scraping data from retailers' websites for the CPI calculation

Using web scraping data for compilation of price indices

Statistical Area

Price

Project Sources
Type Of Institution: National statistical office
Big Data Source: Web scraping data
Region: Europe & Central Asia
Country Area: Hungary
Id Country Regional: country
Partnerships
Other Partners: Other
Partnership Comments: Partnerships are not being considered at the moment, as the main focus of the project is on the actual acquisition of the information. No partnerships are foreseen for statistical production, as the new Big Data sources are planned to be integrated into the already existing production environment (after a successful pilot). The use of a cloud server might be considered in the future.
Accessing Data
Data Access Rights: Broader access rights
Intermediary Comments: The HCSO is web scraping the data directly from the designated websites. No pre-treatment of the datasets is necessary at this stage of the work.
Data Access Comments: Website information is generally open for use, with no restriction on the potential purposes.
Data Coverage
Data Coverage: Only a portion of all data
Coverage Geo Pop: Part of country / high % of market
Cost Implication: Free
Cost Comments: Access to website information is free.
Coverage Geo Comments: For the time being the research is focusing on a few actors on the market. Potentially, the use should not be limited to only a few actors.
Coverage Period: Depends on the needs of the subject-matter department concerned and of the research (daily or weekly data).
Project Details
Frequency Comments: The HCSO is currently web scraping only a portion of the available information for statistical purposes.
Data Quality
Quality Framework: Quality of processing/throughput
Quality Aspects Evaluated: Completeness, Usability, Time Factors, Accessibility, Relevance, Coherence, including linkability to other sources
Quality Framework Comments: No specific frameworks are applied to the source itself. Quality frameworks apply to the business process in which data from this Big Data source is to be used in the future.
Data Quality Concerns Comments: No quality concerns at the current state of the research.
Methodology
Methods Used: Traditional statistical methods
Technologies
Technologies: Spreadsheet, Other
Technologies Comments: The HCSO is using Excel macros and SAS solutions for the research.
Other
Income Level: High-income
Iso: HU
Timeframe To Produce Indicator: NA