Unique ID: 2015068

Division: Programming and Coordination of Statistical Surveys Department
Issue Date: February 13th 2019
Last modified: February 22nd 2019

Training Institutions Frame Creation: using web scraping to improve survey frames

The goal is to use web scraping to prepare the survey frame for training institutions. At the moment there is no reliable frame for the new survey, which will cover several aspects of training institutions' activities.

Project Objective:

Exploration, Scientific / research

Project Outcomes:

To prepare a frame for a specific survey of training institutions.

Publications Comments:

Research papers on Big Data not specifically related to this project.

Statistical Area

Demographic and social

Project Sources
Type Of Institution: National statistical office
Big Data Source: Web scraping data
Region: Europe & Central Asia
Country Area: Poland
Id Country Regional: country
Other Partners: Research or academic institute
Partnership Comments: At the moment, an academic institution acts as the provider of Big Data infrastructure.
Accessing Data
Data Access Rights: Only for this project
Data Coverage
Data Coverage: Only a portion of all data
Coverage Geo Pop: Whole country / high % of market
Cost Implication: Free
Coverage Period: Timestamp - data on a specific day (30 June)
Data Quality
Quality Framework: Quality of source/input
Quality Aspects Evaluated: Privacy and Security; Completeness; Usability; Time Factors; Accessibility; Relevance; Validity; Accuracy, including selectivity; Coherence, including linkability to other sources
Validation Comments: Data are compared with the frame stored in a relational database.
Quality Framework Comments: All of the quality indicators listed in the UNECE Big Data Quality Framework.
Data Quality Concerns Comments: Accuracy, Coherence, Validity, Accessibility, Consistency, Relevance, Comparability, Timeliness, Ambiguousness
Methods Used: Traditional statistical methods
Methods Comments: Key value pairs
Technologies: Relational database, Spreadsheet, NoSQL database, Hadoop Clusters
Technologies Comments: The list of is processed and put
Income Level: High-income
Iso: PL
Timeframe To Produce Indicator: NA
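
The validation step noted above (scraped data compared with the frame stored in a relational database) could look roughly like the sketch below. It is only an illustration under stated assumptions: the table name `frame`, its columns, the sample records, and the helpers `normalize` and `compare_with_frame` are all hypothetical, and matching on a normalized institution name stands in for whatever linkage rule the project actually uses. Scraped records are represented as key-value pairs (Python dictionaries).

```python
import sqlite3


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def compare_with_frame(conn, scraped):
    """Split scraped records into those already in the frame and new candidates."""
    # Load the normalized names of all units currently in the (hypothetical) frame table.
    frame_names = {
        normalize(row[0]) for row in conn.execute("SELECT name FROM frame")
    }
    matched, new = [], []
    for record in scraped:
        if normalize(record["name"]) in frame_names:
            matched.append(record)
        else:
            new.append(record)
    return matched, new


# Hypothetical frame held in an in-memory relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frame (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO frame VALUES (?, ?)",
    [("Alpha Training Centre", "Warszawa"), ("Beta Academy", "Kraków")],
)

# Hypothetical scraped records, stored as key-value pairs.
scraped = [
    {"name": "alpha  training centre", "city": "Warszawa"},
    {"name": "Gamma Courses", "city": "Gdańsk"},
]

matched, new = compare_with_frame(conn, scraped)
print(len(matched), "already in frame;", len(new), "candidate(s) to add")
```

Records that end up in `new` would be candidates for addition to the frame after manual or rule-based review; exact name matching is deliberately simplistic and would likely under-match in practice.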