Unique ID: 2015068
Division: | Programming and Coordination of Statistical Surveys Department |
---|---|
Issue Date: | February 13th 2019 |
Last modified: | February 22nd 2019 |
Cost: FREE
Training Institutions Frame Creation: using web scraping to improve survey frames
The goal is to use web scraping to prepare a survey frame for training institutions. At present there is no reliable frame for the new survey, which will cover several aspects of training institutions' activities.
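As an illustration only, the sketch below shows one way scraped listing pages could be turned into key-value records that feed a frame. The HTML fragment, the CSS class names (`institution`, `name`, `city`) and the function `parse_institutions` are hypothetical and are not taken from the project; real sites will use different markup, and no actual scraping target is implied.

```python
from html.parser import HTMLParser

# Hypothetical listing fragment; real pages will have different markup.
SAMPLE_HTML = """
<div class="institution">
  <span class="name">Example Training Centre</span>
  <span class="city">Warszawa</span>
</div>
<div class="institution">
  <span class="name">Sample Language School</span>
  <span class="city">Krakow</span>
</div>
"""


class InstitutionParser(HTMLParser):
    """Collect one key-value record per <div class="institution"> block."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record currently being filled, if any
        self._field = None    # key taken from the class of the open <span>

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "institution":
            self._current = {}
        elif tag == "span" and self._current is not None:
            self._field = css_class

    def handle_data(self, data):
        # Only text inside a <span> within an institution block is stored.
        if self._current is not None and self._field:
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            self.records.append(self._current)
            self._current = None


def parse_institutions(html):
    """Return a list of dicts, one per institution found in the HTML text."""
    parser = InstitutionParser()
    parser.feed(html)
    parser.close()
    return parser.records
```

Calling `parse_institutions(SAMPLE_HTML)` yields one dictionary per institution, which keeps the output in a simple key-value shape that is easy to load into a database or spreadsheet.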
Project Objective:
Exploration, Scientific / research
Project Outcomes:
To prepare a frame for a specific survey of training institutions.
Publications Comments:
Research papers on Big Data not specifically related to this project.
Statistical Area
Project Sources
Type Of Institution: | National statistical office |
---|---|
Big Data Source: | Web scraping data |
Region: | Europe & Central Asia |
Country Area: | Poland |
Id Country Regional: | country |
Partnerships
Other Partners: | Research or academic institute |
---|---|
Partnership Comments: | At the moment, an academic institution acts as the provider of Big Data infrastructure. |
Accessing Data
Data Access Rights: | Only for this project |
---|---|
Data Coverage
Data Coverage: | Only a portion of all data |
---|---|
Coverage Geo Pop: | Whole country / high % of market |
Cost Implication: | Free |
Coverage Period: | Timestamp - data on a specific day (30 June) |
Data Quality
Quality Framework: | Quality of source/input |
---|---|
Quality Aspects Evaluated: | Privacy and Security; Completeness; Usability; Time Factors; Accessibility; Relevance; Validity; Accuracy, including selectivity; Coherence, including linkability to other sources |
Validation Comments: | Data are compared with the frame stored in a relational database. |
Quality Framework Comments: | All of the quality indicators listed in the UNECE Big Data Quality Framework. |
Data Quality Concerns Comments: | Accuracy, Coherence, Validity, Accessibility, Consistency, Relevance, Comparability, Timeliness, Ambiguousness |
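The validation step described above, comparing scraped data with a frame held in a relational database, can be sketched roughly as follows. This is a minimal example under stated assumptions: the table name `frame`, its columns, the exact-match rule on normalised names, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for whatever database the project actually uses.

```python
import sqlite3


def normalise(name):
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def compare_with_frame(conn, scraped_names):
    """Return (matched, only_scraped, only_in_frame) as sorted lists of normalised names."""
    rows = conn.execute("SELECT name FROM frame").fetchall()
    frame = {normalise(row[0]) for row in rows}
    scraped = {normalise(name) for name in scraped_names}
    return (
        sorted(frame & scraped),   # units confirmed by both sources
        sorted(scraped - frame),   # candidates for adding to the frame
        sorted(frame - scraped),   # frame units not found on the web
    )


def build_demo_frame():
    """Create a small in-memory frame table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE frame (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO frame (name) VALUES (?)",
        [("Example Training Centre",), ("Old Vocational School",)],
    )
    conn.commit()
    return conn
```

For example, `compare_with_frame(build_demo_frame(), ["example  training centre", "Sample Language School"])` separates the names into those found in both sources, those seen only on the web, and those present only in the frame. In practice, matching on names alone is likely too crude, and an identifier or address-based linkage would probably be needed.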
Methodology
Methods Used: | Traditional statistical methods |
---|---|
Methods Comments: | Key value pairs |
Technologies
Technologies: | Relational database, Spreadsheet, NoSQL database, Hadoop Clusters |
---|---|
Technologies Comments: | The list of is processed and put |
Other
Income Level: | High-income |
---|---|
Iso: | PL |
Timeframe To Produce Indicator: | NA |