Unique ID: 2015067

Division: Programming and Coordination of Statistical Surveys Department
Issue Date: February 13th 2019
Last modified: March 28th 2019

Using web scraping data for job offers statistics

Using web scraping data for labor market statistics.

The goal of the experimental project is to determine whether data on job offers (labor market statistics) can be collected from websites. In the pilot project, the largest job portals were used as the data source for web scraping.
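As a rough illustration of the scraping step, the sketch below extracts job-offer titles and locations from a portal listing page using only Python's standard `html.parser`. The markup it expects (`<div class="offer">` containing an `<h2>` title and a `<span class="location">`) and the helper names `OfferParser` and `parse_offers` are hypothetical assumptions; real portals use their own layouts, and fetching pages and respecting each portal's terms of use are left out.

```python
from html.parser import HTMLParser


class OfferParser(HTMLParser):
    """Collects (title, location) pairs from a listing page.

    Hypothetical markup: each offer is a <div class="offer"> (with no nested
    <div>), the title is in <h2>, the location in <span class="location">.
    """

    def __init__(self):
        super().__init__()
        self.offers = []
        self._current = None  # fields of the offer being parsed
        self._field = None    # field that incoming text belongs to

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "offer" in classes:
            self._current = {"title": "", "location": ""}
        elif self._current is not None and tag == "h2":
            self._field = "title"
        elif self._current is not None and tag == "span" and "location" in classes:
            self._field = "location"

    def handle_endtag(self, tag):
        if tag in ("h2", "span"):
            self._field = None
        elif tag == "div" and self._current is not None:
            # End of one offer: store it and reset the state.
            self.offers.append((self._current["title"].strip(),
                                self._current["location"].strip()))
            self._current = None

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data


def parse_offers(html):
    """Return the list of (title, location) pairs found in an HTML page."""
    parser = OfferParser()
    parser.feed(html)
    parser.close()
    return parser.offers


if __name__ == "__main__":
    page = """
    <div class="offer"><h2>Data analyst</h2><span class="location">Warszawa</span></div>
    <div class="offer"><h2>Accountant</h2><span class="location">Kraków</span></div>
    """
    print(parse_offers(page))
    # [('Data analyst', 'Warszawa'), ('Accountant', 'Kraków')]
```

A stdlib parser keeps the sketch dependency-free; in practice each portal would need its own extraction rules, maintained as its layout changes.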

Project Objective:

The pilot is intended to move into production to supplement existing data.

Project Outcomes:

A decision to use this method of data collection as an alternative to the questionnaire on job offers.

Publications Comments:

The published papers relate to Big Data in general, not to this specific project.

Statistical Area

Economic and financial, Labor

Project Sources
Type Of Institution: National statistical office
Big Data Source: Web scraping data
Region: Europe & Central Asia
Country Area: Poland
Id Country Regional: country
Other Partners: Other, Research or academic institute
Partnership Comments: There is no written agreement between the university and the statistical office.
Accessing Data
Data Access Rights: Only for this project
Intermediary Comments: There is no written agreement. The project is treated as a pilot project for both research and official statistics purposes.
Data Coverage
Data Coverage: All available data
Coverage Geo Pop: Whole country / high % of market
Cost Implication: Free
Coverage Period: Timestamp - the data as of 1 March
Data Quality
Quality Framework: Quality of source/input
Quality Aspects Evaluated: Privacy and Security, Completeness, Usability, Time Factors, Accessibility, Relevance, Validity, Accuracy, including selectivity, Coherence, including linkability to other sources
Quality Framework Comments: Yes, all of them. The evaluation was based on indicators prepared for the Big Data Quality Framework manual written by UNECE (we were a member of the UNECE Big Data Quality Task Team).
Data Quality Concerns Comments: Accuracy, Coherence, Validity, Accessibility, Consistency, Relevance, Comparability, Timeliness, Ambiguity
Methods Used: Traditional statistical methods, Data visualization methods
Methods Comments: Analysis using MapReduce algorithms
Technologies: Relational database, Spreadsheet, NoSQL database, Hadoop Clusters
Technologies Comments: MapReduce algorithms were used to store the data in a spreadsheet.
Income Level: High-income
Iso: PL
Timeframe To Produce Indicator: NA
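The record notes that the analysis relied on MapReduce algorithms, with results stored in a spreadsheet. The sketch below shows the map, shuffle and reduce steps in plain single-machine Python, counting scraped offers per location and writing the totals as CSV, which spreadsheet software can open. It is not Hadoop code and not the project's actual job; the (title, location) record layout, the choice of location as the key and the file name `offers_by_location.csv` are hypothetical assumptions.

```python
import csv
from collections import defaultdict
from itertools import chain


def map_offer(offer):
    """Map step: emit (key, 1) for one scraped offer given as (title, location)."""
    _title, location = offer
    yield (location, 1)


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one key."""
    return key, sum(values)


def count_offers_by_location(offers):
    """Run map, shuffle and reduce over a list of offers."""
    pairs = chain.from_iterable(map_offer(o) for o in offers)
    groups = shuffle(pairs)
    return dict(reduce_counts(k, v) for k, v in groups.items())


def write_csv(counts, stream):
    """Write the aggregated counts as CSV rows to an open text stream."""
    writer = csv.writer(stream)
    writer.writerow(["location", "offers"])
    for location, n in sorted(counts.items()):
        writer.writerow([location, n])


if __name__ == "__main__":
    offers = [("Data analyst", "Warszawa"),
              ("Accountant", "Kraków"),
              ("Developer", "Warszawa")]
    counts = count_offers_by_location(offers)
    print(counts)  # {'Warszawa': 2, 'Kraków': 1}
    # Hypothetical output file for the spreadsheet.
    with open("offers_by_location.csv", "w", newline="", encoding="utf-8") as f:
        write_csv(counts, f)
```

On a Hadoop cluster the same map and reduce functions would run in parallel over partitions of the scraped data, with the framework performing the shuffle.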