Unique ID: 2015064

Division: Deputy Director General
Issue Date: February 13th 2019
Last modified: February 22nd 2019
Collaborative

Exploring the use of social media messages for economic indicators

This research project explores whether public social media messages can be used for selected statistics, which methodology such statistics require, and how reliable the results are. At the start of the research, an important question was whether the results of the existing survey-based consumer confidence index could be replicated using only this source, while reducing production time. The research contributes to the body of methodological knowledge and extends it beyond sampling theory. Social media data are widely considered to have great potential for shedding light on a range of social phenomena. The research has since turned to indicators of social coherence. However, a successful result does not automatically lead to application in official statistics, which requires an assessment that goes beyond technical feasibility.

Project Objective:

Scientific / research

Project Outcomes:

Perhaps the most important lesson learned is that under certain conditions it may be possible to produce reliable statistics not based on sampling theory, i.e., without using population-based estimation methods. Linked to this is the lesson that research of this type needs an open mindset, and that a novel Big Data oriented approach can provide valuable insights. At an institutional level, the value of having a stand-alone research program was demonstrated. In the current phase of the project, the research is aimed at producing indicators of (changes in) social coherence.

Publications Comments:

Daas, P.J.H. and Puts, M.J.H. (2014) Social Media Sentiment and Consumer Confidence. European Central Bank Statistics Paper Series No. 5, Frankfurt, Germany.

Statistical Area

Demographic and social

Project Sources
Type Of Institution: National statistical office
Big Data Source: Social media data
Region: Europe & Central Asia
Country Area: Netherlands
Id Country Regional: country
Partnerships
Data Providers: Social media provider
Accessing Data
Data Access Rights: Broader access rights
Data Coverage
Data Coverage: Other
Coverage Geo Pop: Whole country / high % of market
Cost Implication: Commercial
Coverage Geo Comments: All public social media messages in Dutch are covered, for all sources.
Coverage Period: 2009 onwards
Project Details
Frequency Comments: In principle, all available data can be obtained but we use only a selection of the data.
Data Quality
Quality Aspects Evaluated: Privacy and Security, Completeness, Usability, Time Factors, Validity, Accuracy, including selectivity, Coherence, including linkability to other sources
Validation Comments: By building a model based on fitting characteristics derived from social media messages to consumer confidence, a very high correlation was achieved. Both series are closely associated; changes in consumer confidence precede changes in sentiment by one we
Quality Framework Comments: It has been shown that under certain conditions it may be possible to produce reliable statistics not based on sampling theory, i.e., without using population-based estimation methods. For the sentiment indicator huge amounts of data are used; the sentiment indicator is based on the average sentiment of 75 million public Dutch social media messages produced per month.
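The validation described above compares a sentiment series with consumer confidence and notes that changes in one series precede changes in the other. A minimal sketch of such a check is given below, assuming two equal-length lists of period values. The function names (`pearson`, `lagged_correlation`, `best_lag`) and the toy numbers are hypothetical illustrations and are not taken from the project; the project's actual model is not described in this record.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(confidence, sentiment, lag):
    """Correlate confidence[t] with sentiment[t + lag], for lag >= 0.

    A high value at lag k suggests that confidence leads sentiment
    by k periods (the period unit is whatever the series use).
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(confidence, sentiment)
    return pearson(confidence[:-lag], sentiment[lag:])


def best_lag(confidence, sentiment, max_lag):
    """Return the lag in 0..max_lag with the highest correlation."""
    return max(
        range(max_lag + 1),
        key=lambda k: lagged_correlation(confidence, sentiment, k),
    )


# Toy data (invented): sentiment repeats confidence one period later.
confidence = [1, 3, 2, 5, 4, 6, 5, 8]
sentiment = [0, 1, 3, 2, 5, 4, 6, 5]
print(best_lag(confidence, sentiment, max_lag=3))
```

With real data, the two series would first need to be aligned on the same periods, and a high correlation alone would not establish that the indicator measures the same concept as the survey.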
Data Quality Concerns Comments: There are no concerns about the quality of the data obtained, but there are concerns about their meaning and the way to interpret them.
Quality Assessment Comments: Although the social media messages are public, this does not imply that they may be used for any purpose, and data ownership may also be an issue. However, if the data are used in a transparent way and communication is well organized, we do consider the use of these data permissible at an aggregated level. After all, the data are already widely used for commercial purposes.
Methodology
Methods Used: Traditional statistical methods, Other methods
Methods Comments: The most important outcome so far is the successful replication of the consumer confidence index with a production time of a few days instead of weeks. By building a model based on fitting characteristics derived from social media messages to consumer con
Technologies
Technologies: Spreadsheet, Hadoop Clusters
Technologies Comments: Hadoop was used to assign sentiment and aggregate the sentiment for messages produced during specific periods. Aggregation was done in the database of the data provider, via a secure interface, after which the aggregated results were downloaded. These were processed within the premises of the national statistical institute.
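The two steps named above, assigning sentiment to each message and aggregating it per period, follow a map and reduce pattern. The sketch below imitates that pattern in plain Python rather than Hadoop. The word lists, the scoring rule, the `(date, text)` message layout, and all function names are assumptions made for illustration; the record does not describe how the project or the data provider actually classify sentiment.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a real sentiment classifier would be far richer.
POSITIVE = {"goed", "blij", "mooi"}
NEGATIVE = {"slecht", "boos", "duur"}


def message_sentiment(text):
    """Return +1, -1, or 0 depending on which word list dominates."""
    words = text.lower().split()
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    return (pos > neg) - (pos < neg)


def map_message(message):
    """Map step: emit a (period, sentiment) pair for one message.

    Assumes message is (iso_date, text) with iso_date like "2013-01-05";
    the period key is the "YYYY-MM" prefix.
    """
    date, text = message
    return date[:7], message_sentiment(text)


def reduce_by_period(pairs):
    """Reduce step: average the sentiment scores within each period."""
    totals = defaultdict(lambda: [0, 0])  # period -> [score sum, count]
    for period, score in pairs:
        totals[period][0] += score
        totals[period][1] += 1
    return {period: s / n for period, (s, n) in totals.items()}


# Invented example messages.
messages = [
    ("2013-01-05", "wat een mooi weer"),
    ("2013-01-20", "alles is duur en slecht"),
    ("2013-02-02", "ik ben blij"),
    ("2013-02-10", "geen mening"),
]
monthly = reduce_by_period(map_message(m) for m in messages)
print(monthly)
```

Only the per-period averages leave the aggregation step, which mirrors the arrangement described above in which aggregated results, not individual messages, are downloaded.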
Other
Income Level: High-income
Iso: NL
Timeframe To Produce Indicator: NA