Unique ID: 2015064
Division: | Deputy Director General |
---|---|
Issue Date: | February 13th 2019 |
Last modified: | February 22nd 2019 |
Exploring the use of social media messages for economic indicators
This research project explores the usability of public social media messages for selected statistics, the methodology to be used for such statistics, and the reliability of the results. At the start of the research, a key question was whether the results of the existing survey-based consumer confidence index could be replicated using only this source, while reducing its production time. The research contributes to the body of methodological knowledge and extends it beyond sampling theory. Social media data are widely considered to have huge potential for shedding light on a range of social phenomena, and the current phase of the research focuses on indicators of social coherence. However, technical success does not automatically lead to use in official statistics, which requires an assessment that goes beyond the question of technical feasibility.
Project Objective:
Scientific / research
Project Outcomes:
Perhaps the most important lesson learned is that under certain conditions it may be possible to produce reliable statistics that are not based on sampling theory, i.e., without using population-based estimation methods. Linked to this is the lesson that research of this type needs an open mindset, and that a novel, Big Data-oriented approach can provide valuable insights. At an institutional level, the value of having a stand-alone research program was demonstrated. In the current phase of the project, the research is aimed at producing indicators for (changes in) social coherence.
Publications Comments:
Daas, P.J.H. and Puts, M.J.H. (2014) Social Media Sentiment and Consumer Confidence. European Central Bank Statistics Paper Series No. 5, Frankfurt, Germany.
Statistical Area
Project Sources
Type Of Institution: | National statistical office |
---|---|
Big Data Source: | Social media data |
Region: | Europe & Central Asia |
Country Area: | Netherlands |
Id Country Regional: | country |
Partnerships
Data Providers: | Social media provider |
---|
Accessing Data
Data Access Rights: | Broader access rights |
---|
Data Coverage
Data Coverage: | Other |
---|---|
Coverage Geo Pop: | Whole country / high % of market |
Cost Implication: | Commercial |
Coverage Geo Comments: | All public social media messages in Dutch are covered, for all sources. |
Coverage Period: | 2009 onwards |
Project Details
Frequency Comments: | In principle, all available data can be obtained, but we use only a selection of the data. |
---|
Data Quality
Quality Aspects Evaluated: | Privacy and Security, Completeness, Usability, Time Factors, Validity, Accuracy, including selectivity, Coherence, including linkability to other sources |
---|---|
Validation Comments: | By building a model based on fitting characteristics derived from social media messages to consumer confidence, a very high correlation was achieved. Both series are closely associated; changes in consumer confidence precede changes in sentiment by one we |
Quality Framework Comments: | It has been shown that under certain conditions it may be possible to produce reliable statistics that are not based on sampling theory, i.e., without using population-based estimation methods. The sentiment indicator draws on very large amounts of data: it is based on the average sentiment of 75 million public Dutch social media messages produced per month. |
Data Quality Concerns Comments: | There are no concerns about the quality of the data obtained, but there are concerns about their meaning and the way to interpret them. |
Quality Assessment Comments: | Although the social media messages are public, this does not imply that they may be used for any purpose, and data ownership may also be an issue. However, if the data are used in a transparent way and communication is well organized, we consider the use of these data permissible at an aggregated level. After all, the data are already widely used for commercial purposes. |
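The validation note states that changes in consumer confidence precede changes in sentiment. One common way to check such a lead/lag relation is to correlate the consumer confidence series with the sentiment series shifted by k periods and see which k gives the highest Pearson correlation. The sketch below does this in plain Python. The function names (`pearson`, `lagged_correlation`, `best_lag`) and the toy series are hypothetical; the toy sentiment series is constructed to follow confidence by exactly one period. This is an illustration, not the project's actual validation procedure.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlation(leader, follower, lag):
    """Correlate leader[t] with follower[t + lag] for a lag of 0 or more periods."""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])


def best_lag(leader, follower, max_lag):
    """Return (lag, r) for the lag in 0..max_lag with the highest correlation."""
    scores = [(lag, lagged_correlation(leader, follower, lag)) for lag in range(max_lag + 1)]
    return max(scores, key=lambda pair: pair[1])


# Hypothetical toy series: for t >= 1, sentiment[t] = confidence[t - 1] / 2 + 3,
# i.e. sentiment follows consumer confidence with a one-period delay.
confidence = [-20, -18, -15, -17, -12, -10, -14, -9, -6, -8]
sentiment = [-8.0, -7.0, -6.0, -4.5, -5.5, -3.0, -2.0, -4.0, -1.5, 0.0]

lag, r = best_lag(confidence, sentiment, max_lag=3)
print(f"highest correlation at lag {lag}: r = {r:.3f}")
```

A positive best lag means that the first argument (here consumer confidence) tends to move before the second. Because the toy data were built with a one-period delay, the highest correlation appears at lag 1.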
Methodology
Methods Used: | Traditional statistical methods, Other methods |
---|---|
Methods Comments: | The most important outcome so far is the successful replication of the consumer confidence index with a production time of a few days instead of weeks. By building a model based on fitting characteristics derived from social media messages to consumer con |
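The methods note describes a model that fits characteristics derived from social media messages to consumer confidence. However, the record does not say which model was used. The sketch below assumes, for illustration only, a single characteristic (the monthly mean sentiment) and an ordinary least-squares line. It shows the fit and a subsequent estimate for a new month. The function names `fit_ols` and `predict` and all numbers are hypothetical.

```python
from statistics import fmean


def fit_ols(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def predict(model, x_new):
    """Apply a fitted (a, b) line to a new characteristic value."""
    a, b = model
    return a + b * x_new


# Hypothetical monthly mean sentiment and published consumer confidence.
# The toy points lie exactly on confidence = 2 * sentiment - 10, so the fit is easy to check.
sentiment = [-6.0, -4.0, -5.0, -2.0, -1.0, 0.0]
confidence = [-22.0, -18.0, -20.0, -14.0, -12.0, -10.0]

model = fit_ols(sentiment, confidence)
# Estimate confidence for a month whose survey result is not yet available.
estimate = predict(model, -3.0)
print(model, estimate)
```

Once fitted, such a model only needs the aggregated sentiment for a new month. This is one way the production time could be reduced from weeks to days. With several characteristics, the same idea extends to multiple regression.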
Technologies
Technologies: | Spreadsheet, Hadoop Clusters |
---|---|
Technologies Comments: | Hadoop was used to assign a sentiment to each message and to aggregate the sentiment of the messages produced during specific periods. Aggregation was done in the database of the data provider, via a secure interface, after which the aggregated results were downloaded. These were processed within the premises of the national statistical institute. |
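The technologies note describes two steps. First, a sentiment is assigned to each message. Second, the scores are aggregated over specific periods (on Hadoop, inside the provider's environment). The quality note adds that the indicator is the average sentiment of the messages produced per month. The sketch below shows that two-step logic in plain Python on a few in-memory messages. The tiny word lists, the `score_message` rule and the sample messages are all hypothetical. The rule counts positive words minus negative words and maps the balance to +1, 0 or -1. The project's actual sentiment classifier and Hadoop job are not described in this record.

```python
from collections import defaultdict
from datetime import date

# Hypothetical Dutch word lists; a real sentiment classifier would be far richer.
POSITIVE = {"goed", "blij", "mooi"}
NEGATIVE = {"slecht", "boos", "duur"}


def score_message(text):
    """Assign +1, 0 or -1 to one message by counting lexicon hits (whitespace tokens)."""
    words = text.lower().split()
    balance = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (balance > 0) - (balance < 0)


def monthly_average_sentiment(messages):
    """Aggregate per-message scores into the mean sentiment per (year, month)."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for posted, text in messages:
        key = (posted.year, posted.month)
        totals[key] += score_message(text)
        counts[key] += 1
    # Only these per-period aggregates would leave the provider's environment.
    return {key: totals[key] / counts[key] for key in sorted(counts)}


messages = [
    (date(2013, 1, 3), "Wat een mooi weer, ik ben blij"),
    (date(2013, 1, 9), "Alles is zo duur geworden"),
    (date(2013, 1, 21), "Een goed begin van het jaar"),
    (date(2013, 2, 2), "Slecht nieuws over de economie"),
    (date(2013, 2, 14), "Boos en moe"),
]
print(monthly_average_sentiment(messages))
```

With scores of +1, 0 and -1, the monthly average equals the share of positive messages minus the share of negative messages.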
Other
Income Level: | High-income |
---|---|
Iso: | NL |
Timeframe To Produce Indicator: | NA |