on Big Data |
By: | Michelle Alexopoulos; Xinfen Han; Oleksiy Kryvtsov; Xu Zhang |
Abstract: | We study soft information contained in congressional testimonies by the Federal Reserve Chairs and analyze its effects on financial markets. Using machine learning, we construct high-frequency measures of Fed Chairs' and Congress members' emotions expressed via their words, voice and face. Increases in the Chair's text-, voice-, or face-emotion indices during the testimony generally raise the S&P500 index and lower the VIX. Stock prices are particularly sensitive to both the members' questions and the Fed Chair's answers about issues directly related to monetary policy. These effects add up and propagate after the testimony, reaching magnitudes comparable to those after a policy rate cut. Our findings resonate with the view in psychology that communication is much more than words and underscore the need for a holistic approach to central bank communication. |
Keywords: | Central bank communications, Financial markets, High-frequency identification, Facial emotion recognition, Vocal signal processing, Textual Analysis |
JEL: | E52 E58 E71 |
Date: | 2022–09–21 |
URL: | http://d.repec.org/n?u=RePEc:tor:tecipa:tecipa-737&r= |
By: | du Plessis, Emile; Fritsche, Ulrich |
Abstract: | A reflection on the lackluster growth over the decade since the Global Financial Crisis has renewed interest in preventative measures for a long-standing problem. Advances in machine learning algorithms during this period present promising forecasting solutions. In this context, the paper develops new forecasting methods for an old problem by employing 13 machine learning algorithms to study 147 years of systemic financial crises across 17 countries. It uses 12 leading indicators covering the real, banking and external sectors. Four modelling dimensions (a contemporaneous pooled format with an expanding window, transformations with a lag structure, a 20-year rolling window, and an individual-country format) are implemented to assess performance through recursive out-of-sample forecasts. Findings suggest that fixed capital formation is the most important variable. GDP per capita and consumer inflation have increased in prominence, whereas debt-to-GDP, the stock market and consumption were dominant at the turn of the 20th century. Under a lag structure, banking sector predictors on average account for 28 percent of the variation in crisis prevalence, real sector predictors 64 percent and external sector predictors 8 percent. A lag structure and a rolling window both improve on optimised contemporaneous and individual-country formats. Nearly half of all algorithms reach peak performance with a lag structure. As measured by AUC, F1 and Brier scores, the top-performing machine learning methods consistently produce high accuracy rates, with random forests and gradient boosting both in front with 77 percent correct forecasts. Top models contribute added value of more than 20 percentage points in most instances and deal with a high degree of complexity across several countries. |
Keywords: | machine learning, systemic financial crises, leading indicators, forecasting, early warning signal |
JEL: | C14 C15 C32 C35 C53 E37 E44 G21 |
Date: | 2022 |
URL: | http://d.repec.org/n?u=RePEc:zbw:uhhwps:67&r= |
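The du Plessis and Fritsche record ranks crisis-forecasting models by AUC, F1 and Brier scores. The following is a minimal sketch of those three metrics in plain Python, written for illustration only. The function names are hypothetical, and the 0.5 probability threshold used to turn forecasts into crisis/no-crisis calls for F1 is an assumption, since the abstract does not state the threshold used.

```python
from typing import Sequence


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared gap between predicted crisis probabilities and 0/1 outcomes."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def f1_score(y: Sequence[int], p: Sequence[float], threshold: float = 0.5) -> float:
    """Harmonic mean of precision and recall after thresholding probabilities.

    The default threshold of 0.5 is an assumption made for illustration.
    """
    pred = [1 if pi >= threshold else 0 for pi in p]
    tp = sum(1 for yi, qi in zip(y, pred) if yi == 1 and qi == 1)
    fp = sum(1 for yi, qi in zip(y, pred) if yi == 0 and qi == 1)
    fn = sum(1 for yi, qi in zip(y, pred) if yi == 1 and qi == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Share of (crisis, calm) pairs in which the crisis period gets the higher score.

    Ties count as one half, which matches the Mann-Whitney form of the ROC AUC.
    """
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(pos) * len(neg))
```

For example, with outcomes `[0, 0, 1, 1]` and forecasts `[0.1, 0.4, 0.35, 0.8]`, three of the four crisis/calm pairs are ordered correctly, so `auc` returns 0.75. Lower is better for the Brier score and higher is better for AUC and F1.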
By: | Yiyan Huang; Cheuk Hang Leung; Xing Yan; Qi Wu; Shumin Ma; Zhiri Yuan; Dongdong Wang; Zhixiang Huang |
Abstract: | Many practical decision-making problems in economics and healthcare seek to estimate the average treatment effect (ATE) from observational data. Double/Debiased Machine Learning (DML) is one of the prevalent methods for estimating the ATE in observational studies. However, DML estimators can suffer from an error-compounding issue and can even give extreme estimates when the propensity scores are misspecified or very close to 0 or 1. Previous studies have addressed this issue through empirical tricks such as propensity score trimming, yet none of the existing literature solves this problem from a theoretical standpoint. In this paper, we propose a Robust Causal Learning (RCL) method to offset the deficiencies of the DML estimators. Theoretically, the RCL estimators i) are as consistent and doubly robust as the DML estimators, and ii) are free of the error-compounding issue. Empirically, comprehensive experiments show that i) the RCL estimators give more stable estimates of the causal parameters than the DML estimators, and ii) the RCL estimators outperform the traditional estimators and their variants when different machine learning models are applied to both simulation and benchmark datasets. |
Date: | 2022–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2209.01805&r= |
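The Huang et al. record points to the failure mode of DML-style estimators when propensity scores approach 0 or 1. The sketch below is the standard doubly robust (AIPW) score that DML averages. It is not the authors' RCL estimator. Cross-fitting and the nuisance-model training are omitted. The function name `aipw_ate` and its `trim` argument are hypothetical, and `trim` implements the ad hoc propensity-trimming fix the abstract mentions.

```python
from typing import Sequence


def aipw_ate(
    y: Sequence[float],
    d: Sequence[int],
    e: Sequence[float],
    mu1: Sequence[float],
    mu0: Sequence[float],
    trim: float = 0.0,
) -> float:
    """Doubly robust (AIPW) estimate of the average treatment effect.

    y: outcomes; d: 0/1 treatment indicators; e: estimated propensity scores;
    mu1, mu0: predicted outcomes under treatment and under control.
    trim: clip each propensity into [trim, 1 - trim] before weighting.
    With trim=0.0, a propensity of exactly 0 or 1 raises ZeroDivisionError.
    """
    scores = []
    for yi, di, ei, m1, m0 in zip(y, d, e, mu1, mu0):
        ei = min(max(ei, trim), 1.0 - trim)
        # Outcome-model contrast plus inverse-propensity-weighted residual corrections.
        score = m1 - m0 + di * (yi - m1) / ei - (1 - di) * (yi - m0) / (1.0 - ei)
        scores.append(score)
    return sum(scores) / len(scores)
```

The `1 / e` weight is where errors compound. A treated unit with an outcome-model residual of 1 and an estimated propensity of 0.01 contributes 100 to the average. Clipping the propensity at 0.05 cuts that contribution to 20, at the cost of some bias.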
By: | Leogrande, Angelo; Costantiello, Alberto; Laureti, Lucio |
Abstract: | In this article we estimate the value of “Non-R&D Innovation Expenditures” in Europe. We use data from the European Innovation Scoreboard (EIS) of the European Commission for the period 2010-2019. We test the data with the following econometric models: Pooled OLS, Dynamic Panel, Panel Data with Fixed Effects, Panel Data with Random Effects, and WLS. We find that “Non-R&D Innovation Expenditures” is positively associated with, among others, “Innovation Index” and “Firm Investments”, and negatively associated with, among others, “Human Resources” and “Government Procurement of Advanced Technology Products”. We apply the k-Means algorithm, evaluated with both the Silhouette Coefficient and the Elbow Method, and compare it with a network analysis optimized with the Manhattan distance; we find that the optimal number of clusters is four. Furthermore, we compare eight machine learning algorithms for predicting the level of “Non-R&D Innovation Expenditures” with both Original Data (OD) and Augmented Data (AD). We find that Gradient Boost Trees Regression is the best predictor for OD, while Tree Ensemble Regression is the best predictor for AD. Finally, we verify that prediction with AD is more efficient than prediction with OD, with a 40.50% reduction in the average value of the statistical errors. |
Keywords: | Innovation and Invention: Processes and Incentives; Management of Technological Innovation and R&D; Diffusion Processes; Open Innovation. |
JEL: | O30 O31 O32 O33 O34 |
Date: | 2022–09–11 |
URL: | http://d.repec.org/n?u=RePEc:pra:mprapa:114526&r= |
By: | Tae-Hwy Lee; Ekaterina Seregina |
Abstract: | In this paper we develop a novel method of combining many forecasts based on a machine learning algorithm called Graphical LASSO. We visualize forecast errors from different forecasters as a network of interacting entities and generalize network inference in the presence of common factor structure and structural breaks. First, we note that forecasters often use common information and hence make common mistakes, which makes the forecast errors exhibit common factor structures. We propose the Factor Graphical LASSO (Factor GLASSO), which separates common forecast errors from the idiosyncratic errors and exploits sparsity of the precision matrix of the latter. Second, since the network of experts changes over time as a response to unstable environments such as recessions, it is unreasonable to assume constant forecast combination weights. Hence, we propose Regime-Dependent Factor Graphical LASSO (RD-Factor GLASSO) and develop its scalable implementation using the Alternating Direction Method of Multipliers (ADMM) to estimate regime-dependent forecast combination weights. The empirical application to forecasting macroeconomic series using the data of the European Central Bank's Survey of Professional Forecasters (ECB SPF) demonstrates superior performance of a combined forecast using Factor GLASSO and RD-Factor GLASSO. |
Date: | 2022–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2209.01697&r= |
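In the Lee and Seregina record, Graphical LASSO is used to estimate the precision matrix (the inverse covariance) of forecast errors, and that matrix then drives the combination weights. The following minimal sketch covers only the final step, the minimum-variance weights w = Θ1 / (1'Θ1), where Θ is the inverse of the error covariance. As a simplifying assumption it uses a dense covariance matrix and solves a linear system instead of running (Factor) GLASSO. It models neither the factor structure nor the regime dependence. The helper names are hypothetical. In practice a sparse precision estimate could come from a dedicated estimator, such as scikit-learn's `GraphicalLasso`.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def combination_weights(cov: List[List[float]]) -> List[float]:
    """Minimum-variance combination weights: inverse-covariance row sums, normalised to 1."""
    raw = solve(cov, [1.0] * len(cov))  # equals Theta @ 1 without forming Theta explicitly
    total = sum(raw)
    return [r / total for r in raw]
```

With two uncorrelated forecasters whose error variances are 1 and 4, the weights come out as 0.8 and 0.2, so the more precise forecaster gets four times the weight. Correlated errors, such as errors sharing a common factor, shift weight in less intuitive ways. That is why the paper separates common from idiosyncratic errors before estimating a sparse precision matrix.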
By: | Emanuel Kohlscheen; Richhild Moessner |
Abstract: | We analyse the drivers of European Power Exchange (EPEX) wholesale electricity prices between 2012 and early 2022 using machine learning. The agnostic random forest approach that we use reduces in-sample root mean square errors (RMSEs) by around 50% compared with a standard linear least squares model. This indicates that non-linearities and interaction effects are key in wholesale electricity markets. Out-of-sample prediction errors using machine learning are (slightly) lower than even the in-sample errors of a least squares model. The effects on wholesale electricity prices of efforts to limit power consumption and to green the energy matrix are first order. CO2 permit prices strongly impact electricity prices, as do the prices of source energy commodities, and the impact of carbon permit prices has clearly increased post-2021 (particularly for baseload prices). Among energy sources, natural gas has the largest effect on electricity prices. Importantly, the role of wind energy feed-in has slowly risen over time, and its impact is now roughly on par with that of coal. |
Date: | 2022–08 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2208.14650&r= |
By: | Isaac K. Ofori (University of Insubria, Varese, Italy); Camara K. Obeng (University of Cape Coast, Cape Coast, Ghana); Simplice A. Asongu (Yaoundé, Cameroon) |
Abstract: | The question of what really drives economic growth in sub-Saharan Africa (SSA) has been debated for many decades. However, there is still a lack of clarity on the variables crucial for driving growth, as prior contributions have relied on a preferential selection of covariates from among several potential drivers of economic growth. The main challenge with such contributions is that even tenuous variables may be deemed influential under some model specifications and assumptions. To address this and inform policy appropriately, we train algorithms for four machine learning regularization techniques (the Standard lasso, the Adaptive lasso, the Minimum Schwarz Bayesian information criterion lasso, and the Elasticnet) to study patterns in a dataset containing 113 covariates and identify the key variables affecting growth in SSA. We find that only 7 covariates are key for driving growth in SSA. Estimates for these variables are obtained with the lasso inferential techniques of double-selection linear regression, partialing-out lasso linear regression, and partialing-out lasso instrumental variable regression. Policy recommendations are also provided in line with the AfCFTA and the green growth agenda of the region. |
Keywords: | Economic growth; Elasticnet; Lasso; Machine learning; Partialing-out IV regression; sub-Saharan Africa |
JEL: | C52 C53 C55 O11 O4 O55 |
Date: | 2022–01 |
URL: | http://d.repec.org/n?u=RePEc:exs:wpaper:22/061&r= |
URL: | http://d.repec.org/n?u=RePEc:agd:wpaper:22/061&r= |
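The Ofori, Obeng and Asongu record selects 7 of 113 growth covariates with lasso-type penalties. The IMF weather paper further below uses the same idea to trim 164 weather variables. To show how the L1 penalty sets coefficients exactly to zero, here is a minimal sketch of the plain lasso fitted by cyclic coordinate descent. It does not implement the adaptive lasso, the BIC-tuned lasso, the elastic net, or the double-selection and partialing-out inference used in the paper. The penalty `lam` is set by hand rather than tuned, and the function names are hypothetical.

```python
from typing import List


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], lam: float,
             n_iter: int = 200) -> List[float]:
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p columns; every column is assumed to be non-zero.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of covariate j with the residual that excludes j's own contribution.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b
```

As a toy case, take an outcome equal to 3 times the first covariate and a second covariate orthogonal to it. With `lam=0.5` the fit keeps the first coefficient, shrunk from 3 to 2.5, and sets the irrelevant one to exactly zero. That exact-zero behaviour is what lets lasso act as a variable selector.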
By: | Berend Gort; Xiao-Yang Liu; Xinghang Sun; Jiechao Gao; Shuaiyu Chen; Christina Dan Wang |
Abstract: | Designing profitable and reliable trading strategies is challenging in the highly volatile cryptocurrency market. Existing works applied deep reinforcement learning (DRL) methods and optimistically reported increased profits in backtesting, results that may be false positives due to overfitting. In this paper, we propose a practical approach to address backtest overfitting for cryptocurrency trading using deep reinforcement learning. First, we formulate the detection of backtest overfitting as a hypothesis test. Then, we train the DRL agents, estimate the probability of overfitting, and reject the overfitted agents, increasing the chance of good trading performance. Finally, on 10 cryptocurrencies over a testing period from 05/01/2022 to 06/27/2022 (during which the crypto market crashed twice), we show that the less overfitted deep reinforcement learning agents have a higher Sharpe ratio than more overfitted agents, an equal weight strategy, and the S&P DBM Index (market benchmark), offering confidence in possible deployment to a real market. |
Date: | 2022–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2209.05559&r= |
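The Gort et al. record estimates a probability of backtest overfitting before accepting trading agents. A standard way to compute such a probability is the combinatorially symmetric cross-validation (CSCV) procedure of Bailey, Borwein, López de Prado and Zhu. Here is a minimal sketch of that procedure, assuming (without confirmation from the abstract) that the paper's test is close to it. The procedure splits the return history into blocks and picks the best agent by Sharpe ratio on each half of the blocks. It then checks where that agent ranks on the other half. The Sharpe ratio is unannualised with a zero risk-free rate, and the function names and block count are illustrative assumptions.

```python
import itertools
import math
import statistics
from typing import List, Sequence


def sharpe(returns: Sequence[float]) -> float:
    """Unannualised Sharpe ratio: mean over sample standard deviation (risk-free rate 0)."""
    sd = statistics.stdev(returns)
    return statistics.mean(returns) / sd if sd > 0 else 0.0


def pbo(returns: List[List[float]], n_blocks: int = 4) -> float:
    """Probability of backtest overfitting via combinatorially symmetric cross-validation.

    returns: T rows (periods) x N columns (candidate agents or strategies).
    n_blocks: even number of contiguous row blocks; T is assumed divisible by it.
    """
    t, n = len(returns), len(returns[0])
    size = t // n_blocks
    blocks = [returns[i * size:(i + 1) * size] for i in range(n_blocks)]
    logits = []
    for is_ids in itertools.combinations(range(n_blocks), n_blocks // 2):
        is_rows = [row for i in is_ids for row in blocks[i]]
        oos_rows = [row for i in range(n_blocks) if i not in is_ids for row in blocks[i]]
        is_sr = [sharpe([row[s] for row in is_rows]) for s in range(n)]
        oos_sr = [sharpe([row[s] for row in oos_rows]) for s in range(n)]
        best = max(range(n), key=lambda s: is_sr[s])  # agent chosen in-sample
        # Relative out-of-sample rank of the in-sample winner, mapped into (0, 1).
        rank = sum(1 for s in range(n) if oos_sr[s] <= oos_sr[best])
        omega = rank / (n + 1)
        logits.append(math.log(omega / (1 - omega)))
    # Overfit split: the in-sample winner falls to or below the out-of-sample median.
    return sum(1 for lam in logits if lam <= 0) / len(logits)
```

If one agent dominates in every block, it also wins out of sample in every split and the estimate is 0. A value near 0.5 or higher means the in-sample winner is no better than a typical agent out of sample, so it should be rejected as overfitted.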
By: | Shrub, Yuliya; Rieger, Jonas; Müller, Henrik; Jentsch, Carsten |
Abstract: | The prompt availability of real-time information on the current state of the economy is required for prediction purposes and is crucial for timely policy adjustment and economic decision-making. While important macroeconomic indicators are reported only quarterly and are also published with substantial delay, other related data are available more frequently, that is, monthly, weekly, daily or even more often. In this regard, the goal of nowcasting methods is to use such more frequently collected variables to update predictions of less frequently reported variables such as GDP growth. In this paper, we propose a mixed-frequency model to investigate the potential of using text data in the form of newspaper articles for nowcasting German GDP growth. Newspaper text data appear to be very helpful in this regard, as they directly describe the economic and social developments influencing GDP growth and are updated frequently without any substantial delay. We compare several setups based on commonly used macro variables with and without additional information from text data (extracted in an unsupervised manner), as well as a setup based only on such text data. To deal with the high dimensionality of the considered data, we use principal component regression, penalization techniques and random forests. Comparing our results leads to the conclusion that certain benefits are achievable when text data are included for nowcasting, but the information extracted from text data in an unsupervised way still tends to contain too much irrelevant noise, hampering the performance of the resulting nowcasting approach. |
Keywords: | Topic model, latent Dirichlet allocation, text mining, econometrics, gross domestic product, prediction, forecast |
JEL: | C52 C53 C55 E37 |
Date: | 2022 |
URL: | http://d.repec.org/n?u=RePEc:zbw:rwirep:964&r= |
By: | Jessica Birkholz |
Abstract: | This study explores the regional innovation system characteristics that form the basis for the regional absorptive capacity of entrepreneurial knowledge. Regionalized patent data are combined with firm-level and regional information for German regions over the period 1995 to 2015. Network analysis is applied to identify regional innovation system characteristics on three different layers: 1) cooperation between incumbent firms, 2) learning regimes, and 3) the technological knowledge base. Random forest analyses based on conditional inference classification trees are used to identify the most important characteristics for the regional absorption of entrepreneurial knowledge, both in general and at different efficiency levels. It is shown that characteristics on all three layers affect the regional absorption of entrepreneurial knowledge. Further, the direction and magnitude of the effect that regional innovation system characteristics have on regional knowledge absorption vary across different levels of absorption rates. It is concluded that, for policies aimed at increasing the impact of entrepreneurial knowledge on regional development to be implemented successfully, the regional innovation system needs to be monitored and adapted continuously. |
Keywords: | Entrepreneurship, Regional absorptive capacity, Smart specialization |
JEL: | L26 O33 D85 |
Date: | 2022–04 |
URL: | http://d.repec.org/n?u=RePEc:atv:wpaper:2205&r= |
By: | Lena Sasal; Tanujit Chakraborty; Abdenour Hadid |
Abstract: | Deep learning utilizing transformers has recently achieved considerable success in many vital areas such as natural language processing, computer vision, anomaly detection, and recommendation systems, among many others. Among the several merits of transformers, the ability to capture long-range temporal dependencies and interactions is desirable for time series forecasting, which has led to their use in various time series applications. In this paper, we build a transformer model for non-stationary time series, a problem that is challenging yet crucially important. We present a novel framework for univariate time series representation learning based on a wavelet-based transformer encoder architecture, which we call W-Transformer. The proposed W-Transformers apply a maximal overlap discrete wavelet transformation (MODWT) to the time series data and build local transformers on the decomposed datasets to capture the nonstationarity and long-range nonlinear dependencies in the time series. Evaluating our framework on several publicly available benchmark time series datasets from various domains and with diverse characteristics, we demonstrate that it performs, on average, significantly better than the baseline forecasters for both short-term and long-term forecasting, even for datasets that consist of only a few hundred training samples. |
Date: | 2022–09 |
URL: | http://d.repec.org/n?u=RePEc:arx:papers:2209.03945&r= |
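The W-Transformer record decomposes a series with the maximal overlap discrete wavelet transform (MODWT) before fitting local models. To show what that decomposition produces, the following minimal sketch computes a Haar MODWT with the pyramid algorithm and circular boundary handling. The Haar filter is an assumption made for simplicity, since the abstract does not name the wavelet filter or say whether the paper works with coefficients or a multiresolution analysis. The function name is hypothetical.

```python
from typing import List, Tuple


def haar_modwt(x: List[float], levels: int) -> Tuple[List[List[float]], List[float]]:
    """Haar maximal overlap DWT via the pyramid algorithm with circular boundaries.

    Returns wavelet coefficients W_1..W_J (one list per level) and the final
    scaling coefficients V_J. Every list has the same length as x, because the
    MODWT does not downsample.
    """
    n = len(x)
    v = list(x)
    details = []
    for j in range(1, levels + 1):
        lag = 2 ** (j - 1)  # level-j filters are the Haar filters upsampled by 2^(j-1)
        w_j = [(v[t] - v[(t - lag) % n]) / 2 for t in range(n)]  # local differences
        v = [(v[t] + v[(t - lag) % n]) / 2 for t in range(n)]  # local averages
        details.append(w_j)
    return details, v
```

Because each Haar step splits V_{j-1} into a difference half and an average half, the pieces add back up: W_1 + ... + W_J + V_J reproduces the input exactly. Each of the J + 1 equal-length series could then be handed to its own forecaster, in the spirit of the local transformers described above. The component forecasts would then be recombined.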
By: | Jessica Birkholz; Jarina Kühn; Mariia Shkolnykova |
Abstract: | This paper contributes to the discussion on exploration and exploitation by analyzing the innovation behavior of SMEs and large firms during the first year of the COVID-19 pandemic in Germany. It provides a novel way to measure the type of firm innovation behavior in a dynamically changing environment. After collecting news articles about innovation activities conducted by firms, we applied text mining techniques to identify the positioning of each firm on the continuum from exploitation to exploration. The results of our analyses indicate three main dynamics: 1) all studied firms tend to conduct more explorative innovation activities during the COVID-19 crisis, 2) large and "technology-intensive" firms are more prone to perform explorative innovation activities than SMEs and firms that are not "technology-intensive", and 3) technology intensity is associated with explorative innovation behavior during the crisis. Our results suggest that considering technology intensity and the size of firms is important for designing effective policies during crises. |
Keywords: | COVID-19, Crisis, Innovation, SME, Text Mining, News Data, Exploration, Exploitation |
JEL: | O31 O33 L25 |
Date: | 2022–04 |
URL: | http://d.repec.org/n?u=RePEc:atv:wpaper:2203&r= |
By: | Mr. Matthieu Bellon; Emanuele Massetti; Berkay Akyapi |
Abstract: | A growing literature estimates the macroeconomic effect of weather using variations in annual country-level averages of temperature and precipitation. However, averages may not reveal the effects of extreme events that occur at a higher time frequency or higher spatial resolution. To address this issue, we rely on global daily weather measurements with a 30-km spatial resolution from 1979 to 2019 and construct 164 weather variables and their lags. We select a parsimonious subset of relevant weather variables using an algorithm based on the Least Absolute Shrinkage and Selection Operator (LASSO). We also expand the literature by analyzing weather impacts on government revenue, expenditure, and debt, in addition to GDP per capita. We find that an increase in the occurrence of high temperatures and droughts reduces GDP, whereas more frequent mild temperatures have a positive impact. The share of GDP variation explained by weather, as captured by our handful of selected variables, is much higher than what was previously implied by annual temperature and precipitation averages. We also find evidence of counter-cyclical fiscal policies that mitigate adverse weather shocks, especially excessive or unusually low precipitation episodes. |
Keywords: | climate; extreme weather; macro economy; big data; weather variable; weather shock; OLS estimation outcome; climate variable; weather data sources; Natural disasters; Climate change; Global; Middle East; Sub-Saharan Africa; East Asia; East Africa; GDP effect; weather observation |
Date: | 2022–07–29 |
URL: | http://d.repec.org/n?u=RePEc:imf:imfwpa:2022/156&r= |
By: | Albers, Thilo N. H. (HU Berlin); Kappner, Kalle (HU Berlin) |
Abstract: | Historical city directories are rich sources of micro-geographic data. They provide information on the location of households and firms and on their occupations and industries, respectively. We develop a generic algorithmic workflow that converts scans of these directories into geo- and status-referenced household-level data sets. Applying the workflow to our case study, the Berlin 1880 directory, adds idiosyncratic challenges that should make automation less attractive. Yet, using an administrative benchmark data set on household counts, incomes, and income distributions across more than 200 census tracts, we show that semi-automatic referencing yields results very similar to those from labour-intensive manual referencing. Finally, we discuss potential applications in economic history and beyond. |
Keywords: | city directories; data extraction; granular spatial data |
Date: | 2022–01–29 |
URL: | http://d.repec.org/n?u=RePEc:rco:dpaper:315&r= |
By: | Lange, Kai-Robin; Reccius, Matthias; Schmidt, Tobias; Müller, Henrik; Roos, Michael W. M.; Jentsch, Carsten |
Abstract: | Identifying narratives in texts is a challenging task, as not only narrative elements such as the factors and events have to be identified, but their semantic relation has to be explained as well. Despite this complexity, an effective technique to extract narratives from texts can have a great impact on how we view political and economic developments. By analyzing narratives, one can better understand how they spread across the media landscape and change our world views as a result. In this paper, we take a closer look at a recently proposed definition of a collective economic narrative, characterized by a cause-effect relation that is used to explain a situation for a given world view. For the extraction of such collective economic narratives, we propose a novel pipeline that improves the RELATIO-method for statement detection. By filtering the corpus for causal articles and connecting statements through detected causality between them, our augmented RELATIO approach is well suited to identifying more complex narratives following our definition. Our approach also improves the consistency of the RELATIO-method by augmenting it with additional pre- and post-processing steps that enhance statement detection by means of Coreference Resolution and automatically filter out unwanted noise in the form of uninterpretable statements. We illustrate the performance of this new pipeline in detecting collective economic narratives by analyzing a Financial Times data set that we filtered for economic and inflation-related terms as well as causal indicators. |
Keywords: | Econometrics, narrative, text mining, coreference resolution, named entity recognition, causal linking |
JEL: | C18 C55 C87 E70 |
Date: | 2022 |
URL: | http://d.repec.org/n?u=RePEc:zbw:rwirep:963&r= |
By: | Sara Colella (AMSE - Aix-Marseille Sciences Economiques - EHESS - École des hautes études en sciences sociales - AMU - Aix Marseille Université - ECM - École Centrale de Marseille - CNRS - Centre National de la Recherche Scientifique); Frédéric Dufourt (AMSE - Aix-Marseille Sciences Economiques - EHESS - École des hautes études en sciences sociales - AMU - Aix Marseille Université - ECM - École Centrale de Marseille - CNRS - Centre National de la Recherche Scientifique); Vincent A Hildebrand (York University [Toronto], Dalla Lana School of Public Health, University of Toronto); Rémi Vivès (York University [Toronto]) |
Abstract: | We derive a mental health indicator measuring the frequency of words expressing anger, anxiety and sadness from a fixed population of Twitter users located in France. During the first COVID-19 lockdown, our indicator did not reveal a statistically significant mental health response, while the second lockdown triggered a sharp and persistent deterioration in all three emotions. In addition, difference-in-differences (DID) and event study estimates show a more severe mental health deterioration among women and younger users during the second lockdown. Our results suggest that successive stay-at-home orders significantly worsen mental health across a large segment of the population. |
Keywords: | COVID-19, lockdown, mental health, Twitter data, well-being |
Date: | 2022–07–27 |
URL: | http://d.repec.org/n?u=RePEc:hal:wpaper:hal-03740701&r= |