Multimodal Multi-Speaker Merger & Acquisition Financial Modeling: A New Task, Dataset, and Neural Baselines
Risk prediction is an essential task in financial markets. Merger and Acquisition (M&A) calls provide key insights into the claims made by company executives about the restructuring of financial firms. Extracting vocal and textual cues from M&A calls can help model the risk associated with such financial activities. To aid the analysis of M&A calls, we curate a dataset of conference call transcripts and their corresponding audio recordings for the period from 2016 to 2020. We introduce M3ANet, a baseline architecture that takes advantage of the multimodal multi-speaker input to forecast the financial risk associated with M&A calls. Empirical results show that the task is challenging, with the proposed architecture performing marginally better than strong BERT-based baselines. We release the M3A dataset and benchmark models to motivate future research on this challenging problem domain.