Financial Text Mining

Learning Financial Domain Word Embedding based on BERT

| Reference | Title | Data source (open-sourced?) | Model Type | Evaluation Metric(s) | Time Span | Primary Research Problem | Venue |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Liu et al. (2020) | FinBERT: A Pre-trained Financial Language Representation Model for Financial Text Mining | English Wikipedia and BooksCorpus (general domain); financial web data such as the CommonCrawl News dataset, YahooFinance, and RedditFinanceQA (financial domain); over 61 GB of text in total (open-sourced) | BERT | Financial Sentence Boundary Detection: outperforms the baseline by 0.085 mean score. Financial Sentiment Analysis: accuracy 0.94, F1 score 0.93. Financial Question Answering: normalized discounted cumulative gain (NDCG) 0.76, mean reciprocal rank (MRR) 0.68 | -/07/2013 - -/12/2019 | Due to the lack of labeled training data, applying deep learning to financial text mining is often unsuccessful | IJCAI-20 |
| Yang et al. (2020) | A Pretrained BERT Model for Financial Communications | Corporate Reports 10-K & 10-Q: 2.5B tokens; Earnings Call Transcripts: 1.3B tokens; Analyst Reports: 1.1B tokens (financial domain). Available at link | BERT | - | - | FinBERT is a BERT model pre-trained on financial communication text, intended to enhance financial NLP research and practice. It is trained on three financial communication corpora totaling 4.9B tokens. | arXiv paper |
| Araci (2019) | Financial Sentiment Analysis with BERT | Resources: two datasets are used for FinBERT. Further training of the language model is done on a subset of the Reuters TRC2 dataset. Available at link | BERT | - | - | FinBERT is a pre-trained NLP model for analyzing the sentiment of financial text. It is built by further training the BERT language model on a large financial corpus and then fine-tuning it for financial sentiment classification. | arXiv paper |
| McClelland et al. (1986) | In: Parallel distributed processing: Explorations in the microstructure of cognition | \ | \ | \ | \ | The first paper to raise the idea of encoding knowledge using script- or frame-like representations | Book |
| Rumelhart et al. (1986) | Learning representations by back-propagating errors | \ | NN | Difference between the actual output vector of the net and the desired output vector | \ | The earliest paper to represent words as continuous vectors | Nature |
| Du et al. (2019) | AIG Investments.AI at the FinSBD Task: Sentence Boundary Detection through Sequence Labelling and BERT Fine-tuning | FinSBD-2019; pre-trained word embedding: glove.6B; public domain implementation | BERT, LSTM | 1) F1 scores for predicting beginning (BS) and ending (ES) tokens separately, and 2) the mean of the two separate F1 scores; precision, recall | \ | Financial document sentence boundary detection | Proceedings of the First Workshop on Financial Technology and Natural Language Processing |
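
The evaluation columns above cite F1, mean reciprocal rank (MRR), and normalized discounted cumulative gain (NDCG). As a rough illustration of how these metrics are commonly computed, here is a minimal Python sketch. The function names are hypothetical, and the NDCG shown uses the linear-gain formulation without a rank cutoff, which is an assumption; the cited papers may use other variants.

```python
import math

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def mean_reciprocal_rank(ranked_flags):
    """ranked_flags: one list of 0/1 relevance flags per query, in ranked order."""
    total = 0.0
    for flags in ranked_flags:
        for rank, relevant in enumerate(flags, start=1):
            if relevant:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(ranked_flags)

def dcg(gains):
    """Discounted cumulative gain with linear gains (an assumed variant)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Two toy queries: first relevant answer at rank 2 and at rank 1.
print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0]]))  # 0.75
```

The same `f1_score` helper can be applied to the beginning-token (BS) and ending-token (ES) scores separately and then averaged, which is one reading of the "mean of the two separate F1 scores" reported for the FinSBD task.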

Text Mining

| Reference | Title | Data source (open-sourced?) | Model Type | Evaluation Metric(s) | Time Span | Primary Research Problem | Venue |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Guo et al. (2020) | Deep Semantic Compliance Advisor for Unstructured Document Compliance Checking | Stanford Natural Language Inference (SNLI) dataset (open-sourced); real English contract data (NOT open-sourced) | Graph Neural Network, attention-based RNN | Checking one contract takes a legal professional 4+ hours; DSCA returns the checking results with detailed comparison information in one minute | \ | Unstructured document checking, sentiment analysis | IJCAI-20 |
| Guo et al. (2020) | IGNITE: A Minimax Game Toward Learning Individual Treatment Effects from Networked Observational Data | Semi-synthetic data created to mimic real-world situations (NOT open-sourced) | \ | \ | \ | Learn Individual Treatment Effects (ITEs) from network information | Eco |
| Wang & Zhu (2020) | Interpretable Multimodal Learning for Intelligent Regulation in Online Payment Systems | WeChat Pay of Tencent (NOT open-sourced) | Attention mechanism | 85.9% accuracy; triplet loss 0.01 lower than the baseline model | 01/07/2019 - 31/08/2019 | Investigates the relationship between transactions and texts in an e-commerce system | IJCAI-20 |
| Biesner et al. (2020) | Leveraging Contextual Text Representations for Anonymizing German Financial Documents | Bundesanzeiger (BANZ) (open-sourced) | Bi-directional character-based recurrent neural network | 98.9% Precision, 0.973 Recall, 0.972 F1 | \ | Application for anonymizing sensitive components in financial documents | AAAI-20 |
| Izumi et al. (2020) | Economic News Impact Analysis, Using Causal-Chain Search from Textual Data | Tokyo Stock Exchange (open-sourced) | Causal Chain Search vs. Absolute Return in Stock Market | Both related (using similarity of AR) | 01/10/2012 - 31/05/2018 | Created lists of related companies and measured the impact on their stock prices of two important news items about wheat prices in 2018. Market impacts appeared in the companies related to the ripple effects when the news concerned an obvious fact | AAAI-20 |
| Edmiston et al. (2020) | Unsupervised Discovery of Firm-Level Variables in Earnings Call Transcript Embeddings | Compustat | SAFE - Graph Algorithm | SAFE Score | Q1-2020 | Repurposes an algorithm from computational biology. Compares embedding methods across economic variables. | FinNLP-2020 |
| Taylor & Keselj (2020) | Using Extractive Lexicon-based Sentiment Analysis to Enhance Understanding of the Impact of Non-GAAP Measures in Financial Reporting | McDonald (2019) 10-K | \ | Hypothesis Test | 1998-2019 | First to use an extractive approach for sentiment analysis in finance | FinNLP-2020 |
| Chen & Sarkar (2020) | A Semantic Approach to Financial Fundamentals | Stage One 10-X Parse Data | BERT | Cross-industry variation | 2006-2018 | Introduces the Semantically-Informed Financial Index | FinNLP-2020 |
| Bambrick et al. (2020) | NSTM: Real-Time Query-Driven News Overview Composition at Bloomberg | Not open-sourced | NSTM | User feedback | \ | Developed a novel system that composes concise, human-readable news overviews for arbitrary user search queries | ACL-2020 |
| Zheng et al. (2019) | Doc2EDAG: An End-to-End Document-level Framework for Chinese Financial Event Extraction | Chinese Financial Announcements | Doc2EDAG | Precision, Recall, F1 | 2008-2018 | New model to directly generate event tables. Reformalises the DEE task without trigger words. New real-world dataset. | EMNLP-2019 |
| Moreno-Sandoval et al. (2019) | Tone Analysis in Spanish Financial Reporting Narratives | ORBIS & Annual Reports | Lexicon/Rule-based | F1, Accuracy, Precision, Recall | 2014-2017 | First corpus of "letters to shareholders" in Spanish. Created a gold standard to evaluate opinion systems. | 2019 (FNP) |
| Tian & Peng (2019) | Finance document Extraction Using Data Augmentation and Attention | \ | Attention-based LSTM | Weighted F1 | \ | Title detection using an attention-based LSTM | 2019 (FNP) |
| Blumenthal & Graf (2019) | Utilizing Pre-Trained Word Embeddings to Learn Classification Lexicons with Little Supervision | SST-2 & FNHL | Neural Network | Accuracy | \ | Presents a novel method to learn classification lexicons from a labeled text corpus that incorporates word similarities in the form of pre-trained word embeddings | 2019 (FNP) |
| Gooding & Briscoe (2019) | Active Learning for Financial Investment Reports | All Street Research | Linear SVC | F1-Score | \ | Built a classification pipeline to categorise investment-related content | 2019 (FNP) |
| Chen et al. (2019) | Numeracy-600K: Learning Numeracy for Detecting Exaggerated Information in Market Comments | Reuters | BiGRU, LR, CNN… | F1-Score | \ | Provides a novel challenge and dataset. Sets a strong baseline. | ACL-2019 |
| Dereli & Saraclar (2019) | Convolutional Neural Networks for Financial Text Regression | 10-K Data - Tsai et al. (2016) | CNN | Spearman's rank correlation | \ | Reduced dependence on lexicons | ACL-2019 |
| Sedinkina et al. (2019) | Automatic Domain Adaptation Outperforms Manual Domain Adaptation for Predicting Financial Outcomes | H4N and L&M | OLS | t-statistic, R^2 | \ | Automatic domain adaptation of lexicons outperforms manual adaptation | ACL-2019 |
| Chen et al. (2017) | NLG301 at SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News | SemEval-2017 Task 5 | SVM | Cosine Similarity | 01/01/2015 - 31/10/2016 | Text Span, Ensemble | SemEval-2017 Task 5 |
| Chen et al. (2018) | Fine-Grained Analysis of Financial Tweets | FiQA 2018 Task 1 | CNN / Bi-LSTM / CRNN | Accuracy/MSE/R2 | / | Aspect, Extension Dataset | FiQA 2018 Task 1 |