
Abstract

Recent advances in neural network architectures have opened up several opportunities to develop systems that automatically extract and represent information from domain-specific unstructured text sources. The FinSim-2021 shared task, collocated with the FinNLP workshop, offered the challenge of automatically learning effective and precise semantic models of financial domain concepts. Building such semantic representations of domain concepts requires knowledge about the specific domain, and such knowledge can be obtained from the contextual information available in raw text documents on those domains. In this paper, we propose a transformer-based BERT architecture that captures such contextual information from a set of domain-specific raw documents and then performs a classification task to segregate domain terms into a fixed number of class labels. The proposed model not only considers contextual BERT embeddings but also incorporates a TF-IDF vectorizer that supplies word-level importance to the model. The performance of the model has been evaluated against several baseline architectures.
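The combination described above can be sketched as follows: dense contextual embeddings for each term are concatenated with sparse TF-IDF features before classification. This is a minimal illustrative sketch, not the authors' implementation; the toy terms, labels, and the random vectors standing in for BERT embeddings are all assumptions for demonstration only.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: financial domain terms with class labels.
terms = ["interest rate swap", "corporate bond", "equity option", "treasury bill"]
labels = [0, 1, 2, 1]

# Word-level importance features from a TF-IDF vectorizer.
tfidf = TfidfVectorizer()
tfidf_feats = tfidf.fit_transform(terms).toarray()

# Stand-in for contextual embeddings: in the actual system these would be
# 768-dimensional vectors produced by a pretrained BERT encoder over the
# domain documents; random vectors are used here only to keep the sketch
# self-contained.
rng = np.random.default_rng(0)
bert_feats = rng.normal(size=(len(terms), 768))

# Concatenate both feature views and fit a classifier over the class labels.
features = np.hstack([bert_feats, tfidf_feats])
clf = LogisticRegression(max_iter=1000).fit(features, labels)
predictions = clf.predict(features)  # one predicted label per term
```

Concatenation is one straightforward way to let the classifier see both the contextual representation and explicit term-frequency signals; the shared-task system may combine the two signals differently.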

Research area: Data and decision sciences

Authors: Tushar Goel, Vipul Chauhan, Ishan Verma, Tirthankar Dasgupta, Lipika Dey

Conference/event: Financial Technology on the Web (FinWeb) Workshop at World Wide Web Conference (WWW) – TheWebConf 2021

Conference date: April 2021