Advancements in Financial Document Structure Extraction: Insights from Five Years of FinTOC (2019-2023)

Research output: Contribution in Book/Report/Proceedings - With ISBN/ISSN > Conference contribution/Paper > peer-review

Published

Standard

Advancements in Financial Document Structure Extraction: Insights from Five Years of FinTOC (2019-2023). / Kang, J.; Patel, M.; Agrawal, A. et al.
Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023. ed. / Jingrui He; Themis Palpanas; Xiaohua Hu; Alfredo Cuzzocrea; Dejing Dou; Dominik Slezak; Wei Wang; Aleksandra Gruca; Jerry Chun-Wei Lin; Rakesh Agrawal. Los Alamitos, CA, USA: IEEE Computer Society Press, 2024. p. 2839-2844 (Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023).

Harvard

Kang, J, Patel, M, Agrawal, A, Sevitha, S, Srinivasa, R, Bellato, S, Kumar, MA, Tsang, N & El-Haj, M 2024, Advancements in Financial Document Structure Extraction: Insights from Five Years of FinTOC (2019-2023). in J He, T Palpanas, X Hu, A Cuzzocrea, D Dou, D Slezak, W Wang, A Gruca, JC-W Lin & R Agrawal (eds), Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023. Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023, IEEE Computer Society Press, Los Alamitos, CA, USA, pp. 2839-2844. https://doi.org/10.1109/BigData59044.2023.10386125

APA

Kang, J., Patel, M., Agrawal, A., Sevitha, S., Srinivasa, R., Bellato, S., Kumar, M. A., Tsang, N., & El-Haj, M. (2024). Advancements in Financial Document Structure Extraction: Insights from Five Years of FinTOC (2019-2023). In J. He, T. Palpanas, X. Hu, A. Cuzzocrea, D. Dou, D. Slezak, W. Wang, A. Gruca, J. C.-W. Lin, & R. Agrawal (Eds.), Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023 (pp. 2839-2844). (Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023). IEEE Computer Society Press. https://doi.org/10.1109/BigData59044.2023.10386125

Vancouver

Kang J, Patel M, Agrawal A, Sevitha S, Srinivasa R, Bellato S et al. Advancements in Financial Document Structure Extraction: Insights from Five Years of FinTOC (2019-2023). In He J, Palpanas T, Hu X, Cuzzocrea A, Dou D, Slezak D, Wang W, Gruca A, Lin JCW, Agrawal R, editors, Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023. Los Alamitos, CA, USA: IEEE Computer Society Press. 2024. p. 2839-2844. (Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023). Epub 2023 Dec 15. doi: 10.1109/BigData59044.2023.10386125

Author

Kang, J. ; Patel, M. ; Agrawal, A. et al. / Advancements in Financial Document Structure Extraction : Insights from Five Years of FinTOC (2019-2023). Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023. editor / Jingrui He ; Themis Palpanas ; Xiaohua Hu ; Alfredo Cuzzocrea ; Dejing Dou ; Dominik Slezak ; Wei Wang ; Aleksandra Gruca ; Jerry Chun-Wei Lin ; Rakesh Agrawal. Los Alamitos, CA, USA : IEEE Computer Society Press, 2024. pp. 2839-2844 (Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023).

Bibtex

@inproceedings{44571eb7d4614d6c88bbc46e250ee551,
title = "Advancements in Financial Document Structure Extraction: Insights from Five Years of FinTOC (2019-2023)",
abstract = "In this comprehensive paper, we present a detailed overview of the Financial Table Of Content extraction shared task series, FinTOC, conducted over a span of five years from 2019 to 2023. This paper serves as a retrospective analysis of the key developments in the field of financial document structure extraction. The FinTOC series, hosted within the framework of the Financial Narrative Processing (FNP) workshop, has been instrumental in shaping the landscape of Natural Language Processing (NLP) in the financial domain. Our analysis delves into the diverse methodologies proposed by participants across all editions, shedding light on the innovative strategies employed to tackle the intricate challenge of extracting structured information from financial documents. We explore the evolution of techniques, from traditional rule-based approaches to cutting-edge deep learning models, showcasing the dynamic nature of NLP advancements. Furthermore, our study investigates the introduction of multilingual datasets by the organizers, highlighting the importance of cross-lingual analysis in financial document processing. We also examine the contributions made by participants in augmenting the training data with external sources, showcasing the collaborative spirit of the NLP community in enhancing the quality and size of the shared training dataset.",
keywords = "training, deep learning, text analysis, instruments, conferences, layout, training data",
author = "J. Kang and M. Patel and A. Agrawal and S. Sevitha and R. Srinivasa and S. Bellato and Kumar, {M. Anand} and N. Tsang and M. El-Haj",
year = "2024",
month = jan,
day = "22",
doi = "10.1109/BigData59044.2023.10386125",
language = "English",
series = "Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023",
publisher = "IEEE Computer Society Press",
pages = "2839--2844",
editor = "Jingrui He and Themis Palpanas and Xiaohua Hu and Alfredo Cuzzocrea and Dejing Dou and Dominik Slezak and Wei Wang and Aleksandra Gruca and Lin, {Jerry Chun-Wei} and Rakesh Agrawal",
booktitle = "Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023",

}

RIS

TY - GEN

T1 - Advancements in Financial Document Structure Extraction

T2 - Insights from Five Years of FinTOC (2019-2023)

AU - Kang, J.

AU - Patel, M.

AU - Agrawal, A.

AU - Sevitha, S.

AU - Srinivasa, R.

AU - Bellato, S.

AU - Kumar, M. Anand

AU - Tsang, N.

AU - El-Haj, M.

PY - 2024/1/22

Y1 - 2024/1/22

N2 - In this comprehensive paper, we present a detailed overview of the Financial Table Of Content extraction shared task series, FinTOC, conducted over a span of five years from 2019 to 2023. This paper serves as a retrospective analysis of the key developments in the field of financial document structure extraction. The FinTOC series, hosted within the framework of the Financial Narrative Processing (FNP) workshop, has been instrumental in shaping the landscape of Natural Language Processing (NLP) in the financial domain. Our analysis delves into the diverse methodologies proposed by participants across all editions, shedding light on the innovative strategies employed to tackle the intricate challenge of extracting structured information from financial documents. We explore the evolution of techniques, from traditional rule-based approaches to cutting-edge deep learning models, showcasing the dynamic nature of NLP advancements. Furthermore, our study investigates the introduction of multilingual datasets by the organizers, highlighting the importance of cross-lingual analysis in financial document processing. We also examine the contributions made by participants in augmenting the training data with external sources, showcasing the collaborative spirit of the NLP community in enhancing the quality and size of the shared training dataset.

KW - training

KW - deep learning

KW - text analysis

KW - instruments

KW - conferences

KW - layout

KW - training data

U2 - 10.1109/BigData59044.2023.10386125

DO - 10.1109/BigData59044.2023.10386125

M3 - Conference contribution/Paper

T3 - Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023

SP - 2839

EP - 2844

BT - Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023

A2 - He, Jingrui

A2 - Palpanas, Themis

A2 - Hu, Xiaohua

A2 - Cuzzocrea, Alfredo

A2 - Dou, Dejing

A2 - Slezak, Dominik

A2 - Wang, Wei

A2 - Gruca, Aleksandra

A2 - Lin, Jerry Chun-Wei

A2 - Agrawal, Rakesh

PB - IEEE Computer Society Press

CY - Los Alamitos, CA, USA

ER -