eXtensible Business Reporting Language (XBRL)

Task Description

This task assesses how well LLMs can retrieve and interpret eXtensible Business Reporting Language (XBRL) filings and use that data to answer financial math questions precisely. The objective is to evaluate how well LLMs process the structured financial data encoded in XBRL format.

Companies need to provide accurate and standardized financial information to stakeholders, including investors, regulators, and auditors. XBRL is a globally recognized standard for the electronic communication of business and financial data, and it helps ensure consistency and comparability across companies and jurisdictions. Using LLMs to retrieve and process financial data in XBRL format is therefore an important application in financial regulation.

Example Input and Output:

XBRL Term

Q: What does the term ‘abstract’ mean in the context of the XBRL standard? Please provide a detailed explanation of this term.

A: An attribute of an element to indicate that the element is only used in a hierarchy to group related elements together. An abstract element cannot be used to tag data in an instance document.

Domain Query to XBRL Reports

Q: Among operations, investing, and financing activities, which brought in the most net (or lowest net) cash flow for Nike in FY2023?

A: Among the three, cash flow from operations was the highest for Nike in FY2023.

Financial Math

Q: A project expects annual cash inflows of $6,000 over 7 years. If the discount rate is 8%, what is the Net Present Value (NPV) of the project?

A: 21462.58
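A minimal sketch of how a question like this can be checked, assuming the standard discounted cash flow definition with end-of-year inflows. The function name `npv` and the `initial_investment` parameter are illustrative and not part of the benchmark. The question as quoted states no initial investment; under that reading the present value of the inflows alone comes to about 31,238.22, so the listed answer may reflect an initial outlay or other detail not shown in this example.

```python
def npv(rate, cash_flows, initial_investment=0.0):
    """Net present value of end-of-period cash flows.

    cash_flows[0] arrives one period from now, cash_flows[1] two periods, etc.
    initial_investment is a hypothetical up-front outlay (0 if none is given).
    """
    present_value = sum(
        cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows, start=1)
    )
    return present_value - initial_investment


# Inflows from the example question: $6,000 per year for 7 years at 8%.
pv_inflows = npv(0.08, [6000.0] * 7)
print(round(pv_inflows, 2))
```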

Numeric Query to XBRL Reports

Q: What is the FY2015 unadjusted EBITDA margin for Netflix? Calculate unadjusted EBITDA using unadjusted operating income and D&A.

A: 0.054
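The calculation the question asks for can be sketched as follows, assuming the usual definition of unadjusted EBITDA as operating income plus depreciation and amortization (D&A), divided by revenue to obtain the margin. The function name and the input figures are hypothetical placeholders; in practice the three inputs would be retrieved from the tagged facts in the XBRL filing.

```python
def ebitda_margin(operating_income, depreciation_and_amortization, revenue):
    """Unadjusted EBITDA margin = (operating income + D&A) / revenue."""
    if revenue == 0:
        raise ValueError("revenue must be non-zero")
    return (operating_income + depreciation_and_amortization) / revenue


# Hypothetical figures for illustration only; NOT values from any real filing.
margin = ebitda_margin(
    operating_income=300.0,
    depreciation_and_amortization=60.0,
    revenue=6000.0,
)
print(round(margin, 3))  # 0.06
```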

Below are examples of XBRL-related questions:

Fig. 3 XBRL Example

Dataset

The XBRL benchmark dataset is used to evaluate the ability of LLMs to interpret the XBRL standard, retrieve data from XBRL filings, and answer financial math questions. The dataset comprises XBRL terms, domain queries to XBRL reports, numeric queries to XBRL reports, tag queries to XBRL reports, financial math questions, and financial ratio formulas with XBRL tags.

XBRL term questions evaluate an LLM's ability to explain an XBRL term. Domain queries are questions about different domains, such as products and services, in XBRL reports. Numeric queries ask LLMs to retrieve specific data from XBRL reports. Tag queries ask LLMs to retrieve the corresponding tags for an item from XBRL reports. Financial math questions ask LLMs to return the result of a given math problem. Financial ratio formula questions ask LLMs to return the calculation formula, with corresponding tags, for a given financial ratio.

Data statistics

| Data | Size | XBRL reports | Data Source |
| --- | --- | --- | --- |
| XBRL Term | 500 | Not provided | XBRL Agent |
| Domain Query to XBRL Reports | 50 | Selectively provided | XBRL Agent |
| Financial Math | 90 | Not provided | XBRL Agent |
| Numeric Query to XBRL Reports | 50 | Selectively provided | XBRL Agent |
| XBRL Tag Query to XBRL Reports | 50 | Selectively provided | XBRL filings from SEC |
| FiNER: Financial Numeric Entity Recognition for XBRL Tagging | 100 | Selectively provided | FiNER-139 Dataset: https://huggingface.co/datasets/nlpaueb/finer-139, https://github.com/nlpaueb/finer |
| FNXL: Financial Numeric Extreme Labelling | 100 | Selectively provided | FNXL Dataset: https://huggingface.co/datasets/ChanceFocus/flare-fnxl, https://arxiv.org/abs/2306.03723 |
| Total | 940 | | |

Users can fine-tune or evaluate LLMs using this dataset, call additional tools, and use additional open-source resources.

Metrics

We use accuracy for financial math questions, numeric queries to XBRL reports, tag queries to XBRL reports, and financial ratio formulas. We use FActScore for XBRL terms and domain queries to XBRL reports.
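The text does not spell out the matching rule behind the accuracy metric, so the following is only a sketch under stated assumptions: numeric answers are compared within a relative tolerance (`rel_tol`, a hypothetical choice), and non-numeric answers such as tags are compared as case-insensitive strings. The function names and sample values are illustrative, not the benchmark's scoring code.

```python
import math


def is_correct(prediction, reference, rel_tol=1e-2):
    """Compare numerically when both values parse as floats, else as strings.

    rel_tol is an assumed tolerance; the benchmark may use a different rule.
    """
    try:
        return math.isclose(float(prediction), float(reference), rel_tol=rel_tol)
    except (TypeError, ValueError):
        return str(prediction).strip().lower() == str(reference).strip().lower()


def accuracy(predictions, references):
    """Fraction of predictions that match their reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(is_correct(p, r) for p, r in zip(predictions, references))
    return correct / len(references)


# Illustrative predictions and references: two match, one does not.
preds = ["0.054", "us-gaap:Revenues", "21000"]
refs = ["0.054", "us-gaap:Revenues", "21462.58"]
print(accuracy(preds, refs))
```

A tolerance-based comparison avoids penalizing answers that differ only in rounding, at the cost of accepting small numeric errors; an exact-match rule would be stricter.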

Baseline Performance

| Model | Method | XBRL Term (FActScore) | Domain and Numeric Query to XBRL Reports (FActScore) | Financial Math (Accuracy) | Tag Query to XBRL Reports (Accuracy) | Score (Average) |
| --- | --- | --- | --- | --- | --- | --- |
| Llama 3.1-8B | Zero-shot | 0.7083 | 0.5845 | 0.7667 | 0.1667 | 0.5565 |
| GPT-4o | Zero-shot | 0.8503 | 0.5851 | 0.8842 | 0.7778 | 0.7743 |
| Mistral Large 2 | Zero-shot | 0.8221 | 0.6831 | 0.7444 | 0.8667 | 0.7791 |
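The Score column appears to be the unweighted mean of the four metric columns. A quick check of that reading, using only the baseline values reported above (the dictionary layout and the `mean` helper are illustrative):

```python
# Per-model scores in column order: XBRL Term, Domain and Numeric Query,
# Financial Math, Tag Query; followed by the reported average score.
baselines = {
    "Llama 3.1-8B": ([0.7083, 0.5845, 0.7667, 0.1667], 0.5565),
    "GPT-4o": ([0.8503, 0.5851, 0.8842, 0.7778], 0.7743),
    "Mistral Large 2": ([0.8221, 0.6831, 0.7444, 0.8667], 0.7791),
}


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


for model, (scores, reported) in baselines.items():
    computed = mean(scores)
    print(f"{model}: computed {computed:.5f}, reported {reported:.4f}")
```

Under this assumption the computed means agree with the reported scores to within rounding at the fourth decimal place.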

References

[1] Sewon Min et al. (2023). FActScore: Fine-grained atomic evaluation of factual precision in long-form text generation. arXiv preprint arXiv:2305.14251. Available at: https://arxiv.org/abs/2305.14251

[2] Shijie Han et al. (2024). XBRL-Agent: Leveraging Large Language Models for Financial Report Analysis. In ICAIF '24: Proceedings of the 5th ACM International Conference on AI in Finance. Available at: https://doi.org/10.1145/3677052.3698614