Workflow

To encourage the community to develop LLMs that handle multimodal data effectively and tackle the issues of hallucinations and information, participants will follow this structured workflow:

Use Scenarios: We begin with use scenarios that reflect professional financial regulatory activities, including but not limited to legal research, validation, and compliance.
Design Tasks: We decompose these use scenarios into capabilities and define corresponding tasks. This stage yields nine novel tasks:
- Basic Capabilities
  - Abbreviation Recognition
  - Definition Recognition
  - Named Entity Recognition (NER)
  - Question Answering
  - Link Retrieval
- Passing Certificates
- Understanding the Common Domain Model (CDM)
- Understanding the Model Openness Framework (MOF)
- XBRL Filings
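
As a rough illustration, the task list above can be written down as a small data structure and checked against the stated count of nine. This is only a sketch: the dictionary layout and the names `TASKS`, `flatten_tasks`, and `ALL_TASKS` are hypothetical and are not part of any challenge tooling.

```python
# Hypothetical encoding of the task list; groups with no subtasks
# are treated as a single task in their own right.
TASKS = {
    "Basic Capabilities": [
        "Abbreviation Recognition",
        "Definition Recognition",
        "Named Entity Recognition (NER)",
        "Question Answering",
        "Link Retrieval",
    ],
    "Passing Certificates": [],
    "Understanding the Common Domain Model (CDM)": [],
    "Understanding the Model Openness Framework (MOF)": [],
    "XBRL Filings": [],
}


def flatten_tasks(tasks):
    """Return leaf tasks: the subtasks of a group, or the group itself if it has none."""
    leaves = []
    for group, subtasks in tasks.items():
        leaves.extend(subtasks if subtasks else [group])
    return leaves


ALL_TASKS = flatten_tasks(TASKS)
print(len(ALL_TASKS))  # 9
```

Counting this way, the five Basic Capabilities subtasks plus the four standalone tasks give the nine tasks mentioned above.
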
Identify Failure Patterns: After evaluating LLMs on these tasks, we identify recurring failure patterns in their responses.
Expand Testing Datasets: These failure patterns guide the expansion of the testing sets, which gain questions that target the observed weaknesses. We create testing questions from multimodal data collected from various sources, such as legal texts, contracts, and financial statements.
Enhance Tasks: The newly expanded question sets are integrated into the tasks. We will also integrate the new tasks and question sets into the Open FinLLM Leaderboard.

Curation Workflow

The following figure illustrates the curation flow for our testing sets:

This workflow operates as a continuous improvement loop, ensuring that tasks and test sets remain relevant, challenging, and aligned with real-world financial applications.
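
One pass of this loop (evaluate, identify failure patterns, expand the testing set) might look like the minimal sketch below. Everything in it is an assumption made for illustration: the record layout, the function names `identify_failure_patterns` and `expand_test_set`, the `min_count` threshold, and the toy data. The workflow above does not specify how failure patterns are actually detected; here a "pattern" is simply a task with repeated incorrect answers.

```python
from collections import Counter


def identify_failure_patterns(results, min_count=2):
    """Return tasks with at least `min_count` incorrect answers (hypothetical rule)."""
    failures = Counter(r["task"] for r in results if not r["correct"])
    return [task for task, n in failures.items() if n >= min_count]


def expand_test_set(test_set, weak_tasks, candidate_pool):
    """Append unused candidate questions that target the weak tasks."""
    existing_ids = {q["id"] for q in test_set}
    added = [
        q for q in candidate_pool
        if q["task"] in weak_tasks and q["id"] not in existing_ids
    ]
    return test_set + added


# Toy evaluation results and question pool, invented for this example.
results = [
    {"task": "Link Retrieval", "correct": False},
    {"task": "Link Retrieval", "correct": False},
    {"task": "Question Answering", "correct": True},
    {"task": "Definition Recognition", "correct": False},
]
test_set = [{"id": "q1", "task": "Link Retrieval"}]
candidate_pool = [
    {"id": "q1", "task": "Link Retrieval"},
    {"id": "q2", "task": "Link Retrieval"},
    {"id": "q3", "task": "Question Answering"},
]

weak_tasks = identify_failure_patterns(results)
expanded = expand_test_set(test_set, weak_tasks, candidate_pool)
print(weak_tasks)                   # ['Link Retrieval']
print([q["id"] for q in expanded])  # ['q1', 'q2']
```

Feeding the expanded set back into the next evaluation round is what closes the loop.
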