Overview

The financial industry operates within a complex environment of regulations and standards aimed at maintaining market integrity and ensuring reliable reporting and compliance. These regulations and standards pose several key challenges:

  • Complexity: Financial products and markets are highly sophisticated, demanding detailed and sometimes overlapping regulatory provisions.

  • Frequent Updates: Regulatory frameworks adapt continuously to reflect evolving market conditions, emerging technologies, and new risks.

  • Jurisdictional Differences: Countries and regions vary in their legal systems and economic contexts, adding further complications for multinational financial activities.

  • Specialized Terminology: Precision-oriented language can be difficult to interpret, requiring deep expertise to ensure accurate application.

Navigating these intricacies calls for domain-specific knowledge and the ability to interpret nuanced legal language. Large language models (LLMs), such as GPT-4o, Llama 3.1, and Mistral Large 2, have shown remarkable potential in natural language understanding and generation. However, these models face unique hurdles in the financial domain, including handling specialized regulatory language, staying current with rapidly evolving standards, and addressing concerns related to explainability and ethics.

To help researchers and practitioners address these challenges, we have developed a Financial Regulations dataset that covers a wide range of regulatory texts and tasks. This dataset underpins multiple initiatives, including:

  1. The COLING 2025 Regulations Challenge, featuring nine tasks aimed at pushing the limits of LLMs in interpreting and applying financial regulations.

  2. The FinRL Contest 2025, focusing on three core areas:

    • Common Domain Model (CDM)

    • Model Openness Framework (MOF)

    • eXtensible Business Reporting Language (XBRL)

While each event highlights different aspects of the dataset, our overarching goal remains to expand the community’s understanding of financial regulations and to foster the development of robust, transparent, and compliant AI-driven solutions. In this documentation, we provide insights into the dataset’s structure and guidelines for using it in research and real-world applications.

Using This Documentation

This documentation serves as a resource for anyone wishing to work with the Financial Regulations dataset, whether for academic exploration, industry applications, or challenge participation. Each section includes:

  • Task Description: A detailed explanation of each task, including its objectives, scope, and real-world relevance to financial regulations, along with background on the regulatory challenge the task addresses.

  • Dataset: Detailed information on the sources, formats, and scope of the regulatory texts, including any relevant annotations or metadata.

  • Metrics: Instructions and best practices for evaluation, covering key performance measures relevant to financial regulation, such as accuracy, BERTScore, and FActScore.
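
As a rough illustration of the simplest metric named above, the sketch below computes exact-match accuracy over short text answers. It assumes answers are compared after lowercasing and whitespace normalization; the function names and sample strings are hypothetical, and the evaluation actually used for each task may differ.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Return the fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0  # assumption: an empty evaluation set scores zero
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(references)


# Illustrative inputs only; these are not taken from the dataset.
preds = ["Basel III", "securities and exchange  commission", "XBRL"]
refs = ["basel iii", "Securities and Exchange Commission", "CDM"]
print(exact_match_accuracy(preds, refs))
```

Exact match suits tasks with short, closed-form answers; free-text answers are usually better served by similarity-based or factuality-based measures such as BERTScore and FActScore.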

We encourage researchers, industry professionals, and students to explore these resources, adapt them for innovative projects, and contribute to a more efficient, trustworthy, and adaptive landscape in financial compliance.