Model Openness Framework (MOF)
Task Description
The Model Openness Framework (MOF) is a comprehensive system for evaluating and classifying the completeness and openness of machine learning models, with a strong focus on licensing requirements. By defining 17 distinct components across a model’s lifecycle, MOF promotes transparency, reproducibility, and responsibility in open-source machine learning. Each MOF component has specific licensing criteria to ensure that contributors uphold intellectual property rights while enabling open collaboration.
Fig. 2 MOF Components
In this task, we test the ability of large language models (LLMs) to accurately answer licensing-related questions under MOF. The task comprises three subtasks:
Subtasks
License Abbreviations - Expand or interpret short license abbreviations.
License OSI Approval - Determine whether a given license is approved by the Open Source Initiative (OSI) or not.
Question Answering - Provide detailed answers about licensing requirements in MOF.
Input Format
Each subtask presents its input in the following form:
Primary Request: A concise query about licensing in MOF, such as:
“Expand the following MOF-related abbreviation into its full form: AFL-3.0”
“Is the Microsoft Shared Source License OSI-approved?”
“What type of license is the Apache License, Version 2.0?”
Content: If necessary, relevant reference text (e.g., snippets from the OSI website, MOF documentation). However, many prompts may not include separate content if the request is self-contained.
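As an illustration of the input format described above, the following minimal sketch renders a request with or without reference content. The `TaskInput` class, its field names, and the `build_prompt` helper are hypothetical and not part of the official task definition; the actual dataset schema may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskInput:
    """One prompt: a primary request plus optional reference content (hypothetical schema)."""
    primary_request: str
    content: Optional[str] = None


def build_prompt(item: TaskInput) -> str:
    """Render the item as plain text, omitting the Content part when it is absent."""
    parts = [f"Primary Request: {item.primary_request}"]
    if item.content:
        parts.append(f"Content: {item.content}")
    return "\n".join(parts)


# A self-contained request, taken from the examples above, needs no Content part.
self_contained = TaskInput("Is the Microsoft Shared Source License OSI-approved?")
print(build_prompt(self_contained))
```

Keeping `content` optional mirrors the note that many prompts are self-contained.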
Output Format
The model should output:
Concise Answer: A short, direct response that addresses the primary request.
Optional Explanation (for Q&A subtask): An additional sentence or two that justifies the answer, referencing MOF or OSI guidelines if needed.
Example Input-Output
Subtask 1: License Abbreviations
Input
"Expand the abbreviation: BSD-3-Clause"
Expected Output
"BSD 3-Clause License"
Subtask 2: License OSI Approval
Input
"Is the Apache License 2.0 OSI-approved?"
Expected Output
"Yes"
Subtask 3: Question Answering
Input
"Which licenses are recommended for model parameters under the Model Openness Framework?"
Expected Output
"MOF recommends OSI-approved licenses (e.g., MIT License or Apache License 2.0) for model parameters to ensure community collaboration and openness."
Dataset
The dataset covers real-world and simulated licensing scenarios under MOF:
License Abbreviations (41 items): Expand or interpret short license labels (e.g., AAL).
License OSI Approval (50 items): Determine whether each license is officially approved by the Open Source Initiative.
Question Answering (70 items): Answer more in-depth questions about licensing requirements for specific MOF components (e.g., model parameters, datasets, or source code).
| Data | Size | Data Source |
|---|---|---|
| License Abbreviations | 41 | OSI website |
| License OSI Approval | 50 | OSI website |
| Question Answering | 70 | OSI website, MOF docs |
| Total | 161 | |
Data Usage
Users can:
Fine-tune or evaluate LLMs using this dataset.
Combine OSI references and MOF documentation for improved factual accuracy.
Enhance the model’s capability to address specialized licensing questions by including additional open-source knowledge bases, if desired.
Metrics
Two metrics are used to evaluate performance:
Accuracy: For the License Abbreviations and License OSI Approval subtasks, the model’s output is compared to a ground-truth label. Accuracy is computed as:
\[\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}\]
FActScore [1]: For the Question Answering subtask, FActScore measures factual precision. A higher FActScore indicates that the model’s answers align more closely with MOF guidelines and reference materials, while a lower score suggests factual inaccuracy.
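The accuracy metric for the first two subtasks can be sketched as follows. This is a minimal illustration, not the official scorer: the source does not say how outputs are matched against labels, so the `normalize` step (trimming, lowercasing, dropping surrounding quotes and a trailing period) is an assumption, and both function names are hypothetical.

```python
def normalize(text: str) -> str:
    """Trim, drop surrounding quotes and a trailing period, and lowercase (assumed normalization)."""
    return text.strip().strip('"\'').rstrip(".").strip().lower()


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that equal their reference label after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(predictions)


# Toy OSI-approval example: two of three predictions match their labels.
preds = ["Yes", "no.", "Yes"]
refs = ["Yes", "No", "No"]
print(accuracy(preds, refs))
```

A stricter or looser matching rule would change the reported numbers, so any comparison against the baselines should use whatever matching the official evaluation applies.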
Below are evaluation results for three baseline models across the three subtasks in MOF Licenses.
| Baseline Model | Average Score | License Abbreviations (Accuracy) | License OSI Approval (Accuracy) | Detailed QA (FActScore) |
|---|---|---|---|---|
| Llama 3.1-8B | 0.5149 | 0.1290 | 0.7200 | 0.6956 |
| GPT-4o | 0.6564 | 0.1935 | 0.9600 | 0.8156 |
| Mistral Large 2 | 0.4640 | 0.1290 | 0.4400 | 0.8229 |
These baseline results serve as benchmarks for evaluating new model submissions on the MOF Licenses task.
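The source does not state how the Average Score is derived, but the reported values are consistent with an unweighted mean of the three subtask scores. The sketch below checks that reading against the baseline figures; the `average_score` helper and the `baseline_scores` dictionary are illustrative names, and the unweighted-mean formula is an inference rather than a documented rule.

```python
from statistics import mean

# Per-subtask scores copied from the baseline results:
# (License Abbreviations accuracy, License OSI Approval accuracy, Detailed QA FActScore)
baseline_scores = {
    "Llama 3.1-8B": (0.1290, 0.7200, 0.6956),
    "GPT-4o": (0.1935, 0.9600, 0.8156),
    "Mistral Large 2": (0.1290, 0.4400, 0.8229),
}


def average_score(scores: tuple[float, float, float]) -> float:
    """Unweighted mean of the three subtask scores, rounded to four decimals (assumed formula)."""
    return round(mean(scores), 4)


for model, scores in baseline_scores.items():
    print(f"{model}: {average_score(scores):.4f}")
```

Under this assumption each subtask contributes equally to the average, regardless of how many items it contains.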
References
[1] Sewon Min et al. (2023). FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation. arXiv preprint arXiv:2305.14251. Available at: https://arxiv.org/abs/2305.14251
Note
For additional details on MOF’s 17 components and specific licensing criteria, see White et al. (2024), Model Openness Framework (MOF), as cited in competition documentation or MOF official docs.