CODE-ACCORD: A Corpus of building regulatory data for rule generation towards automatic compliance checking

Research output: Contribution to Journal/Magazine › Journal article › peer-review

Published

Standard

CODE-ACCORD: A Corpus of building regulatory data for rule generation towards automatic compliance checking. / Hettiarachchi, Hansi; Dridi, Amna; Gaber, Mohamed Medhat et al.
In: Scientific Data, Vol. 12, No. 1, 170, 29.01.2025.

Harvard

Hettiarachchi, H, Dridi, A, Gaber, MM, Parsafard, P, Bocaneala, N, Breitenfelder, K, Costa, G, Hedblom, M, Juganaru-Mathieu, M, Mecharnia, T, Park, S, Tan, H, Tawil, A-RH & Vakaj, E 2025, 'CODE-ACCORD: A Corpus of building regulatory data for rule generation towards automatic compliance checking', Scientific Data, vol. 12, no. 1, 170. https://doi.org/10.1038/s41597-024-04320-x

APA

Hettiarachchi, H., Dridi, A., Gaber, M. M., Parsafard, P., Bocaneala, N., Breitenfelder, K., Costa, G., Hedblom, M., Juganaru-Mathieu, M., Mecharnia, T., Park, S., Tan, H., Tawil, A.-R. H., & Vakaj, E. (2025). CODE-ACCORD: A Corpus of building regulatory data for rule generation towards automatic compliance checking. Scientific Data, 12(1), Article 170. https://doi.org/10.1038/s41597-024-04320-x

Vancouver

Hettiarachchi H, Dridi A, Gaber MM, Parsafard P, Bocaneala N, Breitenfelder K et al. CODE-ACCORD: A Corpus of building regulatory data for rule generation towards automatic compliance checking. Scientific Data. 2025 Jan 29;12(1):170. doi: 10.1038/s41597-024-04320-x

Author

Hettiarachchi, Hansi ; Dridi, Amna ; Gaber, Mohamed Medhat et al. / CODE-ACCORD : A Corpus of building regulatory data for rule generation towards automatic compliance checking. In: Scientific Data. 2025 ; Vol. 12, No. 1.

Bibtex

@article{16d96fdf144d468ca7d62fdfa060b9b5,
title = "CODE-ACCORD: A Corpus of building regulatory data for rule generation towards automatic compliance checking",
abstract = "Automatic Compliance Checking (ACC) within the Architecture, Engineering, and Construction (AEC) sector necessitates automating the interpretation of building regulations to achieve its full potential. Converting textual rules into machine-readable formats is challenging due to the complexities of natural language and the scarcity of resources for advanced Machine Learning (ML). Addressing these challenges, we introduce CODE-ACCORD, a dataset of 862 sentences from the building regulations of England and Finland. Only the self-contained sentences, which express complete rules without needing additional context, were considered as they are essential for ACC. Each sentence was manually annotated with entities and relations by a team of 12 annotators to facilitate machine-readable rule generation, followed by careful curation to ensure accuracy. The final dataset comprises 4,297 entities and 4,329 relations across various categories, serving as a robust ground truth. CODE-ACCORD supports a range of ML and Natural Language Processing (NLP) tasks, including text classification, entity recognition, and relation extraction. It enables applying recent trends, such as deep neural networks and large language models, to ACC.",
author = "Hansi Hettiarachchi and Amna Dridi and Gaber, {Mohamed Medhat} and Pouyan Parsafard and Nicoleta Bocaneala and Katja Breitenfelder and Gon{\c c}al Costa and Maria Hedblom and Mihaela Juganaru-Mathieu and Thamer Mecharnia and Sumee Park and He Tan and Tawil, {Abdel-Rahman H.} and Edlira Vakaj",
year = "2025",
month = jan,
day = "29",
doi = "10.1038/s41597-024-04320-x",
language = "English",
volume = "12",
journal = "Scientific Data",
issn = "2052-4463",
publisher = "Nature Publishing Group",
number = "1",
pages = "170",
}

RIS

TY - JOUR

T1 - CODE-ACCORD

T2 - A Corpus of building regulatory data for rule generation towards automatic compliance checking

AU - Hettiarachchi, Hansi

AU - Dridi, Amna

AU - Gaber, Mohamed Medhat

AU - Parsafard, Pouyan

AU - Bocaneala, Nicoleta

AU - Breitenfelder, Katja

AU - Costa, Gonçal

AU - Hedblom, Maria

AU - Juganaru-Mathieu, Mihaela

AU - Mecharnia, Thamer

AU - Park, Sumee

AU - Tan, He

AU - Tawil, Abdel-Rahman H.

AU - Vakaj, Edlira

PY - 2025/1/29

Y1 - 2025/1/29

N2 - Automatic Compliance Checking (ACC) within the Architecture, Engineering, and Construction (AEC) sector necessitates automating the interpretation of building regulations to achieve its full potential. Converting textual rules into machine-readable formats is challenging due to the complexities of natural language and the scarcity of resources for advanced Machine Learning (ML). Addressing these challenges, we introduce CODE-ACCORD, a dataset of 862 sentences from the building regulations of England and Finland. Only the self-contained sentences, which express complete rules without needing additional context, were considered as they are essential for ACC. Each sentence was manually annotated with entities and relations by a team of 12 annotators to facilitate machine-readable rule generation, followed by careful curation to ensure accuracy. The final dataset comprises 4,297 entities and 4,329 relations across various categories, serving as a robust ground truth. CODE-ACCORD supports a range of ML and Natural Language Processing (NLP) tasks, including text classification, entity recognition, and relation extraction. It enables applying recent trends, such as deep neural networks and large language models, to ACC.

AB - Automatic Compliance Checking (ACC) within the Architecture, Engineering, and Construction (AEC) sector necessitates automating the interpretation of building regulations to achieve its full potential. Converting textual rules into machine-readable formats is challenging due to the complexities of natural language and the scarcity of resources for advanced Machine Learning (ML). Addressing these challenges, we introduce CODE-ACCORD, a dataset of 862 sentences from the building regulations of England and Finland. Only the self-contained sentences, which express complete rules without needing additional context, were considered as they are essential for ACC. Each sentence was manually annotated with entities and relations by a team of 12 annotators to facilitate machine-readable rule generation, followed by careful curation to ensure accuracy. The final dataset comprises 4,297 entities and 4,329 relations across various categories, serving as a robust ground truth. CODE-ACCORD supports a range of ML and Natural Language Processing (NLP) tasks, including text classification, entity recognition, and relation extraction. It enables applying recent trends, such as deep neural networks and large language models, to ACC.

U2 - 10.1038/s41597-024-04320-x

DO - 10.1038/s41597-024-04320-x

M3 - Journal article

VL - 12

JO - Scientific Data

JF - Scientific Data

SN - 2052-4463

IS - 1

M1 - 170

ER -