FAIR EVA: Bringing institutional multidisciplinary repositories into the FAIR picture

Interdisciplinary Studies

F. A. Gómez and I. Bernal

Discover FAIR EVA, a tool developed by Fernando Aguilar Gómez and Isabel Bernal to assess and enhance the FAIRness of digital objects in diverse institutional repositories. This solution adapts to specific repository needs, paving the way for improved data practices within the European Open Science Cloud.
Introduction

The paper addresses the challenge of evaluating and improving the FAIRness (Findable, Accessible, Interoperable, Reusable) of digital research objects in institutional and multidisciplinary repository contexts. While FAIR principles are widely promoted by funders and within the European Open Science Cloud (EOSC) ecosystem, existing indicators (e.g., RDA FAIR Data Maturity Model) are high-level and variably interpreted, and many tools are either domain-specific or too general to give actionable results for multidisciplinary repositories. The study presents FAIR EVA, a tool designed to provide automatic, scalable, and customizable FAIRness assessments tailored to repository software (e.g., DSpace) and institutional contexts, offering feedback for repository administrators and data producers. The importance lies in enabling reproducibility, transparency, and Open Science uptake across diverse disciplines, addressing the growth and dynamism of research data, and supporting stakeholders (researchers, administrators, funders, and developers) with evidence-based assessments and actionable guidance.

Literature Review

The paper situates FAIR EVA within a landscape of FAIR principles, indicators, and tools. It builds on the RDA FAIR Data Maturity Model (41 indicators across F, A, I, and R with essential/important/useful priorities) and references prior efforts such as FAIRmetrics and the FAIR Evaluation Services. It discusses limitations of existing tools (often domain-specific or overly general) and the complexity of cross-disciplinary FAIR adoption in institutional repositories. Comparative analyses (RDA landscaping, EOSC-Synergy D3.5, and a comprehensive comparison of automated FAIRness evaluators) highlight divergences in coverage, transparency, scoring, and emphasis (e.g., F-UJI is strong on Reusability; the FAIR Evaluator is broader on Interoperability). A synthesized comparison shows varying principle coverage among tools (e.g., FEVA, F-UJI, FAIR Evaluator, F.Enough, FAIR Checker) and calls for better interoperability, standardization, and transparency. The EOSC FAIR Metrics and Data Quality Task Force notes that tools can produce divergent results owing to differing interpretations and metadata retrieval, and recommends consistent metadata exposure (e.g., Signposting) and clear documentation. The paper underscores repository trust frameworks (the TRUST principles) and next-generation repository features (Signposting, ResourceSync) as relevant enabling technologies.

Methodology

FAIR EVA implements the RDA FAIR Data Maturity Model indicators in a modular, scalable architecture tailored via plug-ins to specific repositories or data services. Key elements:

  • Indicator implementation and weighting: All 41 RDA indicators are translated into automatic tests. Default weights reflect the RDA priorities: Essential x2, Important x1.5, Useful x1. The overall score Ts is the sum of each indicator's points multiplied by its weight, divided by the sum of all weights; a worked example of this computation is sketched after this list. The system supports rebalancing weights per repository or discipline context.
  • Feedback and advisory function: For each indicator not scoring 100%, FAIR EVA provides targeted feedback distinguishing repository-level technical gaps (for admins/developers) from metadata/content incompleteness (for data producers), with links to guidance and training materials.
  • Architecture: Two Python-based layers: (1) Front-end web app (Flask) and OpenAPI-defined API (fair-api.yaml) enabling both human interaction and machine-actionable requests; (2) Back-end implementing indicators via class inheritance. Indicators are defined generically, then specialized via plug-ins for particular repositories. Components are containerized (Docker) for scalability, with CI via Jenkins pipelines.
  • Plug-in system: A generic OAI-PMH-based plug-in is provided; repository-specific plug-ins (e.g., DIGITAL.CSIC on DSpace CRIS v5.10) redefine data/metadata access and test implementations (e.g., specific metadata terms, PID fields, rights/license fields). Configuration is handled via config.ini (e.g., identifier_term), fair-api.yaml (weights, methods), and localized feedback messages; a minimal plug-in sketch follows this list.
  • Workflows: Managers deploy via start.sh or Docker, connect the service to the repository API or database, configure metadata mappings and weights, and customize feedback. Users evaluate objects by providing a PID (DOI/Handle); FAIR EVA retrieves metadata (via the DSpace API and landing page), runs the tests, and returns an overall FAIR score plus per-principle and per-indicator details. Reports can be exported as PDF. An example API call is sketched after this list.
  • Institutional context: The pilot targets DIGITAL.CSIC, a multidisciplinary institutional repository using DataCite DOIs, Dublin Core/DataCite metadata, and OAI-PMH, with a strong Open Access/FAIR policy framework.
  • Semantics for transparency: FAIR EVA models tests, indicators, and their relationships to FAIR principles using semantic technologies (e.g., SKOS), enabling clearer documentation, comparability, and governance of evolving criteria. Temporary semantic descriptions of the RDA indicators are included pending their official semantic publication; a minimal SKOS example follows this list.
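
As an illustration of the weighting described above, the short sketch below computes a weighted overall score from per-indicator results. The function and variable names are hypothetical and do not come from the FAIR EVA code base.

```python
# Minimal sketch of the RDA-priority weighting (illustrative, not FAIR EVA's code).
PRIORITY_WEIGHTS = {"essential": 2.0, "important": 1.5, "useful": 1.0}

def overall_score(indicator_results):
    """indicator_results: iterable of (points_0_to_100, priority) pairs."""
    weighted_points = sum(points * PRIORITY_WEIGHTS[priority]
                          for points, priority in indicator_results)
    total_weight = sum(PRIORITY_WEIGHTS[priority]
                       for _, priority in indicator_results)
    return weighted_points / total_weight

# Example: two Essential indicators at 100 points and one Useful indicator at 50.
print(overall_score([(100, "essential"), (100, "essential"), (50, "useful")]))  # 90.0
```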
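
The plug-in mechanism can be pictured as a generic evaluator class whose data-access and test methods are overridden per repository. The class and method names below are illustrative assumptions, not FAIR EVA's actual API; the method naming only loosely mirrors the RDA indicator identifiers.

```python
# Hypothetical sketch of the class-inheritance plug-in pattern (names are illustrative).
class GenericEvaluator:
    """Generic, OAI-PMH-oriented implementation of the RDA indicator tests."""

    def __init__(self, item_id, config):
        self.item_id = item_id
        self.config = config                 # parsed from config.ini
        self.metadata = self.get_metadata()  # plug-ins override how this is fetched

    def get_metadata(self):
        # Default behaviour: harvest Dublin Core terms via OAI-PMH (omitted here).
        return {}

    def rda_f1_01m(self):
        """F1: metadata includes a persistent identifier for the digital object."""
        identifier_term = self.config.get("identifier_term", "identifier")
        points = 100 if self.metadata.get(identifier_term) else 0
        advice = "" if points == 100 else "Add a PID (DOI/Handle) to the metadata."
        return points, advice


class DigitalCsicEvaluator(GenericEvaluator):
    """Plug-in for DIGITAL.CSIC: redefines metadata access for DSpace CRIS."""

    def get_metadata(self):
        # Would query the DSpace API and landing page instead of generic OAI-PMH.
        return {"identifier": "http://hdl.handle.net/10261/244749"}
```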
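
Once deployed, an evaluation can be requested programmatically through the OpenAPI-defined interface. The endpoint path, port, and payload fields below are assumptions made for illustration; the authoritative request contract is the one defined in fair-api.yaml.

```python
# Hypothetical client call to a locally deployed FAIR EVA instance.
import requests

response = requests.post(
    "http://localhost:9090/v1.0/rda/rda_all",  # assumed route and port
    json={
        "id": "10261/244749",                                      # PID of the object to test
        "oai_base": "https://digital.csic.es/dspace-oai/request",  # assumed OAI-PMH endpoint
        "repo": "oai-pmh",                                         # plug-in to use
    },
    timeout=60,
)
response.raise_for_status()
report = response.json()
# The report is expected to group per-indicator results by principle plus overall totals.
```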
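
The semantic description of tests and indicators can be approximated with SKOS, for example using rdflib. The namespace URI and concept identifiers below are placeholders, not the terms actually published by FAIR EVA or the RDA.

```python
# Illustrative SKOS description of an indicator and one of its tests (placeholder URIs).
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

FEVA = Namespace("https://example.org/fair-eva/terms/")  # placeholder namespace

g = Graph()
g.bind("skos", SKOS)
g.bind("feva", FEVA)

principle = FEVA["F1"]
indicator = FEVA["RDA-F1-01M"]
test = FEVA["test_persistent_identifier_in_metadata"]

g.add((indicator, RDF.type, SKOS.Concept))
g.add((indicator, SKOS.prefLabel,
       Literal("Metadata is identified by a persistent identifier", lang="en")))
g.add((indicator, SKOS.broader, principle))  # indicator sits under FAIR principle F1
g.add((test, RDF.type, SKOS.Concept))
g.add((test, SKOS.related, indicator))       # the concrete test maps to the indicator

print(g.serialize(format="turtle"))
```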

Key Findings

  • FAIR EVA effectively assesses FAIRness in multidisciplinary repositories and provides actionable guidance. In pilot tests on DIGITAL.CSIC:
    • Well-described dataset (handle 10261/244749) scored 99.26% overall (Findable 100%, Accessible 100%, Interoperable 100%, Reusable 97.06%). It exhibited best practices: PIDs, rich and qualified metadata, controlled vocabularies (e.g., Getty AAT, UNESCO), explicit relationships, and cross-links (e.g., Wikidata).
    • Minimally described dataset (handle 10261/172425) scored 72.06% overall, with strong repository-driven indicators but lower metadata-dependent dimensions (Interoperable ~49.03%, Reusable ~45.29%). It lacked references to related resources, controlled vocabularies, and rich qualified metadata.
  • The variance between datasets supports the assumption that repository infrastructure elevates F and A indicators, while I and R scores depend more on data creator practices (metadata richness, semantics, documentation).
  • Indicator priority distribution is unbalanced across principles (Table 3): Essential indicators are concentrated under F and A (7 and 8, respectively), with fewer under R (5) and none under I; this skews overall scores towards repository features and may underrepresent reuse-centric qualities.
  • Automatic, scalable deployment (Docker, API) and feedback integration facilitated administrator planning (e.g., enabling Signposting and ResourceSync) and researcher awareness/training, with observed improvements in datasets post-assessment.

Discussion

The findings demonstrate that FAIR EVA addresses the core challenge identified at the outset: providing meaningful, context-aware FAIRness assessments for institutional, multidisciplinary repositories. By combining generic RDA-based tests with repository-specific plug-ins and tailored feedback, the tool distinguishes between what repository administrators can improve (infrastructure, protocols, interoperability services) and what data creators must enhance (metadata richness, semantic annotations, licensing, documentation). The disparity between high F/A and lower I/R scores in minimally described datasets underscores the need for training and support to improve interoperability and reuse, aligning with the ultimate objective of FAIR. The observed skew in indicator priorities (more Essential indicators under F and A) affects overall scoring, potentially masking deficiencies in reuse-related aspects and suggesting that balanced weighting or expanded I/R checks would better reflect FAIR’s reuse goal. The semantic modeling of test implementations and relationships increases transparency and comparability across tools, addressing EOSC Task Force concerns about divergent results and metadata retrieval differences. Overall, FAIR EVA proves valuable to multiple stakeholders: administrators gain actionable insights to refine policies and infrastructure; data producers receive concrete, user-friendly guidance; funders obtain evidence-based assessments to gauge compliance; and software developers get cues for next-generation repository features.

Conclusion

FAIR EVA introduces a scalable, modular, and customizable approach to automated FAIRness assessment tailored to institutional and multidisciplinary repositories. Its plug-in architecture, weighted RDA indicator implementation, and advisory feedback bridge gaps between high-level FAIR principles and repository-specific realities, supporting both infrastructure improvements and better metadata practices. The pilot at DIGITAL.CSIC shows high effectiveness in identifying strengths and gaps across datasets and principles, enabling targeted actions (e.g., adoption of Signposting/ResourceSync, vocabulary integrations) and aiding researchers’ compliance with FAIR-related requirements. Future work includes enhancing checks for Interoperability and Reusability (e.g., replication/reproducibility practices, expanded controlled vocabularies and ontologies), factoring preservation and web accessibility considerations, tailoring discipline-specific feedback, strengthening semantic descriptions for transparency and tool comparability, and continued alignment with evolving repository software capabilities and EOSC initiatives.

Limitations

  • Indicator weighting and distribution: Concentration of Essential indicators under Findability and Accessibility and fewer under Interoperability/Reuse can inflate scores based on repository infrastructure, underrepresenting reuse-centric qualities.
  • Repository dependence: Scores may reflect repository capabilities rather than dataset-intrinsic quality; minimally described datasets can still score moderately due to infrastructure.
  • Mode and feature availability: In DIGITAL.CSIC, running the plug-in against the repository API rather than directly against the database can cause transient test failures when specific features are temporarily unavailable, which affects scores.
  • Disciplinary variability and evolving standards: Differences in community vocabularies/ontologies and ongoing evolution of FAIR interpretations can limit cross-domain comparability without semantic alignment and standardized metadata exposure (e.g., Signposting).