SOCIAL SCIENCES & HUMANITIES / DISTRIBUTED

CLARIN ERIC

Common Language Resources and Technology Infrastructure
General Info
Headquarters: CLARIN ERIC, Utrecht, The Netherlands
Legal status: ERIC
Type: distributed
Access: virtual

Description
The Common Language Resources and Technology Infrastructure (CLARIN) is a digital infrastructure that provides easy and sustainable access to a broad range of language data and tools to support research in the humanities and social sciences, and beyond. CLARIN provides access to digital language data in all modalities (text, audio, video) and to advanced tools that can be used to analyse or combine these datasets. Founded in 2012, CLARIN is a European Research Infrastructure Consortium (ERIC), a form of international legal entity whose framework was established by the European Commission in 2009. In 2016, CLARIN received Landmark status on the ESFRI roadmap. CLARIN is a distributed digital infrastructure with participating centres all over Europe and further afield, including universities, research centres, libraries and public archives. Tools and data from different centres are interoperable, so that data collections can be combined and tools from different sources can be chained to perform operations at different levels of complexity. Members can access all tools and resources with a single sign-on, and many of the resources are also openly accessible to other interested communities of use, both within and outside academia. Promoting data registries and data management services that comply with the FAIR principles (Findable, Accessible, Interoperable, Reusable) underpins all aspects of CLARIN’s strategy, and the interoperability paradigm of what is now known as the Open Science agenda has been one of CLARIN’s distinguishing features from the outset. The interoperability of data and services across the CLARIN community has enabled large-scale data sharing and growing reuse of language resources. Interoperability has also proven crucial in supporting multidisciplinary collaboration and comparative research agendas.
It is CLARIN’s ambition to consolidate its role in supporting the emerging research agendas of the SSH domain and to contribute to innovation in advanced models of interaction between people, data, and data-processing tools. The vision of borderless and seamless interoperability between data and services is further realised through CLARIN’s alignment with emerging cloud platforms such as the European Open Science Cloud (EOSC) and the SSH Open Marketplace. CLARIN’s core community consists of academic researchers, developers and lecturers from a range of disciplines within the social sciences and humanities who work with language data and language resources, technology, and knowledge. CLARIN also cooperates with a variety of stakeholders outside academia, including industry, governmental organisations, and the GLAM sector (Galleries, Libraries, Archives, and Museums), who act both as contributors and as users of data, tools and know-how. Collaboration with non-academic parties is forged at both the national and the central level.
TIMELINE & ESTIMATED COSTS
[Figure: timeline of the project and landmark phases, 2006 to 2038, marking the ESFRI roadmap editions RM06, RM08, RM10, RM16, RM18, RM21 and LA24]
Roadmap entry: as project 2006; as landmark 2016
Total investment: 154 M€
Design phase: 0 M€
Preparation phase: 2008-2011, 4 M€
Implementation phase: 2011-2023, 150 M€
Operation start: 2012, 16 M€/year
IMPACTS
In terms of scientific excellence and innovation, the resources, technology, and knowledge offered by CLARIN have a large potential for impact. In line with the Open Science agenda, CLARIN encourages scholars in the SSH, including the digital humanities, to reuse available research data, which increases their productivity, and enables them to open new research avenues within and across disciplines that address the multiple societal roles of language. Through its digital infrastructure and training activities, CLARIN also promotes data-driven research and helps to strengthen data processing and analysis skills among new generations of SSH students, teachers, and data specialists. More broadly, because CLARIN is embedded at the very centre of the data science and artificial intelligence communities, it also enables research that addresses societal challenges. This includes contributing to full digital language equality in Europe, facilitating the development of aids for people with disabilities, and enabling the rapid development of communication technologies in the event of national disasters or sociopolitical crises. CLARIN’s potential for societal impact is further reinforced by the growing attention to support measures for language equality, and by use cases and proofs of concept for CLARIN tools in non-academic contexts. In the recently started Horizon Europe (HEU) framework programme, multidisciplinary agendas aimed at cultural inclusivity are expected to play an important role. CLARIN’s actual and potential economic impact is underlined by the fact that it has been invited to join the European Language Data Space (LDS), as well as by the involvement of many national CLARIN consortia in the ALT EDIC, established in February 2024. The infrastructure’s resources are key components in realising the mission to improve European competitiveness, increase the availability of European language data, and uphold Europe’s linguistic diversity and cultural richness.
SERVICES
CLARIN ERIC offers the following key services:
- Virtual Language Observatory (VLO): a discovery service with a faceted browser that allows users to find resources available in any of the many repositories whose harmonised metadata CLARIN has harvested.
- Depositing service: allows researchers to archive and share the resources they have created in a safe and sustainable manner, with restricted access where necessary.
- Language Resource Switchboard: helps users to select tools that match the characteristics of the specific datasets they want to process.
- Virtual Collection Registry: a platform for the management, publication and reuse of digital bookmarks, which allows a set of links to be cited and processed through a single persistent identifier.
- Federated Content Search: allows users to search multiple corpus search engines simultaneously.
- Data citation: mechanisms that enhance the visibility of resource creators and encourage them to publish their data.
- Licensing: CLARIN centres make data available under licences with clear conditions of use. Creators of data receive guidance on selecting the most appropriate licensing conditions when publishing their data.
- Advanced tools and computing facilities: CLARIN offers state-of-the-art tools and online services for many languages.
- Service Provider Federation: offers single sign-on for protected online resources. Researchers save time, need only one set of login credentials, and gain access to protected resources in other countries.
CLARIN also serves as an ecosystem for knowledge exchange. Users can contact individual data centres about depositing resources and using metadata standards and persistent identifiers; they can turn to their national coordinators and other representatives for specific information on support actions, standards, and legal issues; and they can make use of a distributed network of Knowledge Centres (K-centres).
Interconnections
[Figure: CLARIN ERIC and its interconnections with research infrastructures in the ESFRI domains DIGIT, ENE, ENV, H&F and PSE]
COOPERATION WITH OTHER RIs
In the broader RI ecosystem, CLARIN is well positioned in the ESFRI cluster of Social and Cultural Innovation (SCI) and in the ERIC Forum. It also has a steering role in the Science Clusters, especially in the SSHOC cluster, where it has particularly strong partnerships with DARIAH, CESSDA and EHRI, and it actively contributes to the European Open Science Cloud (EOSC). CLARIN maintains structural relationships with all SCI RIs. Collaborations with other ESFRI RIs (SoBigData, to name but one) are frequent, especially in the context of EOSC-related projects. CLARIN collaborates especially closely with DARIAH at both the national and the central level. As of 1 September 2023, there were ten joint CLARIN-DARIAH national consortia, often operating under the name CLARIAH. At the central level, the main joint development efforts have been the DH Course Registry (with DARIAH) and the SSH Open Marketplace (with DARIAH and CESSDA). Recently, CLARIN and EHRI have started a collaboration on the digitisation and processing of written and oral testimonies. Other strategic alliances that help to shape CLARIN’s Open Science agenda are (i) membership of organisations such as DataCite, EPIC and EUDAT, and (ii) collaboration with European initiatives in the GLAM sector, such as Europeana, LIBER and TimeMachine. Collaboration with global data infrastructures in the Research Data Alliance (RDA) will continue, and the existing contacts with OpenAIRE, EGI, GEANT and EUDAT-CDI are carefully maintained.