This section introduces the core services of the Open Music Observatory. These services are described in the EMO Feasibility study, but we have given them a modern and subjective interpretation that relies more on recent innovations in data regulation and data science. Our services fully embrace the European Interoperability Framework (EIF), the interoperability frameworks related to the European Open Science Cloud (EOSC), and reproducible research techniques. Together, these allow for more timely, less costly, and higher-quality data ingestion and processing than manual workflows. Almost all workflows are supported by open-source components. Some were brought into the Open Music Europe project as background, some were developed by other projects, and some are being developed in other tasks of this project. The introduction of the separate software components is not a subject of this report.
Our main services are related to the collection, processing and dissemination of data.
The Generic Statistical Business Process Model (GSBPM) is an international standard model that “describes and defines the set of business processes needed to produce official statistics” (UNECE 2019). This business process model is accompanied by a conceptual reference or information model, the Generic Statistical Information Model (GSIM) (UNECE 2014). We use the conceptualisation of GSIM so that our results will be similar in quality to official statistics. Following similar processes also lets us create products that combine well with official statistical products. The Open Music Observatory not only collects and disseminates statistically processed data but also collection datasets, i.e., structured microdata. The GSBPM and GSIM cover both, because statistical business processes rely on collection-like datasets such as registers, codebooks, and metadata thesauri.
In this report of Task 5.1, we concentrate on the inner cycle of the business processes. The outer cycle (the specify needs, design, and build phases) is carried out in other work packages, as is most of the analysis phase. We also provide further services that, in GSBPM terms, relate to the Evaluate phase, which allows for quality improvements.
The two implementation standards, DDI and SDMX, will ensure that the Open Music Observatory can work together with statistical offices and reputable social science data repositories, because both our business processes and the way we organise data follow their standards. The Data Documentation Initiative (DDI) will keep us compatible with official statistical microdata and metadata services and with other social science archives. One example is GESIS, the official data archive of all European Commission-mandated survey research dating back over 50 years (Vardigan, Heus, and Thomas 2008). The application of SDMX ensures that our microdata and statistically processed data will be interoperable with the official statistics of the UN, OECD, Eurostat, and national statistical services (Stahl and Staab 2018).
Our data improvements go beyond improving statistical quality and applying the GSBPM. We aim to fix and improve music industry datasets used for rights management or digital curation.
Data enrichment and improvement are innovative solutions that are not part of the usual services of an open data portal or an observatory. We aim to offer these value-added services to create new value, and thereby to motivate music industry data owners to work with the observatory.
“Fix-the-data” means improving data quality by finding or imputing missing values, or by finding and replacing erroneous data entries. In terms of metadata, adding further machine-actionable information to existing datasets can improve their usability.
“Data linking”, also called data fusion or data matching, means correctly joining data from different datasets (data sources). We ensure that data coming from different sources can be meaningfully joined. This requires that the variables have consistent meanings, that the applied codebooks are harmonised, and that the time frame and geographical frame are consistent.
Aggregation services: we turn your music-related datasets into statistical products or data publications. We clean, validate, and structure them into a format that can be placed on the EU Open Data Portal, Europeana, or Wikibase for integration with Wikidata/Wikipedia.
Confidential data sharing: our data sharing space can be used for confidential data sharing and cross-pollination (for example, looking up missing ISWC/ISRC identifiers or misspelt names in each other’s datasets) without making the data public.
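As an illustration of the fix-the-data service, erroneous identifiers can often be caught before they are used for lookups. The sketch below validates an ISWC written as T-DDD.DDD.DDD-C. It assumes the check-digit rule as we understand it from the ISWC standard (ISO 15707): add 1 to the sum of the nine digits, each weighted by its position, and take the complement modulo 10. The function names are our own, not part of any existing library.

```python
import re

# Accepts the display form T-DDD.DDD.DDD-C and the compact form TDDDDDDDDDC.
ISWC_PATTERN = re.compile(r"^T-?(\d{3})\.?(\d{3})\.?(\d{3})-?(\d)$")


def iswc_check_digit(nine_digits: str) -> int:
    """Check digit under the assumed rule: 1 + sum(position * digit), complement mod 10."""
    total = 1 + sum(position * int(digit)
                    for position, digit in enumerate(nine_digits, start=1))
    return (10 - total % 10) % 10


def is_valid_iswc(value: str) -> bool:
    """True if the value is a well-formed ISWC whose check digit matches."""
    match = ISWC_PATTERN.match(value.strip().upper())
    if match is None:
        return False
    body = "".join(match.group(1, 2, 3))
    return iswc_check_digit(body) == int(match.group(4))


# A mistyped check digit or a missing digit makes a record fail the check.
for candidate in ["T-034.524.680-1", "T-034.524.680-2", "T-34.524.680-1"]:
    print(candidate, is_valid_iswc(candidate))
```

Under this rule, the first candidate passes. The second (wrong check digit) and the third (missing digit) would be flagged for correction or lookup in a partner's dataset.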
These planned services will be discussed in Chapter 7.
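The data-linking requirements described above (consistent meanings, harmonised codebooks, consistent time and geographical frames) can be illustrated with a minimal plain-Python sketch. Two hypothetical sources report music-related variables for countries. One source uses ISO 3166-1 alpha-2 codes and the other uses country names. Both are mapped to a common codebook before being joined on a shared (country, year) key. The codebook, the variable names, and all numbers are invented placeholders, not real statistics.

```python
# Hypothetical codebook mapping source-specific labels to ISO 3166-1 alpha-2 codes.
COUNTRY_CODEBOOK = {
    "SK": "SK", "Slovakia": "SK", "Slovak Republic": "SK",
    "HU": "HU", "Hungary": "HU",
}

# Placeholder records for illustration only; these are not real statistics.
source_a = [  # keyed by country code
    {"geo": "SK", "year": 2021, "concerts": 100},
    {"geo": "HU", "year": 2021, "concerts": 200},
    {"geo": "SK", "year": 2019, "concerts": 90},
]
source_b = [  # keyed by country name
    {"country": "Slovak Republic", "year": 2021, "festivals": 10},
    {"country": "Hungary", "year": 2021, "festivals": 20},
]


def harmonise(records, geo_field):
    """Replace a source-specific geography label with the harmonised code."""
    harmonised = []
    for record in records:
        label = record[geo_field]
        if label not in COUNTRY_CODEBOOK:
            raise ValueError(f"Unmapped geography label: {label!r}")
        row = {key: value for key, value in record.items() if key != geo_field}
        row["geo"] = COUNTRY_CODEBOOK[label]
        harmonised.append(row)
    return harmonised


def link(left, right):
    """Inner join on the harmonised (geo, year) key; unmatched rows are dropped."""
    index = {(row["geo"], row["year"]): row for row in right}
    return [
        {**row, **index[(row["geo"], row["year"])]}
        for row in left
        if (row["geo"], row["year"]) in index
    ]


linked = link(harmonise(source_a, "geo"), harmonise(source_b, "country"))
for row in linked:
    print(row)
```

The 2019 Slovak record is dropped because the second source has no matching year. Raising an error on unmapped labels, rather than silently skipping them, keeps gaps in the codebook visible.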
3.1 Collect: Data Curation & Collection
Data curation is the organisation and integration of data collected from various sources. It involves annotation, publication and presentation of the data so that the value of the data is maintained over time, and the data remains available for reuse and preservation.
Data can only be understood in relation to the broader concepts of information and knowledge, because data in itself is unprocessed, raw material that cannot be understood on its own. The EMO feasibility study intuitively defines data gaps without an apparent reference to a data or conceptual model, but it recognises and stresses the need for terminological harmonisation.
Note
Four types of data-collection principles have been identified as essential both by various branches of the music sector and by policymakers at European, national and local levels:
The data-collection service provided by a European Music Observatory should help in mapping, understanding and analysing the main characteristics, trends and idiosyncrasies of the music sector in Europe;
The data collected should be neutral and available to decision-makers, music sector operators, and the public;
The data itself should cover the activities of the music sector across the entire European Union, be comparable between Member States, and rely on identified and stable indicators;
In short, we collect data about music as defined in the cultural statistics of any statistical office in the European Economic Area or an EU candidate country, or by a representative European or international music organisation.
In more detail, a systematic data collection programme requires a conceptualisation. A conceptualisation is an abstract, simplified view of some selected part of the world. It contains the objects, concepts, and other entities presumed to be of interest for some particular purpose, together with the relationships between them.
Note
Usually, when we record information about a musical work, we do not make a copy of the entire work. Instead, we record some identifying properties of the work, for example, the name of its author, its name (i.e., its title), its unique ISWC identifier, and the date of registration. We work with a concept of a musical work, not with the entire work.
Composers as human beings are represented by their names, IP Names or ISNI identifiers, and dates of birth and death. Again, in an information system we work with a concept of an author, and with instances of authors represented by their unique data.
The EMO feasibility study catalogues 45 data gaps that a future European music observatory should fill. A data gap can only be formally defined and filled with reference to some conceptual model of the world. A typical data problem plaguing the music sector is the amount of computer and human work needed to connect musical works with their recorded fixations. Eventually, these objects must also be connected with the composers, producers, and performers linked to them for royalty payments. To answer such questions, we need agreed concepts of the composer, the sound recording, and the musical work.
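To make these concepts concrete, they can be sketched as a tiny object model. The classes and field names below are illustrative assumptions, not the conceptual model presented in Chapter 6. They only show that works, recordings, and people are represented by identifying properties rather than by the works themselves, and that following the links connects a recording to the people behind it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Person:
    """A composer, producer, or performer, represented only by identifying data."""
    name: str
    isni: Optional[str] = None
    birth_date: Optional[date] = None
    death_date: Optional[date] = None


@dataclass(frozen=True)
class MusicalWork:
    """The abstract work: we keep its title, identifier, and authors, not the music itself."""
    title: str
    iswc: Optional[str] = None
    composers: Tuple[Person, ...] = ()


@dataclass(frozen=True)
class SoundRecording:
    """A recorded fixation of a musical work, identified by an ISRC."""
    isrc: str
    work: MusicalWork
    performers: Tuple[Person, ...] = ()


def linked_people(recording: SoundRecording) -> set:
    """Names of everyone reachable from a recording: its performers and the work's composers."""
    return {p.name for p in recording.work.composers} | {p.name for p in recording.performers}


# Hypothetical example data (placeholder names and identifiers).
composer = Person("Composer A", birth_date=date(1950, 1, 1))
singer = Person("Performer B")
work = MusicalWork("Example Song", composers=(composer,))
recording = SoundRecording("XX0000000001", work, performers=(singer,))
print(sorted(linked_people(recording)))  # ['Composer A', 'Performer B']
```

A missing ISWC (here `None`) is exactly the kind of gap that breaks the chain from recording to work to composer in real datasets.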
The initial data collection guidelines of the Open Music Observatory are derived from the EMO Feasibility study. We see them as a starting point for further discussion with the Observatory Stakeholder Network. We introduce them with our data catalogue in Section 4.1. These guidelines are supported by our first conceptualisation, which is built on some widely used conceptualisations of creative works and statistics. This is the topic of Chapter 6.
3.1.1 Microdata, Collections, Records
We treat “microdata” as a collection of structured data. A collection is a group of objects (for example, musical works, sound recordings, music enterprises, or musician biographies) gathered together for some intellectual, artistic, or curatorial purpose. Radio playlists and charts, festival line-ups, and local content guideline monitoring all work this way. Such collections form the basis of census or sample surveys for statistical data collection.
Note
Our first large collection is the Slovak Comprehensive Music Database. Our aim is to publish a constantly refreshed database of every piece of music composed or recorded in the territory of the current Slovak Republic, composed and recorded by people from Slovakia, or sung in the Slovak language. This database will be based on a collection of musical works and a connected collection of sound recordings and sheet music as manifestations of those works.
The Data Documentation Initiative originates from the world of social science data archives and is increasingly used by statistical organisations to document microdata. The DDI plays a particularly important role in the creation of statistical surveys, especially those using questionnaires and question banks. In archival description, the new Records in Context (RiC) standard replaced the earlier international standards on archives in 2023. Its central concept is the record, which corresponds to a document in DDI; a collection is a set of records. Our standardisation of microdata is explained in more detail in Chapter 6.
3.1.2 Primary data collection
The Open Music Observatory supports high-quality primary data collection and carries out such collection activities itself.
The indicators derived from the processing of survey questionnaires will be comparable if the same concepts of interest (for example, concert visiting frequencies) are measured via the same questions and answering instructions.
Note
A concert is a standard concept of a live performance of music.
“How many times in the previous [12 months] have you been to a concert?” is a standard question in the Cultural Access and Participation surveys following the ICET model. It is accompanied by standardised answer options and processing.
Using standardised concepts and question banks, including question and instruction labels with standardised translations, is a cornerstone of ex-ante survey harmonisation. This process is a prerequisite for retrospective survey harmonisation and the subsequent creation of comparable statistical indicators, underscoring the importance of uniformity in data collection.
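Retrospective harmonisation can be sketched as mapping each survey's answers to the same question onto one shared coding. In the example below, survey X recorded an exact number of concert visits and survey Y used its own verbal answer options. The harmonised categories and both source codings are hypothetical; they are not the actual ICET-based answer options.

```python
# Hypothetical harmonised categories for the concert-frequency question.
HARMONISED = {0: "never", 1: "1-2 times", 2: "3-5 times", 3: "more than 5 times"}


def from_count(count: int) -> int:
    """Map an exact number of visits (hypothetical survey X) to a harmonised category code."""
    if count <= 0:
        return 0
    if count <= 2:
        return 1
    if count <= 5:
        return 2
    return 3


# Hypothetical survey Y used its own verbal answer options.
SURVEY_Y_CODES = {
    "not at all": 0,
    "once or twice": 1,
    "three to five times": 2,
    "six or more times": 3,
}


def from_label(label: str) -> int:
    """Map a survey Y answer label (case- and whitespace-insensitive) to a category code."""
    return SURVEY_Y_CODES[label.strip().lower()]


survey_x = [0, 1, 4, 12]
survey_y = ["Not at all", "once or twice", "Six or more times"]
print([HARMONISED[from_count(n)] for n in survey_x])
print([HARMONISED[from_label(s)] for s in survey_y])
```

Such mappings are only defensible when both surveys measure the same concept over the same reference period. Ex-ante harmonisation with a shared question bank avoids most of this work.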
3.1.3 Metadata
The most common—and perhaps least useful—definition of metadata is that it is “data about data.” As catchy as this definition is, however, it is entirely ambiguous. First of all, what is data? And second, what does “about” mean? (Pomerantz 2015, p19)
The new ISO standard on Information technology — Metadata registries (MDR) defines metadata as data that define and describe other data. As Pomerantz eloquently argues, this is a definition that is not very helpful. We use his more functional (but not contradictory) definition. “Data is only potential information, raw and unprocessed, prior to anyone actually being informed by it. […] Data must be understood not as an abstract concept but as objects that are potentially informative. […] Metadata Is a Statement about a Potentially Informative Object.” (Pomerantz 2015, p26)
Following the metadata definition of “a statement about a potentially informative object,” we believe that any high-quality data can be used as metadata in certain circumstances.
Note
Data or metadata?
The date of birth can be seen as metadata for disambiguating authors with exactly the same name in a copyright register. It can also be seen as data by the curator of a young authors’ prize or by a music sociologist. Either way, the date of birth should be precise and encoded in a way that makes it portable and interoperable.
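A minimal sketch of portable date encoding follows. Dates arriving in a few local formats are normalised to ISO 8601 (YYYY-MM-DD). The normalised date can then disambiguate two authors with the same name. The accepted input formats are assumptions about what a source register might contain, and the names and dates are hypothetical.

```python
from datetime import datetime

# Assumed input formats; "." and "/" separated dates are assumed to be day-first.
KNOWN_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")


def to_iso_date(raw: str) -> str:
    """Normalise a date string to ISO 8601 (YYYY-MM-DD); raise ValueError if no format fits."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date: {raw!r}")


# Two hypothetical register entries for authors with exactly the same name.
register = [
    {"name": "Ján Novák", "born": "12.03.1961"},
    {"name": "Ján Novák", "born": "1984-07-30"},
]

# The (name, ISO date) pair distinguishes the two authors.
author_keys = sorted({(entry["name"], to_iso_date(entry["born"])) for entry in register})
print(author_keys)  # [('Ján Novák', '1961-03-12'), ('Ján Novák', '1984-07-30')]
```

A value such as 03/12/1961 is ambiguous between day-first and month-first conventions. This ambiguity is why the date should be stored in ISO 8601 at the source rather than guessed later.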
From a data management point of view, we do not distinguish between data and metadata. Of course, we acknowledge the fact that some types of data will always remain under the hood and will only serve the proper functioning of an information system.
The music industry’s famous “metadata problems” usually arise when a music enterprise or institution wants to use metadata from an authoritative source that is somehow corrupted. The Open Music Observatory can help with these metadata problems in two ways. It can disseminate proper, open authoritative data (as registers or collections), or it can provide data improvement services that fix a user’s metadata problems.
3.1.4 Statistical indicators and datasets
Warning
We will place our first statistical datasets on the EU Open Data Portal this week (pending approval) and will provide a screenshot and access conditions here.
3.2 Process
We use the theory of metadata by Jeffrey Pomerantz, who defines metadata as “a statement about a potentially informative object.” A dataset without such statements is neither findable, accessible, nor interoperable, and it is very hard to reuse. Pomerantz distinguishes among descriptive, administrative, structural, preservation, and use metadata. The Generic Statistical Information Model (GSIM) is a common abstract representation of the data objects manipulated in official statistical production. It was elaborated as an overarching model for implementation metadata standards such as SDMX or DDI.
Since its inception, GSIM has aimed to bridge two important standards, SDMX and DDI. The Statistical Data and Metadata Exchange (SDMX) has been developed for decades and is an ISO standard. It is geared more towards the aims of data sharing and preservation in research data management (RDM). DDI, on the other hand, focuses more on the documentation and quality control of primary data collection, or on the reuse of often messy data sources, and supports the processes that make the data available for research. Because DDI provides information about a much wider range of objects and processes, we are even more selective in applying it than SDMX. However, we cannot disregard DDI for microdata.
3.2.1 Processing & re-processing microdata
Warning
We will place an example dataset published on the EU Open Data Portal here.
The EU Open Data Portal relies on namespace definitions that refer to machine-readable, explicit definitions (ontologies) of the way a software agent must understand our datasets. To demystify the process, we provide here an example of the metadata that we need to compile from the various steps of the data production pipeline.
First, we must translate the metadata of our datasets into one of the standard serialisations (file formats) of the World Wide Web Consortium’s Resource Description Framework (RDF), which allows data to be connected across the open internet. At the time of writing this report, the EU Open Data Portal was changing its backend. For testing purposes, we therefore worked with a dataset from the background of the Open Music Europe project, which Reprex had earlier published on Zenodo under the title *The turnover of the radio broadcasting industry in Europe*.
The dataset itself cannot be downloaded from a data catalogue. It is an abstract intellectual work, similar to a musical or literary work. A musical work is accessible through printed sheet music or recordings, and a dataset is accessible through a distributed data file.
```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix edp:  <https://europeandataportal.eu/voc#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

<https://doi.org/10.5281/zenodo.5652118>
    a dcat:Dataset ;
    dcat:distribution <https://zenodo.org/records/5652118/files/codebook_trb.csv> ;
    dct:creator <https://orcid.org/0000-0001-7513-6760> ;
    dct:description "The turnover of the radio broadcasting industry in Europe."@en ;
    dct:identifier <https://doi.org/10.5281/zenodo.5652118> ;
    dct:issued "2022-06-03T00:00:00Z"^^xsd:dateTime ;
    dct:modified "2022-06-04T00:00:00Z"^^xsd:dateTime ;
    dct:publisher <https://isni.org/isni/000000050973936X> ;
    dct:title "A rádió szektor forgalma Európában"@hu ,
              "Turnover of the Radio Broadcasting Industry in Europe"@en ;
    edp:originalLanguage <http://publications.europa.eu/resource/authority/language/ENG> .
```
We can provide further provenance information about the dataset. In production, we will describe the software agents (tools) used, as well as the researchers, data managers, and curators involved and their organisations. As a bare minimum, we provide machine-readable information about the technical publisher of the dataset, Reprex B.V.
Finally, we point the user to the downloadable files (distributions) of the dataset, together with their rights statements and licenses. Like Eurostat on the EU Open Data Portal, we use the Creative Commons CC BY 4.0 license, and we state that the dataset is open to the public.
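A minimal sketch of how such a publisher statement could be generated as Turtle follows. It assumes the FOAF vocabulary (`foaf:Organization`, `foaf:name`), which DCAT-AP uses to describe agents. The helper names are hypothetical, and the ISNI IRI is the one used for `dct:publisher` in the dataset record.

```python
def turtle_string(text: str) -> str:
    """Quote text as a Turtle string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def publisher_turtle(isni: str, name: str) -> str:
    """Describe the publisher, identified by its ISNI IRI, as a FOAF organisation (assumed vocabulary)."""
    lines = [
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "",
        f"<https://isni.org/isni/{isni}>",
        "    a foaf:Organization ;",
        f"    foaf:name {turtle_string(name)} .",
    ]
    return "\n".join(lines) + "\n"


print(publisher_turtle("000000050973936X", "Reprex B.V."))
```

Because the publisher is identified by the same IRI as the `dct:publisher` object, a software agent can merge this description with the dataset record.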
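The distribution statements can be sketched in the same way. This example assumes the following DCAT properties: `dcat:distribution`, `dcat:accessURL`, `dcat:downloadURL`, and `dcat:mediaType`, with `dct:license` on the file and `dct:accessRights` on the dataset. It also assumes the EU Vocabularies authority IRIs for the CC BY 4.0 licence and for public access rights, as well as an IANA media-type IRI. Check these IRIs against the current DCAT-AP guidance before production use.

```python
# Assumed authority IRIs from the EU Vocabularies licence and access-right tables.
LICENCE_CC_BY_4_0 = "http://publications.europa.eu/resource/authority/licence/CC_BY_4_0"
ACCESS_PUBLIC = "http://publications.europa.eu/resource/authority/access-right/PUBLIC"


def distribution_turtle(dataset_iri: str, file_url: str, media_type: str) -> str:
    """Link a dataset to one downloadable file, with the file's licence and the dataset's access rights."""
    lines = [
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dct:  <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_iri}>",
        f"    dcat:distribution <{file_url}> ;",
        f"    dct:accessRights <{ACCESS_PUBLIC}> .",
        "",
        f"<{file_url}>",
        "    a dcat:Distribution ;",
        f"    dcat:accessURL <{file_url}> ;",
        f"    dcat:downloadURL <{file_url}> ;",
        f"    dcat:mediaType <https://www.iana.org/assignments/media-types/{media_type}> ;",
        f"    dct:license <{LICENCE_CC_BY_4_0}> .",
    ]
    return "\n".join(lines) + "\n"


print(distribution_turtle(
    "https://doi.org/10.5281/zenodo.5652118",
    "https://zenodo.org/records/5652118/files/codebook_trb.csv",
    "text/csv",
))
```

Each additional file of the dataset would get its own `dcat:Distribution` block with the same licence statement.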
We will provide the link and screenshot of the documentation for each file that goes public.
3.3 Disseminate
3.3.1 EU Open Data Portal
The portal is a central point of access to European open data from international, European Union, national, regional, local and geodata portals. It consolidates the former EU Open Data Portal and the European Data Portal.
The portal is intended to:
give access to European open data and foster its reuse among citizens, businesses and organisations.
promote and support the release of more and better-quality metadata and data by the EU’s institutions, businesses, agencies and other bodies, and European countries, enhancing the transparency of European administrations.
educate citizens and organisations about the opportunities that arise from the availability of open data.
It is funded by the EU and managed operationally by the Publications Office of the European Union in cooperation with the Directorate-General for Communications Networks, Content and Technology of the European Commission, responsible for EU open data policy.
We publish our statistically processed datasets primarily on the EU Open Data Portal. These datasets contain the generalised characteristics of many data subjects, without personal data that could identify them.
3.3.2 Europeana and the European Collaborative Cloud for Cultural Heritage
Europeana is at the heart of the common European data space for cultural heritage, a flagship initiative of the European Union to support the digital transformation of the cultural heritage sector. Millions of cultural heritage items from over 3,500 data providers across Europe are available online via the Europeana website. Europeana works to share and promote this heritage so that educators and researchers, creatives and culture lovers across the world can use and enjoy it.
While there is no agreed, cross-sectoral definition of “collections”, it is widely understood that in many cases, collections themselves are the entities that meet the information needs of music professionals or researchers (Wickett et al. 2013). The creation of collections is an important activity performed by music professionals and scholars as part of their work. For example, suppose we want to measure how many European or French works ever made it onto an American chart. To calculate this indicator, we have to contrast two collections: French works, and the works that were ever on that particular chart.
Europeana’s publishing policies are restrictive, and as a result, relatively few musical works are currently available in this important open knowledge graph. After consultation with Europeana Sound, the British Library-based aggregator responsible for the music in the European collection, we decided to pursue two ways of making more extensive European music collections visible.
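Computationally, the chart indicator in this example is a set intersection between two collections, provided both use the same work identifiers. A minimal sketch with placeholder identifiers (not real ISWCs or chart data) follows.

```python
# Placeholder identifiers standing in for harmonised work IDs (e.g., ISWCs).
french_works = {"work-01", "work-02", "work-03", "work-04"}
ever_on_us_chart = {"work-02", "work-04", "work-17", "work-23", "work-42"}

# Works that belong to both collections.
charting_french_works = french_works & ever_on_us_chart
share = len(charting_french_works) / len(french_works)

print(sorted(charting_french_works))  # ['work-02', 'work-04']
print(f"{share:.0%} of the French works reached the chart")  # 50% of the French works reached the chart
```

In practice, the difficult step is making both collections use the same identifiers. This is the data-linking problem described at the start of this chapter.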
We will publish collections with publicly viewable audiovisual material on Europeana via the Open Music Observatory’s collections. We will also start a discussion with Europeana’s new project, the European Collaborative Cloud for Cultural Heritage, which has less restrictive data licensing policies, about a more extensive dissemination point for our collection datasets.
The ambition of the European Open Science Cloud (EOSC) is to provide European researchers, innovators, companies and citizens with a federated and open multi-disciplinary environment where they can publish, find and reuse data, tools and services for research, innovation and educational purposes. Naturally, we want to ensure that users of the Open Music Observatory participate in these cloud services, either as research providers or as research users. The Council of the European Union recognises the EOSC among the 20 actions of the 2022-2024 policy agenda of the European Research Area (ERA), with the specific objective of deepening open science practices in Europe. It is also recognised as the “science, research and innovation data space”, which will be fully articulated with the other sectoral data spaces. The OMO also follows a data space architecture designed to follow the European Interoperability Framework. The Open Music Observatory can therefore be federated with, and fully work with, the EOSC.
The Open Music Observatory connects to the EOSC via two key services of OpenAIRE. OpenAIRE itself is a non-profit partnership of 50 organisations. It was established in 2018 as a legal entity, OpenAIRE A.M.K.E., to ensure a permanent open scholarly communication infrastructure to support European research, and it is a key implementer of the European Open Science Cloud. Our connection to OpenAIRE services ensures our full compliance with, and use of, the EOSC.
A key partner of OpenAIRE is CERN, which manages the Zenodo open library and repository. We rely on the services of Zenodo for document identification, long-term archiving, and immediate access to our statistical datasets, visualisations, reports and other library-ready products. The emerging EU Open Research Repository is a Zenodo community, managed by CERN on behalf of the European Commission, dedicated to fostering open science and enhancing the visibility and accessibility of research outputs funded by the European Union. Following the same model, we have created an Open Music Observatory Repository on the platform. Our repository is fully interoperable with the EU Open Research Repository (in pilot phase in June 2024) and with the entire EOSC.
The semantic service of OpenAIRE, the OpenAIRE Graph, is a collection of interlinked research objects. Following a participatory approach, it aggregates metadata records from more than 70K scholarly communication sources from all over the world for researchers, service providers, research managers and policy makers.
3.4 Metadata
Europeana publishes its metadata in the Turtle serialisation. The Open Music Observatory will provide access to the metadata in TTL (Turtle) and CSV distributions.
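To illustrate how the same metadata can be offered both as TTL and as CSV, the minimal sketch below flattens subject-predicate-object statements into a three-column CSV using Python's standard `csv` module. The column names and the triple-shaped layout are assumptions, not a format prescribed by Europeana or the EU Open Data Portal.

```python
import csv
import io

# Two statements from the dataset record, as (subject, predicate, object) triples.
TRIPLES = [
    ("https://doi.org/10.5281/zenodo.5652118", "http://purl.org/dc/terms/title",
     "Turnover of the Radio Broadcasting Industry in Europe"),
    ("https://doi.org/10.5281/zenodo.5652118", "http://purl.org/dc/terms/publisher",
     "https://isni.org/isni/000000050973936X"),
]


def triples_to_csv(triples) -> str:
    """Serialise triples as CSV text with a header row; quoting is handled by the csv module."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["subject", "predicate", "object"])
    writer.writerows(triples)
    return buffer.getvalue()


print(triples_to_csv(TRIPLES))
```

This flat layout loses the distinction between IRIs and literals, and it drops language tags such as `@en`. A production CSV distribution would need extra columns for these.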
3.4.1 Wikibase Cloud
Our main dissemination point for microdata is openmusic.wikibase.cloud, available both from our website and via the main Wikibase Cloud website. Wikibase Cloud is an initiative of Wikipedia and Wikidata to bring more specialised data into the Wikimedia ecosystem.
Wikibase is a software system that supports the collaborative management of knowledge in a central repository. Wikimedia Deutschland developed it, originally for the management of Wikidata. It is now also available for the creation of private or public-private partnership knowledge graphs.
Wikidata itself is a gigantic Wikibase instance. The user interfaces are similar, but depending on what the administrator of your Wikibase instance allows, you are likely to have more freedom than on Wikidata to edit certain elements, such as properties. Wikidata must protect the integrity of one of the world’s largest knowledge systems, and therefore does not allow editing access to certain elements.
Because of the success of Wikidata, several EU projects and institutions have started to use Wikibase, the software that runs Wikidata. They aim to reuse the software to construct institutional or cross-institutional, domain-specific knowledge graphs, and several factors make Wikibase attractive for this purpose.
Our main dissemination point for non-statistical data is the Wikibase Cloud.
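Items in a Wikibase instance can also be read programmatically through the `wbgetentities` module of the MediaWiki Action API, which Wikibase provides. The sketch below assumes that openmusic.wikibase.cloud serves this API at the usual `/w/api.php` path. The item ID `Q1` is a placeholder, not a known Open Music Observatory item. Running the script only builds the request URL; `fetch_entity` needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://openmusic.wikibase.cloud/w/api.php"  # assumed API path


def entity_request_url(entity_id: str, languages=("en",)) -> str:
    """Build a wbgetentities request URL for one entity, with labels in the given languages."""
    params = {
        "action": "wbgetentities",
        "ids": entity_id,
        "languages": "|".join(languages),
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def fetch_entity(entity_id: str) -> dict:
    """Download one entity as a JSON dictionary (requires network access)."""
    with urlopen(entity_request_url(entity_id)) as response:
        return json.load(response)["entities"][entity_id]


print(entity_request_url("Q1"))
```

Building the URL separately from the download keeps the query logic testable offline.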
3.4.2 Music Observatory Website
Warning
We will completely revamp the website before submission and provide a short overview with screenshots here.
3.4.3 API Endpoint
Warning
We will provide a linked screenshot of our API endpoint here before submission.
European Commission, Directorate-General for Education, Youth, Sport and Culture, M. Clarke, P. Vroonhof, J. Snijders, A. Le Gall, B. Jacquemet, et al. 2020. Feasibility Study for the Establishment of a European Music Observatory: Final Report. Publications Office of the European Union. https://doi.org/10.2766/9691.
Stahl, Reinhold, and Patricia Staab. 2018. Measuring the Data Universe: Data Integration Using Statistical Data and Metadata Exchange. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-76989-9.
Vardigan, Mary, Pascal Heus, and Wendy Thomas. 2008. “Data Documentation Initiative: Toward a Standard for the Social Sciences.” International Journal of Digital Curation 3 (1): 107–13.
Wickett, Karen M., Antoine Isaac, Katrina S. Fenlon, Martin Doerr, Carlo Meghini, Carole L. Palmer, and Jacob Jett. 2013. “Modeling Cultural Collections for Digital Aggregation and Exchange Environments.” CIRSS Technical Report 201310-1, October. https://hdl.handle.net/2142/45860.