7  Data Improvement & Innovation

The music sector was one of the early adopters of digitisation and is a highly data-driven sector of the economy. Because it relies so heavily on data, its business and public policy problems are often accompanied by data problems. In the previous section, we showed how we aim to increase the amount of data available to the sector. Now, we focus on improving the data's quality and usability.

7.1 Value-Added Data Services

7.1.1 Data Sharing

“Data sharing” means securely sharing data among parties who do not want to expose their data to third parties or who must protect the personal data in their datasets. Agreeing on and organising data sharing legally, semantically, syntactically, and technically can be challenging. This is the role of the Open Music Dataspace behind our observatory.

The data sharing infrastructure behind our dataspace is the Reprexbase system. It extends Wikibase with various open-source (and, in a few cases, non-open-source) software components that connect Wikibase with music sector databases and data sources.

Given the sensitive nature of the data we handle, including business confidential and GDPR-protected data, we maintain strict segregation. Data batches from stakeholders are kept in separate instances and are integrated only after thorough review by the data protection officer and curatorial team, ensuring the highest level of data security.

In our Slovak prototype, we keep SOZA's data in an insulated instance because the copyright management organisation must not release GDPR-protected or business-confidential information. Only the data needed for NERD operations or for establishing the Slovak Comprehensive Music Database is then sent to a joint instance. This applies only to those data subjects (i.e. authors or their heirs) who agree to our data handling. In the joint instance, this data meets public catalogue and database data from public libraries, open knowledge graphs, and the Slovak Music Center.

The data that should be made public is then further exported to Wikibase Cloud, where it becomes public and available for all stakeholders. From Wikibase, it is also synchronised with Wikidata, the world’s largest open knowledge graph.
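The flow from the insulated instance to the joint instance and on to the public export can be pictured as two successive filters. The Python sketch below is illustrative only. The `Record` layout, the `consented` and `public` flags and the field whitelists are hypothetical and do not describe the actual Reprexbase schema. The manual review by the data protection officer and the curatorial team is not modelled.

```python
from dataclasses import dataclass, field

# Hypothetical record layout; NOT the actual Reprexbase schema.
@dataclass
class Record:
    subject_id: str                  # e.g. an author or an heir
    consented: bool                  # has the data subject agreed to our data handling?
    public: bool                     # cleared for export to the public instance?
    fields: dict = field(default_factory=dict)

# Hypothetical whitelists: the only fields allowed to leave each tier.
JOINT_FIELDS = {"name", "works"}
PUBLIC_FIELDS = {"name"}

def to_joint(insulated):
    """Insulated -> joint: keep consenting subjects and whitelisted fields only."""
    return [
        Record(r.subject_id, r.consented, r.public,
               {k: v for k, v in r.fields.items() if k in JOINT_FIELDS})
        for r in insulated
        if r.consented
    ]

def to_public(joint):
    """Joint -> public: keep records cleared for publication, public fields only."""
    return [
        {"subject_id": r.subject_id,
         **{k: v for k, v in r.fields.items() if k in PUBLIC_FIELDS}}
        for r in joint
        if r.public
    ]
```

The filters use explicit whitelists rather than blacklists. As a result, a newly added confidential field stays in the insulated instance by default.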

7.1.2 Fix-the-data

“Fix-the-data” means improving the data quality by finding or imputing missing values or finding and replacing erroneous data entries. In terms of metadata, adding further machine-actionable information to already existing datasets can improve their usability.

Tip

Our fix-the-data services do not increase the amount of data available to our partners, but they increase the quality of their datasets or databases.
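As a minimal illustration of fix-the-data, the sketch below cleans one identifier and fills gaps from a reference source. It assumes a hypothetical record layout with an `isrc` field and a hypothetical `reference` lookup keyed by ISRC. Only the ISRC layout reflects the standard: a 2-letter country code, 3 alphanumeric registrant characters, a 2-digit year and a 5-digit designation code.

```python
import re

# ISRC without separators: CC + XXX (alphanumeric) + YY + NNNNN = 12 characters.
ISRC_PATTERN = re.compile(r"^[A-Z]{2}[A-Z0-9]{3}[0-9]{7}$")

def normalise_isrc(value):
    """Remove spaces and hyphens, upper-case; return None if still malformed."""
    if value is None:
        return None
    candidate = re.sub(r"[\s-]", "", value).upper()
    return candidate if ISRC_PATTERN.match(candidate) else None

def fix_record(record, reference):
    """Return a cleaned copy of `record`.

    `reference` is a hypothetical lookup (e.g. built from a trusted catalogue)
    keyed by ISRC. It only fills fields that are missing or empty and never
    overwrites existing values. Malformed ISRCs are flagged, not silently dropped.
    """
    fixed = dict(record)
    fixed["isrc"] = normalise_isrc(record.get("isrc"))
    fixed["isrc_flagged"] = record.get("isrc") is not None and fixed["isrc"] is None
    for key, value in reference.get(fixed["isrc"], {}).items():
        if fixed.get(key) in (None, ""):
            fixed[key] = value
    return fixed
```

Filling only empty fields and flagging malformed identifiers keeps every change reviewable by the data owner.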

7.1.3 Data Linking

“Data linking”, also called data fusion or data matching, means correctly joining data from different datasets (data sources). Many fix-the-data problems originate in imperfect data linking. Examples include mistakes in currency rates, units of measure, or the coding of geographical entities, and misplaced decimal delimiters in the data values. Others are misunderstandings of the meaning of “artist income”, “popularity score”, or other variables whose meaning is not self-evident. An even more subtle problem is joining data from two questionnaire surveys created with different sampling algorithms and different standard (measurement) errors.
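The sketch below shows why harmonisation has to happen before the join. Two hypothetical sources use different decimal conventions and currencies, so their amounts only become comparable after each is parsed with its own profile. The source profiles, field names and the fixed conversion rate are assumptions for illustration, not real exchange rates.

```python
from decimal import Decimal

# Hypothetical per-source conventions.
SOURCE_PROFILES = {
    "source_a": {"decimal": ",", "thousands": ".", "currency": "EUR"},
    "source_b": {"decimal": ".", "thousands": ",", "currency": "CZK"},
}

# Illustrative fixed rates only; NOT real exchange rates.
RATES_TO_EUR = {"EUR": Decimal("1"), "CZK": Decimal("0.04")}

def parse_amount(text, profile):
    """Parse a locale-formatted number: drop thousands separators, then fix the decimal mark."""
    cleaned = text.replace(profile["thousands"], "").replace(profile["decimal"], ".")
    return Decimal(cleaned)

def to_eur(text, source):
    """Convert a source-formatted amount into a Decimal in EUR."""
    profile = SOURCE_PROFILES[source]
    return parse_amount(text, profile) * RATES_TO_EUR[profile["currency"]]

def link(rows_a, rows_b):
    """Inner-join two sources on a shared identifier after harmonising amounts."""
    b_by_id = {r["id"]: to_eur(r["amount"], "source_b") for r in rows_b}
    return {
        r["id"]: (to_eur(r["amount"], "source_a"), b_by_id[r["id"]])
        for r in rows_a
        if r["id"] in b_by_id
    }
```

`Decimal` is used instead of `float` so that monetary amounts are not affected by binary floating-point rounding.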

Tip

Data linking or data fusion is a way to join many small databases into a large, federated dataset. This way, relatively small music organisations can benefit from access to big data.

7.1.4 Registration services

In Open Music Europe, other tasks deal with a policy problem plaguing the music industry. The industry needs access to an exceptionally high number of registers, due to the fragmentation of copyright and of several neighbouring rights. Yet access to such registers is limited or impossible. The registers also often carry legacy problems that make them less usable in trustworthy data and AI systems.

“A register aims to be a complete list of the objects in a specific group of objects or population.” (Anders and Britt 2007). Statistical data collection and rights management are just two service areas whose workflows depend on well-functioning and accessible registers. The statistical business register is an essential tool for creating survey frames or sample frames, in other words, to organise statistical data collection. A copyright or neighbouring right register is necessary to organise royalty collection.

Note

A statistical register is necessary to decide who should get a data request:

  • For a sample survey, the register is used to draw a random sample of population members who will be invited to provide data.

  • In a census-type survey, all registered members of the population, for example, all music labels, will receive an invitation to an interview or form.

  • In the case of a register-based survey, all members of the register, for example, all collective management societies in the territory, will be requested to send data directly from their databases.
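The three cases above differ in who is asked, and that decision can be written down as a small function over the register. This is a hypothetical sketch. It uses a simple random sample without replacement, whereas real sample surveys often use stratified or other more complex designs.

```python
import random

def survey_frame(register, mode, sample_size=None, seed=None):
    """Hypothetical helper: decide who receives a data request.

    register    : list of registered units (e.g. label or CMO identifiers)
    mode        : "sample", "census" or "register"
    sample_size : number of units to draw in "sample" mode
    seed        : optional seed so that the draw can be reproduced and audited
    """
    if mode == "sample":
        rng = random.Random(seed)
        # Simple random sample without replacement.
        return rng.sample(register, sample_size)
    if mode in ("census", "register"):
        # Census: every unit is invited to an interview or form.
        # Register-based: every unit sends data directly from its own database.
        # Both address the whole register; they differ in how data is collected.
        return list(register)
    raise ValueError(f"unknown mode: {mode}")
```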

In other work packages of the Open Music Europe project, we are experimenting with statistical data coordination between the music sector and statistical authorities. Without going into the details here: digitisation exponentially increases the amount of structured data held by the private sector. As a result, statistical innovation increasingly relies on such data to produce more granular or more timely official statistics. Consumer spending statistics are one example. Instead of costly and imprecise surveys in which randomly selected citizens record their purchases in a diary, some statistical authorities directly process data from cash registers or credit card transactions. We envision a similar statistical collaboration between statistical offices and collective rights management organisations. It is easier to report music royalty accounts than to ask musicians to describe their complex income streams in interviews or on questionnaires.

We see the role of the Open Music Observatory in providing a methodology and digital data infrastructure for such statistical collaboration. In other work package tasks, SOZA and Reprex will create so-called satellite business registers to harmonise the observatory's data collection with that of the Slovak statistical authority (see Antal 2023 for more about this work).

Such services, like data linking and some new services that will build on the data and the data API of the Open Music Observatory, rely on the provision of technical registration services. The Open Music Observatory has its own register, too.

Registration is a costly data service with vast economies of scale, so providing more affordable registration services for the European music sector could be an important service.

7.2 Use Cases

In 2023, the Open Music Europe project applied for Module A of the Horizon Results Booster (HRB) provided by Trust-IT Services. The HRB aims to provide a tangible contribution to the dissemination of results and recommendations of research projects related to European Commission priority areas.


7.2.1 Data Health Services for Collective Management

Entity linking and data linking are among the biggest technical problems in rights management. Music authors, producers, and performers have three separate royalty streams and do not share an interoperable registry. Three kinds of items therefore have to be connected through costly manual and technical identification: musical works (compositions, ideally identified by an ISWC code), their sound recording manifestations (identified on all digital services with an ISRC code), and the various identifiers of performers.

There are numerous projects underway in the music industry to resolve this problem going forward. In the United Kingdom, PRS's Nexus programme is developing a solution that provides preliminary ISWC registration. This keeps the recording and the composition connected from the birth of a new recording.

The Open Music Europe project, on the other hand, is pioneering a different route for already existing sound recordings. It links public sector catalogues of heritage and library collections with rights management information, particularly by relying on the VIAF (Virtual International Authority File) shared authority files. SOZA and Reprex are expected to present their MVP at the CISAC Good Governance seminar in December 2025.

Modern registers typically assign a unique identifier, known as a URI, to their data subjects (our registered objects). A ‘Cool URI’, which looks like a URL, offers a practical advantage: it can be dereferenced like a web address. Opened in a web browser, it returns a human-readable HTML page about the registered person or object. Requested by a graph application, it can return crucial information about the same person or object in a machine-readable file (XML, JSON, TTL, or N-Quads).
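A Cool URI can serve a human-readable page and machine-readable files from one address through HTTP content negotiation. The client states the representation it wants in the `Accept` header. The sketch below only prepares such requests with Python's standard `urllib.request.Request`. The media types are the registered ones for these formats. Which of them a given register actually serves is an assumption to check against that register's documentation.

```python
from urllib.request import Request

# Registered media types for the formats mentioned above. Whether a particular
# register supports each of them is an assumption, not a guarantee.
MEDIA_TYPES = {
    "html": "text/html",
    "rdfxml": "application/rdf+xml",
    "jsonld": "application/ld+json",
    "turtle": "text/turtle",
    "nquads": "application/n-quads",
}

def build_request(cool_uri, fmt="html"):
    """Prepare (but do not send) an HTTP request for one representation of a URI."""
    return Request(cool_uri, headers={"Accept": MEDIA_TYPES[fmt]})
```

Passing the result to `urllib.request.urlopen` would send the request. A server that does not support the requested type may return HTML or an error instead.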

For example, the VIAF identifier number 89006617 can be placed into the http://viaf.org/viaf/89006617 URL. This URL provides access to the cataloguing information of works created by, or written about, the great ethnomusicologist and modern composer Béla Bartók.

The VIAF page for “Bartók, Béla, 1881-1945” lists records from the following sources: National Diet Library, Japan; National Library of Catalonia; Library and Archives Canada; National Library and Archives of Québec; ISNI; National Library of Poland; National Library of Estonia; National Library of Ireland; NUK/COBISS.SI, Slovenia; National Library of Korea; National Library of Israel; National Library of Australia; National Library of Norway; NUKAT Center of Warsaw University Library; National Library of Portugal; International Inventory of Musical Sources (RISM); Library of Congress/NACO; National Library of France; National Library of Spain; National Library of Lithuania; National Library of Sweden; German National Library; Sudoc [ABES], France; BIBSYS; National and University Library of Iceland (NULI); National Library of Brazil; National Széchényi Library, Hungary; National Library of Latvia; National Library of the Netherlands; National Library of Chile; National Library of the Czech Republic; Repertoire International de Litterature Musicale; Wikidata; National and University Library in Zagreb; DBC (Danish Bibliographic Center).

Modern platforms, such as Spotify, use similar identifiers. For example, the Spotify Artist ID 2fIUlieTjLTaNQUIKHX5B8 resolves to Celeste Buckingham’s available recordings on the platform via the URL https://open.spotify.com/artist/2fIUlieTjLTaNQUIKHX5B8.

The problem is that music creators are often present on more than 200 digital platforms. Each platform has its own identifier policy and requires repeated imports of the artists', works', and recordings' data. Consistently reporting such metadata is costly and complex, even for major labels and publishers with dedicated IT systems. It is no wonder that our own Feasibility Study, carried out before this project, found that more than 50% of artist data on digital platforms needed fixing.
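One way to keep many platform-local identifiers together is a crosswalk. This is a single entity record holding the entity's identifier on each platform, from which resolvable URLs can be generated. The sketch below uses the two URL patterns quoted above (VIAF and Spotify artist pages). The function names and scheme labels are hypothetical. A crosswalk only records identity decisions; it does not make them.

```python
# URL templates for the two identifier schemes quoted above. Any further scheme
# would need its own template, checked against that platform's rules.
RESOLVERS = {
    "viaf": "http://viaf.org/viaf/{id}",
    "spotify_artist": "https://open.spotify.com/artist/{id}",
}

def resolve(scheme, local_id):
    """Hypothetical helper: turn a (scheme, identifier) pair into a URL."""
    if scheme not in RESOLVERS:
        raise ValueError(f"no resolver registered for scheme {scheme!r}")
    return RESOLVERS[scheme].format(id=local_id)

def links_for(crosswalk_entry):
    """Resolve every platform-local identifier stored for one entity."""
    return {scheme: resolve(scheme, local_id)
            for scheme, local_id in crosswalk_entry.items()}
```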

Relying on many local identifiers across otherwise interconnected computer systems will always create costly and error-prone data exchange. Unfortunately, the music industry has never agreed to use genuinely open, high-quality registers. This started to change during the period of our project. For example, large platforms like Apple and Spotify, as well as some collective rights management organisations, started using the ISO-standard name identifier (ISNI) to disambiguate the many persons and musical groups that share the same name. This transition has only just begun and is far from complete. The music sector will therefore likely need to invest large IT resources into entity resolution in the next decade.
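Because an ISNI carries a check character, some data entry errors can be caught before entity resolution starts. The sketch below assumes the ISO 7064 MOD 11-2 check character used by ISNI: 16 characters in total, where the last one may be `X`. It validates the format only and cannot tell whether an ISNI points to the right person.

```python
import re

def isni_check_char(first15):
    """ISO 7064 MOD 11-2 check character for the first 15 ISNI digits."""
    total = 0
    for digit in first15:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_isni(isni):
    """Check format and check character; spaces and hyphens are ignored."""
    compact = re.sub(r"[\s-]", "", isni).upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return isni_check_char(compact[:15]) == compact[15]
```

A check character catches most single-digit typos and transpositions. Deciding whether two records describe the same artist remains an entity resolution task.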

7.2.2 Sustainability Reporting for Music Organisations

The Music Innovation Hub and Reprex will develop a CSRD-compliant sustainability reporting tool in 2024-2025. The reporting tool aims to provide an accurate and affordable ESG reporting facility that follows the European ESRS standards for music enterprises that create their financial reports according to the simplified reporting rules allowed by member states for microenterprises.

More than 95% of European music enterprises (in some member states, this reaches 100%) apply simplified financial reporting. For such companies, there are no CSRD-compliant ESG reporting tools.

We identify the reason for this market failure as follows:

The MVP of this service was developed with a MusicAIRE microgrant and forms part of the project's background. A scale-up will be demonstrated using the Open Music Observatory's open data API.

7.2.3 Listen Local

The Feasibility Study On Promoting Slovak Music in Slovakia And Abroad is an important part of our project's background.

In 2020, with a microgrant from the Slovak Arts Council, we created a Feasibility Study and a demo application called Listen Local (Antal 2020). The study examined why the Spotify algorithm struggled to recommend Slovak music to listeners in Slovakia. The demo application modified the user's Spotify recommendations to comply voluntarily with the local content guidelines applicable to local radio stations. Users could also choose to listen to a lower or higher percentage of regional works.
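The demo's core idea can be sketched as a re-ranking step: adjusting a recommendation list towards a user-chosen share of local repertoire. This is not the Listen Local code. The function name and inputs are hypothetical. Classifying tracks as local, which in practice depends on good metadata, is assumed to be already solved.

```python
def apply_local_share(recommended, local_ids, length, target_percent):
    """Hypothetical re-ranker: pick `length` tracks whose local share matches
    `target_percent` as far as the candidates allow, keeping the original order.

    recommended    : track IDs in the order the recommender returned them
    local_ids      : set of track IDs classified as local repertoire
    target_percent : integer target share, e.g. 25 for 25 %
    """
    # Integer ceiling of target_percent * length / 100, avoiding float rounding.
    n_local = (target_percent * length + 99) // 100
    local = [t for t in recommended if t in local_ids][:n_local]
    # Fill the remaining slots with the best-ranked non-local tracks.
    others = [t for t in recommended if t not in local_ids][: length - len(local)]
    chosen = set(local) | set(others)
    return [t for t in recommended if t in chosen]
```

The target works in both directions: a higher share promotes local tracks further down the list, and a lower share replaces them with non-local ones. This mirrors the choice the demo offered its users.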

Our critical finding was the very poor data coverage and quality of the Slovak repertoire, which is mostly sent to distribution without the professional assistance of a commercial music label. Self-releasing artists and micro labels lack the metadata know-how and the IT and data specialists needed to prepare their new releases for algorithmic curation. This curation is performed by the recommender engines of digital streaming platforms, radio stations, and large festivals.

Our conceptual demo application was able to make recommendations on voluntarily meeting the local content guidelines, but it was only supported by a relatively small Slovak Demo Music Database, and could only work with Spotify, which has the most transparent and open API of all streaming providers licensed to the territory of the Slovak Republic.

We aim to develop applications to create a local content-aware public performance music stream.

7.2.4 Unlabel

Unlabel is a planned service aimed at self-releasing artists and micro labels that lack a functional data/IT department. They are therefore at a disadvantage compared to larger independent labels and major labels. They usually cannot meet the high documentation standards necessary for a successful digital distribution strategy and for engaging with the algorithmic curation of streaming, radio, or festival playlists.

Self-releasing artists and micro labels bring ill-documented new content to digital distributors like ALOADED. Digital distributors must maintain an arm’s length standard for all labels, small or large, independent or major. ALOADED or other distributors cannot cross-finance the data problems of self-releasing and micro-label artists from the client revenues of more prominent labels.

We identify the problem as a market failure and a technical failure:

Our planned “Unlabel” service will provide documentation and metadata improvement services for self-releasing artists. This service, similar to current white-label services, will strictly address market failures and not compete with label services. We aim to provide a necessary level of data consolidation and improvement so that these artists can have equal opportunities in digital distribution services.

The service will be connected to the Slovak Music Dataspace and its Slovak Comprehensive Music Database. We will provide a PPP business model for onboarding and properly documenting self-releasing artists on a large scale, and for the efficient, API-based delivery of their releases to their digital distributor. ALOADED will provide the distribution services, Reprex will provide the data services, and SOZA and the Slovak Music Center will work out the details of minimal customer service for such labels.

7.3 Use of AI systems

For the entity linking related to our planned value-added service described in Section 7.2.1, we plan to use AI algorithms in the future, particularly inference engines. The main goal of the system is to help correctly match named entities, particularly rightsholders, musical works, and recordings. The system is not yet in place. An adequate description will be provided for review and brought to the attention of the Ethics Advisor at the upcoming meeting of the Ethics Board.

We cannot provide a full risk assessment because the service has not yet been planned in detail. However, our preliminary risk assessment suggests a low level of risk for two reasons. First, we plan to deploy AI in music and culture, a domain not regarded as high-risk by European regulation. Second, our system will not be autonomous, will keep a human in control, and will not influence the decisions of, or otherwise engage with, end-users.
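The human-in-control principle can be illustrated with a matcher that only proposes links. Note the substitution: the planned system would use inference engines, while this sketch uses simple string similarity from Python's `difflib`, after stripping diacritics, purely to show the workflow. The function names, the 0.85 threshold and the decision callback are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher

def normalise(name):
    """Case-fold and strip diacritics so that 'Bartók' and 'Bartok' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

def similarity(a, b):
    """Crude string similarity in [0, 1]; a stand-in for a real matching model."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def propose_matches(source_names, register_names, threshold=0.85):
    """Build a review queue of (source, best_candidate, score); nothing is linked here."""
    queue = []
    for name in source_names:
        best = max(register_names, key=lambda candidate: similarity(name, candidate))
        score = similarity(name, best)
        if score >= threshold:
            queue.append((name, best, round(score, 2)))
    return queue

def review(queue, decide):
    """`decide` stands in for a curator's manual decision; only accepted pairs become links."""
    return [(source, candidate) for source, candidate, score in queue
            if decide(source, candidate, score)]
```

The matcher never writes a link on its own: every proposal passes through `review`, where a person accepts or rejects it.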

We were conscious of the potential risk involved, and both the control structure and the data governance were planned over the course of 10 months.

We do not consider that the system has wider risks or negative impacts. The algorithm is designed to remedy the sources of data bias that result in late or missed payments for some rightsholders.