New Data Curators Wanted
The Digital Music Observatory is looking for new data curators
A data curator is a contributor in our open collaboration who will be named as a co-creator of tidy, standardized, reusable, FAIR datasets in their field of expertise. Our curators help us voice the needs of their domain, be it data-driven beekeeping or detecting algorithmic biases of recommender systems, and evaluate whether the data that we come up with is directly usable and actionable. A data curator is a co-author in the same sense as a “contributor” to open source software or a co-author of a journal article.
Boost your career without a conflict of interest
Being a data curator does not mean a commercial affiliation with any observatory partners; it is an affiliation to jointly create intellectual property. All our data curators are identified by their ORCiD IDs and named as co-creators in the open science repositories where we make our data available.
We create CC0 data that can be used for commercial, academic, and policy purposes. However, we want to honor the intellectual investment into a shared intellectual property by
- delaying the release (so that curators who use the data in new articles remain competitive in academic publishing, or so that NGOs can use it first in their campaigns)
- creating hybrid assets for commercial users where some elements, particularly the ones that use their proprietary data, may not become open data.
FAIR: Findable, Accessible, Interoperable, and Reusable Digital Assets
Our observatories do not only work with open data.
We gladly add commercially available data to our observatory if we can share a large enough subset that our peer-reviewers can attest to the data’s high quality, usability, and actionability.
How to become a data curator?
- This is an open book that we co-create on GitHub. If you find any roadblocks, do not understand something, or have a better idea on how to illustrate or explain things, just make a fork of this repo, improve it, add new photos, and send us a pull request. (You need an invite first for editing!)
- Here is a starter repository on GitHub. Not mandatory, but if you use GitHub, start here.
In a nutshell:
- Please read the entire covenant here.
- We need a very brief biography. Name, affiliation, education details, one-line and short biography. Please, send back this bio_template.txt text file. If you know markdown, use this version. The files are identical, but your word processor may not know how to open an .md file.
- Your ORCiD, to resolve ambiguity with similarly named people. If you use other library or publication service IDs, such as Google Scholar or Publons, you may provide them too, but we do need an ORCiD ID, because most of the EU open science infrastructure and the R ecosystem use this one. If you do not have one, please create it; it only takes a few minutes. Please add it to the bio_template.txt.
- Your LinkedIn ID, add it to the bio_template.txt.
- You should follow our file naming conventions, and avoid the use of special characters in any file names at all times, such as the tick (') or the backtick (`).
- You must send a profile picture that is at least 500px wide (jpg or png format). It can be bigger, and preferably not a very “narrow” cut, as all avatars will be behind a circular mask (see other curators).
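Before you send your files, you can check two of the items above yourself. The sketch below is only an illustration, not an official tool of the observatory. The ORCiD check uses the ISO 7064 MOD 11-2 checksum that ORCiD documents for the last character of the ID. The file name rule (letters, digits, dot, underscore, and hyphen only) is our assumption of a safe character set, since the list above names only the tick and the backtick; the function names are hypothetical.

```python
import re

# Assumed "safe" character set for file names: letters, digits, dot,
# underscore, hyphen. This is an illustrative rule, not the official one.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def file_name_ok(name: str) -> bool:
    """Return True if the file name uses only the assumed safe characters."""
    return SAFE_NAME.fullmatch(name) is not None


def orcid_checksum_ok(orcid: str) -> bool:
    """Return True if the last character matches the ISO 7064 MOD 11-2 checksum."""
    chars = orcid.replace("-", "").upper()
    # An ORCiD has 15 digits followed by a check character (a digit or X).
    if not re.fullmatch(r"\d{15}[\dX]", chars):
        return False
    total = 0
    for digit in chars[:15]:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15] == expected


if __name__ == "__main__":
    # 0000-0002-1825-0097 is the sample ID used in ORCiD's own documentation.
    print(orcid_checksum_ok("0000-0002-1825-0097"))
    print(file_name_ok("bio_template.txt"))
    print(file_name_ok("my'bio.txt"))
```

A passing checksum only tells you that the ID is not mistyped; it does not confirm that the ID belongs to you.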
Find inspiration from other contributors
- We Want Machine Learning Algorithms to Learn More About Slovak Music
- Credibility is Enhanced Through Cross Links Between Different Data from Different Domains
- Open Data is Like Gold in the Mud Below the Chilly Waves of Mountain Rivers
- Educate and Train Data Admirers that Data is not
- Developing an Open API is the Right Direction
- Comparing Data to Oil is a Cliché: Crude Oil Has to Go Through a Number of Steps and Pipes Before it Becomes Useful
- We Need More Reliable Datasets on the Urban Heat Resilience and Disaster Risk Reduction
Why data observatories?
Data observatories (platform products) cover our R&D and platform costs while giving us access to an expanding range of prime clients. We use 21st-century open-source data engineering solutions, a decentralized data governance method, and web 3.0 technologies to avoid conflicts of interest and prevent the data Sisyphus of error-prone human data wrangling. There is little competition on this service level (there are about 60 UN/EU/OECD recognized data observatories, and almost all of them are managed by a different operator). This layer is already monetized, and we have proven success. Our unique advantage is a combination of legal and technological skills: understanding legally open data, web 3.0, and data modeling, and the ability to participate in the open-source statistical/scientific software creator community.
We create open-source software applications that fuel our data observatories with unprocessed, open, linked data. We create software for the R statistical environment, which is used in official statistics and in many business and academic organizations. The production of R software components is a competitive field, but we believe that our position is strong: the vast majority of R packages are lightly serviced or not serviced at all because of the lack of financing.
We provide bespoke analytics solutions to our institutional partners in our data observatories. Such bespoke solutions iterate over our existing software components, helping us design better applications within an ever-expanding ecosystem. Providing tailored data-science services would require a large organization without a clear focus. We provide these services on an ad-hoc basis only among institutional partners and users of our data observatories. In these circles, which are often prime clients, we face little or no competition because we are trusted partners and data and solution providers. This is a key to our revenue and market growth.
We develop high-value software-as-service applications that leverage our data observatory assets and our software solutions into novel, commercially valuable uses. Our applications are built around our family of open-source software and generalize our bespoke analytics solutions. We are in a late prototype phase where we already have some revenue and are trying to prepare for scaling up at the correct price with three of our applications. All of our applications are entering highly competitive market segments. We are building on our ‘unfair’ advantage that we are bundling our solutions with data that is not accessible to competitors, and we can test them in the protected ecosystems of our observatories.
Good to know
- FAIR Principles: improve the Findability, Accessibility, Interoperability, and Reuse of digital assets.
- DataCite: A persistent, standardized approach to access, identification, sharing, and re-use of datasets. This is our favored way of describing data for future use according to the FAIR principles. Many EU open science repositories will ask you to provide this documentation with your publications.
- Biblatex is a standard text file format used by citation engines, bibliography management tools, and scientific publication templates. (See, for example, the Overleaf Biblatex tutorial.)
- Dublin Core is an older international standard than DataCite, but the two standards greatly overlap. Dublin Core was originally developed by libraries. You may often need to fill out Dublin Core properties for publication.
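To give a feel for how DataCite and Dublin Core overlap, here is a minimal sketch that renames the six mandatory DataCite properties to their closest Dublin Core elements. It is a simplified illustration, not the full official crosswalk: real DataCite records are nested and richer than this flat dictionary, the function name is hypothetical, and every value in the record is an invented placeholder.

```python
# Placeholder record: every value below is invented for illustration only.
datacite_record = {
    "identifier": "10.0000/example-doi",
    "creator": "Curator, Example",
    "title": "Example dataset",
    "publisher": "Example Repository",
    "publicationYear": "2021",
    "resourceType": "Dataset",
}

# Simplified mapping from mandatory DataCite properties to Dublin Core elements.
DATACITE_TO_DC = {
    "identifier": "identifier",
    "creator": "creator",
    "title": "title",
    "publisher": "publisher",
    "publicationYear": "date",
    "resourceType": "type",
}


def to_dublin_core(record: dict) -> dict:
    """Rename the known DataCite keys to 'dc:'-prefixed Dublin Core keys."""
    return {
        f"dc:{DATACITE_TO_DC[key]}": value
        for key, value in record.items()
        if key in DATACITE_TO_DC
    }


if __name__ == "__main__":
    for element, value in to_dublin_core(datacite_record).items():
        print(element, "=", value)
```

Because the core fields line up this closely, a dataset described once for DataCite can usually be described for Dublin Core with little extra work.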
Watch Our 2-min Introduction
⚙️/ Subtitles/ 🇳🇱 🇬🇧 🇧🇦 🇨🇿 🇭🇺 🇩🇪 🇱🇹 🇫🇷 🇸🇰 🇪🇸 🇹🇷 + Catalan.