How “Join the open data movement: a beginner Wikidata course” changed our view and use of collection data

Digital Collections Associate Lisa Barrier and Digital Collections Manager Kathryn Gronsbell from Carnegie Hall explain what to expect when taking Wiki Education’s beginner’s Wikidata course and discuss their linked data plans for the future.

Introduction

Lisa Barrier and Kathryn Gronsbell.

We both started Join the open data movement: a beginner Wikidata course with limited Wikidata knowledge. While we understood the basic concept of Wikidata as an information source, we had minimal editing experience and even less familiarity with how Wikidata is created, maintained, and used. Throughout the course, we learned the technical skills needed to make edits, create items, and query items as well as the underlying concepts and community practices that best explain how Wikidata works and why it continues to expand. We went into the course expecting to receive an introduction to editing (and maybe querying) but came out of it with a new understanding of how to interpret, share, and grow Carnegie Hall’s archival collection and performance data.

Participant Background

Lisa (Digital Collections Associate, Carnegie Hall): I initially chose to take the course to become comfortable editing Wikidata items and to learn more about using and creating Wikidata queries. My main day-to-day activities at Carnegie Hall include: creating authority records in our performance history database for entities and venues related to collections; cataloging and monitoring asset metadata in the digital asset management system (DAMS); and working with members of other internal departments to successfully upload, tag, and find their assets. I hoped to be able to create new Wikidata items for underrepresented Carnegie Hall collection data as I did not know how to approach this seemingly overwhelming task. I did not understand the flexibility and community structure of Wikidata and thought that I would personally have to create perfect, complete items for each collection entity.

Prior to this Wiki Education course, I took an in-person Wikidata training course on creating items with data from the Metropolitan Museum of Art. The course introduced adding statements and references to newly created items, but primarily covered art-related data models and did not explain how to search for other properties and available statements. With the aid of this first course and the Wikidata understanding of my colleague Rob Hudson (Archives Manager at Carnegie Hall), the extent of my early experience with Wikidata included adding Carnegie Hall Agent IDs to items and referencing Wikidata Concept URIs in authority records created in the performance history database [which Rob set up to manifest in Carnegie Hall’s Linked Open Data (CH-LOD) as skos:exactMatch].

Kathryn (Digital Collections Manager, Carnegie Hall): I was excited to take this course with Lisa, and for the opportunity to learn more about Wikidata’s structure and communities. My aim was to apply that knowledge to help expand and improve Carnegie Hall’s performance history data as it relates to Digital Collections material. I’m responsible for Carnegie Hall’s Digital Collections – both the material and the digital asset management system (DAMS) where the collections are managed. We recently announced public access to a portion of the historic material at collections.carnegiehall.org and are excited to start modeling collection objects for inclusion in our linked open data.

Before starting this Wiki Education course, I had extremely limited experience in Wikidata. So little that my only contribution was done anonymously years ago – I corrected a statement that inaccurately assigned a person’s cause of death (P509) as a geographic location. I knew Wikidata had grown significantly in the past few years, and there was a lot of opportunity to participate. Lisa and our colleague Rob shared their experiences learning about and contributing to Wikidata, and I was excited to join in on the fun.

Course Takeaways

Our most significant takeaways were the exposure to community practices and how to find inspiration within existing data projects and groups. Our instructors, Will Kent and Ian Ramjohn, discussed the culture of page ownership and how there may be “primary” editors who will often engage on the Talk pages and lead decision-making about edits. On a meta-level, we were introduced to WikiProjects to explore where discussions take place and how to participate in those conversations. These projects are community-run, can include goals, chats, and data models, and vary in maturity and comprehensiveness. There are several WikiProjects related to archival collections, performing arts organizations, and other concepts that directly overlap with our daily work – we are excited to jump in to participate and maybe create our own. Class chats over Zoom touched upon active groups and recent conferences related to Wikidata, including: LD4 Linked Data for Libraries, WikidataCon, WikiCon North America, Wikimania, and the International Semantic Web Conference. We were introduced to showcase items (high-quality, well-developed examples), which enabled us to contribute more confidently to Wikidata. Benchmarks for data quality were a hot topic – we had an enlightening conversation about how bots contribute to Wikidata, what role a bot may have, and how to identify and correct inaccuracies that may arise from automated data creation.

Along with the class discussion, we found the following technical skills fundamental to our use of Wikidata. We learned how to:

  • Edit items by adding statements, references, and identifiers (and best practices for doing so);
  • View an item’s change log and edit history;
  • Set up notifications on watch items;
  • Use suggested property lists and Recoin to create more complete items;
  • Find and build queries with editable examples (and some great sample query lists);
  • Make edits, batch edits, and queries a little bit easier with tools such as Cradle, TABernacle, OpenRefine, and QuickStatements, and how to potentially use these tools to onboard new contributors.
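As a rough illustration of the query-building skill listed above, here is a minimal Python sketch that assembles a SPARQL query and a request URL for the public Wikidata Query Service. It is not taken from the course materials. The function names are hypothetical, and P509 (cause of death) is used only because it is the property mentioned earlier in this post. The sketch builds the query text and URL but does not send any request.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_items_with_property_query(property_id: str, limit: int = 10) -> str:
    """Return a SPARQL query listing items that have any value for a property.

    Hypothetical helper for illustration; property_id looks like "P509".
    """
    if not (property_id.startswith("P") and property_id[1:].isdigit()):
        raise ValueError(f"not a Wikidata property ID: {property_id!r}")
    return (
        "SELECT ?item ?itemLabel ?value WHERE {\n"
        f"  ?item wdt:{property_id} ?value .\n"
        # The label service fills in ?itemLabel with the English label.
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = build_items_with_property_query("P509", limit=5)
url = build_request_url(query)
print(query)
```

The printed query can be pasted into the query service's web editor and adjusted there, which mirrors the "editable examples" approach described in the list.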

We understood that increasing our experience with and exposure to Wikidata would help us plan for upcoming data projects. We can now work on continuing alignment between Wikidata and Carnegie Hall’s performance history data (CH-LOD) and create items for under-described or lesser known entities (including performers, composers, and artists) who may not be described in other datasets. We hope to engage with some of the existing WikiProjects around performing arts concepts and content, and potentially use WikiProjects as a space to model our public collection data.
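To make the item-creation idea concrete, the following Python sketch generates a small batch in what we understand to be the QuickStatements version 1 text format (tab-separated commands, with CREATE starting a new item and LAST referring to it). This is an assumption-laden sketch and not our production workflow: the function name is hypothetical, the label and description are placeholders and not real collection data, and the only statement added is "instance of" (P31) "human" (Q5).

```python
def quickstatements_for_new_person(label: str, description: str) -> str:
    """Return QuickStatements V1 commands that create one item for a person.

    Hypothetical helper for illustration. Text values are wrapped in double
    quotes; values that already contain a double quote are rejected so the
    sketch does not have to guess at escaping rules.
    """
    for text in (label, description):
        if '"' in text or "\t" in text or "\n" in text:
            raise ValueError(f"unsupported character in value: {text!r}")
    lines = [
        "CREATE",                         # start a new item
        f'LAST\tLen\t"{label}"',          # English label
        f'LAST\tDen\t"{description}"',    # English description
        "LAST\tP31\tQ5",                  # instance of: human
    ]
    return "\n".join(lines)


# Placeholder values, not real Carnegie Hall data.
batch = quickstatements_for_new_person("Example Performer", "placeholder description")
print(batch)
```

A batch like this would still need to be reviewed by a person, with references and identifiers added, before being run.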

Upcoming Data Projects at Carnegie Hall

The Carnegie Hall Archives is undertaking an exploratory project to understand how using Wikibase may help manage some of our data. Wikibase is the software that Wikidata runs on. Anyone can set up their own instance of Wikibase to house their data. Thanks to this course, we have a better grasp of what we can query and pull into a local Wikibase from Wikidata, and a better basis for understanding what might be useful to contribute back to Wikidata after a standalone Wikibase is established.

One collection that we think will most benefit from a Wikibase instance (and possible collaborative WikiProject) is our Tenants and Studios Collection. Currently, this collection data lives in a spreadsheet that lists individuals and groups that lived and/or worked in artist studios that no longer exist in Carnegie Hall’s current configuration. While this spreadsheet format is acceptable, we want to model the collection data semantically to increase usability and discover connections in the data that are not possible to find using a spreadsheet. Creating items for names and studios would allow more control over the structure and visibility of the information, as well as allow us to capture spatial and temporal variances over the years that are not easily described or captured in flat or relational data structures. Now that we understand what data is useful to push to Wikidata and which information should be kept as a local resource, we can better create and edit items for research and reference purposes. We ultimately envision a Wikibase instance for the Tenants and Studios Collection as an opportunity to combine Carnegie Hall’s history with the data and stories of external resources and academics.
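To illustrate what capturing spatial and temporal variance could look like, here is a minimal Python sketch in which each tenancy is its own record linking a tenant to a studio, qualified by start and end years, loosely in the spirit of a Wikidata statement with qualifiers. This is not our actual data model. The class and function names are hypothetical, and "Tenant A", "Studio 1", and the years are invented placeholders, not records from the collection.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tenancy:
    """One tenant-in-studio relationship, qualified by years (hypothetical model)."""
    tenant: str
    studio: str
    start_year: int
    end_year: Optional[int] = None  # None means the end year is not recorded

    def active_in(self, year: int) -> bool:
        """True if the tenancy covers the given year."""
        return self.start_year <= year and (
            self.end_year is None or year <= self.end_year
        )


def tenants_of(tenancies: List[Tenancy], studio: str, year: int) -> List[str]:
    """Who occupied a studio in a given year, sorted by name."""
    return sorted(
        {t.tenant for t in tenancies if t.studio == studio and t.active_in(year)}
    )


def studios_of(tenancies: List[Tenancy], tenant: str) -> List[str]:
    """Studios a tenant occupied, in chronological order of start year."""
    own = sorted((t for t in tenancies if t.tenant == tenant),
                 key=lambda t: t.start_year)
    return [t.studio for t in own]


# Invented placeholder records, for illustration only.
SAMPLE = [
    Tenancy("Tenant A", "Studio 1", 1920, 1930),
    Tenancy("Tenant A", "Studio 2", 1931, 1940),
    Tenancy("Tenant B", "Studio 1", 1928),
]

print(tenants_of(SAMPLE, "Studio 1", 1929))  # ['Tenant A', 'Tenant B']
print(studios_of(SAMPLE, "Tenant A"))        # ['Studio 1', 'Studio 2']
```

Because each relationship is a separate record, a tenant who moved between studios or a studio shared by several tenants needs no extra spreadsheet columns, only more records.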

Upcoming data projects, like the one described above, will be under our newly established Carnegie Hall Data Lab. The Data Lab is a learning space for Carnegie Hall to expand our understanding of information innovation through experiments with linked open data, semantic technologies, and data-driven strategies that leverage the resources of the Carnegie Hall Archives. Having the experience and exposure we received in this Wiki Education course allowed us to more confidently initiate and participate in Data Lab experiments.

We are grateful to our classmates for their participation and willingness to share, and to Will Kent and Ian Ramjohn for their guidance and insight throughout the course. Thank you, Wiki Education!


Registration for our upcoming Wikidata courses is open! New to linked data? Join the open data movement in our beginner’s course. Have more experience with linked data or Wikidata? Sign up for our intermediate course that focuses on possible applications. Or visit data.wikiedu.org for more information.


Thumbnail image by Lmbarrier, CC BY-SA 4.0, via Wikimedia Commons.
