The State of Wikidata and Cultural Heritage: 10 Years In

Wiki Education is hosting webinars all of October to celebrate Wikidata’s 10th birthday. Below is a summary of our first event. Watch Tuesday’s webinar in full on YouTube. Sign up for our next three events here.

Never before has the world had a tool like Wikidata. The semantic database behind Wikipedia and virtual assistants like Siri and Alexa is only ten years old this month, and yet with almost 100 million unique items, it’s the biggest open database ever. Wiki Education’s “Wikidata Will” Kent gathered key players in the Wikidataverse to reflect on the last ten years and set our sights on the next ten. Kelly Doyle, the Open Knowledge Coordinator for the Smithsonian Institution; Andrew Lih, Wikimedian at Large with the Smithsonian Institution and Wikimedia strategist with the Metropolitan Museum of Art; and Lane Rasberry, Wikimedian in Residence at the University of Virginia’s Data Science Institute, discussed the “little database that could” (not so little anymore!).

Illustrated notes featuring our speakers by Dr. Jojo Karlin via Twitter. Rights reserved.

In our webinar (one of four this month celebrating Wikidata’s birthday), audience members joined us from libraries, universities, museums, galleries, and Wikimedia projects from all around the world. Kelly posed an important question to us: as knowledge professionals and stewards, what is our responsibility in building, curating, and tending to a database that reaches millions of people?

“We’ve really never had this opportunity,” said Andrew. “Folks from all different academic backgrounds, from different languages and cultures, can treat Wikidata’s taxonomy as a malleable lump of clay and try to converge on some version of consensus for how to model the world.” As Wikidata founder Denny Vrandecic and Wikidata Product Manager Lydia Pintscher have said, “Wikidata is an ontological playground.”

This playground is becoming more and more embedded in our online knowledge structures, connecting everyone to everything, everywhere. “Wikidata is the portal to the linked open web,” said Lane. “As soon as content gets into Wikidata, it reaches huge audiences around the world. Big tech companies index it. They start sending it in every direction. As does anyone else who wants access to a free and open database. Anyone can copy this stuff; anyone can recirculate it.”

Data science is still a developing field, and Wikidata is no different. As Andrew mentioned, it’s this malleability that makes the open repository so powerful. “If you get tapped into Wikidata, you get tapped into an ethical network,” Lane added. Even with its gaps and inaccuracies, there’s nothing else like it. “Who’s doing better at this?” Lane asked. “Who else has convened the global community to get together and have conversations about this? There is no ideal data set out there, but where are you going to find one better?”

Sure, we have a long way to go before we have the perfect repository. It will never exist, as Lane pointed out. But the radical beauty of Wikidata is how the community goes about striving for it anyway. As Will said, it’s the humanity inherent in Wikidata’s structure and culture that makes it different from other data repositories.

Even so, attempting to model the world through consensus is messy. “As anyone who dives into Wikidata knows, we’ve got a lot of inconsistencies, missing parts,” Andrew pointed out. “But boy, we’ve never had this opportunity before to try to do it collectively and collaboratively. When it works, it really works in ways that nothing else can. I think that’s one of the miracles of Wikidata.”

Becoming a Wikidata contributor brings you into a community that grapples with data ethics every day. The community, which spans countries and languages, discusses issues and precedents with transparency and openness. As problems appear, the community is designed to chew through them together. This is how Wikidata has come so far in a short time.

Will, our host, shared his own perspective as Wiki Education’s Wikidata Program Manager. Knowledge institutions, he suggested, are actually missing out if they’re not participating. “In my capacity, I teach a lot of courses and we work with a lot of professional institutions, and it might sound simplistic, but representation is huge. If you’re not on Wikidata, you can’t be linked to all these other things. So being more deliberate about what’s there versus what’s not is actually pretty radical. And being more thorough and accurate with all the data has a huge impact.” Knowledge institutions like the Met and MoMA consider it the authoritative place to disambiguate data. Their webpages feature Wikidata Q numbers now, rather than traditional powerhouses like Getty, because Wikidata is the biggest arts database out there. “The good news is that it wasn’t even hard to convince the Met of that,” said Andrew. “Now it’s just a matter of implementation.”

Kelly stressed that working on Wikidata is an efficient way for a single person or team to start a ripple effect in the informational stratosphere, especially since Wikidata is a semantic database. “For institutions like the Smithsonian or the Met who want to batch upload into Wikidata, that data can be read in over 200 languages with just one person doing the work,” Kelly shared. “Multilingual collaboration is real,” Andrew added. “It’s not just a theory. It actually happens with Wikidata. And it happens every second of every day.” “And it’s then impacting those language Wikipedias,” Kelly continued. “Especially in the gender gap space, where I primarily work, the question is why would we host an edit-a-thon if some of this content might be taken down or is not considered notable enough. We’re going to do all this research and it might not be able to stay on Wikipedia. But this is a great pivot to Wikidata because we can batch upload these lists of names and all of the biographical information behind it and have that in Wikidata because the notability threshold is lower. That’s really significant because we can then use what we put in Wikidata to build a case for later Wikipedia article creation.”

Wikidata and Wikipedia editing isn’t just beneficial to institutions. It’s a skillset that is becoming more relevant in the spheres of knowledge curation, creation, and archiving. “Wiki skills are professional skills,” Will chimed in. “For a lot of you attending this webinar, you do things that other people don’t do in your line of work. And that’s an asset.” One of Wiki Education’s goals with this Wikidata Speaker Series is to share innovative ways professionals are accomplishing their goals through Wikidata and, hopefully, to inspire others to join this community as it influences more and more of the content people get on the internet.

And its impact will only continue to grow. “It’s important to remind folks how crucial Wikidata is to the fabric of knowledge now,” said Andrew. “Wikidata is being massively used in AI now for training, for trying to understand the world, for better or worse. It can be a little scary to think that they’re depending on Wikidata for the future of humanity. But it is the best assembly of human knowledge so far.”

So what’s next? The internet looks incredibly different now than it did ten years ago, and it will continue to adapt to meet people’s information needs. “Wikidata, Wikipedia, and the ecosystem gets a billion unique visitors a year,” Lane pointed out. “Big tech is doing some things in Wikidata. I’d like to counterbalance that with more museums and more universities getting involved.” That way, we can ensure a diverse group of experts will shape this ontological playground and share the best possible knowledge billions of times.

“When you add all of this together: all this attention on Wikidata, how Wikidata handles the social and ethical aspects of data, and all the data sets we can get from traditional and conventional resources, then you get absolute magic,” Lane continued. “You put all this in Wikidata, it mixes together, and you get new creative data, remixed data, things that would be unthinkable to create in any other way. It can only happen if you have everybody in the world, community representatives from all these institutions socializing in Wikipedia and Wikidata, remixing this, and then spreading it out. That’s the big cookie.”

Want to be a part of the big cookie? If you’d like Wikidata training, the Wikidata Institute has three upcoming training courses starting in November, January, and March. Consider also signing up for our next three webinar events celebrating Wikidata’s birthday throughout October.

Watch Tuesday’s webinar in full on our YouTube channel.

Thumbnail image by Matt Britt CC BY 2.5, via Wikimedia Commons.

