An Intelligent System: What I learned through taking an introductory Wikidata course

Anne-Christine Hoff is an associate professor of English at Jarvis Christian University.

Back in January of this year, I took a three-week, six-hour introductory course on Wikidata through the nonprofit Wiki Education. Before the course started, I knew little to nothing about Wikidata, and I had several preconceived notions about the database and its uses.

My first impression of Wikidata was that AI bots ran the system, sweeping Wikipedia pages and using that information to create data sets under various pre-defined headings. In my conception, Wikidata’s information updated only when editors changed or added Wikipedia pages. I thought of Wikidata as a closed system, and I assumed the point of the course would be to learn how to run queries so that we students could access the data collected through Wikipedia.

I remember asking my Wiki Education instructor about the role of AI in Wikidata, and he very pointedly responded that bots cannot program anything on their own. Instead, humans program Wikidata, and through this programming capability, both humans and machines can read and edit the system.

Anne-Christine Hoff. Image courtesy Anne-Christine Hoff, all rights reserved.

Wired writer Tom Simonite provided an example of this phenomenon in his article “Inside the Alexa-Friendly World of Wikidata”:

“Some information is piped in automatically from other databases, as when biologists backed by the National Institutes of Health unleashed Wikidata bots to add details of all human and mouse genes and proteins.” 

The same article discusses a further example, published in a 2018 Amazon paper, of Wikidata teaching Alexa to recognize the pronunciation of song titles in different languages.

Both of these examples illustrate another of my misconceptions about Wikidata. As mentioned before, I thought the system was centralized and, apart from periodic updates, static. I did not grasp the difference between data collected through documents (like Wikipedia) and a database with an open, flexible, relational communication system.

What I discovered was vastly more interesting and complex than what I had imagined. Wikidata is not a bot-driven system for collecting data from Wikipedia entries; it is a communication system that can take in data in multiple languages. An editor in Beijing may enter information in Chinese, and that data is immediately available in all the languages Wikidata supports. This feature lets the repository structure itself as users around the world add localized data.
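To make that concrete, here is a small sketch (not part of the course materials) of how a program might read those multilingual labels back out. It uses Python’s requests library and Wikidata’s public wbgetentities API action; the item Q42 (Douglas Adams) and the language codes are just arbitrary examples.

```python
import requests

# Fetch the labels attached to one Wikidata item (Q42, Douglas Adams,
# chosen only as a well-known example) in several languages via the
# public wbgetentities API action.
resp = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={
        "action": "wbgetentities",
        "ids": "Q42",
        "props": "labels",
        "languages": "en|zh|es|ar",
        "format": "json",
    },
)
labels = resp.json()["entities"]["Q42"]["labels"]
for code, label in labels.items():
    print(code, "->", label["value"])
```

Whichever language an editor originally used, any of these label versions can be requested in the same call.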

In 2013, Wikidata’s founder, Denny Vrandečić, wrote that a database like Wikidata has an advantage over documents because “the information is stored centrally from where it can be accessed and reused independently and simultaneously by multiple websites without duplication.” In his article “The Rise of Wikidata,” Vrandečić made clear that Wikidata is not just a database for Wikipedia and other Wikimedia projects. It can also be used “for many different services and applications, from reusing identifiers to facilitate data integration, providing labels for multilingual maps and services, to intelligent agents answering queries and using background knowledge” (Vrandečić, 2013, p. 90).

This raises the question of how Wikidata intelligently reads the information stored on its platform. My first misconception was my belief that Wikidata was a flat collection of data based on Wikipedia’s entries. What I didn’t understand was that the crux of Wikidata’s intelligence comes from its ability to understand data relationally. As noted in “Familiar Wikidata: The Case for Building a Data Source We Can Trust,” Wikidata’s semantic structure is based on a set of rules known as the Wikidata ontology. According to this ontology, a person can stand in a “born in” relationship to a place, but a place cannot stand in a “born in” relationship to other entities. For example, Marie Curie can be born in Warsaw, but Warsaw cannot be born in Marie Curie.
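For readers who want to see that relational grammar in action, here is a minimal sketch of a query against Wikidata’s public SPARQL endpoint. It assumes the standard identifiers Q7186 (Marie Curie) and P19 (place of birth); notice that the relationship runs from the person to the place, never the other way around.

```python
import requests

# Ask the public SPARQL endpoint where Marie Curie (Q7186) was born,
# using the "place of birth" property (P19).
QUERY = """
SELECT ?placeLabel WHERE {
  wd:Q7186 wdt:P19 ?place .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
"""
resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "wikidata-intro-example/0.1"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["placeLabel"]["value"])  # expected: Warsaw
```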

This knowledge-based structure is the key to understanding how Wikidata’s identifiers connect to one another. In Wikidata’s logical grammar, two entities and the relationship between them form a “triple.” It is this triple structure that creates the structural metadata that allows for intelligent mapping. A fourth item, a citation, turns each triple into a “quad.” This fourth item is crucial to Wikidata’s ability to arrange data relationally: it makes clear where the data in each triple originates and allows statements to be weighed by the citations that support them.
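As an illustration, the sketch below asks the same API for Marie Curie’s statements (“claims”) and counts the references attached to each place-of-birth statement. It assumes the JSON shape documented for wbgetentities with props=claims.

```python
import requests

# Inspect the statements on Marie Curie's item (Q7186) and report how
# many references back each place-of-birth (P19) statement.
resp = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={
        "action": "wbgetentities",
        "ids": "Q7186",
        "props": "claims",
        "format": "json",
    },
)
claims = resp.json()["entities"]["Q7186"]["claims"]
for statement in claims.get("P19", []):
    place = statement["mainsnak"]["datavalue"]["value"]["id"]  # e.g. "Q270"
    refs = statement.get("references", [])
    print(f"born in {place}: {len(refs)} reference(s) recorded")
```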

Having access to the Wiki Education dashboard, I was able to see the edits of the other students taking the class. One student, whom I’ll call Miguel, was adding missing information about Uruguayan writers from the Biblioteca Nacional de Uruguay’s catalog. As of this writing, he has completed more than 500 edits on this and other subjects, such as the classification of the word “anathema” as a religious concept. Two Dutch archivists were adding material on Dutch puppet theater companies in Amsterdam and Dutch women in politics. An Irish student was updating information on a twelfth-century Irish vellum manuscript and an English translation of the Old Irish Táin Bó Cúailnge by Thomas Kinsella.

What I saw when I perused the subjects of these edits was exactly what the article “Much more than a mere technology” describes: Wikidata is capable of linking local metadata with a network of global metadata. This capability makes Wikidata an attractive option for libraries wanting to “improve the global reach and access of their unique and prominent collectors and scholars” (Tharani, 2021).
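A small example of that linking: the query below, again only a sketch, asks which VIAF authority identifier (property P214) is attached to Marie Curie’s item. This is the same kind of identifier a library catalog can use to join its local authority records to the global graph.

```python
import requests

# Retrieve the VIAF authority identifier (P214) recorded on Marie
# Curie's item (Q7186) from the public SPARQL endpoint.
QUERY = "SELECT ?viaf WHERE { wd:Q7186 wdt:P214 ?viaf . }"
resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "wikidata-intro-example/0.1"},
)
for row in resp.json()["results"]["bindings"]:
    print("VIAF ID:", row["viaf"]["value"])
```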

Multiple sources contend that Wikidata is, in fact, a centralized storage database, and yet the intelligence of Wikidata makes this description ring hollow. It is not a database like the old databases for documents. Its ontological structure allows it to understand the syntax of data and arrange that information relationally into comprehensible language. As in the example of the biologists backed by the National Institutes of Health, who programmed Wikidata bots to add details of human and mouse genes and proteins, it can also be programmed to exchange data with external databases. Its linking capabilities make it possible for librarians and archivists from around the world to connect their metadata to a network of global metadata. Its multilingual abilities have a similar decentralizing effect, allowing users to create structured knowledge about their own cultures, histories, and literature in their own languages.

If you are interested in taking a Wikidata course, visit Wiki Education’s course offerings page to get started.


Explore the upcoming Wikidata Institute, Wikidata Salon, and other opportunities to engage with Wikidata at learn.wikiedu.org.
