Gina Solares, MLIS, is a Librarian at the University of San Francisco. Her work is focused on cataloging and metadata projects such as special collections cataloging and data cleanup.
I recently had the opportunity to participate in the March 2021 Wikidata Institute, a Wiki Education course designed to increase engagement with Wikidata. The course introduces participants to Wikidata structure, practice, and community. Participants are encouraged to evaluate, edit, and create items in Wikidata, participate in the Wikidata community, and develop projects related to their area of interest.
I started editing Wikidata in 2020, my interest sparked by participation in the Program for Cooperative Cataloging’s Wikidata Pilot, a library-centric initiative to explore Wikidata as an identity management hub. So before taking the Wikidata Institute course, I had some basic knowledge of items and statements in Wikidata. I was drawn to the course for many reasons, but one of the goals of this course especially resonated with me: “Our course can help you: … Develop an equitable and inclusive model for linked data.”
So as the course progressed and I learned more about items, data models, WikiProjects, and querying Wikidata, I kept thinking about what it meant to develop equitable and inclusive models for linked data. Inspired by the work of WikiProject Black Lives Matter, I wanted to look at how Wikidata items represented events connected to the Black Lives Matter movement.
Finding items to edit
I started by looking at the Wikidata item that represents Black Lives Matter (Q19600530). I noticed that the founded by (P112) statements in the Black Lives Matter item did not include Alicia Garza (Q19609542), Patrisse Khan-Cullors (Q20090524), or Opal Tometi (Q19885291) as founders, so I added them, provided references, and marked them as having a preferred rank. I could see the effect of this change in the graph view of this item, which now highlights the three primary founders.
After making that small edit, I wanted to explore a larger set of items related to Black Lives Matter. One of the tools that we discussed in the course was Scholia, “a service that creates visual scholarly profiles for topic, people, organizations, species, chemicals, etc using bibliographic and other information in Wikidata.” As an academic librarian, I am interested in metadata for scholarly resources, so I searched for the topic of Black Lives Matter on Scholia. The only items and authors returned in the Scholia display were related to the Wikidata item for the BREATHE Act (Q99372635). There were scholarly articles related to Black Lives Matter, but the queries on the Scholia page were looking for a main subject (P921) statement so they weren’t included in the display.
So, I added main subject (P921) statements to article items and used a tool called Author Disambiguator to link author and article items in Wikidata. The Scholia display for Topic: Black Lives Matter now shows 153 scholarly articles, a co-occurring topics graph, 119 venues publishing works about the topic, and other views of the data drawn from Wikidata items.
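To make the role of main subject (P921) concrete, here is a minimal sketch of the kind of lookup that depends on it. This is not Scholia's actual query; it is a simplified illustration. The function name `main_subject_query` and the `limit` parameter are hypothetical, and the sketch only builds the SPARQL text rather than sending it anywhere.

```python
import re


def main_subject_query(topic_qid: str, limit: int = 200) -> str:
    """Build a SPARQL query for items whose main subject (P921) is topic_qid.

    Hypothetical helper for illustration; it is not taken from Scholia.
    """
    # Accept only well-formed item identifiers such as "Q19600530".
    if not re.fullmatch(r"Q[1-9]\d*", topic_qid):
        raise ValueError(f"not a Wikidata item ID: {topic_qid!r}")
    return (
        "SELECT ?work ?workLabel WHERE {\n"
        f"  ?work wdt:P921 wd:{topic_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {limit}"
    )


# Q19600530 is the Black Lives Matter item mentioned above.
print(main_subject_query("Q19600530"))
```

The printed text can be pasted into the Wikidata Query Service. An article item that lacks a main subject (P921) statement pointing at the topic will not match the triple pattern, which is why such articles were missing from the display.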
After seeing how these small edits enhanced the relationships between Wikidata items and more fully represented the contributions of participants and scholars, I was eager to find more ways to contribute.
Item evaluation and looking for the gaps
In 2017, Marlon Twyman, Brian C. Keegan, and Aaron Shaw conducted a study to “analyze participation and attention to topics connected with the Black Lives Matter movement in the English language version of Wikipedia between 2014 and 2016.” They published their findings in the ACM Conference on Computer Supported Cooperative Work and Social Computing as ‘Black lives matter in Wikipedia: Collaboration and collective memory around online Social movements.’ In that article, they wrote that “social computing systems increasingly function as spaces of collective knowledge production, sense-making, and commemoration.”
The authors analyzed Wikipedia articles that described seminal events in the Black Lives Matter movement, selected from the Wikipedia Template:Black_Lives_Matter:
Shooting of Michael Brown
Shooting of Trayvon Martin
Shooting of Oscar Grant
Charleston church shooting
Black Lives Matter
Death of Eric Garner
Death of Freddie Gray
2015 Baltimore protests
Death of Sandra Bland
Their article sparked my curiosity about the corresponding Wikidata items. So I made a list of the 10 Wikidata items that correspond to the authors’ list and added 11 Wikidata items representing significant events since 2016. Then I looked at the number of statements and sitelinks related to each Wikidata item. I also used the recently released “Item Quality Evaluator” tool to find the Weighted ORES Score for each item. The ORES score ranks item quality based on “the number of statements, the number of labels in different languages and the percentage of referenced statements.”
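The raw counts named in that description can be tallied from an item's JSON. The sketch below does not compute the ORES score itself; it only counts the inputs the quote mentions. It assumes the entity layout in which `claims` maps each property to a list of statements and a referenced statement carries a non-empty `references` list. The function name `summarize_item` and the sample item are made up for illustration and are not real Wikidata data.

```python
def summarize_item(entity: dict) -> dict:
    """Count statements, sitelinks, labels, and the share of referenced statements."""
    claims = entity.get("claims", {})
    # Flatten the per-property lists into one list of statements.
    statements = [s for group in claims.values() for s in group]
    referenced = sum(1 for s in statements if s.get("references"))
    total = len(statements)
    return {
        "statements": total,
        "sitelinks": len(entity.get("sitelinks", {})),
        "labels": len(entity.get("labels", {})),
        "referenced_share": referenced / total if total else 0.0,
    }


# Hand-made sample for illustration only; not a real Wikidata item.
sample = {
    "labels": {
        "en": {"language": "en", "value": "example event"},
        "es": {"language": "es", "value": "evento de ejemplo"},
    },
    "sitelinks": {"enwiki": {"site": "enwiki", "title": "Example event"}},
    "claims": {
        "P31": [
            {"rank": "normal", "references": [{"snaks": {}}]},  # referenced
            {"rank": "normal"},  # unreferenced
        ],
        "P585": [{"rank": "normal"}],  # unreferenced
    },
}

summary = summarize_item(sample)
print(summary)
```

Running the same function over each item in a list gives a quick side-by-side view of which items have few statements or few references.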
I expected that items representing the most well-known or earliest events would have the most statements. However, I noticed that the Wikidata items representing the shooting of Jacob Blake and the killing of Daunte Wright have more statements than the items representing the earlier killing of Ahmaud Arbery and death of Sandra Bland.
While looking at these Wikidata items, I also noticed the variance in the words used in the English label field and the values selected for the instance of (P31) property.
The words that we use to describe people and events matter. Those words shape our ways of thinking and can impact our behavior and actions. Wikipedia talk pages often contain conversations and debate about the language used to describe these events. Stephen Harrison summarized some of this debate in his June 9, 2020 article for Slate entitled ‘How Wikipedia Became a Battleground for Racial Justice: Contributors are rethinking what Wikipedia’s commitment to neutrality actually means.’
The conversations and consensus around language on Wikipedia have a direct effect on Wikidata item labels. The Wikidata help page for Labels states: “In many cases, the best label for an item will either be the title of the corresponding page on a Wikimedia project or a variation of that title.” In the short list above, 18 of the Wikidata item labels are identical to their corresponding Wikipedia article titles.
While the Wikidata label is a free text field, the instance of (P31) statement requires another Wikidata item as a value. The instance of (P31) value can be seen as a method of categorizing Wikidata items and putting them within a particular context of relationships. The instance of (P31) values from the short list above raise some questions:
Why is the Killing of Rayshard Brooks (Q96263484) represented as an instance of an occurrence (Q1190554) and a death (Q4) but the killing of Atatiana Jefferson (Q71046962) is represented as an instance of a homicide (Q149086) and a shooting (Q2252077)?
Why is the 2014 Ferguson unrest (Q17560945) represented as an instance of a civil disorder (Q686984) while the Kenosha unrest (Q98645805) is represented as an instance of a riot (Q124757) and a protest (Q273120)?
This variance makes querying and retrieving reliable results from Wikidata more complex. As the data currently stands, it is more complicated to construct a query that would retrieve the Wikidata items representing the killings of Rayshard Brooks and Atatiana Jefferson, despite some similar characteristics of those events. This is partly because Killing of Rayshard Brooks (Q96263484) has only 6 statements, but the differences in instance of (P31) statements also contribute to the problem.
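A small sketch can show why this matters for querying. It uses only the instance of (P31) values quoted above for the two items; the helper names `items_matching` and `instance_of_query` are hypothetical. The first helper checks, locally, which of the two items a given set of classes would catch. The second builds one possible SPARQL pattern that lists several classes in a VALUES clause.

```python
from typing import Iterable, Set

# instance of (P31) values as described above for the two items.
INSTANCE_OF = {
    "Q96263484": {"Q1190554", "Q4"},       # Killing of Rayshard Brooks
    "Q71046962": {"Q149086", "Q2252077"},  # killing of Atatiana Jefferson
}


def items_matching(classes: Iterable[str]) -> Set[str]:
    """Return the items above that have at least one of the given P31 classes."""
    wanted = set(classes)
    return {item for item, p31 in INSTANCE_OF.items() if p31 & wanted}


def instance_of_query(classes: Iterable[str]) -> str:
    """Build a SPARQL query matching items whose P31 is any of the given classes."""
    values = " ".join(f"wd:{qid}" for qid in sorted(classes))
    return (
        "SELECT DISTINCT ?item WHERE {\n"
        f"  VALUES ?class {{ {values} }}\n"
        "  ?item wdt:P31 ?class .\n"
        "}"
    )


# Asking only for homicide (Q149086) finds one of the two items.
print(items_matching({"Q149086"}))
# Both items are found only when a class from each item is listed.
print(items_matching({"Q149086", "Q4"}))
print(instance_of_query({"Q149086", "Q4"}))
```

The two items share no instance of (P31) value, so no single class retrieves both. Listing several classes works, but classes as broad as occurrence (Q1190554) or death (Q4) would also match many unrelated items, so a real query would likely need further constraints.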
Shortly after the killing of George Floyd, there was a conversation on Wikidata:Project chat about how to represent the fact of his death in the Wikidata item for George Floyd (Q95677819). Wikidata users noted the difficulty of querying Wikidata to retrieve a list of people “who were killed by police.” That conversation led to a June 2020 property proposal for context of death with these examples of potential use:
Example 1: George Floyd (Q95677819) → US police violence (Q96442197)
Example 2: Tamir Rice (Q57339968) → US police violence (Q96442197)
As of April 2021, that property proposal is still marked as “Under discussion” with 4 Support, 2 Strong Oppose, 2 Neutral, and 5 Comments. There are some suggestions about potential alternatives, but seemingly no consensus on how this concept should be modeled.
So what’s next?
Through this brief analysis, I can see many opportunities to contribute to Wikidata. Increasing the number of referenced statements in some of the above listed items would be a good first step. Thanks to the Wikidata Institute, I now know about Wikidata’s Project Chat and the process for proposing new Wikidata properties — elements of Wikidata that do offer the possibility of contributing to an “equitable and inclusive model for linked data.” I look forward to doing this work in the Wikidata community and taking what I’ve learned from working in Wikidata back into my library cataloging and metadata work.
Twyman, M., Keegan, B. C., & Shaw, A. (2017). Black lives matter in Wikipedia: Collaboration and collective memory around online Social movements. In CSCW 2017 – Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (pp. 1400-1412). Association for Computing Machinery. https://doi.org/10.1145/2998181.2998232
Harrison, S. (2020, June 9). How Wikipedia Became a Battleground for Racial Justice: Contributors are rethinking what Wikipedia’s commitment to neutrality actually means. Slate. https://slate.com/technology/2020/06/wikipedia-george-floyd-neutrality.html
With thanks to the University of San Francisco Faculty Association for the funding to attend the Wikidata Institute.