Elevating diversity in academic research software development

One of the best ways to learn about Wikidata is through examples — examples of property usage, query examples, and examples of well-modeled items. When we started our Wikidata program back in 2019, there were far fewer items — and even fewer well-developed items. Even though the early examples of well-developed items are technically excellent, they are not diverse or representative of the world. In the original version of our training slides, we replicated the well-used examples from the Wikidata community. Now, we’ve updated our training slide examples and the items we use in our outreach to emphasize diversity and represent a broader community.

For this project we partnered with Lane Rasberry, Wikidatan-in-Residence at the University of Virginia (UVA), who had recently received funding from the Sloan Foundation to help bring more data about academic research software to Wikidata. Collaborating on this project will allow UVA to improve data on Scholia (a platform that displays academic profiles based on what’s in Wikidata) and it will help diversify the examples Wiki Education uses in outreach and training materials.

Will Kent showing updated examples in our Wikidata course

This project has been an excellent way of elevating some urgent and important research from individuals whose work has been underrepresented on Wikidata and beyond. This is a continuation of our commitment to equity across Wikimedia projects. We recognize that continued engagement is essential to effect community-wide changes. Our hope is that these examples can lead to bigger changes and create space for more critical thought and engagement about representation on Wikidata.

We set out to identify academic research software contributors from North America who represent historically marginalized communities in this space: Native Americans, women, and people of the African diaspora. Wiki Education was ideally positioned to identify potential research software developers who fit these descriptions; through our Wikipedia Student Program and Scholars & Scientists Program, we are connected to thousands of academics across the United States and Canada. A few emails and video calls later, we had developed a pool of individuals who met these specific criteria. The next step was to locate sources, publications, unique identifiers, and any other data we could use to update (or create) their items on Wikidata, enhance related items, and link their work to other entities on Wikidata.

One urgent research project happening right now comes from Dr. Ben Frey, a linguistics professor who is a member of the Eastern Band of Cherokee Indians. He has been conducting research to preserve and teach the Cherokee language, which has around 2,500 speakers left at the time of publication. He has been working with large datasets of Cherokee and English sentences to improve machine learning with the Cherokee language. His hope is that this work will enable faster machine translation, among other uses. Before this work, neither he nor the research software he’s developing had an item on Wikidata. Not only does he have one now, but you can see a set of his publications here.

Another project we decided to focus on is Openscapes, led by Julia Stewart Lowndes. Openscapes endeavors to mentor researchers in open data practices (check out some of their cool work here). In working on these items, we performed merges, created new items, and linked to their GitHub repositories, which were previously unlinked. Now that these related items are developed, Wikidata users will be able to discover this work through queries or by visiting Wikidata itself.

We also spent time working on several other researchers’ Wikidata items and items representing their research, and on linking out to GitHub repositories, ORCID scholarly communication records, and additional identifier data. Although we can’t explain all of the edits here, the general idea is the same: having research better represented on Wikidata allows for better analysis of this data, more re-use of this data, and the opportunity to discover new insights about this data. As with all of our examples, we are hopeful that drawing more attention to this kind of research will inspire others to think of work that belongs on Wikidata but isn’t there yet.

We believe that using a diverse set of examples will draw more attention to the systemic bias that pervades Wikidata and the Wikidata community. Elevating the profile of these accomplished researchers (and their research) is just the beginning of what we need to do. We will continue to update these items, related items, and research papers to ensure the most information possible is on Wikidata. We hope our course participants and other members of the Wikidata community will also consider improving representation across all of Wikidata.

To learn more about Wikidata, follow this link and explore our courses.

