Wiki Education is a 501(c)(3) nonprofit that envisions a world in which students, scholars, scientists, archivists, librarians, and other members of academic and cultural institutions are actively engaged in sharing their knowledge with the general public through Wikipedia, Wikidata, and other open collaboration projects on the web.
Since its inception in 2013, Wiki Education has supported instructors at more than 500 universities who have incorporated creating new content for Wikipedia into their curriculum. Students gain 21st-century skills, including media literacy, writing and research development, and critical thinking, while their contributions fill content gaps on Wikipedia. These efforts have resulted in twice as much new content as was in the last print edition of Encyclopædia Britannica. New editors recruited through Wiki Education account for 19% of new contributors to the English Wikipedia.
Since 2018, Wiki Education has been training scholars and scientists from a diverse range of institutions to contribute their subject-matter expertise to high-profile Wikipedia pages that reach millions of readers. It collaborates with institutions such as the National Archives and the Smithsonian, as well as with a range of academic associations, on projects aimed at better informing the general public. As part of a fee-for-service model, the organization has expanded its course offerings to include trainings about Wikidata, the open knowledge database that powers virtual assistants like Alexa and Siri.
The team at Wiki Education believes that the world is a better place when projects like Wikipedia and Wikidata are enriched by new content and the public has access to free knowledge that can be relied upon. This is a data-driven, outcomes-focused, high-output environment with strategy and intentionality driving everything they do.
- Role description
As the Data Scientist for Wiki Education’s “Visualizing Impact” project, you’ll be responsible for creating machine learning models for identifying Wikipedia and Wikidata content within particular topic areas. Using your machine learning expertise, along with the full-text corpus of Wikipedia articles and other open data, you’ll build a systematic, repeatable, maintainable way to answer questions like: “What are all of the Wikipedia articles related to Asian-American journalists?” and “What are all the Wikipedia articles about the environment in Texas?” You’ll work closely with our Chief Technology Officer and others to build on the outputs of your models to create visualizations of how the quality and comprehensiveness of a topic change over time, and to demonstrate the impact of Wiki Education’s programs on those topic areas. Like all software developed at Wiki Education, the code you write will be free and open source; together we’ll strive to create the kind of maintainable codebase that is fun to work on.
This is a 12-month, project-based contract position.
- Demonstrated ability to solve complex data problems
- Excellent written and verbal communication
- Experience with natural language processing
- Experience building text classification models
- Working knowledge of one or more programming languages (ideally Python)
- Nice to have: Experience researching and/or participating in Wikipedia and/or other online communities
- Compensation and workplace
The salary range for this job is $110,000-130,000 (depending on experience). This is a full-time, 40-hour/week position with benefits, and lasts one year. We are a remote organization, but we expect you to be based in the United States. This position reports to the Chief Technology Officer.
- Fully paid medical, dental, and vision insurance premiums for you and your family
- Employer-funded Health Savings Account (HSA)
- 401(k) retirement plan with matched contributions of 4% of annual salary with immediate vesting
- Flexible and generous vacation policy (11 U.S. holidays + 2 floating holidays + 18 vacation days) and sick leave policy (12 days)
- Staff attend a week-long All Staff meeting twice a year in person, typically in San Francisco (virtual option available during the COVID-19 pandemic)
- Project background
Through Wiki Education’s Wikipedia Student Program, we support hundreds of college and university instructors and thousands of students each semester. These instructors and students improve Wikipedia articles in their areas of study, resulting in thousands of expanded, improved or newly created articles. And through our Wiki Scholars & Scientists courses, we train a diverse array of knowledge professionals to apply their expertise to Wikipedia and Wikidata.
We know that the high-quality Wikipedia content that comes out of our programs has a huge impact — on the sciences, the humanities, politics, and virtually every field of human endeavor. Randomized, controlled experiments that focus on improvements to Wikipedia have shown that knowledge from Wikipedia makes its way into the scholarly research process, the legal system, and the lives and decision-making of the general public. However, because our work involves such a wide swath of content, it’s challenging to convey the big picture of how Wikipedia has improved over time in particular topic areas (thanks, in no small part, to the work that Wiki Education is doing). Assembling the evidence to tell that story is the project before us.
Institutional partners and sponsors who fund Scholars & Scientists courses want to understand the big-picture impact of our work together. Yet so far we lack a compelling way of visualizing the effects that the engagement of subject-matter experts has on a given topic area on Wikipedia. Questions our partners have asked in the past include:
- What is the quality of Wikipedia’s content for <our area of interest>?
- How has the content about <our area of interest> changed over time?
- How much of that change are we responsible for?
For example, how much Wikipedia and Wikidata content is there about climate change and its negative effects? What does the quality distribution of that content look like? How has that content changed over time, and how much have Wiki Education’s programs contributed to that change? While the Wikipedia community and the Wikimedia Foundation have developed ways of associating article content with one specific topic taxonomy, these topic divisions reflect the structure of Wikipedia’s editing community. They often don’t line up well with the focus area of an institutional partner or sponsor, and can’t easily be used to answer a prospective partner’s or funder’s questions.
As part of this project, we will build a data pipeline to define and quantify coverage for an arbitrary topic area, and build visualizations to show how much of that coverage came from Wiki Education’s Scholars & Scientists Program. We will then be able to use these visualizations to tell the story of how much difference it makes to participate in, or to fund, our programs in a particular topic area. Many other organizations, groups, and individuals within the global Wikimedia community also work on improving Wikipedia, Wikidata, and their sister projects. We expect this system to be widely useful for others working on or studying Wikipedia as well.
This project will be overseen by our Chief Technology Officer. The Data Scientist will be the primary source of machine learning expertise on the project, and will be responsible for planning and building a system for identifying the set of Wikipedia content that is about a given topic at a given point in time. The Data Scientist will work closely with a project-based Web Developer/Designer to build a usable and robust application around the machine learning system.
Please send a cover letter and resume to email@example.com, with “Data Scientist” in the subject line.