Wikipedia is my favorite source of news. When news breaks, I trust Wikipedia to have an article contextualizing the event. Wikipedia is also my favorite way to catch up on Game of Thrones episodes I’ve missed. Wikipedia has its finger on the pulse of current events!
Sometimes I need to reference it for something more academic. I’ll need to know how grapefruit interacts with medications or why mixing bleach and ammonia doesn’t create a super cleaner. Wikipedia can tell me who died on Game of Thrones. But it also gives me insight into the world around me. The problem is that, all too often, the articles on the real world just aren’t as comprehensive as the ones on pop culture.
Wiki Ed is working to change that. Wiki Ed works with higher ed classrooms in the U.S. and Canada whose students contribute to Wikipedia. So it makes sense that students are adding a lot of academic content. But how much is “a lot of content,” exactly?
That’s what I set out to answer. But it’s a tricky question for several reasons.
Defining “academic” content
Before we can figure out how much academic content students create, we need to know what makes an article “academic.”
Unfortunately, there isn’t an “academic content” label on Wikipedia. Subject area isn’t definitive either. Super Bowl 50 and Concussions in American football are both about football. Only one of them is academic.
We need to look at several features of the article, and use them all to make a best guess as to whether it is academic.
This is where machine learning comes in. We fed a machine learning algorithm a bunch of articles that we already knew were academic. Then, we fed it a bunch of articles we knew weren’t academic. The algorithm pieced together a pretty good idea of what “academic content” was. It can figure out that Nirvana is academic, while Nirvana (band) isn’t.
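For the curious, here is a minimal sketch of that kind of training, not the model we actually used. It assumes a simple naive Bayes classifier over the words in an article, and the training snippets, labels, and function names (`train`, `classify`) are all made up for illustration. A real classifier would look at many more features than words alone.

```python
import math
from collections import Counter


def tokenize(text):
    """Split text into lowercase words (a deliberately crude feature)."""
    return text.lower().split()


def train(examples):
    """Build word counts per label from (text, label) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training articles
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def classify(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # prior for this label
        for tok in tokenize(text):
            if tok not in model["vocab"]:
                continue  # ignore words never seen in training
            # Add-one smoothing so an unseen word does not zero out a label.
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training data: articles we "already know" the label for.
TOY_EXAMPLES = [
    ("concussion research brain injury study neurology", "academic"),
    ("buddhist doctrine liberation philosophy religion study", "academic"),
    ("championship game score quarterback halftime show", "other"),
    ("rock band album tour singer guitar", "other"),
]

model = train(TOY_EXAMPLES)
print(classify(model, "brain injury study in football players"))
print(classify(model, "band album tour dates"))
```

The point of the sketch is the shape of the process: show the algorithm labeled examples of both kinds, then let it make a best guess about articles it has not seen.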
Tracking student contributions
Now that we had a way of figuring out if an article was academic, we turned to our real question. What portion of academic content are students contributing? Here’s what we learned:
This is a plot of the daily percentage of our students’ academic content creation over four terms. From just glancing at it, we can see some major work. On their best days, our student editors are producing more than 15% of all academic content added to Wikipedia. But it’s easy to be good for a day. How do they perform over the long term?
We can look to our best 30-day period for some ideas. Last semester, we maxed out at 4.6% right near the end of the term, between mid-April and mid-May. Now, before you go thinking that 4.6% is small, remember: Wikipedia is a big place, with more than 5 million articles and thousands of editors. Australia is only 5% of the Earth’s land mass, and nobody considers it small!
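Here is a rough sketch of how a “best 30-day period” figure like this could be computed. It is an illustration under assumptions, not our actual pipeline: it assumes we have two daily series of content added (by students, and by everyone), and that the share for a window is the student total divided by the overall total across that window. The function name `best_window_share` and the numbers in the usage example are hypothetical.

```python
def best_window_share(student_daily, total_daily, window_days):
    """Find the window with the highest student share of content.

    Returns (start_index, share), where share is the sum of student
    content divided by the sum of all content over the window.
    """
    if len(student_daily) != len(total_daily):
        raise ValueError("daily series must have the same length")
    if window_days <= 0 or window_days > len(total_daily):
        raise ValueError("window_days must fit inside the series")

    # Running sums for the first window.
    student_sum = sum(student_daily[:window_days])
    total_sum = sum(total_daily[:window_days])

    best_start, best_share = 0, -1.0
    last_start = len(total_daily) - window_days
    for start in range(last_start + 1):
        if start > 0:
            # Slide the window one day: add the new day, drop the old one.
            student_sum += student_daily[start + window_days - 1] - student_daily[start - 1]
            total_sum += total_daily[start + window_days - 1] - total_daily[start - 1]
        share = student_sum / total_sum if total_sum else 0.0
        if share > best_share:
            best_start, best_share = start, share
    return best_start, best_share


# Made-up numbers for four days, with a two-day window.
start, share = best_window_share([0, 10, 20, 5], [100, 100, 100, 100], 2)
print(start, share)
```

Passing a larger `window_days`, such as 90, gives the longer-term version of the same measurement.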
Wiki Ed isn’t just trying to develop existing academic content. We’re trying to create new content, and fill out content that was hardly even there to begin with. Take stubs and start-class articles, for example, which may run only a sentence or two in length. These articles are what we consider to be early in their development. Here’s what student editors have done for those early-stage articles:
We see a similar pattern, but with more consistent daily levels of contribution. Our best 30-day period is again between mid-April and mid-May. This time, students contributed a whopping 10% of all content among these early-stage articles.
This isn’t just a particularly good month. Extending our window to 90 days, we saw that our student editors still produced a staggering 7.5% of content.
This sort of content will never be the most viewed, but it may be the most consequential. It’s what makes Wikipedia such a go-to knowledge source. The same place where you find your Game of Thrones spoilers is the place students are using to study. People are using it to make decisions and inform their understanding of policy. Coverage of academic content has far-reaching consequences, so it had better be reliable. I’m happy to say that Wiki Ed is helping to make that happen.
For more on Kevin’s work on the impact student editing has on Wikipedia, see here. Questions? Send us an e-mail: firstname.lastname@example.org.