Visualizing article history with Structural Completeness

You may have noticed a recent addition to the Articles tab of course pages: “structural completeness”. This feature is an experiment in visualizing the history of articles as they develop.

The evolution of an article over the course of a Wikipedia assignment.

The structural completeness data comes from the “Objective Revision Evaluation Service” (ORES), a Wikimedia Foundation research project that uses machine learning to analyze Wikipedia articles and individual edits. I started digging into ORES last year to see how well the “wp10” scores map to the work that student editors do in our classes. These scores estimate what rating an article would get on the Wikipedia 1.0 scale, from Stub to Featured Article, at any point in its history. What I found was that even small changes in the ORES wp10 score corresponded to meaningful changes in the article. While the scores don’t account for the intellectual content of articles, they give a great sense of the major (and minor) changes of an individual article over time.

ORES, a data mining project for Wikipedia articles

In the Dashboard, I’m calling this data “structural completeness”, because the scores are based on how well an article matches the typical structural features of a mature Wikipedia article. The machine learning model calculates scores based on the amount of prose, the number of wikilinks to other articles, the numbers of references, images, headers and templates, and a few other basic features. Down the road, we may be able to use this data to give automated suggestions about what aspects of their article an editor should focus on next — whether adding links to related topics, improving the citations, or breaking it into sections as it grows.
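
To make "structural features" a little more concrete, here is a minimal sketch that counts a few of them in a snippet of wikitext. This is only an illustration, not how ORES extracts its features: the function name `structural_features`, the regular expressions, and the sample text are all assumptions made for this example, and real wikitext has many edge cases these patterns ignore.

```python
import re


def structural_features(wikitext: str) -> dict:
    """Count a few rough structural features of a wikitext string.

    Hypothetical helper for illustration only; ORES uses its own
    feature extraction, which is not reproduced here.
    """
    return {
        # [[links]] to other articles, skipping file, image and category links
        "wikilinks": len(
            re.findall(r"\[\[(?!(?:File|Image|Category):)[^\]]+\]\]", wikitext)
        ),
        # opening <ref> or <ref name="..."> tags
        "references": len(re.findall(r"<ref[\s>]", wikitext)),
        # embedded files or images
        "images": len(re.findall(r"\[\[(?:File|Image):", wikitext)),
        # section headers such as "== History ==" on their own line
        "headers": len(
            re.findall(r"^==+[^=\n]+==+\s*$", wikitext, flags=re.MULTILINE)
        ),
        # template openings such as {{Infobox ...}}
        "templates": len(re.findall(r"\{\{", wikitext)),
    }


# Made-up sample text, not taken from a real article.
sample = """'''Example''' is a [[topic]] related to [[other topic|another]].<ref>Source A</ref>
[[File:Example.jpg|thumb|A caption]]
== History ==
Some text.<ref name="b">Source B</ref>
=== Early years ===
{{Infobox thing}}
"""

print(structural_features(sample))
```

Counts like these are the kind of raw input a model could weigh when judging how closely an article resembles a mature one.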

Take a look at how articles by student editors develop. When you spot a big change in the structural completeness score, this usually means something interesting happened to the article that suddenly made it look a lot more (or a lot less) like a typical well-developed Wikipedia article.
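
If you had the score for each revision as a plain list of numbers, spotting those big changes could be sketched roughly like this. The function name `big_changes`, the threshold of 10 points, and the example scores are all hypothetical; the Dashboard's actual score scale and visualization logic are not shown here.

```python
def big_changes(scores, threshold=10.0):
    """Return (revision_index, delta) pairs where the score moved by at
    least `threshold` compared with the previous revision.

    Hypothetical helper; the threshold is an arbitrary assumption.
    """
    flagged = []
    for i in range(1, len(scores)):
        delta = scores[i] - scores[i - 1]
        if abs(delta) >= threshold:
            flagged.append((i, delta))
    return flagged


# Made-up score history for one article, one value per revision.
history = [5.0, 6.5, 24.0, 25.5, 12.0, 40.0]

for index, delta in big_changes(history):
    direction = "up" if delta > 0 else "down"
    print(f"revision {index}: score went {direction} by {abs(delta)}")
```

Both jumps and drops are flagged, since a sudden drop (for example, when a large block of content is removed) can be just as interesting as a sudden rise.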

I’ll continue to iterate on these visualizations; our goal is to make it as easy as possible both to get an overview of an article’s history and to drill down to the details of individual edits. If you have ideas or comments, or you notice something really interesting with these visualizations, drop me a line!

