Rapid changes in drug use, such as the US opioid epidemic, are a major public health issue. According to the CDC, in 2017, 68% of the 70,237 US drug overdose deaths involved an opioid. Authorities currently rely on annual surveys of drug use, which aren’t frequent enough to keep up with fast-evolving markets. We saw an opportunity to generate much faster drug use statistics by combining two sources of real-time data:
- Drug markets on the “dark web” – the part of the internet unreachable via normal browsers
- The volume of Wikipedia page views for each drug
Combining Dark Web and Wikipedia data to forecast illegal drug use may sound like a pretty wacky idea. And this newly published paper is indeed a typically wacky Turing story.
About a year ago, I was discussing the Dark Web drug markets with Abeer El-Bahrawy (a 2018 Turing Enrichment student) in the Turing Kitchen. It was a pretty standard Tuesday lunchtime. Martin Dittus, an academic from Oxford whom we’d never met before, overheard us and generously offered us a dataset from those very markets.
One year later we have a published paper using Martin’s dark web data. The credit for this lies with the unusual mix of the authors’ backgrounds – economics, computer science and social anthropology. Without the Turing, we’d probably never have met.
New “designer” drugs are also being developed so quickly that they may not appear on surveys at all—there were 36 new drugs sold in the first half of 2019. It’s getting a bit ridiculous: even as a PhD student you don’t have time to try all of them.
Finally, people are particularly prone to bending the truth during drug surveys. A faster methodology, based on actual sales rather than self-reported use, would therefore be highly valuable for policymakers needing to react quickly to rapid changes in drug use.
How did we collect the data?
The Dark Web markets provide an opportunity to develop faster monitoring. These markets are like Amazon, except they aren’t selling books. Unlike Amazon, buyers usually have to leave feedback upon completing a transaction. This is good for buyers: the feedback mechanism keeps Dark Web drug dealers relatively honest compared to their street equivalents.
It is also good for enterprising researchers looking into this problem, because the feedback does more than keep drug purity high: each review corresponds to a completed purchase with a timestamp and a country, which allowed us to build a high-frequency dataset of drug sales across the globe.
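The idea of turning feedback into a sales series can be sketched as follows. This is a hypothetical illustration, not the paper’s pipeline, and the field names (“timestamp”, “country”, “drug”) are assumptions: each feedback message is treated as one completed purchase and counted per ISO week, country and drug.

```python
# Hypothetical sketch: turning market feedback records into a
# high-frequency sales series. Field names are illustrative,
# not the actual schema used in the paper.
from collections import Counter
from datetime import date

feedback = [
    {"timestamp": date(2017, 1, 2), "country": "US", "drug": "fentanyl"},
    {"timestamp": date(2017, 1, 3), "country": "US", "drug": "fentanyl"},
    {"timestamp": date(2017, 1, 9), "country": "UK", "drug": "mdma"},
]

def weekly_sales(records):
    """Count feedback items per (ISO year, ISO week, country, drug).

    Each feedback message is treated as one completed purchase."""
    counts = Counter()
    for r in records:
        year, week, _ = r["timestamp"].isocalendar()
        counts[(year, week, r["country"], r["drug"])] += 1
    return counts

series = weekly_sales(feedback)
print(series[(2017, 1, "US", "fentanyl")])  # 2
```

Grouping by week rather than by day smooths over the gaps that occur when a market briefly goes offline.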
Sadly, data from Dark Web markets isn’t always available, due to network outages and law enforcement takedowns. Therefore, we cannot rely on the markets alone for monitoring drug use.
This is where Wikipedia comes in: we hypothesised that people buying drugs online may read about the product they’re about to buy. Therefore, we also collected data on Wikipedia views for each drug, which are reliably available in real time.
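Daily page-view counts are exposed through the public Wikimedia Pageviews REST API. A minimal sketch of how one might query it (building the request URL and parsing the JSON payload offline; in practice you would fetch the URL with a standard HTTP client):

```python
# Sketch of pulling daily page-view counts from the public Wikimedia
# Pageviews REST API. We build the request URL and parse a sample JSON
# payload offline; a real script would fetch the URL over HTTP.
import json

API = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"

def pageviews_url(article, start, end, project="en.wikipedia"):
    """URL for daily views of one article between two YYYYMMDD dates."""
    return f"{API}/{project}/all-access/all-agents/{article}/daily/{start}/{end}"

# A response carries an "items" list of {"timestamp": "YYYYMMDDHH", "views": n}.
sample = json.loads(
    '{"items": [{"timestamp": "2017010100", "views": 1543},'
    ' {"timestamp": "2017010200", "views": 1388}]}'
)
daily = {item["timestamp"][:8]: item["views"] for item in sample["items"]}

print(pageviews_url("Fentanyl", "20170101", "20170102"))
print(daily)  # {'20170101': 1543, '20170102': 1388}
```

The view counts in the sample payload are made up; only the endpoint structure reflects the actual API.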
We combined these data sources to build a model that forecasts Dark Web drug sales over time for nine countries. The findings were encouraging: including Wikipedia data in the model brought forecast errors down by nearly 50%.
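The intuition behind that error reduction can be shown on toy data. The sketch below is not the paper’s model: it fits ordinary least squares on synthetic data where page views genuinely lead sales, and compares out-of-sample forecast error with and without the views regressor.

```python
# Toy illustration (not the paper's model): forecasting next-week sales
# from lagged sales alone vs. lagged sales plus Wikipedia views, on
# synthetic data in which views lead sales by one period.
import numpy as np

rng = np.random.default_rng(0)
T = 200
views = rng.gamma(2.0, 50.0, size=T)           # synthetic daily page views
sales = np.empty(T)
sales[0] = 100.0
for t in range(1, T):                          # sales follow last period's views
    sales[t] = 0.3 * sales[t - 1] + 2.0 * views[t - 1] + rng.normal(0, 5)

def rmse_forecast(X, y, split=150):
    """Fit OLS on the first `split` rows, return RMSE on the rest."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb[:split], y[:split], rcond=None)
    resid = y[split:] - Xb[split:] @ beta
    return float(np.sqrt(np.mean(resid ** 2)))

y = sales[1:]
lag_only = sales[:-1].reshape(-1, 1)                    # baseline features
with_views = np.column_stack([sales[:-1], views[:-1]])  # + page views

base = rmse_forecast(lag_only, y)
aug = rmse_forecast(with_views, y)
print(f"lag-only RMSE: {base:.1f}, with views: {aug:.1f}")
```

Because the views series carries information the sales history cannot, the augmented model’s forecast error is far lower; the roughly 50% figure above comes from the paper, not from this toy.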
We looked at fentanyl sales in the US during 2016 and 2017, and found that the model gives a clear advance warning of the brewing epidemic.
Our study shows the potential of new, unconventional data sources for tackling major policy problems. It required a team with a range of backgrounds, which couldn’t have happened without the Turing. Going forward, we’re hopeful the Institute will continue to foster interdisciplinary collaborations like this one, sparked from a casual conversation about drugs and data science in the office kitchen.
Read the full paper: Predicting Drug Demand with Wikipedia Views: Evidence from Darknet Markets