In a nutshell, tell me about your research?
I work on natural language processing (NLP) and text mining. This involves analysing raw text to automatically identify information, patterns and trends within it. At the Turing, I lead a project which uses text mining to analyse brain radiology reports. The aim is to automatically identify the different types of tumours and strokes described in the reports, which clinicians refer to as phenotypes. My research group has developed highly accurate technology that can now do this.
I’m also working on similar technologies in the humanities and am a co-convener of the humanities and data science interest group. For example, I recently co-led a project analysing historical reports on the third plague pandemic (late 19th/early 20th century), and I am now collaborating with colleagues at the University of Edinburgh to develop the first Scottish Gaelic speech recogniser. My research is highly interdisciplinary, which is something I love.
What aspect of your research is exciting you right now?
We now have access to all brain radiology reports for Generation Scotland patients living within the areas covered by five major Scottish NHS boards. Generation Scotland is a collection of human biological samples and health data from over 24,000 volunteers, available for medical research. These data enable us to benchmark not only our own tools but also those developed by other research groups working in this area. This is the first large-scale NLP study of radiology reports across Scottish health boards, and we will be leading it with partners from academia and industry to learn how different algorithms perform on this national dataset.
What are the challenges of your research?
A big challenge is navigating the data regulations involved in gaining access to clinical datasets. This is necessary given the sensitive nature of the data but can be time-consuming to manage. As a result, so far we’ve mostly worked with consented clinical datasets. To extend our work to a national scale, we are working closely with eDRIS, the National Safe Haven in Scotland.
Another challenge can be finding research fellows with the right skills. Luckily, I work at the University of Edinburgh, which is home to the Institute for Language, Cognition and Computation, a leading research institute for computational approaches to language that trains students to become NLP experts. Also, two years ago, Dr Honghan Wu, Dr William Whitely and I founded the Edinburgh Clinical NLP group, which has been growing rapidly. Other members of this group and I are now contributing to the Advanced Care Research Centre, where NLP is applied as part of data-driven research into advanced care. The centre was recently established by the University of Edinburgh and Newcastle University and is led by the Usher Institute, Edinburgh Medical School at the University of Edinburgh.
What’s been a highlight of your time at the Turing so far?
Receiving funding for my Turing project. This has helped to establish clinical NLP research at Edinburgh and has also been extremely beneficial for my own career.
What blog, podcast or book would you recommend?
I would recommend the podcast More or Less: Behind the Stats. I enjoy it because it tries to fact-check statistics related to current events and to answer different questions mathematically in an accessible way. For example, it recently considered why the UK used to do well at the Eurovision Song Contest but now does badly. Another episode looked at how many Olympic-size swimming pools full of COVID-19 vaccine would be needed to vaccinate everyone in the world.
And finally, when not working what can you be found doing?
I love pottering around in my garden, doing photography and spending time with friends and family.