Iacopo Ghinassi is a PhD student at Queen Mary University of London under the DAME programme. The programme is a partnership between Queen Mary University of London and the BBC that aims to research new data-centric solutions to problems in the broadcasting industry. In this context, his research focuses on using pre-trained text and audio representations to automatically segment and annotate TV and radio programmes.
Iacopo has a background in history and computational linguistics. He is an affiliate of the Turing interest group in Humanities and Data Science and a volunteer editor for the Journal of Open Humanities Data.
Iacopo Ghinassi's research investigates efficient approaches to topic segmentation and classification based on pre-trained representations, obtained by training deep neural networks on different but related tasks. Topic segmentation is the task of dividing a document (e.g. the transcript of a news show) into topically coherent segments (e.g. individual news stories). Extracting these smaller segments makes it possible to retrieve information at a finer-grained level and provides a starting point for related tasks such as summarisation and personalisation.
Iacopo's approach to topic segmentation, known as transfer learning, offers two main advantages: better generalisation to under-resourced domains, and lower energy requirements for training a new system, since such a system builds on previously acquired knowledge. Moreover, this pre-acquired knowledge can be exploited further, for example to classify the extracted segments.
Transfer learning is a well-known paradigm in natural language processing, but its benefits and limitations have not been thoroughly investigated for topic segmentation.
Iacopo's research at the Turing aims to fill this gap, while also exploring the benefits of a similar approach in the audio domain as a new research direction. In parallel, his research examines how the extracted segments can be reused, both as a research tool and as a starting point for automatic metadata generation and personalisation for end users.
These technologies have a range of applications across organisations that deal with large volumes of unstructured data.