This project will describe the rhetorical strategies commonly used in offender-to-offender interactions in criminal chat rooms on the dark web. Using techniques from linguistics and data science, common communicative goals and structures will be identified in a large collection of dark web chats. This information will then be used to build models of how offenders interact online and a taxonomy of user types based on the rhetorical strategies they employ.
Explaining the science
Corpus linguistics is the study of language structure and use based on large samples of natural language data. This project is based on a multi-million-word corpus of criminal forums and chat rooms scraped from the dark web, which provides an unprecedented basis for describing language use in these cyber communities.
Move analysis is a data-driven technique for linguistic analysis whose goal is to capture the functions performed by language in a given communicative context. In this case, over 20,000 'turns' in online chat logs will be manually annotated for their primary communicative functions using a coding system based on this technique.
To identify common rhetorical strategies used by offenders on the dark web, the annotated corpus will be analysed using a variety of techniques from text mining, the process of extracting information from large collections of textual data. For example, the distribution of different moves will be used to cluster users based on their communicative goals and build a taxonomy of user types.
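As a rough illustration of clustering users by their distribution of moves, the Python sketch below turns each user's annotated turns into a vector of relative move frequencies and groups the vectors with a plain k-means. Everything specific in it is an assumption made for the example: the move labels, the toy data, the choice of k-means and the seed users are hypothetical, and the project does not state which clustering method it will use.

```python
from collections import Counter

# Hypothetical move labels; the project's actual coding scheme is not given here.
MOVES = ["greet", "request", "offer", "advise"]

# Toy (user, move) annotations, invented purely for illustration.
annotated_turns = [
    ("user_a", "request"), ("user_a", "request"), ("user_a", "greet"),
    ("user_b", "request"), ("user_b", "greet"), ("user_b", "request"),
    ("user_c", "offer"), ("user_c", "advise"), ("user_c", "advise"),
    ("user_d", "advise"), ("user_d", "offer"), ("user_d", "advise"),
]


def move_profiles(turns):
    """Return each user's relative frequency of every move in MOVES."""
    counts = {}
    for user, move in turns:
        counts.setdefault(user, Counter())[move] += 1
    profiles = {}
    for user, counter in counts.items():
        total = sum(counter.values())
        profiles[user] = [counter[m] / total for m in MOVES]
    return profiles


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(profiles, seeds, iterations=10):
    """Plain k-means; centroids start at the profiles of the named seed users."""
    centroids = [list(profiles[s]) for s in seeds]
    assignment = {}
    for _ in range(iterations):
        # Assign every user to the nearest centroid.
        for user, vec in profiles.items():
            assignment[user] = min(
                range(len(centroids)),
                key=lambda i: sq_dist(vec, centroids[i]),
            )
        # Move each centroid to the mean of its current members.
        for i in range(len(centroids)):
            members = [profiles[u] for u, c in assignment.items() if c == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


profiles = move_profiles(annotated_turns)
clusters = kmeans(profiles, seeds=["user_a", "user_c"])
print(clusters)
```

In this toy data the users who mostly make requests fall into one cluster and those who mostly give advice fall into the other. With real annotations, the number of clusters and the distance measure would need to be chosen and validated against the data.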
Markov chains, probabilistic models that describe sequences of events, will be used to model and visualise the move sequences employed by individual users and groups of users in the corpus. This approach will facilitate the description and comparison of the rhetorical strategies of users in this domain, allowing for generalisations to be made about how different types of users tend to communicate in these dark web chat rooms.
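To make the Markov chain idea concrete, the sketch below estimates first-order transition probabilities between moves, that is, how likely each move is to follow another, from a handful of move sequences. The move labels and sequences are hypothetical examples, and a first-order chain is an assumption made here for simplicity rather than the project's stated design.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate P(next move | current move) from observed move sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count each adjacent pair (current move, following move).
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    # Normalise each row of counts so its probabilities sum to 1.
    return {
        current: {
            following: n / sum(followers.values())
            for following, n in followers.items()
        }
        for current, followers in counts.items()
    }


# Hypothetical move sequences, one per conversation or user.
sequences = [
    ["greet", "request", "offer", "thank"],
    ["greet", "request", "advise", "thank"],
    ["request", "offer", "thank"],
]

probs = transition_probabilities(sequences)
for current, followers in probs.items():
    print(current, followers)
```

Estimating one such table per user or per user group gives a compact basis for comparing rhetorical strategies, and the same table can be drawn as a graph whose edges are weighted by transition probability.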
The project will develop a system for describing the rhetorical structure of conversations in criminal dark web chat rooms based on linguistic move analysis, a method for investigating the communicative functions of natural language down to the level of individual utterances. This system will then be used to manually code a large corpus of authentic interactions in dark web chat rooms on a turn-by-turn basis.
A range of text mining and stochastic modelling techniques will then be applied to this annotated corpus to identify the common communicative goals of offenders in these chat rooms, as well as the rhetorical strategies they commonly use to accomplish these goals.
Finally, drawing on these results, this project will describe variation in the communicative goals and rhetorical strategies used by different types of users in these chat rooms. This information will then be used to generate a linguistic taxonomy of user types as well as a rhetorical profile for each type, capturing, for example, differences in how experienced and inexperienced users interact in this domain.