In direct collaboration with the AIM research consortia, the RSF is investigating how researchers can make effective and efficient use of the large, rich population health datasets available to them for their research studies. A commonly used example is the SAIL Databank. It is often difficult for researchers to know which variables within datasets such as SAIL represent the concepts or domains relevant to their research questions. At the start of a project, it is therefore necessary to map the available datasets and variables onto the research domains of interest, and to understand the coverage and data quality of the selected datasets and variables for the cohort of interest.
For the SAIL Databank, there are existing, publicly available resources that offer metadata on the many datasets held. For example, the Health Data Research Gateway and the connected Metadata Catalogue provide this for SAIL, alongside many other UK health-related datasets. These resources list information about each dataset (general description, population coverage, legal basis for access, variable descriptions and types). Related resources are the existing Concept Libraries and Phenotype Libraries.
One resource created by Theme 2 is the browseMetadata repository. This is an R package designed to streamline the initial process of mapping research domains of interest to variables available in the SAIL Databank, using the currently available metadata. It is in the early stages of development and is being adapted to fit specific research case studies. The documentation includes a tutorial that uses demo data provided with the package; feedback from researchers on the utility of the current features is particularly encouraged at this stage.