Chris Day (Head of Modern Domestic Records at The National Archives) spoke at the second event of the Unlocking our Digital Past project. Chris’ talk focused on using cataloguing data to undertake topic modelling with a latent Dirichlet allocation model.
Chris’ abstract read: “The correspondence of the General Board of Health (1848-1871) documents the work of a body set up to deal with cholera epidemics in a period where some English homes were so filthy as to be described as ‘mere pigholes not fit for human beings’. Individual descriptions for each of these over 89,000 letters are available on Discovery, The National Archives (UK)’s catalogue. This presentation will examine how data science can be used to repurpose archival catalogue descriptions, initially created to enhance the ‘human findability’ of records (and favoured by many UK archives due to high digitisation costs), for large scale computational analysis. The records of the General Board will be used as a case study: their catalogue descriptions topic modelled using a latent Dirichlet allocation model, visualised, and analysed – giving an insight into how new sanitary regulations were negotiated with a divided public during an epidemic. Questions of the validity and utility of using the descriptions of archival material, as opposed to the records themselves, will also be discussed.”
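The abstract does not describe the modelling pipeline in detail. As a rough illustration of the general idea only, here is a minimal sketch that fits a tiny latent Dirichlet allocation model by collapsed Gibbs sampling, in plain Python, to four invented catalogue-style descriptions. These are not real Discovery records. The stopword list, the hyperparameters (`alpha`, `beta`), the number of topics and every function name here are hypothetical choices. This is not Chris’ code or method, and the talk does not say which inference algorithm was used. A real project would more likely rely on an established topic-modelling library.

```python
import random
import re

# Hypothetical, invented catalogue-style descriptions (NOT real Discovery records).
DESCRIPTIONS = [
    "Letter from the local board concerning cholera outbreak and burial ground",
    "Letter regarding sewerage and drainage works and water supply",
    "Report on cholera deaths and the burial of cholera victims",
    "Application for loan to fund drainage and sewerage works",
]

# Hypothetical stopword list; a real project would use a fuller one.
STOPWORDS = {"a", "and", "concerning", "for", "from", "letter", "of", "on",
             "regarding", "the", "to"}


def tokenize(text, stopwords):
    """Lower-case the text, keep alphabetic runs, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (vocab, theta, phi)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]           # n(d, k)
    topic_word = [[0] * V for _ in range(n_topics)]      # n(k, w)
    topic_total = [0] * n_topics                         # n(k)
    assignments = []
    # Randomly assign an initial topic to every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                           / (topic_total[t] + V * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Point estimates of document-topic (theta) and topic-word (phi) distributions.
    theta = [[(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha)
              for t in range(n_topics)] for d, doc in enumerate(docs)]
    phi = [[(topic_word[t][v] + beta) / (topic_total[t] + V * beta)
            for v in range(V)] for t in range(n_topics)]
    return vocab, theta, phi


def top_words(vocab, phi, n=3):
    """Return the n highest-probability words for each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


if __name__ == "__main__":
    docs = [tokenize(text, STOPWORDS) for text in DESCRIPTIONS]
    vocab, theta, phi = lda_gibbs(docs, n_topics=2)
    for k, words in enumerate(top_words(vocab, phi)):
        print(f"topic {k}: {words}")
```

Printing the top words for each topic gives the kind of word list that topic models are usually labelled and visualised from. With only four short invented documents, the topics will be noisy, so treat the output as a demonstration of the mechanics rather than of any finding about the General Board’s correspondence.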
You can view Chris’ presentation below: