
AI-enabled environmental data rescue

  • Writer: Jennifer Crago
  • Jul 17, 2024

Updated: Sep 15


In an era defined by climate volatility, data scarcity is not just a technical challenge; it is a strategic risk. For the Environment Agency, whose mandate hinges on anticipating and responding to increasingly unpredictable environmental change, the real-time modelling of river flows, flood risks, and drought scenarios depends not only on modern sensors and satellite feeds but also on the vast analogue archives that document centuries of hydrological behaviour.


The challenge? Much of this data exists beyond the reach of digital systems: stored in handwritten logs, microfilm rolls, posters curling in storage rooms, and flip charts drawn during mid-century field expeditions. The opportunity? Generative AI, when used responsibly and strategically, can decode, translate, and revitalise this analogue legacy to inform next-generation environmental planning.


Over the past few months, it has been an honour to work with Defra and the Environment Agency to explore whether Generative AI can be deployed safely, ethically, and effectively to rescue and re-integrate approximately 10,000 years' worth of hydrological data.


This blog, initially posted on gov.uk, shares insights from a Sandbox Proof of Concept I have been involved in through my role with the Accelerated Capability Environment (ACE).


“The risk of flooding and drought within England is a priority area of focus for the Environment Agency (EA), which strives to protect and enhance the environment, to contribute to sustainable development and to help protect the nation’s security in the face of emergencies. 


Over the years, a vast amount of hydrological data has been collected through manual efforts, amassing an impressive physical archive of approximately 10,000 years’ worth of valuable river level and flow information. This vital data could be used to build more accurate climate and flood modelling and help forecast and minimise the impact of future adverse weather events.  


However, a significant challenge is that much of this historical environmental surveillance data has been stored on biodegradable materials, such as paper charts, microfilm and punch tape. These important documents face the risk of irreversible degradation and therefore need cataloguing urgently. Adding to this challenge, the EA is losing the ability to interpret even this archive as staff retire. 


While manual data extraction is underway, the time-consuming plotting of physical data onto graphs means this process – currently estimated to take 40 years – is unsustainable and a new, faster solution was needed. The Department for Environment, Food & Rural Affairs (Defra) approached the Accelerated Capability Environment (ACE) on behalf of the EA to explore the feasibility of using cutting-edge artificial intelligence (AI) and machine-learning technology to digitise, read and interpret the physical data significantly faster while maintaining accuracy. 


Bringing new technologies to bear 

ACE invited suppliers from its Vivace community to present ideas for a proof of concept (PoC) solution to see if AI could help with either fully automating, or semi-automating, manual data rescue. From seven bids, The London Data Company was selected to determine the exact data rescue requirement, including characterising the features of the physical data being digitalised, identify the best-fit method, and build and test a PoC data-rescue tool. 

Working with domain specialists and data users from across the EA, an initial options analysis identified two suitable open-source tools to take forward for the PoC stage – one which was fully automated, and the second which had a human in the loop – so two, rather than the expected one. 


The first PoC, the fully automated tool, showed low feasibility for effective digitisation, due to limitations in accurately rescuing handwritten information that is crucial for understanding axis labels, chart metadata such as location and start date, and adapting to different chart types, such as those with missing gaps, or smudges caused by water damage. It is recommended that further assessments be made in the future as Optical Character Recognition (OCR) performance improves with time.   


Pivoting to the second, human-in-the-loop tool for AI-assisted data rescue produced better results, and recommendations were also made for feature changes which would adapt and increase the effectiveness of this tool on live datasets, including integrating additional AI elements from the first PoC. 


The importance of collating and analysing good quality historic data and records to better understand climatic trends and management of river catchments cannot be underestimated, as it is key to protecting the UK’s security in the face of emergencies”.


As the Environment Agency works to create better places for people and wildlife and support sustainable development, my colleagues and I at ACE look forward to supporting next steps in this and many other priority areas for Defra and the Environment Agency.


This blog initially appeared on gov.uk, and also appears in the ACE Annual Review 2023/24.
