Can digital pathology, radiomics and genomics data predict patient outcomes in solid cancers?

Cardiff University

Solid tumour trial datasets are increasing in size and complexity. During their treatment journey and trial participation, patients with an advanced solid cancer will have various scans (CT, MRI, PET) repeated at intervals to assess tumour response (using RECIST guidelines). Their tumours will typically be biopsied and genetically profiled and, increasingly, serial blood samples will be collected and circulating tumour DNA sequenced. Every patient has data collected about them as an individual, about their cancer presentation and about the outcomes from their treatment. The FAKTION breast cancer trial dataset is typical of the richness of data now being collected. This trial, now concluded, has a consent model which provides a comprehensive set of genomics datasets (both germline and via circulating tumour DNA) and CT imaging data, along with trials data, tumour response, and five-year progression-free and overall survival data for research purposes. The FAKTION dataset is noteworthy in including patients with both measurable and non-measurable disease over multiple timepoints, and is supplemented by Wales Cancer Biobank (WCB), which currently holds outcome and treatment data for over 16,000 patients, of whom around 8,000 have histopathology data and 2,500 have genomic data generated. WCB’s consent model also allows access to NHS records and images. FAKTION, aligned with WCB, therefore represents a valuable opportunity to test the hypothesis that digital pathology, radiomics and genomics data can predict patient outcomes in solid cancers through machine learning.

Aims and objectives

Utilising the expertise within the Life Imaging and Data Analytics (LIDA) team (Spezi), with radiologist oversight (Foley), we will first establish AI-based automated workflows for image processing which can be used to manipulate and fuse imaging and anatomical data from both cross-sectional and longitudinal scans. Such workflows will eventually be used to call RECIST criteria within clinical trials. We will make these workflows widely available as containers, allowing for their deployment using the federated learning analytics model into environments where the raw imaging data cannot be shared or exported, but where exporting the RECIST criteria meets information governance requirements. LIDA actively participates in Computer Aided Theragnostic projects in Radiation Oncology using the Personal Health Train (PHT) paradigm, where curated data repositories are made available as Findable, Accessible, Interoperable and Reusable (FAIR) (Wilkinson et al., 2016) “stations” and AI applications (“trains”) visit each station and learn from the local data (Deist et al., 2020; Theophanous et al., 2022). Through this research we will create a toolkit of imaging AI methods for RECIST response calling and will use the 1000+ CT images available to us from the FAKTION trial to develop guidelines for best practice in algorithm usage and, more importantly, in the data curation needed alongside the imaging files. This information will be fed to our collaborators in Wales Cancer Biobank and the NHS All Wales Medical Genomics service, as their biobank and genomic linkage datasets increasingly have an imaging dimension.

Materials and methods

To develop novel drugs and combinations, most solid cancer clinical trials rely on surrogate signals of efficacy to encourage and inform larger scale, costly phase III development. The commonest surrogate signal is an assessment of tumour shrinkage as measured by the international RECIST v1.1 criteria. The method is widely recognised to suffer significant limitations, not least because a consultant radiologist must manually select and measure a limited number of lesions, leading to inevitable classification variations, disagreements and confusion over how to call the result. Also, RECIST fails to use volumetric or textural “radiomics” evaluations, so metastases that are not easily measurable are excluded. Gopal et al. (2020) undertook a proof-of-concept study and showed that different classes of AI algorithm were able to predict RECIST responses, although the lack of standardized definitions and validated reference values has hampered clinical use of high-throughput image-based phenotyping (Zwanenburg et al., 2020). The right choice of algorithm and use of CT imaging data, when accompanied by a standardised way of curating and annotating the image library to allow input into high-throughput analysis pipelines, are known to improve and accelerate RECIST response calling.
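As an illustration of what "calling" a RECIST response involves, the sketch below applies the commonly cited RECIST v1.1 target-lesion thresholds to a series of sum-of-diameters measurements: a decrease of at least 30% from baseline for partial response, and an increase of at least 20% with an absolute increase of at least 5 mm over the smallest sum on study (the nadir) for progressive disease. It is a simplified sketch, not the project's workflow: it ignores non-target lesions, new lesions and the lymph node short-axis rule, and the function names are hypothetical.

```python
from typing import List, Sequence


def recist_target_response(baseline_mm: float, nadir_mm: float, current_mm: float) -> str:
    """Classify target-lesion response from sums of diameters (mm).

    Simplifying assumption: complete response is taken as a sum of zero;
    non-target lesions, new lesions and lymph node rules are not modelled.
    """
    if baseline_mm <= 0:
        raise ValueError("baseline sum of diameters must be positive")
    if current_mm == 0:
        return "CR"  # all target lesions have disappeared
    increase = current_mm - nadir_mm
    # Progressive disease: >= 20% relative and >= 5 mm absolute increase over nadir
    if increase >= 0.20 * nadir_mm and increase >= 5.0:
        return "PD"
    # Partial response: >= 30% decrease relative to baseline
    if (baseline_mm - current_mm) / baseline_mm >= 0.30:
        return "PR"
    return "SD"  # neither sufficient shrinkage nor sufficient growth


def call_series(sums_mm: Sequence[float]) -> List[str]:
    """Call a response at each follow-up; sums_mm[0] is the baseline scan."""
    baseline = sums_mm[0]
    nadir = baseline  # the nadir includes the baseline measurement
    calls = []
    for current in sums_mm[1:]:
        calls.append(recist_target_response(baseline, nadir, current))
        nadir = min(nadir, current)
    return calls


if __name__ == "__main__":
    # Hypothetical patient: baseline 100 mm, then three follow-up scans
    print(call_series([100.0, 60.0, 55.0, 70.0]))
```

The manual step that the paragraph above identifies as a source of variation, selecting and measuring the lesions, happens before this function is reached; the thresholds themselves are simple arithmetic once the measurements exist.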

The student is expected to:

  1. Develop a data integration platform for analysis, covering data selection and preparation.
  2. Develop efficient techniques to cross-reference regions of interest on sequential CT scans, undertake textural analysis, and explore baseline and follow-up signatures in relation to clinical benefit parameters.
  3. Integrate detailed genomics signature data, evaluate predictive and prognostic markers in relation to the clinical data set, and assess correlations between the radiomics profile and the genomics profile.
  4. Digitise H&E slides in collaboration with Wales Cancer Biobank.
  5. Integrate and evaluate correlations between the three omics data sets.
  6. Use the accumulated data to look for new endpoints for clinical trials.
  7. Develop a proposal, with an approach to the pharmaceutical industry, to explore post-PhD validation in a phase III trial data set.
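For readers unfamiliar with the textural analysis mentioned in objective 2, the following is a minimal sketch of one classical texture feature, contrast derived from a grey-level co-occurrence matrix (GLCM), computed with NumPy on a small 2D region of interest. It assumes the image has already been discretised to integer grey levels and uses a single non-negative pixel offset; the function names are hypothetical, and this is not the SPAARC implementation.

```python
import numpy as np


def glcm(image, levels: int, dx: int = 1, dy: int = 0) -> np.ndarray:
    """Symmetric, normalised co-occurrence matrix for one offset (dx, dy >= 0).

    `image` must hold integer grey levels in the range 0 .. levels - 1.
    """
    img = np.asarray(image, dtype=int)
    rows, cols = img.shape
    # Pair each pixel with its neighbour dy rows down and dx columns right
    src = img[0:rows - dy, 0:cols - dx]
    dst = img[dy:rows, dx:cols]
    counts = np.zeros((levels, levels), dtype=float)
    np.add.at(counts, (src.ravel(), dst.ravel()), 1.0)
    counts = counts + counts.T  # count each pair in both directions
    return counts / counts.sum()


def glcm_contrast(p: np.ndarray) -> float:
    """Contrast: sum over (i, j) of p[i, j] * (i - j)^2."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))


if __name__ == "__main__":
    # Toy region with two grey levels arranged in horizontal stripes
    roi = [[0, 0], [1, 1]]
    print(glcm_contrast(glcm(roi, levels=2, dx=1, dy=0)))  # along the stripes
    print(glcm_contrast(glcm(roi, levels=2, dx=0, dy=1)))  # across the stripes
```

In the toy example the contrast is zero along the stripes and non-zero across them, which is the kind of directional heterogeneity signal that texture features are meant to capture.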

Anticipated results

We will extend the capabilities of the Spaarc Pipeline for Automated Analysis and Radiomics Computing and integrate it with well-known AI-based digital pathology image analysis techniques for pathomics.

How to apply:

All applications should be submitted via the online application portal SIMS.

Further details on the application process can be found in the “how to apply” page with instructions for form completion here.

Online application portal is found at:

Along with the online application the candidate is asked to upload a covering letter, a CV, and two academic references (Reference Form Template). Transcripts of degrees and additional supportive documents can be provided at the interview stage.

Please complete the online application form by 5.00pm on Friday 30th June 2023. If you are shortlisted for interview, you will be notified on Friday 7th July. Interviews will be held during the week commencing Monday 17th July. You will be notified of the outcome shortly after that, and certainly by the end of the month.

Academic Criteria:

Candidates should hold or expect to gain a first-class degree or a good 2.1 (or their equivalent) in Engineering or a related subject. 

Essential skills: Highly numerate, excellent analysis and problem solving, effective written and oral communication, good project management and organization.

Desirable skills: Image processing, MATLAB, Python.

To help us track our recruitment effort, please indicate in your email or cover/motivation letter where you saw this posting.
