VOCCluster: Untargeted Metabolomics Feature Clustering Approach for Clinical Breath Gas Chromatography - Mass Spectrometry Data

Preprint at ChemRxiv

Abstract
Our unsupervised clustering technique, VOCCluster, prototyped in Python, handles features of deconvolved GC-MS breath data. VOCCluster was created from a heuristic ontology based on the observation of experts undertaking data processing with a suite of software packages. VOCCluster identifies and clusters groups of volatile organic compounds (VOCs) in deconvolved GC-MS breath data that have similar mass spectra and retention index profiles.

VOCCluster was used to cluster more than 15,000 features extracted from 74 GC-MS clinical breath samples obtained from participants with cancer before and after radiation therapy. VOCCluster clustered those features into 1081 groups (including endogenous and exogenous compounds and instrumental artifacts) with an accuracy rate of 96% (±0.04 at 95% confidence interval). Results were evaluated against a panel of ground truth compounds and compared with other clustering methods used in previous metabolomics studies, such as DBSCAN and OPTICS.
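
The grouping criterion named in the abstract (similar mass spectra together with similar retention index) can be illustrated with a minimal sketch. This is not the VOCCluster algorithm, which the abstract does not specify. It is a simple greedy grouping under assumed choices: cosine similarity as the spectral measure, a fixed retention index tolerance `ri_tol`, a similarity threshold `min_sim`, and the first member of each group as its representative. The function names, thresholds, and the four toy features are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    dot = sum(inten * b.get(mz, 0.0) for mz, inten in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_features(features, ri_tol=5.0, min_sim=0.9):
    """Greedily group (retention_index, spectrum) features.

    A feature joins the first existing group whose representative (its first
    member) lies within ri_tol retention index units and has a spectral cosine
    similarity of at least min_sim; otherwise it starts a new group.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    clusters = []  # each cluster is a list of indices into `features`
    for i, (ri, spec) in enumerate(features):
        for members in clusters:
            rep_ri, rep_spec = features[members[0]]
            if abs(ri - rep_ri) <= ri_tol and cosine_similarity(spec, rep_spec) >= min_sim:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical toy features: (retention index, {m/z: intensity}).
features = [
    (1000.0, {43: 100.0, 58: 40.0}),
    (1002.0, {43: 95.0, 58: 42.0}),   # close in RI and spectrum to feature 0
    (1001.0, {91: 100.0, 92: 60.0}),  # close in RI, different spectrum
    (1400.0, {43: 100.0, 58: 40.0}),  # same spectrum, distant RI
]
clusters = group_features(features)
print(clusters)
```

In this toy input, only the first two features share both a similar spectrum and a nearby retention index, so they form one group while the other two remain separate. A greedy pass like this is order-dependent; density-based methods such as DBSCAN and OPTICS, which the abstract uses for comparison, avoid that at the cost of additional parameters.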

Authors:
Alkhalifah, Yaser; Phillips, Iain; Soltoggio, Andrea; Darnley, Kareen; Nailon, William H.; McLaren, Duncan; Eddleston, Michael; Thomas, C. L. Paul; Salman, Dahlia