
EPS: automated feature selection in case–control studies using extreme pseudo-sampling

Abstract

Summary
Finding informative predictive features in high-dimensional biological case–control datasets is challenging. The Extreme Pseudo-Sampling (EPS) algorithm addresses this feature-selection problem by combining deep learning with linear (logistic) regression models. First, it uses a variational autoencoder to generate complex latent representations of the samples. Second, it classifies the latent representations of cases and controls via logistic regression. Third, it generates new samples (pseudo-samples) around the extreme cases and controls in the regression model. Finally, it trains a new regression model over the upsampled space and selects the most significant variables in this regression. We present an open-source implementation of the algorithm that is easy to set up, use and customize. Our package enhances the original algorithm by providing new features and …
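The four steps in the summary can be illustrated with a small toy sketch. This is not the EPS package, its API, or the authors' exact method; it makes several loudly stated substitutions and assumptions. PCA (computed with `numpy.linalg.svd`) stands in for the variational autoencoder. Both regressions are plain L2-regularized logistic regressions fit by gradient descent. "Extreme" samples are taken to be the cases with the highest and the controls with the lowest latent log-odds. Pseudo-samples are Gaussian perturbations of those latent points, decoded back to feature space. "Most significant" is approximated by absolute coefficient size. The names `fit_logistic`, `eps_sketch`, and `make_toy_data` and all of their parameters are hypothetical, and the data are synthetic.

```python
import numpy as np


def sigmoid(s):
    # Clip the argument so np.exp cannot overflow for very large |s|
    return 1.0 / (1.0 + np.exp(-np.clip(s, -500, 500)))


def fit_logistic(X, y, lr=0.1, epochs=2000, l2=0.1):
    """L2-regularized logistic regression fit by batch gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - y          # gradient of the mean log-loss w.r.t. the logits
        w -= lr * (X.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b


def eps_sketch(X, y, latent_dim=2, n_extreme=5, n_pseudo=50,
               noise=0.1, top_k=2, seed=0):
    """Toy EPS-style pipeline (hypothetical); PCA stands in for the VAE."""
    rng = np.random.default_rng(seed)
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)   # standardize features

    # Step 1 (substitute for the VAE): linear encoder/decoder from top principal axes
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    W = Vt[:latent_dim].T                 # (d, latent_dim), orthonormal columns
    Z = Xs @ W                            # latent representations

    # Step 2: logistic regression of case/control status on the latent codes
    wz, bz = fit_logistic(Z, y)
    score = Z @ wz + bz                   # log-odds; avoids ties from saturated probabilities

    # Step 3: pseudo-samples around the most extreme cases and controls (assumed definition)
    cases, ctrls = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    ext_cases = cases[np.argsort(-score[cases])[:n_extreme]]   # most case-like cases
    ext_ctrls = ctrls[np.argsort(score[ctrls])[:n_extreme]]    # most control-like controls

    def perturb(idx):
        # Resample extreme latent points with replacement and add Gaussian noise
        centers = Z[rng.choice(idx, size=n_pseudo)]
        return centers + noise * rng.standard_normal(centers.shape)

    Zp = np.vstack([perturb(ext_cases), perturb(ext_ctrls)])
    yp = np.concatenate([np.ones(n_pseudo), np.zeros(n_pseudo)])
    Xp = Zp @ W.T                         # decode back to (standardized) feature space

    # Step 4: new regression on the pseudo-samples; rank features by |coefficient|
    wx, _ = fit_logistic(Xp, yp)
    return np.argsort(-np.abs(wx))[:top_k], wx


def make_toy_data(n=400, d=10, shift=4.0, seed=1):
    """Synthetic data: only features 0 and 1 differ between cases and controls."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n // 2)
    X = rng.standard_normal((n, d))
    X[:, :2] += shift * y[:, None]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    selected, coefs = eps_sketch(X, y)
    print("selected features:", selected)
```

Ranking pseudo-samples by the latent log-odds rather than the fitted probabilities is a design choice of this sketch: with well-separated classes many probabilities round to exactly 0 or 1, and ties would make the choice of "extreme" samples arbitrary. In the toy data the two shifted features should surface at the top of the ranking, but a real analysis would use the published package rather than this linear stand-in.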

Date: October 1, 2021
Authors: Ruhollah Shemirani, Stephane Wenric, Eimear Kenny, José Luis Ambite
Journal: Bioinformatics
Volume: 37
Issue: 19
Pages: 3372-3373
Publisher: Oxford University Press