EPS: automated feature selection in case–control studies using extreme pseudo-sampling

Abstract

Summary
Finding informative predictive features in high-dimensional biological case–control datasets is challenging. The Extreme Pseudo-Sampling (EPS) algorithm offers a solution to the challenge of feature selection via a combination of deep learning and linear regression models. First, using a variational autoencoder, it generates complex latent representations for the samples. Second, it classifies the latent representations of cases and controls via logistic regression. Third, it generates new samples (pseudo-samples) around the extreme cases and controls in the regression model. Finally, it trains a new regression model over the upsampled space. The most significant variables in this regression are selected. We present an open-source implementation of the algorithm that is easy to set up, use and customize. Our package enhances the original algorithm by providing new features and …
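The four steps in the summary can be illustrated with a minimal NumPy sketch. This is not the authors' package or its API. It assumes a linear PCA projection as a stand-in for the variational autoencoder, Gaussian noise around the extreme samples as the pseudo-sampling scheme, decoding pseudo-samples back to feature space before the final regression, and absolute coefficient size as a proxy for statistical significance. The names `fit_logistic` and `eps_sketch` and all parameters are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Logistic regression by plain gradient descent; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted case probability
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def eps_sketch(X, y, n_latent=2, n_extreme=5, n_pseudo=20, noise=0.1, seed=0):
    """Rank features using an EPS-like procedure (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardize features

    # Step 1 (stand-in): PCA projection instead of a variational autoencoder.
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    V = Vt[:n_latent].T          # shape: (n_features, n_latent)
    Z = Xs @ V                   # latent representation of each sample

    # Step 2: classify cases vs. controls in latent space.
    w, b = fit_logistic(Z, y)
    scores = Z @ w + b

    # Step 3: pick the most extreme cases and controls, then sample around them.
    case_idx = np.where(y == 1)[0]
    ctrl_idx = np.where(y == 0)[0]
    ext_cases = case_idx[np.argsort(scores[case_idx])[-n_extreme:]]
    ext_ctrls = ctrl_idx[np.argsort(scores[ctrl_idx])[:n_extreme]]

    def sample_around(idx):
        centers = Z[rng.choice(idx, size=n_pseudo)]
        return centers + noise * rng.standard_normal(centers.shape)

    Z_new = np.vstack([sample_around(ext_cases), sample_around(ext_ctrls)])
    y_new = np.concatenate([np.ones(n_pseudo), np.zeros(n_pseudo)])
    X_new = Z_new @ V.T          # assumption: decode pseudo-samples to features

    # Step 4: fit a new regression over the upsampled data and rank features.
    X_up = np.vstack([Xs, X_new])
    y_up = np.concatenate([y, y_new])
    w_feat, _ = fit_logistic(X_up, y_up)
    ranking = np.argsort(-np.abs(w_feat))  # proxy for "most significant"
    return ranking, w_feat
```

Calling `eps_sketch(X, y)` with a samples-by-features matrix `X` and a 0/1 label vector `y` returns feature indices ordered from most to least influential under these assumptions. A faithful implementation would replace the PCA step with a trained variational autoencoder and use proper significance tests on the final regression.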

Date: October 1, 2021
Authors: Ruhollah Shemirani, Stephane Wenric, Eimear Kenny, José Luis Ambite
Journal: Bioinformatics
Volume: 37
Issue: 19
Pages: 3372-3373
Publisher: Oxford University Press


