Publications
EPS: automated feature selection in case–control studies using extreme pseudo-sampling
Abstract
Summary
Finding informative predictive features in high-dimensional biological case–control datasets is challenging. The Extreme Pseudo-Sampling (EPS) algorithm offers a solution to the challenge of feature selection via a combination of deep learning and linear regression models. First, using a variational autoencoder, it generates complex latent representations for the samples. Second, it classifies the latent representations of cases and controls via logistic regression. Third, it generates new samples (pseudo-samples) around the extreme cases and controls in the regression model. Finally, it trains a new regression model over the upsampled space. The most significant variables in this regression are selected. We present an open-source implementation of the algorithm that is easy to set up, use and customize. Our package enhances the original algorithm by providing new features and …
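The four steps described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's API. A linear projection (PCA via SVD) stands in for the variational autoencoder, and pseudo-samples are drawn as Gaussian jitter around the extreme points in latent space. The pseudo-samples are then decoded back to feature space, and "most significant" is approximated by the largest absolute regression coefficients. All names and parameters (`fit_logistic`, `eps_sketch`, `n_extreme`, `n_pseudo`, `noise`, `top_k`) are hypothetical.

```python
import numpy as np


def fit_logistic(X, y, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b


def eps_sketch(X, y, n_latent=5, n_extreme=10, n_pseudo=20,
               noise=0.1, top_k=5, seed=0):
    """Illustrative EPS-style feature ranking (hypothetical helper)."""
    rng = np.random.default_rng(seed)

    # Standardise features so coefficient magnitudes are comparable.
    Xs = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

    # Step 1 (stand-in): linear latent space via SVD instead of a VAE.
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    V = Vt[:n_latent].T                      # shape (n_features, n_latent)
    Z = Xs @ V

    # Step 2: logistic regression separating cases from controls in latent space.
    w, b = fit_logistic(Z, y)
    scores = Z @ w + b

    # Step 3: pick the most confidently classified cases and controls,
    # then draw pseudo-samples around them (assumption: Gaussian jitter).
    case_idx = np.where(y == 1)[0]
    ctrl_idx = np.where(y == 0)[0]
    ext_cases = case_idx[np.argsort(scores[case_idx])[-n_extreme:]]
    ext_ctrls = ctrl_idx[np.argsort(scores[ctrl_idx])[:n_extreme]]

    def jitter(idx):
        base = np.repeat(Z[idx], n_pseudo, axis=0)
        return base + rng.normal(scale=noise, size=base.shape)

    Z_pseudo = np.vstack([jitter(ext_cases), jitter(ext_ctrls)])
    y_pseudo = np.concatenate([np.ones(len(ext_cases) * n_pseudo),
                               np.zeros(len(ext_ctrls) * n_pseudo)])

    # Assumption: decode pseudo-samples back to (standardised) feature space.
    X_pseudo = Z_pseudo @ V.T

    # Step 4: new regression over the upsampled space; rank features by |coef|.
    w_feat, _ = fit_logistic(X_pseudo, y_pseudo)
    return np.argsort(-np.abs(w_feat))[:top_k]


# Synthetic demo: 400 samples, 20 features, only features 0-2 differ in cases.
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(400, 20))
y_demo = np.repeat([0.0, 1.0], 200)
X_demo[y_demo == 1, :3] += 3.0

selected = eps_sketch(X_demo, y_demo)
print("Selected feature indices:", selected)
```

On real expression or genotype data, a nonlinear autoencoder can capture structure that this linear stand-in misses, so the ranking here only illustrates the control flow.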
- Date: October 1, 2021
- Authors: Ruhollah Shemirani, Stephane Wenric, Eimear Kenny, José Luis Ambite
- Journal: Bioinformatics
- Volume: 37
- Issue: 19
- Pages: 3372-3373
- Publisher: Oxford University Press