On the Discretization Methods for Single-Cell RNA-Sequencing Data when Inferring Gene Regulatory Networks via Cartesian Genetic Programming
Keywords:
Gene Regulatory Network, Cartesian Genetic Programming, scRNA-Seq
Abstract
Inferring Gene Regulatory Networks (GRNs) from gene expression data (GED) is a hard task and a
widely addressed scientific challenge. Single-cell RNA sequencing (scRNA-seq) allows the transcriptome
to be explored at the cellular level, which makes it attractive for GRN inference. GRNs can be represented as
Boolean models, in which gene activation and inhibition are expressed as logic relationships. Cartesian Genetic
Programming (CGP) can be used to evolve GRNs from gene expression data in binary form. An appropriate
discretization technique is therefore important, as it affects the quality of the inferred models. Here, we analyze the
performance of ten unsupervised methods for GED discretization when applied to CGP for inferring GRNs. We
consider discretization methods based on statistics, data distribution, ranking, clustering (e.g., k-means
and Bikmeans), and time series (Transitional State Discrimination), as well as a data discretization method
developed by Gallo et al. We also perform a sensitivity study of the parameter required by the ranking-based methods. We
provide a qualitative and quantitative analysis of the discretization approaches in order to identify a set of methods
and parameters that are well suited to modeling GRNs from scRNA-seq data using CGP.
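
As an illustration of what discretization means in this setting, the following minimal sketch binarizes the expression values of a single gene across cells using two simple rules: a statistics-based threshold at the gene's mean, and a ranking-based rule that marks the top fraction of values as active. The function names, the `fraction` parameter, and the toy values are assumptions made for illustration only; they are not the implementations evaluated in this work.

```python
from statistics import mean


def discretize_by_mean(row):
    """Statistics-based rule (illustrative): 1 if a value exceeds the gene's mean, else 0."""
    threshold = mean(row)
    return [1 if value > threshold else 0 for value in row]


def discretize_top_fraction(row, fraction=0.25):
    """Ranking-based rule (illustrative): mark the top `fraction` of values as 1.

    `fraction` is a hypothetical name for the parameter that ranking-based
    methods require; at least one value is always marked as active.
    """
    n_on = max(1, round(fraction * len(row)))
    # Indices sorted by decreasing expression; ties keep their original order.
    order = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
    active = set(order[:n_on])
    return [1 if i in active else 0 for i in range(len(row))]


# Toy expression values of one gene across four cells (made-up numbers).
gene = [3.0, 3.5, 0.5, 4.0]
by_mean = discretize_by_mean(gene)
by_rank = discretize_top_fraction(gene, fraction=0.25)
```

The two rules can disagree on the same gene: the mean threshold depends on the overall distribution of values, whereas the ranking rule fixes the number of active cells in advance, which is why the choice of its parameter deserves a separate sensitivity study.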