Size distribution of function-based human gene sets and the split-merge model
By:
Li, Wentian, Fontanelli, Oscar, Miramontes, Pedro
Published:
1 Aug 2016
Category:
Multidisciplinary
Abstract:
The sizes of paralogues (gene families produced by ancestral
duplication) are known to follow a power-law distribution. We examine the
size distribution of gene sets or gene families in which genes are grouped
by similar function or by a shared property. The size distribution
of Human Gene Nomenclature Committee (HGNC) gene sets deviates from the
power law, and can be fitted much better by a beta rank function. We
propose a simple mechanism to break a power-law size distribution by a
combination of splitting and merging operations. The largest gene sets
are split into two to account for the subfunctional categories, and a
small proportion of other gene sets are merged into larger sets as new
common themes might be realized. These operations are not uncommon for a
curator of gene sets. A simulation shows that iteration of these
operations changes the size distribution of Ensembl paralogues and could
lead to a distribution fitted by a beta rank function. We further
illustrate the application of the beta rank function with the
distribution of transcription factors and drug target genes among HGNC
gene families.
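The split-merge mechanism described in the abstract can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the split rule, the merge rule, or the parameter values, so the uniform random split point, the random pairing of sets to merge, `merge_fraction`, and the synthetic power-law starting sizes are all hypothetical. The `beta_rank` helper uses a commonly cited form of the beta rank function, size(r) = C (N + 1 - r)^b / r^a, which is likewise assumed here rather than taken from the abstract.

```python
import random


def beta_rank(r, n, a, b, c=1.0):
    """Assumed beta rank form: c * (n + 1 - r)**b / r**a for rank r in 1..n."""
    return c * (n + 1 - r) ** b / r ** a


def split_merge_step(sizes, merge_fraction, rng):
    """One iteration: split the largest set in two, then merge a few random pairs."""
    ordered = sorted(sizes, reverse=True)
    largest, rest = ordered[0], ordered[1:]
    if largest >= 2:
        # Assumption: the split point is uniform over the possible positions.
        left = rng.randint(1, largest - 1)
        rest.extend([left, largest - left])
    else:
        rest.append(largest)
    # Assumption: a small fraction of the sets is merged pairwise at random.
    n_merges = int(merge_fraction * len(rest))
    for _ in range(n_merges):
        if len(rest) < 2:
            break
        i, j = rng.sample(range(len(rest)), 2)
        merged = rest[i] + rest[j]
        for k in sorted((i, j), reverse=True):
            rest.pop(k)
        rest.append(merged)
    return sorted(rest, reverse=True)


def simulate(sizes, iterations, merge_fraction=0.002, seed=0):
    """Iterate the split-merge step with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    for _ in range(iterations):
        sizes = split_merge_step(sizes, merge_fraction, rng)
    return sizes


# Synthetic power-law rank-size start (hypothetical stand-in for paralogue sizes).
initial = [max(1, round(1000 / r)) for r in range(1, 501)]
final = simulate(initial, iterations=200)
```

Comparing `final` against `beta_rank` values over the same ranks (for example on log-log axes) gives a rough visual check of how far the iterated distribution has moved from the initial power law. The total number of genes is conserved by construction, since splitting and merging only redistribute members among sets.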
Affiliations:
Li, Wentian:
Northwell Hlth, Feinstein Inst Med Res, Robert S Boas Ctr Genom & Human Genet, Manhasset, NY 11030 USA
Fontanelli, Oscar:
Univ Nacl Autonoma Mexico, Fac Ciencias, Dept Matemat, Ciudad Univ, Mexico City 04510, DF, Mexico
Miramontes, Pedro:
Univ Nacl Autonoma Mexico, Fac Ciencias, Dept Matemat, Ciudad Univ, Mexico City 04510, DF, Mexico
Univ Leipzig, Bioinformat Grp, Haertelstr 16-18, D-04107 Leipzig, Germany
Univ Leipzig, Interdisciplinary Ctr Bioinformat, Haertelstr 16-18, D-04107 Leipzig, Germany