Size distribution of function-based human gene sets and the split-merge model


By: Li, Wentian; Fontanelli, Oscar; Miramontes, Pedro

Published: 1 Aug 2016
Category: Multidisciplinary

Abstract:
The sizes of paralogues (gene families produced by ancestral duplication) are known to follow a power-law distribution. We examine the size distribution of gene sets or gene families in which genes are grouped by similar function or a shared property. The size distribution of HUGO Gene Nomenclature Committee (HGNC) gene sets deviates from the power law and is fitted much better by a beta rank function. We propose a simple mechanism that breaks a power-law size distribution through a combination of splitting and merging operations. The largest gene sets are split in two to account for subfunctional categories, and a small proportion of the other gene sets are merged into larger sets as new common themes are recognized. These operations are not uncommon for a curator of gene sets. A simulation shows that iterating these operations changes the size distribution of Ensembl paralogues and could lead to a distribution fitted by a beta rank function. We further illustrate the application of the beta rank function with the distribution of transcription factors and drug target genes among HGNC gene families.
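
The split-merge mechanism described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the parameters `n_split` and `merge_frac`, the uniformly random split point, the pairwise merging of randomly chosen sets, and the synthetic starting sizes (the paper's simulation starts from Ensembl paralogues). The beta rank function is written in its commonly cited form, f(r) = C (N + 1 - r)^b / r^a, which is also assumed rather than quoted from the record.

```python
import random


def beta_rank(r, n, a, b, c=1.0):
    """Beta rank function in its commonly cited form (assumed here):
    f(r) = c * (n + 1 - r)**b / r**a, for rank r = 1..n."""
    return c * (n + 1 - r) ** b / r ** a


def split_merge_step(sizes, n_split=1, merge_frac=0.05, rng=None):
    """One hypothetical split-merge iteration on a list of gene-set sizes.

    The n_split largest sets are each split in two at a random point, and a
    fraction merge_frac of the remaining sets are merged in random pairs.
    The total number of genes is conserved.
    """
    rng = rng or random.Random()
    ordered = sorted(sizes, reverse=True)
    largest, rest = ordered[:n_split], ordered[n_split:]

    # Split step: a set of size s becomes two sets of sizes cut and s - cut.
    pieces = []
    for s in largest:
        if s < 2:  # a singleton cannot be split
            pieces.append(s)
            continue
        cut = rng.randint(1, s - 1)
        pieces.extend([cut, s - cut])

    # Merge step: pick an even number of the other sets and merge them pairwise.
    n_merge = int(len(rest) * merge_frac)
    n_merge -= n_merge % 2
    chosen = rng.sample(range(len(rest)), n_merge)
    chosen_set = set(chosen)
    picked = [rest[i] for i in chosen]
    kept = [s for i, s in enumerate(rest) if i not in chosen_set]
    merged = [picked[i] + picked[i + 1] for i in range(0, n_merge, 2)]

    return sorted(pieces + kept + merged, reverse=True)


def simulate(sizes, steps=50, seed=0):
    """Iterate the split-merge step a fixed number of times."""
    rng = random.Random(seed)
    for _ in range(steps):
        sizes = split_merge_step(sizes, rng=rng)
    return sizes


# Synthetic power-law-like starting sizes (illustrative only).
start = [max(1, 1000 // r) for r in range(1, 201)]
final = simulate(start)
```

Comparing `final` against `beta_rank` values over ranks 1..len(final) (for example on log-log axes) would show how far the iterated sizes have moved from the initial power law; fitting the exponents `a` and `b` is left out of this sketch.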

Affiliations:
Li, Wentian:
 Northwell Hlth, Feinstein Inst Med Res, Robert S Boas Ctr Genom & Human Genet, Manhasset, NY 11030 USA

Fontanelli, Oscar:
 Univ Nacl Autonoma Mexico, Fac Ciencias, Dept Matemat, Ciudad Univ, Mexico City 04510, DF, Mexico

Miramontes, Pedro:
 Univ Nacl Autonoma Mexico, Fac Ciencias, Dept Matemat, Ciudad Univ, Mexico City 04510, DF, Mexico

 Univ Leipzig, Bioinformat Grp, Haertelstr 16-18, D-04107 Leipzig, Germany

 Univ Leipzig, Interdisciplinary Ctr Bioinformat, Haertelstr 16-18, D-04107 Leipzig, Germany
ISSN: 20545703
Publisher
Royal Society, 6-9 CARLTON HOUSE TERRACE, LONDON SW1Y 5AG, ENGLAND, United Kingdom
Document type: Article
Volume: 3 Issue: 8
Pages:
WOS Id: 000384411000019
PubMed ID: 27853602
