Compound Poisson approximation of word counts in DNA sequences


Author	Schbath, Sophie
Issue date	1997
dc.description	Identifying words with unexpected frequencies is an important problem in the analysis of long DNA sequences. To solve it, we need an approximation of the distribution of the number of occurrences N(W) of a word W. Modeling DNA sequences with m-order Markov chains, we use the Chen-Stein method to obtain Poisson approximations for two different counts. We approximate the “declumped” count of W by a Poisson variable and the number of occurrences N(W) by a compound Poisson variable. Combinatorial results are used to solve the general case of overlapping words and to calculate the parameters of these distributions.
Format	application.pdf
Publisher	EDP Sciences
Copyright	© EDP Sciences, SMAI, 1997
Subject	DNA sequences / word counts / Poisson approximations / compound Poisson distribution / Chen-Stein method / Markov chains / word periods.
Title	Compound Poisson approximation of word counts in DNA sequences
Type	research-article
DOI	10.1051/ps:1997100
Electronic ISSN	1262-3318
Print ISSN	1292-8100
Journal	ESAIM: Probability and Statistics
Volume	1
First page	1
Last page	16
Affiliation	Schbath Sophie; Institut National de la Recherche Agronomique, France
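
The description above distinguishes the number of occurrences N(W) from the "declumped" count of W. A minimal sketch of that distinction follows. It assumes that a clump is a maximal run of mutually overlapping occurrences of W, and that the declumped count is the number of such clumps. The function names and the example sequence are hypothetical and are not taken from the paper.

```python
def occurrences(seq, word):
    """Start positions of all occurrences of word in seq, overlaps included."""
    return [i for i in range(len(seq) - len(word) + 1) if seq.startswith(word, i)]


def clump_sizes(seq, word):
    """Sizes of the clumps of word in seq.

    Assumption: an occurrence joins the current clump when it starts fewer
    than len(word) positions after the previous occurrence, i.e. when the
    two occurrences overlap. Otherwise it opens a new clump.
    """
    sizes = []
    prev = None
    for pos in occurrences(seq, word):
        if prev is not None and pos - prev < len(word):
            sizes[-1] += 1      # overlaps the previous occurrence
        else:
            sizes.append(1)     # starts a new clump
        prev = pos
    return sizes


seq = "AATATATTATAGATATATA"     # hypothetical toy sequence
word = "ATA"                    # self-overlapping word (period 2)

sizes = clump_sizes(seq, word)
n_occurrences = sum(sizes)      # N(W): every occurrence counted
n_declumped = len(sizes)        # declumped count: one per clump
print(sizes, n_occurrences, n_declumped)
```

Under this reading, N(W) is the sum of the clump sizes, which is the shape a compound Poisson variable takes when the number of clumps is approximately Poisson. The sketch only counts; it does not estimate the Markov-chain parameters or the approximation error discussed in the paper.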
