Abstract
Methods for optimizing the prediction of Escherichia coli RNA polymerase promoter sequences by neural networks are presented. A neural network was trained on a set of 80 known promoter sequences combined with different numbers of random sequences. The conserved -10 region and -35 region of the promoter sequences and a combination of these regions were used in three independent training sets. The prediction accuracy of the resulting weight matrix was tested against a separate set of 30 known promoter sequences and 1500 random sequences. The effects of the network's topology, the extent of training, the number of random sequences in the training set and the effects of different data representations were examined and optimized. Accuracies of 100% on the promoter test set and 98.4% on the random test set were achieved with the optimal parameters.
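The training-set construction described in the abstract can be illustrated with a minimal sketch. The abstract does not state which data representations were compared, so the 4-bit one-hot code per base used here is an assumption, and the names `encode`, `random_sequences` and `build_training_set` are hypothetical helpers, not taken from the paper.

```python
import random

BASES = "ACGT"

# Assumed representation: one input unit per possible base (4 units per position).
# The abstract says several representations were tested but does not name them.
ONE_HOT = {b: [1 if i == j else 0 for j in range(4)] for i, b in enumerate(BASES)}


def encode(seq):
    """Flatten a DNA string into a 0/1 input vector, four units per base."""
    return [bit for base in seq.upper() for bit in ONE_HOT[base]]


def random_sequences(n, length, seed=0):
    """Generate n random DNA strings of the given length (uniform base usage assumed)."""
    rng = random.Random(seed)
    return ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(n)]


def build_training_set(promoters, n_random, seed=0):
    """Pair encoded promoter regions (label 1) with encoded random sequences (label 0)."""
    length = len(promoters[0])
    negatives = random_sequences(n_random, length, seed)
    positives = [(encode(s), 1) for s in promoters]
    return positives + [(encode(s), 0) for s in negatives]


# Illustrative input only: two short -10-like hexamers, not the paper's data.
examples = build_training_set(["TATAAT", "TATGAT"], n_random=5)
```

Varying `n_random` corresponds to the abstract's "different numbers of random sequences"; the labeled vectors would then be passed to whatever network implementation is being trained.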
| Original language | English |
|---|---|
| Pages (from-to) | 1593-1599 |
| Number of pages | 7 |
| Journal | Nucleic Acids Research |
| Volume | 19 |
| Issue number | 7 |
| State | Published - Apr 11 1991 |
Funding
We thank Dr. Curtis Johnson (Dept. of Biochemistry and Biophysics, OSU) for providing CPU time on a 386 AT and the staff of BIONETTE.CGRB.ORST.EDU for support on the SUN 3/260. We are grateful to Dr. Richard Schori, Robert Burton, Isa Yubran (Dept. of Mathematics, OSU) and Dr. Christopher Mathews (Dept. of Biochemistry and Biophysics, OSU) for their helpful advice. B.D. and G.Z. are graduate students supported, respectively, by PHS grant #30-262-0600 to Dr. Ken van Holde and by Department of the Navy grant #30-262-3025 to Dr. Pui Shing Ho.
| Funder number |
|---|
| 30-262-3025 |