GaMC: genetic algorithm-based multiple classifier to predict the long-range contacts


Server


Abstract:

  In this paper, we apply an evolutionary optimization classifier, referred to as the genetic algorithm-based multiple classifier (GaMC), to long-range contact prediction. When the sequence profile (SP) centers are known, about 44.1% of the contacts between long-range residues (those with a sequence separation of at least 24 amino acids) are found around the SP center when the top L/5 classified contacts are evaluated (L is the sequence length of the protein). Using knowledge of the SP centers together with the GaMC method, about 20.42% of long-range contacts are correctly predicted. These results suggest that SP centers may be a sound route to predicting contact maps of protein structures.
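  The top L/5 evaluation mentioned above can be illustrated with a minimal Python sketch. It is not part of the GaMC package: the function name, the (i, j, score) input layout, and the choice to divide by the number of predictions actually evaluated are assumptions made for illustration. Only the separation threshold of 24 residues and the L/5 cutoff come from the abstract.

```python
from typing import Iterable, Tuple

MIN_SEPARATION = 24  # long-range threshold stated in the abstract


def top_l5_accuracy(scored_pairs: Iterable[Tuple[int, int, float]],
                    true_contacts: Iterable[Tuple[int, int]],
                    seq_len: int,
                    min_sep: int = MIN_SEPARATION) -> float:
    """Fraction of the top L/5 long-range predictions that are true contacts.

    scored_pairs: (i, j, score) residue pairs, higher score = more confident.
    true_contacts: (i, j) residue pairs in contact in the known structure.
    """
    # Keep only pairs separated by at least min_sep residues in sequence.
    long_range = [(i, j, s) for i, j, s in scored_pairs if abs(i - j) >= min_sep]
    # Rank by score and keep the top L/5 (at least one prediction).
    long_range.sort(key=lambda t: t[2], reverse=True)
    top = long_range[:max(1, seq_len // 5)]
    if not top:
        return 0.0
    # Store contacts with the smaller index first so (i, j) and (j, i) match.
    truth = {(min(i, j), max(i, j)) for i, j in true_contacts}
    hits = sum(1 for i, j, _ in top if (min(i, j), max(i, j)) in truth)
    # Assumption: the denominator is the number of predictions evaluated.
    return hits / len(top)


# Made-up example data, not taken from the GaMC dataset.
pairs = [(1, 30, 0.9), (2, 40, 0.8), (5, 45, 0.7), (3, 10, 0.99), (10, 34, 0.6)]
native = [(1, 30), (45, 5)]
print(top_l5_accuracy(pairs, native, seq_len=50))
```

  The pair (3, 10) has the highest score but is dropped because its sequence separation is below 24.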

 

SOFTWARE PACKAGE:

  The package has two parts: the CGI programs and the main program. Run GaMC.cgi first and then Results.cgi. Results.cgi in turn calls GaMC.exe in the directory "/program".

File Description:
  GaMC.cgi -
Main CGI interface program that collects the query sequence, job title, etc. All inputs are passed to Results.cgi;
  Results.cgi -
Handles the input information, retrieves the remote BLAST output for the query sequence, and runs the program GaMC.exe;
  fasta.fa -
Query sequence file in FASTA format;
  Output.GaMC -
Long-range contact prediction file in EVA format;
  blast.out -
File with the remote BLAST results;
  blast.profile -
Sequence profile file parsed from blast.out;
  Centre.txt -
Centers for the five contact distance classes extracted from the training dataset;
  Chromosome.txt -
File with the transformation rule, where "0", "1", and "2" denote "a", "b", and "c" in the paper, respectively.
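
  As a small illustration of the Chromosome.txt encoding, the Python sketch below maps the digits "0", "1", and "2" to "a", "b", and "c". The exact layout of Chromosome.txt is not described here, so the sketch assumes the file holds only these digits and whitespace; the function name decode_chromosome is hypothetical and not part of the package.

```python
from typing import List

# Digit-to-symbol mapping stated in the file description.
RULE_SYMBOLS = {"0": "a", "1": "b", "2": "c"}


def decode_chromosome(text: str) -> List[str]:
    """Translate a string of rule digits into the symbols a, b, c.

    Assumption: the input contains only the digits 0-2 and whitespace.
    """
    symbols = []
    for ch in text:
        if ch.isspace():
            continue  # skip spaces and line breaks between digits
        if ch not in RULE_SYMBOLS:
            raise ValueError(f"unexpected character {ch!r}")
        symbols.append(RULE_SYMBOLS[ch])
    return symbols


print(decode_chromosome("0 1 2\n2 0"))
```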


The package can be downloaded from: GaMC predictor

 

DATASET:

  The dataset contains 193 protein chains, listed below:

153l00 1 10 530 10 3 1 1 185 1.60
1a6m00 1 10 490 10 1 1 1 151 1.00
1a8l00 5 1 623 1 1 1 1 226 1.90
1a8q00 3 40 50 1820 12 3 1 274 1.75
1a4400 3 90 280 10 2 3 1 185 1.84
1acf00 3 30 450 30 2 1 1 125 2.00
1af700 5 1 291 1 1 1 1 274 2.00
1agi00 3 10 130 10 4 1 1 125 1.50
1ah700 1 10 575 10 1 1 1 245 1.50
1ajz00 3 20 20 20 1 3 1 282 2.00
1ako00 3 60 10 10 1 1 1 268 1.70
1amm00 5 1 77 1 1 2 1 174 1.20
1amp00 3 40 630 10 1 1 1 291 1.80
1arb00 5 1 21 1 1 1 1 263 1.20
1bd800 1 25 40 20 3 1 1 156 1.80
1bea00 1 10 120 10 1 1 1 127 1.95
1bn6A0 3 40 50 1820 2 2 1 294 1.50
1bolA0 3 90 730 10 4 1 1 222 2.00
1bxaA0 2 60 40 420 1 1 4 105 1.30
1byrA0 3 30 870 10 3 1 1 155 2.00
1c1kA0 5 1 682 10 1 1 1 217 1.45
1c7kA0 3 40 390 10 1 1 1 132 1.00
1c5200 1 10 760 10 9 1 1 131 1.28
1cc8A0 3 30 70 100 1 1 1 73 1.02
1cei00 1 10 1200 20 1 2 1 94 1.80
1cewI0 3 10 450 10 2 1 1 108 2.00
1chd00 3 40 50 180 1 1 1 203 1.75
1cpn00 2 60 120 200 14 3 1 208 1.80
1cpq00 1 20 120 10 4 1 1 129 1.72
1ctj00 1 10 760 10 2 1 1 89 1.10
1cuoA0 2 60 40 420 6 2 1 129 1.60
1CV800 3 90 70 10 7 2 1 173 1.59
1cyo00 3 10 120 10 2 1 1 93 1.50
1cznA0 3 40 50 360 1 2 1 169 1.70
1dbs00 3 40 50 300 1 1 1 224 1.80
1dhn00 3 30 1130 10 1 1 1 121 1.65
1dlwA0 1 10 490 10 11 1 1 116 1.54
1dupA0 2 70 40 10 1 1 1 136 1.90
1e29A0 1 10 760 10 5 1 1 135 1.21
1eb6A0 3 40 390 10 2 1 1 177 1.00
1edt00 3 20 20 80 34 1 1 271 1.90
1eiqA0 5 1 53 1 1 1 1 289 2.00
1esl00 3 10 100 10 15 1 1 162 2.00
1ew4A0 3 30 920 10 1 1 1 106 1.40
1exrA0 5 1 36 1 1 1 1 146 1.00
1ey0A0 2 40 50 90 1 2 1 149 1.60
1fi2A0 2 60 120 10 4 1 1 201 1.60
1fm4A0 3 30 530 20 1 2 1 159 1.97
1fna00 2 60 40 30 3 1 1 91 1.80
1fnc00 5 1 71 1 1 2 1 296 1.70
1fua00 3 40 225 10 2 1 5 215 1.92
1g5tA0 3 40 50 300 39 1 1 170 1.80
1g8aA0 5
1g66A0 3 40 50 1820 1 1 1 207 0.90
1gn0A0 3 40 250 10 1 1 2 108 1.80
1gnuA0 3 10 20 90 3 1 1 117 1.75
1gpr00 2 70 70 10 1 1 1 162 1.90
1gqnA0 3 20 20 70 24 1 1 252 1.78
1gqvA0 3 10 130 10 2 1 1 135 0.98
1hbkA0 1 20 80 10 4 1 1 94 2.00
1hcz00 5 1 45 1 1 3 1 250 1.96
1hfs00 3 40 390 10 3 3 1 160 1.70
1hg7A0 3 90 1210 10 1 2 1 66 1.15
1hh7A0 1 10 760 10 3 1 1 114 1.40
1hka00 3 30 70 560 1 1 1 158 1.50
1hztA0 3 90 79 10 1 1 1 160 1.45
1i1nA 3
1i2kA0 5 1 715 10 1 1 1 269 1.79
1i6pA0 3 40 1050 10 2 1 1 220 2.00
1i6tA0 3 90 80 10 1 1 1 175 1.20
1i9dA0 5 1 936 1 1 1 1 138 1.65
1i9gA0 5 1 1038 1 1 1 1 264 1.98
1iab00 3 40 390 10 7 1 1 200 1.79
1igd00 3 10 20 10 1 1 1 61 1.10
1iqqA0 3 90 730 10 2 1 1 200 1.50
1iqzA0 3 30 70 20 1 1 1 81 0.92
1is6A0 2 60 120 200 4 1 1 135 1.70
1ispA0 3 40 50 1820 7 1 1 181 1.30
1j0oA0 3 90 10 10 1 1 2 107 1.15
1j1tA0 2 60 120 200 23 1 1 228 2.00
1j2lA0 4 10 70 10 1 1 1 70 1.70
1j7gA0 3 50 80 10 1 2 1 144 1.64
1j27A0 3 30 70 1120 1 1 1 98 1.70
1jbc00 2 60 120 200 1 1 1 237 1.20
1jfxA0 3 20 20 80 22 1 1 217 1.65
1jk7A0 3 60 21 10 2 1 1 299 1.90
1jvwA0 3 10 50 40 2 1 1 167 1.70
1k12A0 2 60 120 260 11 1 1 158 1.90
1kao00 3 40 50 300 4 2 1 167 1.70
1khiA0 5 1 1082 10 1 1 1 147 1.78
1klxA0 1 25 40 10 9 1 1 138 1.95
1knb00 2 60 90 10 1 2 1 196 1.70
1kt7A0 2 40 128 20 2 1 1 183 1.27
1kv9A2 1 10 760 10 14 1 1 103 1.90
1l8fA0 2 40 40 10 1 2 1 207 1.80
1lam01 3 40 220 10 1 1 1 165 1.60
1lki00 1 20 1250 10 9 1 1 180 2.00
1lkoA0 5 1 276 1 1 1 1 190 1.63
1lp8A0 5
1ltuA0 1 10 800 10 1 1 1 297 1.74
1lwbA0 1 20 90 10 3 1 1 122 1.05
1lxzA0 2 60 110 10 1 1 2 207 1.25
1ly2A0 5 1 843 1 1 1 1 130 1.80
1md6A0 2 80 10 50 6 1 1 154 1.60
1mgtA0 5 1 605 1 1 1 1 169 1.80
1mh9A0 5 1 1090 11 1 1 1 194 1.80
1mml00 5 1 115 1 1 1 1 251 1.80
1mzl00 1 10 110 10 1 1 1 93 1.90
1nar00 3 20 20 80 31 1 1 290 1.80
1nepA0 2 60 40 770 1 1 1 130 1.70
1nf9A0 3 40 50 850 1 1 1 207 1.50
1nogA0 1 20 1200 10 1 1 1 155 1.55
1nox00 3 40 109 10 1 1 1 205 1.59
1np4A0 2 40 128 20 1 1 1 184 1.50
1npk00 3 30 70 141 1 1 1 154 1.80
1nsj00 3 20 20 70 41 1 1 205 2.00
1ntfA0 3 60 10 10 1 1 1 280 1.80
1nwzA0 3 30 450 20 1 1 1 125 0.82
1o4yA0 2 60 120 200 6 1 1 288 1.48
1o08A0 5 1 1081 10 1 1 1 221 1.20
1oejA0 2
1ok0A0 2 60 40 20 1 1 1 74 0.93
1orb00 5 1 122 1 1 1 1 293 2.00
1ouvA0 1 25 40 10 13 1 1 265 2.00
1p5fA0 3 40 50 880 1 1 1 189 1.10
1pbn00 3 40 50 1580 1 1 2 289 2.00
1plc00 2 60 40 420 4 1 1 99 1.33
1pmi01 2 60 120 10 8 1 1 185 1.70
1pmy00 2 60 40 420 5 2 1 123 1.50
1ppn00 3 90 70 10 3 1 1 212 1.60
1puc00 3 30 170 10 1 1 1 105 1.95
1pxfA0 2 40 50 140 15 1 1 111 1.87
1pz4A0 3 30 1050 10 1 1 1 116 1.35
1qb7A0 3 40 50 2020 4 1 1 236 1.50
1qgiA0 5 1 654 10 1 1 1 259 1.60
1qj8A0 2 40 128 70 2 1 1 148 1.90
1qk8A0 3 40 30 10 6 1 2 146 1.40
1qtwA0 3 20 20 150 2 1 1 285 1.02
1qy1A0 2 40 128 20 11 1 1 174 1.70
1qz9A0 3 90 1150 10 8 1 1 143 1.85
1rj1A 1
1rlhA 3
1rljA0 3 40 50 360 16 1 1 135 2.00
1rocA0 2 60 40 1490 1 1 1 155 1.50
1rv9A 3
1rw7A0 3 40 50 880 10 1 1 243 1.80
1s8nA0 5
1sf9A 2
1sjwA 3
1skf00 3 40 710 10 5 1 6 262 2.00
1smlA0 3 60 15 10 5 1 1 269 1.70
1sur00 3 40 50 620 22 1 1 215 2.00
1sznA2 2 60 40 1180 6 1 1 102 1.54
1tal00 5 1 75 1 1 1 1 198 1.50
1thm00 3 40 50 200 1 5 1 279 1.37
1tif00 3 10 20 80 1 1 1 78 1.80
1tml00 3 20 20 40 3 1 1 286 1.80
1tp6A0 3 10 450 50 5 1 1 126 1.50
1tzvA0 1 10 940 10 1 1 1 141 1.35
1uaiA0 2 60 120 200 2 1 1 223 1.20
1udh00 3 40 470 10 1 3 1 228 1.75
1uekA0 5 1 2099 10 1 1 1 268 1.70
1uohA0 1 25 40 20 6 1 1 226 2.00
1v7rA0 3 90 950 10 1 1 1 186 1.40
1vhh00 3 30 1380 10 1 1 1 162 1.70
1vid00 3 40 50 150 22 1 1 221 2.00
1vls00 1 20 120 30 1 1 1 146 1.85
1wab00 3 40 50 1110 2 1 2 216 1.70
1whi00 2 40 150 20 1 1 1 122 1.50
1who00 2 60 40 760 1 1 1 96 1.90
1xnb00 2 60 120 180 1 2 1 185 1.49
1zin00 3 40 50 300 22 1 1 217 1.60
2abk00 5 1 5 1 1 2 1 211 1.85
2acy00 3 30 70 100 4 1 1 98 1.80
2baa00 5 1 42 1 1 2 1 243 1.80
2cba00 3 10 200 10 1 1 1 260 1.54
2end00 1 10 440 10 1 1 1 138 1.45
2hvm00 3 20 20 80 21 2 1 273 1.80
2izi00 2 40 128 30 1 1 3 123 1.70
2lisA0 1 20 150 10 1 1 1 136 1.35
2mcm00 2 60 40 230 1 2 1 112 1.50
2mhr00 1 20 120 50 1 2 1 118 1.70
2pii00 3 30 70 120 2 1 1 112 1.90
2plc00 3 20 20 190 2 1 1 274 2.00
2pth00 3 40 50 1470 1 1 1 193 1.20
2rn200 3 30 420 10 2 1 3 155 1.48
2tgi00 2 10 90 10 2 1 1 112 1.80
3chy00 3 40 50 2300 1 1 2 128 1.66
3lzm00 1 10 530 40 1 4 37 164 1.70
3vub00 2 30 30 110 2 1 1 101 1.40
4lzt00 1 10 530 10 1 1 1 129 0.95
7fd1A0 3 30 70 20 3 1 1 106 1.30
7yasA0 3 40 50 1820 4 1 1 257 1.75
-----------------------------------------------------------------------

CATH List File (CLF) Format 2.0:
-------------------------------
This file format has an entry for each structural entry in CATH.

Column 1: CATH domain name (seven characters)
Column 2: Class number
Column 3: Architecture number
Column 4: Topology number
Column 5: Homologous superfamily number
Column 6: S35 sequence cluster number
Column 7: S60 sequence cluster number
Column 8: S95 sequence cluster number
Column 9: Domain length
Column 10: Structure resolution (Angstroms)
(999.000 for NMR structures and 1000.000 for obsolete PDB entries)
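
The column layout above can be read with a short Python sketch such as the following. It is an illustration only, not part of the GaMC package: the names ClfEntry and parse_clf_line are hypothetical, and treating rows that list only the domain name and class number as partial entries is an assumption about how to handle the shorter rows in the dataset listing.

```python
from typing import NamedTuple, Optional


class ClfEntry(NamedTuple):
    domain: str                        # Column 1: CATH domain name
    cath_class: int                    # Column 2: Class number
    architecture: Optional[int] = None
    topology: Optional[int] = None
    superfamily: Optional[int] = None
    s35: Optional[int] = None
    s60: Optional[int] = None
    s95: Optional[int] = None
    length: Optional[int] = None       # Column 9: Domain length
    resolution: Optional[float] = None # Column 10: Resolution (Angstroms)


def parse_clf_line(line: str) -> Optional[ClfEntry]:
    """Parse one whitespace-separated CLF row; return None for non-data lines."""
    fields = line.split()
    if len(fields) < 2 or not fields[1].isdigit():
        return None  # blank line, separator, or heading
    if len(fields) < 10:
        # Assumption: keep short rows as partial entries (name and class only).
        return ClfEntry(fields[0], int(fields[1]))
    ints = [int(x) for x in fields[1:9]]  # columns 2 to 9
    return ClfEntry(fields[0], *ints, float(fields[9]))


entry = parse_clf_line("153l00     1    10   530    10     3     1     1   185 1.60")
print(entry.domain, entry.length, entry.resolution)
print(parse_clf_line("1g8aA0 5"))
```

Splitting on arbitrary whitespace means that rows padded with several spaces and rows separated by single spaces are read the same way.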
 


Copyright © 2004-2009 by Peng Chen

All Rights Reserved