DomSVR: Domain Boundaries Prediction using Support Vector Regression



  Protein domains are the fundamental structural and functional units of proteins. Knowing where protein domain boundaries lie helps in understanding the evolution, structures and functions of proteins and plays an important role in protein classification. In this paper, we propose a support vector regression based method to locate residues at protein domain boundaries using profiles extracted from the AAindex database. The method is simple and efficient: on our dataset of multi-domain proteins it achieved an average sensitivity of 36.5% and an average specificity of 81% for domain boundary identification, and it showed better predictive performance than previous models for protein domain identification.
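  As a rough illustration of the two figures quoted above, the sketch below computes sensitivity and specificity from per-residue labels. It assumes the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); this page does not state which definitions or boundary tolerance were used, and some domain prediction papers report TP / (TP + FP) as "specificity" instead. The function name and the toy labels are hypothetical and are not part of the DomSVR package.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Return (sensitivity, specificity) for per-residue binary labels.

    1 marks a domain-boundary residue, 0 marks any other residue.
    Assumes the standard definitions; the definitions used for the
    reported DomSVR figures are not given on this page.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1

    # Guard against empty classes so the function never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy example (made-up labels, not DomSVR output).
truth = [0, 0, 1, 1, 0, 0, 0, 0, 1, 0]
preds = [0, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(truth, preds)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

  Averaging these two values over all chains in a dataset would give per-dataset figures comparable in form to those quoted above, under the stated assumptions.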



  The package has two parts: the CGI programs and the main program. Run DomSVR.cgi first and then Results.cgi; Results.cgi in turn calls DomSVR_web_matcom.exe in the directory "/program".

File Description:
  DomSVR.cgi -
Main CGI interface program that collects the query sequence, job title, etc. All inputs are transferred to Results.cgi.
  Results.cgi -
Handles the input information, obtains the remote BLAST output for the query sequence, and runs the program DomSVR_web_matcom.exe.

The package can be downloaded from: DomSVR Classifier



  The dataset contains 354 chains with two or more domains. For more information on protein domains, please visit the CATH website.

2-domain chains (282):

1kf0A 1n16A 1gygA 1a260 1ahjB 1az90 1b4aA 1b8tA 1bc90 1bl0A 1bxrB 1cid0 1civA 1cn4A 1evkA 1kbpA 1nsf0 1orb0 1qfgA 1qgiA 1qiuA 1qleC 1vcaA 2drpA 1g1sA 1hf2B 1oojA 1mwaA 1n32C 1jilA 1fwxA 1af2A 1b90A 1bag0 1cfb0 1clc0 1cvsC 1dd3A 1dpgA 1e6jP 1ei1A 1fgjA 1hfh0 1hnf0 1lam0 1lbu0 1mmsA 1qg3A 1qqgA 1serB 2bb20 1fpoA 1feuA 1ihgA 1kkeA 1kp0A 1mdmA 1kqmB 1nbqA 1ieqA 1c89A 1a02N 1aqt0 1b63A 1bjyA 1bpoA 1bu6O 1c4zA 1cb7B 1dovA 1e1cA 1e7dA 1etpA 1fakL 1fcdC 1j8mF 1jbaA 1lxa0 1pprM 1qexA 1skz0 1tbrR 1hw7A 1fybA 1jg3A 1moeA 1jybA 1k2vN 1kb9B 1a630 1aw90 1bu2A 1bx0A 1bxnA 1dxxA 1epfA 1esp0 1f1zA 1fbl0 1fyzE 1io2A 1jlxA 1phk0 1prr0 1qm9A 1qmuA 1rgs0 1sfe0 2adr0 2ig2H 1e79H 1jbjA 1jfjA 1ho5A 1n32M 1ft4A 1e3oC 1g7oA 1i8dA 1b6u0 1b7fA 1c2aA 1chrA 1e4jA 1ea3A 1edhA 1et0A 1eucA 1ezjA 1fjeB 1fjgE 1g9mC 1h2aS 1lpbB 1mpyA 1opmA 1pgs0 1rl6A 1ycsB 1e2xA 1jj2O 1j7vR 1is7A 1l8pA 1lk3A 1kgqA 1gknA 1l0lA 1eh4A 1iakA 1jk9B 1g6wC 1fw1A 1af70 1bcmA 1bmtA 1cd1A 1cgkA 1dzfA 1fsz0 1iicA 1knyA 1kxf0 1qcoA 1qd1B 1qgnA 1qi7A 1qrjB 1tfb0 1zxq0 4pgaA 1e3aA 1gjwA 1h89C 1lgtA 1obgA 1i7pA 1a2oA 1a8l0 1asyA 1bj4A 1br6A 1cjxA 1cnqA 1dubA 1egaA 1eysM 1f51A 1g38A 1idqA 1ll10 1qfhA 1qo0D 1qs1A 1qu6A 2psg0 1hf8A 1foaA 2hrvA 1i4wA 1lv1A 1n32H 1l3kA 1gq1A 1ky6A 1hr6A 1a1rA 1a8p0 1acmB 1b9hA 1bbuA 1boy0 1bt4A 1cvjA 1dkxA 1fiqA 1fwyA 1hdmA 1pdkA 1qmgA 1toaA 2amg0 2fmtA 1g6q1 1k20A 1jajA 1jyoE 1ji8A 1l0lB 1gkmA 1llqA 1l0lE 1ox0A 1n2eA 1jt0A 1iawA 1a0gA 1abrB 1bif0 1c3gA 1cliA 1cx2A 1cz4A 1d0hA 1dciA 1f3rB 1f5yA 1fguA 1g6nA 1pgn0 1rmd0 1tpg0 1urk0 1yua0 1gsmA 1kbyH 1m56B 1iz6A 1klfB 1h4kX 1kwmA 1nvfA 1e6rA 1i8lA 1fjrB 1a4sA 1a5z0 1adjA 1b43A 1bcpB 1biiA 1bq50 1c0mA 1c7dA 1div0 1dvpA 1f12A 1fftB 1imaA 1mgtA 1j7lA 1i3jA

3-domain chains (50):
1cqxA 1gof0 1hwxA 1powA 1qsaA 1e2tA 1jb7A 1klo0 1smaA 1jr3E 1e6vA 1ei5A 1ekmA 1eut0 1f60A 1bvp1 1ddt0 1fiqB 1fj1E 1fokA 1mvyA 1jqjA 1a65A 1meyC 1f00I 1fx7A 1h4sA 1h6wA 1bmfA 1cuk0 1dizA 1i1rA 1soxA 1ki0A 1dx5I 1e8bA 1egcA 1hm2A 1iraY 1tf3A 1ay0A 1jswA 2cblA 1hs6A 1h8eD 1a81A 1fuiA 1gdtA 1zpdA 1ep3B

Chains with more than three domains (22):
1gntA 1h9zA 1aco0 1yge0 1g40A 1fnf0 1qcfA 1qhoA 1c7sA 1hciA 1l9nA 1k7tA 1kfqA 1mq3A 1bxrA 1ordA 1kekA 1e5wA 1clqA 1d0nA 1m9iA 1hn1A
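
The chain identifiers listed above appear to follow a five-character pattern: a four-character PDB code followed by a one-character chain label. The sketch below splits identifiers on that assumption and treats a trailing "0" as "no chain label", which is how older CATH releases marked unnamed chains; this page does not confirm either convention. The function names are hypothetical and are not part of the DomSVR package.

```python
def parse_chain_id(chain_id):
    """Split an identifier such as '1gntA' into (pdb_code, chain_label).

    Assumption: 4-character PDB code + 1-character chain label, where
    a label of '0' means the chain has no label in the PDB entry.
    """
    if len(chain_id) != 5:
        raise ValueError(f"unexpected identifier length: {chain_id!r}")
    pdb_code, chain = chain_id[:4], chain_id[4]
    return pdb_code.lower(), (None if chain == "0" else chain)


def summarize(listing):
    """Count chains, distinct PDB entries and unlabelled chains."""
    parsed = [parse_chain_id(token) for token in listing.split()]
    return {
        "chains": len(parsed),
        "distinct_pdb_entries": len({code for code, _ in parsed}),
        "unlabelled_chains": sum(1 for _, chain in parsed if chain is None),
    }


# First six identifiers from the list of chains with more than three domains.
sample = "1gntA 1h9zA 1aco0 1yge0 1g40A 1fnf0"
print(summarize(sample))
```

Running `summarize` on each full listing is one way to check that the number of identifiers matches the count given in each heading.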


Other datasets:

    Additionally, 963 1-domain chains were integrated with the above dataset to train our model, so the model can be validated on either 1-domain or multi-domain chains, such as chains from the CAFASP-4 and CASP7 datasets. Please see the attachment for details: 963 1-domain chains.

    For the classification of domain boundaries, refer to the files single963 and multi354.

    To compare with other predictors, the CAFASP-4 and CASP7 datasets were also used to evaluate the prediction of domain boundaries.

Copyright © 2004-2009 by Peng Chen

All Rights Reserved