INSTALLATION AND IMPLEMENTATION OF I-TASSER-MD SUITE
(Copyright 2021 by Zhang Lab, University of Michigan, All rights reserved)
(Version 1.0, 2021/09/16)

1. What is I-TASSER-MD Suite?

The I-TASSER-MD Suite is a composite package of programs for protein
structure modeling from cryo-EM density maps. The Suite
includes the following programs:

a) I-TASSER-MD: A hierarchical program for multi-domain protein structure modeling
b) FUpred: A deep-learning based program for domain boundary prediction
c) ThreaDom: A threading template based program for domain boundary prediction
d) DeepMSA: A program for multiple sequence alignment generation
e) DeepPotential: A deep residual neural-network algorithm for inter-residue spatial restraints prediction
f) D-I-TASSER: A single-domain protein structure modeling algorithm using deep learning predicted spatial restraints
g) DEMO: A program for multi-domain protein structure assembly
h) LOMETS2: A meta-approach included in I-TASSER for threading templates identification
i) FASPR: A program for protein side-chain packing
j) ModRefiner: Construct and refine atomic model from C-alpha traces
k) NWalign: Protein sequence alignments by Needleman-Wunsch algorithm
l) PSSpred: A program for Protein Secondary Structure PREDiction
m) COACH: A function annotation program based on COFACTOR, TM-SITE and S-SITE
n) COFACTOR: A program for ligand-binding site, EC number & GO term prediction

2. How to install the I-TASSER-MD Suite?

a) Download the I-TASSER-MD Suite 'I-TASSER-MD-1.0.tar.gz' from
https://zhanggroup.org/I-TASSER-MD/download/
and unpack 'I-TASSER-MD-1.0.tar.gz' by
> tar -zxvf I-TASSER-MD-1.0.tar.gz
The root path of this package is called $pkgdir, e.g.
/home/yourname/I-TASSER-MD-1.0. You should have all the programs under this
directory. You can install the package at any location on your computer.

b) Download I-TASSER-MD library files from
https://zhanggroup.org/I-TASSER-MD/download/
A script 'download_lib.pl' is provided in the package for automated
download and update of the libraries.
The libraries need about 150GB of disk space.
We recommend putting the library files under the path /home/yourname/ITLIB.

c) Third-party software installation:
While the majority of programs in the package 'I-TASSER-MD-1.0.tar.gz' were
developed in the Zhang Lab, which hereby grants permission for their use,
there are some programs and databases (including blast, nr, GOparser, uniclust30,
uniref90 and metaclust) which were developed by third-party groups. A default
version of blast and nr is included in the package. It is the user's obligation to obtain
license permission from the developers of all the third-party software
before using it. In addition, your system needs to have Java, python2, and
python3 (which supports pytorch >1.1.0) installed.

To use DeepMSA, you need to download uniclust30, uniref90 and metaclust from
http://gwdu111.gwdg.de/~compbiol/uniclust/2017_04/uniclust30_2017_04_hhsuite.tar.gz ,
ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz ,
and https://metaclust.mmseqs.org/2017_05/metaclust_2017_05.fasta.gz . After you unpack
them, put each entire folder into the I-TASSER-MD library folder (i.e. the folder where
you put your PDB, MTX, DEP folders). Then rename the folder uniclust30_xxx_xxx to
uniclust30, uniref90_xxx to uniref90, and metaclust_xxx to metaclust. Next, use
$pkgdir/contact/DeepMSA/bin/esl-sfetch to create the .ssi index for uniref90 and metaclust; here
$pkgdir means the path where you put the I-TASSER-MD suite package. For example, if
the uniref90 database in the uniref90 folder is named uniref90.fasta, go to the uniref90
folder and run $pkgdir/contact/DeepMSA/bin/esl-sfetch --index uniref90.fasta. You will find
a new file named uniref90.fasta.ssi after the command finishes. Then do the same for the
metaclust database. If you use a different version of uniclust30, uniref90 or metaclust,
please open $pkgdir/run_I-TASSER-MD.py and change the variables:

$hhbdbdir = "$libdir/uniclust30";
$jacdbdir = "$libdir/uniref90";
$hmsdbdir = "$libdir/metaclust";

$hhbdb = "$libdir/uniclust30/uniclust30_2017_04";
$jacdb = "$libdir/uniref90/uniref90.fasta";
$hmsdb = "$libdir/metaclust/metaclust.fasta";
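
The indexing step described above can be sketched in Python as a helper that only
assembles the commands, so they can be reviewed before anything is run. This is a
minimal sketch, not part of the package: the function name 'index_commands' is
hypothetical, and it assumes the default file names uniref90.fasta and metaclust.fasta
and the esl-sfetch location used in the example above.

```python
from pathlib import PurePosixPath


def index_commands(pkgdir, libdir):
    """Return (working_dir, argv) pairs that would build the .ssi indexes.

    Assumption: the databases use the default names from the README
    (uniref90/uniref90.fasta and metaclust/metaclust.fasta).
    """
    esl_sfetch = PurePosixPath(pkgdir) / "contact" / "DeepMSA" / "bin" / "esl-sfetch"
    databases = {"uniref90": "uniref90.fasta", "metaclust": "metaclust.fasta"}
    commands = []
    for folder, fasta in databases.items():
        workdir = PurePosixPath(libdir) / folder
        # esl-sfetch is run inside the database folder, as in the README example
        commands.append((str(workdir), [str(esl_sfetch), "--index", fasta]))
    return commands


if __name__ == "__main__":
    for workdir, argv in index_commands("/home/yourname/I-TASSER-MD-1.0",
                                        "/home/yourname/ITLIB"):
        print("cd", workdir, "&&", " ".join(argv))
```

Each pair can be passed to subprocess.run(argv, cwd=workdir, check=True) once the
paths have been checked.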

3. Bug report:

Please report and post bugs and suggestions at the message board:
https://zhanggroup.org/forum/

#######################################################
# #
# 4. Installation and implementation of I-TASSER-MD #
# #
#######################################################

4.1. Introduction of I-TASSER-MD

I-TASSER-MD is a hierarchical protocol to predict structures and functions of multi-domain
proteins. It first predicts the domain boundaries by FUpred and ThreaDom based on the deep
learning contact map prediction and the multiple template alignment. Meanwhile,
residue-residue spatial restraints are generated by the deep convolutional neural-network
according to the multiple sequence alignment constructed from the whole-genome and metagenome
databases. The model of each individual domain is then independently constructed by I-TASSER
guided by the deep learning predicted spatial restraints. Next, the individual domain models
are assembled into a full-length structure by DEMO under the guidance of knowledge-based
inter-domain potentials and deep-learning distance profiles. Finally, the protein functions
at both the domain level and the full-chain level are annotated by COFACTOR based on structures,
sequences, and protein-protein interaction networks. Large-scale benchmark tests have shown
a significant advantage of I-TASSER-MD over traditional protein structure prediction methods
for high-accuracy multi-domain protein structure modeling.

4.2. How to run I-TASSER-MD?

a) The main script for running I-TASSER-MD is $pkgdir/run_I-TASSER-MD.py, where "$pkgdir" is the
location of the run_I-TASSER-MD.py script.
Running it directly without arguments will output the help information.

b) The following arguments must be set (mandatory arguments). One example is:

"$pkgdir/run_I-TASSER-MD.py protein_name input_dir sequence [Options]"

'protein_name' is the name of the folder containing the protein sequence and cryo-EM density map
'input_dir' is the directory which contains the query folder
'sequence' is the directory of your query sequence

c) Other arguments are optional whose default values have been set.
User can reset one or more of them. One example of command line is:

"$pkgdir/run_I-TASSER-MD.py protein_name input_dir sequence -template XXX.pdb"

-template Provide the template structure to guide the domain assembly. The template
should be in PDB format.
-deepdist [no or yes], flag of predicted distance by DomainDist to guide the assembly.
The default value is "yes".
-EMmap The cryo-EM density map in MRC or CCP4 format.
-reso The resolution of the density map.
-CLink The cross link data (follow the format provided on the webserver).
-expdom Provide the experimental domain information including domain definition
and PDB domain models if some experimental domain models are available.
See the webserver or README for the explanation of the format.
-LBS [false or true], whether to predict ligand-binding site, default is false.
-EC [false or true], whether to predict EC number, default is false
-GO [false or true], whether to predict GO terms, default is false
-runstyle default value is "serial" which means running I-TASSER simulation sequentially.
"parallel" means running parallel simulation jobs in the
cluster using PBS/torque job scheduling system.
"gnuparallel" means running parallel simulation jobs on
one computer with multiple cores using GNU parallel
-run [real, benchmark],"real" will use all templates, "benchmark"
will exclude homologous templates
-libdir means the path of the template libraries for I-TASSER.
The default directory is "$pkgdir/ITLIB". You must use this option
to change the path if you did not put it in the default directory.
-java_home means the path that contains the java executable "bin/java"
(your system needs to have Java installed)
-python2 path to python 2, for example /usr/bin/python
-python3 path to python 3 for distance prediction, need to support pytorch 1.1.0,
for example /usr/bin/python3
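
The command line described above can be assembled programmatically, which helps when
many targets are submitted from one script. The following Python sketch is only an
illustration: the helper name 'build_command' is hypothetical, it accepts only the
option names listed above, and it assumes every option is passed as a "-name value"
pair as in the "-template XXX.pdb" example.

```python
KNOWN_OPTIONS = {
    "template", "deepdist", "EMmap", "reso", "CLink", "expdom", "LBS", "EC",
    "GO", "runstyle", "run", "libdir", "java_home", "python2", "python3",
}
RUNSTYLES = {"serial", "parallel", "gnuparallel"}


def build_command(pkgdir, protein_name, input_dir, sequence, **options):
    """Return the argument list for run_I-TASSER-MD.py (nothing is executed)."""
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError("options not listed in the README: %s" % sorted(unknown))
    if "runstyle" in options and options["runstyle"] not in RUNSTYLES:
        raise ValueError("runstyle must be one of %s" % sorted(RUNSTYLES))
    # Mandatory arguments come first, in the order given in the README
    argv = [pkgdir.rstrip("/") + "/run_I-TASSER-MD.py", protein_name, input_dir, sequence]
    # Optional arguments follow as "-name value" pairs, in the order supplied
    for name, value in options.items():
        argv += ["-" + name, str(value)]
    return argv


if __name__ == "__main__":
    print(" ".join(build_command("/home/yourname/I-TASSER-MD-1.0", "example",
                                 "/home/yourname/input", "seq.fasta",
                                 EMmap="map.mrc", reso=4.5, runstyle="serial")))
```

Rejecting unknown names early avoids starting a long job with a mistyped option.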

d) Where are the final predicted results?
The following results are included in "/input_dir/protein_name":

"model*.pdb" the final model created by I-TASSER-MD
"emmodel*.pdb" the final model refined by ModRefiner
"dom*.pdb" the domain model predicted by D-I-TASSER
"FUpred.info" the predicted domain boundary
"seq.ss" the secondary structure predicted by PSSpred
"cscore" the confidence score, estimated TM-score, and estimated RMSD
of the final model
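
To check which of the result files listed above are present after a run, a short
Python sketch such as the following can be used. The helper name 'collect_results'
and the dictionary labels are hypothetical; only the file patterns come from the
list above.

```python
import glob
import os

# File patterns taken from the result list above; the labels are illustrative
RESULT_PATTERNS = {
    "final models": "model*.pdb",
    "refined models": "emmodel*.pdb",
    "domain models": "dom*.pdb",
    "domain boundary": "FUpred.info",
    "secondary structure": "seq.ss",
    "confidence score": "cscore",
}


def collect_results(input_dir, protein_name):
    """Map each result type to the sorted file names found in input_dir/protein_name."""
    folder = os.path.join(input_dir, protein_name)
    found = {}
    for label, pattern in RESULT_PATTERNS.items():
        matches = glob.glob(os.path.join(folder, pattern))
        found[label] = sorted(os.path.basename(path) for path in matches)
    return found


if __name__ == "__main__":
    for label, names in collect_results(".", "protein_name").items():
        print(label + ":", ", ".join(names) if names else "(not found)")
```

An empty list for a result type usually means the corresponding step has not
finished yet or was skipped.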

NOTE:
a) Outline of steps for running I-TASSER-MD by 'run_I-TASSER-MD.py':
a1) Parse user provided information
a2) run 'DeepPotential' to predict inter-residue spatial restraints of the full-chain
a3) run 'LOMETS2' to determine the protein type
a4) run 'ThreaDom' or 'FUpred' to predict the domain boundary
a5) run 'D-I-TASSER' to predict the model of each domain. If the protein is
predicted as a single-domain protein, the full-length model will be directly
generated by D-I-TASSER
a6) run 'DEMO' to assemble all domain models into a full-length model
a7) run 'COACH' and 'COFACTOR' to generate ligand-binding sites, EC number and
GO terms predictions.
b) 'seq.fasta' is the query sequence file in FASTA format, which is the
only needed input file for running I-TASSER-MD. This file should be put
in "./input_dir/protein_name" before running this job.
c) If working on a cluster with multiple nodes, it is recommended to set
$runstyle="parallel". You need to have a PBS server installed on your system.
Parallel jobs will run faster since jobs are distributed among different
nodes. The default setting $runstyle="serial" will run all the jobs on a
single computer.
d) If the job has been executed partially and encountered an error, you can
rerun the main script without modification. It will check the existing
files and start from the correct position.
e) If you want to provide the cryo-EM density data to guide the assembly, please use
the options "-EMmap" and "-reso" and follow the explanation and example at
https://zhanggroup.org/I-TASSER-MD/explanation_EM.html
f) If you want to provide the cross link data or contact/distance to guide the
assembly, please use the option "-CLink" and follow the explanation and example at
https://zhanggroup.org/I-TASSER-MD/explanation_CL.html
g) If you want to provide the experimental models for some domains, please prepare the
file in the following format:
The file starts with the domain definition of the query sequence in the first line.
The experimental domain information starts from the second line with the residue index
range of the domain written in the first line. See the detailed explanation and example
at https://zhanggroup.org/I-TASSER-MD/explanation_expdom.html
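
As noted in b), the query sequence must be saved as 'seq.fasta' in
"./input_dir/protein_name" before the job starts. A minimal Python sketch of this
preparation step is given below. The helper name 'prepare_query', the 60-character
line width, and the use of the folder name as the FASTA header are assumptions made
for illustration only.

```python
from pathlib import Path


def prepare_query(input_dir, protein_name, sequence, header=None):
    """Write input_dir/protein_name/seq.fasta and return its path."""
    # Remove whitespace and normalize to upper case before writing
    seq = "".join(sequence.split()).upper()
    if not seq or not seq.isalpha():
        raise ValueError("sequence must contain letters only")
    folder = Path(input_dir) / protein_name
    folder.mkdir(parents=True, exist_ok=True)
    # Header line followed by the sequence wrapped at 60 residues per line
    lines = [">" + (header or protein_name)]
    lines += [seq[i:i + 60] for i in range(0, len(seq), 60)]
    fasta = folder / "seq.fasta"
    fasta.write_text("\n".join(lines) + "\n")
    return fasta


if __name__ == "__main__":
    print(prepare_query("input_dir", "protein_name", "GKDGLEVADELVSE"))
```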

4.3 System requirement:

a) x86_64 machine, Linux kernel OS, Free disk space of more than 150G.
b) Perl, python, and java interpreters should be installed.
c) Basic compress and decompress package should be installed to support:
tar and bunzip2.
d) If you are using computer clusters, job management software PBS server should
support 'qsub' and 'qstat'. If using other job management software, such as
SGE and Slurm, some changes should be made following the instructions at:
https://zhanggroup.org/bbs/?q=node/3561
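
The requirements above can be checked before installation with a short Python sketch
like the one below. It is only an illustration: the helper names are hypothetical,
the executable names (perl, python, java, tar, bunzip2) are assumed to match what is
on your PATH, and "150G" is interpreted here as 150 GiB.

```python
import shutil

REQUIRED_TOOLS = ("perl", "python", "java", "tar", "bunzip2")
MIN_FREE_BYTES = 150 * 1024 ** 3  # assumption: 150G is read as 150 GiB


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from the list that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def has_enough_space(path, min_free=MIN_FREE_BYTES, disk_usage=shutil.disk_usage):
    """Return True if the file system holding path has at least min_free bytes free."""
    return disk_usage(path).free >= min_free


if __name__ == "__main__":
    print("missing tools:", missing_tools() or "none")
    print("enough disk space in current directory:", has_enough_space("."))
```

The 'which' and 'disk_usage' parameters exist only so the checks can be exercised
without touching the real system.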

4.4. How to cite I-TASSER-MD and I-TASSER-MD Suite?

Xiaogen Zhou, Wei Zheng, Yang Li, Robin Pearce, Chengxin Zhang, Eric W. Bell, Guijun Zhang,
and Yang Zhang. I-TASSER-MD: A deep-learning based platform for multi-domain protein
structure and function prediction. Submitted, 2021.

Xiaogen Zhou, Jun Hu, Chengxin Zhang, Guijun Zhang, and Yang Zhang. Assembling multidomain
protein structures through analogous global structural alignments. Proceedings of the
National Academy of Sciences, 116: 15930-15938 (2019)

#######################################################
# #
# 5. Installation and implementation of FUpred #
# #
#######################################################

5.1. Introduction of FUpred

FUpred is a contact map-based domain prediction method which utilizes a recursion strategy
to detect domain boundary based on predicted contact-map and secondary structure information.
Large-scale benchmark analysis shows that FUpred predicts domain boundaries
significantly better than threading-based and machine learning-based methods.
In particular, it outperforms current methods in detecting discontinuous
domain boundaries.

5.2. How to install FUpred program?

When you unpack the I-TASSER-MD Suite, the FUpred program is already installed in
$pkgdir/FUpredmod.

5.3. How to run FUpred program?

Usage: $pkgdir/FUpredmod/run_FUpred.pl protein_name input_dir

To run FUpred, you need to prepare the following input files:
'protein_name'--Mandatory, the name of the folder containing the sequence and density map
'input_dir'-----Mandatory, the directory which contains the query folder

Output files of FUpred include:
'FUpred.info'---The predicted domain boundary
'FUpred.2c'-----The FUscore for continuous domain detection
'FUpred.2d'-----The FUscore for discontinuous domain detection

A detailed readme file can be found at
https://zhanglab.dcmb.med.umich.edu/FUpred/download/FUpred/readme.txt

5.4. How to cite FUpred?

If you are using the FUpred program, you can cite:

Wei Zheng, Xiaogen Zhou, Qiqige Wuyun, Robin Pearce, Yang Li and Yang Zhang.
FUpred: Detecting protein domains through deep-learning based contact map
prediction. Bioinformatics, 36: 3749–3757, 2020.

#######################################################
# #
# 6. Installation and implementation of ThreaDom #
# #
#######################################################

6.1. Introduction of ThreaDom

ThreaDom (Threading-based Protein Domain Prediction) is a template-based algorithm for
protein domain boundary prediction. Given a protein sequence, ThreaDom first threads the
target through the PDB library to identify protein template that have similar structure
fold. A domain conservation score (DCS) will be calculated for each residue which combines
information from template domain structure, terminal and internal gaps and insertions.
Finally, the domain boundary information is derived from the DCS profile distributions.
ThreaDom is designed to predict both continuous and discontinuous domains.

6.2. How to install ThreaDom program?

When you unpack the I-TASSER-MD Suite, the ThreaDom program is already installed in
$pkgdir/ThreaDommod.

6.3. How to run ThreaDom program?

Usage: $pkgdir/ThreaDommod/runThreaDom.pl protein_name input_dir -libdir libdir

To run ThreaDom, you need to prepare the following input files:
'protein_name'--Mandatory, the name of the folder containing the sequence and density map
'input_dir'-----Mandatory, the directory which contains the query folder
'libdir'--------Mandatory, the path of the template libraries

Output files of ThreaDom include:
'protein_name.sd'---The predicted domain boundary

A detailed readme can be found in the package.

6.4. How to cite ThreaDom?

If you are using the ThreaDom program, you can cite:

Yan Wang, Jian Wang, Qiang Shi, Ruiming Li, Zhidong Xue, Yang Zhang. ThreaDomEx: a unified
platform for predicting continuous and discontinuous protein domains by multiple-threading
and segment assembly. Nucleic Acids Research, 45: W400-407, 2017.

#######################################################
# #
# 7. Installation and implementation of DeepMSA #
# #
#######################################################

7.1. Introduction of DeepMSA

DeepMSA is a new open-source method for sensitive MSA construction,
which has homologous sequences and alignments created from multi-sources
of whole-genome and metagenome databases through complementary hidden
Markov model algorithms.

7.2. How to install DeepMSA program?

When you unpack the I-TASSER-MD Suite, DeepMSA program is already installed.

7.3. How to run DeepMSA program?

The DeepMSA main script is $pkgdir/contact/DeepMSA/scripts/build_MSA.py. The running
option of this program is similar to that in runI-TASSER.pl. By running
the program without argument, you can print all the running options.

7.4. How to cite DeepMSA?

If you are using the DeepMSA program, you can cite:

C Zhang, W Zheng, S M Mortuza, Y Li, Y Zhang. DeepMSA: constructing deep multiple sequence
alignment to improve contact prediction and fold-recognition for distant-homology proteins.
Bioinformatics 36:2105-2112 (2020).

#######################################################
# #
# 8. Installation and implementation of DeepPotential#
# #
#######################################################

8.1. Introduction of DeepPotential
DeepPotential is a method to predict the inter-residue spatial restraints
including distances, inter-residue torsion angles, and hydrogen-bonding networks
based on the ensemble of two complementary coevolution features coupling with
deep residual networks.

8.2. How to install DeepPotential?

When you unpack the I-TASSER-MD Suite, the DeepPotential program is already installed in
$pkgdir/distance/DeepPotential.

8.3. How to run DeepPotential program?

Usage: runDistPre.pl -s protein_name -outdir input_dir [Options]

To run DeepPotential, you need to prepare the following input files:
'protein_name'--Mandatory, the name of the folder containing the sequence named as "seq.txt"
'input_dir'-----Mandatory, the directory which contains the query folder

Output files of DeepPotential include:
'distance_pca_*.txt'---The predicted CA atom distance
'distance_pcb_*.txt'---The predicted CB atom distance
'distance_pomg_20.txt, distance_pphi_20.txt, and distance_ptheta_20.txt'---The predicted
torsion angles
'distance_paa_.txt, distance_pbb_.txt, distance_pcc_.txt'---The predicted hydrogen-bonding
networks
'distance_ca_contact.txt'---The predicted CA contact
'distance_cb_contact.txt'---The predicted CB contact

A detailed readme can be found in the package.

8.4. How to cite DeepPotential?

If you are using the DeepPotential program, you can cite:

Li Yang, Zhang Chengxin, Zheng Wei, Zhou Xiaogen, Bell W. Eric, Yu Dongjun and Zhang Yang,
Protein inter-residue contact and distance prediction by coupling complementary coevolution
features with deep residual networks in CASP14. Proteins: Structure, Function, and
Bioinformatics, doi:https://doi.org/10.1002/prot.26211, 2021.

#######################################################
# #
# 9. Installation and implementation of D-I-TASSER #
# #
#######################################################

9.1. Introduction of D-I-TASSER

I-TASSER (Iterative Threading ASSEmbly Refinement) is a method for high-accuracy protein
structure and function prediction. Starting from a query sequence, I-TASSER first generates
inter-residue restraints by multiple deep neural-network predictors. It then identifies
structural templates from the PDB by multiple threading approach LOMETS2, with full-length
atomic models assembled by DeepPotential spatial restraints guided replica-exchange Monte
Carlo simulations.

9.2. How to install I-TASSER program?

When you unpack the I-TASSER-MD Suite, the D-I-TASSER program is already installed in
$pkgdir/I-TASSERmod.

9.3. How to run I-TASSER program?

The I-TASSER main script is $pkgdir/I-TASSERmod/runD-I-TASSER.pl. The running
option of this program is similar to run_I-TASSER-MD.py. By running
the program without argument, you can print all the running options.
A detailed readme can be found in the package.

9.4. How to cite I-TASSER?

If you are using the I-TASSER program, you can cite:

1. Wei Zheng, Chengxin Zhang, Yang Li, Robin Pearce, Eric W. Bell, Yang Zhang.
Folding non-homology proteins by coupling deep-learning contact maps with
I-TASSER assembly simulations. Cell Reports Methods, 1: 100014 (2021).
2. Y Zhang. I-TASSER server for protein 3D structure prediction.
BMC Bioinformatics, 9: 40 (2008).
3. A Roy, A Kucukural, Y Zhang. I-TASSER: a unified platform
for automated protein structure and function prediction.
Nature Protocols, 5: 725-738 (2010).
4. J Yang, R Yan, A Roy, D Xu, J Poisson, Y Zhang. The I-TASSER Suite: Protein
structure and function prediction. Nature Methods, 12: 7-8 (2015)

#######################################################
# #
# 10. Installation and implementation of DEMO #
# #
#######################################################

10.1. Introduction of DEMO

DEMO (Domain Enhanced MOdeling) is a method for automated assembly of full-length structural
models of multi-domain proteins. Starting from individual domain structures, DEMO first
identifies quaternary structure templates that have similar component domains by domain-level
structural alignments using TM-align. Replica-exchange Monte Carlo simulations are then used
to assemble full-length models, as guided by the inter-domain distance profiles collected
from the top-ranked quaternary templates. The final models with the lowest energy are selected
from Monte Carlo trajectories, followed by atomic-level refinements using fragment-guided
molecular dynamics simulations. DEMO can be used to assemble domains from either experimental
or predicted models for proteins with both continuous and discontinuous domain architectures.

10.2. How to install DEMO program?

When you unpack the I-TASSER-MD Suite, DEMO programs are already installed.

10.3. How to run DEMO program?

Usage: $pkgdir/DEMOmod/DEMO sequence domain_folder [Options]

To run DEMO, you need to prepare the following input files:
'sequence'-------Mandatory, the full-length sequence of the target
'domain_folder'--Mandatory, the directory which contains the domain model named as "dom1.pdb,
dom2.pdb,..."

Output files of DEMO include:
'fmodel*.pdb'----The full-length model assembled by DEMO
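
Before calling DEMO, it can help to confirm that the domain folder follows the
"dom1.pdb, dom2.pdb,..." naming described above. The Python sketch below lists the
domain models in numeric order. The helper name 'ordered_domain_models' is
hypothetical, and the requirement that the numbering has no gaps is an assumption
of this sketch, not a documented rule of DEMO.

```python
import re
from pathlib import Path


def ordered_domain_models(domain_folder):
    """Return the paths of dom1.pdb, dom2.pdb, ... sorted by domain number."""
    pattern = re.compile(r"^dom(\d+)\.pdb$")
    found = {}
    for entry in Path(domain_folder).iterdir():
        match = pattern.match(entry.name)
        if match:
            found[int(match.group(1))] = entry
    if not found:
        raise FileNotFoundError("no dom*.pdb files in %s" % domain_folder)
    expected = list(range(1, len(found) + 1))
    # Assumption of this sketch: domains are numbered 1..N without gaps
    if sorted(found) != expected:
        raise ValueError("domain numbering has gaps: %s" % sorted(found))
    return [found[i] for i in expected]


if __name__ == "__main__":
    try:
        for path in ordered_domain_models("."):
            print(path.name)
    except (FileNotFoundError, ValueError) as err:
        print("domain folder check failed:", err)
```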

10.4. How to cite DEMO?

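If you are using the DEMO program, you can cite:

Xiaogen Zhou, Jun Hu, Chengxin Zhang, Guijun Zhang, and Yang Zhang. Assembling multidomain
protein structures through analogous global structural alignments. Proceedings of the
National Academy of Sciences, 116: 15930-15938 (2019)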

#######################################################
# #
# 11. Installation and implementation of LOMETS2 #
# #
#######################################################

11.1. Introduction of LOMETS2

LOMETS2 (Local Meta-Threading-Server) is a meta-server approach to protein
fold-recognition. It consists of 11 individual threading programs: CEthreader,
mCEthreader, eCEthreader, MUSTER, PPA, dPPA, dPPA2, sPPA, wPPA, wdPPA, wMUSTER.
The mCEthreader and eCEthreader are variants of CEthreader which include
different scoring functions. The last 7 programs are variants of MUSTER
which include different optimized energy terms.

11.2. How to install LOMETS2 program?

When you unpack the I-TASSER-MD Suite, LOMETS2 programs are already installed.

11.3. How to run LOMETS2 program?

The LOMETS2 main script is $pkgdir/I-TASSERmod/runLOMETS.pl. The running
option of this program is similar to that in 'runI-TASSER.pl'. By running
the program without argument, you can print all the running options.

11.4. How to cite LOMETS2?

If you are using the LOMETS2 program, you can cite:

Wei Zheng, Chengxin Zhang, Qiqige Wuyun, Robin Pearce, Yang Li, Yang Zhang.
LOMETS2: improved meta-threading server for fold-recognition and
structure-based function annotation for distant-homology proteins.
Nucleic Acids Research, 47: W429-W436 (2019)

S Wu, Y Zhang. LOMETS: A local meta-threading-server for protein
structure prediction. Nucleic Acids Research, 35: 3375-3382 (2007).

#######################################################
# #
# 12. Installation and implementation of FASPR #
# #
#######################################################

12.1. Introduction of FASPR

FASPR is a method for structural modeling of protein side-chain conformations.
Starting from a backbone structure, FASPR samples the side-chain rotamers for
each amino acid from the Dunbrack 2010 rotamer library with the atomic interaction
energies calculated using an optimized scoring function extended from EvoEF2, where
side-chain packing search is performed using a deterministic searching algorithm
combining self-energy checking, dead-end elimination theorems, and tree decomposition.

12.2. How to install FASPR program?

When you unpack the I-TASSER-MD Suite, FASPR program is already installed
at $pkgdir/Assbmod/bin/FASPR

12.3. How to run FASPR program?

Usage: FASPR input.pdb output.pdb

To run FASPR, you need to prepare the following input files:
'input.pdb' Mandatory, input pdb file for side-chain packing.
'-s' Optional, the sequence of the input.pdb

Output files of FASPR include:
'output.pdb' output pdb file of FASPR with side-chains packed.

A detailed readme file can be found in the FASPR package

12.4. How to cite FASPR?

If you are using the FASPR program, you can cite:

Xiaoqiang Huang, Robin Pearce, Yang Zhang. FASPR: an open-source tool for fast and accurate
protein side-chain packing. Bioinformatics (2020) 36: 3758-3765.

#######################################################
# #
# 13. Installation and implementation of ModRefiner #
# #
#######################################################

13.1. Introduction of ModRefiner

ModRefiner is a standalone program for atomic-level protein structure
construction and refinement. It includes two steps: (1) construct
main-chain models from C-alpha trace; (2) build side-chain models
and atomic-level structure refinement.

13.2. How to install ModRefiner program?

When you unpack the I-TASSER-MD Suite, ModRefiner program is already installed
at $pkgdir/I-TASSERmod/ModRefiner.pl

13.3. How to use ModRefiner program?

ModRefiner supports the following four options:

a) add side-chain heavy atoms to main-chain model without refinement
> ModRefiner.pl 1 ID MD IM ON

b) build main-chain model from C-alpha trace model
> ModRefiner.pl 2 ID MD IM RM ON

c) build full-atomic model from main-chain model
> ModRefiner.pl 3 ID MD IM RM ON

d) build full-atomic model from C-alpha trace model
> ModRefiner.pl 4 ID MD IM RM ON

ID: the path of the I-TASSER-MD package, e.g. '/home/yourname/I-TASSER-MD-1.0'
MD: directory which contains the initial model, e.g. '/home/yourname/I-TASSER-MD/5.0/example'
IM: the initial model to be refined, e.g. 'model1.pdb'
RM: reference model that refined model is driven to, e.g. 'combo1.pdb'.
Only CA trace is needed and the length can be not full which will make
the refinement of the missing region flexible. If you don't have the
reference model, use the name of IM instead.
ON: the output name of the refined model, e.g. 'model1_ref.pdb'

By running the program without argument, you can print a brief description
of how to use the program.
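
The four calling conventions above differ only in the mode number and in whether RM
is passed (mode 1 takes no RM). The Python sketch below assembles the argument list
accordingly and falls back to IM when no reference model is given, as described for
RM above. The helper name 'modrefiner_args' is hypothetical, and the script is
referred to by name only, without assuming its location.

```python
def modrefiner_args(mode, pkg_path, model_dir, initial_model, output_name,
                    reference_model=None):
    """Return the argument list for ModRefiner.pl in the given mode (1 to 4)."""
    if mode not in (1, 2, 3, 4):
        raise ValueError("mode must be 1, 2, 3 or 4")
    # Common leading arguments: mode, ID, MD, IM
    argv = ["ModRefiner.pl", str(mode), pkg_path, model_dir, initial_model]
    if mode != 1:
        # RM: use the initial model name when no reference model is available
        argv.append(reference_model or initial_model)
    # ON is always the last argument
    argv.append(output_name)
    return argv


if __name__ == "__main__":
    print(" ".join(modrefiner_args(4, "/home/yourname/I-TASSER-MD-1.0",
                                   "/home/yourname/example", "model1.pdb",
                                   "model1_ref.pdb", "combo1.pdb")))
```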

13.4. How to cite ModRefiner?

If you are using the ModRefiner program, you can cite:

D Xu, Y Zhang. Improving the Physical Realism and Structural Accuracy of
Protein Models by a Two-step Atomic-level Energy Minimization.
Biophysical Journal, 101: 2525-2534 (2011)

#######################################################
# #
# 14. Installation and implementation of NWalign #
# #
#######################################################

14.1. Introduction of NWalign

NW-align is a simple and robust alignment program for protein
sequence-to-sequence alignments based on the standard Needleman-Wunsch
dynamic programming algorithm. The mutation matrix is from BLOSUM62
with gap opening penalty=-11 and gap extension penalty=-1.

14.2. How to install NWalign program?

When you unpack the I-TASSER-MD Suite, NWalign program is already installed
at $pkgdir/bin/align.

14.3. How to use NWalign program?

> align F1.fasta F2.fasta (align two sequences in fasta file)
> align F1.pdb F2.pdb 1 (align two sequences in PDB file)
> align F1.fasta F2.pdb 2 (align Sequence 1 in fasta and 2 in pdb)
> align GKDGL EVADELVSE 3 (align sequences typed by keyboard)
> align GKDGL F.fasta 4 (align Seq-1 by keyboard and 2 in fasta)
> align GKDGL F.pdb 5 (align Seq-1 by keyboard and 2 in pdb)

Running the program without arguments will print out the usage options of
the program.
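
For readers unfamiliar with the underlying algorithm, the following Python sketch
shows Needleman-Wunsch global alignment in its simplest form. It is not the NWalign
implementation: to stay short it uses a simplified match/mismatch score and a single
linear gap penalty in place of the BLOSUM62 matrix and the separate gap opening and
extension penalties that NW-align uses, so its scores are not comparable with
NWalign output.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("GKDGL", "EVADELVSE")
    print(total)
    print(top)
    print(bottom)
```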

14.4. How to cite NWalign?

There is no published paper associated with this program. If you are using
the NWalign program, you can cite it as

Y Zhang, http://zhanglab.dcmb.med.umich.edu/NW-align

#######################################################
# #
# 15. Installation and implementation of PSSpred #
# #
#######################################################

15.1 Introduction of PSSpred

PSSpred (Protein Secondary Structure PREDiction) is a simple neural network
training algorithm for accurate protein secondary structure prediction. It first
collects multiple sequence alignments using PSI-BLAST. Amino-acid frequency and
log-odds data with Henikoff weights are then used to train secondary structure,
separately, based on the Rumelhart error back propagation method. The final
secondary structure prediction result is a combination of 7 neural network
predictors from different profile data and parameters.

15.2 How to install PSSpred program?

When you unpack the I-TASSER-MD Suite, PSSpred program is already installed
at $pkgdir/PSSpred

15.3 How to use PSSpred program?

$pkgdir/PSSpred/mPSSpred.pl seq.txt $pkgdir $libdir

Please note that 'seq.txt' should be in current directory and the script will
generate two files 'seq.dat' and 'seq.dat.ss' in the current folder. Here,
$pkgdir is the root path of I-TASSER-MD package.

15.4 How to cite PSSpred?

If you are using the PSSpred program, you can cite:
http://zhanglab.dcmb.med.umich.edu/PSSpred

#######################################################
# #
# 16. Installation and implementation of COFACTOR #
# #
#######################################################

16.1 Introduction of COFACTOR

COFACTOR is a structure-based method for biological function annotation of
protein molecules. COFACTOR threads the structure through three comprehensive
function libraries by local and global structure matches to identify functional
sites and homology. Functional insights, including ligand-binding site,
gene-ontology terms and enzyme classification, will be derived from the best
functional homology template. The COFACTOR algorithm was ranked as the best
method for function prediction in the community-wide CASP9 experiments.

16.2 How to install COFACTOR program?

When you unpack the I-TASSER-MD Suite, COFACTOR program is already installed
at $pkgdir/COFACTOR

16.3 How to use COFACTOR program?

$pkgdir/I-TASSERmod/runCOFACTOR.pl

16.4 How to interpret the results

If your input data is at $datadir/model1.pdb, the output of COFACTOR will be at
$datadir/model1/cofactor:
(1)List of similar structures in PDB: similarpdb_model1.lst. The columns are
(PDB_ID, TM-score, RMSD, Cov, Seq_id)
(2)Ligand-binding sites: BSITE_model1/Bsites_model1.dat. The columns are
(Rank, C-score, PDB_ID, TM-score, RMSD, Seq_id, Cov, Lig_name, SITE_num,
BS-score, LTM, BS_ID, BS_cov,BS_err, BS_ID1,BS_ID2, Binding residues)
(3)EC number: ECsearchresult_model1.dat The columns are
(PDB_ID, TM-score, RMSD, Seq_ID, Cov, EC-score, EC number,
Active site residues)
(4)GO terms: GOsearchresult_model1.dat. The columns are
(PDB_ID, TM-score, RMSD, Seq_ID, Cov, GO-score, GO terms)

16.5 How to cite COFACTOR?

If you are using the COFACTOR program, you can cite:

1. A Roy, J Yang, Y Zhang. COFACTOR: An accurate comparative algorithm for
structure-based protein function annotation.
Nucleic Acids Research, 40:W471-W477 (2012).
2. J Yang, A Roy, Y Zhang. BioLiP: a semi-manually curated database for
biologically relevant ligand-protein interactions.
Nucleic Acids Research, 41: D1096-D1103 (2013).

#######################################################
# #
# 17. Installation and implementation of COACH #
# #
#######################################################

17.1 Introduction of COACH

COACH is a meta-server approach to protein function annotations.
Starting from given structure of target proteins, COACH will generate
complementary ligand binding site predictions using two comparative methods:
TM-SITE and S-SITE, which recognize ligand-binding templates from
the BioLiP protein function database by binding-specific substructure and
sequence profile comparisons. These predictions will be combined with results
from COFACTOR to generate multiple function annotations, including
ligand-binding sites, enzyme commission and gene ontology terms.

17.2 How to install COACH program?

When you unpack the I-TASSER-MD Suite, COACH program is already installed
at $pkgdir/COACH

17.3 How to use COACH program?

$pkgdir/I-TASSERmod/runCOACH.pl

17.4 How to interpret the results

If your input data is at $datadir/model1.pdb, the output of COACH will be at
$datadir/model1/coach:

(1) Ligand-binding sites: Bsites.dat. The columns are
(C-score, cluster_densitiy, product_of_top_templates_zscore,
Binding residues)
(2) Detailed clustering information: Bsites.inf, Bsites.clr, which list
the templates used in the cluster that generates the prediction in (1).
(3) Ligand-protein complex structures are with name: CH_complex*.pdb
(4) Predictions from COFACTOR, TM-SITE, and S-SITE are at, respectively:
$datadir/model1/cofactor
$datadir/model1/tmsite
$datadir/ssite

17.5 How to cite COACH?

If you are using the COACH program, you can cite:

1. J Yang, A Roy, Y Zhang. Protein-ligand binding site recognition using
complementary binding-specific substructure comparison and sequence profile
alignment. Bioinformatics, 29:2588-2595 (2013).
2. J Yang, A Roy, Y Zhang. BioLiP: a semi-manually curated database for
biologically relevant ligand-protein interactions.
Nucleic Acids Research, 41: D1096-D1103 (2013).