INSTALLATION AND IMPLEMENTATION OF DEMO SUITE
(Copyright 2021 by Zhang Lab, University of Michigan, All rights reserved)
(Version 2.0, 2021/12/2)

1. What is DEMO Suite?

The DEMO Suite is a composite package of programs for multi-domain protein
structure assembly. The Suite includes the following programs:

a) DEMO2: A program for multi-domain protein structure assembly
b) DeepMSA: A program for multiple sequence alignment generation
c) DeepPotential: A deep residual neural-network algorithm for inter-residue spatial restraints prediction
d) FASPR: A program for protein side-chain packing

2. How to install the DEMO Suite?

a) Download the DEMO Suite 'DEMO-2.0.tar.gz' from
https://zhanggroup.org/DEMO2/download/
and unpack 'DEMO-2.0.tar.gz' by
> tar -zxvf DEMO-2.0.tar.gz
The root path of this package is called $pkgdir, e.g.
/home/yourname/DEMO2. You should have all the programs under this
directory. You can install the package at any location on your computer.

b) Download DEMO2 library files from
https://zhanggroup.org/DEMO2/download/
The library needs about 120 GB of disk space.

c) Third-party software installation:
While the majority of programs in the package 'DEMO-2.0.tar.gz' were
developed in the Zhang Lab and are released herein with permission of use,
some programs and databases (including blast, nr, uniclust30,
uniref90 and metaclust) were developed by third-party groups. Default
versions of blast and nr are included in the package. It is the user's obligation to obtain
license permission from the developers of all third-party software
before using it. In addition, your system needs to have
python3 (supporting pytorch > 1.1.0) installed.

To use DeepMSA, you need to download uniclust30, uniref90 and metaclust from
http://gwdu111.gwdg.de/~compbiol/uniclust/2017_04/uniclust30_2017_04_hhsuite.tar.gz ,
ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz ,
and https://metaclust.mmseqs.org/2017_05/metaclust_2017_05.fasta.gz. After unpacking
them, put each folder into the DEMO2 library folder. Then rename the folders
uniclust30_xxx_xxx to uniclust30, uniref90_xxx to uniref90, and metaclust_xxx to metaclust.
Next, use $pkgdir/external/hhsuite2/bin/esl-sfetch to create the .ssi index for uniref90
and metaclust, where $pkgdir is the path where you put the DEMO Suite package.
For example, if the uniref90 database in the uniref90 folder is named uniref90.fasta,
go to the uniref90 folder and run "$pkgdir/external/hhsuite2/bin/esl-sfetch --index
uniref90.fasta"; a new file named uniref90.fasta.ssi will appear once the
command finishes. Then do the same for the metaclust database. If you use a different
version of uniclust30, uniref90 or metaclust, open $pkgdir/run_DEMO2.py and
change the variables:

hhblitsdb = "$libdir/uniclust30_2017_04"
jackhmmerdb = "$libdir/uniref90.fasta"
hmmsearchdb = "$libdir/metaclust_2017_05.clean.fasta"
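The renaming and indexing steps above can be scripted. The Python sketch below is only an illustration: it assumes the three unpacked folders sit directly in the library folder (`libdir`) and that their names start with `uniclust30_`, `uniref90_` or `metaclust_`. The helper names `rename_db_folders` and `index_fasta` are hypothetical; the only external command used is the `esl-sfetch --index` call given above.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

# Folder-name prefixes from step c); each matching folder is renamed to the bare prefix
DB_PREFIXES = ("uniclust30", "uniref90", "metaclust")


def rename_db_folders(libdir: str) -> Dict[str, str]:
    """Rename e.g. uniclust30_2017_04 -> uniclust30 inside libdir; return old -> new names."""
    lib = Path(libdir)
    renamed = {}
    for entry in sorted(lib.iterdir()):  # sorted() snapshots the listing before renaming
        if not entry.is_dir():
            continue
        for prefix in DB_PREFIXES:
            target = lib / prefix
            if entry.name.startswith(prefix + "_") and not target.exists():
                entry.rename(target)
                renamed[entry.name] = prefix
                break
    return renamed


def index_fasta(pkgdir: str, fasta: str, dry_run: bool = False) -> List[str]:
    """Build (and unless dry_run, run) `esl-sfetch --index <fasta>` inside the FASTA's folder."""
    esl = Path(pkgdir) / "external" / "hhsuite2" / "bin" / "esl-sfetch"
    fasta_path = Path(fasta)
    cmd = [str(esl), "--index", fasta_path.name]
    if not dry_run:
        # Produces <fasta>.ssi next to the FASTA file
        subprocess.run(cmd, cwd=fasta_path.parent, check=True)
    return cmd
```

After renaming, check that the paths assigned to hhblitsdb, jackhmmerdb and hmmsearchdb in run_DEMO2.py still point at the folders and files you actually have.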

3. Bug report:

Please report and post bugs and suggestions at the message board:
https://zhanggroup.org/forum/

#######################################################
# #
# 4. Installation and implementation of DEMO #
# #
#######################################################

4.1. Introduction of DEMO2

DEMO2 (Domain Enhanced MOdeling, version 2.0) is an improved version of DEMO for automated
assembly of full-length structural models of multi-domain proteins by integrating deep-learning
predicted inter-domain spatial restraints. Starting from individual domain structures,
quaternary structure templates that have similar component domains are identified by
domain-level structural alignments using TM-align. Meanwhile, inter-domain spatial restraints
are predicted by the deep residual neural-network-based predictor DeepPotential. Full-length
models are then created by fast quasi-Newton optimization for rigid-body domain structure
assembly, guided by the DeepPotential-predicted inter-domain restraints,
inter-domain distance profiles collected from the top-ranked quaternary templates, and
physics-based steric potentials. The final models are selected from the low-energy
conformations and further refined with fragment-guided molecular dynamics simulations.
Large-scale benchmark tests showed that DEMO2 significantly outperforms its
predecessor.

4.2. How to run DEMO2?

a) The main script for running DEMO2 is $pkgdir/run_DEMO2.py, where "$pkgdir" is the
location of the run_DEMO2.py script.
Running it without arguments prints the help information.

b) The following arguments must be set (mandatory arguments). One example is:

"$pkgdir/run_DEMO2.py protein_name input_dir sequence [Options]"

'protein_name' is the name of the folder containing the protein sequence and domain models
'input_dir' is the directory which contains the query folder
'sequence' is the full-chain sequence in FASTA format

c) Other arguments are optional and have default values.
Users can reset one or more of them. An example command line is:

"$pkgdir/run_DEMO2.py protein_name input_dir sequence -template XXX.pdb"

-template Provide a template structure to guide the domain assembly. The template
should be in PDB format.
-deepdist [no or yes], whether to use the distances predicted by DomainDist to guide the assembly.
The default value is "yes".
-EMmap The cryo-EM density map in MRC or CCP4 format.
-reso The resolution of the density map.
-CLink The cross-link data (follow the format provided on the web server).
-run [real, benchmark], "real" will use all templates, "benchmark"
will exclude homologous templates.

d) Where are the final predicted results?
The following results are included in "input_dir/protein_name":

"fmodel*.pdb" the final model assembled by DEMO2
"cscore" the confidence score, estimated TM-score, and estimated RMSD
of the final model
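The Python sketch below drives run_DEMO2.py from another script and collects the outputs listed in d). The helper names (`build_demo2_command`, `final_models`, `run_and_collect`) are hypothetical. The sketch only assembles the documented arguments, assumes run_DEMO2.py can be executed directly as in the example command, and assumes the final models carry a numeric suffix (e.g. fmodel1.pdb), which this README does not specify. The cscore file is returned as raw text because its layout is not documented here.

```python
import re
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple


def build_demo2_command(pkgdir: str, protein_name: str, input_dir: str, sequence: str,
                        template: Optional[str] = None, deepdist: Optional[str] = None,
                        emmap: Optional[str] = None, reso: Optional[float] = None,
                        clink: Optional[str] = None, run: Optional[str] = None) -> List[str]:
    """Assemble the run_DEMO2.py argument list; optional flags are added only when set."""
    if deepdist not in (None, "yes", "no"):
        raise ValueError("-deepdist must be 'yes' or 'no'")
    if run not in (None, "real", "benchmark"):
        raise ValueError("-run must be 'real' or 'benchmark'")
    cmd = [str(Path(pkgdir) / "run_DEMO2.py"), protein_name, input_dir, sequence]
    optional = [("-template", template), ("-deepdist", deepdist), ("-EMmap", emmap),
                ("-reso", reso), ("-CLink", clink), ("-run", run)]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def final_models(input_dir: str, protein_name: str) -> List[Path]:
    """List fmodel*.pdb in input_dir/protein_name, ordered by numeric suffix (assumed naming)."""
    def order(p: Path):
        digits = re.findall(r"\d+", p.stem)
        return (int(digits[-1]) if digits else -1, p.name)
    return sorted((Path(input_dir) / protein_name).glob("fmodel*.pdb"), key=order)


def run_and_collect(cmd: List[str], input_dir: str, protein_name: str) -> Tuple[List[Path], str]:
    """Run the assembled command, then return the final models and the raw cscore text."""
    subprocess.run(cmd, check=True)
    cscore = Path(input_dir) / protein_name / "cscore"
    text = cscore.read_text() if cscore.is_file() else ""
    return final_models(input_dir, protein_name), text
```

For example, `build_demo2_command(pkgdir, "T1", "/data/jobs", "seq.fasta", run="benchmark")` mirrors the command-line form in b) and c) with the benchmark option added.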

NOTE:
a) Outline of steps for running DEMO2 by 'run_DEMO2.py':
a1) Parse user-provided information
a2) Run 'DeepPotential' to predict inter-residue spatial restraints of the full chain
a3) Run 'DEMO' to assemble all domain models into a full-length model
b) The domain PDB files should be named dom1.pdb, dom2.pdb, dom3.pdb... in order.
They should be put in "./input_dir/protein_name" before running the job.
c) 'seq.fasta' is the query sequence file in FASTA format. This file should be put
in "./input_dir/protein_name" before running the job.
d) If working on a cluster with multiple nodes, it is recommended to set
$runstyle="parallel". You need to have a PBS server installed on your system.
Parallel jobs run faster since they are distributed among different
nodes. The default setting $runstyle="serial" runs all the jobs on a
single computer.
e) If the job was executed partially and encountered an error, you can
rerun the main script without modification. It will check the existing
files and resume from the correct position.
f) If you want to provide cryo-EM density data to guide the assembly, please use
the options "-EMmap" and "-reso" and follow the explanation and example at
https://zhanggroup.org/DEMO2/explanation_EM.html
g) If you want to provide cross-link data or contact/distance restraints to guide the
assembly, please use the option "-CLink" and follow the explanation and example at
https://zhanggroup.org/DEMO2/explanation_CL.html
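The input-file requirements in the notes above (domain models named dom1.pdb, dom2.pdb, ... and a FASTA file seq.fasta in the job folder) can be checked before submitting a job. The `check_job_inputs` helper below is a hypothetical sketch, not part of the DEMO Suite; it only checks file names, numbering gaps and the FASTA header character, not the file contents.

```python
import re
from pathlib import Path
from typing import List


def check_job_inputs(input_dir: str, protein_name: str) -> List[str]:
    """Return problems found in input_dir/protein_name; an empty list means it looks ready."""
    job_dir = Path(input_dir) / protein_name
    if not job_dir.is_dir():
        return [f"missing job folder: {job_dir}"]
    problems = []

    seq = job_dir / "seq.fasta"
    if not seq.is_file():
        problems.append("missing seq.fasta")
    elif not seq.read_text().lstrip().startswith(">"):
        problems.append("seq.fasta does not start with a FASTA header line ('>')")

    # Collect N from every file named exactly domN.pdb
    numbers = []
    for pdb in job_dir.glob("dom*.pdb"):
        match = re.fullmatch(r"dom(\d+)\.pdb", pdb.name)
        if match:
            numbers.append(int(match.group(1)))
    numbers.sort()
    if not numbers:
        problems.append("no domain models named dom1.pdb, dom2.pdb, ...")
    elif numbers != list(range(1, len(numbers) + 1)):
        problems.append(f"domain models are not numbered dom1..domN without gaps: {numbers}")
    return problems
```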

4.3 System requirement:

a) x86_64 machine, Linux OS, more than 150 GB of free disk space.
b) Perl and Python interpreters should be installed.
c) Basic compression and decompression packages should be installed to support
tar and bunzip2.
d) If you are using a computer cluster, the job management software (PBS server) should
support 'qsub' and 'qstat'. If you use other job management software, such as
SGE or Slurm, some changes should be made following the instructions at:
https://zhanggroup.org/bbs/?q=node/3561
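The requirements above can be probed quickly from Python. `check_system` is a hypothetical helper; it treats "150 GB" as 150 GiB (an assumption) and only checks that the commands are on PATH, not their versions.

```python
import platform
import shutil
from typing import Dict


def check_system(workdir: str = ".", min_free_gb: float = 150.0,
                 cluster: bool = False) -> Dict[str, bool]:
    """Check the section 4.3 requirements that can be probed from Python."""
    free_gb = shutil.disk_usage(workdir).free / 1024 ** 3
    report = {
        "x86_64": platform.machine() == "x86_64",
        "linux": platform.system() == "Linux",
        "disk_space": free_gb >= min_free_gb,
    }
    tools = ["perl", "python3", "tar", "bunzip2"]
    if cluster:
        tools += ["qsub", "qstat"]  # PBS commands, needed only for cluster runs
    for tool in tools:
        report[tool] = shutil.which(tool) is not None
    return report
```

For example, `[name for name, ok in check_system("/home/yourname").items() if not ok]` lists the requirements that appear to be missing.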

4.4. How to cite DEMO2 or DEMO Suite?

Xiaogen Zhou, Chunxiang Peng, XXX, Guijun Zhang, and Yang Zhang. DEMO2: Multidomain protein
structures assembly by coupling structural analogous templates with deep-learning
inter-domain restraints. Submitted, 2021.

Xiaogen Zhou, Jun Hu, Chengxin Zhang, Guijun Zhang, and Yang Zhang. Assembling multidomain
protein structures through analogous global structural alignments. Proceedings of the
National Academy of Sciences, 116: 15930-15938 (2019)

#######################################################
# #
# 5. Installation and implementation of DeepMSA #
# #
#######################################################

5.1. Introduction of DeepMSA

DeepMSA is a new open-source method for sensitive MSA construction,
in which homologous sequences and alignments are created from multiple sources
of whole-genome and metagenome databases through complementary hidden
Markov model algorithms.

5.2. How to install DeepMSA program?

When you unpack the DEMO Suite, the DeepMSA program is already installed.

5.3. How to run DeepMSA program?

The DeepMSA main script is $pkgdir/external/hhsuite2/scripts/build_MSA.py. By running
the program without arguments, you can print all the running options.

5.4. How to cite DeepMSA?

If you are using the DeepMSA program, you can cite:

C Zhang, W Zheng, S M Mortuza, Y Li, Y Zhang. DeepMSA: constructing deep multiple sequence
alignment to improve contact prediction and fold-recognition for distant-homology proteins.
Bioinformatics 36:2105-2112 (2020).

#######################################################
# #
# 6. Installation and implementation of DeepPotential #
# #
#######################################################

6.1. Introduction of DeepPotential
DeepPotential is a method to predict the inter-residue spatial restraints
including distances, inter-residue torsion angles, and hydrogen-bonding networks
based on an ensemble of two complementary coevolution features coupled with
deep residual networks.

6.2. How to install DeepPotential?

When you unpack the DEMO Suite, the DeepPotential program is already installed in
$pkgdir/external/restriplet3.

6.3. How to run DeepPotential program?

Usage: runDistPre.pl -s protein_name -outdir input_dir [Options]

To run DeepPotential, you need to prepare the following input files:
'protein_name'--Mandatory, the name of the folder containing the sequence file "seq.txt"
'input_dir'-----Mandatory, the directory which contains the query folder
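A sketch of preparing the DeepPotential input and assembling the command line. It assumes seq.txt holds a single sequence in FASTA format and that runDistPre.pl is invoked through perl; this README states neither, so check the DeepPotential readme and set `script` to the actual location under $pkgdir/external/restriplet3. `prepare_deeppotential` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def prepare_deeppotential(input_dir: str, protein_name: str, header: str, sequence: str,
                          script: str = "runDistPre.pl") -> List[str]:
    """Write input_dir/protein_name/seq.txt and return the runDistPre.pl argument list."""
    job_dir = Path(input_dir) / protein_name
    job_dir.mkdir(parents=True, exist_ok=True)
    residues = "".join(sequence.split()).upper()  # drop whitespace and newlines
    # Assumption: seq.txt holds a single sequence in FASTA format
    (job_dir / "seq.txt").write_text(f">{header}\n{residues}\n")
    return ["perl", script, "-s", protein_name, "-outdir", input_dir]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`.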

Output files of DeepPotential include:
'distance_pca_*.txt'---The predicted CA atom distance
'distance_pcb_*.txt'---The predicted CB atom distance
'distance_pomg_20.txt, distance_pphi_20.txt, and distance_ptheta_20.txt'---The predicted
torsion angles
'distance_paa_.txt, distance_pbb_.txt, distance_pcc_.txt'---The predicted hydrogen-bonding
networks
'distance_ca_contact.txt'---The predicted CA contact
'distance_cb_contact.txt'---The predicted CB contact
'distance_20.npz'---The predicted restraints (distances and orientations) in npz format
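Because an .npz file is a zip archive of .npy arrays, the array names stored in distance_20.npz can be listed with the standard library alone. The array names are not documented in this README, so the hypothetical helper below just inspects them; with NumPy installed, `numpy.load(path)[name]` returns the corresponding array.

```python
import zipfile
from typing import List


def list_npz_arrays(path: str) -> List[str]:
    """Return the array names stored in an .npz file (each array is a '<name>.npy' member)."""
    with zipfile.ZipFile(path) as zf:
        return sorted(name[:-len(".npy")] if name.endswith(".npy") else name
                      for name in zf.namelist())
```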

A detailed readme can be found in the package.

6.4. How to cite DeepPotential?

If you are using the DeepPotential program, you can cite:

Li Yang, Zhang Chengxin, Zheng Wei, Zhou Xiaogen, Bell W. Eric, Yu Dongjun and Zhang Yang,
Protein inter-residue contact and distance prediction by coupling complementary coevolution
features with deep residual networks in CASP14. Proteins: Structure, Function, and
Bioinformatics, 89: 1911-1921, 2021.

#######################################################
# #
# 7. Installation and implementation of FASPR #
# #
#######################################################

7.1. Introduction of FASPR

FASPR is a method for structural modeling of protein side-chain conformations.
Starting from a backbone structure, FASPR samples the side-chain rotamers for
each amino acid from the Dunbrack 2010 rotamer library with the atomic interaction
energies calculated using an optimized scoring function extended from EvoEF2, where
side-chain packing search is performed using a deterministic search algorithm
combining self-energy checking, dead-end elimination theorems, and tree decomposition.
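To illustrate one ingredient named above, here is the classic (Desmet) dead-end elimination criterion on a toy problem: rotamer r at residue i is discarded if its best-case energy is still higher than the worst-case energy of some other rotamer t at the same residue. This is a textbook sketch, not FASPR's implementation, which also uses self-energy checking, tree decomposition and its own scoring function; the energy tables are made-up inputs.

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple

SelfE = List[List[float]]
PairE = Dict[Tuple[int, int], List[List[float]]]


def dee_prune(self_e: SelfE, pair_e: PairE) -> List[Set[int]]:
    """Apply the classic DEE criterion repeatedly until no rotamer can be removed.

    self_e[i][r]          self energy of rotamer r at residue i
    pair_e[(i, j)][r][s]  pair energy of rotamer r at i with rotamer s at j, for i < j
    """
    n = len(self_e)
    alive = [set(range(len(rots))) for rots in self_e]

    def pair(i: int, r: int, j: int, s: int) -> float:
        return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                # Lowest energy r could reach with any surviving neighbour rotamers
                best_r = self_e[i][r] + sum(
                    min(pair(i, r, j, s) for s in alive[j]) for j in range(n) if j != i)
                for t in alive[i] - {r}:
                    # Highest energy competitor t could reach
                    worst_t = self_e[i][t] + sum(
                        max(pair(i, t, j, s) for s in alive[j]) for j in range(n) if j != i)
                    if best_r > worst_t:  # swapping r for t always lowers the energy
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


def total_energy(self_e: SelfE, pair_e: PairE, assignment: Sequence[int]) -> float:
    n = len(assignment)
    energy = sum(self_e[i][assignment[i]] for i in range(n))
    energy += sum(pair_e[(i, j)][assignment[i]][assignment[j]]
                  for i in range(n) for j in range(i + 1, n))
    return energy


def brute_force_gmec(self_e: SelfE, pair_e: PairE, alive=None) -> Tuple[int, ...]:
    """Exhaustive global minimum over the (surviving) rotamers; feasible only for toy sizes."""
    choices = alive if alive is not None else [range(len(rots)) for rots in self_e]
    return min(product(*[sorted(c) for c in choices]),
               key=lambda a: total_energy(self_e, pair_e, a))


self_e = [[0.0, 5.0], [0.0, 1.0]]
pair_e = {(0, 1): [[0.0, 0.5], [0.2, 0.1]]}
print(dee_prune(self_e, pair_e))  # -> [{0}, {0}]
```

Since DEE never removes a rotamer of the global minimum, exhaustive search over the survivors finds the same optimum while enumerating far fewer combinations.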

7.2. How to install FASPR program?

When you unpack the DEMO Suite, the FASPR program is already installed
at $pkgdir/bin/FASPR

7.3. How to run FASPR program?

Usage: FASPR input.pdb output.pdb

To run FASPR, you need to prepare the following input files:
'input.pdb' Mandatory, the input PDB file for side-chain packing.
'-s' Optional, the sequence of input.pdb

Output files of FASPR include:
'output.pdb' the output PDB file from FASPR with side chains packed.
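A sketch for preparing the optional sequence and building the FASPR command with the usage shown above. It assumes that '-s' takes the path of a file holding the one-letter sequence and that it may follow the two positional arguments; check the FASPR readme before relying on either. `sequence_from_pdb` and `build_faspr_command` are hypothetical helpers; the former reads one residue per CA atom using the standard PDB column layout and writes X for non-standard residues.

```python
from typing import List, Optional

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C", "GLN": "Q", "GLU": "E",
    "GLY": "G", "HIS": "H", "ILE": "I", "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F",
    "PRO": "P", "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_pdb(pdb_text: str) -> str:
    """One-letter sequence from the CA atoms of ATOM records (one letter per residue)."""
    seq, seen = [], set()
    for raw in pdb_text.splitlines():
        line = raw.ljust(80)  # pad short lines so fixed-column slicing is safe
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        key = (line[21], line[22:27])  # chain ID + residue number and insertion code
        if key in seen:
            continue  # skip alternate locations of the same residue
        seen.add(key)
        seq.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(seq)


def build_faspr_command(faspr: str, input_pdb: str, output_pdb: str,
                        seq_file: Optional[str] = None) -> List[str]:
    """FASPR input.pdb output.pdb [-s seq_file] (the '-s' form is an assumption)."""
    cmd = [faspr, input_pdb, output_pdb]
    if seq_file is not None:
        cmd += ["-s", seq_file]
    return cmd
```

For example, write `sequence_from_pdb(open("input.pdb").read())` to a file and pass that file together with `$pkgdir/bin/FASPR` to `build_faspr_command`.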

A detailed readme file can be found in the FASPR package.

7.4. How to cite FASPR?

If you are using the FASPR program, you can cite:

Xiaoqiang Huang, Robin Pearce, Yang Zhang. FASPR: an open-source tool for fast and accurate
protein side-chain packing. Bioinformatics (2020) 36: 3758-3765.