INSTALLATION AND IMPLEMENTATION OF DEMO SUITE
   (Copyright 2021 by Zhang Lab, University of Michigan, All rights reserved)
                    (Version 2.0, 2021/12/2)

1. What is DEMO Suite?
   
   The DEMO Suite is a composite package of programs for multi-domain protein
   structure assembly. The Suite includes the following programs:

   a) DEMO2: A program for multi-domain protein structure assembly
   b) DeepMSA: A program for multiple sequence alignment generation
   c) DeepPotential: A deep residual neural-network algorithm for inter-residue spatial restraints prediction
   d) FASPR: A program for protein side-chain packing

   
2. How to install the DEMO Suite?

   a) Download the DEMO Suite 'DEMO-2.0.tar.gz' from
      https://zhanggroup.org/DEMO2/download/
      and unpack 'DEMO-2.0.tar.gz' by
      > tar -zxvf DEMO-2.0.tar.gz
      The root path of this package is called $pkgdir, e.g.
      /home/yourname/DEMO2. All the programs should be under this
      directory. You can install the package at any location on your computer.
   
   b) Download the DEMO2 library files from
      https://zhanggroup.org/DEMO2/download/
      The library needs about 120 GB of disk space. The folder where you
      store the library is referred to as $libdir below.

   c) Third-party software installation:
      Most programs in the package 'DEMO-2.0.tar.gz' were developed in the
      Zhang Lab, and permission to use them is granted with this release.
      Some programs and databases (including blast, nr, uniclust30,
      uniref90 and metaclust) were developed by third-party groups. A default
      version of blast and nr is included in the package. It is the user's
      obligation to obtain license permission from the developers of all
      third-party software before using it. In addition, your system needs to
      have python3 (which supports pytorch >1.1.0) installed.

      To use DeepMSA, you need to download uniclust30, uniref90 and metaclust from
      http://gwdu111.gwdg.de/~compbiol/uniclust/2017_04/uniclust30_2017_04_hhsuite.tar.gz ,
      ftp://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref90/uniref90.fasta.gz ,
      and https://metaclust.mmseqs.org/2017_05/metaclust_2017_05.fasta.gz ,
      and then set them up as follows:

      c1) Unpack each database and put the entire folder into the DEMO2 library
          folder.
      c2) Rename the folder uniclust30_xxx_xxx to uniclust30, uniref90_xxx to
          uniref90, and metaclust_xxx to metaclust.
      c3) Use $pkgdir/external/hhsuite2/bin/esl-sfetch to create an .ssi index for
          uniref90 and metaclust, where $pkgdir is the path where you put the
          DEMO Suite package. For example, if the uniref90 database in the
          uniref90 folder is named uniref90.fasta, go to the uniref90 folder
          and run
          > $pkgdir/external/hhsuite2/bin/esl-sfetch --index uniref90.fasta
          A new file named uniref90.fasta.ssi appears when the command
          finishes. Do the same for the metaclust database.
      c4) If you use a different version of uniclust30, uniref90 or metaclust,
          edit $pkgdir/run_DEMO2.py and change the variables:
	  
      hhblitsdb = "$libdir/uniclust30_2017_04"
      jackhmmerdb = "$libdir/uniref90.fasta"
      hmmsearchdb = "$libdir/metaclust_2017_05.clean.fasta"
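
      The indexing step described above can be scripted. The following is a
      minimal Python sketch, assuming the databases have already been unpacked
      and renamed as described. The function names are hypothetical; only the
      esl-sfetch location and its --index option are taken from the text above.

```python
import subprocess
from pathlib import Path


def build_index_command(pkgdir, fasta_name):
    """Return the esl-sfetch command that creates <fasta_name>.ssi."""
    esl_sfetch = Path(pkgdir) / "external" / "hhsuite2" / "bin" / "esl-sfetch"
    return [str(esl_sfetch), "--index", fasta_name]


def index_database(pkgdir, db_dir, fasta_name):
    """Run esl-sfetch inside db_dir and return the expected .ssi path."""
    db_dir = Path(db_dir)
    if not (db_dir / fasta_name).is_file():
        raise FileNotFoundError(f"{fasta_name} not found in {db_dir}")
    # Run from inside the database folder, as the instructions describe.
    subprocess.run(build_index_command(pkgdir, fasta_name), cwd=db_dir, check=True)
    return db_dir / (fasta_name + ".ssi")
```

      For example, index_database("/home/yourname/DEMO2", "/path/to/library/uniref90",
      "uniref90.fasta") would be expected to produce uniref90.fasta.ssi. The
      library path shown here is a placeholder.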

     
3. Bug report:

   Please report and post bugs and suggestions at the message board: 
   https://zhanggroup.org/forum/


   #######################################################
   #                                                     #
   #     4. Installation and implementation of DEMO      #
   #                                                     #
   #######################################################
   
4.1. Introduction of DEMO2
   
   DEMO2 (Domain Enhanced MOdeling, version 2.0) is an improved version of DEMO for automated 
   assembly of full-length structural models of multi-domain proteins by integrating deep-learning 
   predicted inter-domain spatial restraints. Starting from individual domain structures, 
   quaternary structure templates that have similar component domains are identified by 
   domain-level structural alignments using TM-align. Meanwhile, inter-domain spatial restraints 
   are predicted by the deep residual neural-network-based predictor DeepPotential. Full-length 
   models are then created by a fast quasi-Newton optimization for rigid-body domain structure
   assembly, which is guided by the DeepPotential predicted inter-domain restraints,
   inter-domain distance profiles collected from the top-ranked quaternary templates, and
   physics-based steric potentials. The final models are selected from the low-energy
   conformations and further refined with fragment-guided molecular dynamics simulations.
   In large-scale benchmark tests, DEMO2 performed significantly better than its
   predecessor, DEMO.

4.2. How to run DEMO2?
   
   a) The main script for running DEMO2 is $pkgdir/run_DEMO2.py, where "$pkgdir" is the
      location of the run_DEMO2.py script.
      Running it without arguments prints the help information.

   b) The following arguments must be set (mandatory arguments). One example is:

      "$pkgdir/run_DEMO2.py protein_name input_dir sequence [Options]"

      'protein_name' is the name of the folder containing the protein sequence and domain models
      'input_dir'    is the directory which contains the query folder
      'sequence'     is the full-chain sequence in FASTA format

   c) Other arguments are optional and have default values.
      You can reset one or more of them. One example command line is:

      "$pkgdir/run_DEMO2.py protein_name input_dir sequence -template XXX.pdb"

      -template   The template structure used to guide the domain assembly. The template
                  should be in PDB format.
      -deepdist   [no or yes], whether distances predicted by DomainDist are used to guide
                  the assembly. The default value is "yes".
      -EMmap      The cryo-EM density map in MRC or CCP4 format.
      -reso       The resolution of the density map.
      -CLink      The cross-link data (follow the format provided on the web server).
      -run        [real, benchmark], "real" will use all templates, "benchmark"
                  will exclude homologous templates.
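
      When DEMO2 is launched from another script, the command line above can be
      assembled programmatically. The sketch below is only an illustration: the
      function name is hypothetical, the option names are copied from the list
      above, and the value given for 'sequence' is forwarded unchanged because
      its exact form is not specified here.

```python
from pathlib import Path

# Optional flags copied from the option list above.
KNOWN_OPTIONS = ("template", "deepdist", "EMmap", "reso", "CLink", "run")


def build_demo2_command(pkgdir, protein_name, input_dir, sequence, **options):
    """Build the argument list for run_DEMO2.py.

    The three mandatory arguments are passed in the documented order;
    optional flags are appended as "-name value" pairs.
    """
    command = [str(Path(pkgdir) / "run_DEMO2.py"), protein_name, input_dir, sequence]
    for name, value in options.items():
        if name not in KNOWN_OPTIONS:
            raise ValueError(f"unknown option: -{name}")
        command += [f"-{name}", str(value)]
    return command
```

      The returned list can be passed to subprocess.run(command, check=True).
      Building a list instead of a single string avoids shell quoting problems
      when paths contain spaces.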
				  
   d) Where are the final predicted results?
      The following results are written to "./input_dir/protein_name":

      "fmodel*.pdb"  the final model assembled by DEMO
      "cscore"       the confidence score, estimated TM-score, and estimated RMSD
                     of the final model
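
      The result files listed above can be gathered with a few lines of Python.
      This is a minimal sketch with a hypothetical function name. The layout of
      the "cscore" file is not described here, so the sketch returns its raw
      text instead of parsing it.

```python
from pathlib import Path


def collect_results(input_dir, protein_name):
    """Return (sorted model paths, cscore text or None) for a finished job."""
    job_dir = Path(input_dir) / protein_name
    models = sorted(job_dir.glob("fmodel*.pdb"))
    cscore_file = job_dir / "cscore"
    # Returned as raw text; no assumption is made about the file layout.
    cscore = cscore_file.read_text() if cscore_file.is_file() else None
    return models, cscore
```

      An empty model list together with a cscore of None suggests that the job
      has not finished or did not run in this folder.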

   NOTE:
   a) Outline of steps for running DEMO2 by 'run_DEMO2.py':
      a1) Parse the user-provided information
      a2) Run 'DeepPotential' to predict inter-residue spatial restraints of the full chain
      a3) Run 'DEMO' to assemble all domain models into a full-length model
   b) The domain PDB files should be named dom1.pdb, dom2.pdb, dom3.pdb... in order.
      They should be put in "./input_dir/protein_name" before running the job.
   c) 'seq.fasta' is the query sequence file in FASTA format. This file should be put
      in "./input_dir/protein_name" before running the job.
   d) If working on a cluster with multiple nodes, it is recommended to set
      $runstyle="parallel". You need to have a PBS server installed on your system.
      Parallel jobs will run faster since jobs are distributed among different
      nodes. The default setting $runstyle="serial" will run all the jobs on a
      single computer.
   e) If the job has been partially executed and encountered an error, you can
      rerun the main script without modification. It will check the existing
      files and resume from the correct position.
   f) If you want to provide cryo-EM density data to guide the assembly, please use
      the options "-EMmap" and "-reso" and follow the explanation and example at
      https://zhanggroup.org/DEMO2/explanation_EM.html
   g) If you want to provide cross-link data or contact/distance data to guide the
      assembly, please use the option "-CLink" and follow the explanation and example at
      https://zhanggroup.org/DEMO2/explanation_CL.html
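
   The input layout described in the notes above (dom1.pdb, dom2.pdb, ... and
   seq.fasta inside "./input_dir/protein_name") can be verified before a job is
   started. The sketch below uses a hypothetical function name, and the rule
   that at least two domain files are required is an assumption based on DEMO2
   assembling multi-domain proteins.

```python
import re
from pathlib import Path


def check_inputs(input_dir, protein_name):
    """Return a list of problems found in the job folder (empty if none)."""
    job_dir = Path(input_dir) / protein_name
    problems = []
    if not (job_dir / "seq.fasta").is_file():
        problems.append("seq.fasta is missing")
    # Collect the N in domN.pdb for every domain file present.
    indices = sorted(
        int(m.group(1))
        for p in job_dir.glob("dom*.pdb")
        if (m := re.fullmatch(r"dom(\d+)\.pdb", p.name))
    )
    if len(indices) < 2:
        # Assumption: assembly needs at least two domain models.
        problems.append("fewer than two domain files found")
    elif indices != list(range(1, len(indices) + 1)):
        problems.append(f"domain files are not numbered consecutively from 1: {indices}")
    return problems
```

   An empty list means the folder matches the naming rules; it does not check
   that the files themselves are valid PDB or FASTA files.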

4.3. System requirements:

   a) x86_64 machine, Linux operating system, and more than 150 GB of free disk space.
   b) Perl and Python interpreters should be installed.
   c) Basic compression and decompression tools should be installed, including
      tar and bunzip2.
   d) If you are using a computer cluster, the PBS job management software should
      support 'qsub' and 'qstat'. If you are using other job management software, such as
      SGE or Slurm, some changes should be made following the instructions at:
      https://zhanggroup.org/bbs/?q=node/3561
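
   The requirements above can be checked with the Python standard library.
   This is a rough sketch under stated assumptions: the function names are
   hypothetical, the Python interpreter is looked up as "python3", and 1 GB is
   taken as 1024**3 bytes. The cluster tools only matter on a PBS cluster.

```python
import platform
import shutil

REQUIRED_TOOLS = ("perl", "python3", "tar", "bunzip2")
CLUSTER_TOOLS = ("qsub", "qstat")  # only needed on a PBS cluster


def missing_tools(tools):
    """Return the tools from the list that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def has_enough_space(free_bytes, min_gb=150):
    """True if the free space exceeds min_gb (1 GB taken as 1024**3 bytes)."""
    return free_bytes > min_gb * 1024 ** 3


def check_system(install_path="."):
    """Collect the basic checks into a dictionary of findings."""
    return {
        "x86_64": platform.machine() == "x86_64",
        "linux": platform.system() == "Linux",
        "disk_ok": has_enough_space(shutil.disk_usage(install_path).free),
        "missing_tools": missing_tools(REQUIRED_TOOLS),
        "missing_cluster_tools": missing_tools(CLUSTER_TOOLS),
    }
```

   Calling check_system("/home/yourname") reports on the file system that
   holds the given path, so pass the location where the package and library
   will be stored.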

4.4. How to cite DEMO2 or DEMO Suite?

      Xiaogen Zhou, Chunxiang Peng, XXX, Guijun Zhang, and Yang Zhang. DEMO2: Multidomain protein
      structures assembly by coupling structural analogous templates with deep-learning
      inter-domain restraints. Submitted, 2021.

      Xiaogen Zhou, Jun Hu, Chengxin Zhang, Guijun Zhang, and Yang Zhang. Assembling multidomain
      protein structures through analogous global structural alignments. Proceedings of the
      National Academy of Sciences, 116: 15930-15938 (2019).
	
	
   #######################################################
   #                                                     #
   #  5. Installation and implementation of DeepMSA      #
   #                                                     #
   #######################################################
   
5.1. Introduction of DeepMSA
   
   DeepMSA is an open-source method for sensitive MSA construction,
   which collects homologous sequences and alignments from multiple sources
   of whole-genome and metagenome databases through complementary hidden
   Markov model algorithms.

5.2. How to install DeepMSA program?

   When you unpack the DEMO Suite, DeepMSA program is already installed.

5.3. How to run DeepMSA program?

   The DeepMSA main script is $pkgdir/external/hhsuite2/scripts/build_MSA.py. Running
   the program without arguments prints all the running options.

5.4. How to cite DeepMSA?

   If you are using the DeepMSA program, you can cite:

   C Zhang, W Zheng, S M Mortuza, Y Li, Y Zhang. DeepMSA: constructing deep multiple sequence
   alignment to improve contact prediction and fold-recognition for distant-homology proteins. 
   Bioinformatics 36:2105-2112 (2020).


   #######################################################
   #                                                     #
   #  6. Installation and implementation of DeepPotential#
   #                                                     #
   #######################################################
   
6.1. Introduction of DeepPotential
   DeepPotential is a method to predict inter-residue spatial restraints,
   including distances, inter-residue torsion angles, and hydrogen-bonding networks,
   based on an ensemble of two complementary coevolution features coupled with
   deep residual networks.
	
6.2. How to install DeepPotential?

   When you unpack the DEMO Suite, the DeepPotential program is already installed in 
   $pkgdir/external/restriplet3.

6.3. How to run DeepPotential program?

   Usage: runDistPre.pl -s protein_name -outdir input_dir [Options]

   To run DeepPotential, you need to prepare the following inputs:
       'protein_name'--Mandatory, the name of the folder containing the sequence file named "seq.txt"
       'input_dir'-----Mandatory, the directory which contains the query folder

   Output files of DeepPotential include:
       'distance_pca_*.txt'---The predicted CA atom distances
       'distance_pcb_*.txt'---The predicted CB atom distances
       'distance_pomg_20.txt, distance_pphi_20.txt, and distance_ptheta_20.txt'---The predicted
       torsion angles
       'distance_paa_.txt, distance_pbb_.txt, distance_pcc_.txt'---The predicted hydrogen-bonding
       networks
       'distance_ca_contact.txt'---The predicted CA contacts
       'distance_cb_contact.txt'---The predicted CB contacts
       'distance_20.npz'---The predicted restraints (distances and orientations) in npz format

   A detailed readme can be found in the package.
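
   After a DeepPotential run, the output list above can be used to confirm
   that every expected file was written. The following sketch is illustrative
   only: the function name is hypothetical, the output folder is passed in by
   the caller because its exact location is not stated here, and the wildcard
   in the hydrogen-bonding patterns is an assumption chosen so that the names
   as listed above still match.

```python
from pathlib import Path

# Glob patterns for the output files listed above.
EXPECTED_OUTPUTS = (
    "distance_pca_*.txt",
    "distance_pcb_*.txt",
    "distance_pomg_20.txt",
    "distance_pphi_20.txt",
    "distance_ptheta_20.txt",
    "distance_paa_*.txt",  # wildcard is an assumption, see the note above
    "distance_pbb_*.txt",
    "distance_pcc_*.txt",
    "distance_ca_contact.txt",
    "distance_cb_contact.txt",
    "distance_20.npz",
)


def missing_outputs(output_dir):
    """Return the patterns that match no file in output_dir."""
    output_dir = Path(output_dir)
    return [p for p in EXPECTED_OUTPUTS if not any(output_dir.glob(p))]
```

   An empty return value means every pattern matched at least one file; it
   says nothing about whether the predictions themselves are complete.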

6.4. How to cite DeepPotential?

   If you are using the DeepPotential program, you can cite:

   Li Yang, Zhang Chengxin, Zheng Wei, Zhou Xiaogen, Bell W. Eric, Yu Dongjun and Zhang Yang, 
   Protein inter-residue contact and distance prediction by coupling complementary coevolution
   features with deep residual networks in CASP14. Proteins: Structure, Function, and 
   Bioinformatics, 89: 1911-1921, 2021.
   
   
   #######################################################
   #                                                     #
   #   7. Installation and implementation of FASPR       #
   #                                                     #
   #######################################################
   
7.1. Introduction of FASPR
   
   FASPR is a method for structural modeling of protein side-chain conformations. 
   Starting from a backbone structure, FASPR samples the side-chain rotamers for 
   each amino acid from the Dunbrack 2010 rotamer library with the atomic interaction 
   energies calculated using an optimized scoring function extended from EvoEF2, where 
   side-chain packing search is performed using a deterministic searching algorithm 
   combining self-energy checking, dead-end elimination theorems, and tree decomposition.

7.2. How to install FASPR program?

   When you unpack the DEMO Suite, the FASPR program is already installed
   at $pkgdir/bin/FASPR.

7.3. How to run FASPR program?
	
   Usage: FASPR input.pdb output.pdb

   To run FASPR, you need to prepare the following input:
       'input.pdb'   Mandatory, input PDB file for side-chain packing.
       '-s'          Optional, the sequence of input.pdb.

   Output files of FASPR include:
       'output.pdb'  Output PDB file from FASPR with the side chains packed.

   A detailed readme file can be found in the FASPR package.

7.4. How to cite FASPR?

   If you are using the FASPR program, you can cite:

   Xiaoqiang Huang, Robin Pearce, Yang Zhang. FASPR: an open-source tool for fast and accurate
   protein side-chain packing. Bioinformatics (2020) 36: 3758-3765.