DeepMSA (version 2) is a hierarchical approach to creating high-quality multiple sequence alignments (MSAs) for monomer and multimer proteins. The method is built on iterative sequence database searching followed by fold-based MSA ranking and selection. For protein monomers, MSAs are produced with three iterative MSA searching pipelines (dMSA, qMSA and mMSA) that search whole-genome (Uniclust30 and UniRef90) and metagenome (Metaclust, BFD, Mgnify, TaraDB, MetaSourceDB and JGIclust) sequence databases. For protein multimers, a number of hybrid MSAs are created by pairing the sequences from the monomer MSAs of the component chains, and the optimal multimer MSAs are selected based on a combined score of MSA depth and the folding score of the monomer chains. Large-scale benchmark data show a significant advantage of DeepMSA2 in generating accurate MSAs with balanced depth and alignment coverage, which are most suitable for deep-learning-based protein and protein complex structure and function predictions. To directly predict the structure model, please use the DMFold server.
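
The multimer selection step described above can be illustrated with a small Python sketch. It is not the DeepMSA2 implementation: the toy data, the use of a species label as the pairing key, the function names (`pair_msas`, `combined_score`, `rank_hybrids`) and, in particular, the scoring rule (mean monomer fold score multiplied by the logarithm of the paired depth) are all assumptions made for illustration. The published combined score may differ.

```python
from itertools import product
from math import log

# Toy monomer MSAs: for each chain, a list of candidate MSAs. Each candidate
# has a fold score (hypothetical, in [0, 1]) and aligned sequences keyed by a
# species label, which serves as the pairing key here (an assumption).
chain_a = [
    {"name": "A_dMSA", "fold_score": 0.80,
     "seqs": {"sp1": "MKV-", "sp2": "MKVL", "sp3": "MRVL"}},
    {"name": "A_qMSA", "fold_score": 0.90,
     "seqs": {"sp1": "MKV-", "sp4": "MKIL"}},
]
chain_b = [
    {"name": "B_dMSA", "fold_score": 0.70,
     "seqs": {"sp1": "GSTA", "sp2": "GSTV", "sp3": "GATV"}},
    {"name": "B_mMSA", "fold_score": 0.85,
     "seqs": {"sp1": "GSTA", "sp4": "GSSA"}},
]


def pair_msas(msas):
    """Concatenate rows that share a species label across all chains."""
    shared = set.intersection(*(set(m["seqs"]) for m in msas))
    return ["".join(m["seqs"][sp] for m in msas) for sp in sorted(shared)]


def combined_score(msas, paired_rows):
    """Hypothetical score: mean fold score times log of the paired depth."""
    mean_fold = sum(m["fold_score"] for m in msas) / len(msas)
    return mean_fold * log(1 + len(paired_rows))


def rank_hybrids(*chains):
    """Build one hybrid MSA per combination of monomer MSAs and rank them."""
    ranked = []
    for combo in product(*chains):
        rows = pair_msas(combo)
        names = [m["name"] for m in combo]
        ranked.append((combined_score(combo, rows), names, rows))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


ranked = rank_hybrids(chain_a, chain_b)
for score, names, rows in ranked:
    print(f"{'+'.join(names)}: depth={len(rows)}, score={score:.3f}")
```

With two candidate MSAs per chain, the sketch enumerates four hybrid MSAs. Under the assumed scoring rule, a combination with moderately confident monomer folds but many paired rows can outrank one with higher fold scores and few paired rows, which is the trade-off between depth and folding score that the selection step has to balance.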


References:
  • Wei Zheng, Qiqige Wuyun, Yang Li, Chengxin Zhang, P Lydia Freddolino, Yang Zhang. Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data. Nature Methods, 21: 279-289 (2024). https://doi.org/10.1038/s41592-023-02130-4
  • Chengxin Zhang, Wei Zheng, S M Mortuza, Yang Li, Yang Zhang. DeepMSA: constructing deep multiple sequence alignment to improve contact prediction and fold-recognition for distant-homology proteins. Bioinformatics, 36: 2105-2112 (2020).