ProtGPT3-MSA is an autoregressive protein language model that can be prompted with up to 15 homologous protein sequences to generate new, family-consistent sequences (see the paper cited below).
It operates in two modalities. In unaligned mode, homologs are supplied as raw sequences and the model returns a plain sequence. In aligned mode, homologs are supplied as a gapped multiple-sequence alignment and the model returns a new aligned (gapped) sequence consistent with that alignment.
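To illustrate how the two input forms relate, here is a minimal sketch that turns aligned (gapped) homologs into raw sequences by removing gap characters. The helper name `strip_gaps`, the gap symbols `-` and `.`, and the toy sequences are assumptions made for illustration; they are not taken from the ProtGPT3-MSA code.

```python
# Assumed gap symbols; "-" is the most common, "." appears in some alignment formats.
GAP_CHARS = "-."

def strip_gaps(aligned_seq):
    """Return the raw (ungapped) sequence behind one row of an alignment."""
    return "".join(ch for ch in aligned_seq if ch not in GAP_CHARS)

# Toy alignment: every row has the same length (6 columns).
aligned_homologs = ["MK-LVA", "MKTLV-", "M--LVA"]

# The same homologs as they would be supplied in unaligned mode.
unaligned_homologs = [strip_gaps(seq) for seq in aligned_homologs]
print(unaligned_homologs)  # ['MKLVA', 'MKTLV', 'MLVA']
```

The aligned rows share one length, while the raw sequences generally do not.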
Settings
- Upload a FASTA file of homologous sequences. If it contains more than 15 sequences, a random subset of 15 is sampled to build the prompt.
- Temperature sets how stochastic generation is: raise it (e.g. >= 1.0) for more diverse, exploratory sequences, or lower it (e.g. < 1.0) to keep generation conservative and close to the input family.
- Use aligned sequences: tick this box to run in aligned mode, in which the model generates aligned (gapped) sequences. Your uploaded homologs must then already be aligned (equal length, with gap characters).
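To make the Temperature setting concrete, the sketch below shows standard temperature scaling as it is commonly applied in autoregressive language models: logits are divided by the temperature before the softmax. That ProtGPT3-MSA applies temperature in exactly this way is an assumption, and the function name `temperature_probs` and the toy logits are made up for illustration.

```python
import math

def temperature_probs(logits, temperature):
    """Softmax over logits divided by the temperature (must be positive)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum for numerical stability before exponentiating.
    top = max(scaled)
    weights = [math.exp(x - top) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Toy next-token logits for three candidate residues (illustrative values only).
logits = [2.0, 1.0, 0.0]

conservative = temperature_probs(logits, 0.5)  # sharper: mass concentrates on the top token
exploratory = temperature_probs(logits, 1.5)   # flatter: less likely tokens are sampled more often

print([round(p, 3) for p in conservative])
print([round(p, 3) for p in exploratory])
```

A lower temperature concentrates probability on the most likely residue at each position, which is why generation stays closer to the input family; a higher temperature flattens the distribution and increases diversity.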
Recommendation. ProtGPT3-MSA performs best when several homologs are provided. With very few sequences (fewer than 5), generation quality may be limited.
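The upload rules and the recommendation above can be sketched as a small preprocessing step. This is an illustrative sketch, not the app's actual code: the names `parse_fasta`, `is_aligned`, and `prepare_prompt` are hypothetical, and the alignment check only tests for equal length, since an alignment may legitimately contain rows without gaps.

```python
import random

MAX_HOMOLOGS = 15     # prompt limit stated in the description
MIN_RECOMMENDED = 5   # below this, generation quality may be limited

def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def is_aligned(records):
    """Equal-length check only; it does not verify that gaps are placed sensibly."""
    return len({len(seq) for _, seq in records}) == 1

def prepare_prompt(records, aligned=False, seed=None):
    """Validate the homologs, subsample to at most 15, and flag small inputs."""
    if not records:
        raise ValueError("no sequences found")
    if aligned and not is_aligned(records):
        raise ValueError("aligned mode requires sequences of equal length")
    if len(records) > MAX_HOMOLOGS:
        # Random subset of 15, as described in Settings; seed is for reproducibility.
        records = random.Random(seed).sample(records, MAX_HOMOLOGS)
    too_few = len(records) < MIN_RECOMMENDED
    return records, too_few

fasta_text = ">seqA\nMK-LVA\n>seqB\nMKTLV-\n"
homologs, too_few = prepare_prompt(parse_fasta(fasta_text), aligned=True)
print(len(homologs), too_few)  # 2 True
```

Validating the alignment before sampling means a malformed upload is rejected regardless of which 15 sequences would have been drawn.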
For full functionality and programmatic use, ProtGPT3-MSA is available on the Hugging Face Hub.
Cite this work
Garibbo, M., Boxo, G., Stocco, F., Illanes-Vicioso, R., Middendorf, L., & Ferruz, N. (2026). ProtGPT3: an Open-source family of Promptable and Aligned Protein Language Models. bioRxiv, 2026-06.