Fold & Function Assignment System
The FFAS03 server provides an interface to FFAS, a profile-profile alignment and fold recognition algorithm. A profile-profile alignment uses information
present in sequences of homologous proteins to amplify the sequence conservation pattern that defines a protein family. This allows detection of remote
homologies beyond the reach of other sequence comparison methods. The input to the FFAS03 server is a protein sequence provided by the user. From this sequence, a
profile is generated and then compared with several databases of sequence profiles of proteins and domains built from public resources (see Databases). The
databases are updated with the latest structural and sequence information (see Updates).
Rychlewski, L., Jaroszewski, L., Li, W. & Godzik, A. (2000).
Comparison of sequence profiles. Strategies for structural predictions using sequence information.
Protein Science 9, 232-241.
Jaroszewski, L., Rychlewski, L. & Godzik, A. (2000).
Improving the quality of twilight-zone alignments.
Protein Science 9, 1487-1496.
Jaroszewski, L., Rychlewski, L., Li, Z., Li, W. & Godzik, A. (2005).
FFAS03: a server for profile-profile sequence alignments.
Nucl. Acids Res. 33, W284-W288.
Jaroszewski, L., Li, Z., Cai, X.H., Weber, C. & Godzik, A. (2011).
FFAS server: novel features and applications.
Nucl. Acids Res. 39, W38-W44.
Xu, D., Jaroszewski, L., Li, Z. & Godzik, A. (2013).
FFAS-3D: Improving fold recognition by including optimized structural features and template re-ranking.
Bioinformatics, doi: 10.1093.
Important: Please also cite original resources that were used to build FFAS profiles if they were accessed from results of FFAS searches and used in your research.
- Step 1: Calculation of a multiple sequence alignment using PSI-BLAST.
Five iterations of PSI-BLAST are performed against the sequence-pool database nr85s (see Databases). PSI-BLAST results can also be provided by the user (see Input).
- Step 2: Calculation of a sequence profile using sequences found by PSI-BLAST. Weights are assigned to sequences based on their uniqueness.
- Step 3: Calculation of an alignment score. FFAS aligns the two profiles with a standard local-local dynamic programming algorithm. The comparison score between position n of the first profile and position m of the
second profile is the vector*matrix*vector product of the n-th column of the first profile, the BLOSUM62 substitution matrix, and the m-th column of the second profile, that is, score(n, m) = p(n) · BLOSUM62 · q(m). Dynamic
programming over these position scores yields the alignment score.
- Step 4: Calculation of the final FFAS score. The alignment score is translated into the final FFAS score by comparing it with the
distribution of scores obtained for pairs of unrelated proteins.
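Steps 2 to 4 can be illustrated with a toy Python sketch; Step 1 is represented by a ready-made multiple alignment. This is not the FFAS implementation, and all function names are hypothetical. The three-letter alphabet and its matrix stand in for the 20 amino acids and BLOSUM62, the position-based (Henikoff) weights are one example of uniqueness-based weighting rather than necessarily the scheme FFAS uses, the linear gap penalty is an assumption, and Step 4 is shown as a plain z-score against scores of unrelated pairs, which may differ from the actual FFAS score transformation.

```python
import math

# Toy alphabet and substitution matrix: hypothetical stand-ins for the
# 20 amino acids and BLOSUM62 used by FFAS.
ALPHABET = "ACD"
SUBST = [
    [4, -1, -2],
    [-1, 9, -3],
    [-2, -3, 6],
]


def henikoff_weights(msa):
    """Step 2a: position-based weights; residues that are rare in a column
    contribute more, so more unique sequences receive larger weights."""
    weights = [0.0] * len(msa)
    for col in range(len(msa[0])):
        residues = [seq[col] for seq in msa]
        n_types = len(set(residues))
        for k, res in enumerate(residues):
            weights[k] += 1.0 / (n_types * residues.count(res))
    total = sum(weights)
    return [w / total for w in weights]


def weighted_profile(msa, weights):
    """Step 2b: weighted residue frequencies per alignment column (gaps skipped)."""
    profile = []
    for col in range(len(msa[0])):
        counts = [0.0] * len(ALPHABET)
        for seq, w in zip(msa, weights):
            if seq[col] != "-":
                counts[ALPHABET.index(seq[col])] += w
        total = sum(counts) or 1.0  # an all-gap column stays all zeros
        profile.append([c / total for c in counts])
    return profile


def column_score(p, q):
    """Step 3a: score(n, m) = p_n^T * B * q_m (vector * matrix * vector)."""
    size = len(ALPHABET)
    return sum(p[i] * SUBST[i][j] * q[j] for i in range(size) for j in range(size))


def local_alignment_score(P, Q, gap=4.0):
    """Step 3b: local-local (Smith-Waterman-style) dynamic programming over
    profile columns; the linear gap penalty is an assumption."""
    H = [[0.0] * (len(Q) + 1) for _ in range(len(P) + 1)]
    best = 0.0
    for n in range(1, len(P) + 1):
        for m in range(1, len(Q) + 1):
            H[n][m] = max(
                0.0,
                H[n - 1][m - 1] + column_score(P[n - 1], Q[m - 1]),
                H[n - 1][m] - gap,
                H[n][m - 1] - gap,
            )
            best = max(best, H[n][m])
    return best


def z_score(raw, unrelated_scores):
    """Step 4 (illustrative): express a raw score relative to the mean and
    standard deviation of scores from pairs of unrelated proteins."""
    mean = sum(unrelated_scores) / len(unrelated_scores)
    var = sum((s - mean) ** 2 for s in unrelated_scores) / len(unrelated_scores)
    return (raw - mean) / math.sqrt(var)
```

For example, with the alignment `["ACDA", "ACDA", "CCDA"]`, `henikoff_weights` gives the unique third sequence the largest weight (0.375, versus 0.3125 for each sequence of the identical pair).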
FFAS is regularly assessed in the CASP
and LIVEBENCH experiments. In the last LIVEBENCH evaluation,
FFAS ranked at the top among all sequence-based methods. In addition, FFAS is continuously tested with benchmarks consisting of pairs of similar structures
derived from the SCOP database.
The current version of the FFAS algorithm was optimized in 2003 using SCOP v.1.65. It was retested in 2011 using only superfamilies that were added in
later versions of SCOP and, thus, were not used in any training set or algorithm optimization. Results of benchmarking FFAS, PSI-BLAST, and BLAST:
A) benchmark of all SCOP domains clustered at 25% sequence identity; B) benchmark of very remote homologs, consisting of pairs of
domains from the same fold but from different superfamilies.
The input to the FFAS03 server is one or more amino-acid sequences in FASTA format. Meaningful FASTA description lines are highly recommended. The server accepts sequences of 25
to 2,000 residues. However, the algorithm is optimized for protein sequences of 50 to 500 residues consisting of one or two domains. Sequences longer than 1,000 residues and sequences expected to contain multiple protein
domains should be split into shorter fragments. FFAS searches can be further customized by calculating profiles from PSI-BLAST results provided by the user. If a PSI-BLAST result is
uploaded (in text format) in the search tab, the FFAS server does not run PSI-BLAST against the nr85s database; instead, the profile is calculated directly from the user's PSI-BLAST result. This option can be used
only for a single query sequence, and the query sequence used in the PSI-BLAST search has to be provided in the upper pane.
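The length limits above can be checked before submission with a short Python sketch. The thresholds come from this page; the function names are hypothetical, and the fixed-size, overlapping split in `split_sequence` is a naive illustrative strategy (fragment length and overlap are assumptions). When domain boundaries are known, cutting there follows the advice above more closely.

```python
# Limits stated on this page.
MIN_LEN, MAX_LEN = 25, 2000   # accepted by the server
OPT_MIN, OPT_MAX = 50, 500    # range the algorithm is optimized for
SPLIT_ABOVE = 1000            # longer sequences should be split


def check_length(seq):
    """Classify a sequence by the length limits above."""
    n = len(seq)
    if n < MIN_LEN or n > MAX_LEN:
        return "rejected"
    if n > SPLIT_ABOVE:
        return "split recommended"
    if OPT_MIN <= n <= OPT_MAX:
        return "optimal"
    return "accepted"


def split_sequence(seq, fragment_len=500, overlap=50):
    """Cut seq into fragments of at most fragment_len residues, with consecutive
    fragments sharing `overlap` residues (hypothetical defaults)."""
    if not 0 <= overlap < fragment_len:
        raise ValueError("overlap must be smaller than fragment_len")
    step = fragment_len - overlap
    fragments, start = [], 0
    while True:
        fragments.append(seq[start:start + fragment_len])
        if start + fragment_len >= len(seq):
            return fragments
        start += step
```

For a 1,200-residue sequence, `split_sequence` returns three fragments of 500, 500, and 300 residues.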
- Registered account: Users with a registered account can submit up to 50 query sequences and can upload a FASTA file with more sequences; all their results are stored as one list that can be accessed later. If you submit a FASTA file, you must provide your email address at login, because the server sends you an email once the job has finished. If the sequences in the file belong to a separate project, we highly recommend creating a separate account for that job; otherwise, its results will be mixed with the results of all individual sequences previously submitted from your account.
- Public access: Queries containing a single protein sequence can be submitted without a registered account. Users are provided with a direct link to the results page.
Please also see an example of FFAS server input and the resulting output.
After the FFAS search finishes, the server automatically opens the search results.
- Registered account: The server opens the list of the user's results, with the most recent results at the top. Results for individual sequences can be opened by clicking links on that page.
- Public access: The server opens the results page for the query sequence directly.
The FFAS output page is organized as a series of tabs. Besides the profile-profile search performed with the FFAS algorithm, the server performs a PSI-BLAST search (using the PDB-BLAST protocol) and a BLAST search. Each tab on
the results page contains the results of a search with one method against one database, and results obtained with the same method are grouped together. A separate tab contains the results of the PSI-BLAST search against the nr85s
database that were used to calculate the FFAS profile. Each tab contains master-slave alignments of the query sequence with sequences represented in a database of profiles (columns with gaps in the query sequence are omitted
in the master-slave format). Individual query-template alignments can be displayed via the ali links next to each template sequence.
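The master-slave layout can be illustrated with a small Python sketch (the function name is hypothetical): columns in which the query has a gap are dropped from a pairwise alignment, so every template row has the length of the ungapped query, and rows for different templates can be stacked under it. Residues a template has in the dropped columns are not shown in this view.

```python
def to_master_slave(query_aln, template_aln):
    """Drop alignment columns where the query has a gap ('-').

    Returns the ungapped query and the template row projected onto it;
    template insertions relative to the query are lost in this view.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned rows must have the same length")
    kept = [(q, t) for q, t in zip(query_aln, template_aln) if q != "-"]
    query = "".join(q for q, _ in kept)
    template = "".join(t for _, t in kept)
    return query, template


# Two pairwise alignments of the same query with different templates.
query, row1 = to_master_slave("MK-LV-A", "MKTLIGA")
_, row2 = to_master_slave("MKLVA", "M-LIA")
print(query)  # MKLVA
print(row1)   # MKLIA
print(row2)   # M-LIA
```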
A user can also display FFAS results for each template profile by clicking follow links. This feature may allow detection of
very remote similarities by finding a protein or protein domain which is similar both to the query and to the template. If the template sequence is associated with a
known structure, then the modeling tool can be launched via model links and the list of structures similar to each template can be opened via 3D-neighbors links.
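The reasoning behind follow links can be sketched as a one-step search through hit lists. In this hypothetical Python example, `hits` maps each protein to the set of proteins found in its own FFAS results (plain identifiers, with no scores or significance thresholds), and templates reachable from the query only through an intermediate hit are reported together with those intermediates. On the server, the equivalent step is clicking follow links and inspecting the results.

```python
def indirect_templates(query, hits):
    """Return {template: intermediates} for templates that are not direct hits
    of `query` but are hits of at least one protein that is."""
    direct = hits.get(query, set())
    found = {}
    for intermediate in direct:
        for template in hits.get(intermediate, set()):
            if template != query and template not in direct:
                found.setdefault(template, set()).add(intermediate)
    return found


# Hypothetical hit lists.
hits = {
    "query": {"A", "B"},
    "A": {"query", "T1"},
    "B": {"T1", "T2"},
}
result = indirect_templates("query", hits)
```

Here `T1` is reached through both `A` and `B`, and `T2` only through `B`; a template supported by several intermediates is a more convincing lead.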
The FFAS03 server supports two main services:
Searching databases of sequence profiles (available via the search tab).
This option calculates a profile from a sequence provided by a user and uses it to collect homologs from one or more template databases of protein profiles.
In addition to homologs detected in profile database(s) by the FFAS method, the FFAS03 server also displays homologs collected by PSI-BLAST using the
PDB-BLAST protocol and templates detected with BLAST.
All homologs collected by PSI-BLAST from the sequence-pool (nr85s) database (and used to calculate FFAS profiles)
can also be displayed. See the glossary for a list of methods used by the FFAS03 server.
Alignment of any two sequences provided by the user (available via the align 2 sequences, dot plot tab). This option calculates an alignment and evaluates the sequence similarity between two
protein sequences. It also displays dot plots: maps of local similarities between the profiles of the two sequences. Dot plots are useful for detecting sequence repeats
(when a protein sequence is compared to itself) and also give a quantitative assessment of local alignment reliability (when two different sequences are aligned).
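A dot plot can be sketched in a few lines of Python. The server compares profiles; this hypothetical example compares plain sequences instead, marking cell (i, j) when the windows starting at i and j share at least `min_matches` identical residues. The window size and threshold are illustrative. Comparing a sequence with itself shows internal repeats as diagonals parallel to the main one.

```python
def dot_plot(a, b, window=3, min_matches=3):
    """Return a grid of booleans: True where windows a[i:i+window] and
    b[j:j+window] share at least min_matches identical positions."""
    grid = []
    for i in range(len(a) - window + 1):
        row = []
        for j in range(len(b) - window + 1):
            matches = sum(a[i + k] == b[j + k] for k in range(window))
            row.append(matches >= min_matches)
        grid.append(row)
    return grid


def render(grid):
    """Draw the grid with '*' for similar windows and '.' elsewhere."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)


# A sequence made of two copies of GAVL, compared with itself.
print(render(dot_plot("GAVLGAVL", "GAVLGAVL")))
```

Besides the main diagonal, the plot contains a second diagonal offset by four positions, which marks the repeated GAVL unit.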
Two types of databases are used by the FFAS03 server:
1) Sequence-pool database (nr85s) - a large database of protein sequences that is used to calculate protein profiles.
This database is searched automatically with every sequence submitted to the FFAS03 server.
The results of these searches are available on the nr85s tab of FFAS results.
The following sets of protein sequences are included in the nr85s database:
The above sets of sequences have been clustered with the CD-HIT program to reduce redundancy.
The NR database has been clustered at 85% sequence identity and all metagenomic samples have been clustered at 60% sequence identity.
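The greedy incremental strategy used by CD-HIT can be sketched as follows: sequences are processed from longest to shortest, and each one joins the first cluster whose representative it matches at or above the identity threshold; otherwise it founds a new cluster. This is a simplified Python illustration with hypothetical function names, not CD-HIT itself: identity is approximated from `difflib.SequenceMatcher` matching blocks divided by the length of the shorter sequence, whereas CD-HIT uses short-word filtering and real alignments.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Approximate identity: matched residues / length of the shorter sequence."""
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    return sum(block.size for block in blocks) / min(len(a), len(b))


def greedy_cluster(seqs, threshold):
    """Return a list of (representative, members) pairs."""
    clusters = []
    for seq in sorted(seqs, key=len, reverse=True):
        for representative, members in clusters:
            if identity(representative, seq) >= threshold:
                members.append(seq)
                break
        else:  # no representative was similar enough
            clusters.append((seq, [seq]))
    return clusters


clusters = greedy_cluster(["MKTAYIAKQ", "GGGGSSSSPP", "MKTAYIAKQR"], threshold=0.85)
for representative, members in clusters:
    print(representative, len(members))
```

With the 85% threshold used for NR, the two MKTAYIAKQ variants collapse into one cluster represented by the longer sequence.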
Regions of low complexity have been masked with the SEG program. The result is the default sequence-pool database that FFAS uses to calculate protein profiles for input query sequences.
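Low-complexity masking can be approximated with a sliding-window entropy filter. The Python sketch below is a simplification in the spirit of SEG, not the SEG algorithm: every residue covered by a window whose Shannon entropy (in bits) falls below a threshold is replaced with X. The function names, window length, and threshold are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq, window=12, threshold=2.2):
    """Replace every residue covered by a low-entropy window with 'X'."""
    masked = list(seq)
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) < threshold:
            masked[i:i + window] = "X" * window
    return "".join(masked)


# A hypothetical sequence with a 15-residue poly-Q stretch.
seq = "MKTAYIAKQRQISFVKSHFSRQ" + "Q" * 15 + "LEERLGLIEVQ"
print(mask_low_complexity(seq))
```

The poly-Q stretch is masked, and windows that overlap it only partly can extend the mask into neighboring residues.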
When "Option 5" is checked on the FFAS input page, the following database is added to the default sequence-pool database above to calculate the protein profile:
- Microbiomes from the Integrated Microbial Genomes Project (IMG/m) by the DOE Joint Genome Institute (JGI). Currently, a total of 218 microbiomes are included. They initially contain about 37M protein sequences, which were clustered at 90% sequence identity by CD-HIT to reduce redundancy, resulting in about 30M sequences. Please be aware that searches with this option take much longer; you can save the link and come back to check the result later.
2) Template databases - sets of profiles available for FFAS searches. All-to-all comparisons between these sets of profiles are available on the available lists of results tab.
The list of available template databases is given below. More information about template databases can be found in the original resources they were downloaded from.
Important: Please cite these resources if they were accessed via FFAS searches and used in your research.
Sequence-pool Database: The sequence-pool database is updated every four months.
Template Databases: A full update of all template databases of profiles is performed every four months, in January, May, and September. Updates include all databases used by
the server and the all-to-all comparisons available on the available lists of results tab. In addition, the PDB database is updated every Wednesday. Please note that the incremental
update of the PDB database does not update the related lists of precalculated results or the results of users' searches. To get updated results of FFAS searches, you need to resubmit your search.
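The update schedule can be expressed as a small date calculation. In this hypothetical Python sketch, `next_full_update_month` returns the next January, May, or September; because this page does not state the day of the month, a month that has already started is treated as still upcoming. `next_pdb_update` returns the next Wednesday (today, if today is a Wednesday).

```python
import datetime as dt

FULL_UPDATE_MONTHS = (1, 5, 9)  # January, May, September
WEDNESDAY = 2                   # date.weekday(): Monday is 0


def next_full_update_month(today):
    """Return (year, month) of the next full template-database update."""
    for month in FULL_UPDATE_MONTHS:
        if month >= today.month:
            return today.year, month
    return today.year + 1, FULL_UPDATE_MONTHS[0]


def next_pdb_update(today):
    """Return the next Wednesday, or today if today is a Wednesday."""
    return today + dt.timedelta(days=(WEDNESDAY - today.weekday()) % 7)


today = dt.date(2024, 6, 15)
print(next_full_update_month(today), next_pdb_update(today))
```

For 15 June 2024 (a Saturday), this gives September 2024 for the next full update and 19 June 2024 for the next PDB update.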
- User accounts
User accounts on the FFAS03 server are password-protected lists of results intended to facilitate and organize work on different projects and protect confidential data. It is
recommended to use a registered account for all database searches submitted to the FFAS server. For instance, queries submitted from a registered account may contain up to 50 sequences,
while queries submitted outside a registered account may contain only one sequence.
Registering a new account can be done on the sign in/register tab and requires only an account name (login) and a password, entered in the right pane.
Existing accounts can be accessed by providing an account name and a password in the left pane of the sign in/register tab.
Once signed in, all results of the searches performed by the user are automatically stored in her/his account.
Results of pairwise alignments are not stored in any results list on the server but can be accessed for three months via their URL.
- Frequently Asked Questions
- How long does the server keep the user's results?
FFAS03 keeps the results in registered accounts for one year. We recommend resubmitting queries after an update of the profile databases.
Searches against PDB may be resubmitted more often to check whether new modeling templates are available from the PDB.
- What if I forgot my password?
FFAS03 stores user passwords in an encrypted form. Thus, we can't restore your password if you forget it.
However, we can assign a new password to your account and send it to you. Please contact us if you forget your password.
- How can you help if I have multiple queries to search?
Calculation of sequence profiles is quite time-consuming even on our new FFAS cluster. Thus, if you have more than 50 queries and you need your results quickly, please contact us.
If you prefer to submit sequences yourself, it's recommended that you divide the queries into portions in order to process them in parallel.
In order to submit more than one sequence at a time, it is necessary to register an account. Results of queries submitted from a registered account are stored as one list and can be browsed,
filtered, and sorted.
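Dividing a large query set into portions of at most 50 sequences (the per-submission limit for registered accounts) can be scripted. This minimal Python sketch uses hypothetical helper names: it parses FASTA text, splits the records into batches, and formats each batch back as FASTA for separate submission.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (description, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batches(records, size=50):
    """Split records into consecutive portions of at most `size` entries."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def to_fasta(records):
    """Format (description, sequence) pairs as FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


# Hypothetical input: 120 short sequences named q1 to q120.
fasta = "".join(f">q{i}\nMKTAYIAKQR\n" for i in range(1, 121))
for n, batch in enumerate(batches(read_fasta(fasta)), start=1):
    print(f"portion {n}: {len(batch)} sequences")
```

The 120 queries are split into portions of 50, 50, and 20 sequences; `to_fasta(batch)` produces the text for each submission.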
- What if I want to search a database not available on the server?
If you want to search a database which is not available from the server, please contact us.
We welcome suggestions about adding new databases and features to the server.
- Common mistakes
- A missing new line character after the description line in FASTA format.
- Submitting nucleotide sequences instead of protein sequences (amino-acid sequences).
- Using characters other than 20 one-letter residue symbols.
- Submitting more than one sequence in each field of the pairwise alignment form (align 2 sequences, dot plot tab).
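Most of the mistakes listed above can be caught before submission. The Python sketch below (a hypothetical helper using simple heuristics) checks one FASTA entry, such as the content of one field of the align 2 sequences form. The nucleotide test only flags sequences made entirely of A, C, G, T, U, and N, so it can miss mixed input or, rarely, flag a genuine protein.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")   # the 20 standard one-letter codes
NUCLEOTIDES = set("ACGTUN")


def check_fasta_entry(text):
    """Return a list of likely problems with one FASTA entry (empty if none)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        return ["empty input"]
    problems = []
    if sum(line.startswith(">") for line in lines) > 1:
        problems.append("more than one sequence")
    if lines[0].startswith(">") and len(lines) == 1:
        problems.append("no sequence after the description line (missing newline?)")
    seq = "".join(
        "".join(line.split()) for line in lines if not line.startswith(">")
    ).upper()
    if seq and set(seq) <= NUCLEOTIDES:
        problems.append("looks like a nucleotide sequence")
    bad = sorted(set(seq) - AMINO_ACIDS)
    if bad:
        problems.append("non-standard characters: " + "".join(bad))
    return problems


print(check_fasta_entry(">my_protein MKTAYIAKQRQISFVKSHFSRQ"))
```

In the example, the sequence was typed on the description line, so the checker reports a missing newline.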
We will only use your email address to send you news about FFAS, such as updates or upgrades of the FFAS system or announcements of new features. If you want to submit a FASTA file, your email address is also required so that we can notify you when your job has finished. The information you provide will be used solely to provide you with better FFAS services. It will not be released to any third party or made public.
FFAS is supported by the NIH grant R01-GM087218-01
Comments and questions to: webmaster