Fold & Function Assignment System


Overview
The FFAS03 server provides an interface to FFAS, a profile-profile alignment and fold recognition algorithm. A profile-profile alignment uses information present in sequences of homologous proteins to amplify the sequence conservation pattern that defines a protein family. This allows detection of remote homologies beyond the reach of other sequence comparison methods. The input to the FFAS03 server is a protein sequence provided by the user. A profile is generated from this sequence and then compared to several databases of sequence profiles of proteins and domains from public resources (see Databases). The databases are updated with the latest structural and sequence information (see Updates).
References
  • Rychlewski, L., Jaroszewski, L., Li, W. & Godzik, A. (2000). Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Science 9, 232-241. PubMed
  • Jaroszewski, L., Rychlewski, L. & Godzik, A. (2000). Improving the quality of twilight-zone alignments. Protein Science 9, 1487-1496. PubMed
  • Jaroszewski, L., Rychlewski, L., Li, Z., Li, W. & Godzik, A. (2005). FFAS03: a server for profile-profile sequence alignments. Nucl. Acids Res. 33, W284-W288. PubMed
  • Jaroszewski, L., Li, Z., Cai, XH., Weber, C. & Godzik, A. (2011). FFAS server: novel features and applications. Nucl. Acids Res. 39, W38-W44. PubMed
  • Xu, D., Jaroszewski, L., Li, Z. & Godzik, A. (2013). FFAS-3D: Improving fold recognition by including optimized structural features and template re-ranking. Bioinformatics, doi: 10.1093. PubMed

Important: Please also cite original resources that were used to build FFAS profiles if they were accessed from results of FFAS searches and used in your research.

Algorithm
  • Step 1: Calculation of a multiple sequence alignment using PSI-BLAST. Five iterations of PSI-BLAST are performed against the sequence-pool database nr85s (see Databases). PSI-BLAST results can also be provided by the user (see Input).
  • Step 2: Calculation of a sequence profile using sequences found by PSI-BLAST. Weights are assigned to sequences based on their uniqueness.
  • Step 3: Calculation of an alignment score. FFAS aligns profiles using a standard local-local dynamic programming algorithm. The value of the comparison score between positions n and m from the two profiles is calculated as a vector*matrix*vector product that includes the n-th column from the first profile, the substitution matrix BLOSUM62, and the m-th column from the second profile. The alignment score is then calculated using dynamic programming.
  • Step 4: Calculation of the final FFAS score. The alignment score is translated into the final FFAS score by comparing it with the distribution of scores obtained for pairs of unrelated proteins.
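
Steps 3 and 4 can be illustrated with a minimal Python sketch. It is not the FFAS implementation: the three-letter alphabet and the values in SUBST are hypothetical stand-ins for the 20x20 BLOSUM62 matrix, the linear gap penalty is an assumed value, and the z-score is only one simple way of comparing an alignment score with a background distribution of scores for unrelated pairs. All function names are illustrative.

```python
from statistics import mean, stdev

# Toy alphabet and substitution matrix (hypothetical values); FFAS uses the
# 20x20 BLOSUM62 matrix instead.
ALPHABET = "ACD"
SUBST = [
    [4, 0, -2],
    [0, 9, -3],
    [-2, -3, 6],
]


def column_score(p, q, subst=SUBST):
    """Vector * matrix * vector product of two profile columns."""
    return sum(p[i] * subst[i][j] * q[j]
               for i in range(len(p))
               for j in range(len(q)))


def local_alignment_score(profile_a, profile_b, gap=2.0):
    """Best local alignment score via dynamic programming.

    Smith-Waterman recurrence with a linear gap penalty (an assumption;
    the gap model used by FFAS is not described here).
    """
    cols = len(profile_b)
    prev = [0.0] * (cols + 1)
    best = 0.0
    for n in range(1, len(profile_a) + 1):
        curr = [0.0] * (cols + 1)
        for m in range(1, cols + 1):
            match = prev[m - 1] + column_score(profile_a[n - 1], profile_b[m - 1])
            curr[m] = max(0.0, match, prev[m] - gap, curr[m - 1] - gap)
            best = max(best, curr[m])
        prev = curr
    return best


def z_score(score, unrelated_scores):
    """Express a score relative to scores of unrelated pairs (assumed normalization)."""
    return (score - mean(unrelated_scores)) / stdev(unrelated_scores)


# Each profile is a list of columns; each column holds residue frequencies
# in ALPHABET order. These two toy profiles correspond to "ACD" and "CD".
profile_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
profile_b = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

raw = local_alignment_score(profile_a, profile_b)
print(raw)  # 15.0 (C aligned to C scores 9, D aligned to D scores 6)
print(z_score(raw, [10.0, 12.0, 14.0]))  # background scores are made up
```

With real profiles, each column would hold 20 frequencies and the background list would come from alignments of pairs of unrelated proteins.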

Validation
FFAS is regularly assessed in CASP and LIVEBENCH experiments. In the last LIVEBENCH evaluation, FFAS ranked at the top of all sequence-based methods. In addition, FFAS is continuously tested with benchmarks consisting of pairs of similar structures derived from the SCOP database. The current version of the FFAS algorithm was optimized in 2003 using SCOP v.1.65. It was retested in 2011 using only superfamilies that were added in later versions of SCOP and thus were not used in any training set or algorithm optimization.
Results of benchmarking of FFAS, PSI-BLAST, and BLAST: A) benchmark of all SCOP domains clustered at 25% sequence identity; B) benchmark of very remote homologs consisting of pairs of domains from the same fold but from different superfamilies.

Input
The input for the FFAS03 server is one or more amino-acid sequences in FASTA format. Using meaningful FASTA description lines is highly recommended. The server accepts sequences of 25 to 2,000 residues. However, the algorithm is optimized for protein sequences of 50 to 500 residues that consist of one or two domains. Sequences longer than 1,000 residues and sequences expected to contain multiple protein domains should be split into shorter fragments.

FFAS searches can be further customized by calculating profiles from PSI-BLAST results provided by the user. If a PSI-BLAST result is uploaded (in text format) in the search tab, the FFAS server does not perform a PSI-BLAST search in the nr85s database; the profile is calculated directly from the user's PSI-BLAST result. This option can be used for only one query sequence, and the query sequence used in the PSI-BLAST search has to be provided in the upper pane.

  • Registered account: Users with a registered account can submit up to 50 query sequences at a time and can upload a FASTA file containing more sequences. All their results are stored as one list and can be accessed later. If you submit a FASTA file, your email address is required at login because we send you an email once the job has finished. We highly recommend creating a separate account for such a job if the sequences serve a special purpose; otherwise, results for the FASTA file will be mixed with the results of all other sequences previously submitted from your account.
  • Public access: Queries containing a single protein sequence can be submitted without a registered account. Users are provided with a direct link to the results page.

Please also see an example of FFAS server input and the resulting output.

Output
After finishing the FFAS search, the server automatically opens search results.

  • Registered account: The server opens a list of the user's results with the most recent results shown at the top. Results for individual sequences can be opened by clicking links on that page.
  • Public access: The server opens the results page for the query sequence directly.

The FFAS output page is organized as a series of tabs. Besides the profile-profile search performed with the FFAS algorithm, the server performs a PSI-BLAST search (using the PDB-BLAST protocol) and a BLAST search. Each tab on the results page contains the results of a search with one method against one database. Results obtained with the same method are grouped together. A separate tab contains the results of the PSI-BLAST search against the nr85s database, which are used to calculate the FFAS profile.

Each tab contains master-slave alignments of the query sequence with sequences represented in a database of profiles (gaps in the query sequence are omitted in the master-slave format). Individual query-template alignments can be displayed via the ali links next to each template sequence. A user can also display FFAS results for each template profile by clicking the follow links. This feature may allow detection of very remote similarities by finding a protein or protein domain that is similar both to the query and to the template. If the template sequence is associated with a known structure, the modeling tool can be launched via the model links, and the list of structures similar to each template can be opened via the 3D-neighbors links.

Services
The FFAS03 server supports two main services:
  • Searching databases of sequence profiles (available via the search tab). This option calculates a profile from a sequence provided by the user and uses it to collect homologs from one or more template databases of protein profiles. In addition to homologs detected in the profile database(s) by the FFAS method, the FFAS03 server also displays homologs collected by PSI-BLAST using the PDB-BLAST protocol and templates detected with BLAST. All homologs collected by PSI-BLAST from the sequence-pool database (nr85s) and used to calculate FFAS profiles can also be displayed. See the glossary for a list of methods used by the FFAS03 server.
  • Alignment of any two sequences provided by the user (available via the align 2 sequences, dot plot tab). This option calculates alignment and evaluates sequence similarity between two protein sequences. It also displays dotplot graphs - maps of local similarities between these profiles. Dotplot graphs are useful in detection of sequence repeats (of the protein sequence compared to itself) and also give a quantitative assessment of local alignment reliability (when two different sequences are aligned).
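
The use of dot plots for repeat detection can be sketched in a few lines of Python. Note the simplification: the FFAS03 dot plots map local similarities between profiles, whereas this sketch compares raw sequences using exact residue identity within a sliding window. The function name and the window and min_matches parameters are assumptions made for illustration.

```python
def dot_plot(seq_a, seq_b, window=3, min_matches=3):
    """Return the set of (i, j) positions whose windows are similar.

    A dot is placed at (i, j) when the windows starting at seq_a[i] and
    seq_b[j] share at least `min_matches` identical residues.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(a == b for a, b in
                          zip(seq_a[i:i + window], seq_b[j:j + window]))
            if matches >= min_matches:
                dots.add((i, j))
    return dots


# Comparing a sequence to itself: off-diagonal dots indicate repeats.
sequence = "MKVLGGMKVL"  # toy sequence with the segment MKVL repeated
dots = dot_plot(sequence, sequence)
repeats = sorted((i, j) for i, j in dots if i < j)
print(repeats)  # [(0, 6), (1, 7)]
```

In a self-comparison the main diagonal is always filled, so only the off-diagonal dots carry information about repeats.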

Databases
Two types of databases are used by the FFAS03 server:

1) Sequence-pool database (nr85s) - a large database of protein sequences that is used to calculate protein profiles. This database is searched automatically with every sequence submitted to the FFAS03 server. The results of these searches are available on the nr85s tab of FFAS results. The following sets of protein sequences are included in the nr85s database:


The above sets of sequences have been clustered with the CD-HIT program to reduce redundancy. The NR database has been clustered at 85% sequence identity, and all metagenomic samples have been clustered at 60% sequence identity. Regions of low complexity have been masked with the SEG program. Together these form the default sequence-pool database that FFAS uses to calculate the profile for each input query sequence.

When "Option 5" is checked on the FFAS input page, the following database is added to the default sequence-pool database for profile calculation:
  • Microbiomes from the Integrated Microbial Genomes project (IMG/M) of the DOE Joint Genome Institute (JGI). Currently, a total of 218 microbiomes are included. The initial set of about 37M protein sequences was clustered with CD-HIT at 90% sequence identity to reduce redundancy, which resulted in about 30M sequences. Please expect a long waiting time if this option is chosen. You can save the link and come back later to check the result.


2) Template databases - sets of profiles available for FFAS searches. Comparisons between these sets of profiles are available on the tab available lists of results. The list of available template databases is given below. More information about template databases can be found in the original resources they were downloaded from.

  • The Protein Data Bank (PDB) - sequences from SEQRES records of PDB entries were clustered and only one representative was used for a cluster of identical sequences.
  • Structural Classification of Proteins (SCOP) - sequences of SCOP domains clustered at 40% sequence identity were downloaded from the Astral resource.
  • Protein families database (PFAM25U) - this is the PfamA version 25.0 release plus unpublished new families. One sequence representative was selected from each multiple sequence alignment representing each Pfam family and used to build FFAS profiles. FFAS automatically updates the unpublished new Pfam families monthly.
  • Clusters of Orthologous Groups (COGs) - one sequence representative was selected from each multiple sequence alignment representing each COG family and used to build FFAS profiles.
  • The Joint Center for Structural Genomics (JCSG) - sequences of all active targets from JCSG are represented by sequence profiles.
  • Human polymorphisms and disease mutations (HUMSAVAR) - all sequences from this resource are represented by sequence profiles (sequences longer than 1000 residues were split into overlapping fragments of 500 residues).
  • Virulence Factors Database (VFDB) - all sequences from this resource are represented by sequence profiles.
  • HGM_OVER - 182 curated protein families overrepresented in human gut microbiome.
  • Human proteome - sequences of canonical isoforms of human proteins were downloaded from the UniProt database of complete proteomes. Sequences longer than 600 residues were split into overlapping fragments of 300 residues. Signal peptides predicted with the SignalP program were removed from all sequences.
  • The proteomes of B.anthracis, B.burgdorferi, B.thetaiotaomicron, C.acetobutylicum, C.crescentus, C.trachomatis, E.coli, E.rectale, H.pylori, M.genitalium, M.pneumoniae, M.tuberculosis, N.meningitidis, S.aureus, S.cerevisiae, S.typhi, T.maritima, and Y.pestis. Sequences were downloaded from the NCBI database of complete microbial genomes. Predicted proteins from E.coli O104:H4 were downloaded from the Era7 Bioinformatics website (authors: Marina Manrique, Pablo Pareja-Tobes, Eduardo Pareja-Tobes, Eduardo Pareja, Raquel Tobes).

    All these proteomes were processed as follows:
    Signal peptides were predicted with the SignalP program and removed from all sequences. Proteins containing more than 1000 residues were split into overlapping fragments of 500 residues.
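
The splitting of long sequences into overlapping fragments, which is applied to several template databases above and is also recommended for long query sequences, can be sketched as follows. This is an illustration under stated assumptions, not the procedure used by the server: the size of the overlap is not given here, so the overlap default of 250 residues is hypothetical, as are the function and parameter names.

```python
def split_into_fragments(sequence, max_length=1000, fragment_length=500,
                         overlap=250):
    """Split a long sequence into overlapping fragments of equal length.

    Sequences of at most `max_length` residues are returned unchanged.
    The last fragment is anchored at the end of the sequence so that every
    fragment has exactly `fragment_length` residues.
    """
    if not 0 <= overlap < fragment_length:
        raise ValueError("overlap must be smaller than fragment_length")
    if len(sequence) <= max_length:
        return [sequence]

    step = fragment_length - overlap
    fragments = []
    start = 0
    while start + fragment_length < len(sequence):
        fragments.append(sequence[start:start + fragment_length])
        start += step
    fragments.append(sequence[-fragment_length:])  # final fragment
    return fragments


# A made-up 1,200-residue sequence built by cycling through the 20 residues.
long_sequence = "".join("ACDEFGHIKLMNPQRSTVWY"[i % 20] for i in range(1200))
fragments = split_into_fragments(long_sequence)
print(len(fragments), [len(f) for f in fragments])
```

Anchoring the final fragment at the end of the sequence avoids a short trailing piece, at the cost of a larger overlap between the last two fragments.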


Important: Please cite these resources if they were accessed via FFAS searches and used in your research.

Updates
Sequence-pool database: The sequence-pool database is updated every four months.
Template databases: A full update of all template databases of profiles is performed every four months, in January, May, and September. Updates include all databases used by the server and the all-to-all comparisons available on the tab available lists of results. In addition, the PDB database is updated every Wednesday. Please note that the incremental update of the PDB database does not include updates of the related lists of precalculated results or of the results of users' searches. To get updated results of FFAS searches, you need to resubmit your search queries.

User accounts
User accounts on the FFAS03 server are password-protected lists of results intended to facilitate and organize work on different projects and protect confidential data. It is recommended to use a registered account for all database searches submitted to the FFAS server. For instance, queries submitted from a registered account may contain up to 50 sequences, while queries submitted outside a registered account may contain only one sequence.
Registering a new account can be done on the sign in/register tab and requires only an account name (login) and a password, entered in the right pane. Existing accounts can be accessed by providing an account name and a password in the left pane of the sign in/register tab.

Once a user has signed in, all results of the searches performed by the user are automatically stored in her/his account. Results of pairwise alignments are not stored in any results list on the server but can be accessed for three months via their URL.

Frequently Asked Questions
  • How long does the server keep the user's results?
    FFAS03 keeps results in registered accounts for one year. We recommend resubmitting queries after an update of the profile databases. Searches against PDB may be resubmitted more often to check whether new modeling templates are available from the PDB.

  • What if I forgot my password?
    FFAS03 stores user passwords in an encrypted form. Thus, we can't restore your password if you forget it. However, we can assign a new password to your account and send it to you. Please contact us if you forget your password.

  • How can you help if I have multiple queries to search?
    Calculation of sequence profiles is quite time-consuming even on our new FFAS cluster. Thus, if you have more than 50 queries and you need your results quickly, please contact us. If you prefer to submit sequences yourself, we recommend dividing the queries into portions so that they can be processed in parallel. Submitting more than one sequence at a time requires a registered account. Results of queries submitted from a registered account are stored as one list and can be browsed, filtered, and sorted.
  • What if I want to search a database not available on the server?
    If you want to search a database which is not available from the server, please contact us. We welcome suggestions about adding new databases and features to the server.

Common mistakes
  • A missing new line character after the description line in FASTA format.
  • Submitting nucleotide sequences instead of protein sequences (amino-acid sequences).
  • Using characters other than the 20 standard one-letter residue symbols.
  • Submitting more than one sequence in each field of the pairwise alignment form (align 2 sequences, dot plot tab).
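
Most of these mistakes can be caught before submission with a small local check. The sketch below is an assumption-laden illustration, not code used by the server: the function names are hypothetical, the length limits are the 25 to 2,000 residues stated in the Input section, and the nucleotide test is only a heuristic (a sequence made up solely of A, C, G, and T).

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # 20 one-letter residue symbols
NUCLEOTIDE_LETTERS = set("ACGT")
MIN_LENGTH, MAX_LENGTH = 25, 2000  # limits stated in the Input section


def parse_fasta(text):
    """Parse FASTA text into a list of (description, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data before the first '>' description line")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_sequence(sequence):
    """Return a list of problems found in one sequence (empty if none)."""
    if not sequence:
        return ["no sequence lines found "
                "(missing new line after the description line?)"]
    problems = []
    if not MIN_LENGTH <= len(sequence) <= MAX_LENGTH:
        problems.append(f"length {len(sequence)} is outside "
                        f"{MIN_LENGTH}-{MAX_LENGTH} residues")
    invalid = sorted(set(sequence) - VALID_RESIDUES)
    if invalid:
        problems.append("invalid characters: " + "".join(invalid))
    elif set(sequence) <= NUCLEOTIDE_LETTERS:
        # Heuristic only: a protein made up of just A, C, G, T is unlikely.
        problems.append("only A, C, G, T found: possibly a nucleotide sequence")
    return problems


example = ">query1 hypothetical protein\n" + "MKVLAGTE" * 5 + "\n"
for description, sequence in parse_fasta(example):
    print(description, check_sequence(sequence) or "OK")
```

For the pairwise alignment form, the same parser can be used to confirm that each field contains exactly one record.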

Privacy Policy
We will use your email address only to send you news about FFAS, such as updates or upgrades of the FFAS system or the addition of new features. If you submit a FASTA file, your email address is also required so that we can inform you when your job has finished. The information you provide will be used solely to provide you with better FFAS services. It will not be released to any third party or made public.

FFAS is supported by the NIH grant R01-GM087218-01
61923 jobs submitted since Jan 1, 2011
Comments and questions to: webmaster
