Analyse a protein

Bioinformatics Group Home

Analyse a Protein with AnaGram

Department of Genetics - contact - about

Welcome to the AnaGram home page. AnaGram is a computational tool for protein function assignment. It detects small, statistically significant fragments shared by identity, which may act as modular building blocks in peptide construction.

AnaGram assigns function to protein sequences by finding correlations between short sequence signals and functional annotations in a protein database. The procedure has two successive stages. In the first stage, the query sequence is analysed to find subtle but statistically significant amino acid patterns shared with the database (Thode et al., 1996; Rodriguez et al., 2000). These patterns are called protomotifs because, taken separately, they do not constitute motifs with their own structural or functional organization (Figure 1).
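As a rough illustration of this first stage, the sketch below indexes a toy database by fixed-length fragments and reports the query fragments that occur identically in it. It is only a minimal sketch: the fragment length `k`, the function names, and the toy sequences are assumptions made for illustration, and the statistical significance test that AnaGram applies is not described on this page, so it is not reproduced here.

```python
from collections import defaultdict

def index_fragments(database, k):
    """Map every length-k fragment to the set of (entry_id, position) where it occurs."""
    index = defaultdict(set)
    for entry_id, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add((entry_id, i))
    return index

def find_shared_fragments(query, index, k):
    """Return (query_position, fragment, hits) for each query fragment present in the index."""
    matches = []
    for i in range(len(query) - k + 1):
        fragment = query[i:i + k]
        if fragment in index:
            matches.append((i, fragment, sorted(index[fragment])))
    return matches

# Toy data, invented for illustration only (not real SWISS-PROT entries).
toy_db = {"P1": "MKTAYIAKQR", "P2": "GGSAYIAKLL"}
toy_index = index_fragments(toy_db, 5)
shared = find_shared_fragments("QQAYIAKQQ", toy_index, 5)

for position, fragment, hits in shared:
    print(position, fragment, hits)
```

In a real setting, each candidate fragment would still need to pass a significance filter before being treated as a protomotif.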

In the second stage, the protomotifs are associated with the functional annotations of the SWISS-PROT entries that gave rise to them, and these annotations are used to assign functions to the analysed sequence. At present, the Keywords, Features, and References fields of SWISS-PROT are used as the informative lines on function. The Keywords are used to assign the functions, and the Features locate these functions at specific sequence positions (Figure 2).

A global scheme of the Keywords assignment is shown in Figure 3.
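The idea of the Keywords assignment can be sketched as a simple vote: each protomotif contributes the keywords of the entries it came from, and keywords supported by more protomotifs rank higher. This is a hedged sketch only. The counting rule, the function name `tally_keywords`, and the toy entries and keywords are assumptions, not the scoring scheme AnaGram actually uses.

```python
from collections import Counter

def tally_keywords(protomotif_hits, entry_keywords):
    """Count, for each keyword, how many protomotifs come from an entry carrying it."""
    counts = Counter()
    for fragment, entry_ids in protomotif_hits.items():
        seen = set()
        for entry_id in entry_ids:
            seen.update(entry_keywords.get(entry_id, ()))
        counts.update(seen)  # each protomotif votes at most once per keyword
    return counts

# Toy data, invented for illustration only.
toy_hits = {"AYIAK": ["P1", "P2"], "YIAKQ": ["P2"]}
toy_keywords = {"P1": ["Kinase", "ATP-binding"], "P2": ["ATP-binding"]}

ranked = tally_keywords(toy_hits, toy_keywords).most_common()
print(ranked)
```

Counting each protomotif once per keyword keeps a single fragment that matches many entries of the same family from dominating the ranking.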

Tips for analysing a protein sequence.

When analysing a new protein sequence to predict its function, it is advisable to search first for homologous sequences in the databases using a system such as BLAST or FASTA. It is then useful to look for amino acid patterns in your protein by searching databases such as PROSITE or SMART, or the integrated InterPro database. You can also search for the family of your protein using systems such as ProtoMap or Pfam, which defines amino acid domains within your protein.

Finally, you can extract supplementary information about your protein with systems such as the one presented here, AnaGram. This analysis can provide information about the general function, domains, or important specific sites of your query protein, and it can be used in experiments on function definition, mutagenesis, drug design, etc. In addition, you can obtain functional information for proteins that show no similarity to other known proteins when the methods above are used. With AnaGram you will always obtain some result about the function of your protein, even if it is a rare one. This characteristic also makes AnaGram helpful for finding genes or pseudogenes within DNA sequences, because these sequences will give results while non-coding sequences usually will not (Thode et al., 1996).
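One way to prepare a DNA sequence for this kind of scan is to translate it in all six reading frames and analyse each translation as a protein query. The sketch below assumes that approach and the standard genetic code. This page does not state how DNA input is actually handled, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate complete codons only; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )

def six_frame_translations(dna):
    """Return the three forward and three reverse-complement translations."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames

frames = six_frame_translations("ATGGCCAAATAA")
for name, protein in sorted(frames.items()):
    print(name, protein)
```

Frames that yield protomotif hits would then be candidates for coding regions, while frames without hits are more likely non-coding.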


· Thode, G., Garcia-Ranea, J.A., Jimenez, J. (1996) Search for ancient patterns in protein sequences, J. Mol. Evol., 42(2), 224-33.

· Rodriguez, A., Thode, G., Perez, A.J., Lopez, A.D., Carazo, J.M. & Trelles, O. (2000) Mining low-level similarity signals from sequence databases, SCI'2000 & ISAS'2000, Orlando, USA, July 23-26.

· Pérez, A.J., Rodríguez, A., Trelles, O. & Thode, G. (2002) A computational strategy for protein function assignment which addresses the multidomain problem, Comp. Funct. Gen., 3(5), 423-440. (Supplementary files).

· Pérez, A.J., Thode, G. & Trelles, O. (2004) AnaGram: protein function assignment, Bioinformatics, 20(2), 291-292.

Note: If you use AnaGram in published work, please cite the last reference (Pérez et al., 2004).

Referenced in: Watson, J.D., Laskowski, R.A. & Thornton, J.M. (2005) Predicting protein function from sequence and structural data, Curr. Opin. Struct. Biol., 15(3), 275-84.