Exons:
Bits of information for using the MUSC Shared Resource,
the BioMolecular Computing Resource (BCR).

Exon_31 Running the version 3 FASTA programs at MUSC.

Consult the on-line FASTA manual for more details.

Modify your ".cshrc" file to include the following line:

    setenv FASTLIBS /usr/common/bcr/fastgbs

This will give you access to standalone versions of the FASTA search programs. To avoid conflict with the GCG versions of the programs, some of the programs have the numeral "34" appended to their names.

Library search programs: FASTA34, FASTX34, FASTS34, TFASTA34, TFASTX34, TFASTS34, SSEARCH34

Local homology programs: LFASTA, PLFASTA, LALIGN, PLALIGN, FLALIGN

Statistical significance: PRDF, RELATE, PRSS, RANDSEQ

Global alignment: ALIGN

These programs can read GCG and FASTA format files and search the same databases as the GCG versions of the FASTA programs. Typing "fasta" and hitting return launches the GCG fasta program. Typing "fasta3" and hitting return launches the native fasta program. In both cases, menus prompt you for the information the program needs to run, such as input file names and the databases to search.

Program Summary

Sequence search programs

FASTA     Universal sequence comparison. Defaults to comparing
          protein sequences; if the sequences are > 85% A+C+G+T
          or the -n option is used, a DNA sequence is assumed.
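
The 85% rule above can be sketched in Python as follows. This only
illustrates the rule as stated here and is not taken from the FASTA
source; the function name looks_like_dna and the choice to ignore
non-letter characters are assumptions.

```python
def looks_like_dna(sequence, threshold=0.85):
    """Return True when more than `threshold` of the letters are A, C, G or T."""
    letters = [ch for ch in sequence.upper() if ch.isalpha()]
    if not letters:
        return False
    acgt = sum(ch in "ACGT" for ch in letters)
    return acgt / len(letters) > threshold


print(looks_like_dna("ATGGCGTACGTTAGC"))    # every letter is A, C, G or T
print(looks_like_dna("MKTAYIAKQRQISFVK"))   # mostly other letters
```

A protein that happens to be rich in A, C, G and T residues would be
misclassified by such a rule, which is one reason to state the
sequence type explicitly with -n when in doubt.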

FASTX     Search a protein sequence library by comparing its amino
          acid sequences to the forward three frames of a
          translated DNA query sequence. (The reverse frames are
          specified with the -i option.) Alignment scores allow
          frameshifts; the final alignment uses a Smith-Waterman
          type alignment routine (no limit on gaps) that allows
          frameshifts.

TFASTA    Search a DNA library with a protein query sequence by
          translating each DNA sequence to protein in all six
          frames (three forward frames with the -3 command line
          option). TFASTA with ktup=2 is about as fast as a DNA
          FASTA with ktup=4, and is substantially more sensitive.
          (Also reads the GenBank library.)
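
As an illustration of what "all six frames" means, here is a minimal
Python sketch of six-frame translation with the standard genetic
code. It is not the TFASTA implementation; the names translate and
six_frames, the forward_only argument (standing in for the -3
option), and the use of "X" for unrecognized codons are assumptions.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from the start of `dna`; unknown codons give X."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna, forward_only=False):
    """Return translations of the three forward frames, then the three reverse frames."""
    dna = dna.upper()
    strands = [dna]
    if not forward_only:
        strands.append(dna.translate(COMPLEMENT)[::-1])  # reverse complement
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


print(six_frames("ATGGCC"))
```

Each library sequence therefore yields up to six protein sequences to
compare against the query, which is why the translated searches cost
more than a protein-to-protein search of the same size.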

TFASTX    Search a DNA library with a protein query sequence by
          translating each DNA sequence to protein in all six
          frames (three forward frames with the -3 command line
          option), calculating similarity scores that allow
          frameshifts. TFASTX produces an optimal Smith-Waterman
          alignment of the query and translated-library sequence.

SSEARCH   Universal sequence comparison using the Smith-Waterman
          algorithm (T. F. Smith and M. S. Waterman (1981) J.
          Mol. Biol. 147:195-197). This program uses code
          developed by Huang and Miller (X. Huang, R. C.
          Hardison, W. Miller (1990) CABIOS 6:373-381) for
          calculating the local similarity score and code from
          the ALIGN program (see below) for calculating the local
          alignment. SSEARCH is about 50 times slower than FASTA
          with ktup=2 (for proteins).
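
The Smith-Waterman local similarity score can be sketched in a few
lines of Python. This is a teaching sketch, not the Huang and Miller
code used by SSEARCH: it assumes a simple match/mismatch score and a
linear gap penalty (the values below are arbitrary), whereas SSEARCH
uses a scoring matrix and its own gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b, keeping one matrix row at a time."""
    previous = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        current = [0]
        for j, ch_b in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if ch_a == ch_b else mismatch)
            # A local alignment may restart anywhere, so scores never go below 0.
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

Every cell of the comparison matrix is visited, which is the source
of both the rigor and the slowness relative to FASTA's ktup lookup.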

ALIGN     Optimal global alignment of two sequences with no
          short-cuts. This program is a slightly modified
          version of one taken from E. Myers and W. Miller. The
          algorithm is described in E. Myers and W. Miller,
          "Optimal Alignments in Linear Space" (CABIOS (1988)
          4:11-17).
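
For contrast with a local score, the following Python sketch computes
a global alignment score while storing only one row of the matrix at
a time. It shows only the score pass; the Myers and Miller
divide-and-conquer step that recovers the alignment itself in linear
space is not reproduced, and the scoring values are assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal end-to-end alignment score of a and b with a linear gap penalty."""
    # Row 0: aligning an empty prefix of a against j residues of b costs j gaps.
    previous = [j * gap for j in range(len(b) + 1)]
    for i, ch_a in enumerate(a, start=1):
        current = [i * gap]
        for j, ch_b in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if ch_a == ch_b else mismatch)
            current.append(max(diagonal, previous[j] + gap, current[j - 1] + gap))
        previous = current
    return previous[-1]


print(global_alignment_score("ACGT", "AGT"))
```

Unlike a local score, the result can be negative, because both
sequences must be aligned over their full lengths.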

Local similarity programs

LFASTA    Local similarity searches showing local alignments.
          The algorithm used to calculate the local alignment in
          a band has been improved (Chao, Pearson, and Miller,
          submitted).

PLFASTA   Local similarity searches with plot output (on the IBM,
          this program requires that the environment variable
          BGIDIR be set).

PCLFASTA  (Unix only) Local similarity searches with plot output
          using pic commands.

LALIGN    Calculates the N-best local alignments using a rigorous
          algorithm (N=10 by default). The algorithm, developed
          by Huang and Miller (X. Huang and W. Miller (1991)
          Adv. Appl. Math. 12:337-357), is a linear-space version
          of an algorithm described by M. S. Waterman and
          M. Eggert ((1987) J. Mol. Biol. 197:723-728).
          Like SSEARCH, LALIGN is rigorous, but also very slow.

PLALIGN   A version of LALIGN that plots its output to a screen
          or to a Tektronix terminal emulator.

PRDF      Improved version of the RDF program that includes
          accurate probability estimates for all three scoring
          methods (includes local or window shuffle routine).

PRSS      A version of PRDF that uses the rigorous Smith-Waterman
          calculation used by SSEARCH.

RANDSEQ   Produces a randomly shuffled sequence from a query
          sequence.
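
The two kinds of shuffling mentioned above (a uniform shuffle of the
whole sequence, and the local or window shuffle listed for PRDF) can
be sketched in Python as follows. This is an illustration only; the
function name shuffle_sequence, the window size and the seed argument
are assumptions and do not describe the programs' actual options.

```python
import random


def shuffle_sequence(sequence, window=None, seed=None):
    """Shuffle residues uniformly, or only within consecutive windows."""
    rng = random.Random(seed)
    if window is None:
        window = max(len(sequence), 1)  # one window covering the whole sequence
    pieces = []
    for start in range(0, len(sequence), window):
        block = list(sequence[start:start + window])
        rng.shuffle(block)
        pieces.append("".join(block))
    return "".join(pieces)


query = "MKTAYIAKQRQISFVKSHFSRQ"
print(shuffle_sequence(query, seed=1))            # uniform shuffle
print(shuffle_sequence(query, window=5, seed=1))  # window shuffle
```

Both forms preserve the overall composition; the window shuffle also
keeps each residue near its original position, so local composition
bias is preserved in the shuffled sequences used for the statistics.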

RELATE    Significance program described by Dayhoff (Atlas of
          Protein Sequence and Structure, Vol. 5, Supplement 3).
          Each chunk of 25 residues in one sequence is compared
          to every 25 residue fragment of the second sequence.
          Sequences which are genuinely related will have a large
          number of scores greater than 3 standard deviations
          above the mean score of all of the comparisons.
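
The segment comparison described for RELATE can be sketched in Python
as follows. The sketch makes two simplifying assumptions that are not
from the RELATE documentation: fragments overlap and start at every
position in both sequences, and each pair is scored by counting
identical positions rather than with a scoring matrix.

```python
from statistics import mean, pstdev


def segment_scores(seq_a, seq_b, length=25):
    """Score every fragment of seq_a against every fragment of seq_b."""
    frags_a = [seq_a[i:i + length] for i in range(len(seq_a) - length + 1)]
    frags_b = [seq_b[j:j + length] for j in range(len(seq_b) - length + 1)]
    return [
        sum(x == y for x, y in zip(frag_a, frag_b))  # identities only (assumption)
        for frag_a in frags_a
        for frag_b in frags_b
    ]


def count_high_scores(scores, sd_cutoff=3.0):
    """Count scores more than `sd_cutoff` standard deviations above the mean."""
    if not scores:
        return 0
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return 0
    return sum((score - mu) / sigma > sd_cutoff for score in scores)
```

A call such as count_high_scores(segment_scores(seq1, seq2)) then
gives the number of fragment pairs standing out from the background.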

Other analysis programs

AACOMP    Calculate the amino acid composition and molecular
          weight of a sequence.
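
A composition and molecular weight calculation of this kind can be
sketched in Python as follows. The residue masses are commonly
tabulated average values and are an assumption here; AACOMP's own
table may differ slightly. Letters outside the 20 standard amino
acids are ignored.

```python
from collections import Counter

# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528


def composition(protein):
    """Count each standard amino acid in the sequence."""
    return Counter(ch for ch in protein.upper() if ch in RESIDUE_MASS)


def molecular_weight(protein):
    """Sum of residue masses plus one water for the free chain ends."""
    counts = composition(protein)
    if not counts:
        return 0.0
    return sum(RESIDUE_MASS[aa] * n for aa, n in counts.items()) + WATER_MASS


print(composition("MKTAYIAK").most_common(3))
print(round(molecular_weight("MKTAYIAK"), 1))
```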

BESTSCOR  Calculate the best self-comparison score.

GREASE    Kyte-Doolittle hydropathicity profile.

TGREASE   Graphic plot of Kyte-Doolittle profile.
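
A Kyte-Doolittle profile is a sliding-window average of per-residue
hydropathy values. The Python sketch below uses the published
Kyte-Doolittle scale; the window length of 9 and the function name
are assumptions and may not match the defaults of GREASE.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Average hydropathy of each full window; raises KeyError on unknown letters."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


for position, value in enumerate(hydropathy_profile("MKTAYIAKQRQISFVKSHFSRQ"), start=1):
    print(position, round(value, 2))
```

Positive stretches in the profile indicate hydrophobic regions;
sequences shorter than the window give an empty profile.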

FROMGB    Convert from GenBank LOCUS format (also used by the
          IBI-Pustell programs) to Pearson/FASTA format.
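
The kind of conversion FROMGB performs can be sketched in Python as
follows. This is a minimal reader that assumes each record has a
LOCUS line, an ORIGIN section and a terminating "//" line; it is not
the FROMGB code, and the sample record and function name are
hypothetical.

```python
def genbank_to_fasta(text, line_width=60):
    """Convert GenBank flat-file records to FASTA, using the LOCUS name as the ID."""
    records = []
    name, in_origin, residues = None, False, []
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            name = fields[1] if len(fields) > 1 else "unknown"
            in_origin, residues = False, []
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            if name is not None:
                seq = "".join(residues).upper()
                wrapped = "\n".join(
                    seq[i:i + line_width] for i in range(0, len(seq), line_width)
                )
                records.append(f">{name}\n{wrapped}")
            name, in_origin, residues = None, False, []
        elif in_origin:
            # Sequence lines carry a position number and blank-separated blocks.
            residues.append("".join(ch for ch in line if ch.isalpha()))
    return "\n".join(records)


# Hypothetical record, for illustration only.
SAMPLE = (
    "LOCUS       TEST01   12 bp    DNA\n"
    "DEFINITION  example record.\n"
    "ORIGIN\n"
    "        1 acgtacgtac gt\n"
    "//\n"
)
print(genbank_to_fasta(SAMPLE))
```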

GARNIER   A secondary structure prediction program using the
          method of Garnier, Osguthorpe, and Robson, J. Mol.
          Biol., (1978) 120:97-120.


The BCR at the Medical University of South Carolina
URL: http://bcr.musc.edu/
Last modified: November 29, 2001- ESH

E-mail comments about this page to Starr.