Exons: Bits of information for using the MUSC Shared Resource, the BioMolecular Computing Resource (BCR).
Exon_31: Running the version 3 FASTA programs at MUSC
Consult the on-line FASTA manual for more details.
Modify your ".cshrc" file to include the following line:
    setenv FASTLIBS /usr/common/bcr/fastgbs
This will give you access to standalone versions of the FASTA search programs. To avoid conflict with the GCG versions of the programs, some of the programs have the numeral "34" appended to their names.
Library search programs: FASTA34, FASTX34, FASTS34, TFASTA34, TFASTX34, TFASTS34, SSEARCH34
Local homology programs: LFASTA, PLFASTA, LALIGN, PLALIGN, FLALIGN
Statistical significance: PRDF, RELATE, PRSS, RANDSEQ
Global alignment: ALIGN
These programs can read GCG and FASTA format files and search the same databases as the GCG versions of the FASTA programs. Typing "fasta" and hitting return launches the GCG fasta program. Typing "fasta3" and hitting return launches the native fasta program. In both cases, menus appear that prompt you for the information the program needs to run, such as input file names and the databases to search.
Program Summary
Sequence search programs
FASTA universal sequence comparison. Defaults to comparing
protein sequences; if the sequences are > 85% A+C+G+T
or the -n option is used, a DNA sequence is assumed.
FASTX Search a protein sequence library using amino acid
sequence comparison to the forward three frames of a
translated DNA query sequence. (The reverse frames are
specified with the -i option.) Alignment scores allow
frameshifts; the final alignment uses a Smith-Waterman
type alignment routine (no limit on gaps) that allows
frameshifts.
TFASTA Search DNA library for a protein sequence by
translating the DNA sequence to protein in all six
frames (three forward frames with the -3 command line
option). TFASTA with ktup=2 is about as fast as a DNA
FASTA with ktup=4, and is substantially more sensitive.
(also reads the GENBANK library)
TFASTX Search DNA library for a protein sequence by
translating the DNA sequence to protein in all six
frames (three forward frames with the -3 command line
option) calculating similarity scores that allow
frameshifts. TFASTX produces an optimal Smith-Waterman
alignment of the query and translated-library sequence.
SSEARCH Universal sequence comparison using the Smith-Waterman
algorithm ( T. F. Smith and M. S. Waterman (1981) J.
Mol. Biol. 147:195-197). This program uses code
developed by Huang and Miller (X. Huang, R. C.
Hardison, W. Miller (1990) CABIOS 6:373-381) for
calculating the local similarity score and code from
the ALIGN program (see below) for calculating the local
alignment. SSEARCH is about 50-times slower than FASTA
with ktup=2 (for proteins).
ALIGN optimal global alignment of two sequences with no
short-cuts. This program is a slightly modified
version of one taken from E. Myers and W. Miller. The
algorithm is described in E. Myers and W. Miller,
"Optimal Alignments in Linear Space" (CABIOS (1988)
4:11-17).
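The difference between the local score reported by SSEARCH and the global score reported by ALIGN can be illustrated with a small dynamic program. This is only a sketch under stated assumptions: it uses a flat match/mismatch score and a linear gap penalty (hypothetical values, not the scoring matrices or gap penalties the FASTA programs use), it returns a score rather than an alignment, and it is not the Huang-Miller or Myers-Miller code. The function name and parameters are invented for illustration.

```python
def similarity_score(a, b, match=5, mismatch=-4, gap=-8, local=True):
    """Score two sequences with a simple dynamic program.

    local=True  -> best local score (Smith-Waterman style; cells floor at 0)
    local=False -> global score over the full length of both sequences
    Only two rows are kept, so memory grows with len(b), not len(a) * len(b).
    The scoring values are illustrative assumptions.
    """
    # Row 0: the empty prefix of `a` against each prefix of `b`.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i, ca in enumerate(a, start=1):
        # Column 0: each prefix of `a` against the empty prefix of `b`.
        curr = [0 if local else i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cell = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                # A local alignment may start anywhere, so never go below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            curr.append(cell)
        prev = curr
    return best if local else prev[-1]


if __name__ == "__main__":
    print(similarity_score("TTACGTTT", "ACG", local=True))
    print(similarity_score("ACGT", "AGT", local=False))
```

With local scoring, a short query embedded in a longer sequence is not penalized for the unmatched flanks; with global scoring, every residue of both sequences must be accounted for by a match, mismatch, or gap.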
Local similarity programs
LFASTA local similarity searches showing local alignments.
The algorithm used to calculate the local alignment in
a band has been improved (Chao, Pearson, and Miller,
submitted).
PLFASTA local similarity searches with plot output (on the IBM,
this program requires that the environment variable
BGIDIR be set).
PCLFASTA (unix only) local similarity searches with plot output
using pic commands.
LALIGN Calculates the N-best local alignments using a rigorous
algorithm. (N=10 by default.) The algorithm was
developed by Huang and Miller (X. Huang and W. Miller
(1991) Adv. Appl. Math. 12:337-357), which is a
linear-space version of an algorithm described by M. S.
Waterman and M. Eggert (J. Mol. Biol. (1987) 197:723-728).
Like SSEARCH, LALIGN is rigorous, but also very slow.
PLALIGN A version of LALIGN that plots its output to a screen
or to a Tektronix terminal emulator.
Statistical significance programs
PRDF improved version of RDF program that includes accurate
probability estimates for all three scoring methods
(includes local or window shuffle routine)
PRSS A version of PRDF that uses the rigorous Smith-Waterman
calculation used by SSEARCH.
RANDSEQ produces a randomly shuffled sequence from a query sequence.
RELATE significance program described by Dayhoff (Atlas of
Protein Sequence and Structure, Vol. 5, Supplement 3).
Each chunk of 25 residues in one sequence is compared
to every 25 residue fragment of the second sequence.
Sequences which are genuinely related will have a large
number of scores greater than 3 standard deviations
above the mean score of all of the comparisons.
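The fragment comparison described for RELATE can be sketched as follows. This is a rough illustration, not the RELATE code: it assumes the 25-residue chunks of the first sequence are non-overlapping, it scores each comparison by counting identical positions instead of using a scoring matrix, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def fragment_scores(seq1, seq2, size=25):
    """Compare each chunk of seq1 to every window of the same size in seq2.

    Assumptions: chunks of seq1 do not overlap, and the score of a
    comparison is simply the number of identical positions.
    """
    scores = []
    for i in range(0, len(seq1) - size + 1, size):
        chunk = seq1[i:i + size]
        for j in range(len(seq2) - size + 1):
            window = seq2[j:j + size]
            scores.append(sum(x == y for x, y in zip(chunk, window)))
    return scores


def count_high_scores(scores, n_sd=3.0):
    """Count scores more than n_sd standard deviations above the mean."""
    if not scores:
        return 0
    mu = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        return 0  # all comparisons scored the same; nothing stands out
    cutoff = mu + n_sd * sd
    return sum(s > cutoff for s in scores)


if __name__ == "__main__":
    motif = "ACDEFGHIKLMNPQRSTVWYACDEF"  # made-up 25-residue test sequence
    target = "G" * 40 + motif + "G" * 40
    scores = fragment_scores(motif, target)
    print(len(scores), max(scores), count_high_scores(scores))
```

In this toy case only the window that contains the embedded copy of the chunk rises more than 3 standard deviations above the mean; genuinely related sequences would be expected to produce many such comparisons.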
Other analysis programs
AACOMP calculate the amino acid composition and molecular
weight of a sequence.
BESTSCOR calculate the best self-comparison score.
GREASE Kyte-Doolittle hydropathicity profile
TGREASE graphic plot of Kyte-Doolittle profile
FROMGB convert from GenBank LOCUS format (also used by the
IBI-Pustell programs) to Pearson/FASTA format.
GARNIER A secondary structure prediction program using the
method of Garnier, Osguthorpe, and Robson, J. Mol.
Biol., (1978) 120:97-120.
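Two of the simpler analyses listed above, the amino acid composition reported by AACOMP and the Kyte-Doolittle profile reported by GREASE, can be sketched in a few lines. This is an illustrative sketch rather than the programs' own code: the function names are hypothetical, the window length of 7 is an arbitrary choice (the window GREASE actually uses is not stated here), and the molecular weight calculation is left out.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Return a count of each residue in the sequence."""
    return Counter(seq.upper())


def hydropathy_profile(seq, window=7):
    """Average the Kyte-Doolittle values over a sliding window.

    Returns one value per window position; a sequence shorter than
    the window gives an empty list. Non-standard residue codes raise
    KeyError in this sketch.
    """
    values = [KYTE_DOOLITTLE[r] for r in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    peptide = "MKTLLILAVVAAALA"  # made-up example sequence
    print(composition(peptide).most_common(3))
    print([round(v, 2) for v in hydropathy_profile(peptide)])
```

Positive window averages indicate hydrophobic stretches and negative averages indicate hydrophilic ones, which is what the plotted TGREASE profile shows graphically.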
The BCR at the Medical University of South Carolina
URL: http://bcr.musc.edu/
Last modified: November 29, 2001- ESH
E-mail comments about this page to Starr.