Exons: Relevant bits of information for using the MUSC Shared Resource, the BioMolecular Computing Resource (BCR).

Exon 11 - The rapid searches

FASTA answers the question 'are there any sequences in the database which are similar to my query sequence?'

FASTA searches do not provide any formal statistical estimate of the similarities. One way to examine the validity of your FASTA searches is to use the GCG command SHUFFLE, which substitutes a random sequence with the same composition as your query sequence. Comparing the scores from the shuffled sequence with those from your real query shows how much of the similarity could be due to composition alone.

FASTA searches allow for gaps. FASTA searches are reportedly MORE SENSITIVE than comparable BLAST searches for nucleic acid sequences. FASTA will compare a protein sequence against a protein database OR compare a nucleotide sequence against a nucleotide database.

You have the option of running these searches interactively (boring) or in batch (the option for batch is "-batch").

You may:
1. Select all or a range of your sequence as the query.
2. Select all of the GenBank database or any subset (e.g. GB_PR:*).
3. Select a locally produced database.

ALL databases MUST be identified properly! The complete form is:

    database:*

that is, the database name, a colon, and a sequence name. The asterisk is a "wild card".

Examples:

    gb_pr:*       searches ALL primate sequences in GenBank
    gb_pr:hum*    searches ALL primate sequences in GenBank whose names begin with the letters "hum"
    ides:*        searches ALL of the LOCAL database called ides (see the DATASET command and Exon 12)

TFASTA answers the question 'what implied protein sequences in a nucleotide database match my protein query?'

TFASTA REQUIRES a protein query sequence. ALL six reading frames of the nucleotide database are searched. TFASTA has a LARGE overhead and should probably NOT be run interactively!

The BLAST searches

BLAST searches are presently about the fastest means to answer the question "what sequences are similar to my query sequence?"

From GCG: Type "blast" at the prompt. Enter a query sequence and select a database to search.
From these bits of information the GCG BLAST interface determines which program to run, bundles your query sequence, and sends a message to the NCBI BLAST server in Bethesda, MD.

    program    query type      database type
    BLASTP     protein         protein
    TBLASTN    protein         nucleic acid
    BLASTX     nucleic acid    protein
    BLASTN     nucleic acid    nucleic acid

GCG v8.0 allows BLAST searches of local databases. These databases are created with the program "TOBLAST". Notifying the system of the existence of your local database requires that you complete several tasks.

1. Create the database with the "toblast" program. TOBLAST can create a "blastable" database (in a binary form) from any list of sequences. Output files from WORDSEARCH and STRINGSEARCH are particularly useful input files for TOBLAST.

2. Fetch (a GCG command) the file called "blast.sdbs" by typing "fetch blast.sdbs". The system then writes the ASCII file blast.sdbs to your current directory (it is advisable to write the file to your personal root directory).

3. Edit the blast.sdbs file to point to your newly minted local blastable database(s) by adding a line which includes a path, a name, a type (e.g. protein or nucleic acid), and a comment. Example lines:

    /home/hazards/project1/ides n A nucleotide database
    /home/hazards/project2/ICE p A protein database

   You must add a name for each local database you create!

4. At the time you invoke the GCG BLAST interface you must add the option "-dat3=/path/blast.sdbs". This tells the system where to find the file that points it to ALL of your local databases.

After indicating your query sequence you will see a screen of databases preceded by numbers. At the bottom of the screen it will say "local databases", and below that will be the numbers of each of your databases. Selecting the number of one of your local databases directs the GCG interface to search that local database with your query sequence.
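The program table earlier in this section amounts to a lookup on the query type and the database type. The Python sketch below only illustrates that pairing; it is not GCG code, and the function name select_blast_program and the type labels "protein" and "nucleic acid" are assumptions made for this example.

```python
# Hypothetical sketch: the pairing of query type and database type
# determines the BLAST program, following the table in this section.
BLAST_PROGRAMS = {
    ("protein", "protein"): "BLASTP",
    ("protein", "nucleic acid"): "TBLASTN",
    ("nucleic acid", "protein"): "BLASTX",
    ("nucleic acid", "nucleic acid"): "BLASTN",
}


def select_blast_program(query_type, database_type):
    """Return the BLAST program name for a query/database type pair."""
    key = (query_type.strip().lower(), database_type.strip().lower())
    try:
        return BLAST_PROGRAMS[key]
    except KeyError:
        # Anything outside the four pairings in the table is rejected.
        raise ValueError("unknown query/database type pair: %r" % (key,))


if __name__ == "__main__":
    # A nucleic acid query against a protein database.
    print(select_blast_program("nucleic acid", "protein"))
```

For example, a nucleic acid query against a protein database maps to BLASTX, and a protein query against a nucleic acid database maps to TBLASTN.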
The UNIX BLAST programs

The UNIX BLAST programs perform all the same functions as the GCG interface EXCEPT that they may NOT search a local database. However, the BLAST3 search, which may NOT be selected from the GCG interface, can be invoked from the UNIX command line.

These programs require that any query sequences be in the so-called fasta format. Any GCG file may be converted to fasta format with the command "tofasta". Any sequence retrieved from Network Entrez may be saved (an Entrez option) in fasta format directly.

The UNIX BLAST command line is as follows:

    program database query-sequence(fasta format) options > outfilename &

e.g.

    blastx nr ides.tfa -x > ides.blastx &

This line says that the program blastx is being called; that the non-redundant PROTEIN database is selected (which database "nr" refers to is a consequence of the program selected; there is also a non-redundant NUCLEIC ACID database); that ides.tfa is a nucleic acid sequence in "fasta" format; and that -x selects a repeat filter. The ">" character is a UNIXism that sends the output to a file named ides.blastx, and the "&" character is another UNIXism that runs the program in the background and frees you to continue issuing commands in the current login shell while the search is in progress.

MAC BLAST

The HYPERBLAST and BLAST1.4 programs are BLAST packages which accept sequences, configure a BLAST query, send the query, and retrieve the results to your MAC. The programs are available via FTP (there may also be other sites) at:

    amber.mgh.harvard.edu

Use "blast" as the username and "internet" as the password.

There is a DOS BLAST program which accepts a query sequence, allows selection of the database, bundles the information, sends it to Bethesda, and retrieves the results.

There is a WINDOWS program which CONFIGURES a query only. Sending the file is your responsibility.
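The UNIX BLAST command line described in this section (program, database, query, options, ">" and output file, then "&") can be assembled as a string before it is handed to the shell. The Python sketch below shows only that assembly; the function name blast_command and its parameters are assumptions made for this example, and it does not run or contact any BLAST server.

```python
import shlex


def blast_command(program, database, query_file, options=(),
                  outfile=None, background=True):
    """Build a command line of the form:
    program database query options > outfile &
    """
    # Quote each word so that unusual file names survive the shell.
    parts = [program, database, query_file] + list(options)
    command = " ".join(shlex.quote(p) for p in parts)
    if outfile is not None:
        # ">" redirects the program's output to the named file.
        command += " > " + shlex.quote(outfile)
    if background:
        # "&" runs the search in the background.
        command += " &"
    return command


if __name__ == "__main__":
    # Reproduces the example line from this section.
    print(blast_command("blastx", "nr", "ides.tfa", ["-x"], "ides.blastx"))
```

With the arguments shown, the function returns the same line as the example in the text: blastx nr ides.tfa -x > ides.blastx &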