Exons: Relevant bits of information for using the MUSC Shared Resource, the BioMolecular Computing Resource (BCR).

EXON_12: Local Databases for FASTA and BLAST Searches

Building local databases.

At MUSC we receive the current bi-monthly release of Genbank and the quarterly release of PIR via CDROM; other files are retrieved via FTP. The contents are then re-formatted into the GCG searchable files. This process consists of a straightforward reformat (adding a pair of dots; see Exon_7). In addition, the GCG software builds a series of indexes to allow the STRINGSEARCH of the databases.

It may turn out that your research focuses on a class or a few classes of molecules/genes. Under these circumstances you might wish to consider building a local database which is specific for your molecule(s). For example, you might build up a database of cytochromes C, or immunoglobulins, etc. Such databases may be readily built from the GCG formatted databases by using one of two utilities and several search programs.

The first step is to create a file of sequence names. This may of course be done in a tedious manual fashion. Alternatively, you might wish to make use of the output of the GCG STRINGSEARCH program or the WORDSEARCH program. In the former, you are seeking to find particular character strings in the header fields of the sequence files of the archive. For example, you might do a search for "homeobox" genes. Any sequence with the string "homeobox" would cause a record of pointers to that sequence to be written to an output file. Alternatively, you might wish to find a series of sequences which show sequence similarity to a query sequence of yours. WORDSEARCH can find similar sequences and writes the pointers to such sequences to an output file. The output files may be easily edited to add or remove sequences.
For example, a PIR search may reveal several proteins of a class, but a nucleic acid search may reveal additional genes whose translation products have not yet been added to the PIR archive. By manually translating the gene and manually editing the PIR STRINGSEARCH output file you may create a more inclusive listing.

The next step is to use the output file or the edited output file as input for the utility program called "dataset". For instance, assume you have a file called 'homeobox_new.strings' which you wish to turn into a searchable database. Issue the command "dataset" and point the program to your file of sequence names, homeobox_new.strings. Like this:

    % dataset @homeobox_new.strings

It's that easy. DATASET generally takes only a few minutes to run, so updates are quite painless. The program does ALL the rest. You are prompted for a dataset name and that is as hard as you have to work. The program creates about 8 files, all with the same prefix (i.e. the name of your database).

To make use of your new database you must specify it rather than a subset of GCG, PIR, etc. Here's how to do that. Assume you called your database "hombox". When prompted for a set of sequences to search, reply with

    hombox:*

and the program (STRINGSEARCH, or more likely FASTA or WORDSEARCH) uses your new local database!

GCG v8.0 allows BLAST searches of local databases. These databases are created with the program "TOBLAST". Notifying the system of the existence of your local database requires that you complete several tasks.

1. Create the database with the "toblast" program. TOBLAST can create a "blastable" database (in a binary form) from any list of sequences. Output files from WORDSEARCH and STRINGSEARCH are particularly useful input files for TOBLAST.

2. Fetch (a GCG command) the file called "blast.sdbs". Type "fetch blast.sdbs". The system then writes the ASCII file blast.sdbs to your current directory (it is advisable to write the file to your personal root directory).

3. Edit the blast.sdbs file to point to your newly minted local blastable database(s) by adding a line which includes a path, a name, a type (e.g. protein or nucleic acid), and a comment. Example lines:

    /home/hazards/project1/ides n A nucleotide database
    /home/hazards/project2/ICE p A protein database

   You must add a line for each local database you create!

4. At the time you invoke the GCG BLAST interface you must add the option "-dat3=/path/blast.sdbs". This tells the system where to find the file that points it to ALL of your local databases.

After indicating your query sequence you will see a screen of databases preceded by numbers. At the bottom of the screen it will say "local databases" and below that will be the numbers of each of your databases. Selecting the number of one of your local databases will direct the GCG interface to use your query sequence and search the indicated local database.
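
If you maintain several local databases, the edit in step 3 above can be scripted rather than done by hand. What follows is only a minimal Python sketch and is not part of GCG. It assumes that each blast.sdbs entry follows the layout of the example lines (a path ending in the database name, a one-letter type of n or p, then a free-text comment, separated by spaces). The function names format_entry and add_entry are hypothetical.

```python
from pathlib import Path

# Assumed from the example lines: n = nucleotide database, p = protein database.
VALID_TYPES = {"n", "p"}


def format_entry(db_path: str, db_type: str, comment: str) -> str:
    """Build one entry in the layout of the example lines: path, type, comment."""
    if db_type not in VALID_TYPES:
        raise ValueError(f"type must be 'n' or 'p', got {db_type!r}")
    if not db_path or any(ch.isspace() for ch in db_path):
        raise ValueError("database path must be non-empty and contain no whitespace")
    return f"{db_path} {db_type} {comment.strip()}"


def add_entry(sdbs_file: str, db_path: str, db_type: str, comment: str) -> bool:
    """Append an entry to sdbs_file unless db_path is already listed there.

    Returns True if a line was appended, False if the path was already present.
    """
    entry = format_entry(db_path, db_type, comment)
    target = Path(sdbs_file)
    text = target.read_text() if target.exists() else ""

    # Skip the append if any existing line already starts with this path.
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == db_path:
            return False

    # Make sure the new entry starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with target.open("a") as handle:
        handle.write(separator + entry + "\n")
    return True
```

For instance, add_entry("blast.sdbs", "/home/hazards/project2/ICE", "p", "A protein database") would append the second example line to a copy of blast.sdbs in the current directory, and calling it a second time with the same path would leave the file unchanged.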