Exons: Relevant bits of information for using the MUSC Shared Resource, the BioMolecular Computing Resource (BCR).

EXON 4: Finding Sequence Information

Each of the programs listed uses a different search algorithm, and in some circumstances the user is well advised to run all of the searches in order to be assured of a thorough search.

I. GCG.

GCG is started by typing "setgcg" followed by return, then "gcg" followed by return. All of the GCG programs may be run in command line mode as indicated below, OR as interactively set up jobs by typing only the program name, OR from the graphical user interface (GUI).

-stringsearch: useful for locating sequences based upon a character string such as "myoglobin" or "alzheimer" or "phospholipase", accession numbers like "m94967", or locus names such as "rahsb".

    *a menu: header information only is searched.
    *b menu: full text fields are searched. This can take up to five times longer than the header-only search. It is helpful to run such searches in "batch" (ie with the option "-batch" entered on the command line).

    example ->
        erg25% stringsearch -batch -menu=b -infile=gb_pr:* -string=alzheimer \
               -outfile=alzheimer.strings

    explanation -> At the UNIX prompt "erg25%" the user has told the system to run "stringsearch" in batch mode ("-batch"), searching the entire text ("-menu=b") of every file in the primate division of GenBank ("gb_pr:*") for the character string "alzheimer", and to direct the output to a file called "alzheimer.strings". The "\" informs UNIX that input will continue on the next line.

-names: useful for finding files with a known root name, eg "humhk*".

    example ->
        erg25% names -infile=gb_pr:humhk* -outfile=humhk.names

    explanation -> At the UNIX prompt "erg25%" the user has told the system to run "names". Names will search through the primate division of GenBank ("gb_pr:humhk*") for files whose names begin with "humhk" and write the result to "humhk.names" ("-outfile=humhk.names").
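
The stringsearch example can be repeated for more than one GenBank division. What follows is only a sketch, assuming a Bourne-style shell (the "erg25%" prompt suggests csh, where the syntax differs). It prints the command lines rather than running them, and it uses only the options shown in the example. The function name build_stringsearch and the output file naming (string.division.strings) are hypothetical choices made for illustration. The division gb_om (other mammals) is taken from the fasta example in this document.

```shell
#!/bin/sh
# Sketch: print (do not run) one stringsearch command line per division.
# Only options that appear in the stringsearch example are used.

# build_stringsearch DIVISION STRING
# Prints a single command line; nothing is executed.
build_stringsearch() {
    division=$1
    string=$2
    # Hypothetical naming: one output file per division, so that a
    # second search does not overwrite the first.
    printf 'stringsearch -batch -menu=b -infile=%s:* -string=%s -outfile=%s.%s.strings\n' \
        "$division" "$string" "$string" "$division"
}

# gb_pr (primates) and gb_om (other mammals) are the divisions
# named in this document.
for division in gb_pr gb_om; do
    build_stringsearch "$division" alzheimer
done
```

Printing the commands first lets the user review them before typing or pasting them at the UNIX prompt.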

-wordsearch:

    example ->
        erg25% wordsearch -batch -since=6.93 -infile=favorite.sequence -list=100 -out=favorite.word

    explanation -> At the UNIX prompt "erg25%" the user has told the system to run "wordsearch". Wordsearch uses "favorite.sequence" as a query sequence and searches for similar sequences only from among those sequences published since June of 1993 ("-since=6.93"). The number of sequences retained in the list is limited to 100 ("-list=100") and the output is written to a file called "favorite.word" ("-out=favorite.word").

-fasta:

    example ->
        erg25% fasta -check -batch -infile1=favorite.seq -infile2=gb_om:* \
               -outfile=favorite.fasta

    explanation -> At the UNIX prompt "erg25%" the user has told the system to run "fasta". Fasta will run in batch mode ("-batch"), allowing the user to pursue other activities while the program runs. The user cannot remember all of the command line options, so a request is made to review the options on screen ("-check") prior to running. The user does remember how to indicate the query sequence ("-infile1=favorite.seq"), the database to be searched (other mammals in this case, ie "-infile2=gb_om:*"), and the output file ("-outfile=favorite.fasta").

II. ATLAS.

ATLAS is started by typing "setatlas" followed by the return key, then "atlas" followed by the return key.

-bases: defines the databases to be searched. This should be the first command in an ATLAS session.

    bases         lists the available databases
    bases p*      selects the protein databases
    bases nrl*    selects the NRL3d database
    bases gb*     selects all GenBank sections

-find: similar to the GCG STRINGSEARCH.
    "find copper-transporting ATPases" locates copper ATPases.

-author: locates sequences associated with the query author name.
    "author Gallo" locates sequences associated with "Gallo".

-species: locates sequences associated with the query species.
    "species Amphiuma" locates sequences of the species "Amphiuma".

-accession: locates the sequence associated with the query accession number.
    "accession S35704" retrieves the sequence.

-list: scrolls the current list.
    "list /output=cu_atpase" stores the current list of sequences in cu_atpase.

Each new search (author, find, species, accession etc) overwrites the current list. You have to issue a "list /output=filename" command to create a permanent record.

III. NETWORK ENTREZ.

There are Macintosh, Windows and UNIX versions of this software. A resource file for the UNIX version is available from Dr. Starr Hazard (2-0715; hazards@musc.edu) upon request. With this file, no configuration is needed. Note that there is also a Mosaic site for Entrez, which supports forms.

The Medline, Protein and Nucleotide databases may be selected. A different list of search terms may be selected for each database.

To use the UNIX Entrez you must:

    1) have X server software installed on your personal computer
    OR
    2) be sitting at one of the BCR workstations in the library.

At the UNIX prompt type "mosaic". If you are running X server software which is communicating with ERG.MUSC.EDU, then at the UNIX prompt type "Nentrez", eg:

    erg25% Nentrez

IV. NCBI E-MAIL RETRIEVAL SERVICE.

Requests are set up as a message which must be e-mailed to: retrieve@ncbi.nlm.nih.gov

To learn the request format:

    1) copy the help file from the BCR section of the MUSC GOPHER
    or
    2) send an e-mail message with one line saying "help" to retrieve@ncbi.nlm.nih.gov
    OR
    use the following guidelines.

With your mail program create a message with the following lines:

    DATALIB GenBank
    MAXDOCS 30
    MAXLINES 10000
    BEGIN
    copper AND transporting AND ATPase

The above message directs RETRIEVE to search GenBank and to retrieve a maximum of 30 documents OR a maximum of 10000 lines of output. BEGIN has no arguments and must appear on a line by itself. Sequences combining copper, transporting and ATPase are sought. The qualifier "OR" could also be used where appropriate. Send the message to "retrieve@ncbi.nlm.nih.gov".

V. GOPHER.

A UNIX resource file is now available for all BCR users. At the UNIX prompt type "gopher", eg:

    erg25% gopher

The user has then accessed a fully functional Gopher client. By typing "v" (type "?" for help) the user can view a pre-selected set of BCR bookmarks. It is now possible to select from among several remote archives for the retrieval of protein sequences and nucleic acid sequences. Bookmark files for WSGopher and TurboGopher are also available by request.

VI. MOSAIC/NETSCAPE.

A Mosaic resource file is now available for all UNIX users who have indicated that they are potential GCG users.

To use the UNIX Mosaic you must:

    1) have X server software installed on your personal computer
    OR
    2) be sitting at one of the BCR workstations in the library.

At the UNIX prompt type "mosaic" or "netscape", eg:

    erg25% mosaic &
    erg25% netscape &

The user may select several remote Mosaic or Gopher archives for the retrieval of protein sequences and nucleic acid sequences. BLAST searches, EMBL searches, sequence submission to GenBank, and a growing host of other services are available via Mosaic. Hotlists (Mosaic bookmarks) are available for the Macintosh and Windows Mosaic programs as well.
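
The X server requirement for Mosaic and Netscape can be checked before a client is started. What follows is only a sketch, assuming a Bourne-style shell (the "erg25%" prompt suggests csh, where the syntax differs) and assuming that the standard X11 DISPLAY environment variable is how the client locates the X server. The function names have_display and start_x_client are hypothetical. The sketch prints the command it would run rather than running it.

```shell
#!/bin/sh
# Sketch: refuse to start an X client when no X display is configured.

# have_display succeeds only when DISPLAY is set and non-empty.
have_display() {
    [ -n "${DISPLAY:-}" ]
}

# start_x_client PROGRAM
# Prints the background command line for PROGRAM, or an error
# message on standard error when no display is available.
start_x_client() {
    if have_display; then
        printf '%s &\n' "$1"
    else
        printf 'no X display set; cannot start %s\n' "$1" >&2
        return 1
    fi
}

# Dry run for the first program named in this section.
start_x_client mosaic || true
```

A check like this only confirms that a display has been named; it does not confirm that the X server on the personal computer is actually accepting connections.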