Exons: Relevant bits of information for using the MUSC Shared Resource, the BioMolecular Computing Resource (BCR).

EXON_21

Running Des Higgins' CLUSTALW multiple sequence alignment program.

The CLUSTALW program currently available is version 1.73. CLUSTALW can read GCG multiple sequence files (the output files produced by the GCG PILEUP program, e.g. xxx.msf), CLUSTAL format aligned sequences (e.g. yyy.aln) and Pearson 'fasta' format non-aligned files, among other formats.

A useful procedure might be:

1) Construct a file listing the names of the files you wish to align. This can be done with FASTA, STRINGSEARCH, LOOKUP, WORDSEARCH, or manually.

2) Convert all sequences to 'fasta' format using the GCG program TOFASTA. TOFASTA understands the GCG "@" symbol, so typing:

      tofasta @the_new.list

   will produce an output file of concatenated, unaligned sequences which CLUSTALW can read.

3) Alternatively, retrieve files from internet sources with the sequences already in "fasta" format.

4) Type "clustalw" and the program will start and display an initial menu allowing you to input sequences, proceed to multiple alignments, produce structurally constrained alignments or produce phylogenetic trees. At this point simply type '1' and enter the name of your GCG multiple alignment, the file of fasta format sequences or the CLUSTAL format file.

5) There is a concise on-line help menu.

6) The output options include one very useful option, the ability to create output alignments in several formats. The most useful are: GCG MSF format, PHYLIP format (see exon 22) and of course the default CLUSTALW "aln" format.

7) It's possible to run CLUSTALW in batch mode, but not all of the options of the interactive mode are accessible. Here is an example of a shell file for running CLUSTALW:

      /usr/common/bcr/clustalw \
      /infile=xxxx.msf/align/outorder=aligned \
      /kimura/outfile=xxxx.phylip/output=PHYLIP

   The first line starts the program. The "\" character is a UNIX line continuation character.
   The second line defines the input file, tells the program to perform an alignment and to write the sequences out in the order of their alignment. The third line selects the Kimura correction for multiple mutations per site, names the output file and requests the PHYLIP output format.

8) If the name of the above shell file is "shell_clust", then to run the program in batch you would type "at 9:35" and hit return, then type "sh shell_clust" and hit return, and finally type control-d.

9) The happy result should be that the alignment starts at 9:35 and produces a PHYLIP format multiple alignment.

10) Most of the time you will find that running clustalw is quite efficiently accomplished via the interactive mode.
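
For anyone who runs the batch procedure of steps 7 and 8 more than once, the shell file can be generated instead of typed by hand. The following is only a sketch: the program path and the option string are copied as written in the example above, while the function name write_clustal_job and its two arguments are hypothetical. It has not been checked whether a particular build of CLUSTALW accepts "/" before each option or expects a different option character.

```shell
#!/bin/sh
# Sketch: write the batch shell file from step 7 for a given file stem.
# The path and options are taken from the example; the function name
# and its arguments are illustrative only.

CLUSTALW=/usr/common/bcr/clustalw

# write_clustal_job STEM JOBFILE
# Creates JOBFILE, which aligns STEM.msf and writes STEM.phylip.
write_clustal_job() {
    stem=$1
    jobfile=$2
    # Inside an unquoted here-document, "\\" yields a single "\",
    # the line continuation character the job file needs.
    cat > "$jobfile" <<EOF
$CLUSTALW \\
/infile=$stem.msf/align/outorder=aligned \\
/kimura/outfile=$stem.phylip/output=PHYLIP
EOF
}

# Recreate the shell file used in the example and show its contents.
write_clustal_job xxxx shell_clust
cat shell_clust
```

The job can then be queued without the interactive prompt, since "at" reads the commands to run from standard input; typing: echo "sh shell_clust" | at 9:35 is equivalent to the keystrokes in step 8.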