CONSED 12.0 DOCUMENTATION

CONTENTS:

    WHAT IS NEW IN CONSED 12.0
    WHAT IS AUTOFINISH
    QUICK TOUR OF CONSED
    USING AUTOFINISH
    USING AUTOPCRAMPLIFY
    ADVANCED PHRAP/CONSED USAGE
    INSTALLING CONSED
    NOTE TO SGI USERS
    NOTE TO SOLARIS USERS
    FOR PROGRAMMERS AND FELLOW TRAVELLERS ONLY
    MONITORS AND MICE FOR CONSED
    HUGE ASSEMBLIES
    AUTOFINISH AND PRIMER-PICKING PARAMETERS
    NEW ACE FILE FORMAT
    WHAT THE COLORS MEAN

----------------------------------------------------------------------------

WHAT IS NEW IN CONSED 12.0

This section is mainly intended for advanced Consed users. Novice users should consult the Quick Tour (below).

Every Consed/Autofinish site should immediately discontinue using older versions of Consed/Autofinish that contain bugs fixed in Consed 12.0. Older versions of Consed/Autofinish are no longer supported.

If you have customized the scripts, you can continue using your old scripts, except for transferConsensusTags.perl--you must use version 011130 of that (supplied with Consed).

Linux Improvements:

* Autofinish on Linux can now handle larger amounts of memory (enough for most bacterial genomes).
* Gnome users: the blank window problem is now fixed.

Each of these new features is explained in README.txt supplied with Consed. These are just the highlights. Consult README.txt for more new features.

AutoPCRAmplify

This is a new automated method of picking PCR primers to amplify particular regions. It has been extensively used already.

New Autofinish Features:

* Autofinish is now easier to use because there is a point-and-click method of changing Autofinish parameters.
* You can mark the end of a BAC and Autofinish will not try to extend past the end of the BAC.
* Smarter about which subclones to eliminate from using for read templates.

New Consed Features:

* Restriction Digest Display: graphically compares a real digest to the digest predicted from the sequence.
    * Highlights mismatching fragments
    * The predicted fragments can be sorted by size (like a real gel), or sorted by position in the contig.
    * Also can print out a text version
    * Enables you to easily see the sequence of a particular location
    * Allows zooming in or out
* Bird's eye view of the assembly, showing pairs of forward and reverse reads that are consistent and inconsistent with the assembly.
    * Has various filters to eliminate clutter and pinpoint misassemblies
    * Indicates the confidence that the assembly is correct at a particular location
    * Allows zooming in or out
    * Shows contig order and orientation
* Consed now recalculates the quality values when modifying the assembly (adding new reads, joining, tearing, pulling a read out of a contig)
* You can now pull *any* read out of a contig, even if the read protrudes out from the ends.
* Automatically scales traces so you don't have to keep changing the amplification as you move from one part of a read towards another, or as you compare traces from different kinds of sequencing machines
* Consed can now handle traces that are very long (up to about 200Mb)
* When joining 2 contigs, there is a visual indication of which read came from which old contig
* Consed now allows as many tag types as you like
* If you wonder why Consed's primer picker didn't choose a particular primer, you can now ask it for the reason.
* Large speed up in Consed's primer picking in large assemblies
* Many user-friendly features
* Many bugs fixed

----------------------------------------------------------------------------

WHAT IS AUTOFINISH?

Autofinish automatically chooses reads for finishing. Autofinish is sometimes able to completely finish a project with no human decisions. In other cases Autofinish mostly finishes a project, and a human just needs to do the final difficult problems since all the routine problems have already been completed by Autofinish.
Thus a human finisher is able to complete far more projects in the same length of time.

Autofinish adapts to the finishing strategy of your lab. It can be used to finish with just universal primer reads, just oligo walks, just minilibraries, or a combination of these. It can be used to finish either genomic or cDNA.

Autofinish will do the following:

    -close gaps
    -improve sequence quality
    -determine the relative orientation of contigs
    -ensure that, at each consensus base, at least 2 reads from different templates are aligned

(You can configure Autofinish to do any combination of these tasks.)

Autofinish will suggest the following types of experiments:

    -universal primer reads (forward or reverse)
    -custom primer reads with subclone templates
    -custom primer reads with whole clone templates
    -minilibraries (transposon or shatter) from subclone templates

(You can configure Autofinish to suggest any combination of these experiments.)

----------------------------------------------------------------------------

QUICK TOUR OF CONSED

Release 12.0

Consed is a program for viewing and editing assemblies assembled with the phrap assembly program.

If you are already an advanced Consed user, you should read through this and do any of the exercises on features that you are unfamiliar with. I frequently run across people who are doing something in Consed a hard way month after month, and request a new feature to make things easier, when that new feature is already in Consed.

If you have never used Consed before, following this Quick Tour will take you less than 4 hours. However, it will save you approximately 2 days of agony. If you have 2 extra days to spare, and prefer to waste them in agony, then do not do this Quick Tour and instead immediately skip down to 'INSTALLING CONSED' below.
If you do the Quick Tour, start your system administrator installing Consed (see INSTALLING CONSED, about 35 pages below) because you will need to have completed that for some of the more advanced sections of the Quick Tour.

When you do the Quick Tour, I encourage you to be free about changing the data set. If you really mess things up (such as changing all of a read's bases to N's), no problem--just delete the data set and start again with a fresh copy.

1) After downloading the distribution with netscape (see www.phrap.org and click on 'Consed'), copy the distribution to a unix computer (if it is not already on one). Copy the distribution to a directory where you can put the sample datasets to be used by anyone learning Consed, and then unpack the files by typing the appropriate line below (which one depends on what you named the file downloaded by netscape):

    zcat consed_solaris.tar.Z | tar -xvf -
    zcat consed_alpha.tar.Z | tar -xvf -
    zcat consed_hp.tar.Z | tar -xvf -
    zcat consed_sgi.tar.Z | tar -xvf -
    zcat consed_linux.tar.Z | tar -xvf -

Note: You must run tar on a UNIX computer--not on an NT computer, due to a difference in the handling of breaks between lines.

2) The only unix commands you must learn are the following 3:

    pwd -- this tells you where you are
    ls  -- this tells you what files are there (Same as DIR in DOS)
    cd  -- this moves you (Same as CD in DOS)

That's it--use them a lot!

USING CONSED GRAPHICALLY

3) Type the following:

    cd standard/edit_dir

4) Start Consed by typing the appropriate command below:

    ../../consed_solaris
    ../../consed_alpha
    ../../consed_hp
    ../../consed_sgi
    ../../consed_linux

(Don't worry about a message like: Warning: Cannot convert string "helvetica" to type FontStruct )

Two windows will appear. One of these will have the list of .ace files and say 'select assembly file to open' and 'standard.fasta.screen.ace.1'. Double click on "standard.fasta.screen.ace.1". The first window goes away.
You will now see a list of one contig and a list of reads. This is the 'Main Consed Window'.

Double click on 'Contig1'. The 'Aligned Reads Window' will appear.

Try scrolling back and forth. Try scrolling by dragging the thumb of the scrollbar. Also try scrolling by clicking on the 4 << < > >> buttons for scrolling by small amounts. For scrolling by tiny amounts, click on the arrows at either end of the scrollbar. For scrolling by huge amounts, use the middle mouse button and just click on some location on the scrollbar. For scrolling to the beginning or end of the contig, use the <<< or >>> buttons.

(Question: why can't you just move the scrollbar to the extreme right in order to go to the end of the contig? Answer: in typical assemblies, there are reads that protrude beyond the beginning of the contig and reads that protrude beyond the end of the contig. Moving the scrollbar to the extreme right will scroll the contig to the end of the rightmost read--typically far to the right of the end of the contig. Thus you should get in the habit of using the <<< and >>> buttons.)

Notice the colors. Scroll to position 937 and notice the read 'a'. The red bases are the ones that disagree with the consensus.

Notice the different shades of grey background (around the bases). They have the following meanings, but first, you need to understand the meaning of the quality values:

    A quality value of 10 means 1 error in ten to the 1.0 power
    A quality value of 20 means 1 error in ten to the 2.0 power
    A quality value of 30 means 1 error in ten to the 3.0 power
    A quality value of 40 means 1 error in ten to the 4.0 power

and for quality values in between:

    A quality value of 25 means 1 error in ten to the 2.5 power

Get the idea?

(These have actually been empirically verified--if you are interested in the gory details, read the phred papers:

Ewing B, Hillier L, Wendl M, Green P: Basecalling of automated sequencer traces using phred. I. Accuracy assessment.
Genome Research 8, 175-185 (1998).

Ewing B, Green P: Basecalling of automated sequencer traces using phred. II. Error probabilities. Genome Research 8, 186-194 (1998).

In that same copy of the journal is a paper about Consed, as well.)

Also notice the upper and lowercase. This is just a cruder indication of the quality of the bases.

5) To see the quality value of a particular base, point at it and click with the left mouse button. You will see the quality displayed in the Info Box at the bottom of the Aligned Reads Window.

These quality values are shown in grey scales:

    Quality 0 through 4 is given by dark grey
    Quality 5 through 9 is given by a shade lighter
    Quality 10 through 14 is given by a shade still lighter
    . . .
    Quality of 40 through 97 is given by white (the brightest shade)

A quality value of 99 is reserved for bases that have been edited and the user is absolutely sure of the base ('high quality edited'). A quality value of 98 is reserved for bases that have been edited and the user is not sure of the base ('low quality edit').

The ends of the reads show bases that are grey and have a black background. These are the low quality ends of the reads or the unaligned ends of reads, as determined by phrap.

6) Click on a base on a read. Then hold down the control key and type 'a'. You will move to the beginning of the read. Hold down the control key and type 'e'. You will move to the end of the read. (Emacs users will recognize these commands.)

7) Scroll so that location 490 is about in the middle of the Aligned Reads Window. Push the left mouse button down on the menu item 'Dim'. A list of choices will appear. Drag the cursor down to 'Dim Nothing' and release. Now look what happened to the color of the bases. The ends of the reads that used to have a black background now appear red with a grey background. You are seeing the clipped-off bases with all the same information as any other base.
Since there is a huge number of red (discrepant) bases, the screen becomes distracting and busy. Thus by default the low quality clipped-off bases are shown with a black background and a grey foreground so they don't distract you.

Notice there is a distinction here between 'low quality ends of reads' and 'unaligned ends of reads'. Unaligned ends of reads can be low quality as well, or they can be high quality, as in the case of chimeric reads.

Point with the mouse to a read name and hold down the right mouse button. You will notice there is a line that says "high quality from nnn to nnn; aligned from nnn to nnn; chem: prim". This is giving the same information in number form. Check that the numbers agree with the dimming.

You can play with the dimming options a bit. Then return it to 'Dim Low Quality' for the rest of this tour.

TRACES AND EDITING

8) Point with the mouse at a base of one of the reads and click with the middle mouse button. (If you have a 2 button mouse, see MONITORS AND MICE FOR CONSED below.) The Trace Window showing the traces for that stretch of read should pop up.

There are 2 rows of numbers:

    'con' are the consensus positions
    'rd' are the read positions

There are 3 rows of bases in the Trace Window:

    'con' is the consensus
    'edt' is where you can edit the base calls of the read
    'phd' is the original phred base calls

Notice that a red rectangle blinks (the 'cursor') in the corresponding positions of the Aligned Reads Window and the Trace Window.

9) Try editing in the Trace Window. You can click the left mouse button on a base in the 'edt' line to set the cursor (a blinking red rectangle). You can directly overstrike a base by typing a letter. Try this. Try undoing it (by clicking on 'undo'). If you want to undo more than one edit, you will have to go back to the Main Consed Window and click on the button labeled 'Undo Edit...'--you will learn that later.
You can overstrike with the following characters: acgt (bases), * (a pad, in effect deleting the base), and mrwsykvhdb (IUB ambiguity codes). You can move left and right with the arrow keys.

We believe that the user should change a base call only while examining the traces. That is why editing is done here--not in the Aligned Reads Window.

10) You can insert a column of pads by pushing the space bar. Try this. (You may need to click on a base on the 'edt' line first.)

(For those of you new to editing assemblies, a 'pad', which in Consed and phrap is represented by the '*' character, is used to align two or more sequences such as these:

    gttgacagtaatcta
    gttgacataatcta

in which one sequence has an inserted or deleted base with respect to the other. By inserting the pad character, it is possible to get a good alignment:

    gttgacagtaatcta
    gttgaca*taatcta

This is the purpose of the pad character--it is just a placeholder.)

You can then overstrike a pad with a base. In this way you can insert a base, and still preserve the alignment.

11) Try highlighting a stretch of a read on the edt line by holding down the middle mouse button and dragging the cursor over some bases. They will turn yellow as you drag. Then release the mouse button. A window will pop up giving you some choices of what to do with those (yellow) bases:

Make High Quality--makes the highlighted bases edited high quality (99). This tells phrap (when it reassembles) that you are sure of the sequence here.

Change Consensus--makes the highlighted bases edited high quality and changes the consensus to agree with that stretch of the read. This is a directive to phrap (upon reassembly) to use that stretch of that read to be the consensus.

Make Low Quality--makes the highlighted bases edited low quality. This tells phrap (when it reassembles) that you are not sure of the bases here and phrap can go ahead and make a join even if the bases in this region don't match perfectly.
Make Low Quality to Left End--same as above, but all the way to the left end of the read.

Make Low Quality to Right End--same as above, but all the way to the right end of the read.

Change to n's--Change the highlighted bases to n's which means they are unknown bases. This tells phrap (when it reassembles) to not make any join based on these bases. It is useful when you believe the bases may be in the chimeric portion of a read.

Change to n's to left--same as above but to left end.

Change to n's to right--same as above but to right end.

Change to x's to left--Change the highlighted bases to x's which means they are vector. This tells phrap to ignore these bases for the purpose of determining overlap.

Change to x's to right--same as above but to right end.

Add Tag--allows user to add any tag to a stretch of read bases.

Dismiss--you decided you don't really want to do anything with this stretch of bases.

This popup is made so that nothing else works until you choose something. Try each of these choices, except for tags, which you'll try below.

'Change Consensus' has an additional function--if a read extends out on the right beyond the end of the consensus, you can extend the consensus by using this function. You might want to do this, for example, if crossmatch did not correctly find the cloning site and thus clipped too much. You can add these bases to the consensus by using 'Change Consensus'. Typically, the quality of these bases in the read and in the consensus is 99. That is so that next time phrap runs, it will correctly extend the consensus. However, if you aren't going to reassemble, you might want to just leave the quality values the way phred originally called them. You can do this by using a Consed parameter (consed.extendConsensusWithHighQuality), which you will learn more about later (see CONSED CUSTOMIZATION).

12) To delete a base, overstrike it with a '*' character. (Phrap ignores '*', so this is the same as deleting the character.)
If you overstrike all bases in a column with * characters so the entire column consists of *'s (including the consensus base), there is no way to remove the column. This is OK since when you export the consensus (try the exercise on EXPORTING THE CONSENSUS), the *'s are not exported. While you are editing in Consed, we believe there should be a visual indication that a base was deleted.

SCALING THE TRACES

13) Grab the thumb of the line that is labelled "V" (for Vertical magnification) and move it back and forth, noticing the effect on the traces. This is useful if the traces are too small or too large. There are several other methods of scaling the traces you will learn later.

SAVING THE ASSEMBLY

14) To save the assembly, pull down the 'File' menu on the Aligned Reads Window, and release on 'Save assembly'. A box will pop up with a suggested name. I suggest you always use the one it suggests. The idea is that the ace files:

    (project).fasta.screen.ace.1
    (project).fasta.screen.ace.2
    (project).fasta.screen.ace.3
    (project).fasta.screen.ace.4
    (project).fasta.screen.ace.5

are in order of how old they are. If you feel you are taking up too much disk space, then start deleting the ace files starting at the oldest. I do not recommend that you overwrite existing ace files. The version numbers just keep growing, and that is not a problem.

EXPORTING THE CONSENSUS

15) Exporting the consensus. Bring the Aligned Reads Window into view again. Hold down the left mouse button on the 'File' menu and release the button on 'Export consensus sequence'. Notice that the consensus will be stored (in this case) in a file called 'Contig1.fasta'. Click 'OK'. There is now a file in your edit_dir directory called 'Contig1.fasta' that has the consensus sequence in it. If you want to see the file, bring up another Xterm (if you are UNIX literate), and type:

    cd standard/edit_dir
    more Contig1.fasta

16) Fancier exporting the consensus. Bring the Aligned Reads Window into view again.
Hold down the left mouse button on the 'File' menu but this time release on 'Export consensus sequence (with options)...'. Just export a little snip of the consensus, from 400 to 410. (You will notice this contains a pad * character.) Ask for both the bases file and the quality file. Click 'OK'. Consed will want to call this file 'Contig1.fasta' again. You can overwrite the existing file. Look in your other Xterm at these files:

    more Contig1.fasta
    more Contig1.fasta.qual

The one file contains the bases (but no * pads) and the other contains the corresponding qualities of those bases.

17) Exporting the consensus of all contigs at once: Go to the Main Consed Window. Point to 'File', hold down the left mouse button, and release on 'Write all contigs to fasta file'. You then can choose a filename for all contigs to be written to. (In this project there is only 1 contig, so there is no difference between this option and just exporting a contig at a time.)

18) (For this step, first click on the 'Dim' menu and release on 'Dim Nothing'.) Point to the 'Color' menu, hold down the left mouse button and release on 'Color Means Edited and Tags'. Notice that the bases that you have edited (make sure you have edited some bases) will stand out in either white or grey (depending on whether the base was made high quality or low quality). Observe this both in the Trace Window and the Aligned Reads Window. This colormode is useful if you are interested in easily spotting which bases are edited.

Return to the 'Color Means Quality and Tags' colormode by the following: point to the 'Color' menu, hold down the left mouse button and release on 'Color Means Quality and Tags'.

FIND MAIN WINDOW

19) On the Aligned Reads Window, click on 'Find Main Win'. This will cause the Main Consed Window to pop up in the event you have buried it under other windows or iconified it. (This may not work with some settings of your X emulator.
In that case you will have to find and click on the Main Window to bring it up.)

MULTIPLE UNDO EDIT

20) Now that the Main Consed Window is visible, click the 'Undo Edit...' button. There will be a popup indicating the most recent edit. (If it says "no edits so far", then bring up a trace and make several edits. Then click on 'Undo Edit...' again.) Click 'undo'. Then you will see the edit that was done before that. Click 'undo'. You can continue undoing if you like. You now know how to undo more than one edit.

You cannot choose which edits to undo and which to not undo--edits can only be undone in precisely reverse order from the order you made them. Once you save the assembly, you cannot undo prior edits.

SCROLLING TRACES AND ALIGNED READS TOGETHER

21) In the Aligned Reads Window, scroll along the contig to a different point. Click the left mouse button on a read whose trace is already up. Notice that the existing trace instantly scrolls to the corresponding location.

Now go to the Trace Window and scroll the traces to a new location. Click on the edt line with the left mouse button. You will notice that the Aligned Reads Window will instantly scroll to the corresponding location. Thus you can keep the Aligned Reads Window and the traces scrolled to the same location.

EXAMINING ALL TRACES

22) Go to a region where there are lots of reads, say base 1660. Push down the right mouse button and release on 'Display traces for all reads'. You will see all traces displayed in a scrolling window. You can drag the scrollbar on the right down and up to see all the traces. This feature is particularly useful for polymorphism/mutation detection work.

This feature was added to work in cooperation with polyphred. To see it in action, exit Consed.

EXITING CONSED

23) On the Aligned Reads Window, point to the 'File' menu, hold down the left button and release on 'Quit Consed'. If it asks you some questions, answer 'Quit Without Saving and Discard .wrk File'.
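The undo behaviour described in step 20 (MULTIPLE UNDO EDIT) amounts to a last-in, first-out stack that is emptied when the assembly is saved. The following Python sketch only illustrates that idea; the class and method names (EditHistory, record, undo, save) are hypothetical and are not taken from Consed itself.

```python
class EditHistory:
    """Illustrative LIFO edit history (hypothetical, not Consed's code)."""

    def __init__(self):
        self._edits = []  # most recent edit is at the end of the list

    def record(self, description):
        """Remember an edit so that it can be undone later."""
        self._edits.append(description)

    def undo(self):
        """Undo the most recent edit; return None if there is nothing to undo."""
        if not self._edits:
            return None  # corresponds to "no edits so far"
        return self._edits.pop()

    def save(self):
        """Saving the assembly makes all prior edits permanent."""
        self._edits.clear()


history = EditHistory()
history.record("overstrike base 101 with g")
history.record("insert pad column at 205")
print(history.undo())  # the pad column edit comes back first
print(history.undo())  # then the overstrike
```

Edits come back in exactly the reverse of the order in which they were made, and after save() there is nothing left to undo.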
ASSEMBLY VIEW

24) Consed can show you a bird's eye view of the assembly using forward/reverse pair information. We have a test database which shows its features. Type:

    cd assembly_view/edit_dir
        (You might need to type "cd ../.." first depending on where you are.)
    ls
    ../../consed_(computer_type)

where (computer_type) is one of solaris, hp, alpha, sgi, or linux.

Double click on "assembly_view.fasta.screen.ace.1"

In the Main Consed Window, click on the button "Assembly View" which is near the upper left corner of the window.

You should see 3 grey bars with pink labels "2", "3", and "1" (the "1" is obscured by lots of red and purple lines between the "3" bar and the "1" bar), and some blue and purple triangles between the "2" bar and the "3" bar. The bars are the contigs: Pink "1" means Contig1, pink "2" means Contig2, etc.

Notice the scale on the contigs. This gives the contig position.

The green graph is highest around 7000 to 10000 of Contig2 and around 14000 of Contig3. The green graph indicates, for each base, the depth of subclone templates that have a consistent forward/reverse pair. A forward/reverse pair is "consistent" if the forward and reverse are pointing towards each other and are not too far away from each other (as defined by the maximum template size for the library). In other words, the green graph tells, for each base, how many consistent forward/reverse pairs have that base between the forward read and the reverse read. This forward/reverse pair depth is not the same as read depth, which is typically much less.

Forward/reverse pair depth is important in that it gives a measure of the confidence of the assembly at a base. If the forward/reverse pair depth is close to zero, as it is in Contig1 position about 9300, there is a likelihood that phrap has made an incorrect join. When the forward/reverse pair depth is zero, the green line turns red, as it does on the right end of Contig3.
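The forward/reverse pair depth described above can be sketched in a few lines of Python. This is only an illustration of the definition given in the text, not Consed's own code: the PairedRead record, its field names, and the choice to count every base from the left end of the left read to the right end of the right read are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PairedRead:
    start: int      # leftmost contig position covered by the read (1-based)
    end: int        # rightmost contig position covered by the read
    forward: bool   # True if the read points left-to-right along the contig


def is_consistent(a, b, max_template_size):
    """A pair is consistent if the reads point towards each other and
    the template they span is no larger than max_template_size."""
    left, right = (a, b) if a.start <= b.start else (b, a)
    points_inward = left.forward and not right.forward
    span = max(a.end, b.end) - min(a.start, b.start) + 1
    return points_inward and span <= max_template_size


def pair_depth(pairs, contig_length, max_template_size):
    """Return a list where depth[pos] is the number of consistent pairs
    whose template spans contig position pos (index 0 is unused)."""
    depth = [0] * (contig_length + 1)
    for a, b in pairs:
        if not is_consistent(a, b, max_template_size):
            continue
        lo = max(1, min(a.start, b.start))
        hi = min(contig_length, max(a.end, b.end))
        for pos in range(lo, hi + 1):
            depth[pos] += 1
    return depth


pairs = [
    (PairedRead(1, 5, True), PairedRead(8, 10, False)),
    (PairedRead(3, 6, True), PairedRead(9, 12, False)),
    (PairedRead(2, 4, False), PairedRead(7, 9, True)),  # points outward: inconsistent
]
print(pair_depth(pairs, contig_length=12, max_template_size=20)[1:])
```

Positions where the resulting depth is zero or close to zero are the ones worth inspecting for a possible incorrect join.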
The red lines connect the right end of Contig3 with the middle of Contig1. These are filtered inconsistent forward/reverse pairs--they are "inconsistent" because they are not consistent (see above) and they are "filtered" in that they have another inconsistent read close by (at both ends) that is inconsistent for the same reason. If two red lines are on top of one another, they are displayed in purple so you know there is more than one there.

This is a good example of a misassembly. There are many reads at the right end of Contig3 that are paired with reads in the middle of Contig1. Notice that the forward/reverse pair depth of Contig1 is close to zero around base 9300. (You can use the "Zoom In" button to see this in more detail, but when you are done experimenting with the Zoom buttons and the scroll bar, click on "Zoom Orig" for the rest of this exercise.) This is where phrap made a bad join. If you tear the contig apart there, complement the left part of Contig1, and then join it to the right end of Contig3, the forward/reverse pairs will be happy. You will learn later how to do that.

25) Point to one of the red lines. You will notice that it turns yellow. The box near the bottom of the screen tells you a little more about what you have "highlighted" (turned yellow). If you want more information, click with the left mouse button. A window "Clicked Forward/Reverse Pairs" will appear giving information about each highlighted read. Try this.

In the "Clicked Forward/Reverse Pairs" Window double click on one of the reads. The Aligned Reads Window should appear with the cursor on that read.

This shows how to go from the Assembly View Window to the Aligned Reads Window.

26) You can also go from the Aligned Reads Window to the Assembly View Window. First you must make sure the Assembly View Window is already open (or else open it by clicking on Assembly View in the Main Consed Window).
In the Aligned Reads Window, point to a read name, hold down the right mouse button, and release on "Find Read in Assembly View" (one of the last items in the menu that appears when you push down with the right mouse button). If the read is from a subclone that has a forward/reverse pair in the assembly, then the same "Clicked Forward/Reverse Pairs" Window will appear. It will contain not only the read that you pointed to, but all of the other reads from the same subclone as the one you pointed to. In the Assembly View Window, all of these reads are highlighted.

You can use this procedure to go within the Aligned Reads Window from forward read to reverse read or vice versa.

27) Notice the aqua and purple lines that connect the right end of Contig2 to the left end of Contig3. These are consistent gap-spanning forward/reverse pairs. If there is more than one pair on top of each other, the color is purple. These are the reads that tell you (and Consed, Autofinish, and Phrap) that the right end of Contig2 is connected to the left end of Contig3. As above, point to one to highlight it and click on it to see more information.

28) You can see much more information by clicking on the "What to Show" button in the lower right corner. Up will pop the "What to Show in Assembly View" Window.

29) Click on "All" next to "Show Inconsistent Forward/Reverse Pairs". Then click "Apply" at the bottom of this window. In this particular example, you just see a few more stray red lines. In a real example, you would probably see so many red lines that it would be a mess. In most cases those inconsistent forward/reverse pairs would be caused just by some laboratory problem (turning a plate around, mislabelling, etc) and not by any misassembly. Thus I suggest that you generally leave "Show Inconsistent Forward/Reverse Pairs" set to "Filtered".

30) Click on "Show each consistent fwd/rev pair within contigs" and click "Apply".
This will show a blue (or purple if there is more than one at a location) square for each consistent forward/reverse pair within a contig. The horizontal position of the square is the center of the subclone (midway between the forward and reverse read) and the vertical position of the square indicates the size of the subclone (higher means a larger subclone).

If you really want to see the position of the forward and reverse reads, you can do that too: Click on "Show legs on squares for consistent fwd/rev pairs" ("Show each consistent fwd/rev pair within contigs" must be still on) and click "Apply". What a mess! I believe most of this information is much more easily understood by just showing the "consistent fwd/rev pair depth" (the green graph described above). But it is your choice.

When you want to highlight a consistent fwd/rev pair, you must point to the square--not the legs. Try it so you understand.

31) Try turning on/off each of the options so you understand them.

CONSED-POLYPHRED INTERACTION

Polyphred is a program for finding polymorphic sites; it was developed by Debbie Nickerson's group (contact them at http://droog.mbt.washington.edu).

We have a test database, 'polyphred', which has had polyphred run on it already. Polyphred has put a polymorphism tag on each polymorphic site.

Type:

    cd polyphred/edit_dir
        (You might need to first type "cd ../.." depending on where you are.)
    ls
    ../../consed_(computer type)

where (computer type) is one of solaris, hp, alpha, sgi, or linux.

Double click on example2.fasta.screen.ace.1

When Consed comes up, you should see 2 contigs. Double click on Contig2.

In the Aligned Reads Window, push the left mouse button while pointing to the 'Navigate' menu and release on:

    'Toggle feature: when navigating to consensus location, pop up all traces (currently off)'

That will turn this feature on.

Now push the left mouse button while pointing to the 'Navigate' menu and release on 'Tags'. Up should pop a list of tag types.
Double click on 'polymorphism'. Polyphred has already been run so the consensus is tagged with polymorphism tags at each polymorphic site.

Up will pop a window labelled 'Polymorphism Tags' with a list of sites. Click on 'Next'. If you correctly followed the instructions above, all the traces should pop up at the first polymorphic site. You may want to reposition the traces window to see it better. Now ignore the original 'Polymorphism Tags' window and instead click on 'Next' in the *traces* window. This will take you to the next polymorphic site. Pretty nice, huh?

32) ALPHABETICAL ORDERING OF READS

The reads can be ordered in two ways:

    a) alphabetically
    b) first all the top strand reads and then all the bottom strand reads. The top strand reads are then ordered by the left end of the reads. Same with the bottom strand reads.

Try changing between a) and b). In the Main Consed Window (click on 'Find Main Win' on the Aligned Reads Window if you can't find the Main Consed Window because it is covered up with other windows), pull down the 'Options' menu, and release on 'General Preferences'. Scroll down until you find 'Display reads sorted alphabetically or by strand/left end of read.' Switch it between 'alpha' and 'strand'. Then click 'Apply and Dismiss'. Notice the effect in the Aligned Reads Window.

Many polymorphism and mutation detection labs find that alphabetical sorting is most useful, while many genomic sequencing labs find that sorting by strand/left end of read is most useful.

After you are done playing with these features, exit Consed and go back to the previous database:

    cd standard/edit_dir
        (You might need to first type "cd ../.." depending on where you are.)
    ls
    ../../consed_(computer type)

Double click on standard.fasta.screen.ace.1

When it says "There is an edit history file (a .wrk file)...Do you want to apply those edits?", click on "no".

Double click on Contig1 to bring up the Aligned Reads Window again in preparation for the next step.
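The two read orderings from step 32 ('alpha' and 'strand') can be sketched in Python as two sort keys. This is a minimal illustration under assumed data: the AlignedRead record, its field names, and the example read names are hypothetical and say nothing about how Consed stores reads internally.

```python
from typing import NamedTuple


class AlignedRead(NamedTuple):
    name: str
    left_end: int     # consensus position of the read's left end
    top_strand: bool  # True for a top strand read, False for bottom strand


def sort_alpha(reads):
    """Ordering a): alphabetically by read name."""
    return sorted(reads, key=lambda r: r.name)


def sort_strand(reads):
    """Ordering b): top strand reads first, then bottom strand reads,
    each group ordered by the left end of the read."""
    return sorted(reads, key=lambda r: (not r.top_strand, r.left_end))


reads = [
    AlignedRead("c.s1", 50, True),
    AlignedRead("a.r1", 10, False),
    AlignedRead("b.s1", 5, True),
]
print([r.name for r in sort_alpha(reads)])
print([r.name for r in sort_strand(reads)])
```

The second key works because False sorts before True, so "not r.top_strand" places top strand reads ahead of bottom strand reads.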
NAVIGATING 33) In the Aligned Reads window, pull down the Navigate menu and release on 'Low consensus quality'. You will see a list of locations. Move the 'Low consensus quality' window down so you can see the Aligned Reads window. Repeatedly click on 'Next' until you reach the end of the list. (Low consensus quality means an area in which the bases each have too high probability of being wrong.) This saves you from having to look through large amounts of high quality data trying to find problem areas. There are 2 'Next' buttons--one on the Aligned Reads Window and one on the Low Consensus Quality Window. You can click on either, but it is probably more convenient to use the 'Next' button on the Aligned Reads Window. Thus you can keep the Aligned Reads Window in front with input focus and keep the Low consensus quality window pushed out of the way. You may want to click on the 'Save' button in the Low consensus quality Window to save to a file a copy of this list of problem areas as you work through them. In our experience, this will be the most important navigate list you will use. In fact, finishing consists mainly of adding reads and rephrapping until this list is reduced to nothing. 34) Dismiss the Low consensus quality window. Pull down the 'Navigate' menu again and release on 'High quality discrepancies as above, but omitting tagged compressions and G_dropouts'. You will probably notice there are no entries (unless you created some yourself by editing). That is because there are no high quality discrepancies with this dataset. So let's force there to be some by lowering the quality threshold. First, dismiss the High quality discrepancies window. Click on 'Find Main Win'. In the Main Consed Window, pulldown the 'Options' menu and release on 'General Preferences'. Notice that the default for 'Threshold for High Quality Discrepancy' is 40. Change it to 15 and click 'Apply & Dismiss'. Then follow the steps above to bring up the High quality discrepancies menu. 
Now you will see several entries. Click 'next' repeatedly to go successively to the next high quality discrepancy in the Aligned Reads Window. You can also double click on a particular line in the High quality discrepancies window to go to that location. Alternatively, you can single click on a line and then click the 'Go' button. Dismiss the High quality discrepancies window. 35) Similarly, try the other navigate lists: Unaligned high quality regions (this list will be empty with this data set), Edits, Regions covered by only 1 strand and only 1 chemistry, and Regions covered by only 1 subclone. Unaligned high quality regions are regions in which the traces are high quality so there is no question of the bases, but the region differs so much from other reads that phrap has given up trying to align the region with the consensus. This could be due to a chimeric read, or perhaps the read belongs somewhere else. We believe that regions covered by only 1 subclone should be covered by a 2nd subclone to prevent the possibility of there being a deletion in the single subclone. There are so many different problem lists that you may forget to check one of them and thus miss a serious problem. Thus we combined them all into a single list. This is the first menu item: 'Low Cons/High Qual Discrep/Single Stranded/Single Subclone/Unaligned High'. We suggest you use this list. 36) Also try navigate by tags by selecting 'tags' under navigate: when the Select Tag Type Window appears, double click on 'compression'. (Note that you can't do anything else until you deal with this window.) This gives a list of a particular tag type in a particular contig. 37) There is also a way of getting a list of a particular tag type in all contigs: Click on 'Find Main Win'. In the Main Consed Window, point to the 'Navigate' menu, hold down the left mouse button, and release on 'Tags in all contigs'. Continue as in the previous step. 
(Since there is only one contig, this list will not be any different than the corresponding list for Contig1.) PRIMER-PICKING To do this step, you must have first completed the INSTALLING CONSED section (far below). So, if you haven't done that yet, please complete that first. 38) Go to some location near the right end of the contig, say base 2470. Click with the right mouse button on the consensus and click on either one of the top strand primer choices (either from subclone template or from clone template). Consed will pause a moment, and then there will appear a selection of primers that pass all of Consed's requirements. Templates are also chosen for each primer. You may have to scroll the primer list to the right to see the templates. Consed lists these templates in order of quality--all of them will cover the read you want to make. Double click on one of the primers in the Primers Window. That will cause the Aligned Reads Window to scroll to show that oligo in context. Click on 'Accept Primer'. A comment box will pop up. Enter some comment and click 'OK'. Notice that a yellow oligo tag, with a little red end, is created on the consensus for that primer. The red end points in the direction of the oligo. The tag contains all the information you need to order that oligo and do the reaction--you will learn how to pop it up below under 'tags'. What is the difference between 'Pick Primer from Subclone Template' and 'Pick Primer from Clone Template'? There are 3 differences: A. which vector file the primers are screened against. In the former case, the primer is screened against the file primerSubcloneScreen.seq and in the latter case against the file primerCloneScreen.seq B. In checking for false matches elsewhere in the assembly, if the template is the whole clone, then Consed must check for false matches in the *entire* assembly, including all other contigs. But if the template is just going to be a subclone, Consed only needs to check elsewhere in that subclone. 
Actually, to be conservative, Consed checks for false matches +/- the maximum insert size of a subclone. C. If you are picking primers for subclone template, then the primer picker can also pick the subclone templates. If it doesn't find any suitable subclone template, it will reject the primer. (By default, picking of subclone templates is turned on. If you prefer to pick your own templates, and want Consed's primer picker to be much faster, you can turn it off temporarily or permanently. To turn it off temporarily, go to the Consed Main Window, point to the Options menu, hold down the left mouse button and release on 'Primer Picking Preferences'. Scroll down to 'Pick Subclone Templates for Primers' and click 'False'. Click on 'Apply and Dismiss'. To change this permanently, see CONSED CUSTOMIZATION below. Beware: you must correctly customize determineReadTypes.perl for template picking to work. See INSTALLING CONSED below.)

If you are interested in the details of primer-picking, see the section 'AUTOFINISH AND PRIMER-PICKING PARAMETERS' (below).

When you are done editing and have saved the assembly and exited Consed, run ace2Oligos.perl (supplied with this distribution--make sure your system administrator installed it), which will extract all the oligos you just created. This is handy for email ordering of oligos. In the xterm, type:

    ace2Oligos.perl standard.fasta.screen.ace.2 oligos.txt

where standard.fasta.screen.ace.2 is the name of the ace file you just saved.

39) WHEN CONSED CAN'T FIND AN ACCEPTABLE PRIMER

Sometimes Consed refuses to pick a primer. This is because it has tried every possible primer and rejected each one for one reason or another. If you don't understand why it didn't pick a particular primer, you can ask it as follows: In the Aligned Reads Window, point to the "Misc" menu, hold down the left mouse button and release on "Check Primer".
Enter the left and right consensus positions of the primer, check which strand, and whether the primer is to use subclone templates or the whole clone as a template. Consed will tell you all that is wrong with that primer. Try it. 40) PICKING PCR PRIMER PAIRS In the Aligned Reads Window, go to the location where you want to pick the first PCR primer, say base 500. Point to the consensus, hold down the right mouse button and release on "Top Strand PCR Primer". Then scroll to the location where you want to pick the second PCR primer, say base 2200. Point to the consensus, hold down the right mouse button and release on "Bottom Strand PCR Primer". There will be a pause and then there will be a list of PCR primer pairs. Click on the pair you want and click "Accept Pair". You can modify the parameters for choosing PCR primer pairs by going to the Main Consed Window, pointing to "Options", holding down the left mouse button, and releasing on "Primer Picking Preferences." For example, by default Consed does not display all PCR primer pairs--this would take too long and give you too many. However, you can ask it to show you all such pairs. In the Primer Picking Preferences, scroll down to "Check All PCR Pairs (huge) or Just Sample?" and click on "All". Then click on "Apply and Dismiss". Then pick PCR primers again, as above. Don't be surprised if you get 10,000 or more pairs of primers! SEARCH FOR STRING 41) Try the 'Search for String' button (left side of the Aligned Reads Window). Type in a string (such as aaaca), and click 'ok'. There should be a list of 'hits'. Double click on one of the hits (or single click on it and click on 'go'.) Notice that the Aligned Reads Window scrolls to that position and has the cursor on the found string. (It might be complemented.) Dismiss this window. Try this again, only this time in the Search For String Window select 'Search Just Reads'. Then click 'OK'. You will notice there are many more hits. 
This is because this shows hits in each read, even if they are at the same consensus position. You can also try the approximate match search for string by clicking on 'Approximate' instead of 'Exact'. The 'Per Cent Mismatch' only applies to the Approximate match search. COPY AND PASTE 42) In the Aligned Reads Window, swipe some bases by holding down the left mouse button. You should see the bases turn yellow, at least temporarily. Then click the 'Search for String' button. Use the middle mouse button to paste the bases you have just swiped into the 'Query string:' box. Notice that you can swipe bases either from the consensus or from a read. The search for string is case-insensitive so don't worry about the pasting being upper or lowercase. CORRECTING FALSE JOINS MADE BY PHRAP 43) Phrap may put several reads together that you believe do not belong together. (For example, you may see several high quality discrepancies between the reads.) If you are sure these reads do not belong together, you can force a subsequent reassembly by phrap to not assemble those reads together. You do this by finding a location where there is a high quality discrepancy. Then click on the read with the right mouse button and release on 'Tell phrap not to overlap reads discrepant at this location'. There are no high quality discrepancies with this dataset so Consed won't let you do this. (Try it and see.) However, when you use your own data, you may get the chance! ADDING READS 44) For this to work, your system administrator must have set up everything correctly. (See below in INSTALLING CONSED.) Assuming you have set everything up correctly, you can now experiment with adding reads. From a UNIX prompt, copy the new chromatograms into the chromat_dir directory: cp ../chromats_to_add/* ../chromat_dir Exit Consed and bring it up again using the original ace file standard.fasta.screen.ace.1 If it asks if you want to apply edits, just say 'no'. 
On the Main Window, click on the Add New Reads button. There will appear a list of files ending with .fof. These are files that contain lists of chromatograms. Double click on 'reads_to_add.fof'. Then Consed will ask "If a read doesn't align against any existing contig, do you want to have it go into a contig by itself? (otherwise it will just not be put into the assembly)" Answer yes or no--I don't care. Consed will ask "Do you want to recalculate the consensus quality values where each of the new reads is aligned?" Answer yes or no, but in practice you should generally answer "yes." There should be lots of progress output in the xterm from which you started Consed. When it completes, a Reads Added Window will pop up with a report of which reads were added. In this case, it should say that 9 reads were successfully added and list them. If you get an error message, look carefully at the full error message in the xterm to diagnose the problem. Probably there is some mistake in how you installed Consed. See INSTALLING CONSED (below).

TEARS AND JOINS

Just so you get the same results as I do, exit Consed and bring it up again using the original ace file standard.fasta.screen.ace.1 If it asks if you want to apply edits, just say 'no'.

45) When phrap really screws up, you may want to just tear the contig apart in several places and then join the pieces back together in a different way. Let's try it: Go to location 1500. Point the mouse at the consensus base at 1500 and push the right mouse button down. Release the button on 'Tear Contig at This Consensus Position'. Up will pop a list of reads with 2 little buttons next to them: <- and ->. Leave everything as it is and just click 'Do Tear'. (If you want to play around with which read goes into which contig, do that another time.) Now you should have 2 Aligned Reads Windows on top of each other. One should contain 'Contig2' and the other 'Contig3'. Dismiss the little window that says 'Tear Complete'.
Now let's join these 2 contigs back together: Click on 'Search for String' and type in the following bases: agctgccatc Click 'OK'. Search for string should find 2 locations, one in Contig2 and one in Contig3: Contig2 (consensus) 1447-1456 (uncomplemented) Contig3 (consensus) 829-838 (uncomplemented) Double click on the first one. The Aligned Reads Window for Contig2 will scroll to location 1447 and the window will raise up. In that Aligned Reads Window, click on 'Compare Cont'. Now double click on the 'Contig3' line in the above Search for String results. The Aligned Reads Window for Contig3 will scroll to location 829 and lift up. In that Aligned Reads Window, click on 'Compare Cont'. Now the Compare Contigs Window should be visible. In the Compare Contigs Window, try scrolling back and forth. You can change the cursors (blinking red), but if you do, please return them to the locations 1447 and 829 for the next step. The cursors 'pin' these bases together when doing an alignment. (The algorithm is a pinned Smith-Waterman alignment.) Click on Align. Try scrolling the alignment by dragging the thumb in the lower half of the Compare Contigs. An 'X' means there is a discrepancy between the 2 contigs. There is also a 'P' (see if you can find it!) The P indicates the bases that you pinned together. Click with the left mouse button on either contig in the bottom alignment. You will notice that both contigs will have the red blinking cursor in the same position. Click on 'Scroll Both Aligned Reads Windows' and look at the Aligned Reads Windows to see that they scroll to the corresponding positions. You can have traces up for the contigs, and they will scroll as well. Experiment with this. Then click 'Join Contigs'. The 2 previous Aligned Reads Windows will disappear and there will be a new one which has a new contig 'Contig4'. You have made a join! Scroll left and right. You will notice that many of the reads are highlighted. 
These are the reads that came from the previous "right" contig. To unhighlight all of these reads at once, point to the "Misc" menu, hold down the left mouse button and release on "Unhighlight All Reads". It is possible to have more than one Compare Contigs window up at a time. This allows you to investigate a repeat that has more than 2 copies. Compare Contigs is one method of exploring joins of contigs that were not made by phrap. Another method is to use phrapview, supplied with phrap. phrapview gives a high level view of all internal joins while 'compare contigs' shows the alignment of a single internal join. Some users have found them to work well together--phrapview to find a join and, having found it, 'compare contigs' to examine it in more detail.

REMOVING READS

46) You can also remove individual reads and put them into their own contigs. For example, in the Aligned Reads Window, go to location 2000. Point to the read name of read djs74_2664.s1 and hold down the right mouse button. Release on 'Put read djs74_2664.s1 into its own contig.' Consed will ask you 'Are you sure...?' Answer 'yes'. Presto-chango! The read is put into its own contig and the old contig is redrawn without the read in it. At this point you should save the assembly--you should always save the assembly after removing reads.

TAGS

47) Bring up a trace for a read (as above). Swipe some bases on the 'edt' line with the middle mouse button. A list of choices will pop up. Select 'Add Tag'. Type in a comment in the box at the bottom, and select 'comment' from the list of tag types. You will now see a blue box both in the Aligned Reads Window and in the Traces Window on that read. To see the comment, you can just point to it in the Aligned Reads Window and you will see the comment in the lower right hand corner of the Aligned Reads Window. Alternatively, you can click on that blue tag in the Aligned Reads Window with the right mouse button and release on 'Tag: comment Show more info?'.
Alternatively, you can click on the blue tag in the Traces Window with the right mouse button. Try creating some other kinds of tags: again swipe some bases in the Trace Window by selecting a different tag type. You will notice that different tags are in different colors. You can always use the methods above to see what kind of tag it is if you forget what a particular color means. You can also define your own tag types. See below CREATING CUSTOM TAG TYPES for how to do that. CREATING LONG TAGS 48) You can create really, really long tags as follows: Just create a short version of the tag as above for where you want the tag to start. Then figure out the consensus position of where you want the tag to end. In the Aligned Reads Window, click on the short tag with the right mouse button and release on 'tag: show more info?' (as above). A Tag Window will appear for that tag. In the Tag Window, simply change the End Unpadded Consensus Position to the place you want it to end. Then click 'OK'. You will now notice that the tag will be as long as you wanted. 49) You can create tags on the consensus in the same way. In the Aligned Reads Window, use the middle mouse button to swipe some bases on the consensus in the Aligned Reads Window. Up will pop a list of tag types. Click on one of them. Try it again somewhere else. Try it with the tag type being 'comment'. In this case, you must enter a comment. Notice the pretty colors! If you forget what a particular color means, you can click on the colored tag with the right mouse button and it will tell you. 50) Try creating some tags that overlap each other. You will notice that the overlapping region will be purple. If you want to know which tags overlap, you can click with the right mouse button on the purple and you will be told all tags that are on that base. 51) If you have many tags that overlap and thus are purple, you can hide some less relevant tag types so there is less purple and there is less distraction. 
Make sure you have a few tags visible. Then click on 'Find Main Win'. In the Main Window, open the Options menu, and release on 'Hide Some Tag Types'. A list of tag types will pop up. Select the type that you have visible (above). Then click 'OK'. Go back to the Aligned Reads Window. That tag should still be visible. Click on the button 'Some Tags' in the upper right part of the Aligned Reads Window. Your tag should disappear. The 'Some Tags' button should have changed to 'Sh All Tags'. Click on it again. Your tags should reappear.

52) Normally, when you re-assemble, phrap will name the contigs differently--what was Contig31 before may become Contig32. To help you know which contig is which, Consed allows you to give a contig a name (e.g., "A") that will persist after re-assembling. To do this, swipe some consensus bases with the middle mouse button (as above). When the "Select Tag Type" box pops up, click on "contigName", type a name into the "Contig Name:" field, and then click "OK". The next time you re-assemble, the name "A" will appear in the list of contigs on the Main Consed Window.

SEARCH FOR READ NAME

53) Restart Consed using the original ace file standard.fasta.screen.ace.1 If it asks if you want to apply edits, just say 'no'. Instead of clicking on a read or contig name, type a read name into the "Find read (with *'s):" box. If you want to look at the location containing read djs74-2689.s1, you can just type "2689" and then push the "Enter" key and Consed will immediately bring up the Aligned Reads Window with the cursor on read djs74-2689.s1. What if more than one read matches? For example, suppose you type "26" and then push the "Enter" key. This matches 3 reads:

    djs74-2689.s1
    djs74-2679.s1
    djs74-2664.s1

Try it and see what happens... Try entering "26*9" and see what happens. What does the "*" mean? Try using "Find read (old way)".
Try typing djs74-2 You will notice that as you type each letter, the first item in the list that matches the letters typed will be highlighted. Experiment with deleting a few letters and typing others. This is a powerful method of quickly getting to the read name you are interested in. When you get to the name in the list, you do not have to type the rest of the name--just type carriage return or else click on 'OK'. Why does Consed have both methods? It used to just have the old way, but then I added the more powerful first method. However, a few people like the old way, so I kept it. ONLINE DOCUMENTATION 54) On the Aligned Reads Window, click on the 'Help' menu and release on 'Show Documentation'. You will see this document. You can search for keywords in it. GOTO POSITION 55) In the Aligned Reads Window, click in the 'Pos:' box in the upper right-hand corner. Type in a number, such as 540, and push the 'Return' or 'Enter' key. The Aligned Reads Window will scroll to position 540. We find this feature is particularly useful when one person wants another person to look at something in the sequence. HIGHLIGHTING READ NAMES 56) In the Aligned Reads Window, click on a read name with the left mouse button. The name will turn magenta. Click again and it will turn yellow again. Try turning it magenta and then scrolling. This feature is helpful in keeping track of a particular read as you scroll. If you have an emacs window open (or any editor window), you can paste the read name in by just clicking with the middle mouse button. When you clicked on the read name in the Aligned Reads Window with the left mouse button, the read name was loaded into the paste buffer. COMPLEMENTING THE CONTIG 57) Push 'Comp Contig' in the Aligned Reads Window to complement the contig. This displays the opposite strand of the contig including the consensus and all reads. Push this button again to uncomplement it. 
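To make concrete what "the opposite strand" means for a sequence of bases, here is a minimal Python sketch of complementing a string. It is not taken from Consed; the function name and the translation table are illustrative, and any character not in the table (for example, a pad character) is simply left unchanged.

```python
# Complement table for the four bases plus N; upper and lower case are preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read left to right."""
    # Complement every base, then reverse so the result reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]

query = "aaaca"
print(reverse_complement(query))
# Complementing twice gives back the original sequence,
# just as pushing 'Comp Contig' twice restores the original view.
print(reverse_complement(reverse_complement(query)) == query)
```

This is also why a 'Search for String' hit may be reported as complemented: a query can match either the displayed strand or its reverse complement.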
RECOVERY FROM CRASHES 58) It is important to feel that your data are safe, even if the computer (or Consed) were to crash. Consed will recover your data from such a crash. Make an edit (remember, edits are made in the Trace Window) and jot down its location. Also note the name of the ace file which is displayed in the upper left box in the Aligned Reads Window. Then simulate a crash by going to the xterm where you started Consed and typing control-C (holding down the control key, and typing C). Restart Consed and select the same ace file you noted (above). A box will come up saying 'There is an edit history (a .wrk file) Consed may have crashed during a previous session with this same file. Do you want to apply those edits?' Click on 'yes'. Go and find the edits you made before Consed crashed--you will find them. This is the purpose of the .wrk files--they are a log file of your edits and they are added to as you make edits. 59) You should save your edits by pulling open the 'File' menu on the Aligned Reads Window, and releasing on 'Save assembly'. RESTRICTION DIGEST 60) In the Main Consed Window, click the "Digest" button. For the purpose of this exercise, the full pathname of file of vector sequence can refer to any file of sequence in fasta format. However, when you are using it with your own data it should refer to a file that contains the sequence of your cloning vector. For example, if you are sequencing a BAC, it should contain BAC vector. The sequence must start at the vector/insert junction that you used when you ligated the insert. Click "OK". You will see a comparison of in-silico fragments (those calculated from the sequence) and real fragments (those in fragSizes.txt which supposedly came from a real gel). If a band is red, that means that it doesn't match. Move the pointer over the fragments, and you will see the fragment sizes appear. Move the pointer to the in-silico fragment with size 2299. Click on it. 
You will see the fragment on the left side of the window become highlighted. Click on the button labeled "right end" (2nd row from the bottom of the window) and the Aligned Reads Window will pop up, with the cursor on the right end of the fragment. Click on "show problems" and navigate through the list of problems by clicking on "next". You will notice that the Gel Window is zoomed in. To return to the original zoom, click on "Zoom Original". Where it says "Select Enzyme:", point to "EcoRV", hold down the left mouse button and release on "HindIII". This is how you change enzymes. Click on the button labeled "Text Output". This can be printed out. Dismiss the restriction digest window. On the Main Consed Window, click the "Digest" button again. Notice the file "fragSizes.txt". This is a file of actual gel fragment sizes. If you don't have an actual gel, but rather you want to just make predictions of fragment sizes from the sequence, you can leave this box blank (erase the "fragSizes.txt"). Try that.

fragSizes.txt has the following format:

    >EcoRV
    448
    710
    1102
    1197
    -1
    >HindIII
    448
    508
    586
    735
    801
    -1

where EcoRV and HindIII are enzymes and the numbers below them are the actual fragment sizes. Each enzyme list is terminated by -1.

PROTEIN TRANSLATION AND OPEN READING FRAMES

61) If you would like, you can see the amino acid translation of the consensus in all reading frames. In the Aligned Reads Window, push down the left mouse button on the 'Misc' menu and release on 'Show Top Strand Protein Translation'. Try again but this time release on 'Show Bottom Strand Protein Translation'. Notice that there are 2 characters that are in magenta color. What are those characters? Why are they made in a different color? To not show the protein translation, push down the left mouse button on the 'Misc' menu and release on 'Don't show protein translation'.

62) You can search for open reading frames within a contig.
In the Aligned Reads Window, push the left mouse button on 'Navigate' and release on 'Search for Open Reading Frames'. Notice that the open reading frames are shown for all 6 reading frames and are sorted by length.

ERROR RATE

63) In the Aligned Reads Window is a box (upper right) labelled 'Err/10kb'. This is the estimated error rate for this contig, and it is a good indicator of when you are done (or not done) finishing. In addition, you can find the error rate for a particular region of the contig as follows: Point at the 'Misc' menu, hold down the left mouse button, pull down and release on 'Show Error Info For Region'. Fill in the boxes for left and right consensus position, click on 'Calculate' and you will be given the error and single subclone data for that region.

RUNNING PHRED and PHRAP

phred and phrap *must* be run via the phredPhrap perl script. If you run phred on its own, and then you run phrap on its own, you will get an ace file that will not be usable by Consed. If you try to run phred and phrap without using the phredPhrap script, you are on your own: when you run into problems (and you probably will), do not email us--instead please use the phredPhrap script.

To use the phredPhrap script to run phred and phrap:

64) Type:

    phredPhrap -V

It should say: 000726 (or newer). If it does not, then you probably have not installed all the perl scripts from the scripts directory, as directed in INSTALLING CONSED.

65) Make a copy of the standard dataset. First go up by typing "cd .." until you see "standard" when you type "ls". Then type:

    cp -r standard test
    cd test

66) Delete all the files in phd_dir and edit_dir:

    rm phd_dir/*
    rm edit_dir/*

67) cd edit_dir

68) Run phredPhrap by typing

    phredPhrap

That's it--you no longer need to type *any* arguments, and generally you should not. (Please do *not* use the -notags option any longer.)
If you want to add phrap options, you can do that: e.g.,

    phredPhrap -forcelevel 3

Then run Consed on the resulting ace file as indicated in the beginning of the Quick Tour (above). If you have any problems, this is the time to diagnose them before you use your own data.

COMMON PROBLEMS RUNNING PHREDPHRAP

69) Problems due to polyphred. To check this, in phredPhrap, make sure the following line reads:

    $bUsingPolyPhred = 0;

This will make polyphred not be used. If the problem then goes away, you will know the problem has something to do with polyphred, so do not contact any of the phred/phrap/Consed people. Instead, contact the polyphred people: http://droog.mbt.washington.edu and dpc@u.washington.edu and debnick@u.washington.edu

70) Permission problems. Check that you have write access to the phd_dir and edit_dir directories. You can do this by trying to create a file in those directories:

    touch ../phd_dir/xxx

which creates a file, and

    ls -l ../phd_dir/xxx

which checks if the file was created. Do the same with ../edit_dir/xxx If you get a permission problem, do not contact me. UNIX permission problems are very simple for anyone who knows UNIX--get someone locally who understands UNIX and can help you solve the permission problem.

----------------------------------------------------------------------------

USING AUTOFINISH

Note: Before you use Autofinish on your own data, you must modify determineReadTypes.perl. See INSTALLING CONSED below for information about this.

To do the exercises in this section, it would help to be able to edit a file under UNIX and run a program under UNIX. If you can't do that, have someone teach you. (It will not work to edit a file on Windows and then transfer to UNIX.) Typical editors on UNIX are vi and emacs, but pico is probably the simplest for occasional users.
You can find more information on pico from: http://www.strath.ac.uk/IT/Docs/IntroToUnix/node122.html You should also learn how to examine a file in UNIX, how to move around the filesystem, etc. If you don't know how to do this, consult: http://www.washington.edu/computing/unix/startdoc/files.html and http://www.washington.edu/computing/unix/startdoc/directories.html There are also many books on Unix at bookstores.

71) Type:

    cd autofinish/edit_dir

(You might need to first type "cd ../.." depending on where you are.)

72) Try starting Consed by typing:

    ../../consed -ace autofinish.fasta.screen.ace.1 -autofinish

(Note 'consed' above may be 'consed_solaris', 'consed_alpha', 'consed_hp', 'consed_sgi', or 'consed_linux' depending on your executable. If you have trouble, use that 'ls' command (see above)!)

If Autofinish says:

    Run-time exception error; current exception: InputDataError
    No handler for exception.
    Abort

that means that you have not followed the instructions under 'INSTALLING CONSED' below. Please follow those instructions and then try this again.

Consed will create 7 files:

    autofinish.fof
    (project name).001014.155627.customPrimers
    (project name).001014.155627.nav
    (project name).001014.155627.out
    (project name).001014.155627.sorted
    (project name).001014.155627.univForwards
    (project name).001014.155627.univReverses

where '001014.155627' is replaced by your current date and time in format YYMMDD.HHMISS.

The first file, autofinish.fof, is a file of filenames. It contains the names of the other files.

(project name).001014.155627.univForwards is the summary file of the suggested universal forward subclone reads.

(project name).001014.155627.univReverses is the summary file of the suggested universal reverse subclone reads.

(project name).001014.155627.customPrimers is the summary file of the suggested custom primer reads.

These are the files you will typically use for directing your bench work.
If you like, you can import these files into Excel since the fields are separated by commas.

The .out file is the Autofinish output file. This is the most important file to examine while you are evaluating Autofinish. If you want to know *why* Autofinish picked the reads it did, it will tell you. Consult this file before you start complaining about Autofinish's choices. I've had people complain, and then, once they look in the .out file (*not* any of the other files), they learn information that persuades them that Autofinish was correct all along. This is hard to over-emphasize, but I will try: CONSULT THE .out FILE CAREFULLY IF YOU DISAGREE WITH ANY OF AUTOFINISH'S CHOICES! It will tell you lots more, such as the orientation of the contigs. It will also tell you the value of all Autofinish parameters used. If you try to customize one of the parameters, check in the .out file to be sure that Autofinish used the value you intended.

The .sorted file gives the reads sorted by contig and position. This file is useful if you want to find what reads Autofinish suggested for a particular location. It is *not* useful for understanding *why* Consed chose a particular read. It is deliberately terse to make it useful for automating the ordering of reads.

The .nav file is a custom navigation file (see "CUSTOM NAVIGATION" far below). This file allows a Consed user to just click 'next', 'next', ... to review all of Autofinish's suggestions in context. This is a great way to quickly and easily review all of the reads suggested by Autofinish.

This finishing tool is designed to be run in batch after each assembly. In a high throughput operation, the production people can make these reads without anyone using Consed to examine the assembly interactively. Only when Autofinish cannot help you any longer (generally after 3 or more times of running Autofinish, making the reads, and re-assembling), must you bring up Consed graphically and examine the assembly.
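Because the summary files (.customPrimers, .univForwards, .univReverses) are comma-separated, loading one into a script takes only a few lines. The sketch below assumes nothing about what each column means, since the column layout is not described in this section; read_summary is a hypothetical name.

```python
import csv
from typing import List


def read_summary(path: str) -> List[List[str]]:
    """Return the rows of a comma-separated Autofinish summary file.

    Each row is a list of field strings. No meaning is assigned to the
    columns here; look at your own files to decide which fields you need.
    """
    with open(path, newline="") as handle:
        # Strip stray whitespace around fields and skip blank lines.
        return [[field.strip() for field in row]
                for row in csv.reader(handle) if row]
```

A script for ordering primers or reads could start from these rows and pick out the fields it needs.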
We suggest that you write some of your own software to parse the summary files to automatically order primers and reads. The summary files (.customPrimers, .univForwards, .univReverses) will not change much but the .out file is constantly changing, so don't try to parse it.

AUTOFINISH: MINIMUM NUMBER OF ERRORS FIXED PER READ

73) By default, the minimum number of errors fixed by an experiment is 0.02.

Human finishers typically look for low consensus quality regions--regions that have one or more bases below a particular quality threshold. However, Autofinish can do better: it can find regions where the *total* number of errors is greater than some particular cutoff value. This method can find regions where none of the bases are low quality, but many are nearly low quality and thus the total number of errors in the region is high. Autofinish will also ignore regions that have a very few low quality bases, as long as the total number of errors is smaller than your cutoff. This is a better criterion because it is the total number of errors that you are trying to reduce when finishing--not the number of bases with quality below some arbitrary cutoff.

Two bases of quality 20 have 0.02 errors (on average). Similarly, 20 bases of quality 30 have 0.02 errors (on average). (Quality values were explained at the beginning of this document.)

Suppose that you want Autofinish to suggest an additional read for an area that even just has one quality 20 base. (Be aware that Autofinish will consider 10 quality 30 bases to be just as severe as 1 quality 20 base since, on average, they will both have precisely the same number of errors: 0.01.)

74) EDIT PARAMETERS: HOW TO CHANGE CONSED/AUTOFINISH PARAMETERS

This shows how to change consed.autoFinishMinNumberOfErrorsFixedByAnExp. To change any other parameter, follow these same instructions replacing consed.autoFinishMinNumberOfErrorsFixedByAnExp with the parameter you want to change.
In the edit_dir directory is a file called ".consedrc" which you will only see if you use "ls -a" instead of just "ls". In that .consedrc, add the following line:

    consed.autoFinishMinNumberOfErrorsFixedByAnExp: 0.01

You can do this using an editor, such as pico, or you can do it with Consed. To do it with Consed, bring up consed as follows:

    consed -ace autofinish.fasta.screen.ace.1

On the Main Consed Window, point to the "Options" menu, push down the left mouse button and release on "Edit Consed/Autofinish Parameters". Up will pop the "Edit Parameters" window. Near the top is "consed.autoFinishMinNumberOfErrorsFixedByAnExp". Point and click in the box on the left containing 0.02 just underneath "consed.autoFinishMinNumberOfErrorsFixedByAnExp". After clicking, the box outline should turn bold and the cursor should start blinking. Change the 0.02 to 0.01. Click on "just project" near the bottom of the window. The box containing 0.01 should turn red indicating that it is now different than the default. Then click "save". A box titled "Name of parameter file to write" should pop up. Click "ok". Note: you can change more than one of these values before clicking "save".

When using the Edit Parameters Window, I suggest that you do not click on the up and down arrows of the vertical scrollbar because these will scroll by too much. Instead, I suggest you point to the thumb of the vertical slider, hold down the left mouse button and drag the thumb. Alternatively, point to the black space above or below the thumb and click with the left mouse button. You need to try this to understand.

To be sure that everything happened correctly, look at the .consedrc file. It should contain the line:

    consed.autoFinishMinNumberOfErrorsFixedByAnExp: 0.01

(If you don't know how to view a file, get a UNIX book and learn the commands "less", "more", "pico", "vi", or "emacs".) (Get in the habit of checking .consedrc after using Consed's Edit Parameter Window.)
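If you find yourself checking .consedrc often, the check can be scripted. The sketch below is not part of Consed; the function names are invented, and it assumes that .consedrc holds one "name: value" setting per line, as in the example above. Skipping lines that start with "!" is a further assumption, borrowed from the comment style Autofinish uses in its .out file.

```python
from typing import Dict


def read_consedrc(path: str) -> Dict[str, str]:
    """Parse 'name: value' lines from a .consedrc-style file."""
    params: Dict[str, str] = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Assumption: blank lines and '!' comment lines carry no setting.
            if not line or line.startswith("!"):
                continue
            name, sep, value = line.partition(":")
            if sep:
                params[name.strip()] = value.strip()
    return params


def has_setting(path: str, name: str, expected: str) -> bool:
    """True if the file sets `name` to `expected`.

    Numbers are compared as numbers, so 0.01 and 0.010 count as equal;
    anything else (e.g. true/false) is compared as text.
    """
    value = read_consedrc(path).get(name)
    if value is None:
        return False
    try:
        return float(value) == float(expected)
    except ValueError:
        return value == expected
```

For the exercise above, has_setting(".consedrc", "consed.autoFinishMinNumberOfErrorsFixedByAnExp", "0.01") should return True once the parameter has been saved.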
AUTOFINISH: MINIMUM NUMBER OF ERRORS FIXED PER READ (continued)

Then run Autofinish again:

    consed -ace autofinish.fasta.screen.ace.1 -autofinish

Look at the files just created by typing 'ls -tlr' and look at the .out file by bringing it up with your favorite UNIX editor. You should see:

    PARAMETERS_CHANGED_FROM_DEFAULTS {
    .
    consed.autoFinishMinNumberOfErrorsFixedByAnExp: 0.010
    .
    .

Further down is a section:

    PARAMETERS {
    ! If you want to modify any of these parameters, just cut/paste
    ! the relevant line into your ~/.consedrc file
    ! (or into the edit_dir/.consedrc file)
    ! In the following, I have annotated the parameters with the following
    ! symbols:
    !
    ! (YES) freely customize to your own site
    ! (OK) don't change unless you have a specific need and know what you
    ! are doing
    ! (NO) don't change this!

This section contains all Autofinish parameters, whether you have changed them or not. Thus a changed parameter will be in both lists. Find

    consed.autoFinishMinNumberOfErrorsFixedByAnExp: 0.010

in this second list.

Then compare the .sorted files from this run of Autofinish and the previous run of Autofinish, in which the consed.autoFinishMinNumberOfErrorsFixedByAnExp value was 0.02. You will notice that there are 2 additional reads suggested when the parameter is 0.01. There is a resequence with dye terminator chemistry of the djs228_474 template and a de novo reverse on template djs228_2632. Look at the .out file to see why Autofinish chose these reads. It will indicate that the first read is mainly to fix 0.01 errors in the region from 2536 to 2545 and the second read to mainly fix 0.01 errors from 969 to 978.

Bring up Consed to see what is in the 10 base region from 2536 to 2545. You will see that there is a quality 25 base at 2539 and a quality 21 base at 2540. After that come some bases whose qualities are in the high 30s. In the Aligned Reads Window, point at the Misc menu, hold down the left mouse button, and release on Show Error for a Region.
Enter 2539 and 2549 for the "Left Consensus Position of Region" and "Right Consensus Position of Region" respectively and click on "Calculate". You will see that there are .0135 errors in this region. This is less than 0.02 so Autofinish will not try to fix this region unless you reduce consed.autoFinishMinNumberOfErrorsFixedByAnExp to 0.01 The default is 0.02 because most labs do not want to fix regions that have less than 0.02 errors. 75) DIVERSION: UNIX LESSON Note for UNIX novices: Earlier, I said that you only needed to know 3 UNIX commands: pwd, ls, and cd. Then I added "ls -a", "less" and an editor (such as pico). Now I want you to learn one more: ls -tlr This is the same as ls, but it puts one file on a line and prints the lines so that the most recent files are on the bottom. Since you will be creating many, many files as you work through these Autofinish exercises, this command gives an easy way to see the files you have just created, without having to always look at autofinish.fof to look for the names of the files you just created. AUTOFINISH: CHANGING MELTING TEMPERATURES 76) Use 'ls -tlr' to find the most recent .out file. Search in the .out file (using your favorite editor) for MeltingTemp and you will find the following lines: consed.primersMinMeltingTemp: 55 consed.primersMaxMeltingTemp: 60 Some labs prefer to use primers with lower melting temperatures. In your .consedrc file, put the following lines: consed.primersMinMeltingTemp: 50 consed.primersMaxMeltingTemp: 55 You can do this by following the instructions above under HOW TO CHANGE CONSED/AUTOFINISH PARAMETERS. When you are done doing that, look in the .consedrc file to make sure it contains the above 2 lines. Then run Autofinish again: consed -ace autofinish.fasta.screen.ace.1 -autofinish Using your favorite editor, check that the .out file you just created says: consed.primersMinMeltingTemp: 50 consed.primersMaxMeltingTemp: 55 (You can find the most recent .out file by typing 'ls -tlr'.) 
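The check just described (find the newest .out file, then confirm the parameter values in it) can also be done from a script. This is only a sketch with invented function names. It looks for plain "name: value" lines like the ones quoted above and nothing else, since the rest of the .out layout changes between versions and should not be relied on.

```python
import glob
import os
from typing import Dict, Iterable, Optional


def newest_out_file(directory: str = ".") -> Optional[str]:
    """Return the most recently modified *.out file in `directory`.

    This is the file 'ls -tlr' would list last among the .out files.
    Returns None if there is no .out file.
    """
    candidates = glob.glob(os.path.join(directory, "*.out"))
    return max(candidates, key=os.path.getmtime) if candidates else None


def find_parameters(path: str, names: Iterable[str]) -> Dict[str, str]:
    """Collect the last reported value of each named parameter."""
    wanted = set(names)
    found: Dict[str, str] = {}
    with open(path) as handle:
        for line in handle:
            name, sep, value = line.strip().partition(":")
            if sep and name.strip() in wanted:
                found[name.strip()] = value.strip()
    return found
```

After the melting temperature exercise, find_parameters(newest_out_file(), ["consed.primersMinMeltingTemp", "consed.primersMaxMeltingTemp"]) should report 50 and 55.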
Compare the .sorted files from this run of Autofinish and the previous run. The difference should be the custom primer read: The previous .sorted file had: tcttttgtctttccatatacatttt,56 which means the melting temperature is 56. The latest .sorted file had: cattttagaatcagtttgttg,50 which means the melting temperature is 50. 77) AUTOFINISH: JUST CLOSING GAPS You could use Autofinish to just close gaps (you are not interested in fixing single subclone regions or weak regions). Add the following to the .consedrc file (and remove everything else so that Autofinish uses the default values for everything else): consed.autoFinishCoverLowConsensusQualityRegions: false consed.autoFinishCoverSingleSubcloneRegions: false If you are using the Edit Parameter Window to change these values, you will find them when scrolling about 1/3 way down. Change the consed.primersMinMeltingTemp and consed.primersMaxMeltingTemp back to their original values. Then check the .consedrc to make sure it contains the above 2 lines. (Get in the habit of checking .consedrc after using Consed's Edit Parameter Window.) Now you should see in the .sorted file just 4 reads: one custom primer read pointing out the left end of the contig and 3 reverses off the left end of the contig. The right end is not extended because Autofinish recognizes that it is the end of the BAC. You can change any of the parameters listed at the top of the Autofinish output file (or actually any of the more exhaustive list of parameters listed in the 'Info' menu, 'Show Consed Parameters' list.) We believe the defaults are an excellent starting point. 78) AUTOFINISH: JUST CLOSING GAPS JUST USING WALKS One high-throughput operation was only interested in closing gaps and only interested in using walks to close those gaps. 
This is the appropriate set of Autofinish parameters to do this:

    consed.autoFinishCoverSingleSubcloneRegions: false
    consed.autoFinishCoverLowConsensusQualityRegions: false
    consed.autoFinishAllowDeNovoUniversalPrimerSubcloneReads: false
    consed.autoFinishAllowPCR: false
    consed.autoFinishAllowResequencingReads: false
    consed.autoFinishAllowMinilibraries: false
    consed.autoFinishNearGapsSuggestEachMissingReadOfReadPairs: false
    consed.autoFinishCallReversesToFlankGaps: false

(and every other parameter left at the default value). The first 2 parameters are the same as the "AUTOFINISH: JUST CLOSING GAPS" section (above). The other parameters tell Autofinish all of the types of reactions it is not allowed to use, leaving just walks.

Try this. Now you should see only a single read, a walk, pointing left off the left end of the contig.

79) AUTOFINISH: NOT REPEATING FAILED EXPERIMENTS

For this exercise, keep a backup copy of the ace file:

    cp autofinish.fasta.screen.ace.1 autofinish.fasta.screen.ace.1.save

If you run Autofinish with the -doExperiments parameter (see below), Autofinish records its suggestions in the ace file (hence changing the ace file). If one of these suggested reads fails to fix a problem, when Autofinish is run again it won't pick the same read again.

    consed -ace (ace file name) -autofinish -doExperiments

If a forward or reverse universal primer read failed, Autofinish (when run in a subsequent round) will not suggest that same experiment. If a custom primer read fails, Autofinish will not pick that same experiment again, and it won't pick a custom primer read that is even close to the failed one. 'Close' is defined by the parameter:

    consed.autoFinishNewCustomPrimerReadThisFarFromOldCustomPrimerRead: 50

In addition, Autofinish (the next time it is run) will tell you how well each experiment did in solving the problem it was intended to solve.
Return the parameters to the defaults and try this by running Autofinish twice like this: consed -ace autofinish.fasta.screen.ace.1 -autofinish -doExperiments consed -ace autofinish.fasta.screen.ace.1 -autofinish -doExperiments and look at the .out file from the 2nd run. (You can find the most recent .out file by typing 'ls -tlr'.) You should see lines such as this: rejecting experiment: reverse universal primer read with template djs228_1094 because an earlier round of autofinish called this with expid: 1 rejecting experiment: reverse universal primer read with template djs228_1422 because an earlier round of autofinish called this with expid: 2 rejecting experiment: reverse universal primer read with template djs228_1034 because an earlier round of autofinish called this with expid: 3 This is Autofinish trying experiment after experiment but finding they were already suggested in an earlier round of Autofinish. You should not type '-doExperiments' if you do not intend to do the experiments Autofinish suggests. If you use -doExperiments, but you don't really do the experiments, and then you run Autofinish again, Autofinish will be very upset--it will think that all of its suggested experiments failed (because it can't find them). It will see that all of the problems are still present but it will think that it should not choose any of those same experiments again so it will suggest different experiments that will not be as good as its original suggestions. -doExperiments will also cause oligos to be tagged. Primer id's created by Autofinish use the same naming scheme as primers created in Consed and they will not conflict with each other. For example, if Autofinish creates oligos djs14.1, djs14.2, and djs14.3, then the next primer that a user accepts will be djs14.4. If Autofinish is run a second time, it will start with primer djs14.5. 
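The primer numbering in the example above (djs14.1, djs14.2, djs14.3, then djs14.4) amounts to "take the highest number used so far and add one". The sketch below only illustrates that reading of the example; it is not Consed's code, and next_primer_id is an invented name.

```python
import re
from typing import Iterable


def next_primer_id(prefix: str, existing: Iterable[str]) -> str:
    """Return the next unused '<prefix>.<n>' primer id.

    Illustrates the numbering described in the text; ids that do not
    have the form '<prefix>.<number>' are ignored.
    """
    pattern = re.compile(re.escape(prefix) + r"\.(\d+)$")
    numbers = [int(m.group(1)) for m in map(pattern.match, existing) if m]
    return f"{prefix}.{max(numbers, default=0) + 1}"
```

With the ids djs14.1 through djs14.3 already in use, the function returns djs14.4; once that one is taken, the next call returns djs14.5.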
When you have completed this exercise with -doExperiments, replace the original .ace file by typing:

    cp autofinish.fasta.screen.ace.1.save autofinish.fasta.screen.ace.1

80) AUTOFINISH: doNotFinish particular regions

If there is a region that you don't care to finish (e.g., it has already been finished by an overlapping clone or you know there is no gene there), then you can put a doNotFinish tag on the consensus and Autofinish will not try to finish this area.

First, delete the .consedrc file (or, if you are using the Edit Parameter Window of Consed, restore the parameters to their default values) and run Autofinish again:

    consed -ace autofinish.fasta.screen.ace.1 -autofinish

Bring up consed:

    consed -ace autofinish.fasta.screen.ace.1

and put a doNotFinish tag on the region from 2000 to 4000. (If you don't know how to do that, read through the Consed Quick Tour, above.) Save the assembly as autofinish.fasta.screen.ace.2

Run Autofinish again:

    consed -ace autofinish.fasta.screen.ace.2 -autofinish

Look at the .out files for each of the 2 runs of Autofinish. (You can find the most recent .out files by typing 'ls -tlr'.) You will notice in the .out file for the 2nd run of Autofinish that, other than the experiments to extend the contig to the left, there is only one experiment, which is from 315 to 1662. If you find that experiment in the .out file, it will say

    Contig1 0.05 errors fixed in region from 315 to 1662
    fixing 0.05 errors from 969 to 978

The "969 to 978" gives the worst 10 base window that the read is intended to fix. If you look with Consed, you will see that there is a quality 12 base at 974.
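The error arithmetic used throughout these exercises follows from the phred definition of quality: a base of quality Q has error probability 10^(-Q/10), and the expected number of errors in a region is the sum over its bases. The sketch below reproduces the figures from step 73 and finds the worst window of a given width by sliding a window along the qualities. The function names are invented, and the way the window is scanned is an assumption for illustration, not a description of how Autofinish does it internally.

```python
from typing import Sequence, Tuple


def error_probability(quality: int) -> float:
    """Phred definition: quality Q means error probability 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def expected_errors(qualities: Sequence[int]) -> float:
    """Expected number of errors in a run of bases."""
    return sum(error_probability(q) for q in qualities)


def worst_window(qualities: Sequence[int], width: int = 10) -> Tuple[int, float]:
    """Return (0-based start index, expected errors) of the worst window."""
    if len(qualities) < width:
        raise ValueError("sequence shorter than window")
    probs = [error_probability(q) for q in qualities]
    current = sum(probs[:width])
    best_start, best = 0, current
    for start in range(1, len(probs) - width + 1):
        # Slide by one base: add the base entering, drop the base leaving.
        current += probs[start + width - 1] - probs[start - 1]
        if current > best:
            best_start, best = start, current
    return best_start, best
```

expected_errors([20, 20]) and expected_errors([30] * 20) both come to 0.02, and one quality 20 base matches ten quality 30 bases at 0.01, as stated in step 73. A single quality 12 base contributes about 0.063 expected errors on its own, which is why a window containing one stands out.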
You can also use doNotFinish tags to prevent Autofinish from *extending* a contig into a gap by putting a doNotFinish tag near the end of the contig and setting the following Autofinish parameters:

    consed.autoFinishDoNotExtendContigsWhereTheseTagsAre: doNotFinish
    consed.autoFinishDoNotExtendContigsIfTagsAreThisCloseToContigEnd: 50

81) AUTOFINISH: NOT USING PARTICULAR SUBCLONE TEMPLATES

If you no longer have a template that was used in shotgun, and thus you don't want Autofinish to pick that template, you can put it in a file badTemplates.txt in edit_dir. This is a simple file with one name per line.

Using your favorite UNIX editor, create a file called "badTemplates.txt" in edit_dir. Make it contain a single line:

    djs228_1094

Delete .consedrc (or, if you are using the Edit Parameter Window, restore the parameters to their defaults) and run autofinish again:

    consed -ace autofinish.fasta.screen.ace.1 -autofinish

Search the .out file for djs228_1094. You will find one line like this:

    not using template: djs228_1094 because in bad templates file

Now try deleting badTemplates.txt and running autofinish again the same way. You will notice there are many differences in reads chosen, since djs228_1094 is now available again for making reverses as well as a template for custom primer walks.

badTemplates.txt can accept "*" (match any characters) as part of the name. For example,

    djs140_23*

will eliminate templates:

    djs140_235684
    djs140_235783
    djs140_2326
    etc.

82) AUTOFINISH: NOT USING ENTIRE LIBRARIES FOR FINISHING

In addition to the badTemplates.txt file, you can use a badLibraries.txt file which contains a list of all libraries that are off-limits to Autofinish (e.g., you threw away all subclone templates from this library or they are from a different lab which gave you the chromatograms but not the templates).
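The "*" wildcard described for badTemplates.txt in step 81 behaves like a shell filename pattern, so you can preview which templates a pattern would eliminate before running Autofinish. The sketch below uses Python's fnmatch module for this. It is an approximation: fnmatch also gives special meaning to "?" and "[...]", which this document does not say badTemplates.txt supports. The function names are invented.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def load_patterns(path: str) -> List[str]:
    """Read one template name or pattern per line, ignoring blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def is_bad_template(template: str, patterns: Iterable[str]) -> bool:
    """True if the template matches any entry; '*' matches any characters."""
    return any(fnmatchcase(template, pattern) for pattern in patterns)
```

With the pattern djs140_23*, the templates djs140_235684, djs140_235783, and djs140_2326 all match, while a template such as djs140_2411 does not.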
Autofinish determines the library of a read by the following in the PHD file:

    WR{
    template dscript 990603:090231
    name: djs366_101
    lib: library1
    }

where "library1" is replaced by the actual library name. Take a look at any phd file in autofinish/phd_dir and you will see this.

Make sure that badTemplates.txt is deleted and .consedrc is deleted (or use the Edit Parameter Window to restore the defaults) and run Autofinish again:

    consed -ace autofinish.fasta.screen.ace.1 -autofinish

Now create a file badLibraries.txt containing a single line:

    lib1

and run autofinish again:

    consed -ace autofinish.fasta.screen.ace.1 -autofinish

Look at the .out file. You will see lines like this:

    not using template: djs228_1034 because in bad libraries file
    not using template: djs228_1051 because in bad libraries file
    not using template: djs228_1094 because in bad libraries file
    .
    .
    .

You will see that there are no reads suggested that use any of these templates, even though some of them (e.g., djs228_1034) were used in the Autofinish run (above) before you created the badLibraries.txt file.

When you start doing this with your own data, you must put the lib: line into your phd files. Do this by modifying determineReadTypes.perl.

83) MULTIPLE LIBRARIES WITH DIFFERENT INSERT SIZES

If different libraries have different insert sizes, Autofinish must know the insert size of each library. It gets this information from a file called "librariesInfo.txt" which must be placed in edit_dir (where the ace file is). This file looks like this:

    LIB{
    name: lib0
    avgInsertSize: 1500
    maxInsertSize: 3000
    stranded: double
    cost: 600.0
    }

    LIB{
    name: lib1
    avgInsertSize: 3000
    maxInsertSize: 5000
    stranded: double
    cost: 1000.0
    }

    LIB{
    name: lib2
    avgInsertSize: 10000
    maxInsertSize: 12000
    stranded: double
    cost: 5000.0
    }

'name' is the name of the library. This is the name that goes into the PHD files after the 'lib:' keyword (see AUTOFINISH: NOT USING ENTIRE LIBRARIES FOR FINISHING above).
'avgInsertSize' is the average insert size of the library--the figure to be used by Autofinish if there are not enough forward/reverse pairs for Autofinish to calculate the mean insert size of the library.

'maxInsertSize' is the maximum insert size--if forward/reverse pairs are further apart than this, Autofinish will assume these reads are misassembled.

'stranded' is whether this template is single or double stranded.

'cost' is the cost of making a minilibrary out of a template from this library.

In .consedrc, there must be a line like this:

    consed.primersMaxInsertSizeOfASubclone: 5000

where 5000 is replaced by the largest maximum insert size among all of your libraries. For this exercise make .consedrc have a single line:

    consed.primersMaxInsertSizeOfASubclone: 12000

Alternatively, use the Edit Parameter Window to set consed.primersMaxInsertSizeOfASubclone to 12000.

For this exercise I have a file in edit_dir called "librariesInfo.txt_hide". To make Autofinish pay attention to it, do the following:

    cp librariesInfo.txt_hide librariesInfo.txt

Delete badLibraries.txt:

    rm badLibraries.txt

Before you run Autofinish again, first restart Consed:

    consed -ace autofinish.fasta.screen.ace.1

On Consed's Main Window, point to 'Info', hold down the left mouse button, and release on 'Show Library Info'. You should see the names of your libraries and the correct number of reads in each library. This feature will be useful in debugging your use of librariesInfo.txt

Then run Autofinish again:

    consed -ace autofinish.fasta.screen.ace.1 -autofinish

Look at the .out file. Look for the following:

    "Choosing de novo universal primer reads to try to close gaps"

You will see there are many reads under this heading. These are the lib1 and lib2 reads that have a large average insert size and thus span the gap. Autofinish did not choose some of these reads before because, if the insert size were only 1500 bases, these reads would not have helped to close the gap.
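The value for consed.primersMaxInsertSizeOfASubclone can be read straight off librariesInfo.txt: it is the largest maxInsertSize of any library. The sketch below parses the LIB{ ... } blocks and computes that value. It is not part of Consed, the names are invented, and it assumes the file contains only blocks of "field: value" pairs like the example above.

```python
import re
from typing import Dict, List

# One LIB{ ... } block, and one 'field: value' pair inside a block.
LIB_BLOCK = re.compile(r"LIB\{(.*?)\}", re.DOTALL)
FIELD = re.compile(r"(\w+):\s*(\S+)")


def parse_libraries_info(text: str) -> List[Dict[str, str]]:
    """Return one {field: value} dict per LIB{ ... } block."""
    return [dict(FIELD.findall(body)) for body in LIB_BLOCK.findall(text)]


def max_insert_size(libraries: List[Dict[str, str]]) -> int:
    """Largest maxInsertSize across all libraries."""
    return max(int(lib["maxInsertSize"]) for lib in libraries)
```

For the three libraries in the example file, max_insert_size gives 12000, which is the value used for consed.primersMaxInsertSizeOfASubclone in this exercise.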
When you are done with this exercise, delete librariesInfo.txt and .consedrc

84) AUTOFINISH CLOSING GAPS WITH MINILIBRARIES

If you want Autofinish to suggest *only* minilibraries to close gaps, use the following parameters:

    consed.autoFinishAllowWholeCloneReads: false
    consed.autoFinishAllowCustomPrimerSubcloneReads: false
    consed.autoFinishAllowResequencingReads: false
    consed.autoFinishAllowDeNovoUniversalPrimerSubcloneReads: false
    consed.autoFinishAllowPCR: false
    consed.autoFinishAllowResequencingAUniversalPrimerAutofinishRead: false
    consed.autoFinishCallReversesToFlankGaps: false
    consed.autoFinishAllowMinilibraries: true
    consed.autoFinishAlwaysCloseGapsUsingMinilibraries: true
    consed.autoFinishPrintMinilibrariesSummaryFile: true

For this exercise, type:

    cd assembly_view/edit_dir

(You might need to first type "cd ../.." depending on where you are.)

Attention! This is *not* the same directory you have been using. The directory you have been using, autofinish/edit_dir, does not have any gaps so it cannot be used for this exercise.

Create a .consedrc file with the parameters above in it. Alternatively, start Consed in this directory and use the Edit Parameter Window to modify the parameters as above. Then run Autofinish:

    consed -ace assembly_view.fasta.screen.ace.1 -autofinish

When it has completed, look in the .out file.
You will see the following: Enough existing fwd/rev pairs to establish: Left end of Contig3 has 13 fwd/rev pairs connecting it to Right end of Contig2 with gap size -460 (contigs overlap) Trying to suggest minilibrary for gap between right end of Contig2 and left end of Contig3 MINILIBRARY{ best template: djs736a2_fp04q274 from lib djs736a2 size: 3607 errors fixed: 0.01 errors fixed per dollar: 0.00 connecting right end of Contig2 to left end of Contig3 with estimated gap size -460 alternative template: djs736a1_fp02q472 from lib djs736a1 size: 1184 errors fixed: 0.01 errors fixed per dollar: 0.00 } You will also see a more terse (but more easily parseable) description in the .minilibraries file. The parameter: consed.autoFinishPrintMinilibrariesSummaryFile: true will cause Autofinish to print a file with name similar to: (project name).001014.155627.minilibraries Or you could be more sparing in which gaps you close with minilibraries and which you do not: consed.autoFinishAlwaysCloseGapsUsingMinilibraries: false If the parameter above is set to false, then Autofinish will only choose minilibraries if the gap is the size below or larger: consed.autoFinishSuggestMinilibraryIfGapThisManyBasesOrLarger: 800 If you try this in this example, you will see that Autofinish will not suggest a minilibrary because the gap has negative size (meaning the contigs overlap) and thus is not more than 800 bases. Autofinish can suggest more than one minilibrary per gap: consed.autoFinishSuggestThisManyMinilibrariesPerGap: 2 is the default, but you can increase it. If you try this, you will see more alternate templates suggested for the minilibrary. When you are done, delete the .consedrc file. 
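The interaction of the two minilibrary parameters described above can be written out as a small decision rule. This sketch only restates the documented behavior for a single gap; it is not Autofinish's code, and the function name is invented.

```python
def should_suggest_minilibrary(gap_size: int,
                               always_close_gaps: bool,
                               min_gap_size: int = 800) -> bool:
    """Documented rule for whether a gap qualifies for a minilibrary.

    always_close_gaps stands for
        consed.autoFinishAlwaysCloseGapsUsingMinilibraries
    min_gap_size stands for
        consed.autoFinishSuggestMinilibraryIfGapThisManyBasesOrLarger
    A negative gap_size means the contigs overlap.
    """
    if always_close_gaps:
        return True
    return gap_size >= min_gap_size
```

For the gap in this exercise (estimated size -460), the rule gives True when consed.autoFinishAlwaysCloseGapsUsingMinilibraries is true and False when it is false, matching what you see in the .out file.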
85) CLOSING GAPS USING PCR

If you are interested in just closing remaining gaps with PCR, you can set the following Autofinish parameters:

    consed.autoFinishAllowWholeCloneReads: false
    consed.autoFinishAllowCustomPrimerSubcloneReads: false
    consed.autoFinishAllowResequencingReads: false
    consed.autoFinishAllowDeNovoUniversalPrimerSubcloneReads: false
    consed.autoFinishAllowMinilibraries: false
    consed.autoFinishAllowPCR: true
    consed.autoFinishAllowResequencingAUniversalPrimerAutofinishRead: false
    consed.autoFinishCallReversesToFlankGaps: false

This will cause Autofinish to try to close all gaps using PCR.

Type:

    cd assembly_view/edit_dir

(You might need to first type "cd ../.." depending on where you are.) (This is because the autofinish/edit_dir project does not have any gaps.)

Create a .consedrc file with the parameters above in it. Then run Autofinish:

    consed -ace assembly_view.fasta.screen.ace.1 -autofinish

When it has completed, look in the .out file. You will see the following:

    436 acceptable primers on left end of contig Contig1
    503 acceptable primers on right end of contig Contig1
    520 acceptable primers on left end of contig Contig2
    202 acceptable primers on right end of contig Contig2
    523 acceptable primers on left end of contig Contig3
    377 acceptable primers on right end of contig Contig3

    Please make the following PCR primers:
    ttctgggtctggaggaca 485 to 502 (bottom strand) for left end of Contig1 melt: 57.5
    taattgggactataggtacatgc 13202 to 13224 (top strand) for right end of Contig1 melt: 55.1
    tttgttttgttttgtattttgttt 504 to 527 (bottom strand) for left end of Contig2 melt: 55.1
    ctgttctcctgtcattctgg 9950 to 9969 (top strand) for right end of Contig2 melt: 55.7
    gggcaagagctgtaaagag 114 to 132 (bottom strand) for left end of Contig3 melt: 55.3
    accaaataacaggtaaaccaaa 15503 to 15524 (top strand) for right end of Contig3 melt: 55.4

    Do PCR reactions with the following pairs of primers:
    ttctgggtctggaggaca tttgttttgttttgtattttgttt
    ttctgggtctggaggaca ctgttctcctgtcattctgg
    ttctgggtctggaggaca gggcaagagctgtaaagag
    ttctgggtctggaggaca accaaataacaggtaaaccaaa
    taattgggactataggtacatgc tttgttttgttttgtattttgttt
    taattgggactataggtacatgc ctgttctcctgtcattctgg
    taattgggactataggtacatgc gggcaagagctgtaaagag
    taattgggactataggtacatgc accaaataacaggtaaaccaaa
    tttgttttgttttgtattttgttt gggcaagagctgtaaagag
    tttgttttgttttgtattttgttt accaaataacaggtaaaccaaa
    ctgttctcctgtcattctgg gggcaagagctgtaaagag
    ctgttctcctgtcattctgg accaaataacaggtaaaccaaa

    printing experiment summary files:
    assembly_view.020320.130220.univForwards
    assembly_view.020320.130220.univReverses
    assembly_view.020320.130220.customPrimers
    assembly_view.020320.130220.sorted

The first list gives the PCR primers to synthesize. The second list gives which pairs of the primers in the first list to do PCR with. (It doesn't make sense to do PCR with the primer off the left end of Contig2 and the primer off the right end of Contig2.) Some of these pairs of PCR primers will give products and others will not (or will give enormous products). The ones giving products will tell you how the contigs are ordered and oriented. You can then sequence the product to find the gap sequence between the contigs.

86) AUTOFINISH: TOO MANY UNIVERSAL PRIMER READS

St Louis wanted more universal primer reads, so I put in a feature that allows for redundant universal primer reads. If you get too many for your taste, then put this into your .consedrc file:

    consed.autoFinishRedundancy: 1.0

The default is 2.0, meaning that Autofinish will try to fix every problem area twice--once by some universal primer reads and once again by other universal primer reads. Then, and only then, will it try oligo walks to finish remaining problems.
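Going back to the PCR exercise in step 85: the list of primer pairs follows a simple rule, namely every pair of primers except the two that point out of opposite ends of the same contig. With 6 primers that is 15 possible pairs minus 3 same-contig pairs, leaving the 12 pairs printed in the .out file. The sketch below reproduces that list from the primers in the excerpt. It illustrates the rule only and is not how Autofinish is implemented.

```python
from itertools import combinations
from typing import List, Tuple

# (primer sequence, contig it points out of), as printed in the .out excerpt.
PRIMERS = [
    ("ttctgggtctggaggaca", "Contig1"),        # left end
    ("taattgggactataggtacatgc", "Contig1"),   # right end
    ("tttgttttgttttgtattttgttt", "Contig2"),  # left end
    ("ctgttctcctgtcattctgg", "Contig2"),      # right end
    ("gggcaagagctgtaaagag", "Contig3"),       # left end
    ("accaaataacaggtaaaccaaa", "Contig3"),    # right end
]


def pcr_pairs(primers: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """All primer pairs except those whose two primers come from one contig."""
    return [(a, b)
            for (a, contig_a), (b, contig_b) in combinations(primers, 2)
            if contig_a != contig_b]
```

pcr_pairs(PRIMERS) returns the 12 pairs in the same order as the .out file lists them.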
Baylor wanted more reverses to close gaps, so I put a feature into Autofinish that calls *all* reverses near gaps: (contig) ___________________________ <- reverse 1 <- reverse 2 <- reverse 3 <- reverse 4 (including reverses that are likely to fall into the gap) in the hope that enough of them will hook onto each other that the gap will be closed. (If there is already a reverse pointing out but no forward, Autofinish will suggest the forward.) If this feature gives you too many reverses for your taste, then in your .consedrc file: consed.autoFinishNearGapsSuggestEachMissingReadOfReadPairs: false 87) AUTOFINISH FOR CDNA ASSEMBLIES The way to use Autofinish for cDNA assemblies is to pretend that the cDNA is a BAC and that you are only going to allow whole clone custom primer BAC reads. To do this, put the following into your .consedrc file: consed.autoFinishAllowResequencingReads: false consed.autoFinishAllowWholeCloneReads: true consed.autoFinishAllowCustomPrimerSubcloneReads: false consed.autoFinishAllowDeNovoUniversalPrimerSubcloneReads: false consed.autoFinishAllowPCR: false consed.autoFinishCDNANotGenomic: true consed.autoFinishCheckThatReadsFromTheSameTemplateAreConsistent: false consed.checkIfTooManyWalks: true consed.autoFinishExcludeContigIfOnlyThisManyReadsOrLess: 0 consed.autoFinishExcludeContigIfDepthOfCoverageOutOfLine: false consed.autoFinishExcludeContigIfThisManyBasesOrLess: 0 consed.autoFinishCoverSingleSubcloneRegions: false consed.autoFinishContinueEvenThoughReadInfoDoesNotMakeSense: true consed.autoFinishCallReversesToFlankGaps: false You don't want Autofinish to try to extend off the 3' end or the 5' end of the cDNA, right? How is Autofinish going to determine that? 
It determines it as follows: In the 5' end read, put the following into the phd file: WR{ primer determineReadTypes 001019:112654 type: univ fwd } WR{ template determineReadTypes 001019:112654 name: cDNA1 } In the 3' end read (the read that is primed off the polyA tail), put the following into the phd file: WR{ primer determineReadTypes 001019:112654 type: univ rev } WR{ template determineReadTypes 001019:112654 name: cDNA1 } For all other reads, such as transposon reads and custom primer walks, put the following into the phd file: WR{ primer dscript 001019:112654 type: walk } WR{ template determineReadTypes 001019:112654 name: cDNA2 type: bac } If you are going to finish many cDNA's, you will find it will work better to modify determineReadTypes.perl than to go editing every phd file. So Autofinish finds the univ fwd read and assumes it indicates the 5' end of the cDNA and it finds the univ rev read and assumes it indicates the 3' end of the cDNA. (The parameter consed.autoFinishCDNANotGenomic: true tells it to try to find the end of the cDNA in this manner.) There is one additional problem when using Autofinish for cDNA assemblies: initially, the ace file created by phrap is empty since the 3' and 5' reads don't overlap enough. You have *no* contigs for Autofinish to finish. So phrap is of no use initially. But you can use Consed to create the assembly: First run phredPhrap to phred both reads and run determineReadTypes.perl Then pick the 3' read and run phd2Ace.perl on it: phd2Ace.perl (name of phd file) This will give you an ace file with one read in it. Now suppose that you have other reads from the same cDNA. You can use this technique to add them to the ace file: To add all the reads phrap has neglected to put into the ace file, do the following: 1. create a file of phd filenames. 
E.g.,

    djs74_1180.s1.phd.1
    djs74_1432.s1.phd.1
    djs74_1455.s1.phd.1
    djs74_1465.s1.phd.1
    djs74_1532.s1.phd.1
    djs74_1802.s1.phd.1
    djs74_1803.s1.phd.1

Typically, you will get this list of phd files by looking in the singlets file.

2. Then run consed:

    consed -ace old_ace.ace -addReads fileOfPhdFiles.txt -newAceFilename new_ace.ace

where:
    fileOfPhdFiles.txt is the name of the file (above) containing the phd filenames
    new_ace.ace is whatever you want the new ace file to be named
    old_ace.ace is the name of the old ace file

Now you have an ace file that contains all the reads you have sequenced for that cDNA. You can now run Autofinish on it:

    consed -ace new_ace.ace -autofinish

88) AUTOFINISH FOR LISTING GAP-SPANNING TEMPLATES

Sometimes people ask me how to make Autofinish suggest all templates that span a gap. People who ask this question are not using Autofinish to automate finishing--they are using it as a tool in the hand of a human finisher. Although evidence has shown that Autofinish is far more powerful in an automated mode, it is also a powerful tool in the hands of a human finisher. I will specify how to do this, but hope you will move to the next level of using it in an automated manner.
One method is to just shut off Autofinish suggesting any experiments at all:

    consed.autoFinishCallReversesToFlankGaps: false
    consed.autoFinishAllowDeNovoUniversalPrimerSubcloneReads: false
    consed.autoFinishAllowResequencingReads: false
    consed.autoFinishCoverSingleSubcloneRegions: false
    consed.autoFinishCoverLowConsensusQualityRegions: false
    consed.autoFinishNearGapsSuggestEachMissingReadOfReadPairs: false
    consed.autoFinishAllowCustomPrimerSubcloneReads: false
    consed.autoFinishAllowPCR: false
    consed.autoFinishAllowMinilibraries: false

Thus Autofinish will order and orient the contigs, printing out the forward/reverse pairs that connect the contigs, as exemplified below:

Examining existing fwd/rev pairs that flank gap at Right end of Contig46

read                       (start  end)  mate read           mate contig  (start     end)  contig
                           (pos    pos)  name                name         (pos       pos)  length
agroa3_fp19q452.x1u3   ->    -13   824   agroa3_fp19q452.y1  Contig47 ->     365    1282     1844
mag3gpk041f9.y1        ->     29   689   mag3gpk041f9.x1     Contig53 ->  139675  140402   140921
agroa3_fp06q173.x1u3_m ->     91  1053   agroa3_fp06q173.y1  Contig53 ->  139683  140618   140921
mag2gpk013f21.y1       ->    314   887   mag2gpk013f21.x1    Contig57 ->  257472  258018   261664
mag3gpk064f6.x1        ->    542  1173   mag3gpk064f6.y1     Contig53 ->  139061  139709   140921
mag3gpk105a18.y1       ->    710  1318   mag3gpk105a18.x1    Contig53 ->  138774  139329   140921

Enough existing fwd/rev pairs to establish:
Right end of Contig53 has 4 fwd/rev pairs connecting it to Right end of Contig46 with gap size -17 (contigs overlap)

This shows you that (according to the naming convention of this lab), the following templates span the gap between the right end of Contig53 and the right end of Contig46 (clearly one of these contigs is complemented with respect to the other):

    agroa3_fp19q452
    mag3gpk041f9
    agroa3_fp06q173
    mag2gpk013f21
    mag3gpk064f6
    mag3gpk105a18

However, what are you going to do now with these templates? Walk on them? Resequence the universal primer reads?
Whichever you plan to do, why not allow Autofinish to make the suggestions and spend your time on the harder problems?

89) FINISHING A SPECIFIC CONTIG

    ../../consed -ace autofinish.fasta.screen.ace.1 -autofinish -contig Contig1

This will just finish Contig1.

90) MARKING THE END OF THE CLONE

Autofinish tries its best to recognize the end of the clone (BAC), and it does pretty well, but you might have information it doesn't have, such as knowing the sequencing of the BAC or having reads that were primed from within the BAC vector. You can tell Autofinish this information by adding cloneEnd tags. You can do this in Consed as follows:

In the Aligned Reads Window put the cursor on the consensus at the base position marking the end of the insert. Point at the "Misc" menu, hold down the left mouse button and release on "Add Clone End Tag With Insert To Right" (alternatively, "Add Clone End Tag With Insert To Left"). Then save the assembly and run Autofinish.

If you want such tags to be added automatically, you could write a perl script to append such tags to the ace file.

----------------------------------------------------------------------------

USING AUTOPCRAMPLIFY

If you know a region of an assembly that you would like to amplify using PCR, you can use this program to pick the primers. We have used it for amplifying cDNAs. It can handle very high throughput: on a slow computer it takes about 5 minutes to find PCR primers for a hundred different regions.

AutoPCRAmplify requires a file from you that specifies the regions you want to amplify:

Type:

    cd autofinish/edit_dir

(You might need to first type "cd ../.." depending on where you are.)

    more autoPCRAmplify.txt

You will see such a file:

    regionA Contig1 20-100 -> Contig1 1000-1160 <- biggest
    regionB Contig1 5200-5270 -> Contig1 6000-6050 <- smallest

regionA and regionB are the identifiers for the 2 regions that will be amplified.

"Contig1 20-100 ->" says that you want a top strand primer to be within the region 20-100 of Contig1.
"Contig1 1000-1160 <-" says you want the bottom strand primer within the region 1000 to 1160 of Contig1.

"biggest" means that if there is more than one pair of primers that satisfies the above conditions (there will typically be zillions), you want the one that gives the largest product. "smallest" means you want the smallest product.

Run autoPCRAmplify by typing:

    consed -ace autofinish.fasta.screen.ace.1 -autoPCRAmplify autoPCRAmplify.txt

Type:

    ls -tlr

to see what file was just created. It should be a file called autofinish.020301.113327.out (where 020301.113327 is replaced by the current date/time in the format YYMMDD.HHMISS)

Look at this file with your favorite UNIX editor and you will see the following (with lots of interspersed information):

PRIMER_PAIR {
id: regionA
product_size: 1136
tgatttaatataatttcaagaaaatcc temp: 56 24-50 in Contig1
cattgtggttttaatttgcatt temp: 57 1138-1159 in Contig1
}

PRIMER_PAIR {
id: regionB
product_size: 791
ggctgacgcttgtaatcc temp: 57 5249-5266 in Contig1
gattatacgcgtgagcca temp: 56 6022-6039 in Contig1
}

These are the primer pairs for the 2 regions.

----------------------------------------------------------------------------

ADVANCED PHRAP/CONSED USAGE

91) BACKING OUT EDITS AFTER YOU HAVE SAVED THE ASSEMBLY

If you decide that all your edits are terrible and you want to start over (perhaps you have been training a new finisher), the cleanest solution is to delete everything in phd_dir and edit_dir, but leave everything in chromat_dir and just run phredPhrap again.

92) SELECTIVELY BACKING OUT EDITS AND REMOVING READS

If you want to back out all edits in just particular reads, I have provided a perl script to do this:

    revertToUneditedRead (read name)

What it does is copy the .phd.1 to 1 greater than the highest version. Then you must reassemble using the phredPhrap script to create an ace file that has no edits for that particular read. It will have all edits for all other reads.
Why doesn't it just delete all phd files except for the .phd.1? In that case, Consed could not read any previous ace file since all previous versions of ace files would refer to phd files that have been deleted. 93) REMOVING READS FROM AN ASSEMBLY Create a file containing the filename of all the reads you want to remove, one filename per line. Then use the perl script removeReads Then reassemble using the phredPhrap script. 94) ADDING READS WITHOUT CHROMATOGRAM FILES This may happen if you, for example, download sequence from Genbank and want to assemble it along with your reads. There are 2 ways to do this, depending on whether you want to edit the read or not. a) If you want to edit the read, run mktrace to produce a fake trace. It will have all perfect peaks. Run: mktrace (name of file with fasta sequence) Then run the phredPhrap script normally. You will be able to bring up the traces in Consed and edit the read. b) If it is not important to edit the reads, there is a method that is a little faster. Create just a fake phd file using: fasta2Phd.perl (name of file with fasta sequence) It will create a file whose name is taken from the fasta file name: for example, if the fasta filename is Contig1.c.fasta, then the phd file will be called Contig1.c.phd.1 The fasta name in the file is ignored. You can then put this in the phd_dir, and reassemble using the phredPhrap script. If the reads are really fake (you don't want the templates to be chosen by Consed/Autofinish as a template for a primer), then the read should end with an extension .c or .a or .c1 or .c2 ... or .a1 or .a2 or ... This indicates to Consed/Autofinish that the read is a fake read. Note: when you are creating phd files such as this, you must start with (read name).phd.1 Do not start with (read name).phd.2 or any higher version number. This is because Consed looks for the .1 version in order to find the original phred calls so it expects there to be a .1 version. 
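The two naming rules in this section (the phd filename is taken from the fasta filename and always starts at version 1, and fake reads carry a .c/.a style extension) can be sketched in a few lines of Python. This is only an illustration, not part of Consed: the function names are hypothetical, and it assumes the fasta filename ends in ".fasta" as in the Contig1.c.fasta example.

```python
import re
from pathlib import Path

# Extensions the text says mark a fake read: .c, .a, .c1, .c2, ..., .a1, .a2, ...
FAKE_READ_PATTERN = re.compile(r"\.[ca]\d*$")


def phd_name_for_fasta(fasta_filename: str) -> str:
    """Return the phd filename for a fasta file, e.g. Contig1.c.fasta -> Contig1.c.phd.1.

    The version is always 1, because Consed looks for the .1 version.
    """
    name = Path(fasta_filename).name
    suffix = ".fasta"
    if not name.endswith(suffix):
        # Assumption of this sketch: input files are named (read name).fasta
        raise ValueError(f"expected a name ending in {suffix}: {fasta_filename}")
    read_name = name[: -len(suffix)]
    return f"{read_name}.phd.1"


def is_fake_read(read_name: str) -> bool:
    """True if the read name ends with a fake-read extension such as .c or .a2."""
    return FAKE_READ_PATTERN.search(read_name) is not None


if __name__ == "__main__":
    print(phd_name_for_fasta("Contig1.c.fasta"))
    print(is_fake_read("Contig1.c"), is_fake_read("djs74_1180.s1"))
```

A check like is_fake_read can be handy before putting downloaded sequence into phd_dir, to confirm the name will keep Consed/Autofinish from choosing it as a template.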
95) FASTER CONSED STARTUP

You can greatly speed up Consed startup if you are willing to use more disk space. The disk space used will be about equal to the total space used by the PHD files. Try this with a large dataset (you won't notice any difference with the test datasets that come with Consed.)

To use this method of startup:

1) cd to directory where ace file is kept
2) type:
       catPhdFiles.perl
   (This will create a file called phd.ball which is big.)
3) start consed normally

In many situations, this will greatly speed up Consed startup. The amount of speedup depends on which operating system is used: on Linux, the time to read phd files dropped from 75 seconds to 8 seconds, and thus the total time to start up consed dropped from 86 seconds to 17 seconds. I saw similar speedups on Solaris where the phd files are on an nfs mounted disk. However, there was another situation in which the startup time was the same.

Warning: If you create phd.ball as above, Consed will be reading most phd files from phd.ball instead of from ../phd_dir. If you delete phd files in phd_dir, you must also delete phd.ball. Otherwise Consed will give lots of error messages "TIME STAMP MISMATCH" and many things will not work correctly.

96) WHY ARE ALL THE READS NOT IN THE ASSEMBLY?

You will notice that there are some contigs that contain only one read. You will also notice that there are some reads that are not shown by Consed at all, since phrap did not put them into the ace file. Why?

If a read does not have a significant match (with Smith-Waterman score exceeding minscore) to any other read, that read is not included in the ace file. Instead, that read is put in the '.singlets' file. That read will not appear in Consed.

If a read does have a significant match to any other read, then it will appear in the ace file and be shown by Consed.
However, such a read might have other problems: it might not be possible to assemble such a read with other reads. In the case of EST's, this read may be a unique representative of a particular gene (or a genomic sequence contaminant) that happens to contain an Alu repeat and thus happens to match other reads in the data set; or it may represent the only read of a particular alternatively spliced form; or it may have data anomalies of some sort (chimeras, etc.). Such a read would end up in a contig all of its own.

97) ARE THERE READS THAT ARE TOTALLY UNALIGNED?

Unfortunately, yes. In my opinion, Phrap shouldn't have put them in the assembly at all. But we just have to live with it. You can find out whether a read is totally unaligned by pointing at the read name in the Aligned Reads Window and holding down the right mouse button. Consed will tell you the aligned positions, the high quality position, and the chemistry of the read.

98) VIEWING THE CHROMATOGRAM OF SINGLETS OR NON-ASSEMBLED READS

If you have a chromatogram, you can use Consed to view it, even if it hasn't been assembled into the ace file. This is common with cDNA assemblies in which the reads don't overlap and thus phrap doesn't put them together into a contig.

To do this, make the same edit_dir, phd_dir, and chromat_dir as above, put the chromatogram into chromat_dir, and run phred on it to generate the phd file, which goes into phd_dir. Then go to edit_dir and run:

    phd2Ace.perl (name of phd file)

For example, if your phd file is myRead.phd.1, then from edit_dir, type:

    phd2Ace.perl myRead.phd.1

This will produce myRead.ace

Then just start Consed normally:

    consed -ace myRead.ace

and you can view the chromatogram.

MULTIPLE TRACE POPUP

99) Bring up dataset standard. In the Aligned Reads window, scroll to a region that has many reads and that has some discrepancies--try position 1162. Hold down the shift key, and click with the middle mouse button on the consensus.
At this location 3 traces will pop up--these are the 2 highest quality traces that agree with the consensus (one on each strand) and the highest quality trace that disagrees with the consensus. This feature is useful in areas of high coverage when you want to rapidly examine just the most significant traces rather than looking at all of them.

MAXIMUM NUMBER OF TRACES DISPLAYED

100) Bring up dataset standard. Scroll to position 1162. Bring up 4 reads and then try bringing up additional reads. You will notice that new reads are put at the top of the stack of traces and, once there are 4 traces displayed, traces are automatically removed from the bottom of the stack. If you want to change this maximum number of traces to something besides 4, you can do that:

In the Main Consed Window (click on 'Find Main Win' on the Aligned Reads window), pull down the 'Options' menu, and release on 'General Preferences'. Try changing the 'Max Number of Traces Shown' to 3. Then click 'Apply and Dismiss'. Now dismiss the Trace Window and again start adding additional traces to the Trace Window. You will notice that now the number of traces shown will not exceed 3.

HOTKEYS FOR EDITING

101) If you do a lot of editing, you will want a faster method of making these edits than bringing up the popup and selecting an option. Thus the following hot keys exist:

    < and > (less than and greater than) to make n's to the left and the right (respectively) of the cursor

    control-l and control-r to make low quality to the left and the right (respectively) of the cursor

    overstriking with a capital letter (e.g., C instead of c) causes the base to become high quality rather than low quality

    overstriking with a lower case letter causes the base to become low quality

Give these a try.

COLOR MEANS MATCH

102) In the Aligned Reads Window, go to the menu labelled 'color', and pulldown and release on 'color means match'.
You will now notice different colors, with the following meanings:

    Blue:   agrees with consensus
    Orange: disagrees with consensus
    Yellow: this stretch of this read was used by phrap to form the consensus
    Grey:   low quality or unaligned ends of reads

Now go back to the colormode 'color means quality and tags' (the default) for the next exercise.

SCROLLING TRACES INDEPENDENTLY

103) Dismiss all of your Trace Windows. Then pop up traces for 2 different reads in approximately the same location. Scroll one of them. You may want to scroll by clicking the arrows or clicking to the left or right of the thumb. You will notice that both will scroll. Consed will do its best to keep corresponding peaks lined up. (Consed can't line all of them up because the peak spacing is not uniform and differs from read to read.) Try removing a trace by clicking on one of the 'Remove' buttons in the Trace Window. Try adding other traces. Then click on 'No' for scrolling the traces together and try scrolling. You will now observe that they scroll separately.

ABI BASE CALLS

104) If you want to see the ABI base calls, no problem. Just go to the Main Consed Window. Pull down the 'Options' menu and release on 'General Preferences'. Click on 'True' for 'Show ABI Bases in Trace Window' and then click 'OK' at the bottom of the window. The ABI bases will not be shown immediately--you must first dismiss the trace window and bring it up again. You will then see an additional line with the ABI base calls.

MEASURING ERROR RATE AND SINGLE SUBCLONE BASES FOR A REGION

105) Some contigs have long tails of low quality bases and you would like to find out the error rate for the contig without that long tail. On the Aligned Reads Window, pull down the Misc menu, and release on 'Show Errors for a Region'. This will tell you both the error rate for the region and the number of single subclone bases for that region.
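As a rough illustration of what an error rate for a region means, the Python sketch below sums per-base error probabilities derived from phred-style quality values, using the standard relationship (error probability = 10^(-Q/10)). It is not Consed's code, and it is an assumption that Consed's own figure is computed this way; the function names and the sample quality values are hypothetical.

```python
def expected_errors(qualities, start, end):
    """Expected number of errors in consensus positions start..end (1-based, inclusive).

    qualities is a list of phred-style quality values, one per consensus base.
    Each base contributes an error probability of 10 ** (-Q / 10).
    """
    if start < 1 or end > len(qualities) or start > end:
        raise ValueError("region must satisfy 1 <= start <= end <= len(qualities)")
    region = qualities[start - 1:end]
    return sum(10 ** (-q / 10) for q in region)


def error_rate(qualities, start, end):
    """Expected errors per base over the region start..end."""
    return expected_errors(qualities, start, end) / (end - start + 1)


if __name__ == "__main__":
    # Hypothetical contig: a good stretch followed by a low quality tail.
    contig_qualities = [40] * 8 + [10] * 4
    print("whole contig:", error_rate(contig_qualities, 1, 12))
    print("without tail:", error_rate(contig_qualities, 1, 8))
```

Restricting the region to exclude the tail gives a much lower rate, which is the point of measuring a region rather than the whole contig.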
106) LONG, LONG, LONG READ NAMES If you have very long read names, you might not be able to see the whole name in the Aligned Reads Window. To fix this, go to the Main Consed Window, pulldown the 'Options' menu and release on 'General Preferences'. Scroll down until you see "Max Chars for Read Names in Aligned Reads Window". Increase the number and click on "Apply". When you are satisfied with how the read names look in the Aligned Reads Window, click on "Cancel" in the General Preferences Window. You can make this change permanent with the parameter: consed.alignedReadsWindowMaxCharsForReadNames: 20 (see CONSED CUSTOMIZATION) 107) PREVENTING 2 USERS FROM MAKING CONFLICTING EDITS If there are 2 users that are both editing in the same directory, there is the possibility they will both make edits to the same read. Whoever saves their assembly last will wipe out the edits of the other person, even if they were using different ace files. To help prevent this, consed can warn you if someone else is making edits in the same directory. Set the consed parameter: consed.onlyAllowOneReadWriteConsedAtATime: true The default is "false" so you have to turn this to true to make it work (see CONSED CUSTOMIZATION). This will usually work even if the 2 users are on different computers (and the directory is nfs-mounted between them) and even if the different computers have different operating systems. I've tested the following combinations: user 1 on Solaris; user 2 on Solaris user 1 on Linux; user 2 on Linux user 1 on Solaris; user 2 on Alpha (Digital Unix) user 1 on Linux; user 2 on Solaris <--- does not work Only the last combination doesn't work. 108) PRINTING CONSED WINDOWS There is a free (or nearly free) program called "xv". One web site is http://www.trilon.com/xv It is written by one of those dying breed of UNIX programmers who just *loved* UNIX and programming and sharing it. His web site is enjoyable because some of his passion comes through. 
With xv, you can make a postscript file from a Consed window. Then you can print the postscript file on a color printer. However, since some Consed windows are mostly black (Aligned Reads Window and Traces Window), this uses up a lot of toner and is difficult to read. So go to the Main Consed Window, pulldown the 'Options' menu and release on 'General Preferences'. Scroll down to "Make light background in Aligned Reads Window..." and click on "Do it now". Dismiss any Aligned Reads Windows or Traces Windows and then bring them back up. You will notice the light background. A few other things (traces colors and thickness) are also customized for making color prints.

------------------------------------------------------------------------

INSTALLING CONSED

You MUST have the following versions of phred, phrap, phd2fasta, crossmatch, and the supporting scripts in order to use this version of Consed:

    000925.c or later for phred
    0.990319 or later for phrap and crossmatch
    0.990622.e or later for phd2fasta (supplied with this version of consed)
    any version of addReads2Consed.perl (supplied with this version of consed)
    any version of phredPhrap (supplied with this version of consed)
    011130 or later for transferConsensusTags.perl (supplied with this version of consed)
    any version of tagRepeats.perl (supplied with this version of consed)
    any version of determineReadTypes.perl (or your own custom modified version)

For phred, contact bge@u.washington.edu (Brent Ewing)
For phrap and crossmatch, contact phg@u.washington.edu (Phil Green)

Summary of files you must edit (instructions are below):

    addReads2Consed.perl
    determineReadTypes.perl
    phredPhrap
    primerCloneScreen.seq
    primerSubcloneScreen.seq
    repeats.fasta
    vector.seq

In order to run the gauntlet of phred/phd2fasta/crossmatch/phrap, there is a perl script phredPhrap supplied with Consed (above). YOU MUST USE THIS PERL SCRIPT. If you try to run each of these programs directly, you are on your own and you will probably spend a lot of time needlessly.
109) Follow the first few steps of USING CONSED GRAPHICALLY of the Quick Tour (above). If you have problems, it may be due to your X emulator. See 'MONITORS AND MICE FOR CONSED' below.

If you want to try to take a short-cut through the steps below, and you are using Redhat Linux, a fellow Consed user has written a script and instructions to allow you to take this short-cut. I have not studied nor tested his script, but he usually does good work. If you want to try it:

The instructions are at: http://ludwig.ucdavis.edu/consed-install.txt
The script is at: http://ludwig.ucdavis.edu/install-PHUI.pl

Alternatively, follow the instructions below:

110) I suggest you put Consed, phred, crossmatch, phrap, the perl scripts, and other executables into /usr/local/genome/bin. So create

    /usr/local/genome/bin
and
    /usr/local/genome/lib

If you can't actually use /usr/local/genome, then you could make /usr/local/genome be a link to the real location--that will work just as well.

If you want to have another location xxx, then put:

    setenv CONSED_HOME xxx

into the .cshrc (or equivalent) of all Consed users and create

    $CONSED_HOME/bin
and
    $CONSED_HOME/lib

and put all of these programs into $CONSED_HOME/bin

111) Make sure that /usr/local/genome/bin (or $CONSED_HOME/bin) is in every Consed user's PATH.

112) Put the Consed executable in /usr/local/genome/bin (or $CONSED_HOME/bin)

113) Check this by logging on as a user and typing:

    consed -V

You should see 'Version 12.0'. If you see something else, you have some debugging to do.
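If 'consed -V' does not behave as expected, the first things to check are steps 110-112: is the bin directory on the user's PATH, and can the programs be found there? The Python sketch below is one hedged way to run that check. It is not supplied with Consed; the function names are hypothetical, and the program names in the usage section are taken from this text (installed executable names may differ on your system).

```python
import os
import shutil

# Default root suggested in step 110; CONSED_HOME overrides it.
DEFAULT_ROOT = "/usr/local/genome"


def consed_bin_dir(environ=None):
    """Return $CONSED_HOME/bin, or /usr/local/genome/bin if CONSED_HOME is unset."""
    environ = os.environ if environ is None else environ
    root = environ.get("CONSED_HOME", DEFAULT_ROOT)
    return os.path.join(root, "bin")


def bin_dir_on_path(environ=None):
    """True if the Consed bin directory appears in PATH (step 111)."""
    environ = os.environ if environ is None else environ
    target = os.path.normpath(consed_bin_dir(environ))
    path_dirs = environ.get("PATH", "").split(os.pathsep)
    return any(os.path.normpath(d) == target for d in path_dirs if d)


def missing_programs(programs, path=None):
    """Return the programs that cannot be found as executables on the search path."""
    return [p for p in programs if shutil.which(p, path=path) is None]


if __name__ == "__main__":
    # Names as used in this document; adjust to the actual executable names.
    wanted = ["consed", "phred", "phrap", "phd2fasta", "mktrace"]
    print("bin directory:", consed_bin_dir())
    print("on PATH:", bin_dir_on_path())
    print("not found:", missing_programs(wanted))
```

The sketch only reports what is missing; it does not check version numbers, which still need to be compared by hand against the list above.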
114) Build phd2fasta: Go to the misc/phd2fasta directory and type 'make'. Move the phd2fasta executable to /usr/local/genome/bin (or $CONSED_HOME/bin)

115) Build mktrace: Go to the misc/mktrace directory and type 'make'. Move the mktrace executable to /usr/local/genome/bin (or $CONSED_HOME/bin)

116) Move all perl scripts from the scripts directory to /usr/local/genome/bin (or $CONSED_HOME/bin). Make sure all are executable (chmod a+x *)

DELETE ANY PREVIOUS VERSIONS OF THESE SCRIPTS OR YOU WILL BE SORRY! (Bugs have been fixed.)

117) Get perl 5. (If you have Linux, you already have it so you can skip this step.) You can check where to get perl via the perl web site: http://www.perl.com/perl/info/software.html

(If you don't know about perl, try it--it will save you a huge amount of time over developing the same utilities in C, awk, or csh or sh.)

Regardless of where you put perl, put a link to it in /usr/bin so that all of the scripts with

    #!/usr/bin/perl

will work and you won't have to edit all of them every time a new Consed release comes out.

118) Create a subdirectory /usr/local/genome/lib/screenLibs. (If you are using a location other than /usr/local/genome for the root of all Phred/Phrap/Consed programs, create $CONSED_HOME/lib/screenLibs). From the misc subdirectory, copy primerCloneScreen.seq and primerSubcloneScreen.seq to the directory /usr/local/genome/lib/screenLibs (or $CONSED_HOME/lib/screenLibs).

Take a look at these files. They are dummy files indicating the fasta format of the sequences that should be put in them. You should put into primerCloneScreen.seq the vector sequence of the cloning vectors you are using (BAC or cosmid) and into primerSubcloneScreen.seq the sequencing vectors you are using (plasmid, M13, etc).

Don't be too generous in putting lots of vectors into the files! The larger they are, the slower primer picking will be.
Our files are only this big: -rw-r--r-- 1 root root 29938 Nov 7 1997 primerCloneScreen.seq -rw-r--r-- 1 root root 7381 Aug 13 1997 primerSubcloneScreen.seq and primer picking is quite fast enough. Now that you have set this up, you should try the PRIMER PICKING sections in the Quick Tour (above) to make sure this works. 119) You should create a file /usr/local/genome/lib/screenLibs/vector.seq (or $CONSED_HOME/lib/screenLibs/vector.seq if you are not using /usr/local/genome for the root of the Phred/Phrap/Consed files.) This contains all the vector sequences (in FASTA format) that you want to mask out before phrapping. In general, it is the combination of primerCloneScreen.seq and primerSubcloneScreen.seq 120) You should create a file /usr/local/genome/lib/screenLibs/repeats.fasta (or $CONSED_HOME/lib/screenLibs/repeats.fasta if you are not using /usr/local/genome for the root of the Phred/Phrap/Consed files.) In this file, put any sequences (in FASTA format) that you want to have automatically tagged. These typically are ALU sequences. If you don't want to tag anything, then comment out (put '#' as the first character of the line) the following lines in phredPhrap: Change: !system( "$tagRepeats $szAceFileToBeProduced" ) || die "some problem running $tagRepeats"; to: #!system( "$tagRepeats $szAceFileToBeProduced" ) # || die "some problem running $tagRepeats"; 121) You should create a file /usr/local/genome/lib/screenLibs/singleVectorForRestrictionDigest.fasta and put in it the cloning vector. This is used for doing in-silico restriction digests. Thus this cloning vector must start at precisely the site where you cut the vector to ligate the insert. It is not sufficient to just download the vector sequence from Genbank. You might need to have it start in a different place. Try the restriction digest feature (RESTRICTION DIGEST above) to make sure this works. 
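Step 121 says the vector sequence may need to start in a different place than the Genbank entry does. For a circular vector that amounts to rotating the sequence so it begins at the cut site. A minimal Python sketch of that rotation follows; it is an illustration only, the function names are hypothetical, and working out the correct cut position (and strand) for your vector and enzyme is left to you.

```python
def rotate_circular(sequence, cut_index):
    """Rewrite a circular sequence so that it starts at cut_index (0-based).

    The bases before cut_index are moved to the end, so no sequence is lost.
    """
    if not sequence:
        raise ValueError("sequence is empty")
    if not 0 <= cut_index < len(sequence):
        raise ValueError("cut_index must be within the sequence")
    return sequence[cut_index:] + sequence[:cut_index]


def format_fasta(header, sequence, width=50):
    """Format one sequence as FASTA text with lines of at most `width` bases."""
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy sequence and cut position, for illustration only.
    vector = "AACCGGTT"
    rotated = rotate_circular(vector, 3)
    print(format_fasta("cloningVector rotated to start at the cut site", rotated))
```

The output of format_fasta could then be saved as singleVectorForRestrictionDigest.fasta, after which the restriction digest feature is the real test of whether the start position is right.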
122) determineReadTypes.perl

Phrap, Consed's primer picking, and Consed/Autofinish all need the following information for each read:

    is it a universal primer forward, a universal primer reverse, or a walking read?
    what is its template name?

Generally this information can be determined from the read name, using *your* naming convention. Modify the perl script determineReadTypes.perl to put this information at the end of the phd file using WR info items.

If you don't want to do any perl programming, you have the option of using the St Louis naming convention as is. But what is the St Louis naming convention? Most of it (but not all) is explained in the phrap documentation. In addition, you must never use an underscore in the name if the read is a universal primer forward or universal primer reverse read. If the read is a walk, then you must have an underscore (_) follow the template name and then have a number (the oligo number).

Examples of reads in the St Louis naming convention:

    read eeq03a01.g1.phd.1 is univ rev template: eeq03a01 library: eeq03
    read eeq03a02.b1.phd.1 is univ fwd template: eeq03a02 library: eeq03
    read eeq03a02.g1.phd.1 is univ rev template: eeq03a02 library: eeq03
    read eeq03a03.b1.phd.1 is univ fwd template: eeq03a03 library: eeq03
    read eej45h07_2.i1.phd.1 is walk template: eej45h07 library: eej45
    read eej46c12_1.i1.phd.1 is walk template: eej46c12 library: eej46

Once you have correctly customized determineReadTypes.perl, then uncomment the line in phredPhrap which calls determineReadTypes.perl

Consed allows you to check that you have correctly modified determineReadTypes.perl: On the Main Consed Window, point to 'Info', hold down the left mouse button, and release on 'Show Info for Each Read'. Check that the information presented is correct.
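To make the naming rules above concrete, here is a small Python sketch that classifies read names the way the examples do. It is not determineReadTypes.perl and does not write WR items. The mapping of extension letters covers only the letters seen in the examples (b and g), and taking the library as the template name minus its last 3 characters is an assumption inferred from those examples; the phrap documentation has the full convention.

```python
import re

# Extension letters observed in the examples above only; the full
# St Louis convention is described in the phrap documentation.
DIRECTION_BY_LETTER = {"b": "univ fwd", "g": "univ rev"}


def parse_read_name(phd_filename):
    """Return the read type, template, and library for a read or phd filename."""
    # Strip a trailing .phd.N if present (eeq03a01.g1.phd.1 -> eeq03a01.g1).
    match = re.match(r"^(?P<read>.+)\.phd\.\d+$", phd_filename)
    read = match.group("read") if match else phd_filename

    stem, _, extension = read.partition(".")
    if "_" in stem:
        # An underscore after the template name marks a walk; the number
        # after it is the oligo number.
        template = stem.partition("_")[0]
        read_type = "walk"
    else:
        template = stem
        read_type = DIRECTION_BY_LETTER.get(extension[:1], "unknown")

    # Assumption from the examples: eeq03a01 -> library eeq03.
    library = template[:-3]
    return {"type": read_type, "template": template, "library": library}


if __name__ == "__main__":
    for name in ("eeq03a01.g1.phd.1", "eeq03a02.b1.phd.1", "eej45h07_2.i1.phd.1"):
        print(name, parse_read_name(name))
```

A quick script like this, adapted to your own naming convention, is a cheap way to test your rules on a list of read names before editing determineReadTypes.perl itself.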
If, for example, Consed thinks that there are templates that have 9 or more reads, it is likely that you have not correctly customized determineReadTypes.perl

If you think you have made a mistake in customizing determineReadTypes.perl, it is best to delete the PHD files (and phd.ball if you are using that) and run phredPhrap again, since otherwise the incorrect WR items will be left in the PHD files.

See the script determineReadTypes.perl for more information about how to customize it.

123) Fake Reads

In the past, any read that ended with a .a2 or .c3 (where 2 and 3 could be any numbers), was considered a fake read. Now you can make Autofinish not assume this using the .consedrc parameter: cons