If you wish to scan your DNA sequence for transcription factor binding
sites, you have two choices in GCG.
I. Here's the first way
1) Start GCG and "fetch" tfsites.dat.
tfsites.dat is a compilation of known transcription factor
binding sites along with their literature references. The copy
fetched this way is somewhat outdated: it is dated 1996, it is
unclear why GCG still supplies this version, and there was NOT a
newer release available from NCBI. A 2003 version in GCG format
(tfsites.gcg) may be retrieved from this site.
2) Select the sequence to be scanned and run findpatterns
with the option -data=tfsites.dat (shortened to -dat in the
example here), e.g.:
findpatterns -dat=tfsites.dat filename.seq
             ^^^^^^^^^^^^^^^^
3) The output file (e.g. filename.find) lists all the locations
where recognition sites for transcription factors were found in
your sequence. Scan through the file to see whether anything
looks useful to you.
4) GCG has no provision for looking up the references to these
known sites. This is a flaw in GCG. To get around it, I wrote,
with a huge amount of help from Karen Jesmer at CCIT, a short unix
shell script called "starr" that reads your findpatterns output
file and retrieves the references that match your hits.
To obtain it, send an e-mail to Starr Hazard requesting the unix
shell program starr.
5) Using "starr"
a) Run findpatterns with -dat=tfsites.dat.
b) Type sh, then starr, then the name of the findpatterns output
file, then the name of the reference file to be created, e.g.:
sh starr filename.find filename.findref
c) Examine filename.findref for the references to the sites that
your findpatterns search located.
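The kind of lookup described in steps 4 and 5 can be illustrated with a short sketch. This is NOT the starr script, and it does not parse real findpatterns or tfsites.dat files. Both input formats below are simplifying assumptions made only for illustration: a hits file with the site name as the first field on each line, and a reference file holding "name<TAB>reference" on each line. The function name lookup_refs is likewise hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch, NOT the starr script. Assumed (simplified) formats:
#   hits file: one hit per line, site name in the first whitespace-separated field
#   refs file: one site per line, "name<TAB>literature reference"
# Prints each distinct site name from the hits file together with its reference.
lookup_refs() {
    hits_file=$1
    refs_file=$2
    awk -F '\t' '
        NR == FNR { ref[$1] = $2; next }    # first file (refs): load name -> reference
        {
            split($0, f, /[ \t]+/)          # second file (hits): take the first field
            name = f[1]
            if (name in ref && !(name in seen)) {
                seen[name] = 1              # report each site only once
                print name "\t" ref[name]
            }
        }
    ' "$refs_file" "$hits_file"
}
```

Under those assumptions, "lookup_refs hits.txt refs.txt > hits.refs" would write one line per matched site. Hits whose names are missing from the reference file are silently skipped.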
II. Here's the second way
1) The tfsites.dat file may also be read by any of the map programs
to locate the transcription factor sites along your sequence.
Type:
mapplot -dat=tfsites.dat filename.seq
        ^^^^^^^^^^^^^^^^
This will create a "digestion" map showing the places where the
recognition sites listed in tfsites.dat occur. The MAP program
can use the tfsites.dat file in the same way. Type:
map -dat=tfsites.dat filename.seq
    ^^^^^^^^^^^^^^^^
2) OR you could use Dan Prestridge's SignalScan program. To use it, you
must add two lines to your .cshrc file and save the modifications.
Then type "source .cshrc" to activate the changes (or start a new
shell, or log out and log in again). Typing "signal" should then start
SignalScan. This is not a better program; it's just different. You can
go back and look at the references, but only one at a time.
To get the two lines to add to your .cshrc file, send an e-mail
to Starr Hazard.
3) OR finally, refer to the following links to Web resources. These do not generally work better or faster, but they do give you hypertext links to the references and are therefore more convenient in that regard.
The IFTI site returns some nice graphics showing the strength of the hits. This is useful for sorting voluminous data.
The NIH Proscan site is a Web implementation of Dan Prestridge's PROSCAN program.
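As a recap of the GCG command lines in sections I and II, here is a small dry-run helper. It only prints the commands for each sequence file and runs nothing, so it can be tried outside a GCG session. The function name gcg_tf_commands is hypothetical. It also assumes that sequence files end in .seq and that findpatterns names its output filename.find, as in the example in section I.

```sh
#!/bin/sh
# Dry run: print the GCG command lines from sections I and II for each
# sequence file given as an argument. Nothing is executed here.
# Assumption: findpatterns writes its output to <name>.find for <name>.seq.
gcg_tf_commands() {
    for seq in "$@"; do
        base=${seq%.seq}    # filename.seq -> filename
        echo "findpatterns -dat=tfsites.dat $seq"
        echo "sh starr $base.find $base.findref"
        echo "mapplot -dat=tfsites.dat $seq"
        echo "map -dat=tfsites.dat $seq"
    done
}

gcg_tf_commands "$@"
```

Saved under a name of your choice and run as, say, "sh yourscript filename.seq", it prints four lines that can be pasted into a GCG session one at a time.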
revised by ESH September 8, 2003