preg
A regular expression specifies a pattern that can match many different strings, not just one literal string. Regular expressions are widely used in computer programming languages, so they may already be familiar to some users.
The following is a short guide to regular expressions in EMBOSS:
The following quantifier characters specify the number of times that the character before (in this case 'x') matches:
Quantifiers can follow any of the following types of character specification:
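As an illustration of quantifiers and the character specifications they can follow, here is a minimal sketch using Python's `re` module. It assumes that the standard quantifiers (`?`, `*`, `+`, `{n}`, `{n,m}`) and specifications (`.`, `[...]`, `[^...]`) behave the same way in preg as they do in Python. The helper `matches` and the test strings are made up for this example.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


# (pattern, text) pairs; the texts are invented for illustration.
examples = [
    (r"ax?b", "ab"),          # x?     zero or one 'x'
    (r"ax*b", "axxxb"),       # x*     zero or more 'x'
    (r"ax+b", "ab"),          # x+     one or more 'x', so 'ab' fails
    (r"ax{2}b", "axxb"),      # x{2}   exactly two 'x'
    (r"ax{2,3}b", "axxxxb"),  # x{2,3} two or three 'x', so four fails
    (r"A.{2}C", "AGTC"),      # .      any character, repeated twice
    (r"[KR]+", "KRRK"),       # [KR]   a set of allowed characters
    (r"[^P]{2}", "AP"),       # [^P]   anything except 'P', so 'AP' fails
]

results = [matches(pattern, text) for pattern, text in examples]

for (pattern, text), ok in zip(examples, results):
    print(f"{pattern!r:12} vs {text!r:9} -> {ok}")
```

`re.fullmatch` is used so that each result reflects the quantifier alone; a search that allowed partial matches would hide the failures.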
Combining some of these features gives these examples from the PROSITE patterns database:
'[STAGCN][RKH][LIVMAFY]$'
which is the 'Microbodies C-terminal targeting signal'.
'LP.TG[STGAVDE]'
which is the 'Gram-positive cocci surface proteins anchoring hexapeptide'.
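The two patterns above can be tried out with a short Python sketch. The sequences below are invented purely for illustration and are not real database entries, and Python's `re` module is assumed to treat these simple patterns the same way preg does.

```python
import re

# Patterns quoted in the text above.
MICROBODY_SIGNAL = re.compile(r"[STAGCN][RKH][LIVMAFY]$")
ANCHOR_HEXAPEPTIDE = re.compile(r"LP.TG[STGAVDE]")

# Hypothetical toy sequences (assumptions, not real proteins).
toy_c_terminal = "MKTAYIAKQRSKL"   # ends in S-K-L
toy_surface = "GGSALPETGEEN"       # contains L-P-E-T-G-E

m1 = MICROBODY_SIGNAL.search(toy_c_terminal)
m2 = ANCHOR_HEXAPEPTIDE.search(toy_surface)

# The '$' anchor forces the first pattern to match only at the sequence end.
print(m1.group(), m1.start())
print(m2.group(), m2.start())

# The same three residues in the middle of a sequence do not match.
print(MICROBODY_SIGNAL.search("MSKLAAAAD"))
```

Note that `start()` is a 0-based offset in Python, so add 1 to get the residue number counted from 1.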
Regular expressions are case-sensitive. The pattern 'AAAA' will not match the sequence 'aaaa'.
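A minimal Python sketch of the same behaviour follows. Upper-casing the sequence before searching is only a workaround used in this sketch; it is not a documented preg option.

```python
import re

pattern = "AAAA"
sequence = "aaaa"

# Case-sensitive search: the upper-case pattern does not match lower-case text.
case_sensitive_hit = re.search(pattern, sequence) is not None

# Workaround for this sketch only: normalise the sequence to upper case first.
normalised_hit = re.search(pattern, sequence.upper()) is not None

print(case_sensitive_hit, normalised_hit)
```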
% preg
Regular expression search of a protein sequence
Input sequence(s): tsw:*_rat
Regular expression pattern: IA[QWF]A
Output file [100k_rat.preg]:
Mandatory qualifiers:
[-sequence] seqall Sequence database USA
[-pattern] regexp Regular expression pattern
[-outfile] outfile Output file name
Optional qualifiers: (none)
Advanced qualifiers: (none)
General qualifiers:
-help boolean Report command line options. More
information on associated and general
qualifiers can be found with -help -verbose
| Qualifier | Description | Allowed values | Default |
|---|---|---|---|
| Mandatory qualifiers | | | |
| [-sequence] (Parameter 1) | Sequence database USA | Readable sequence(s) | Required |
| [-pattern] (Parameter 2) | Regular expression pattern | Any regular expression pattern is accepted | Required |
| [-outfile] (Parameter 3) | Output file name | Output file | <sequence>.preg |
| Optional qualifiers | | | |
| (none) | | | |
| Advanced qualifiers | | | |
| (none) | | | |
preg search of tsw:*_rat with pattern IA[QWF]A
Matches in 100K_RAT
100K_RAT 390 IAQA
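To show how a report line of this shape relates to the pattern, here is a rough Python sketch that prints the sequence name, match position, and matched text. It assumes the middle column is the 1-based start position of the match, which the output above suggests but does not state. The function `report_matches` and the sequence `TOY_SEQ` are hypothetical and are not part of EMBOSS.

```python
import re


def report_matches(name: str, sequence: str, pattern: str) -> list[str]:
    """Return 'NAME START MATCH' lines for each non-overlapping match."""
    lines = []
    for m in re.finditer(pattern, sequence):
        # Assumption: positions are reported as 1-based start coordinates.
        lines.append(f"{name} {m.start() + 1} {m.group()}")
    return lines


# Invented sequence containing two hits for the example pattern.
toy_sequence = "MSIAQAKLIAFAG"
report = report_matches("TOY_SEQ", toy_sequence, r"IA[QWF]A")

for line in report:
    print(line)
```

`re.finditer` reports non-overlapping matches only; whether preg also reports overlapping matches is not covered by this sketch.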
| Program name | Description |
|---|---|
| antigenic | Finds antigenic sites in proteins |
| digest | Protein proteolytic enzyme or reagent cleavage digest |
| fuzzpro | Protein pattern search |
| fuzztran | Protein pattern search after translation |
| helixturnhelix | Report nucleic acid binding motifs |
| oddcomp | Finds protein sequence regions with a biased composition |
| patmatdb | Search a protein sequence with a motif |
| patmatmotifs | Search a PROSITE motif database with a protein sequence |
| pepcoil | Predicts coiled coil regions |
| pestfind | Finds PEST motifs as potential proteolytic cleavage sites |
| pscan | Scans proteins using PRINTS |
| sigcleave | Reports protein signal cleavage sites |
Other EMBOSS programs, such as fuzzpro and fuzztran in the table above, search for simple patterns and may be easier for users who have not used regular expressions before.