Regular expressions (regex) provide a flexible way to match strings of text, such as particular characters, words, or patterns of characters. Below is an introduction to regex usage in SINETable. Regex supports much more than is shown here, but highly sophisticated patterns are of little use in SINETable searches.
You can use special characters and wildcards in SINETable search boxes:
Symbol | Description |
. | any single character |
\w | a single alphanumeric character or underscore |
\d | a digit (0-9) |
[ACG] | any single character in the set (here: A, C, or G) |
^ | beginning of line |
$ | end of line |
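The sketch below uses Python's `re` module to show how these symbols behave. This is an assumption: the regex engine behind SINETable's search boxes may differ in minor details. The helper `search` and the sample values are hypothetical. `search` keeps every value in which the pattern is found anywhere, the way an unanchored search box would.

```python
import re

# Hypothetical field values, used only for illustration.
values = ["GAA", "ATTT", "TC", "A", "B1", "ID_B1"]

def search(pattern: str, items: list[str]) -> list[str]:
    """Keep items in which `pattern` matches anywhere, like an unanchored search box."""
    return [s for s in items if re.search(pattern, s)]

print(search(r"^[AG]", values))          # ['GAA', 'ATTT', 'A']: begins with A or G
print(search(r"\d", values))             # ['B1', 'ID_B1']: contains a digit
print(search(r"^.$", values))            # ['A']: exactly one character of any kind
print(search(r"T$", values))             # ['ATTT']: ends with T
print(search(r"^\w+$", ["A_1", "A-1"]))  # ['A_1']: '-' is not a word character
```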
Any character, wildcard, or group of them enclosed in parentheses can be followed by a repetition operator:
Repetition | Description |
* | zero or more occurrences |
+ | one or more occurrences |
? | zero or one occurrence |
{n,m} | from n to m occurrences |
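This sketch again uses Python's `re` as a stand-in and applies each repetition operator to hypothetical tail strings. `re.fullmatch` requires the pattern to cover the whole string, which is equivalent to wrapping it in ^(...)$. The parentheses in `(TC)*` make the repetition apply to the two-character unit. Without them, in `TC*`, only the C is repeated.

```python
import re

# Hypothetical tail strings, used only for illustration.
tails = ["A", "AAAA", "TCTC", "TCC", "GT", "G"]

def whole(pattern: str, items: list[str]) -> list[str]:
    """Keep items that the pattern matches in full (like ^(...)$)."""
    return [s for s in items if re.fullmatch(pattern, s)]

print(whole(r"A+", tails))     # ['A', 'AAAA']: one or more A
print(whole(r"(TC)*", tails))  # ['TCTC']: zero or more TC units
print(whole(r"TC*", tails))    # ['TCC']: T, then zero or more C
print(whole(r"GT?", tails))    # ['GT', 'G']: G, then an optional T
```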
For instance, 5S in the Structure box will match SINEs with a 5S rRNA-derived region located anywhere, while ^5S will match only those with a 5S rRNA-derived region at the 5' end. ^..$ in the Tail box will find tails composed of dinucleotide repeats. .+ matches a run of one or more characters of any kind. Thus, tRNA.LINE will match SINEs with a tRNA-derived region immediately followed by a LINE-derived region, while tRNA.+LINE will match SINEs with a LINE-derived region anywhere downstream of a tRNA-derived region.
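The examples above can be tried in Python's `re` on a few hypothetical Structure and Tail values. The notation assumed here is a guess made for illustration, not SINETable's documented format. It assumes region names are joined by hyphens and that each tail is stored as its repeat unit.

```python
import re

def search(pattern: str, items: list[str]) -> list[str]:
    """Keep items in which the pattern is found anywhere."""
    return [s for s in items if re.search(pattern, s)]

# Hypothetical Structure values: region names joined by '-'.
structures = ["5S-LINE", "tRNA-5S-LINE", "tRNA-LINE", "tRNA-CORE-LINE"]
# Hypothetical Tail values: the repeat unit of each tail.
tails = ["A", "TC", "GAA"]

print(search(r"5S", structures))          # ['5S-LINE', 'tRNA-5S-LINE']
print(search(r"^5S", structures))         # ['5S-LINE']
print(search(r"tRNA.LINE", structures))   # ['tRNA-LINE']
print(search(r"tRNA.+LINE", structures))  # ['tRNA-5S-LINE', 'tRNA-LINE', 'tRNA-CORE-LINE']
print(search(r"^..$", tails))             # ['TC']
```

Under this assumed notation, the single `.` in tRNA.LINE consumes the one '-' separator. That is why only the directly adjacent pair matches, while `.+` can span any number of intervening regions.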
Certain characters have special meaning in regex, including the repetition characters, the various brackets, and the hyphen. To search for such a character literally, precede it with a backslash; e.g., \?\?\? will match SINEs with a region of unknown origin ('???'). Note that the search is case sensitive, so tc in the Features box will fail to match SINEs with a (TC)n stretch; use TC instead.
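In Python's `re`, which stands in for the site's engine here, an unescaped `???` is rejected as an invalid pattern. The escaped form matches the literal text, and `re.escape` inserts the backslashes automatically. The Structure and Features values below are hypothetical.

```python
import re

def search(pattern: str, items: list[str]) -> list[str]:
    """Keep items in which the pattern is found anywhere (case sensitive)."""
    return [s for s in items if re.search(pattern, s)]

# Hypothetical Structure and Features values.
structures = ["tRNA-???-LINE", "tRNA-LINE"]
features = ["(TC)n", "(CA)n"]

try:
    re.compile("???")  # a leading '?' has nothing to repeat
except re.error as exc:
    print("invalid pattern:", exc)

print(search(r"\?\?\?", structures))         # ['tRNA-???-LINE']
print(search(re.escape("???"), structures))  # same result, escaped automatically
print(search(r"\(TC\)", features))           # ['(TC)n']: literal parentheses
print(search(r"tc", features))               # []: matching is case sensitive
print(search(r"TC", features))               # ['(TC)n']
```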
Finally, alternative patterns are separated by a vertical bar (|); e.g., (tRNA)|(ANRt) in the Structure box will match SINEs with tRNA-derived regions in either orientation.
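This Python `re` sketch runs alternation on hypothetical Structure values. It also illustrates that `|` has the lowest precedence of the regex operators. As a result, `^5S|tRNA` means "^5S, or tRNA anywhere", and parentheses are needed to anchor both alternatives to the start.

```python
import re

def search(pattern: str, items: list[str]) -> list[str]:
    """Keep items in which the pattern is found anywhere."""
    return [s for s in items if re.search(pattern, s)]

# Hypothetical Structure values; 'ANRt' marks a reverse-oriented tRNA region.
structures = ["tRNA-LINE", "LINE-ANRt", "5S-LINE", "LINE-tRNA"]

print(search(r"(tRNA)|(ANRt)", structures))  # ['tRNA-LINE', 'LINE-ANRt', 'LINE-tRNA']
print(search(r"^5S|tRNA", structures))       # ['tRNA-LINE', '5S-LINE', 'LINE-tRNA']
print(search(r"^(5S|tRNA)", structures))     # ['tRNA-LINE', '5S-LINE']
```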
Other examples:
Pattern | Box | Description |
L1 | LINE | SINEs mobilized by L1 |
\[L1\] | LINE | SINEs mobilized by a LINE of the L1 clade |
^tRNA.*CORE | Structure | tRNA-derived SINEs with a central domain |
(Trp)|(Tyr) | tRNA | SINEs with similarity to either tryptophan or tyrosine tRNAs |
7SL.*7SL | Structure | SINEs with at least two 7SL RNA-derived regions |
^[AG].{0,3}$ | Tail | SINEs with tails composed of repeat units 1 to 4 nt in length that start with a purine (A, GAA, AGC, ATTT, etc.) |
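The patterns in this table can be checked the same way. The sketch below runs several of them through Python's `re` against hypothetical field values; only the patterns come from the table. `{0,3}` is a bounded repetition meaning "from 0 to 3 occurrences", so `^[AG].{0,3}$` accepts a purine followed by at most three more characters. The sketch also shows why the brackets around L1 are escaped: unescaped, `[L1]` is a character set matching either 'L' or '1'.

```python
import re

def search(pattern: str, items: list[str]) -> list[str]:
    """Keep items in which the pattern is found anywhere."""
    return [s for s in items if re.search(pattern, s)]

# Hypothetical values; the real SINETable fields may be formatted differently.
lines = ["L1", "[L1]", "RTE"]
structures = ["tRNA-CORE-LINE", "CORE-tRNA", "7SL-7SL", "7SL-LINE"]
trnas = ["Trp", "Tyr", "Ala"]
tails = ["A", "GAA", "AGC", "ATTT", "TC", "AAAAA"]

print(search(r"L1", lines))                # ['L1', '[L1]']
print(search(r"\[L1\]", lines))            # ['[L1]']
print(search(r"[L1]", lines))              # ['L1', '[L1]']: any 'L' or '1'
print(search(r"^tRNA.*CORE", structures))  # ['tRNA-CORE-LINE']
print(search(r"7SL.*7SL", structures))     # ['7SL-7SL']
print(search(r"(Trp)|(Tyr)", trnas))       # ['Trp', 'Tyr']
print(search(r"^[AG].{0,3}$", tails))      # ['A', 'GAA', 'AGC', 'ATTT']
```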
IUPAC nucleotide code | Meaning |
A | Adenine |
C | Cytosine |
G | Guanine |
T (or U) | Thymine (or Uracil) |
R | A or G |
Y | C or T |
S | G or C |
W | A or T |
K | G or T |
M | A or C |
B | C or G or T |
D | A or G or T |
H | A or C or T |
V | A or C or G |
N | any base |
. or - | gap |
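The IUPAC codes map directly onto regex character classes. The sketch below uses a hypothetical helper, `iupac_to_regex`, to turn a degenerate motif into a Python `re` pattern. It is an illustration only: it assumes a DNA alphabet (A, C, G, T) and does not claim that SINETable's search boxes accept IUPAC codes. Gap symbols are deliberately left out, because in this table `.` denotes a gap, whereas in a regex it means any character.

```python
import re

# Base sets from the IUPAC table (DNA alphabet only; gap symbols are not handled).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_regex(motif: str) -> str:
    """Hypothetical helper: translate an IUPAC motif, one character class per position."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]  # raises KeyError for symbols outside the table
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)

pattern = iupac_to_regex("GRNTC")
print(pattern)                                  # G[AG][ACGT]TC
print(re.search(pattern, "AAGATTCAA").group())  # GATTC
```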