Regular expressions (also known as regex) provide a flexible way to match strings of text, such as particular characters, words, or patterns of characters. Below is an introduction to regex usage in SINETable. Regex allows much more than is shown here, but overly sophisticated patterns make little sense in SINETable.

You can use special characters and wildcards in SINETable search boxes:

Symbol   Description
.        any single character
\w       a single alphanumeric character or underscore
\d       a digit (0-9)
[ACG]    any single character in the set (here: A, C, or G)
^        beginning of line
$        end of line

Any character, wildcard, or parenthesized combination of them can be followed by a repetition symbol:

Repetition   Description
*            zero or more occurrences
+            one or more occurrences
?            zero or one occurrence

For instance, 5S in the Structure box will match SINEs with a 5S rRNA-derived region located anywhere, while ^5S will match only those with a 5S rRNA-derived region at the 5' end. In the Tail box, ^..$ will show tails composed of dinucleotide repeats. The combination .+ matches a run of one or more arbitrary characters. Thus, tRNA.LINE will match SINEs with a tRNA-derived region immediately followed by a LINE-derived region, while tRNA.+LINE will match SINEs with a LINE-derived region anywhere downstream of a tRNA-derived region.
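The behavior of these patterns can be tried outside SINETable. The sketch below uses Python's re module purely as an illustration; the regex engine behind SINETable may differ in details. The field values, the "-" separator between regions, and the helper name matching are assumptions made for the example.

```python
import re

# Hypothetical Structure values; the "-" separator is an assumption,
# not the documented SINETable field format.
structures = ["5S-tRNA", "tRNA-5S", "tRNA-LINE", "tRNA-CORE-LINE"]

# Hypothetical Tail values.
tails = ["A", "TA", "TTA"]


def matching(pattern, values):
    """Return the values in which the pattern is found anywhere."""
    return [v for v in values if re.search(pattern, v)]


print(matching(r"5S", structures))          # ['5S-tRNA', 'tRNA-5S']
print(matching(r"^5S", structures))         # ['5S-tRNA']
print(matching(r"tRNA.LINE", structures))   # ['tRNA-LINE']
print(matching(r"tRNA.+LINE", structures))  # ['tRNA-LINE', 'tRNA-CORE-LINE']
print(matching(r"^..$", tails))             # ['TA']
```

An unanchored pattern is found anywhere in the value, which is why 5S matches both of the first two entries while ^5S matches only the one that starts with 5S.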

Certain characters have a special meaning in regex, including the repetition characters, the various brackets, and the hyphen. To match such a character literally, precede it with a backslash in the search pattern; e.g., \?\?\? will match SINEs with a region of unknown origin ('???'). Note that the search is case-sensitive, so tc in the Features box will fail to match SINEs with a (TC)n stretch; use TC instead.
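Escaping and case sensitivity can be checked with a short sketch. It again uses Python's re module only for illustration, and the field values are hypothetical. How an unescaped ??? is handled depends on the regex engine; Python rejects it as an invalid pattern.

```python
import re

# Hypothetical field values, used only for illustration.
structure = "tRNA-???"
features = "(TC)n"

# Escaped question marks match the three literal characters.
found_unknown = re.search(r"\?\?\?", structure) is not None

# Unescaped, "?" is a repetition symbol with nothing before it to repeat,
# so Python refuses to compile the pattern.
try:
    re.compile("???")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

# re.escape builds the backslash-escaped form of a literal string.
escaped = re.escape("???")

# Matching is case-sensitive by default.
lower_hit = re.search("tc", features) is not None
upper_hit = re.search("TC", features) is not None

print(found_unknown, unescaped_ok, escaped, lower_hit, upper_hit)
# True False \?\?\? False True
```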

Finally, alternative patterns are separated by a vertical bar; e.g., (tRNA)|(ANRt) in the Structure box will match SINEs with a tRNA-derived region in either orientation.
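A minimal sketch of alternation, using Python's re module for illustration only. The Structure values and their "-" separator are assumptions, not actual SINETable records.

```python
import re

# Hypothetical Structure values; the "-" separator is an assumption.
structures = ["tRNA-CORE", "CORE-ANRt", "5S-CORE"]

# Either alternative is enough for a match.
either_orientation = re.compile(r"(tRNA)|(ANRt)")

hits = [s for s in structures if either_orientation.search(s)]
print(hits)  # ['tRNA-CORE', 'CORE-ANRt']
```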

Other examples:

Pattern        Box         Description
L1             LINE        SINEs mobilized by L1
\[L1\]         LINE        SINEs mobilized by a LINE of the L1 clade
^tRNA.*CORE    Structure   tRNA-derived SINEs with a central domain
(Trp)|(Tyr)    tRNA        SINEs with similarity to either tryptophan or tyrosine tRNAs
7SL.*7SL       Structure   SINEs with at least two 7SL RNA-derived regions
^[AG].{0,3}$   Tail        SINEs with tails composed of repeat units starting with a purine and 1 to 4 nt in length (A, GAA, AGC, ATTT, etc.)
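
The last pattern uses a bounded repetition, {0,3}, meaning zero to three occurrences of the preceding item. The sketch below checks it, together with 7SL.*7SL, against a few hypothetical values. Python's re module stands in for the SINETable search, and the "-" separator in the Structure values is an assumption.

```python
import re

# Hypothetical Tail values.
tails = ["A", "GAA", "AGC", "ATTT", "TTA", "GAAAA"]

# A purine (A or G) followed by zero to three further characters, and nothing else.
tail_pattern = re.compile(r"^[AG].{0,3}$")
purine_tails = [t for t in tails if tail_pattern.search(t)]
print(purine_tails)  # ['A', 'GAA', 'AGC', 'ATTT']

# Hypothetical Structure values.
structures = ["7SL", "7SL-7SL", "7SL-CORE-7SL"]

# Two 7SL regions with anything (or nothing) between them.
double_7sl = [s for s in structures if re.search(r"7SL.*7SL", s)]
print(double_7sl)  # ['7SL-7SL', '7SL-CORE-7SL']
```

TTA is rejected because it does not start with a purine, and GAAAA because it is longer than four characters.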



IUPAC nucleotide code   Base
A                       Adenine
C                       Cytosine
G                       Guanine
T (or U)                Thymine (or Uracil)
R                       A or G
Y                       C or T
S                       G or C
W                       A or T
K                       G or T
M                       A or C
B                       C or G or T
D                       A or G or T
H                       A or C or T
V                       A or C or G
N                       any base
. or -                  gap