Simple and complex SINEs
Simple SINEs have no body and consist of only a head and a tail (Borodulina 2005), thus resembling RNA pseudogenes. However, such simple SINEs carry specific nucleotide substitutions that distinguish them from the RNA of origin, which indicates that they descend directly not from the RNA gene but from a SINE copy carrying these substitutions.
On the other hand, SINEs can combine into dimers (or even trimers) that are further amplified in this complex form. Such complex SINEs can combine representatives of the same or different families.
In addition to true dimers, there are SINEs with internal duplications called quasidimeric SINEs (Labuda 1991); quasioligomeric SINEs contain more than two repeat units.
Stringent and relaxed recognition SINEs
LINE reverse transcriptase (RT) relies on one of two systems that protect it from processing foreign RNA templates: (i) specific sequence recognition of the RNA encoding the enzyme and (ii) cis-preference, whereby the RT uses the very LINE mRNA from which it was translated as the template for reverse transcription. SINEs utilizing RTs of the first (stringent recognition) group have the sequence recognized by the RT at their 3' end (Wei 2001, Kajikawa 2002). It remains unclear how SINEs utilizing RTs of the second (relaxed recognition) group overcome the cis-preference, but all of them have poly(A) or A-rich tails as the recognized sequence.
CORE and similar central domains
The body of SINEs is usually unique for each SINE family, and its origin is largely unclear; this pattern is particularly common in mammals. At the same time, a part of the body can contain domains shared by distant SINE families. To date, six such domains have been described: CORE domain in vertebrates (Gilbert 2000), V-domain in fishes (Ogiwara 2002), Deu-domain in deuterostomes (Nishihara 2006), Ceph-domain in cephalopods (Akasaki 2010), α-domain in a wide range of species (largely invertebrates; Vassetzky 2013), and β-domain in mammals and fish (Vassetzky 2013). SINESearch allows sequences to be searched against COREBase, the bank of consensus sequences generated for these domains.
A target site duplication (TSD) is a duplication of a short genomic sequence at the insertion point that arises in the course of reverse transcription. It results from the asymmetric cleavage pattern of the endonuclease activity in most LINE reverse transcriptases. In rare cases, the nicks in the two strands lie close to each other and no clear TSDs can be found (such SINE families are indicated as TSD– in the SINE Table).
Typical cleavage site for reverse transcriptase of mammalian L1 (Jurka 1997):
5'‑...AA TTTTN~15↓...‑3'
3'‑...TT↑AAAAN~15 ...‑5'
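As an illustration of how a TSD shows up in sequence data, the sketch below looks for the longest exact direct repeat that ends the upstream flank and starts the downstream flank of an insertion. The function name `find_tsd`, its parameters, and the length limits are hypothetical choices made for this example; real TSDs often contain mismatches and the element boundaries are rarely known exactly, so this is only a minimal sketch under those assumptions.

```python
def find_tsd(left_flank: str, right_flank: str,
             min_len: int = 4, max_len: int = 20) -> str:
    """Return the longest exact direct repeat flanking an insertion.

    left_flank  -- genomic sequence immediately upstream of the element
    right_flank -- genomic sequence immediately downstream of the element
    The length limits are illustrative assumptions, not values from the text.
    """
    left = left_flank.upper()
    right = right_flank.upper()
    longest = min(max_len, len(left), len(right))
    # Try the longest candidate first, so the first hit is the longest repeat.
    for k in range(longest, max(min_len, 1) - 1, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return ""  # no clear TSD under these (exact-match) assumptions


# Toy example: the 10-nt repeat AAGAAATGTC flanks the insertion.
tsd = find_tsd("GGCCTAAGAAATGTC", "AAGAAATGTCTTGCA")
print(tsd)
```

An empty result corresponds only loosely to the TSD– case described above, since a diverged TSD would also be missed by exact matching.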
Many SINE families (actually, most mammalian SINEs) have an A-rich tail, and some of them have a terminator at the 3' end and one or several AATAAA signals in the tail. Elements with these signals are called T+ SINEs, while those without them are T– SINEs (Borodulina 2001).
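The T+/T– distinction can be sketched as a simple check on the tail sequence. The example below counts AATAAA signals and tests for a run of T residues at the very 3' end. Modeling the terminator as at least four consecutive T's, requiring both features for a T+ call, and the function name `classify_tail` are all assumptions made for illustration, not criteria taken from Borodulina 2001.

```python
import re


def classify_tail(tail: str, min_t_run: int = 4):
    """Classify a SINE tail as 'T+' or 'T-' (ASCII hyphen used for T–).

    Assumptions of this sketch: the terminator is modeled as a run of at
    least `min_t_run` T residues ending the sequence, and a tail is called
    T+ only if it has both a terminator and at least one AATAAA signal.
    """
    seq = tail.upper()
    # Lookahead so that overlapping AATAAA occurrences are all counted.
    n_signals = len(re.findall(r"(?=AATAAA)", seq))
    terminator = re.search("T{" + str(min_t_run) + ",}$", seq) is not None
    label = "T+" if (n_signals > 0 and terminator) else "T-"
    return label, n_signals, terminator


print(classify_tail("AAAAAATAAAAAAATAAAGTTTT"))
print(classify_tail("AAAAAAAAAAAA"))
```

The first toy tail contains two AATAAA signals followed by TTTT and is reported as T+; the second is a plain A-rich tail and is reported as T-.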
Certain SINEs contain stretches of simple repeats such as (TC)n in their body. These microsatellite-like sequences can vary in length between SINE copies.
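Such stretches are easy to locate with a regular expression. The following sketch reports every run of a given repeat unit, which makes it possible to compare the number of units between copies. The function name `find_simple_repeats` and the threshold of three units are hypothetical choices for this example.

```python
import re


def find_simple_repeats(seq: str, unit: str = "TC", min_repeats: int = 3):
    """Return (start, end, n_units) for each run of `unit` in `seq`.

    Coordinates are 0-based and half-open; `min_repeats` is an
    illustrative threshold, not a value from the text.
    """
    pattern = re.compile("(?:" + re.escape(unit.upper()) + "){"
                         + str(min_repeats) + ",}")
    hits = []
    for m in pattern.finditer(seq.upper()):
        hits.append((m.start(), m.end(), (m.end() - m.start()) // len(unit)))
    return hits


# Two toy copies of the same element with different (TC)n lengths.
copy_a = "GGATCTCTCTCAGG"      # (TC)4
copy_b = "GGATCTCTCTCTCTCAGG"  # (TC)6
print(find_simple_repeats(copy_a))
print(find_simple_repeats(copy_b))
```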
All SINEs, as well as the cellular RNA genes that gave rise to them, possess an internal pol III promoter. The promoter in tRNA- and 7SL RNA-derived SINEs includes 11-bp boxes A and B at a distance of 30–35 bp, while in 5S rRNA-derived SINEs, the promoter is composed of boxes A, IE, and C. The internal pol III promoter is indispensable for SINEs: because it lies within the transcribed sequence, it is preserved in new SINE copies, which makes their transcription possible.
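The box A/box B arrangement can be illustrated with a small pattern search. The sketch below uses the commonly cited tRNA gene consensus sequences TRGCNNAGTGG (box A) and GGTTCGANNCC (box B); these strings, the choice to measure the distance from the end of box A to the start of box B, and the function names are assumptions of this example and are not taken from the text above. Boxes in real SINEs diverge from the consensus, so exact matching will miss many genuine promoters.

```python
import re

# IUPAC codes needed for the two consensus strings (R = purine, N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "N": "[ACGT]"}
BOX_A = "TRGCNNAGTGG"  # assumed consensus, 11 bp
BOX_B = "GGTTCGANNCC"  # assumed consensus, 11 bp


def to_regex(consensus: str) -> str:
    """Translate an IUPAC consensus into a regular expression."""
    return "".join(IUPAC[c] for c in consensus)


def find_box_pairs(seq: str, min_gap: int = 30, max_gap: int = 35):
    """Return (box_a_start, box_b_start, gap) for suitably spaced pairs.

    The gap is counted from the end of box A to the start of box B,
    which is one possible reading of "at a distance of 30-35 bp".
    """
    seq = seq.upper()
    a_hits = [m.start() for m in re.finditer("(?=" + to_regex(BOX_A) + ")", seq)]
    b_hits = [m.start() for m in re.finditer("(?=" + to_regex(BOX_B) + ")", seq)]
    pairs = []
    for a in a_hits:
        a_end = a + len(BOX_A)
        for b in b_hits:
            if min_gap <= b - a_end <= max_gap:
                pairs.append((a, b, b - a_end))
    return pairs


# Toy sequence: box A, a 32-nt spacer, then box B.
toy = "CC" + "TGGCTCAGTGG" + "A" * 32 + "GGTTCGAATCC" + "TT"
print(find_box_pairs(toy))
```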