Non-coding RNA


ThaG

Non-coding RNA

The term non-coding RNA (ncRNA) is commonly employed for RNA that does not encode a protein, but this
does not mean that such RNAs lack information or function. Although it has been generally
assumed that most genetic information is transacted by proteins, recent evidence suggests that the majority
of the genomes of mammals and other complex organisms is in fact transcribed into ncRNAs, many of which
are alternatively spliced and/or processed into smaller products. These ncRNAs include microRNAs and
snoRNAs (many if not most of which remain to be identified), as well as likely other classes of
yet-to-be-discovered small regulatory RNAs, and tens of thousands of longer transcripts (including complex
patterns of interlacing and overlapping sense and antisense transcripts), most of whose functions are
unknown. These RNAs (including those derived from introns) appear to comprise a hidden layer of internal
signals that control various levels of gene expression in physiology and development, including chromatin
architecture/epigenetic memory, transcription, RNA splicing, editing, translation and turnover. RNA regulatory
networks may determine most of our complex characteristics, play a significant role in disease and constitute
an unexplored world of genetic variation both within and between species.

INTRODUCTION

Until recently most of the known non-coding RNAs (ncRNAs)
fulfilled relatively generic functions in cells, such as the
rRNAs and tRNAs involved in mRNA translation, small
nuclear RNAs (snRNAs) involved in splicing and small
nucleolar RNAs (snoRNAs) involved in the modification of
rRNAs. The central tenet of molecular biology, developed
from the study of simple organisms like Escherichia coli,
has been that RNA functions mainly as an informational intermediate
between a DNA sequence (‘gene’) and its encoded
protein. The presumption has been that most genetic information
that specifies biological form and phenotype is
expressed as proteins, which not only fulfill diverse catalytic
and structural functions, but also regulate the activity of the
system in various ways. This is largely true in prokaryotes
and presumed also to be true in eukaryotes. Reciprocally,
the extensive sequences in the higher eukaryotes that do not
encode proteins or cis-acting regulatory elements (i.e. the
majority of the vast tracts of intronic and intergenic
sequences) have been regarded as simply accumulated evolutionary
debris arising from the early assembly of genes
and/or the insertion of mobile genetic elements.
However, most of these supposedly inert sequences are
transcribed. It is also increasingly evident that RNA itself
can and does have a very wide repertoire of biological functions
(1) and, in particular—as first predicted by Jacob and
Monod 45 years ago (2)—that it is widely employed as a
means of gene regulation, both in cis and in trans, especially
in the higher eukaryotes. These RNAs are the subject of this
review.

EXPANSION OF ncRNAs AND RNA METABOLISM
IN EUKARYOTES

A limited number of trans-acting small ncRNAs have been
described in prokaryotes that appear mainly to regulate
mRNA translation or stability. Over 60 such RNAs have
been identified during the past few years in E. coli, with
another 200 or so predicted bioinformatically (3–7). Some
of these RNAs are co-expressed with mRNAs and released
by cleavage after transcription (4,7), examples of a parallel
output of regulatory RNAs that appears to be widespread in
the higher eukaryotes (8). Small ncRNAs have also been
identified in other bacteria (see e.g. 9,10) and archaea (11),
which interestingly have homologs of Argonaute, a family
of RNA-binding endonucleases central to the action of microRNAs
(miRNAs) and small interfering RNAs (siRNAs) in
eukaryotes (12). However, ncRNAs do not dominate
genomic output in prokaryotes, representing, as far as one can
tell, only a small fraction of their genomes, which are generally
dominated (80–95%) by protein-coding sequences (13),
whose repertoire can vary widely even between closely
related strains (14).
In contrast, the higher organisms have a relatively stable
proteome, and a relatively static number of protein-coding
genes, which is not only much lower than expected but also
varies by less than 30% between the simple nematode worm
Caenorhabditis elegans (which has only 10^3 cells) and
humans (10^14 cells), which have far greater developmental
and physiological complexity (15). Moreover, only a minority
of the genomes of multicellular organisms is occupied by
protein-coding sequences, the proportion of which declines
with increasing complexity, with a concomitant increase in
the amount of non-coding intergenic and intronic sequences,
most of which are in fact transcribed [(15,16); discussed in
more detail subsequently]. Thus, there seems to be a progressive
shift in transcriptional output between microorganisms and
multicellular organisms from mainly protein-coding mRNAs
to mainly non-coding RNAs, including intronic RNAs.
The eukaryotes, particularly the higher eukaryotes, also
have a far more developed RNA processing and signaling
system than prokaryotes, which appears to be linked to the
more sophisticated pathways of gene regulation and complex
genetic phenomena in eukaryotes: transcriptional and post-transcriptional
gene silencing, including RNA interference
(RNAi), DNA methylation and chromatin modification,
imprinting, and other phenomena such as transvection, transinduction,
dosage compensation and position effect variegation
(8,17,18). The higher eukaryotes also have a large repertoire
of RNA-binding proteins as well as many nucleic acid- and
chromatin-binding proteins whose exact specificity is
unknown or uncertain, but which may recognize different
types of RNA:RNA and RNA:DNA complexes (8,18).
Both theoretical considerations and empirical evidence indicate
that the amount of regulatory overhead scales nonlinearly
with complexity in all integrated systems, and that
regulatory architecture will progressively dominate the information
content of more complex systems, leading to complexity
limits, until and unless there is a change in the physical
basis of the regulatory architecture itself (19). The generic
solution to this accelerating regulatory problem is the superimposition
of digital communication and control systems,
which have only been broadly established in the human
intellectual lexicon during the past 20–30 years, well after the
central tenets of molecular biology were developed and after
introns were discovered. Interestingly, although it is widely
appreciated that DNA itself is a digital storage medium, it has
not been considered that some of its outputs may themselves
be digital signals, communicated via ncRNA, in addition to
the mRNAs encoding analog components (i.e. the proteins),
albeit with many design variations elaborated by alternative
splicing (which itself requires regulation).
Regulatory proteins scale almost quadratically with genome
size in prokaryotes (20,21), and extrapolation of this relationship
suggests that prokaryotes have been limited in their
complexity by their reliance on a protein-based regulatory
architecture, probably for most of their evolutionary history
(13,19,20,22). Conversely, it appears that the eukaryotes
breached this limit by the co-option of RNA as a digital
regulatory solution, in concert with the evolution of the
necessary protein infrastructure to recognize and act on
these signals (13). Indeed, both logic and evidence suggest
that both developmental programming and the phenotypic
differences between species and individuals are heavily
influenced, if not fundamentally controlled, by the repertoire
of regulatory ncRNAs (13,16–18,23), which are only now
being recognized and beginning to be studied in any systematic
way.
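
The scaling argument above can be illustrated with a toy calculation. The sketch below assumes an idealized law R = k * N^2 relating the number of regulatory genes R to the total gene number N; the constant `K` and the gene counts are hypothetical values chosen only for illustration and are not taken from the cited studies.

```python
def regulatory_genes(total_genes: int, k: float) -> float:
    """Number of regulatory genes under the assumed law R = k * N**2."""
    return k * total_genes ** 2


def regulatory_fraction(total_genes: int, k: float) -> float:
    """Share of the gene set devoted to regulation (grows linearly with N)."""
    return regulatory_genes(total_genes, k) / total_genes


def complexity_limit(k: float) -> float:
    """Gene number at which R == N, i.e. every gene would be a regulator."""
    return 1.0 / k


K = 1e-5  # hypothetical scaling constant, not a measured value

for n in (1_000, 5_000, 10_000, 20_000):
    print(f"N = {n:>6}: regulatory fraction = {regulatory_fraction(n, K):.1%}")

print(f"Assumed ceiling on gene number: {complexity_limit(K):,.0f}")
```

Under this assumption the regulatory fraction rises in proportion to N, so a purely protein-based regulatory architecture eventually reaches a ceiling where adding a gene costs more regulators than it contributes functions.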

INFRASTRUCTURAL ncRNAs

Some infrastructural ncRNAs have been known for a long
time and have well-established functions. These include
tRNAs, rRNAs, spliceosomal uRNAs or ‘snRNAs’ and the
common ‘snoRNAs’. Both translation and splicing require
core infrastructural RNAs not only for sequence-specific recognition
of RNA substrates, but also for the catalytic
process itself (1,24–27). Recent findings indicate that some
of these RNAs, not surprisingly, may also be involved in regulatory
processes. For example, besides its role in splicing, the
U1 snRNA is involved in the regulation of transcription
initiation by RNA polymerase II through interaction with the
transcription initiation factor TFIIH (28). U1 RNA also interacts
with cyclin H (29), raising the possibility that ncRNA
might be involved in cell cycle regulation. In addition, the
small conserved nuclear RNA 7SK inhibits the kinase activity
of the CDK9/cyclin T complex, leading to reduced phosphorylation
of RNA polymerase II and a reduction in transcription
(30). The 7SK RNA acts in concert with the
HEXIM1 and HEXIM2 proteins, both of which show distinct
expression patterns in various human tissues (31–33), and
depletion of 7SK RNA by siRNA causes apoptosis in HeLa
cells (34).
ncRNAs also play a role in chromosome maintenance and
segregation (35). A small RNA with similarity to box H/ACA
snoRNAs is a component of telomerase (for review see 36)
and is mutated in autosomal dominant dyskeratosis congenita
(37). In human–chicken hybrid cells, mutation of Dicer, a key
component of the siRNA/miRNA processing machinery, leads
to the accumulation of transcripts derived from centromeric satellite
repetitive sequences, premature separation of sister
chromatids and cell death (38). ncRNA has also been implicated
in the control of chromatin architecture and epigenetic
memory (35,39; discussed further below).
There are also other types of infrastructural ncRNAs that are
involved in central cell biological processes. The ncRNA 7SL
RNA is a core component of the signal recognition particle
(SRP), a ribonucleoprotein complex that interacts with the ribosome
and is essential for targeting/transportation of nascent
proteins containing signal peptides to the endoplasmic reticulum
membrane for secretion or membrane insertion (40–43).
The 13 MDa vault complex (discovered in 1986) is the
largest ribonucleoprotein complex described to date, three
times bigger (albeit far less complex) than the ribosome. It
is present in 10^4 to 10^5 copies per cell, forms a barrel-like
structure predominantly localized in the cytoplasm and is
presumably involved in transport (for review see 44).
R18 Human Molecular Genetics, 2006, Vol. 15, Review Issue 1
Different species have between one and three vault RNAs,
ranging in length from 86 to 141 nucleotides. In multi-drug
resistant cells, the vault complex is upregulated and has a
different ratio of vault RNAs in comparison with normal
(44). Moreover, two human vault RNAs, hvg-1 and hvg-2,
specifically bind to mitoxantrone (45), a chemotherapeutic
agent commonly used for treatment of breast cancer,
myeloid leukemia and non-Hodgkin’s lymphoma.

cis-ACTING REGULATORY SEQUENCES IN
NON-CODING REGIONS OF mRNAs AND
PRE-mRNAs

Regulatory RNAs function in most cases by base-pairing with
complementary sequences in other RNAs and DNA, to form
RNA:RNA (and probably RNA:DNA) complexes that are
recognized, and acted upon, by a relatively generic infrastructure
[such as RNA-induced silencing complex (RISC)
complexes or RNA editing enzymes]. There are many well-characterized
examples of regulatory RNA sequences in the
untranslated regions (UTRs) of mRNAs that act in cis as
receivers of other trans-acting signals, by forming secondary
structures that bind regulatory proteins or small molecular
weight ligands. Examples of the former include sequences in
UTRs that can bind regulatory proteins or be the targets of
RNA editing to control the stability, translatability or localization
of mRNAs (46–49). Examples of the latter are the
so-called ‘riboswitches’ that regulate metabolic pathways by
binding metabolites such as vitamins, amino acids and
purines, to effect allosteric changes in the mRNA to control
its translation or stability. These have been well documented
in bacteria (50–52), but also occur in eukaryotes (53,54).
UTRs in mRNAs (as well as the coding sequences themselves)
can also be the sensors of trans-acting regulatory
RNAs, specifically miRNAs (at least some of which are
encoded in introns of other genes), by base sequence recognition
(8,55,56), which appear to have significant influence
on their evolution (57). That is, ncRNAs can either be receivers
or transmitters, or both, of regulatory signals. Interestingly,
the average length of the UTRs in mRNAs increases
with developmental complexity in animals, and is almost
equivalent to the length of the protein-coding sequences
in human (total 34 Mb of coding sequences and 32 Mb
of UTR at last count) (15), indicative of the much greater
sophistication of mRNA regulation in the higher organisms.
There are also cis-acting regulatory sequences in and
around splice junctions, some of which (the so-called
‘exon-splicing enhancers’ or ESEs) occur within protein-coding
sequences (58). Nucleotide sequence conservation is
higher around alternative splice sites than constitutive splice
sites, albeit in complex patterns (59–61). These sequences
are thought to bind regulatory proteins that influence splice
selection, but two recent papers have suggested that such
selection may, at least in some cases, involve complex
RNA:RNA interactions, which are themselves presumably
regulated by other trans-acting signals, including other
RNAs (62–64). Consistent with this, small artificial antisense
RNAs and introduced riboswitches have been shown to easily
regulate splicing in vitro and in vivo (65–68), with obvious
implications for the natural mechanisms of splicing control
(8). A snoRNA has also been shown to control splicing of
serotonin receptor 5-HT(2C)R mRNA (64). In addition, a
significant number of ultra-conserved sequences in mammals
and insects are located at splice sites (63,69). It should be
borne in mind that some protein-coding sequences may have
dual function, and be themselves the targets of regulatory
molecules, such as miRNAs and siRNAs, as has been well
documented in plants (70) and has been recently shown to
occur in mammals (71–73). It should also be borne in mind
that many RNAs may combine both digital (i.e. sequence-specific)
and analog (structure-based ligand/protein binding
or catalytic) functions, and that we have barely yet scratched
the surface of these functions and networks.

LARGE NUMBERS OF ncRNAs EXPRESSED
FROM THE MAMMALIAN GENOME

The Ensembl 34b version of Human Genome annotation lists
22 287 known or predicted protein-coding gene loci. The
coding regions occupy 34 Mb (1.2%) of the euchromatic
genome, and the total fraction of bases occupied by known
protein-coding transcripts is only about 2% (15,74).
However, summation of the sequences covered by known
genes, ‘mRNAs’ and spliced ESTs indicates that (at least)
60–70% of the mammalian genome is transcribed on one or
both strands (15,75), noting that introns are also actually transcribed
(as distinct from generating stable transcripts) (Fig. 1).
These estimates are conservative, as it is clear from both
cDNA and genome tiling array studies that we have not yet
come close to plumbing the full depth or breadth of the
expressed transcripts in different types of cells under different
developmental and physiological conditions (75–79).
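
As a rough check on the figures quoted above, the short sketch below derives the genome size implied by "34 Mb (1.2%)" and the number of bases implied by "60–70% transcribed". Only the three percentages and the 34 Mb value come from the text; the function and variable names are illustrative.

```python
CODING_MB = 34.0         # coding sequence, as quoted in the text
CODING_FRACTION = 0.012  # 1.2% of the euchromatic genome, as quoted in the text


def implied_total_mb(part_mb: float, fraction: float) -> float:
    """Total size implied when part_mb makes up the given fraction."""
    return part_mb / fraction


def covered_mb(total_mb: float, fraction: float) -> float:
    """Number of megabases covered by the given fraction of the total."""
    return total_mb * fraction


genome_mb = implied_total_mb(CODING_MB, CODING_FRACTION)
low_mb = covered_mb(genome_mb, 0.60)
high_mb = covered_mb(genome_mb, 0.70)

print(f"Implied euchromatic genome: {genome_mb:.0f} Mb")
print(f"Transcribed (60-70%): {low_mb:.0f} to {high_mb:.0f} Mb")
print(f"Transcribed / coding ratio: at least {low_mb / CODING_MB:.0f}-fold")
```

On these numbers the transcribed portion of the genome exceeds the protein-coding portion by well over an order of magnitude, which is the gap the surrounding paragraphs set out to explain.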
Large-scale cDNA cloning studies have recently shown that
there are many tens of thousands of transcripts expressed from
the mouse genome, a large fraction of which (over 34 000) do
not appear to encode proteins (75). These studies involved
aggressive normalization to enrich for rare transcripts, which
introduces the possibility of contamination from pre-mRNA
sequences (i.e. introns), but the findings were generally
supported by the results of large-scale promoter/transcription
start site mapping, suggesting that the observed transcriptional
complexity of the genome is real and extends far beyond what
had been previously imagined (75). It should be noted that
most putative ncRNAs are expressed at lower levels than
mRNAs, and many are rare, consistent with the suggestion
that these RNAs mainly fulfil regulatory functions. It should
also be noted that these studies, as is traditional, were orientated
towards cytoplasmic polyA+ RNA (75,76), for technical
reasons (to exclude infrastructural RNAs and primary
transcripts), on the assumption that nearly all transcripts are
processed to polyadenylated RNAs that are exported to the
cytoplasm for translation, which may not be correct.
It is also apparent that much of the mammalian genome is
transcribed from both strands. It is estimated that 5880
human transcription clusters (22% of those analyzed) form
sense–antisense pairs with most antisense transcripts being
ncRNA (80), an arrangement that exhibits considerable evolutionary
conservation between the human and pufferfish
genomes (81). A detailed analysis of the mouse transcriptome
indicated that 43 553 (72%) transcriptional units overlap with
transcripts coming from opposite strand (82). In fact, there is
evidence from spliced ESTs, annotated ‘mRNAs’ and protein-coding
genes listed on the UCSC Genome Database (83)
that at least 2.4 Gb of the human genome is transcribed,
at least 25% from both strands (Fig. 1; M. Pheasant and
J.S. Mattick, unpublished analysis). It would not be surprising
if the true extent of transcription was greater than the size of
the genome itself, noting that the upper limit is twice the
genome size.
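
The notion of a sense–antisense pair used above can be expressed as a simple interval test: two transcripts on the same chromosome, on opposite strands, whose coordinates overlap. The following is a minimal sketch under that definition; the `Transcript` record, the coordinate convention (0-based, end-exclusive) and the example transcripts are all hypothetical, and the cited studies may have applied additional criteria.

```python
from typing import List, NamedTuple, Tuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def overlaps(a: Transcript, b: Transcript) -> bool:
    """True if the two transcripts share at least one base position."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def sense_antisense_pairs(transcripts: List[Transcript]) -> List[Tuple[str, str]]:
    """Return name pairs of overlapping transcripts on opposite strands."""
    pairs = []
    for i, a in enumerate(transcripts):
        for b in transcripts[i + 1:]:
            if a.strand != b.strand and overlaps(a, b):
                pairs.append((a.name, b.name))
    return pairs


# Hypothetical annotations, for illustration only.
example = [
    Transcript("geneA", "chr1", 100, 500, "+"),
    Transcript("asA", "chr1", 400, 900, "-"),
    Transcript("geneB", "chr1", 450, 600, "+"),
    Transcript("geneC", "chr2", 100, 500, "-"),
]

print(sense_antisense_pairs(example))
```

The pairwise loop is quadratic in the number of transcripts, which is acceptable for a small illustration; a genome-scale analysis would normally sort by coordinate or use an interval index instead.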
Genome tiling array (76,77) and massively parallel signature
sequencing (MPSS) (78) studies of various tissues and
cell lines have independently revealed many thousands of
non-coding transcripts from intergenic and intronic sequences
in the human genome. Over 37% of the MPSS signatures
matched known loci, but outside of annotated exons, with
another 20% matching the complementary strand of known
transcripts, indicating the presence of as many as 50 000
additional non-annotated RNAs in analyzed human tissues
(78). These findings are reinforced by the analysis of
conserved RNA secondary structures which predict thousands
of functional ncRNAs in the human genome (84,85).
High-density genome tiling array studies of 10 human
chromosomes (approximately one-third of the human
genome) showed that 9% of the non-repetitive sequences
were expressed as detectable transcripts (‘transfrags’) in individual
cell lines, and that 16.5% of non-repetitive bases were
transcribed in at least one out of eight cell lines analyzed, indicating
that many of the observed RNAs are cell-type specific
(77), consistent with MPSS studies (78). It should be noted
that this figure is much higher than the total length of all
mRNAs expected from these chromosomes. Over 56% of
the detected transfrags do not overlap with any well-characterized
exon, mRNA or EST annotation; 30% map
with ‘intergenic’ regions and 26% with introns of known
genes. The latter do not appear to represent pre-mRNA
contamination, as the signals were not generally spread
across the introns, but rather showed discrete foci, indicative
of previously unknown exons or of other RNAs (perhaps regulatory
ncRNAs or their precursors) derived from these regions
(77). Moreover, for technical reasons these analyses are likely
to overlook many important small regulatory RNAs such as
miRNAs which may be present in only trace amounts and
are difficult to label by reverse transcription.
Rapid amplification of cDNA ends (RACE) analysis of
selected genomic regions (79) confirmed the existence of
these RNAs, and revealed an amazingly complex landscape
of interlacing and overlapping transcripts, not only on opposite
strands, but also on the same strand, so that there is often no
clear distinction between splice variants and overlapping and
neighboring genes, which had also been indicated by cDNA
cloning studies (75,82). This study also showed that there
are many hitherto unrecognized exons and splice variants
even in very well-studied genes, such as that encoding Sonic
Hedgehog, and that it is not unusual for a single base pair
to be part of an intricate network of multiple isoforms of
overlapping sense and antisense transcripts (Fig. 2). These
observations all have important and challenging implications
for genotype–phenotype correlations, the complexity of the
transcriptional regulation, and the definition of a gene (79),
which may now be best viewed as fuzzy transcription clusters
with multiple products (18).
Just as disturbingly, it appears that a large proportion
of the transcripts in human and mouse are unique to the largely
unstudied polyA− and the nuclear polyA+ fractions of the
transcriptome (77,86), which have escaped detection in most
transcriptomic studies. It seems that we have barely begun
to uncover the extraordinary complexity of the mammalian
transcriptome.

TRANSCRIPTIONAL NOISE OR MEANINGFUL
OUTPUT?

The observation that there are literally tens of thousands of
ncRNAs expressed in mammals, and that most of the
genome is transcribed, confronts and very largely contradicts
the traditional protein-centric view of genetic information and
genome organization. There are two opposing alternatives—
either the bulk of the transcription which does not yield
mRNAs is ‘transcriptional noise’ and/or (in the case of
introns) the residue of evolutionary baggage retained or accumulated
within genes, or this transcription comprises another
level of expression and transaction of RNA information that
is important to the evolution and developmental ontogeny of
the higher organisms (13,16,18,23,87–90).
Most of the ncRNAs identified in genomic transcriptome
studies have not been studied and have yet to be ascribed
any function. However, there are many lines of evidence
that suggest that these RNAs are biologically meaningful.
First, most intensively studied gene loci, including both
those that are imprinted and conventional loci such as
beta-globin, have been shown to express non-coding transcripts
(91–96). This includes some enhancers and conserved
intergenic sequences (92,97).
Second, it is clear that many of these transcripts are cell-type
specific, with specific subcellular locations, and are
developmentally regulated (77,98,99). A large number of
ncRNAs are specifically expressed from either the paternal
or maternal allele at imprinted loci, and some are associated
with human diseases, such as the Prader–Willi and Angelman
syndromes (39). Hence, the genetic cause for some, and
perhaps many, diseases may be associated with mutations
within ncRNAs. An imprinted ncRNA, LANCAT, spanning
more than a megabase in the murine region orthologous to the
human Prader–Willi/Angelman syndrome locus, exhibits a
distinct expression pattern in brain, as well as a cytoplasmic
location (100). It has also been shown that some snoRNAs
and miRNAs may be encoded within the introns of imprinted
ncRNA genes (95,101). The snoRNA HBII-52 which regulates
the splicing of the serotonin receptor 5-HT(2C)R gene is not
expressed in Prader–Willi syndrome patients, who have
different 5-HT(2C)R mRNA isoforms from normal,
suggesting that this defect contributes to the Prader–Willi
syndrome (64,102). Antisense transcripts associated with
eight transcription factor genes involved in eye development
also display specific expression patterns in brain, and in the
retina in particular (103). Another non-coding antisense
transcript, which has several alternatively spliced isoforms,
shows an expression pattern similar to the sense-strand
Foxl2 gene, which encodes a forkhead transcription factor
involved in development of eyelid and ovary (104).
Third, the upstream regions of ncRNA transcripts show
many of the features normally associated with promoters
(75,105,106) and, somewhat surprisingly, may be more
highly conserved than the promoters of protein-coding genes
(75). A recent large-scale study of the binding sites for the
transcription factors, Sp1, cMyc and p53, found that a large
proportion (36%) correlate with ncRNA transcripts, a significant
number of which are regulated in response to retinoic
acid, leading to the general conclusion that the human
genome contains comparable numbers of protein-coding and
non-coding genes that are bound by common transcription
factors and regulated by common environmental signals (106).
Finally, an increasing number of ncRNAs have been shown
to be functional, including the well-characterized ncRNAs Xist
and Tsix that control X-chromosome inactivation in mammals
(107,108). They also include a number of well-characterized
antisense transcripts which appear to play regulatory roles in
relation to their sense gene, including those opposite FGF-2
(fibroblast growth factor-2), HIF-1 (hypoxia inducible
factor-1) and myosin heavy chain [for review see (109)].
Increasing numbers of functional studies of ncRNAs are
being conducted using ectopic expression and RNAi-mediated
knockdowns. For example, ectopic expression of the murine
brain-specific ncRNA SCA8, which has been implicated in
Spinocerebellar Ataxia Type 8 (110), under the control of a
promoter specific to photoreceptors, results in late-onset,
progressive neurodegeneration in the Drosophila eye (111).
Moreover, using this neurodegenerative phenotype as a
sensitized background for a genetic modifier screen, mutations
were identified in four genes, all of which encode neuronally
expressed RNA binding proteins conserved in Drosophila
and humans (111). The knockdown by RNAi of a 6.7 kb
spliced and polyadenylated murine ncRNA (TUG1) that is
expressed in the retina and brain and upregulated by taurine
in developing retinal cells resulted in malformed
or non-existent outer segments of transfected photoreceptors
in mice (112).
This approach has recently been extended into large-scale
screening strategies of ncRNAs. Pairs of siRNAs directed
against 512 ncRNA sequences from the RIKEN Fantom2
mouse cDNA collection (113) were used to interrogate a
battery of 12 cell-based reporter assays representing key
cellular processes and signaling pathways (114). Eight functional
ncRNAs were identified (114; J.B. Hogenesch and
P.G. Schultz, personal communication), a good rate of return
given the limited functional scope of the assays: six essential
for cell viability, one repressor of Hedgehog signaling, and
one (termed NRON) which acts as a repressor of the transcription
factor NFAT, which itself is required for T-cell
receptor-mediated immune response, and the development
of the heart, vasculature, musculature and nervous tissue.
NRON occurs as a variety of alternatively spliced transcripts
ranging from 0.8 to 3.7 kb, and interacts with 11 different
proteins, possibly as scaffolding for a complex including a
translation initiation factor, RNA helicase and proteins
involved in nucleocytoplasmic transport, proteolysis and
signal transduction (114).
The number of known functional ncRNA genes has risen
dramatically in recent years and over 800 ncRNAs (excluding
tRNAs, rRNAs and snRNAs) have been catalogued in
mammals, at least some of which are alternatively spliced
(115,116). ncRNAs have also been implicated in many diseases,
including various cancers and neurological diseases
(18,115).
There is a rapidly looming nomenclature problem for the
large number of ncRNAs (117), especially as the function
and mode of action of the vast majority are unknown, and
their complex structures and interlacing/overlapping nature
make discrete classification difficult. As a considerable fraction of eukaryotic transcripts are spliced, most approaches
used, including cDNA cloning, detect only portions of transcripts,
which often correspond to exons. Depending upon
the method used these detected sites of transcription have
been called an assortment of terms, such as ditags, CAGE
tags, transfrags and ESTs, to mention a few. In some cases,
experiments are used to connect these fragments into full-length
or near full-length transcript structures [see e.g. (79)].
When transcripts are found to contain reduced protein-coding
potential these have also been given various names including
npcRNA (non-protein-coding RNA), utRNA (untranslated
RNA) (117) or TUF (transcript of unknown function) (77).
A structured system that may be used to catalog and refer
to ncRNAs until they can be grouped and re-classified
into recognized structural and/or functional classes is currently
being considered by the HUGO Gene Nomenclature
Committee (see http://www.gene.ucl.ac.uk/nomenclature/).


SMALL REGULATORY ncRNAs

The past few years have seen an explosion in the discovery of
small regulatory RNAs in animals and plants (8,118–120)
that, at present, largely fall into two classes: snoRNAs and
miRNAs/siRNAs.

Small nucleolar RNAs

snoRNAs generally range from 60 to 300 nucleotides in length
and guide the site-specific modification of nucleotides in
target RNAs via short regions of base-pairing. There are two
major classes, the box C/D snoRNAs which guide
2′-O-ribose-methylation, and the box H/ACA snoRNAs
which guide pseudouridylation of target RNAs (36,121–
123). Initially, it was thought that the role of snoRNAs was
restricted to rRNA modification in ribosome biogenesis, but
it is now evident that they can target other RNAs, including
snRNAs and mRNAs (36,64,121–123). Most mammalian
snoRNAs come from the introns of either protein-coding or
non-coding genes (124) but apparently some human C/D
snoRNAs are independently transcribed as indicated by the
presence of methylated guanosine caps at their 5′ ends
(125). Although the snoRNAs involved in ribosome biogenesis
are located in the nucleolus where this type of ncRNA
was first characterized (hence their name), a subset of
H/ACA snoRNAs is located in Cajal bodies (a class of small
nuclear organelle) and are sometimes called scaRNAs (small
Cajal body RNAs) (36). Telomerase RNA is also found in
Cajal bodies in a cell-cycle dependent manner (126,127).
At least some snoRNAs exhibit tissue-specific and developmental
regulation, and/or imprinting (101,102,128,129),
indicative of a regulatory function. There are also a number
of so-called orphan snoRNAs without known targets
(101,102,123,128,130,131). As noted earlier, one of these
snoRNAs is linked to the aberrant splicing of the serotonin
receptor 5-HT(2C)R gene in Prader–Willi syndrome patients
(64,102). It is also evident that there are many other
snoRNAs, and probably other as yet functionally uncharacterized
classes of small regulatory RNAs, that have yet to
be discovered (36,132).

MicroRNAs and small interfering RNAs

miRNAs and siRNAs are short RNA molecules, approximately 22 nucleotides
long, derived from either hairpin or double-stranded
RNA precursors. Details of miRNA and siRNA
biology and biochemistry can be found in a number of
recent reviews (8,133–135). miRNAs suppress translation
via non-perfect pairing with target mRNAs—usually involving
a seed pairing of just six to eight nucleotides in length
(56)—or (as with siRNAs) cause degradation of target
RNAs by the RISC complex in the case of perfect complementarity
with the target site—the phenomenon known as RNAi. It
is estimated that approximately one-third of human protein-coding
genes are controlled by miRNAs [reviewed in (119)].
In addition, siRNAs derived from repeats participate in the
establishment of silenced (heterochromatic) chromatin, as
well as in other aspects of chromosome dynamics, phenomena
best studied in yeast [for reviews see (8,136)].
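
The seed-pairing rule described above lends itself to a short illustration. The sketch below searches a UTR for exact matches to the reverse complement of a miRNA seed. It assumes the seed is taken from nucleotide 2 of the miRNA onward (a common convention, not stated in the text), defaults to a 7-nucleotide seed within the six-to-eight range mentioned, and ignores G:U wobble pairing and all other determinants of targeting. Both sequences are made-up examples.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def seed_sites(mirna: str, utr: str, seed_len: int = 7) -> list:
    """Return 0-based UTR positions that pair perfectly with the miRNA seed.

    The seed is assumed to start at miRNA nucleotide 2 (index 1).
    """
    seed = mirna.upper()[1:1 + seed_len]
    site = reverse_complement(seed)
    utr = utr.upper()
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # allow overlapping matches
    return positions


# Toy sequences, for illustration only.
toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCGGGCUACCUCAA"

print(reverse_complement(toy_mirna[1:8]))
print(seed_sites(toy_mirna, toy_utr))
```

A seed this short is expected to occur by chance in many transcripts, which is one reason sequence matching alone overpredicts targets and why a single miRNA can plausibly influence a large number of genes.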
miRNAs are derived from the introns and exons of both
protein-coding and non-coding transcripts that are synthesized
by RNA polymerase II (8,137,138). It has also recently been
shown that a number of mammalian miRNAs are derived
from repeats, mainly various transposons (139), which may
lead to a re-examination of the functional role of transposons,
especially since it also appears that transposon sequences can
play a significant role in the developmental processes and epigenetic
variation (140,141). Some miRNAs also appear to be
derived from processed pseudogenes (142).
The expression of many miRNAs is regulated and miRNAs
have been shown to be central to a wide range of developmental
processes, including developmental timing, cell proliferation,
left–right patterning, neuronal cell fate, apoptosis
and fat metabolism [for reviews see (8,133–135,143)], as
well as neuronal gene expression (144), brain morphogenesis
(145), muscle differentiation (146) and stem cell division
(147). Not surprisingly, therefore, alterations in the
expression, sequence or target sites for miRNAs may be a
significant but hitherto unrecognized source of human
genetic disease, including cancer. Sequence variants in the
binding site for the miRNA miR-189 in the SLITRK1 mRNA
have recently been shown to be associated with Tourette’s
syndrome (148). miRNA expression is dysregulated in
cancer cells (143,149,150) and miRNA profiling can be used
as a very accurate diagnostic tool for cancer classification
(151,152). The proto-oncogene c-Myc has been shown to
activate expression of an miRNA cluster on human chromosome
13, and two miRNAs (miR-17-5p and miR-20a) from
this cluster downregulate expression of the transcription
factor E2F1 that activates cell cycle progression (153).
Enforced expression of the same miR-17-92 miRNA cluster
has also been shown to promote tumor development (154),
as has misexpression of the Drosophila miRNA mirvana/
mir-278 (155), indicating that some miRNAs may also
function as proto-oncogenes.
Until recently, it was believed that the post-transcriptional
suppression of gene expression by miRNA in vertebrates
occurs through translation suppression directed by a non-perfect
duplex formed between miRNA and mRNA in the
3′-UTR. However, in 2004, two groups described suppression
of HOX gene expression by mRNA degradation because of a
perfect match between miRNA and mRNA in the 3′-UTR (71,72).
Another example of mRNA degradation because of a perfect
match with a trans-acting miRNA has been reported for the
imprinted Rtl1/Peg11 locus (73). The maternally transcribed
anti-Peg11 transcript is processed into several miRNAs,
which cause RISC-mediated cleavage of paternally expressed
Rtl1/Peg11 mRNA. Interestingly, the miRNAs are complementary
to the coding region, not to the 3′-UTR (73), indicating
that miRNA target sites may be located anywhere in
the transcript, and indeed in any functional transcript, not
just mRNAs. In addition, it has recently been shown that
certain miRNA precursors are edited by ADAR1 and
ADAR2, resulting in both suppression of processing by
Drosha, and degradation by Tudor-SN, which is a component
of RISC (156).
The miRBase database (http://microrna.sanger.ac.uk/) lists
over 300 experimentally verified miRNAs in human as well
as predicted miRNA target genes (157). However, many
more miRNAs have been identified computationally, with a
proportion validated post hoc (158). Most miRNA prediction
methods rely on identification of a stable stem–loop precursor
and phylogenetic conservation [see e.g. (158)]. However,
these criteria may be far too narrow. Although many of the
known miRNAs are highly conserved (and have been mainly
identified on this basis), there is no reason why they all
should be because, as far as one can tell, these short RNAs have
no intrinsic catalytic activity and function simply by target
recognition, and thus should be able to evolve relatively
quickly by co-variation with their targets, and by positive
selection for new connections in regulatory networks underpinning
adaptive radiation. Consistent with this, the known
miRNAs appear to have many targets, thereby making
co-variation difficult, and explaining their strong conservation,
which in many cases surpasses that of protein-coding
sequences (108). A recent study that did not require substantial
evolutionary conservation identified many new human
miRNAs, a significant number of which appear to be primate-specific
(159).
The number of predicted human miRNAs is rising rapidly
(8,135,159). Sensitive genetic screens in C. elegans have
also identified rare miRNAs with limited evolutionary
conservation such as lsy-6, which is required for left–right
neuronal patterning (160), suggesting that many miRNAs
may be cell-type specific and that many more remain to be
found.

BIOLOGICAL ROLES OF ncRNAs

As outlined earlier, ncRNAs are already known to fulfill a
wide range of functions, including the control of chromosome
dynamics, splicing, RNA editing, translational inhibition and
mRNA destruction. It is obvious that we have only begun to
explore the true extent of RNA regulation of these processes.
It also appears that RNA may play a role in virtually all levels
of gene regulation in eukaryotes.
A range of evidence suggests that RNA signaling underpins
chromatin remodeling and epigenetic memory, although the
mechanisms are unknown, and the matter is not without controversy
[for reviews and discussion see (8,18,35,161–163)].
There is evidence that transcription from upstream regions
can affect the expression of the adjacent gene, either by promoter
interference (164) or by altering chromatin structure
(165–167), leading to the hypothesis that it is the act of transcription
which is responsible for the regulatory effects, and
that the transcript itself (an ncRNA) is just a by-product
(168). However, it is hard to imagine how transcription per
se could convey sufficient information to account for the
precise and quite complex changes in histone modification
and chromatin remodeling that are observed at most loci.
Indeed, there are only a limited number of chromatin-modifying
enzymes in animals, suggesting that these
enzymes must be targeted to their sites of action, which vary
at thousands of loci around the genome during differentiation
and development, by another level of sequence-specific
signals, most logically RNA. In agreement with this prediction
small RNAs have been shown to induce transcriptional silencing
and alterations to DNA methylation in human cells
(169,170).
There are also good reasons to expect that splicing is regulated,
at least in part, by trans-acting RNAs that guide splice
site selection (8,18,64,171) or modify sequences around
splice sites to render them accessible or otherwise to the
splicing machinery (64).
Evidence is also emerging that transcription itself may be
regulated by ncRNAs (18,163). As noted earlier, RNA polymerase
II itself appears to be regulated in part by ncRNA signaling
(30–34). A ncRNA has been reported to be required for
the repression of RNA polymerase II-dependent transcription
in primordial germ cells in Drosophila (172). At least some
transcription factors (and chromatin-modifying proteins)
appear to have affinity for structures involving RNA
(173–179). A small double-stranded RNA termed NRSE activates
transcription of neuron-specific genes (180) and short
artificial RNAs have been shown to inhibit transcription of targeted
genes in the absence of concomitant DNA methylation,
with considerable potential for therapeutic use (181,182). An
interesting case is the steroid receptor RNA activator (SRA)
which was originally described as a functional non-coding
RNA involved in the regulation of gene expression by
steroid hormones (183). The gene produces several transcripts
of which one encodes a protein (184) and both the ncRNA and
the encoded protein affect the activity of the estrogen receptor in
breast cancer cells (185). Recently, it was shown that pseudouridine
synthase mPus1p (an enzyme that converts uridine to
pseudouridine in RNA) is a coactivator for the retinoic acid
receptor, which acts by pseudouridylation of SRA RNA
(186). In addition, the thyroid hormone receptor has an
RNA-binding domain which binds SRA, and the binding
enhances expression of reporter genes (187).
ncRNAs also play a role in stress responses. The small non-coding
transcript B2 is produced by RNA polymerase III from
murine short interspersed elements (SINE) under heat shock.
The B2 RNA binds to RNA polymerase II and represses
transcription after heat shock (188,189). In primates, RNA
polymerase III also produces the brain-specific Alu-derived
transcript BC200 (190). Non-coding repetitive RNAs are
also transcribed in stressed human cells and are localized in
‘nuclear stress bodies’ that are assembled on specific pericentromeric
heterochromatic domains that change their epigenetic
Human Molecular Genetics, 2006, Vol. 15, Review Issue 1 R23
status from heterochromatin to euchromatin in response to
stress (191). The non-coding RNA omega is among the few
heat-shock-inducible genes in Drosophila (192), and although
its exact role is unknown, it binds to a number of RNA-binding
proteins involved in processing of nuclear RNA
(hnRNPs complexes) (193).
ncRNAs may also act as scaffolding for the assembly of
macromolecular complexes. Examples include rRNA in ribosomes,
the 7SL RNA in the SRP (40), and possibly RNAs
involved in the assembly of chromatin complexes (35), as
well as NRON, recently shown to interact with a number of
proteins involved in nuclear transcription factor trafficking
(114).


INTRONS AS A SOURCE OF FUNCTIONAL
ncRNAs

Introns account for at least 30% of the human genome and may
be a significant, perhaps major, source of regulatory ncRNAs
(17,87), produced in parallel with protein-coding sequences
(and others) as efference signals to convey regulatory information
to other genes and transcripts (16,18). Almost all
snoRNAs and a large proportion of miRNAs in animals are
encoded in introns (138,194–196), located in both protein-coding
and non-protein-coding genes [for review see (8)].
Although introns are thought to be simply degraded after
being excised from primary transcripts, there is good evidence
that intronic RNAs may actually be processed to smaller RNAs
(which were not anticipated or detected when introns were first
studied) with significant half-lives and specific subcellular
locations (197,198). Recently, it was shown that ectopic
expression of intronic sequences derived from the CFTR
gene causes specific changes in transcription of various
genes in HeLa cells (199). Interestingly, each of the three
intron sequences tested resulted in a distinctive pattern of
effects on specific subsets of genes (199). The idea that
introns may be a rich source of regulatory information is
consistent with the fact that the density of introns scales with
developmental complexity (87), and many highly conserved
sequences, including ultraconserved sequences, are found in
introns (69,200–203). However, at present, it is simply not
known what proportion of transcribed introns are subsequently
processed into smaller functional RNAs, although many
intronic sequences are detected in whole genome tiling
array analyses of human transcription (77).


CONCLUSION

We may have fundamentally misunderstood the nature of
genetic programming in the higher organisms. It appears that
the human genome and those of other complex organisms
express an enormous repertoire of ncRNAs, and that their
cells are awash with these RNAs, which constitute a hidden
layer of molecular genetic signals. Although the functions of
these RNAs are likely to be many and varied, both logic and
evidence strongly suggest that their main role is to regulate
and direct the complex pathways of developmental ontogeny,
which must require enormous amounts of information in an
organism as precisely sculptured as a human (13).
The existence of a sophisticated RNA-based regulatory
system would also largely explain the paradox of the tremendous
diversity of characteristics observed among mammals
and other complex organisms, despite the relative commonality
of their proteomes. That such RNAs have remained hidden
from view for so long appears to have been a consequence of
their sheer numbers and population complexity which makes
biochemical detection of individual sequences difficult, combined
with the subtlety of their genetic signatures. Indeed,
with few exceptions, until recently most known ncRNAs
were those that are present in relatively large amounts, such
as rRNAs, tRNAs and the common snoRNAs and snRNAs,
and it has only been the combination of sensitive genetic
screens (such as those that first identified miRNAs),
large-scale cDNA and whole genome sequencing, new sensitive
analytical methods (such as RT–PCR and genome tiling
arrays) and bioinformatics, based on clues from known
examples, that has begun to reveal the true complexity of
what lies under the surface.
It is also evident that many ncRNAs, including those of
demonstrated functionality like Xist, are evolving quickly
(108). This rapid evolution has been considered as evidence
of lack of functionality (204). This may be incorrect, and
these sequences may in fact be simply able to drift easily
because of different constraints and/or be subject to positive
selection related to phenotypic variation. Recent analyses of
the Drosophila genome have indicated that, contrary to long-held
expectation, a large fraction of the non-coding sequence
is functionally important and subject to various levels of
purifying selection and adaptive evolution (205).
The extent of non-coding sequence conservation in
mammals is also much higher than that of protein-coding
sequences (202,206), perhaps as high as 10% by some estimates
(207). This conservation includes ultraconserved
sequences (69) and long transposon-free regions that have
remained refractory to transposon insertions throughout mammalian
evolution (208), observations which are difficult to
reconcile with orthodox protein-based conceptions of gene
regulation. As noted earlier, there is increasing evidence that
transposon-derived sequences may also contribute to mammalian
genetic activity. Indeed, it may be that much, if not
most, of the sequence comprising the human genome is
functional, albeit having arrived at different times in our
evolutionary history and evolving at different rates.
The problem has been compounded by the fact that most
mutations in regulatory sequences may be both subtle and
difficult to track, particularly given the expectational and
practical bias to date in genome scanning projects on exonic
lamp-posts of protein-coding genes, and the fact that the relevant
mutations may be quite distal to these lamp-posts, hidden in the
dark of the vast tracts of intergenic and intronic sequences. Examples
are the mutations underlying the callipyge (‘beautiful bottom’) phenotype
in sheep and the enhanced muscling of domestic pigs,
which are single base substitutions within non-coding sequences
(a long intergenic sequence of unknown transcriptional status in
the DLK1-GTL2 imprinted region, and the third intron of the
IGF2 gene, respectively), the identification of which involved
tour-de-force analyses in well-structured pedigrees (209–211).
It is clear that different types of genetic information will
be subject to different structure–function relationships and
therefore different constraints on their variation related to their
role and the number of interacting partners. We predict that
mutations/variations in many if not most ncRNA sequences,
especially those that are involved in regulatory networks,
will lead to a variety of milder phenotypes than the usually
severe consequences of mutations in proteins, and will have
a major influence on quantitative trait variation, developmental
differences and abnormalities, cancer and other complex
diseases such as neurological disorders.
The functional genomics of ncRNAs will be a daunting
task, an equal or greater challenge than that we already face
in working out the biochemical functions and biological
roles of all of the known and predicted proteins and their isoforms
(212). Bioinformatics will be key, as it should be possible
to use sequence homology (albeit in small patches, and
obeying a rather broader set of rules than simply Watson–
Crick DNA base pairing) to identify transmitters and their
receivers in RNA regulatory networks, as is already the case
for miRNAs. This also means that it should be possible to
develop generic approaches, applicable to any regulatory
RNA or its target, to intersect and modulate gene activity at
various levels for therapeutic purposes, which may revolutionize
the pharmaceutical industry. The advent of large-scale
whole genome (re-)sequencing, which is at an advanced
stage of development (213,214), while creating enormous
informatic challenges, will soon also provide the density of
genomic data required to identify sequences directly associated
with different characteristics in structured populations,
without assumptions about the genomic position of these
sequences or their mode of action.
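
The "broader set of rules" for matching regulatory RNAs to their targets is not spelled out in the text. One familiar example in RNA is G:U wobble pairing, and the Python sketch below assumes that this is the only extension beyond Watson–Crick pairs. It measures the longest contiguous paired patch between two equal-length RNAs aligned antiparallel. The function names and sequences are hypothetical, and this is a minimal illustration rather than a real RNA–RNA interaction predictor.

```python
# Assumption: only Watson-Crick pairs plus G:U wobble pairs are allowed.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(a: str, b: str, allow_wobble: bool = True) -> bool:
    """True if bases a and b can pair under the assumed rules."""
    return (a, b) in WATSON_CRICK or (allow_wobble and (a, b) in WOBBLE)


def longest_paired_patch(guide: str, target: str, allow_wobble: bool = True) -> int:
    """Length of the longest run of consecutive pairs between two
    equal-length RNAs (both written 5'->3', aligned antiparallel)."""
    if len(guide) != len(target):
        raise ValueError("sequences must have the same length")
    best = run = 0
    for a, b in zip(guide, reversed(target)):
        if can_pair(a, b, allow_wobble):
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best


# Hypothetical sequences chosen so that wobble pairs matter.
guide = "GGAUCU"
target = "AGGUUC"
print(longest_paired_patch(guide, target, allow_wobble=True))   # 6
print(longest_paired_patch(guide, target, allow_wobble=False))  # 2
```

Allowing wobble pairs turns two short patches into one continuous match, which is why strict Watson–Crick searches can miss candidate transmitter/receiver pairs.
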
 

ThaG

Sicc OG
Jun 30, 2005
9,597
1,687
113
#2
Large numbers of noncoding RNA transcripts (ncRNAs) are being revealed by complementary
DNA cloning and genome tiling array studies in animals. The big and as yet largely
unanswered question is whether these transcripts are relevant. A paper by Willingham
et al. shows the way forward by developing a strategy for large-scale functional screening
of ncRNAs, involving small interfering RNA knockdowns in cell-based screens, which
identified a previously unidentified ncRNA repressor of the transcription factor NFAT. It
appears likely that ncRNAs constitute a critical hidden layer of gene regulation in complex
organisms, the understanding of which requires new approaches in functional genomics.

Recent large-scale studies of the human and
mouse transcriptomes have used both cDNA
cloning approaches (1–3) and the interrogation
of genome tiling arrays (4–6). The surprising
but consistent finding of these studies
has been that a huge number of observed
transcripts—about half of the total—do not
appear to encode proteins. Many of these
transcripts appear to be developmentally regulated
(1, 4), and similar findings have been
reported in Drosophila (7). The big and as yet
largely unanswered question is whether these
noncoding RNAs (ncRNAs) are meaningful or simply
represent “transcriptional noise” (Fig. 1). A
study by Schultz, Hogenesch, and colleagues (8)
begins to answer this question by developing a
strategy for large-scale functional screening
of ncRNAs.
Willingham et al. (8) selected 512 ncRNA
sequences from the RIKEN Fantom2 mouse cDNA
collection (1, 9) that showed significant
conservation with human genomic sequences and
constructed small interfering RNAs (siRNAs)
(two each, expressed as short hairpin RNAs)
against the human orthologs of these sequences.
These siRNAs
were then used to interrogate a battery of 12
cell-based assays representing key cellular
processes and signaling pathways, with the use
of reporter assays in microtiter plates (10).
They identified eight functional ncRNAs: six
essential for cell viability, one repressor of
Hedgehog signaling, and one (termed NRON)
that acts as a repressor of the transcription
factor NFAT, which is itself required for T-cell
receptor–mediated immune response and
the development of the heart, vasculature,
musculature, and nervous tissue.
Detailed analysis of NRON showed that
this ncRNA, which has two blocks of near-perfect
conservation between humans and
mice but no substantial open reading frame,
is enriched in placenta, muscle, and lymphoid
tissues and exhibits a distinct tissue-specific
distribution of splice variants, suggesting
subtle but biologically relevant differences
in its function in different tissues (8). By
tagging NRON with an RNA hairpin that is
bound by the MS2 phage protein, followed
by affinity chromatography of whole-cell
extracts, the authors showed that NRON
interacts directly or indirectly with 11 proteins,
including three members of the importin-beta
superfamily, which mediate the nucleocytoplasmic
transport of cargoes such as
NFAT. siRNA knockdown of four of these
proteins (including importin–beta 1) increased
NFAT activity, as did siRNAs directed against
NRON, whereas overexpression of these proteins
repressed NFAT activity. Moreover,
binding and ribonuclease protection
experiments supported a direct association of
NRON with importin–beta 1, which itself is
known to associate with some of the other
proteins that were identified as interacting with
NRON (8).
These observations suggest that NRON
may act as a modulator of NFAT nuclear trafficking,
probably by regulating its subcellular
location, a conclusion supported by the observation
that NFAT nuclear localization is increased
when the level of NRON is reduced by siRNA (8).
The broader conclusion is that NRON may act as
a scaffold for the assembly of protein complexes
that regulate nuclear trafficking of this and
probably other important transcription factors,
opening a new dimension of organizational
control in cell biology and development.
This elegant study not only points the way
ahead but also illustrates the magnitude of the
task that is in front of us, which may be an
equal or greater challenge than that we already
face in working out the biochemical function and
biological role of all of the known and predicted
proteins and their isoforms.
The cDNA and genome
tiling array studies have indicated not only that
there are tens of thousands of ncRNA transcripts
(both polyadenylated and nonpolyadenylated)
expressed from the mammalian
genome in different cells and tissues but also
that these transcripts comprise a complex
interlaced and overlapping network from both
strands, whereby even a single nucleotide may
be part of multiple differently processed transcripts
(2, 3, 6, 11, 12).

Ascribing function to these ncRNAs will
not be simple, nor occur quickly, given that
this will require in vivo and in vitro assays, the
interpretation of which will be compromised
by ambiguity in the former (for example in
discriminating between mutations that affect
cis-acting regulatory sequences and those that
affect functional trans-acting RNAs) and in both
cases by the ability to detect a phenotype when
the expression of targeted ncRNA sequences
is altered by siRNA-mediated knockdown or
ectopic expression. Only 8 of 512 ncRNAs
showed function in the assays undertaken by
Willingham et al. (8), although this is not a
bad rate of return given the limited scope of
these assays. Nonetheless, these initial findings
will have a big impact, because they reveal the
involvement of hitherto unsuspected ncRNAs
in already intensively studied pathways such
as Hedgehog signaling and nuclear trafficking.
Notably, genome tiling array studies have also
revealed unknown transcript and splice variants
of sonic hedgehog (11), indicating just how
much remains to be done.
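
To put the scale and yield of the screen into numbers, here is a short Python calculation using only the figures quoted above (512 ncRNAs, two siRNAs each, 12 assays, 8 hits). The measurement count is a lower bound that assumes one reading per siRNA per assay, with no replicates or controls. That assumption is mine, not something stated in the text.

```python
# Figures quoted in the text above.
N_NCRNAS = 512          # conserved ncRNA sequences selected
SIRNAS_PER_NCRNA = 2    # siRNAs designed against each
N_ASSAYS = 12           # cell-based assays in the battery
N_HITS = 8              # ncRNAs scored as functional

n_sirnas = N_NCRNAS * SIRNAS_PER_NCRNA

# Assumption: a single measurement per siRNA per assay (no replicates).
n_measurements = n_sirnas * N_ASSAYS

hit_rate = N_HITS / N_NCRNAS

print(f"siRNAs constructed: {n_sirnas}")              # siRNAs constructed: 1024
print(f"minimum measurements: {n_measurements}")      # minimum measurements: 12288
print(f"hit rate: {hit_rate:.2%}")                    # hit rate: 1.56%
```

A hit rate of roughly 1.6% across only 12 assays leaves open how many of the remaining 504 ncRNAs would score in assays that were not run.
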
The selection of phenotypic assays may be
guided by other studies, such as the analysis of
the patterns of expression and the subcellular
location of the ncRNAs under analysis, as is
already routinely done for proteins with unknown
functions. Indeed, most would regard
tissue-specific expression as a reasonable prima
facie indicator of function. On the other hand,
faced with the uncomfortable implications of
large numbers of such RNAs and the evidence
that many are expressed only at low levels,
others may suggest that these RNAs are merely
transcriptional noise from illegitimate promoters,
which may be variable in different cells,
because of, for example, different chromatin
architectures, although it also seems likely that
chromatin architecture is itself controlled by
RNA signaling (13, 14).
Notably, evolutionary conservation may not
be a reliable signature of functional ncRNAs.
The ncRNAs selected by Willingham et al.
were those that were most highly conserved
between humans and mouse, a reasonable filter
given that conservation is normally a good
indicator of function. However, the reverse—
i.e., that lack of conservation indicates lack of
function—is not necessarily true. Sequence
conservation is normally mandated by the
preservation of structure-function relationships
(as in proteins) and/or multilateral interactions
(as in ribosomal RNA). If many of these
newly discovered ncRNAs are regulatory, as
one might reasonably suppose them to be,
they may have quite different evolutionary
constraints. Many microRNAs (miRNAs)—
small 20- to 25-nucleotide RNAs that control
many aspects of plant and animal development
by sequence-specific interactions with
other RNAs—are highly conserved (and have
been mainly identified on this basis), but these
appear to be central regulators that have many
targets (making covariation difficult) and there
are likely to be many more that are not so
constrained (13).
This possibility is supported by a recent
study that did not require substantial evolutionary
conservation and (thereby) identified
many new human miRNAs, a significant
number of which appear to be primate specific
(15). The number of known human miRNAs
stands at well over 1500 and is rising rapidly
(13, 15, 16). Sensitive genetic screens in Caenorhabditis
elegans have also identified rare
miRNAs with limited evolutionary conservation
such as lsy-6, which is required for left–right
neuronal patterning, suggesting that many
more remain to be found (17). Moreover, a
number of well-studied ncRNAs are poorly
conserved, such as XIST, which controls X-chromosome
inactivation in mammals, and
Air, a ncRNA of over 100 kb that is involved
in imprinting of the Igf2r locus in mouse
(18, 19). All of these considerations suggest
that many ncRNAs are evolving quickly (by
drift under mild negative selection or under
positive selection for the rewiring of regulatory
circuitry in phenotypic radiation) and that
those that have been identified (or prioritized
for study) on the basis of evolutionary conservation
are probably just the tip of a very
large iceberg. Nonetheless, there is considerable
scope for using more sophisticated bioinformatic
approaches, including intragenomic
sequence matching.
It is also clear that the majority of the
genomes of animals is indeed transcribed (12),
which suggests that these genomes are either
replete with largely useless transcription or
that these noncoding RNA sequences are
fulfilling a wide range of unexpected functions
in eukaryotic biology. These sequences include
introns (Fig. 1), which account for at
least 30% of the human genome but have been
largely overlooked because they have been
assumed to be simply degraded after splicing.
However, it has been shown that many
miRNAs and all known small nucleolar RNAs
in animals are sourced from introns (of both
protein-coding and noncoding transcripts)
(13), and it is simply not known what proportion
of the transcribed introns are subsequently
processed into smaller functional
RNAs. It is possible, and logically plausible,
that these sequences are also a major source of
regulatory RNAs in complex organisms (20).
The studies of Willingham et al. and
others that have begun to explore the underworld
of RNA in eukaryotes raise more
questions than they answer. That complex
organisms have complex genetic programming
should come as no surprise. That much of this
programming may be transacted by noncoding
RNAs may be. However, given the sheer extent
of noncoding RNA transcription, it seems
more and more likely that a large portion of the
human genome may be functional by means of
RNA. This also means that we may have
seriously misunderstood the nature of genetic
programming in the higher organisms (21) by
assuming that most genetic information is
expressed as and transacted by proteins, as it
largely is in prokaryotes (22). If so, there is a
long road ahead in functional genomics.
 

I AM

Some Random Asshole
Apr 25, 2002
21,002
86
48
#3
I read the first two-three paragraphs, and when I saw the extra 8 pages I decided to stop....that's way too much shit to read...not to mention it looks all fucked up cause of the format....nice sentence and then one word, then a sentence on another line, then another word...that's almost as bad as PeOpLe WhO TyPe LiKe ThIs..
 

ThaG

Sicc OG
Jun 30, 2005
9,597
1,687
113
#4
I AM said:
I read the first two-three paragraphs, and when I saw the extra 8 pages I decided to stop....that's way too much shit to read...not to mention it looks all fucked up cause of the format....nice sentence and then one word, then a sentence on another line, then another word...that's almost as bad as PeOpLe WhO TyPe LiKe ThIs..
Sentences are OK, it's just that copying from PDFs doesn't work that great
 

I AM

Some Random Asshole
Apr 25, 2002
21,002
86
48
#5
I figured it was a copy and paste deal, wasn't tryin to say I think you're stupid....

That shit is just too long...maybe when I get a lump of free time I'll read the rest of it.
 

ThaG

Sicc OG
Jun 30, 2005
9,597
1,687
113
#6
the basic idea is that we have many more RNAs than proteins and RNAs were probably the much more important component in the evolution of mammals

and the genome has much more than 2% meaningful sequences but maybe even more than 100% when transcription from both strands is accounted for
 

I AM

Some Random Asshole
Apr 25, 2002
21,002
86
48
#7
You know, that first sentence is what I got from the first two paragraphs....lol...

and of course they'll find out that other things are important as they research...it'll be interesting to see what they find out...if we're even alive by the time it's all "done."
 

ThaG

Sicc OG
Jun 30, 2005
9,597
1,687
113
#8
I AM said:
You know, that first sentence is what I got from the first two paragraphs....lol...

and of course they'll find out that other things are important as they research...it'll be interesting to see what they find out...if we're even alive by the time it's all "done."
yeah, I was very upset the last few years that there won't be anything really big left for my generation to discover after the genome was sequenced and all that; well, there's no reason to worry, there are a lot of things to learn and a lot of work to do :)
 

ThaG

Sicc OG
Jun 30, 2005
9,597
1,687
113
#10
nhojsmith said:
so this is evidence that the idea that we are all 99.9% similar may simply be complete PC bullshit
99.9% similar to what?

me and you differ at only 1 base pair in every 1300 bp in the genome, so we are more than 99.9% similar...

these articles have nothing to do with SNPs and CNVs and variation, they are about RNA
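
a quick check of that arithmetic in Python, assuming the figure of 1 differing base pair per 1300 bp quoted in this post and counting only single-base differences (no indels or CNVs); the function name is just for illustration:

```python
def percent_identity(differences: int, per_bases: int) -> float:
    """Percent of identical positions given `differences` per `per_bases`."""
    return 100.0 * (1 - differences / per_bases)


identity = percent_identity(1, 1300)
print(f"{identity:.2f}%")  # 99.92%
```

so one difference per 1300 bp is a bit above 99.9% identity, but below 99.99%
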
 

ThaG

Sicc OG
Jun 30, 2005
9,597
1,687
113
#14
my thoughts - the thread has nothing to do with races

when they say "The existence of a sophisticated RNA-based regulatory system would also largely explain the paradox of the tremendous diversity of characteristics observed among mammals and other complex organisms, despite the relative commonality of their proteomes" they mean the commonality between mammalian species, not between individual humans

nobody studying these things really cares about interindividual variation; well, not really, because SNPs and CNVs are closely related to disease, but the thread was about the bigger picture - the genome, ncRNAs and evolution
 

ThaG

Sicc OG
Jun 30, 2005
9,597
1,687
113
#15
about races:

do we have any scientific evidence supporting any difference in intellectual abilities between races (because I guess that's what you're aiming at) - no, because we don't understand the genetic basis of intelligence yet

this will change in the next 10 years with SNP chips and 454 massively parallel sequencing, which will hopefully finally allow global analysis of transcriptional and regulatory networks in single cells in the brain, but right now we have no idea which genes are involved in determining how smart an individual will be

what we know though is that the environment is a much more important factor