Welcome to Structure Surfer!
Structure Surfer is a browsable database of RNA secondary structure data. Check out the FAQ for tips on searching for structures and interpreting the results, or just dive right in.
You can browse by coordinate in the RNA Structure tab. We also recommend a data aggregation approach if you have many features with similar structure (e.g. predicted hairpins, protein binding sites, etc.).
To get started, click on the 'RNA Structure' tab, or 'Aggregate Mode' tab. Sample bed files can be found in the Downloads section.
Enter the genomic coordinates for your feature of interest and select the organism.
Set the search genome
Search By Transcript
Enter the transcript ID for a gene of interest from TAIR10, hg19, or mm10 (e.g. AT3G61897.1, uc002clg.2, NM_001080397, ENSMUST00000102181). The structure scores are given for the spliced transcript.
Set the search genome
Aggregate Structure Search
Upload a bed file with the regions you wish to scan. The intervals must all be of the same length.
Set the search genome
Usage Tip - data aggregation is a useful way to learn structural trends from sparse data when you have a list of features with similar structure (e.g. splice junctions, mRNA hairpins, protein binding sites).
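The idea behind aggregation can be sketched in a few lines of Python. This is only an illustration, not the code Structure Surfer runs: the function name `aggregate_profiles`, the use of `None` for positions without data, and the choice of a simple per-position mean are all assumptions made for the example.

```python
def aggregate_profiles(profiles):
    """Average per-position scores across equal-length intervals.

    `profiles` is a list of lists, one inner list per interval, holding one
    score per position. None marks a position with no data (an assumption
    made for this sketch). Returns one mean per position, or None where no
    interval has data at that position.
    """
    if not profiles:
        return []
    length = len(profiles[0])
    if any(len(p) != length for p in profiles):
        raise ValueError("all intervals must be the same length")
    means = []
    for column in zip(*profiles):
        observed = [s for s in column if s is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return means


# Three sparse intervals of length 3 (made-up numbers).
example = [
    [1.0, None, 3.0],
    [3.0, 2.0, None],
    [None, 4.0, None],
]
profile = aggregate_profiles(example)
```

Each individual interval above is missing data somewhere, but the aggregated profile has a value at every position, which is why the intervals need to share one length.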
Structure Surfer currently contains data from four experiment types:
- icSHAPE - in vivo click selective 2′-hydroxyl acylation and profiling experiment (mouse)
- DMS-Seq - Dimethyl Sulphate Sequencing (human)
- PARS - Parallel Analysis of RNA Structure (human)
- ds/ssRNA-Seq - Double Stranded and Single Stranded RNA Sequencing (human)
All of the experiments use next generation sequencing technology to uncover RNA secondary structure, but the strategies differ in some important ways.
Two of the methods use ribonucleases (RNases) to cut RNA in a way that is sensitive to structure. One RNase prefers to cut structured regions of RNA and another prefers to cut unstructured regions. Investigators learn about the secondary structure of RNA by comparing the cutting patterns of the two.
PARS and ds/ssRNA-Seq both use a cut and compare approach, but with one major difference. In a PARS experiment, the cutting is very light, creating nicks in the RNA. Each nick represents a position that was cut by one enzyme or the other. The structural preference of the enzyme that did the cutting provides evidence of base pairing status at that position.
In a ds/ssRNA-Seq experiment, the RNA is digested more thoroughly. Instead of nicking the RNA, the enzymes are allowed to extensively digest their preferred regions. This leaves one set of sequences highly enriched for structured RNA and another set that is highly enriched for unstructured RNA. Short RNA sequences that occur more often in one group than the other reveal their structural state.
The other two methods, DMS and icSHAPE, use small molecules instead of enzymes. The strategy is to chemically label unpaired nucleotides while nucleotides that are base paired are protected. During RNA-Sequencing the labels create abrupt stops in the sequence. A sequencing stop at a particular position is evidence that that position was unpaired.
The difference between the two methods is the label. DMS-Seq uses dimethyl sulfate to label adenines and cytosines. icSHAPE uses an alkylating agent to mark the RNA's ribose backbone. It's important to note that icSHAPE was done in mouse cells while all of the other datasets are from human cells.
Each of the techniques has its own scoring method.
DMS-Seq
DMS labeling of a nucleotide causes reverse transcriptase (RT) to stall during the cDNA synthesis step of RNA-Seq. Unstructured nucleotides, those not involved in base pairing, are more reactive with DMS and thus more likely to be the site of such a stall. The resulting RNA-Seq reads have 5' ends corresponding to the reactive, unpaired position. However, DMS labeling is not the only possible explanation for a position with a high tendency to cause RT stalls. For this reason, DMS-Seq scores are expressed as nucleotide reactivity relative to a denatured, maximally reactive control. The signal at each position is calculated from the normalized number of 5' read ends mapping to that position in the native library compared to the control (Rouskin et al., 2014).
reactivity = (base 5' coverage/max base 5' coverage)/(control base 5' coverage/max control base 5' coverage)
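As a rough illustration, the formula above can be written out in Python. This is a sketch under stated assumptions rather than the published pipeline: the function name `dms_reactivity` is hypothetical, the "max" terms are taken to be the maximum 5' end count within the region being scored, and positions with zero control coverage are reported as `None` because the ratio is undefined there.

```python
def dms_reactivity(native_counts, control_counts):
    """Per-position DMS-Seq reactivity from 5' read-end counts.

    Both arguments are equal-length lists of 5' end counts for one region.
    Assumption: each library is normalized by its own maximum count within
    the region before the native/control ratio is taken.
    """
    native_max = max(native_counts)
    control_max = max(control_counts)
    scores = []
    for native, control in zip(native_counts, control_counts):
        if control == 0 or native_max == 0 or control_max == 0:
            scores.append(None)  # ratio undefined without control coverage
        else:
            scores.append((native / native_max) / (control / control_max))
    return scores


# Made-up counts for a four-nucleotide region.
native = [2, 8, 4, 0]
control = [10, 10, 5, 0]
reactivity = dms_reactivity(native, control)
```

In this toy example the first position stalls far less often in the native library than in the denatured control, so its reactivity is low, which is what a base-paired nucleotide would be expected to show.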
icSHAPE
As with DMS-Seq, icSHAPE scoring is based on the higher reactivity of unpaired nucleotides compared to nucleotides involved in pairing. Reactivity is calculated from the count of 5' read ends covering each position. These counts are normalized to counts from a no-reagent background library and adjusted according to a background base density (Spitale et al., 2015).
reactivity = (5' end coverage – control 5' end coverage)/(control background density)
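A direct transcription of this formula into Python might look like the following. It is a minimal sketch: the function name `icshape_reactivity` and the argument names are hypothetical, the background density is assumed to be supplied per position, and positions with zero density are reported as `None`. Any additional scaling or outlier handling in the published method is not reproduced here.

```python
def icshape_reactivity(treated_ends, control_ends, background_density):
    """Per-position icSHAPE reactivity.

    treated_ends: 5' end counts from the reagent-treated library.
    control_ends: 5' end counts from the no-reagent background library.
    background_density: background base density per position (assumed to be
    precomputed; how it is derived is outside this sketch).
    """
    scores = []
    for treated, control, density in zip(
        treated_ends, control_ends, background_density
    ):
        if density == 0:
            scores.append(None)  # no background coverage to normalize by
        else:
            scores.append((treated - control) / density)
    return scores


# Made-up numbers for three positions.
icshape_scores = icshape_reactivity([12, 5, 9], [2, 5, 1], [5.0, 4.0, 0.0])
```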
PARS
PARS scores reflect the differential accessibility of paired and unpaired regions to ribonucleases. Unpaired regions are more accessible to RNase S1 while paired regions are more accessible to RNase V1. Each enzyme cleaves RNA in its preferred region creating RNA fragments with 5' hydroxyl groups. The ends are directly ligated onto sequencing primers. After cDNA synthesis and sequencing, each read has a 5' end corresponding to a cleavage site. Scores were calculated from the count of 5' read ends covering each position in the two nuclease-treated libraries. Each score is the generalized log ratio of the two coverage scores. A 5 nt rolling average is applied for smoothing. Positions with no coverage in either library were omitted (Wan et al., 2014).
structure_score = glog(5' end RNase V1 coverage / 5' end RNase S1 coverage)
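The scoring and smoothing steps can be sketched as follows. Several details are assumptions, since the text above does not spell them out: the generalized log is taken to be glog(x) = log2(x + sqrt(1 + x^2)), the rolling average is a centered 5 nt window applied to the scores (not the raw counts) that averages whatever scored positions fall inside it, and positions with no RNase S1 coverage are skipped because the ratio is undefined there. The function names are hypothetical.

```python
import math


def glog(x):
    """Generalized log, here assumed to be log2(x + sqrt(1 + x^2))."""
    return math.log2(x + math.sqrt(x * x + 1.0))


def pars_scores(v1_ends, s1_ends, window=5):
    """Return (raw, smoothed) PARS-style scores for one region.

    v1_ends and s1_ends are equal-length lists of 5' end counts from the
    RNase V1 and RNase S1 libraries. Omitted positions are None.
    """
    raw = []
    for v1, s1 in zip(v1_ends, s1_ends):
        # Covers the "no coverage in either library" case and, as an
        # assumption, any other position where the ratio is undefined.
        raw.append(None if s1 == 0 else glog(v1 / s1))

    half = window // 2
    smoothed = []
    for i, score in enumerate(raw):
        if score is None:
            smoothed.append(None)
            continue
        neighbors = [
            s for s in raw[max(0, i - half): i + half + 1] if s is not None
        ]
        smoothed.append(sum(neighbors) / len(neighbors))
    return raw, smoothed


# Made-up counts for a six-nucleotide region.
v1_example = [8, 6, 0, 1, 0, 2]
s1_example = [1, 1, 0, 4, 5, 2]
raw_scores, smoothed_scores = pars_scores(v1_example, s1_example)
```

With this definition, positions cut mostly by RNase V1 receive higher scores than positions cut mostly by RNase S1, so higher values point toward paired nucleotides.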
ds/ssRNA-Seq
Unlike scores from the other methods, ds/ssRNA-Seq scoring takes into account all positions from each read rather than the 5' end coverage only. It employs the same reagents as PARS, but uses a longer enzyme treatment resulting in more complete digestion of each enzyme's preferred structure type. After cDNA synthesis and sequencing, reads represent regions that were protected from structure-specific digestion. For each position the score is the generalized log ratio of the normalized counts in the two libraries (Li et al., 2012).
The database's plotting tool is implemented using the Python package PyGal. For plotting purposes, scores are scaled and re-centered to reveal local structural patterns and to make the datasets visually comparable. For the same reason, DMS and icSHAPE scores, which represent nucleotide reactivity as opposed to degree of structure, are inverted when displayed so that high scores indicate evidence of paired nucleotides in all datasets. Raw scores are available for download alongside the plots.
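The display transformation can be approximated with a short Python sketch. It assumes that "scaled and re-centered" means standardizing each dataset to zero mean and unit sample standard deviation, which is what R's scale function does by default and which the FAQ offers as a comparison. The function name `scale_for_plot` and the dataset labels are hypothetical, and the site's actual code may differ.

```python
import statistics

# Reactivity-based datasets are flipped so that "up" means "more paired".
INVERTED_DATASETS = {"DMS-Seq", "icSHAPE"}


def scale_for_plot(scores, dataset):
    """Center, scale and (for reactivity data) invert scores for display.

    scores: list of raw scores, with None for missing positions.
    dataset: label used only to decide whether to invert.
    """
    observed = [s for s in scores if s is not None]
    if len(observed) < 2:
        return list(scores)  # too few values to estimate a spread
    mean = statistics.mean(observed)
    sd = statistics.stdev(observed)  # sample standard deviation (n - 1)
    if sd == 0:
        return [None if s is None else 0.0 for s in scores]
    sign = -1.0 if dataset in INVERTED_DATASETS else 1.0
    return [None if s is None else sign * (s - mean) / sd for s in scores]


raw = [1.0, None, 2.0, 3.0]
pars_view = scale_for_plot(raw, "PARS")
icshape_view = scale_for_plot(raw, "icSHAPE")
```

The same raw numbers give mirror-image curves depending on the dataset label, so a plotted curve will not match the downloaded raw scores value for value.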
Spitale, R.C., Flynn, R.A., Zhang, Q.C., Crisalli, P., Lee, B., Jung, J.-W., Kuchelmeister, H.Y., Batista, P.J., Torre, E.A., Kool, E.T., et al. (2015). Structural imprints in vivo decode RNA regulatory mechanisms. Nature 519, 486–490.
Rouskin, S., Zubradt, M., Washietl, S., Kellis, M., and Weissman, J.S. (2014). Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505, 701–705.
Wan, Y., Qu, K., Zhang, Q.C., Flynn, R.A., Manor, O., Ouyang, Z., Zhang, J., Spitale, R.C., Snyder, M.P., Segal, E., et al. (2014). Landscape and variation of RNA secondary structure across the human transcriptome. Nature 505, 706–70.
ds/ssRNA-Seq and the Structure Surfer Database: Berkowitz, N.D., Silverman, I.M., Childress, D.M., Kazan, H., Wang, L.S., et al. (2016). A comprehensive database of high-throughput sequencing-based RNA secondary structure probing data (Structure Surfer). BMC Bioinformatics 17, 215.
- Why is my search returning "No Data"?
- My search is taking a long time. Did something crash?
- Why doesn't the graph match the raw scores?
- I'm getting an error when I try to load a BED file
- How do I interpret the graph? What do peaks and troughs mean?
- Why do some of the experiments have peaks where others have troughs?
- How can I see what scores look like on a structure model?
- Can I download the database and work locally?
You found a region where there's no structure data. All of the techniques require a fair amount of sequencing depth to calculate scores, so regions that aren't well expressed often have sparse or missing scores. This is especially true outside of the exome.
If your coordinates are inside an exon and you still aren't finding anything, you might consider a data aggregation approach (click the Aggregate Mode tab).
This is probably normal. Some queries will take a few minutes. Transcript searches can be slow for long transcripts. Also, when using aggregation mode, BED files with large intervals will take some time.
Data on the graph are transformed in two ways. First, they are scaled so that all datasets are in the same range. This is similar to R's scaling function. Second, the icSHAPE and DMS scores are inverted in the plot so that the y-axis has the same meaning across all the data.
Make sure your bed file is well formed. The first three columns must be chromosome, start, and end. As an additional requirement, all of the intervals in your bed file must be the same length.
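If you want to check a file before uploading it, the two requirements above can be tested with a short script along these lines. This is a sketch, not the validation the server performs: the function name `check_bed_lines` is hypothetical, and skipping blank lines and "#" comment lines is an assumption about what the upload tolerates.

```python
def check_bed_lines(lines):
    """Return a list of problems found; an empty list means no problems."""
    problems = []
    lengths = set()
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # assumption: blank and comment lines are ignored
        fields = stripped.split()
        if len(fields) < 3:
            problems.append(f"line {number}: expected at least 3 columns")
            continue
        start, end = fields[1], fields[2]
        if not (start.isdigit() and end.isdigit()):
            problems.append(f"line {number}: start and end must be integers")
            continue
        if int(end) <= int(start):
            problems.append(f"line {number}: end must be greater than start")
            continue
        lengths.add(int(end) - int(start))
    if len(lengths) > 1:
        found = ", ".join(str(n) for n in sorted(lengths))
        problems.append(f"intervals differ in length: {found}")
    return problems


good_bed = ["chr1\t100\t150\n", "chr2\t500\t550\n"]
bad_bed = ["chr1\t100\t150", "chr2\t500\t560", "chr3\t10"]
good_report = check_bed_lines(good_bed)
bad_report = check_bed_lines(bad_bed)
```

To check a file on disk, pass the open file object directly, for example `check_bed_lines(open("my_regions.bed"))`, where the file name is a placeholder.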
A peak is evidence for higher secondary structure. This means more hydrogen bonds and more double strandedness. A trough means the opposite: fewer hydrogen bonds and less secondary structure. Keep in mind that it's all relative. A peak may simply indicate a region of very high structure within a slightly less structured region.
The different methods use different reagents, which probably interact with structural elements in different ways. In some cases the experiments agree, but there are features that are harder to interpret.
There's a great visualization tool called SAVoR that does exactly that! It works with a few different in silico structure prediction models. You can also input your own structure if you have one.
RBP Motif Structure - Plots of the local structural environment around RNA binding protein motif sites in the human and mouse genomes
test.bed - Example Bed File (human)
PennBox/structure_surfer Download the entire database as a MySQL dump file from PennBox.
github.com/nberkow/StructureSurfer Python scripts and database schema on GitHub.