Estimate secondary structure and base pairing posteriors for a given RNA sequence from high-throughput structure-sensitive sequencing data
The HiPR (High-throughput Probabilistic RNA structure inference) algorithm predicts RNA secondary structure and base-pairing probabilities from high-throughput structure-probing data using a Bayesian Markov chain Monte Carlo (MCMC) method.
Distinctive key features:
Supported inputs and protocols:
Deliverables:
HiPR software is freely available for download and use:
HiPR GitHub repository
Latest release: v1.1 (30 April 2018)
These data were used to test HiPR and other methods in the manuscript (in submission).
Raw sequencing data (DMS-Seq) GSE45803 [Rouskin et al. 2014]
Raw sequencing data (DMS-MaPseq) GSE84537 [Zubradt et al. 2016]
DMS-seq mapped data (human): in vivo DMS-seq BAM (H. sapiens) K562 GSM1297493 Rep 1-2 [40 GB]
DMS-seq mapped data (yeast): in vivo DMS-seq BAM (S. cerevisiae) Rep1-4 [18 GB]
DMS-MaPseq mapped data (human): in vivo DMS-MaPseq BAM (H. sapiens) HEK 293T Rep1-2 [6 GB]
Reference structures used in the manuscript for comparing HiPR and other structure prediction methods
Rfam structures [Nawrocki et al. 2015]
Validated structures (H. sapiens) [Rouskin et al. 2014]
Validated structures (S. cerevisiae) [Rouskin et al. 2014]
RNA secondary structures predicted by HiPR and other methods
Predicted structures
Rfam structures were used as the reference for computing structure accuracy.
HiPR.sh - High-throughput Probabilistic inference of RNA secondary structures.
HiPR.sh reads_file rates_file structure_file
[-outDir OutputDir] [-locusName LocusName] [-n Niter]
[-numCPU Ncpus] [-rmin MinReadLength] [-rmax MaxReadLength]
Mandatory arguments (need to be specified before optional arguments)
reads_file -- File containing DMS-seq reads for RNA locus of interest (collapsed read format)
rates_file -- File containing the initial estimates of per-nucleotide modification rates
structure_file -- File containing the sequence and initial secondary structure of an RNA of interest
Recognized optional command line arguments (need to be specified after mandatory arguments)
-outDir <string> -- Set name of output folder (default=HiPR_output)
-locusName <string> -- Set name of locus (default=UnnamedLocus)
-n <integer> -- Set maximum number of MCMC iterations (default=100000)
-numCPU <integer> -- Set number of CPUs to use (default=16)
-rmin <integer> -- Set minimum read length (default=15)
-rmax <integer> -- Set maximum read length (default=40)
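The synopsis above can be wrapped in a small pre-flight helper. This is a minimal sketch, not part of the HiPR distribution: the function name run_hipr and the HIPR_BIN variable (an override for the path to HiPR.sh) are assumptions made for illustration. It checks that the three mandatory files exist and passes them first, followed by any optional flags.

```shell
# Hypothetical helper, not part of the HiPR distribution.
# HIPR_BIN is an assumed variable naming the HiPR.sh executable.
run_hipr() {
    if [ "$#" -lt 3 ]; then
        echo "usage: run_hipr reads_file rates_file structure_file [options]" >&2
        return 2
    fi
    # Check that every mandatory input file exists before starting.
    for f in "$1" "$2" "$3"; do
        if [ ! -f "$f" ]; then
            echo "run_hipr: missing input file: $f" >&2
            return 1
        fi
    done
    # "$@" keeps the original order: three mandatory files first,
    # then any optional flags (-outDir, -locusName, -n, ...).
    "${HIPR_BIN:-HiPR.sh}" "$@"
}

# Example call (file names are placeholders):
#   run_hipr locus.reads locus.rates locus.struct -outDir results -n 50000 -numCPU 4
```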
Estimate secondary structure and base pairing posteriors for a given RNA sequence based on the distribution of read fragments along the locus.
This program requires a file containing the sequence and initial secondary structure of an RNA of interest, a file containing DMS-seq reads, and a file containing the initial estimates of per-nucleotide modification rates. A Bayesian MCMC algorithm is then used to estimate the base pairing posterior that best fits the observed sequencing reads. The results and intermediate files are written to a directory (HiPR_output/ by default).
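In generic terms, a Bayesian MCMC estimate of base-pairing posteriors can be written as below. This is a textbook formulation given for orientation only; HiPR's actual likelihood and prior are not specified in this document. The symbols S (a secondary structure), R (the observed reads), S^(t) (the t-th sampled structure), T (the number of retained samples) and p_i (the estimated posterior for nucleotide i) are notation introduced here for illustration.

```latex
P(S \mid R) \;\propto\; P(R \mid S)\, P(S),
\qquad
\hat{p}_i \;=\; \frac{1}{T} \sum_{t=1}^{T}
  \mathbf{1}\!\left[\text{nucleotide } i \text{ is paired in } S^{(t)}\right]
```

Under this reading, each per-nucleotide posterior is the fraction of sampled structures in which that nucleotide is base-paired.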
The output file HiPR_posterior.txt contains the base pairing posteriors at each nucleotide position, one entry per line.
The output file HiPR_structure.txt contains the consensus secondary structure.
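As a sketch of how the posterior file might be post-processed, the helper below lists positions whose posterior meets a threshold. It assumes that line i of HiPR_posterior.txt holds a single number, the posterior for nucleotide i; if the real entries carry additional fields, the awk field reference needs adjusting. The function name paired_positions is hypothetical.

```shell
# Hypothetical helper, not part of HiPR.
# Assumption: line i of the file holds one number, the base-pairing
# posterior for nucleotide i.
paired_positions() {
    # $1 = posterior file, $2 = probability threshold (default 0.5)
    # Prints "position<TAB>posterior" for every line at or above the threshold.
    awk -v t="${2:-0.5}" '($1 + 0) >= (t + 0) { print NR "\t" $1 }' "$1"
}

# Example call, assuming the file sits directly in the default output folder:
#   paired_positions HiPR_output/HiPR_posterior.txt 0.9
```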
Mandatory files must be specified before any optional arguments and must exist; otherwise an error or usage message is shown.
MIT License https://opensource.org/licenses/MIT
Copyright (c) 2016 University of Pennsylvania
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Pavel Kuksa <pkuksa@upenn.edu> Fan Li <fanli.gcb@gmail.com> Li-San Wang <lswang@upenn.edu>
HiPR software is developed by Wang lab members, Penn Neurodegeneration Genomics Center.
Comments or Questions: HIPR@lisanwanglab.org