Welcome to NORA

A tool for transcript quantification where accuracy matters

Meet Nora, a tool for transcript quantification with exceptional accuracy. Nora outputs accurate read alignments in BAM format, while its time and memory requirements are similar to those of pseudo-alignment approaches. Nora is much more accurate than its predecessor Hera, which obtained the top ranking in the latest round of the SMC-RNA DREAM challenge.

Nora's source code is written in C using the Linux kernel coding style. The package can be installed on most macOS, Windows, and Linux systems without dependencies. The Nora binary package is freely available at nora.bioturing.com.

Benchmark Data

We benchmarked Nora and other transcript quantification tools (Hera, Kallisto, Salmon, RSEM+Bowtie2) on 20 simulated data sets generated as in the Kallisto paper (using this script) and on the most recent benchmark data from the SMC-RNA DREAM challenge.

Benchmark results

References and Annotations

For SMC-RNA DREAM Challenge data, we use:

Genome reference: GRCh37.75

ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna_sm.primary_assembly.fa.gz

Gene Annotation: GRCh37.75

ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz

For Kallisto simulations, we use:

Genome reference: GRCh38

ftp://ftp.ensembl.org/pub/release-80/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz

Gene Annotation: GRCh38.80

ftp://ftp.ensembl.org/pub/release-80/gtf/homo_sapiens/Homo_sapiens.GRCh38.80.gtf.gz

The transcriptome FASTA file is generated by an RSEM script, given the genome and gene annotation above as input.
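As a rough illustration of that step, the sketch below assembles an `rsem-prepare-reference` call from Python. It assumes the RSEM script in question is `rsem-prepare-reference`, the same program used for RSEM indexing elsewhere on this page. The helper names, the example paths, and the Python wrapper itself are hypothetical and not part of the original benchmark. RSEM writes the extracted transcript sequences to `<prefix>.transcripts.fa`.

```python
import shlex


def rsem_reference_cmd(bin_dir, gene_gtf, genome_fasta, out_prefix):
    """Build the argv list for rsem-prepare-reference (hypothetical helper)."""
    return [
        f"{bin_dir}/rsem-prepare-reference",
        "--gtf", gene_gtf,   # gene annotation in GTF format
        genome_fasta,        # genome FASTA file
        out_prefix,          # prefix for the files RSEM writes
    ]


def transcript_fasta_path(out_prefix):
    """Path of the transcript FASTA that RSEM writes for a given prefix."""
    return f"{out_prefix}.transcripts.fa"


if __name__ == "__main__":
    # Placeholder paths, not the ones used in the benchmark.
    cmd = rsem_reference_cmd(
        "bin",
        "Homo_sapiens.GRCh38.80.gtf",
        "Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa",
        "index/rsem/genome",
    )
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to run it.
    print(shlex.join(cmd))
    print(transcript_fasta_path("index/rsem/genome"))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.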

Log-pearson: Pearson correlation between log-transformed TPM values, computed with an offset of 0.01.

MAE(asinh): mean absolute error of asinh-transformed TPM values (transcripts with zero TPM in both the ground truth and the prediction are filtered out).

False positive: the number of unexpressed transcripts that the program predicts to be expressed.

False negative: the number of expressed transcripts that the program predicts to be unexpressed.

Max false neg: the maximum ground-truth TPM value among the expressed transcripts that the program predicts to be unexpressed.

Max false pos: the maximum predicted TPM value among the unexpressed transcripts.

Real time: wall-clock time, from start to finish of the call (in seconds).

Note that both ground-truth and predicted TPM values are rounded to 2 decimal digits. Highlighted numbers in the table are chosen as the best number with epsilon = 0.005.
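The definitions above can be sketched in plain Python as follows. This is a minimal illustration, not the script used to produce the benchmark table. `quant_metrics` and its dictionary keys are hypothetical names, the natural logarithm is an arbitrary choice (Pearson correlation does not depend on the log base), and the MAE filter is read as dropping transcripts that are zero in both vectors.

```python
import math


def quant_metrics(truth, pred, offset=0.01):
    """Compare two non-empty TPM lists given in the same transcript order."""
    # Round both sides to 2 decimal digits before any comparison.
    t = [round(x, 2) for x in truth]
    p = [round(x, 2) for x in pred]
    pairs = list(zip(t, p))

    # Log-pearson: Pearson correlation of log(TPM + offset).
    lt = [math.log(x + offset) for x in t]
    lp = [math.log(x + offset) for x in p]
    mt, mp = sum(lt) / len(lt), sum(lp) / len(lp)
    cov = sum((a - mt) * (b - mp) for a, b in zip(lt, lp))
    var_t = sum((a - mt) ** 2 for a in lt)
    var_p = sum((b - mp) ** 2 for b in lp)
    denom = math.sqrt(var_t * var_p)
    log_pearson = cov / denom if denom > 0 else float("nan")

    # MAE(asinh): skip transcripts that are zero on both sides (assumed reading).
    kept = [(a, b) for a, b in pairs if not (a == 0 and b == 0)]
    if kept:
        mae_asinh = sum(abs(math.asinh(a) - math.asinh(b)) for a, b in kept) / len(kept)
    else:
        mae_asinh = 0.0

    # Predicted TPMs of unexpressed transcripts called expressed, and
    # true TPMs of expressed transcripts called unexpressed.
    false_pos = [b for a, b in pairs if a == 0 and b > 0]
    false_neg = [a for a, b in pairs if a > 0 and b == 0]

    return {
        "log_pearson": log_pearson,
        "mae_asinh": mae_asinh,
        "false_positive": len(false_pos),
        "false_negative": len(false_neg),
        "max_false_neg": max(false_neg, default=0.0),
        "max_false_pos": max(false_pos, default=0.0),
    }


if __name__ == "__main__":
    # Toy values for illustration only; not benchmark data.
    truth = [0.0, 10.0, 5.0, 0.0, 2.0]
    pred = [0.0, 9.0, 0.0, 1.5, 2.0]
    for name, value in quant_metrics(truth, pred).items():
        print(name, value)
```

Because the values are rounded first, a transcript counts as expressed only if its TPM rounds to at least 0.01.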

Machine specs

Spec 1 - for SMC simulated samples

CPU: 40 cores Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz

RAM: 96 GiB DDR3 1866 MHz

Spec 2 - for 20 samples simulated from Kallisto paper’s script.

CPU: 32 cores Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz

RAM: 64 GiB DDR3 1333 MHz

Tools used

Nora (coming soon…)

Run with 32 CPU cores…

Bowtie 2 + RSEM

Bowtie 2 (version 2.3.4.1) (Home page)

RSEM (version 1.3.0) (Home page)

Run command:


        
            Index: $BIN/rsem-prepare-reference                  \
                        --gtf                       $GENE_GTF   \
                        --bowtie2 --bowtie2-path    $BIN        \
                        -p                          32          \
                        $GENOME_FASTA                           \
                        $INDEX_DIR/rsem/genome
        
    

        
            Quant: $BIN/rsem-calculate-expression               \
                    --bowtie2 --bowtie2-path    $BIN            \
                    -p                          32              \
                    --paired-end                                \
                    $READS_1 $READS_2                           \
                    $INDEX_DIR/rsem/genome                      \
                    $OUTPUT_DIR/rsem/result
        
    

Kallisto (Home page)

Version 0.44.0

Run command:


        
            Index: $BIN/kallisto index -i $INDEX_DIR/kallisto $RSEM_TRANSCRIPT
        
    

        
            Quant: $BIN/kallisto quant  -i          $INDEX_DIR/kallisto     \
                                        -o          $OUTPUT_DIR/kallisto    \
                                        -t          32                      \
                                        $READS_1 $READS_2
        
    

Salmon (Home page)

Version 0.9.1

Run command:


    
        Index: $BIN/salmon index    --index         $INDEX_DIR/salmon   \
                                    --transcripts   $RSEM_TRANSCRIPT
    

    
        Quant: $BIN/salmon quant    --index         $INDEX_DIR/salmon   \
                                    --libType       A                   \
                                    -1              $READS_1            \
                                    -2              $READS_2            \
                                    -p              32                  \
                                    --output        $OUTPUT_DIR/salmon
    

Hera (Home page)

Version 1.2

Run command:


    
        Index: $BIN/hera_build  --fasta         $GENOME_FASTA       \
                                --gtf           $GENE_GTF           \
                                --outdir        $INDEX_DIR/hera/genome
    

    
        Quant: $BIN/hera quant      -i              $INDEX_DIR/hera     \
                                    -1              $READS_1            \
                                    -2              $READS_2            \
                                    -t              32                  \
                                    -o              $OUTPUT_DIR/hera