Reference-based RNA-seq basic analysis
Info
If this is your first time using NGSPipe, we strongly recommend that you start by running the test data. If you already have experience with NGSPipe, you can go straight to the custom data section.
Reference genome-based analysis applies when an assembled genome exists for the species on which the RNA-seq experiment is performed. Reads can then be aligned against the reference genome, which significantly improves our ability to reconstruct transcripts.
A basic RNA-Seq analysis here means poly(A)-selected RNA-Seq. This pipeline can be used for traditional transcriptome profiling, differential expression, and GO and KEGG annotation.
A typical workflow for reference-based transcriptome analysis is shown in the figure below.
NGSPipeDb runpipe command line interface
ngspipedb runpipe -h
Usage: ngspipedb runpipe [OPTIONS] PROJECTNAME
run a lot of pipeline
Example:
python -m ngspipedbcli runpipe ngspipe-rnaseq-basic -n ngspipe-rnaseq-basic
-d test_pipeline --genomeFasta testdata_ngspipe-rnaseq-basic/genome/chr19.fa
--genomeAnno testdata_ngspipe-rnaseq-basic/genome/GRCm38.83.chr19.gtf
--samplefile testdata_ngspipe-rnaseq-basic/rawdata/sample.csv
--conditionfile testdata_ngspipe-rnaseq-basic/rawdata/condition.csv
--rawreadsdir testdata_ngspipe-rnaseq-basic/rawdata --snaketype p --report
-db -ps
Options:
-n, --pipename [ngspipe-rnaseq-basic|ngspipe-rnaseq-lncRNA|ngspipe-rnaseq-trinity|ngspipe-chipseq|ngspipe-resequencing|ngsdb]
ngspipedb env name [required]
-d, --directory PATH project directory
-j, --jobs INTEGER how many cpu to use
--genomeFasta TEXT genome sequence file (fasta)
--genomeAnno TEXT genome annotation file (gff/gtf)
--samplefile TEXT samplefile
--conditionfile TEXT conditionfile
--rawreadsdir TEXT raw reads directory
-e, --email_addr TEXT result directory name (under project
directory)
--reads_prefix TEXT reads prefix (Example: _R{}.fq.gz )
--resultdirname TEXT result directory name under proect name
directory
--snaketype [np|p] `p`: print snakemake shell commands. `np`:
Enable the dry run.
-r, --report generate html report
-db, --database generate database
-c, --configfile PATH config file path
--otherparams TEXT other snakemake params
-ps, --printshell print ngspipedb shell commands
-h, --help Show this message and exit.
RNA-Seq basic analysis on test data
1. Download test files
NGSPipe is dependent on reference files and raw sequence reads which can be found in http://www.liu-lab.com/ngspipedb/testdata.
To download the mouse RNA-seq test data into current directory:
ngspipedb download -n ngspipe-rnaseq-basic -t testdata && tar -zxvf testdata-ngspipe-rnaseq-basic.tar.gz
-n ngspipe-rnaseq-basic
selects the pipeline name.
-t testdata
selects the data type, here testdata.
Make sure you have the following directory structure by running tree testdata_ngspipe-rnaseq-basic:
testdata_ngspipe-rnaseq-basic
├── genome
│   ├── GRCm38.83.chr19.gtf
│   └── chr19.fa
└── rawdata
    ├── condition.csv
    ├── control-0_R1.fq.gz
    ├── control-0_R2.fq.gz
    ├── control-1_R1.fq.gz
    ├── control-1_R2.fq.gz
    ├── control-2_R1.fq.gz
    ├── control-2_R2.fq.gz
    ├── sample.csv
    ├── treated-0_R1.fq.gz
    ├── treated-0_R2.fq.gz
    ├── treated-1_R1.fq.gz
    ├── treated-1_R2.fq.gz
    ├── treated-2_R1.fq.gz
    └── treated-2_R2.fq.gz
2 directories, 16 files
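The listing above can also be checked programmatically. The following is a minimal sketch, not part of NGSPipe: the helper name `missing_test_files` is hypothetical, and the expected paths are simply copied from the listing.

```python
from pathlib import Path

# Relative paths copied from the directory listing above.
EXPECTED = [
    "genome/GRCm38.83.chr19.gtf",
    "genome/chr19.fa",
    "rawdata/condition.csv",
    "rawdata/sample.csv",
] + [
    f"rawdata/{group}-{rep}_R{mate}.fq.gz"
    for group in ("control", "treated")
    for rep in range(3)
    for mate in (1, 2)
]


def missing_test_files(root):
    """Return the expected relative paths that are absent under root.

    Hypothetical helper for illustration; NGSPipe does not ship it.
    """
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_test_files("testdata_ngspipe-rnaseq-basic")
    print("all files present" if not missing else f"missing: {missing}")
```

An empty result means the download and extraction produced every file shown in the listing.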
Warning
The test data is only used to verify that the analysis workflow runs properly; the results have no biological significance.
See the help message for the download subcommand: ngspipedb download -h
Usage: ngspipedb download [OPTIONS]
Commands related to get testdata and database
Example:
python -m ngspipedbcli downloaddata -l
python -m ngspipedbcli downloaddata -n ngspipe-rnaseq-basic -t testdata -o
run_test/myproject_rnaseq_basic -ps
Options:
-l, --list list all available files.
-a, --all download all datatypes
-ps, --printshell print ngspipedb shell commands
-o, --directory PATH
-n, --pipeline [ngspipe-rnaseq-basic|ngspipe-rnaseq-lncRNA|ngspipe-rnaseq-trinity|ngspipe-chipseq|ngspipe-resequencing|ngsdb]
ngspipedb env name
-p, --platform [osx|linux] A file name or file path
-t, --datatype [env|testdata|database]
file types
-h, --help Show this message and exit.
2. Run RNA-seq analysis on test data
We provide a basic reference-based RNA-seq workflow so that users can get a quick overview of ngspipe-rnaseq-basic. This workflow contains 7 steps:
1. sampling data (choose part of your data)
2. raw reads QC
3. junction-aware alignment to the genome
4. transcript assembly
5. gene quantification
6. statistics
7. differential gene analysis
You can run the whole RNA-seq analysis with a single command:
ngspipedb runpipe mouse_rnaseq_analysis -n ngspipe-rnaseq-basic --genomeFasta testdata_ngspipe-rnaseq-basic/genome/chr19.fa --genomeAnno testdata_ngspipe-rnaseq-basic/genome/GRCm38.83.chr19.gtf --samplefile testdata_ngspipe-rnaseq-basic/rawdata/sample.csv --conditionfile testdata_ngspipe-rnaseq-basic/rawdata/condition.csv --rawreadsdir testdata_ngspipe-rnaseq-basic/rawdata --report -db -ps
mouse_rnaseq_analysis
your project name.
-n ngspipe-rnaseq-basic
the pipeline name.
--genomeFasta testdata_ngspipe-rnaseq-basic/genome/chr19.fa
path to the genome FASTA file (see the FASTA file format).
--genomeAnno testdata_ngspipe-rnaseq-basic/genome/GRCm38.83.chr19.gtf
path to the genome annotation file (GTF or GFF).
--samplefile testdata_ngspipe-rnaseq-basic/rawdata/sample.csv
path to the sample file, which has a single column.
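When the call is scripted, it can be easier to read and maintain as an argument list. The sketch below rebuilds the command shown above using only the flags it already contains; the function name `build_runpipe_command` and its parameters are hypothetical, and nothing is executed.

```python
import shlex


def build_runpipe_command(project, data_dir, pipeline="ngspipe-rnaseq-basic"):
    """Assemble the runpipe call as an argument list (illustrative helper)."""
    return [
        "ngspipedb", "runpipe", project,
        "-n", pipeline,
        "--genomeFasta", f"{data_dir}/genome/chr19.fa",
        "--genomeAnno", f"{data_dir}/genome/GRCm38.83.chr19.gtf",
        "--samplefile", f"{data_dir}/rawdata/sample.csv",
        "--conditionfile", f"{data_dir}/rawdata/condition.csv",
        "--rawreadsdir", f"{data_dir}/rawdata",
        "--report", "-db", "-ps",
    ]


cmd = build_runpipe_command("mouse_rnaseq_analysis", "testdata_ngspipe-rnaseq-basic")
# Print the shell-quoted command line; this does not run the pipeline.
print(shlex.join(cmd))
```

To actually launch the run from Python, the same list could be passed to `subprocess.run(cmd, check=True)`.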
The final data files are placed in the folder test_pipeline/ngspipe-rnaseq-basic. Check your result files with tree -d -L 2 test_pipeline/ngspipe-rnaseq-basic; the output may look like this:
test_pipeline/ngspipe-rnaseq-basic
├── database
├── genome
├── rawdata
└── result_Sep-06-2021
    ├── ngsdb_code
    │   ├── __pycache__
    │   ├── blastplus
    │   ├── geneAnno
    │   ├── geneDetail
    │   ├── geneExpAtlas
    │   ├── home
    │   ├── igv
    │   ├── media
    │   ├── ngsdb
    │   ├── search
    │   ├── tools
    │   └── wooey
    ├── ngsdb_data
    │   ├── addscript
    │   ├── blastdb
    │   ├── exp
    │   ├── gbrowse
    │   ├── gff_sqlite3
    │   └── migration
    ├── ngspipe_result
    │   ├── diff
    │   ├── mapping
    │   ├── quantify
    │   ├── rawReads_qc
    │   ├── sampling_data
    │   └── statistic
    └── report
        ├── 1.pipeline
        ├── 2.rawreads_stat
        ├── 3.cleanreads_stat
        ├── 4.mapping_stat
        └── 5.exp_stat
37 directories
Note
If you encounter any problems in this step, please see TroubleShooting for help.
RNA-Seq basic analysis on custom data
1. start a project
Create the directory structure and copy the config files:
ngspipedb startproject custom_rnaseq_analysis -n ngspipe-rnaseq-basic
Make sure you have the following directory structure by running tree custom_rnaseq_analysis:
custom_rnaseq_analysis
├── database
├── genome
├── ngsdb_config.yaml
├── ngspipe_config.yaml
└── rawdata
    ├── condition.csv
    └── sample.csv
3 directories, 4 files
See the help message for the startproject subcommand: ngspipedb startproject -h
Usage: ngspipedb startproject [OPTIONS] PROJECTNAME
Creates a ngspipedb project directory structure for the given project name
in the current directory or optionally in the given directory.
Example:
python -m ngspipedbcli startproject myproject_rnaseq_basic -n ngspipe-
rnaseq-basic -ps
Options:
-n, --pipeline [ngspipe-rnaseq-basic|ngspipe-rnaseq-lncRNA|ngspipe-rnaseq-trinity|ngspipe-chipseq|ngspipe-resequencing|ngsdb]
pipelines from ngspipedb
-d, --directory TEXT project directory
-ps, --printshell print ngspipedb shell commands
-h, --help Show this message and exit.
2. modify configfile
The RNA-seq pipeline needs the 'reference' and 'raw reads data' sections of custom_rnaseq_analysis/ngspipe_config.yaml to be set correctly.
#---------------------------
# rnaseq-basic
#---------------------------
## 1.reference ##
genomeAnno_path: "genome/GRCm38.83.chr19.gtf" # gene annotation file, can be gtf or gff format
genomeFasta_path: "genome/chr19.fa" # genome sequence, fasta format
## 2.raw reads data ##
sample_path: "rawdata/sample.csv" # sample file
rawreads_dir: "rawdata" # raw reads directory
read1Suffix: "_R1.fq.gz" # fastq suffix, read1
read2Suffix: "_R2.fq.gz"
## 3.condition for differential expression by deseq2 ##
condition_path: "rawdata/condition.csv"
## 4.output directory ##
results_name: "results"
## 5.notice ##
# if the string is 'nobody', ngspipe will not send email
# change 'nobody' to 'xxx@qq.com' or 'xxx@qq.com,yyy@qq.com' to send email
Warning
You cannot mix paired-end and single-end samples within the same NGSPipe run, as this will cause an error. NGSPipe only supports paired-end samples.
Note
The input and output file paths are relative to the working directory (here, the working directory is custom_rnaseq_analysis). If you use the -d parameter, for example -d run_pipeline_rnaseq_basic, the working directory becomes run_pipeline_rnaseq_basic/custom_rnaseq_analysis. Alternatively, use absolute paths (starting from the root /).
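The path rule in this note can be written out as a few lines of Python. This is only an illustration of the rule as described here, not NGSPipe's own code; the function `resolve_config_path` and its arguments are hypothetical.

```python
from pathlib import Path


def resolve_config_path(value, project, directory=None):
    """Resolve a path from the config file following the note above.

    Absolute paths are returned unchanged. Relative paths are joined to
    the working directory, which is the project directory, optionally
    placed under the directory given with -d.
    """
    path = Path(value)
    if path.is_absolute():
        return path
    workdir = Path(directory, project) if directory else Path(project)
    return workdir / path


# Relative path, no -d: resolved under the project directory.
print(resolve_config_path("genome/chr19.fa", "custom_rnaseq_analysis"))
# Relative path with -d run_pipeline_rnaseq_basic.
print(resolve_config_path("genome/chr19.fa", "custom_rnaseq_analysis",
                          "run_pipeline_rnaseq_basic"))
```

Printing the resolved paths before a run is a quick way to see which files the settings in ngspipe_config.yaml would point to.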
3. modify samplefile
Fill in custom_rnaseq_analysis/rawdata/sample.csv and custom_rnaseq_analysis/rawdata/condition.csv. In sample.csv, put one sample name per line and nothing else. Using the test data as an example, which has 6 samples, the file looks like this:
control-0
control-1
control-2
treated-0
treated-1
treated-2
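Because only paired-end samples are supported, it can help to confirm that every sample name has both read files before starting a run. The sketch below assumes that each FASTQ file is named as the sample name followed by the read1Suffix or read2Suffix from the config file, which matches the test data; the helper names are hypothetical.

```python
from pathlib import Path


def read_samples(sample_csv):
    """Return the non-empty lines of sample.csv, one sample name per line."""
    lines = Path(sample_csv).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def unpaired_samples(samples, rawreads_dir,
                     read1_suffix="_R1.fq.gz", read2_suffix="_R2.fq.gz"):
    """Return the samples that lack a read 1 or read 2 file.

    Assumes files are named <sample><suffix>, as in the test data.
    """
    raw = Path(rawreads_dir)
    return [
        name for name in samples
        if not ((raw / f"{name}{read1_suffix}").is_file()
                and (raw / f"{name}{read2_suffix}").is_file())
    ]
```

For example, `unpaired_samples(read_samples("rawdata/sample.csv"), "rawdata")` run inside the project directory returns an empty list when every sample is complete.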
4. modify conditionfile
condition.csv has three columns separated by commas. Do not change the header sample_id,Sample,Tissue. Leave column 3 empty.
sample_id,Sample,Tissue
control-0,control,
control-1,control,
control-2,control,
treated-0,treated,
treated-1,treated,
treated-2,treated,
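The rules for condition.csv are easy to check before a run. The following sketch uses Python's standard csv module; the function `check_condition_rows` is hypothetical, and the requirement that every name in sample.csv also appears in condition.csv is an assumption made for this example rather than something stated above.

```python
import csv
import io

EXPECTED_HEADER = ["sample_id", "Sample", "Tissue"]


def check_condition_rows(text, samples):
    """Return a list of problems found in the content of condition.csv."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["header must be sample_id,Sample,Tissue"]
    problems = []
    seen = []
    for number, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # ignore blank lines
        if len(row) != 3:
            problems.append(f"line {number}: expected 3 columns, found {len(row)}")
            continue
        sample_id, group, tissue = row
        seen.append(sample_id)
        if not group:
            problems.append(f"line {number}: column 2 (Sample) is empty")
        if tissue:
            problems.append(f"line {number}: column 3 (Tissue) should be empty")
    # Assumption: each sample listed in sample.csv needs a condition row.
    for name in samples:
        if name not in seen:
            problems.append(f"sample {name} from sample.csv is missing")
    return problems
```

An empty list means the file follows the format shown above; otherwise each entry names the line and the rule it breaks.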