Exogenous TE insertion
- ITIS: tool to Identify TE Insertion Sites in genome
- https://github.com/Chuan-Jiang/ITIS
- https://pubmed.ncbi.nlm.nih.gov/25887332/
Two ways to run:
ngspipedb runpipe ngspipe-tnt-medicago -n ngspipe-tnt --resultdirname result -d ../ --genomeFasta ../genome/scf003.fa --genomeAnno ../genome/scf003.sorted.gff --samplefile ../rawdata/sample.csv --rawreadsdir ../rawdata --snaketype p --reads_prefix _R{}_paired.fastq.gz -j 20
- ngspipe-tnt-medicago: your project name
- -n ngspipe-tnt: pipeline name
- --genomeFasta ../genome/scf003.fa: path to the genome FASTA file (see the FASTA file format)
- --genomeAnno ../genome/scf003.sorted.gff: path to the genome annotation file (GTF or GFF)
- --samplefile ../rawdata/sample.csv: path to the sample file, which has one column
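Because the command takes several file paths at once, a quick check before launching can catch a mistyped path early. The following is a minimal sketch, not part of ngspipedb: `check_inputs` is a hypothetical helper, and it assumes the FASTA file is uncompressed plain text whose first line is a `>` header.

```shell
#!/usr/bin/env bash
# check_inputs FASTA ANNOTATION SAMPLEFILE
# Hypothetical pre-flight helper (not an ngspipedb command).
# Verifies that the three input files exist and are non-empty,
# and that the FASTA file begins with a '>' header line.
check_inputs() {
    local fasta=$1 anno=$2 samples=$3
    local f
    for f in "$fasta" "$anno" "$samples"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
    # A FASTA record starts with a header line beginning with '>'.
    if ! head -n 1 "$fasta" | grep -q '^>'; then
        echo "not a FASTA file: $fasta" >&2
        return 1
    fi
    echo "inputs look OK"
}
```

With the paths from the command above, it would be called as `check_inputs ../genome/scf003.fa ../genome/scf003.sorted.gff ../rawdata/sample.csv`.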
- Start a project in the current directory:
ngspipedb startproject medicago_tnt_zhao -n ngspipe-tnt
- PROJECTNAME: medicago_tnt_zhao
- pipeline: ngspipe-tnt
tree medicago_tnt_zhao
├── database
├── genome
├── ngsdb_config.yaml
├── ngspipe_config.yaml
└── rawdata
    └── sample.csv

3 directories, 3 files
- Copy your files into the project directory:
├── database
│   ├── exogenous
│   │   ├── medicago_mere1.fa
│   │   └── medicago_tnt1.fa
├── genome
│   ├── GRCm38.83.chr19.gtf
│   ├── chr19.fa
│   └── chr19.fa.fai
└── rawdata
    ├── control-0_R1.fq.gz
    ├── control-0_R2.fq.gz
    ├── control-1_R1.fq.gz
    ├── control-1_R2.fq.gz
    ├── control-2_R1.fq.gz
    ├── control-2_R2.fq.gz
    ├── sample.csv
    ├── treated-0_R1.fq.gz
    ├── treated-0_R2.fq.gz
    ├── treated-1_R1.fq.gz
    ├── treated-1_R2.fq.gz
    ├── treated-2_R1.fq.gz
    └── treated-2_R2.fq.gz
- database/exogenous: the directory holding the exogenous sequences
- genome: the genome sequence and annotation files
- rawdata: the resequencing reads and the sample file
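Since the reads are paired-end, every read-1 file in rawdata needs a read-2 mate. A minimal sketch for checking this after copying, assuming the `_R1.fq.gz` / `_R2.fq.gz` naming shown in the listing; `list_unpaired` is a hypothetical helper, not an ngspipedb command.

```shell
#!/usr/bin/env bash
# list_unpaired RAWDIR [R1_SUFFIX] [R2_SUFFIX]
# Hypothetical helper: prints the sample name of every read-1 file
# in RAWDIR that has no matching read-2 file. Prints nothing when
# all samples are properly paired.
list_unpaired() {
    local dir=$1 r1=${2:-_R1.fq.gz} r2=${3:-_R2.fq.gz}
    local f sample
    for f in "$dir"/*"$r1"; do
        [ -e "$f" ] || continue          # the glob matched nothing
        sample=$(basename "$f" "$r1")    # e.g. control-0
        [ -e "$dir/$sample$r2" ] || echo "$sample"
    done
}
```

Running `list_unpaired medicago_tnt_zhao/rawdata` on the layout above should print nothing, because each sample has both files.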
- Modify the config file medicago_tnt_zhao/ngspipe_config.yaml:

#---------------------------
# medicago tnt1
#---------------------------

## 1.reference ##
genomeAnno_path: "genome/GRCm38.83.chr19.gtf"   # gene annotation file, can be gtf or gff format
genomeFasta_path: "genome/chr19.fa"             # genome sequence, fasta format
exogenous_seq_path: "database/exogenous/medicago_tnt1.fa"   # medicago_mere1.fa or medicago_tnt1.fa

## 2.raw reads data ##
sample_path: "rawdata/sample.csv"   # sample file
rawreads_dir: "rawdata"             # sample file directory
read1Suffix: "_R1.fq.gz"            # fastq suffix, read1
read2Suffix: "_R2.fq.gz"            # fastq suffix, read2

## 3.output directory ##
results_name: "results"

## 4.notice ##
# if the string is 'nobody', ngspipe will not send email
# change 'nobody' to 'xxx@qq.com' or 'xxx@qq.com,yyy@qq.com' to send email
email_addr: 'nobody'

# choose where to stop your pipeline
target: all

#----------------------------------
# Configuration for sampling data
#----------------------------------
# to test the pipeline, you can use only part of the input files
# which sampling method do you want to use?
sampling_method: links   # links or head or tail or seqkit_number or seqkit_proportion
# Default is links (use the whole data of each sample); head (use the first sampling_value lines of every sample); tail (use the last sampling_value lines of every sample); seqkit_number (number of reads); seqkit_proportion (percentage of reads)
# and how many lines, reads, or what proportion of reads do you want to use?
sampling_value: 80000    # for head and tail, this value is a line number; for number, a read count; for proportion, a percentage
samples_num: all         # all or integer
# Default is all (use all samples); otherwise give a sample number, which must be less than the real number of samples, for example 6

#----------------------------------
# Configuration for Quality Control
#----------------------------------
# which qc method do you want to use?
qc_method: trim-galore   # trim-galore or trimmomatic or fastqc

#----------------------------------
# Configuration for Genome cut
#----------------------------------
# which part of genome do you want to use?
sub_genome: all          # all (whole genome) or chr1:10000-20000 (seq name:start position-end position) or chr1 (seq name)
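For the head and tail sampling methods, the config states that the sampling value is a line count. Assuming standard four-line FASTQ records, 80000 lines correspond to 80000 / 4 = 20000 reads per file. The sketch below shows a manual equivalent of head sampling to make that concrete. It is not the pipeline's own implementation, and `head_sample` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# head_sample SRC.fq.gz DST.fq.gz LINES
# Hypothetical helper: keeps the first LINES lines of a gzipped
# FASTQ file. Each read occupies 4 lines (header, sequence, '+',
# qualities), so LINES must be a multiple of 4 to avoid cutting a
# record in half.
head_sample() {
    local src=$1 dst=$2 lines=$3
    if [ $((lines % 4)) -ne 0 ]; then
        echo "line count must be a multiple of 4" >&2
        return 1
    fi
    gzip -dc "$src" | head -n "$lines" | gzip -c > "$dst"
}
```

For example, `head_sample rawdata/control-0_R1.fq.gz subset_R1.fq.gz 80000` would keep the first 20000 reads; the same line count should be used for the matching read-2 file so that mates stay in sync.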
- Run the pipeline:
ngspipedb runpipe medicago_tnt_zhao -n ngspipe-tnt -c medicago_tnt_zhao/ngspipe_config.yaml -j 1