Snakemake的优势:
#conda安装
conda install -c bioconda -c conda-forge snakemake
#pip3安装
pip3 install snakemake
cd $HOME
# 创建一个工作目录
mkdir snakemake-example
cd snakemake-example
# 虚拟基因组数据
touch genome.fa
# 虚拟fastq数据
mkdir fastq
touch fastq/Sample1.R1.fastq.gz fastq/Sample1.R2.fastq.gz
touch fastq/Sample2.R1.fastq.gz fastq/Sample2.R2.fastq.gz
├── fastq
│ ├── Sample1.R1.fastq.gz
│ ├── Sample1.R2.fastq.gz
│ ├── Sample2.R1.fastq.gz
│ └── Sample2.R2.fastq.gz
└── genome.fa
SAMPLES = ['Sample1', 'Sample2']
rule all:
input:
expand('{sample}.txt', sample=SAMPLES)
rule quantify_genes:
input:
genome = 'genome.fa',
r1 = 'fastq/{sample}.R1.fastq.gz',
r2 = 'fastq/{sample}.R2.fastq.gz'
output:
'{sample}.txt'
shell:
'echo {input.genome} {input.r1} {input.r2} > {output}'
Inputs:输入文件,依赖文件。
Outputs:输出文件。
An action:运行的命令。可以是Bash命令,Python脚本,R脚本,R markdown等。
SAMPLES = ['Sample1', 'Sample2']
rule all:
input:
expand('{sample}.txt', sample=SAMPLES)
rule quantify_genes:
input:
genome = 'genome.fa',
r1 = 'fastq/{sample}.R1.fastq.gz',
r2 = 'fastq/{sample}.R2.fastq.gz'
output:
'{sample}.txt'
shell:
'echo {input.genome} {input.r1} {input.r2} > {output}'
$ snakemake -np
Building DAG of jobs...
Job counts:
count jobs
1 all
2 quantify_genes
3
[TueFeb1117:46:072020]
rule quantify_genes:
input: fastq/Sample1.R1.fastq.gz, genome.fa, fastq/Sample1.R2.fastq.gz
output: Sample1.txt
jobid: 1
wildcards: sample=Sample1
echo genome.fa fastq/Sample1.R1.fastq.gz fastq/Sample1.R2.fastq.gz > Sample1.txt
[TueFeb1117:46:072020]
rule quantify_genes:
input: fastq/Sample2.R1.fastq.gz, genome.fa, fastq/Sample2.R2.fastq.gz
output: Sample2.txt
jobid: 2
wildcards: sample=Sample2
echo genome.fa fastq/Sample2.R1.fastq.gz fastq/Sample2.R2.fastq.gz > Sample2.txt
[TueFeb1117:46:072020]
localrule all:
input: Sample1.txt, Sample2.txt
jobid: 0
Job counts:
count jobs
1 all
2 quantify_genes
3
This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
$ snakemake
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1(use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
2 quantify_genes
3
[TueFeb1118:01:042020]
rule quantify_genes:
input: genome.fa, fastq/Sample2.R1.fastq.gz, fastq/Sample2.R2.fastq.gz
output: Sample2.txt
jobid: 1
wildcards: sample=Sample2
[TueFeb1118:01:052020]
Finished job 1.
1 of 3 steps (33%) done
[TueFeb1118:01:052020]
rule quantify_genes:
input: genome.fa, fastq/Sample1.R1.fastq.gz, fastq/Sample1.R2.fastq.gz
output: Sample1.txt
jobid: 2
wildcards: sample=Sample1
[TueFeb1118:01:052020]
Finished job 2.
2 of 3 steps (67%) done
[TueFeb1118:01:052020]
localrule all:
input: Sample1.txt, Sample2.txt
jobid: 0
[TueFeb1118:01:052020]
Finished job 0.
3 of 3 steps (100%) done
$cat Sample1.txt
genome.fa fastq/Sample1.R1.fastq.gz fastq/Sample1.R2.fastq.gz
$cat Sample2.txt
genome.fa fastq/Sample2.R1.fastq.gz fastq/Sample2.R2.fastq.gz
snakemake --forceall --dag | dot -Tpng> dag.png
$ snakemake
Building DAG of jobs...
Nothing to be done.
SAMPLES = ['Sample1', 'Sample2']
rule all:
input:
'test.txt'
rule quantify_genes:
input:
genome = 'genome.fa',
r1 = 'fastq/{sample}.R1.fastq.gz',
r2 = 'fastq/{sample}.R2.fastq.gz'
output:
'{sample}.txt'
shell:
'echo {input.genome} {input.r1} {input.r2} > {output}'
rule collate_outputs:
input:
expand('{sample}.txt', sample=SAMPLES)
output:
'test.txt'
run:
with open(output[0], 'w') as out:
for i in input:
sample = i.split('.')[0]
for line in open(i):
out.write(sample + ' '+ line)
$snakemake
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
1 collate_outputs
2
rule collate_outputs:
input: Sample1.txt, Sample2.txt
output: test.txt
1 of 2 steps (50%) done
localrule all:
input: test.txt
2 of 2 steps (100%) done
$cat test.txt
Sample1 genome.fa ./fastq/Sample1.R1.fastq.gz ./fastq/Sample1.R2.fastq.gz
Sample2 genome.fa ./fastq/Sample2.R1.fastq.gz ./fastq/Sample2.R2.fastq.gz
snakemake --forceall --dag | dot -Tpng> dag.png
snakemake --jobs 8 --cluster 'qsub -cwd -q normal -l vf=5g,p=1'
├── config.yaml
├── softwares.yaml
├── samples.json
├── scripts
│ ├── script1.py
│ └── script2.R
├── rules
│ ├── 01qc.smk
│ ├── 02mapping.smk
│ └── 03plot.smk
└── Snakefile
联系客服