MERCYRI hff92474
64位https://the.earth.li/~sgtatham/putty/latest/w64/putty.exe
Winscp:
https://jaist.dl.sourceforge.net/project/winscp/WinSCP/5.13.4/WinSCP-5.13.4-Setup.exe
预处理原始数据
解压:
gunzip newBGIseq500_*.fq.gz
gunzip chr17.vcf.gz
fastq转fasta
命令如下:sh fq-fa.sh
fq-fa.sh
awk '{if(NR%4 == 1){print ">" substr($0, 2)}}{if(NR%4 == 2){print}}' newBGIseq500_1.fq > newBGIseq500_1.fa
awk '{if(NR%4 == 1){print ">" substr($0, 2)}}{if(NR%4 == 2){print}}' newBGIseq500_2.fq > newBGIseq500_2.fa
一、
1.成对reads数是1599999,能覆盖。
reads计算方法:CL100024405L1C001为newBGIseq500_1.fq和newBGIseq500_2.fq中reads特有的特征,故通过grep -c 'CL100024405L1C001' *.fq 来计算PE reads数。
reads命令如下:
sh reads.sh
reads.sh
grep -c 'CL100024405L1C001' newBGIseq500_1.fq
grep -c 'CL100024405L1C001' newBGIseq500_2.fq
覆盖方法及命令如下:
Rscript coverage.R
coverage.R
实际测序碱基数目<-1599999*100*2
需要覆盖碱基数目<-6*1e6*40*0.99
实际测序碱基数目>需要覆盖碱基数目
2.下载安装soapdenovo软件,
网址:https://github.com/aquaskyline/SOAPdenovo2
unzip SOAPdenovo2-master.zip
cd
SOAPdenovo2-master
pwd
/home/stu38/SOAPdenovo2-master
vi ~/.bashrc
source ~/.bashrc
make
建立配置文件lib.cfg
vi lib.cfg
max_rd_len=100
[LIB]
avg_ins=450
reverse_seq=0
asm_flags=3
rank=1
pair_num_cutoff=3
map_len=32
q1=/home/data/newBGIseq500_1.fq
q2=/home/data/newBGIseq500_2.fq
f1=/home/data/newBGIseq500_1.fa
f2=/home/data/newBGIseq500_2.fa
运行soapdenovo,得到组装结果ant.scafseq和ant.scafStatistics等
./SOAPdenovo-127mer all -s lib.cfg -K 35 -D 1 -o ant >>2ant.log
计算组装结果序列长度
命令如下:perl changdu.pl ant.scafseq > length.txt
changdu.pl
#!usr/bin/perl
use warnings;
use strict;
my (@arr, $changdu, %hash, $key);
open (FH, "$ARGV[0]"); #$ARGV[0]指的是脚本运行后跟的第一个参数
while(<FH>){
chomp;
if (/^>/){
$key=$_;
}
$hash{$key}.=$_ unless /^>/;
}
foreach (keys %hash){
$changdu=length($hash{$_});
print "$_\t$changdu\t$hash{$_}\n";
}
或者
awk '/^>/&&NR>1{print "";}{ printf "%s",/^>/ ? $0" ":$0 }' ant.scafseq |awk '{print $1"\t"length($3)}' > length.txt
长度序列挑取及排序
命令如下:sh sort-length.sh
sort-length.sh
cut -f2 length.txt |sort -nr >sort-length.txt
长度累积曲线图,详见length.pdf
计算方法及命令如下:Rscript length.R
pdf("length.pdf")
dat<-read.table("/home/exam01/sort-length.txt")
d<-cumsum(dat)/sum(dat)
plot(d$V1,type="l",main="length distribution",xlab="length",ylab="frequency",col="red")
dev.off()
q()
3.组装结果的n50是687
计算方法及命令如下:
perl n50.pl ant.scafSeq
n50.pl
#/usr/bin/perl -w
use strict;
my ($len,$total)=(0,0);
my @x;
while(<>){
if(/^[\>\@]/){
if($len>0){
$total+=$len;
push @x,$len;
}
$len=0;
}
else{
s/\s//g;
$len+=length($_);
}
}
if ($len>0){
$total+=$len;
push @x,$len;
}
@x=sort{$b<=>$a} @x;
my ($count,$half)=(0,0);
for (my $j=0;$j<@x;$j++){
$count+=$x[$j];
if (($count>=$total/2)&&($half==0)){
print "N50: $x[$j]\n";
$half=$x[$j]
}elsif ($count>=$total*0.9){
print "N90: $x[$j]\n";
exit;
}
}
二、
1.vcf中变异位点的qual值分布请查看文件qual.txt,qual分布图详见qual.pdf
计算方法及命令如下:
下载安装vcftools
网址:https://jaist.dl.sourceforge.net/project/vcftools/vcftools_0.1.13.tar.gz
tar zxvf vcftools_0.1.13.tar.gz
pwd
/home/soft/vcftools_0.1.13
vi ~/.bashrc
source ~/.bashrccd vcftools_0.1.13
make
统计qual值分布,命令如下:
vcftools --vcf chr17.vcf --site-quality --out Q
图计算方法及命令如下:Rscript qual.R
pdf("qual.pdf")
dat<-read.table("/home/exam01/Q.1qual")
hist(dat[,2],main="qual distribution",xlab="qual value",ylab="frequency")
dev.off()
q()
2.TP53基因变异信息请查看文件tp53.vcf
提取tp53基因变异情况,命令如下:
vcftools --vcf chr17.vcf --chr chr17 --from-bp 7668402 --to-bp 7687550 --recode -c > tp53.vcf
计算变异位点总数目是41,命令如下:sh var.sh
var.sh
grep -v '^#' tp53.vcf |wc -l
计算各样本tp53变异位点情况,未变异数目=变异总数目-纯合和杂合变异数目,结果是27DMBDM4YT纯合9个,杂合8个,未变异24个,7XKZJA3JWX纯合6个,杂合33个,未变异2个,APRDKV0LDS纯合8个,杂合8个,未变异25个。
纯合计算方法及命令如下:sh hom-var1.sh
hom-var1.sh
less -S tp53.vcf | cut -f 10 | grep -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1==$2) print $0}' | wc –l
hom-var2.sh
less -S tp53.vcf | cut -f 11 | grep -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1==$2) print $0}' | wc –l
hom-var3.sh
less -S tp53.vcf | cut -f 12 | grep -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1==$2) print $0}' | wc –l
杂合计算方法及命令如下:sh het-var1.sh
hom-var1.sh
less -S tp53.vcf | cut -f 10 | grep -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1!=$2) print $0}' | wc –l
hom-var2.sh
less -S tp53.vcf | cut -f 11 | grep -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1!=$2) print $0}' | wc –l
hom-var3.sh
less -S tp53.vcf | cut -f 12 | grep -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1!=$2) print $0}' | wc –l
三、
1.kmer图详见17mer.pdf,方法如下:
下载安装gce
网址:ftp://ftp.genomics.org.cn/pub/gce
tar -zxvf gce-1.0.0.tar.gz
cd gce-1.0.0/kmerfreq/kmer_freq_hash
pwd
/home/soft/gce-1.0.0/
kmerfreq/kmer_freq_hash
vi ~/.bashrc
source ~/.bashrc
建立fq.list,命令如下:ls /home/data/*.fq > fq.list
运行kmer分析,得到output.freq.stat,进行后续R画图
命令如下:kmer_freq_hash -k 17 -i 450000000 -l fq.list 2>kmerfreq.log
图计算方法及命令如下:Rscript 17mer.R
pdf("17mer.pdf")
dat<-read.table("/home/exam01/output.freq.stat")
plot(dat[,2],type="l",main="17mer distribution",xlab="17mer depth",ylab="frequency",xlim=c(0,200),ylim=c(0,1e6))
dev.off()
q()
2.基因组大小是6.4344e+06bp
计算方法及命令如下:less -S output.freq.stat | awk '{sum += $1*$2} END {print sum/41}'
四、
1.下载安装bzip、bwa、samtools、bcftools
bzip网址:https://pkgs.org/download/bzip2
bwa网址:https://jaist.dl.sourceforge.net/project/bio-bwa/bwa-0.7.17.tar.bz2
samtools网址:https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2
bcftools网址:https://github.com/samtools/bcftools/releases/download/1.9/bcftools-1.9.tar.bz2
bzip:
tar -zxvf bzip2-1.0.6.tar.gz
cd bzip2-1.0.6
pwd
/home/soft/bzip2-1.0.6
vi ~/.bashrc
source ~/.bashrc
make
bwa:
tar jxvf bwa-0.7.17.tar.bz2
cd bwa
pwd
/home/soft/bwa-0.7.17
vi ~/.bashrc
source ~/.bashrc
make
bcftools:
tar jxvf bcftools-1.9.tar.bz2
cd bwa
pwd
/home/soft/bcftools-1.9
vi ~/.bashrc
source ~/.bashrc
make
samtools:
tar jxvf samtools-1.9.tar.bz2
cd bwa
pwd
/home/soft/samtools-1.9
vi ~/.bashrc
source ~/.bashrc
./congfigure --without-curses
make
2.bwa mem比对
bwa index *.fa
bwa mem *.fa newBGIseq500_1.fq >mem.sam
bwa aln ref.fa newBGIseq500_1.fq >read1.sai
bwa aln ref.fa newBGIseq500_2.fq >read2.sai
bwa sampe ref.fa read1.sai read2.sai newBGIseq500_1.fq newBGIseq500_2.fq >aln.sam
3.sam转bam,sort
samtools view -Sb mem.sam > mem.bam
samtools sort mem.bam >mem.sort
4.call变异
Samtools mpileup -ugf ref.fa mem.sort |bcftools call –vm –O z –o test.vcf
本站仅提供存储服务,所有内容均由用户发布,如发现有害或侵权内容,请
点击举报。