Next Generation Sequencing (NGS)/SOAPdenovo

First we need some E. coli data. To download the paired-end reads of run SRR001665, you could type:

wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR001/SRR001665/SRR001665_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR001/SRR001665/SRR001665_2.fastq.gz

Unpack the two files:

gunzip SRR001665_1.fastq.gz
gunzip SRR001665_2.fastq.gz
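Before assembling, it can be worth checking that both files unpacked completely and contain the same number of reads. The following is a minimal sketch, assuming the usual four-line FASTQ layout (no wrapped sequence lines); `count_reads` and `check_pairs` are hypothetical helper names, not part of SOAPdenovo.

```shell
# Count FASTQ records, assuming exactly four lines per read.
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Compare the read counts of the two mate files (hypothetical helper).
check_pairs() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" = "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "Mismatch: $n1 vs $n2 reads" >&2
        return 1
    fi
}

# Usage:
#   check_pairs SRR001665_1.fastq SRR001665_2.fastq
```

A mismatch usually points to an interrupted download, in which case fetching the affected file again is the simplest fix.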

You will need to get SOAPdenovo and the data prepare module:

wget http://soap.genomics.org.cn/down/x86_64.linux/SOAPdenovo31mer.tgz
tar xvzf SOAPdenovo31mer.tgz

We also have to make a config file, which we name cont.config:

#maximal read length
max_rd_len=36
[LIB]
#average insert size
avg_ins=200
#if sequence needs to be reversed
reverse_seq=0
#use for contig building only
asm_flags=1
#in which order the reads are used while scaffolding
rank=1
#fastq files
q1=./SRR001665_1.fastq
q2=./SRR001665_2.fastq
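The max_rd_len value should match the reads you actually downloaded. As a rough check, the sketch below prints the longest sequence in a FASTQ file, again assuming four lines per read; `max_read_len` is a hypothetical helper name introduced here for illustration.

```shell
# Print the length of the longest read in a FASTQ file.
# The sequence is the second line of every four-line record.
max_read_len() {
    awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
         END { print max + 0 }' "$1"
}

# Usage:
#   max_read_len SRR001665_1.fastq
```

If the printed value differs from the max_rd_len entry in cont.config, adjust the config before running the assembly.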

Then we assemble using a k-mer size of 31 (the read length is 36). We run the whole SOAPdenovo pipeline by specifying the "all" parameter. Setting asm_flags to 3 would use the same library for scaffolding as well. With asm_flags=1, as in the config file above, SOAPdenovo will terminate in the scaffolding step with a floating point exception, as there is nothing to scaffold with. The contigs will nevertheless be found in EC.contig.

./SOAPdenovo31mer all -K 31 -s cont.config -o EC
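Once the run has written its contig file, a quick summary helps to judge the result. The following is a minimal sketch, assuming the contigs are in FASTA format (sequence lines may be wrapped); `contig_stats` is a hypothetical helper, not a SOAPdenovo tool. It reports the number of contigs, their total length, and the N50.

```shell
# Summarise a contig FASTA file: number of contigs, total bases, N50.
contig_stats() {
    # First pass: print one length per contig, summing wrapped sequence lines.
    awk '
        /^>/ { if (len) print len; len = 0; next }
        { len += length($0) }
        END { if (len) print len }
    ' "$1" | sort -rn | awk '
        # Second pass: lengths arrive longest first.
        { lens[NR] = $1; total += $1 }
        END {
            if (NR == 0) { print "contigs=0 total=0 N50=0"; exit }
            # N50: length of the contig at which the running sum
            # first reaches half of the total assembly length.
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run >= total / 2) { n50 = lens[i]; break }
            }
            print "contigs=" NR " total=" total " N50=" n50
        }'
}

# Usage (the file name follows the -o prefix given to SOAPdenovo):
#   contig_stats EC.contig
```

Running the assembly with different settings and comparing these numbers is a simple way to see how a change affects contiguity, although a larger N50 alone does not guarantee a more correct assembly.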