Next Generation Sequencing (NGS)/Ray

A basic knowledge of the UNIX command line is assumed.

In this tutorial, Ray is built from source code downloaded to $HOME/sources and installed in $HOME/software. A dataset is downloaded to $HOME/datasets and assembled de novo with Ray in $HOME/projects.

Installing Ray
The first thing to do is to download the Ray tarball that contains its source code.

mkdir -p $HOME/sources
cd $HOME/sources
wget http://downloads.sourceforge.net/project/denovoassembler/Ray-v2.1.0.tar.bz2
tar -xjf Ray-v2.1.0.tar.bz2

An MPI library, make, and a C++ compiler are required to build Ray. On Ubuntu or Debian, the package names are: openmpi-bin, libopenmpi-dev, make, g++.

Optionally, native support for compressed files can be included in Ray. This requires zlib and/or libbz2. On Ubuntu or Debian, the package names are: zlib1g-dev, libbz2-dev. This tutorial builds Ray with both and reads .fastq.bz2 files directly, so libbz2 is needed here.
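Before building, it can help to confirm that the required tools are actually on the PATH. The sketch below is only an illustration: missing_tools is a hypothetical helper name, and the list of tools checked is an assumption based on the commands used in this tutorial.

```shell
# Print the name of every given command that is not found on PATH.
# (missing_tools is a hypothetical helper, not part of Ray.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools this tutorial relies on. On Debian or Ubuntu, the packages named
# above are normally installed with:
#   sudo apt-get install openmpi-bin libopenmpi-dev make g++ zlib1g-dev libbz2-dev
missing_tools mpiexec make g++ wget tar bzip2
```

If the last command prints nothing, every tool was found.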

With the prerequisites installed, Ray can now be built and installed:

mkdir -p $HOME/software/ray
cd $HOME/sources/Ray-v2.1.0
make HAVE_LIBZ=y HAVE_LIBBZ2=y PREFIX=$HOME/software/ray/2.1.0
make install

Obtaining data
The commands below fetch two paired-end E. coli sequencing runs (SRR001665 and SRR001666). Each run consists of two files, _1 and _2.

mkdir -p $HOME/datasets/SRA001125
cd $HOME/datasets/SRA001125

wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA001/SRA001125/SRX000429/SRR001665_1.fastq.bz2
wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA001/SRA001125/SRX000429/SRR001665_2.fastq.bz2
wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA001/SRA001125/SRX000430/SRR001666_1.fastq.bz2
wget ftp://ftp.ddbj.nig.ac.jp/ddbj_database/dra/fastq/SRA001/SRA001125/SRX000430/SRR001666_2.fastq.bz2
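Large downloads are sometimes truncated, so it may be worth testing the archives before starting an assembly. The following is a minimal sketch that relies on the integrity test of bzip2 (bzip2 -t). The function name check_bz2 is hypothetical and is not part of Ray.

```shell
# Test each bzip2 archive given as argument and report its state.
# Returns non-zero if at least one file is missing or corrupted.
# (check_bz2 is a hypothetical helper name.)
check_bz2() {
    status=0
    for f in "$@"; do
        if bzip2 -t "$f" 2>/dev/null; then
            printf 'OK      %s\n' "$f"
        else
            printf 'BROKEN  %s\n' "$f"
            status=1
        fi
    done
    return "$status"
}

# Usage, from $HOME/datasets/SRA001125:
#   check_bz2 SRR00166*_*.fastq.bz2
```

Any file reported as BROKEN should be downloaded again.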

Running Ray
It is a good habit to create a directory for each project. A directory will therefore be created for this tutorial.

mkdir -p $HOME/projects/Ray-tutorial
cd $HOME/projects/Ray-tutorial

Next, we create symbolic links to the data files so that long paths are not required.

ln -s $HOME/datasets/SRA001125/SRR001665_1.fastq.bz2
ln -s $HOME/datasets/SRA001125/SRR001665_2.fastq.bz2
ln -s $HOME/datasets/SRA001125/SRR001666_1.fastq.bz2
ln -s $HOME/datasets/SRA001125/SRR001666_2.fastq.bz2

Any number of Ray processes can be launched, either on a single computer or spread over several computers. In this example, 4 Ray processes are launched with mpiexec -n 4.

mpiexec -n 4 $HOME/software/ray/2.1.0/Ray \
 -k 21 -o EcoliAssembly \
 -p SRR001665_1.fastq.bz2 SRR001665_2.fastq.bz2 \
 -p SRR001666_1.fastq.bz2 SRR001666_2.fastq.bz2
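When there are several runs, typing the file pairs by hand is error-prone, because it is easy to pass the same file twice in one -p option. As a rough sketch, the command line can be generated instead. The helper ray_command and the variable RAY_BIN are hypothetical names, and the sketch assumes that every run NAME has the files NAME_1.fastq.bz2 and NAME_2.fastq.bz2 in the current directory, as in this tutorial.

```shell
# Path to the Ray executable (assumed to be the install location used here).
RAY_BIN=${RAY_BIN:-$HOME/software/ray/2.1.0/Ray}

# Print (without running it) a Ray command line.
# Arguments: number of processes, k-mer length, output directory, run names.
ray_command() {
    nproc=$1
    kmer=$2
    outdir=$3
    shift 3
    cmd="mpiexec -n $nproc $RAY_BIN -k $kmer -o $outdir"
    for run in "$@"; do
        # One -p option per run, always pairing the _1 file with the _2 file.
        cmd="$cmd -p ${run}_1.fastq.bz2 ${run}_2.fastq.bz2"
    done
    printf '%s\n' "$cmd"
}

# Usage:
#   ray_command 4 21 EcoliAssembly SRR001665 SRR001666
```

The printed line can be inspected and then pasted into the terminal to start the assembly.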

The -k parameter sets the length of k-mers (21 here), -o names the output directory, and each -p gives the two files of one paired-end run.
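To make the notion of k-mer concrete: a k-mer is a substring of length k, and a sequence of length L contains L - k + 1 overlapping k-mers. The small function below is an illustration only, not how Ray processes reads internally, and the name kmers is hypothetical.

```shell
# Print every k-mer of a sequence, one per line.
# Arguments: the sequence, then the k-mer length.
kmers() {
    awk -v s="$1" -v k="$2" 'BEGIN {
        for (i = 1; i + k - 1 <= length(s); i++)
            print substr(s, i, k)
    }'
}

kmers ACGTACG 5
# prints ACGTA, CGTAC and GTACG on three lines
```

With -k 21, each read is broken down in the same way into overlapping words of 21 nucleotides.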

Assessing the assembly
Ray writes all of its output files to a single directory, here EcoliAssembly. Ray also runs several automated quality control tests.

You can list the produced files with:

ls EcoliAssembly

The important files are these:

less EcoliAssembly/OutputNumbers.txt
less EcoliAssembly/Contigs.fasta
less EcoliAssembly/Scaffolds.fasta
less EcoliAssembly/CoverageDistribution.txt
less EcoliAssembly/LibraryStatistics.txt
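Summary numbers can also be recomputed directly from a FASTA file as an independent check. The sketch below is written for this tutorial and is not part of Ray: fasta_lengths and n50 are hypothetical helper names. It prints the length of each sequence and the N50, taken here as the length of the sequence at which the running total of lengths, sorted from longest to shortest, first reaches half of the total.

```shell
# Print the length of every sequence in a FASTA file, one per line.
fasta_lengths() {
    awk '/^>/ { if (n) print len; n++; len = 0; next }
         { len += length($0) }
         END { if (n) print len }' "$1"
}

# Print the N50 of the sequences in a FASTA file.
n50() {
    fasta_lengths "$1" | sort -rn | awk '
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                acc += lens[i]
                if (acc >= half) { print lens[i]; exit }
            }
        }'
}

# Usage:
#   fasta_lengths EcoliAssembly/Contigs.fasta | wc -l   # number of contigs
#   n50 EcoliAssembly/Contigs.fasta
```

These values can then be compared with the figures reported in EcoliAssembly/OutputNumbers.txt.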