Learning Goals
- Run TopHat/Bowtie alignments of RNA-Seq reads to see which regions are expressed
- Run EST/transcript-to-genome alignments to find genes
- Run protein-to-genome alignments to find genes
- Visualize these results in a genome browser (IGV)
Let's fix our paths on the CPP system
export PATH=$PATH:/apps/tophat:/apps/bowtie:/apps/cufflinks:/apps/blast/bin:/apps/exonerate/bin:/apps/java:/apps/IGV
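Before going further it may be worth confirming that the shell can actually find the programs after the PATH change. The sketch below is one way to do that; the function name check_tools is made up for this example, and it only relies on the standard `command -v` lookup.

```shell
# Report which of the named commands can be found on the current PATH.
# check_tools is a hypothetical helper name, not part of any of the tools.
# Returns 0 if every command was found, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For this tutorial you could call it as `check_tools tophat bowtie2-build samtools_0.1.18 exonerate makeblastdb blastx tblastn igv.sh`, adjusting the names to whatever is installed on your system.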
Running RNASeq Alignments
Download sequence for genome, proteins, and RNAs
wget http://stajichlab.github.io/GenomeAnnotation/data/locus.tar.gz
Uncompress the file
tar zxf locus.tar.gz # uncompress the small dataset
Align the raw sequence reads against the genome locus with Bowtie/TopHat
bowtie2-build locus.fa locus # index the database
tophat locus RNASeq_locusonly.3H.fq # run the search
# on the CPP system samtools is installed as samtools_0.1.18; elsewhere use samtools
samtools_0.1.18 index tophat_out/accepted_hits.bam
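A quick sanity check after an alignment is to know how many reads went in, so you can compare that with what came out. The sketch below counts records in an uncompressed FASTQ file; count_fastq_reads is a name invented for this example, and it assumes the usual four-lines-per-read layout with no wrapped sequence lines.

```shell
# Count reads in an uncompressed FASTQ file.
# Assumes exactly four lines per record (header, sequence, "+", qualities).
# count_fastq_reads is a hypothetical helper name.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}
```

For example, `count_fastq_reads RNASeq_locusonly.3H.fq` gives the input total, which you can set beside the output of `samtools_0.1.18 flagstat tophat_out/accepted_hits.bam`. The two numbers need not match exactly, since unaligned reads are not in accepted_hits.bam and a read that aligns in several places can be listed more than once.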
- Let's investigate that alignment file.
- Open IGV by running igv.sh
- Load locus.fa from the Genomes menu (Genomes > Load Genome from File)
- File > Load from File: tophat_out/accepted_hits.bam
- File > Load from File: locus.fungidb.gff
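If you end up reloading the same genome and tracks many times, the menu steps above can probably be scripted with an IGV batch file. The sketch below only writes such a file using the `new`, `genome`, and `load` batch commands; write_igv_batch and the file name load_locus.bat are invented for this example, and it assumes your IGV version accepts a batch file on the command line.

```shell
# Write IGV batch commands that open a genome and load a list of tracks.
# write_igv_batch is a hypothetical helper name.
# Usage: write_igv_batch GENOME_FASTA TRACK [TRACK ...] > load_locus.bat
write_igv_batch() {
    genome=$1
    shift
    echo "new"                 # start a fresh session
    echo "genome $genome"      # same as loading the genome from the Genomes menu
    for track in "$@"; do
        echo "load $track"     # same as File > Load from File
    done
}
```

A possible use is `write_igv_batch "$PWD/locus.fa" "$PWD/tophat_out/accepted_hits.bam" "$PWD/locus.fungidb.gff" > load_locus.bat` followed by `igv.sh -b load_locus.bat`. Absolute paths are the safer choice here, and if the `-b` option is not recognized by your installation, IGV also offers running a batch script from its Tools menu.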
Aligning ESTs to the genome
- Align ESTs to genome with exonerate
exonerate -m e2g ESTs.fa locus.fa --showtargetgff > EST.aln.gff
- Now load this GFF into IGV to visualize
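The file written by the exonerate command above usually contains more than GFF: there are header lines, the text alignments themselves, and comment markers around each GFF block. If IGV has trouble with the file, one option is to keep only the feature lines. The sketch below does that with awk; extract_gff_lines and the output name EST.only.gff are invented for this example, and it assumes feature lines are tab-delimited with at least eight columns.

```shell
# Keep only tab-delimited feature lines from mixed exonerate output.
# Drops comment lines (starting with "#"), alignment text, and header/footer lines.
# extract_gff_lines is a hypothetical helper name.
extract_gff_lines() {
    awk -F '\t' 'NF >= 8 && $0 !~ /^#/' "$1"
}
```

For example, `extract_gff_lines EST.aln.gff > EST.only.gff` and then load EST.only.gff into IGV. The same function should work on the protein alignment output later in this tutorial.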
Aligning Proteins to the genome
- Align proteins to genome with BLASTX and TBLASTN
makeblastdb -in mory_proteins.fa -dbtype prot # format the db for BLAST
makeblastdb -in locus.fa -dbtype nucl # make the db for BLAST
blastx -query locus.fa -db mory_proteins.fa -outfmt 6 -evalue 1e-4 > mory.BLASTX.tab # run BLASTX to find homologs
tblastn -query mory_proteins.fa -db locus.fa -outfmt 6 > mory.TBLASTN.tab # run TBLASTN to place proteins on the translated genome
python blast2gff.py mory.TBLASTN.tab TBLASTN LGV_locus test > mory_proteins.TBLASTN.gff
- Now load this GFF into IGV to visualize
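To see what a tabular-to-GFF conversion involves, here is a minimal awk sketch. It is not the blast2gff.py script used above and makes no claim to produce the same output; blast_tab_to_gff, the "match" feature type, and the hit1, hit2, ... IDs are choices made for this example. It assumes the default 12 columns of -outfmt 6 (qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore) from a TBLASTN run, where the subject is the genome.

```shell
# Convert TBLASTN -outfmt 6 hits into GFF3-style lines on the genome sequence.
# blast_tab_to_gff is a hypothetical helper, not the tutorial's blast2gff.py.
# Usage: blast_tab_to_gff HITS.tab [SOURCE]
blast_tab_to_gff() {
    awk -F '\t' -v OFS='\t' -v src="${2:-TBLASTN}" '
        BEGIN { print "##gff-version 3" }
        !/^#/ && NF >= 12 {
            lo = $9 + 0; hi = $10 + 0; strand = "+"
            # Minus-strand hits have sstart > send; GFF needs start <= end.
            if (lo > hi) { tmp = lo; lo = hi; hi = tmp; strand = "-" }
            n++
            # seqid, source, type, start, end, score (bitscore), strand, phase, attributes
            print $2, src, "match", lo, hi, $12, strand, ".", "ID=hit" n ";Name=" $1
        }' "$1"
}
```

For example, `blast_tab_to_gff mory.TBLASTN.tab > mory_hits.sketch.gff` (the output name is made up) writes one feature per hit, which is enough to view hit locations in IGV but does not group the hits of one protein together.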
- Align proteins to genome with exonerate
exonerate -m p2g mory_proteins.fa locus.fa --showtargetgff > mory_proteins.aln.gff
- Now load this GFF into IGV to visualize
Practice with larger datasets
wget http://stajichlab.github.io/GenomeAnnotation/data/big.tgz
tar zxf big.tgz
wget http://www.fungidb.org/common/downloads/Current_Release/Fgraminearum_PH-1/fasta/data/FungiDB-27_Fgraminearum_PH-1_AnnotatedProteins.fasta
- Look in the new folder 'big'
- There is now a whole-chromosome file, NcraOR74A_LGV.fa; index it with bowtie2-build
- Align the reads in Ncra3H_ChrV_reads.fastq to the chromosome with tophat
- Load your new BAM file from the tophat step and the genes in Ncra_OR74A_LGV.genes.gff
- Use this file Nc5H-Trinity.fasta to align transcripts to the chromosome with exonerate
- Load the chromosome NcraOR74A_LGV.fa into IGV and load its annotations NcraOR74A_LGV.genes.gff
- Use the proteins downloaded from another genome, FungiDB-27_Fgraminearum_PH-1_AnnotatedProteins.fasta, to align proteins to this chromosome with BLASTX
- You can also try exonerate, but it will attempt to align every protein in the file, which takes a while. It works better if you first reduce the file to the subset of proteins that already align to this chromosome.
- Load some of the alignments into IGV if you get it to work
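One way to build the protein subset mentioned in the exonerate step is to take the protein IDs that got a BLASTX hit and pull just those records out of the FASTA file. The sketch below assumes the BLASTX output is in -outfmt 6 with the chromosome as query, so the protein ID is in column 2, and that each ID is the first word of its FASTA header. The function names and the file names LGV.BLASTX.tab, hit_ids.txt, and Fgram_subset.fa are invented for this example.

```shell
# List the unique protein IDs (column 2, sseqid) from a BLASTX -outfmt 6 table.
blastx_hit_ids() {
    cut -f 2 "$1" | sort -u
}

# Print only the FASTA records whose ID (first word after ">") is in IDS_FILE.
# Usage: subset_fasta IDS_FILE PROTEINS.fasta
subset_fasta() {
    awk '
        FILENAME == ARGV[1] { keep[$1] = 1; next }   # first file: remember wanted IDs
        /^>/ { id = substr($1, 2); show = (id in keep) }
        show { print }
    ' "$1" "$2"
}
```

A possible sequence is `blastx_hit_ids LGV.BLASTX.tab > hit_ids.txt`, then `subset_fasta hit_ids.txt FungiDB-27_Fgraminearum_PH-1_AnnotatedProteins.fasta > Fgram_subset.fa`, and finally running exonerate with Fgram_subset.fa in place of the full protein file.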