Bradford Condon PhD

Bioinformatics, Web & Mobile Development


Plotting a SNP map by scaffold

 

I wanted to plot out SNPs along a chromosome. I was starting with SNP tables that were roughly equivalent to a .bed file:

scaffold00001   97859   97859
scaffold00001   176752  176752

Where the first column is the scaffold # in the reference, and the second and third columns are SNP locations. I adapted the R code below to print out SNP densities for select scaffolds. Because we’re working with hundreds, if not thousands, of scaffolds, it’s necessary to filter out the scaffolds of interest in some way, either by number of SNPs or the specific scaffolds of interest.

 

We do two quick things to help us out.  The first is use count() and match() to generate a column which essentially counts the number of SNPs for each scaffold.  Then we use order() to sort by this new SNP count column.


    library(plyr)
    library(ggplot2)
    
    setwd("/your path")
    snps<-read.table("out.txt",sep="\t",header=F)
    
    colnames(snps)<-c("chr","start","end")
    
    #Column of # SNPs
    counts= count(snps[1])
    snps$Count <- counts[[2]][ match( snps$chr, counts[[1]] ) ]
    
    snpsorder= snps[order(-snps[,4]),] 
    
    subset= subset(snpsorder, Count > 40000)
    
    # Plot SNP density
    toprint<-ggplot(subset) + 
      geom_histogram(aes(x=start), binwidth=1000) + # binwidth will effectively set the printing window size.
      facet_wrap(~ chr,ncol=2) + # fiddle with number of columns depending on how many scaffolds you print
      ggtitle("SNPs") +
        xlab("nucleotide number") +   ylab("SNP density") +   theme_bw() 
    
    
    # save to .png
    png("snps.png",750,750)
    print(toprint)
    dev.off()

The above R code will plot the only scaffold with over 40k SNP counts. I chose this number because it prints the below plot.

If I reduce the minimum to 20,000, I print a few more scaffolds

multiple SNP plots

Hope this code is helpful to you!