We have two DNA sequences (strings) that need to be aligned and annotated. Can we use the Biostrings package to align and get the relevant annotation for each base, and if so, how?
>1 ATGCAT 135198 >2 ATCAT
We can use the Biostrings package to align the two DNA sequences and get relevant annotation for each base. First, we create two
DNAString objects, one for the pattern string (
p) and one for the subject string (
s). Then, we assign the annotation to the
s string using the
metadata slot. Finally, we use the
pairwiseAlignment function to align the two strings and get the annotation for each matched base.
library(Biostrings) p <- DNAString("ATCAT") s <- DNAString("ATGCAT") s_annot <- "135198" s@metadata <- unlist(strsplit(s_annot , "")) x <- pairwiseAlignment(pattern = p, subject = s) metadata(x) # [] #  "1" "3" "-" "1" "9" "8"
ATGCAT AT-CAT 13-198