Question
We have two DNA sequences (strings) that need to be aligned and annotated. Can we use the Biostrings package to align and get the relevant annotation for each base, and if so, how?
>1
ATGCAT
135198
>2
ATCAT
Answer
We can use the Biostrings package to align the two DNA sequences and get relevant annotation for each base. First, we create two DNAString
objects, one for the pattern string (p
) and one for the subject string (s
). Then, we assign the annotation to the s
string using the metadata
slot. Finally, we use the pairwiseAlignment
function to align the two strings and get the annotation for each matched base.
library(Biostrings)
p <- DNAString("ATCAT")
s <- DNAString("ATGCAT")
s_annot <- "135198"
s@metadata <- unlist(strsplit(s_annot , ""))
x <- pairwiseAlignment(pattern = p, subject = s)
metadata(x)
# [[1]]
# [1] "1" "3" "-" "1" "9" "8"
Expected output:
ATGCAT
AT-CAT
13-198