Align seqs w/ letter annotations, aware of position

Question

We have two DNA sequences (strings) that need to be aligned and annotated. Can we use the Biostrings package to align and get the relevant annotation for each base, and if so, how?

>1
ATGCAT
135198
>2
ATCAT

Answer

We can use the Biostrings package to align the two DNA sequences and get relevant annotation for each base. First, we create two DNAString objects, one for the pattern string (p) and one for the subject string (s). Then, we assign the annotation to the s string using the metadata slot. Finally, we use the pairwiseAlignment function to align the two strings and get the annotation for each matched base.

library(Biostrings)

p <- DNAString("ATCAT")
s <- DNAString("ATGCAT")
s_annot <- "135198"

s@metadata <- unlist(strsplit(s_annot , ""))
x <- pairwiseAlignment(pattern = p, subject = s)
metadata(x)
# [[1]]
# [1] "1" "3" "-" "1" "9" "8"

Expected output:

ATGCAT
AT-CAT
13-198

Thank you for the question and answer.