Recode multiple factors w/ regexp

Adolfo.Raynor63 · June 24, 2023, 10:21am

I have a data frame with survey questions in the format Do you think that [xxxxxxx]?, where the possible answers are one of the following:

“I am certain that [xxxxxxx]”
“I think it is possible that [xxxxxx]”
“I don’t know if [xxxxxx]”
“I think it is not possible that [xxxxxx]”
“I am certain that it is not possible that [xxxxxx]”
“It is impossible for me to know if [xxxxxx]”

I would like to recode these factors so that “I am certain” = 1, “I think it is possible” = 2 and so on.

I have tried using dplyr::recode, but I am unable to use regular expressions.

Example data:

set.seed(12345)

possible_answers <- c(
    "I am certain that", "I think it is possible that",
    "I don't know if is possible that", "I think it is not possible that",
    "I am certain that it is not possible that", "It is impossible for me to know if"
)

num_answers <- 10
survey <- data.frame(
    Q1 = paste(
        sample(possible_answers, num_answers, replace = TRUE),
        "topic 1"
    ),
    Q2 = paste(
        sample(possible_answers, num_answers, replace = TRUE),
        "topic 2"
    ),
    Q3 = paste(
        sample(possible_answers, num_answers, replace = TRUE),
        "topic 3"
    ),
    Q4 = paste(
        sample(possible_answers, num_answers, replace = TRUE),
        "topic 4"
    ),
    Q5 = paste(
        sample(possible_answers, num_answers, replace = TRUE),
        "topic 5"
    )
)

I would like to recode the survey questions using dplyr::recode and regular expressions.

survey %>% 
    mutate_at(vars(starts_with("Q")), recode,
                "I am certain that (.*)" = 1,
                "I think it is possible that (.*)" = 2,
                "I don't know if is possible that (.*)" = 3,
                "I think it is not possible that (.*)" = 4,
                "I am certain that it is not possible that (.*)" = 5,
                "It is impossible for me to know if (.*)" = 6)

However, this changes every value to NA, because recode matches the strings literally rather than treating them as regular expressions.

Garth13 · June 24, 2023, 11:50am

dplyr::recode only matches values exactly, so it cannot take regular expressions, and matches() is a column-selection helper that cannot be used as an argument name inside recode. One option is to use case_when() with grepl(), which does accept patterns:

library(dplyr)

survey %>%
  mutate(across(starts_with("Q"), ~ case_when(
    # Test the longer "certain ... not possible" pattern first,
    # because those answers also match "^I am certain that".
    grepl("^I am certain that it is not possible that", .x) ~ 5,
    grepl("^I am certain that", .x) ~ 1,
    grepl("^I think it is possible that", .x) ~ 2,
    grepl("^I don't know if", .x) ~ 3,
    grepl("^I think it is not possible that", .x) ~ 4,
    grepl("^It is impossible for me to know if", .x) ~ 6
  )))

This will recode the survey questions based on the regular expressions provided.
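
If you would rather keep the pattern-to-code mapping in one place, here is a minimal base R sketch of the same idea. The names answer_codes and recode_by_pattern are made up for illustration and are not part of dplyr or any other package. It assumes that the first matching pattern should win and that answers matching no pattern should stay NA.

```r
# Hypothetical lookup table: names are regular expressions, values are codes.
# Order matters: the longer "certain ... not possible" prefix comes first,
# because those answers also match "^I am certain that".
answer_codes <- c(
  "^I am certain that it is not possible that" = 5,
  "^I am certain that"                         = 1,
  "^I think it is possible that"               = 2,
  "^I don't know if"                           = 3,
  "^I think it is not possible that"           = 4,
  "^It is impossible for me to know if"        = 6
)

# Hypothetical helper: returns the code of the first pattern each answer matches.
recode_by_pattern <- function(x, codes = answer_codes) {
  x <- as.character(x)               # also accepts factor columns
  out <- rep(NA_real_, length(x))    # unmatched answers stay NA
  for (i in seq_along(codes)) {
    # Only fill positions that no earlier pattern has claimed
    hit <- is.na(out) & grepl(names(codes)[i], x)
    out[hit] <- codes[[i]]
  }
  out
}

example <- c(
  "I am certain that topic 1",
  "I am certain that it is not possible that topic 1",
  "I don't know if is possible that topic 1",
  "something else"
)
recoded <- recode_by_pattern(example)
```

With dplyr loaded, the helper could be applied to every question column with mutate(across(starts_with("Q"), recode_by_pattern)), or in base R with survey[] <- lapply(survey, recode_by_pattern).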