Merge multiple demographic columns into a single variable

Carey.Rogahn13 · June 24, 2023, 4:59am

I’m trying to find a way to create a new variable called “Race” and merge the values from the separate columns into one. If a respondent has selected multiple races, the Race variable should be coded as “Mixed”.

ID <- 1:8
White <- c("White", NA, NA, NA, NA, NA, "White", "White")
Asian <- c(NA, NA, NA, NA, NA, "Asian", NA, "Asian")
SouthAfrican <- c(NA, "SouthAfrican", NA, NA, NA, NA, NA, "SouthAfrican")
Hispanic <- c(NA, NA, NA, NA, "Hispanic", "Hispanic", NA, NA)
WestAsian <- c(NA, NA, NA, NA, NA, NA, "WestAsian", NA)
PreferNotToAnswer <- c(NA, NA, "PreferNotToAnswer", "PreferNotToAnswer", NA, NA, NA, NA)

df <- data.frame(ID, White, Asian, SouthAfrican, Hispanic, WestAsian, PreferNotToAnswer)

# Create Race variable
df$Race <- NA

# Set Race variable to "White" when "White" is present
df$Race[df$White == "White"] <- "White"

# Set Race variable to "Asian" when "Asian" is present
df$Race[df$Asian == "Asian"] <- "Asian"

# Set Race variable to "SouthAfrican" when "SouthAfrican" is present
df$Race[df$SouthAfrican == "SouthAfrican"] <- "SouthAfrican"

# Set Race variable to "Hispanic" when "Hispanic" is present
df$Race[df$Hispanic == "Hispanic"] <- "Hispanic"

# Set Race variable to "WestAsian" when "WestAsian" is present
df$Race[df$WestAsian == "WestAsian"] <- "WestAsian"

# Set Race variable to "PreferNotToAnswer" when "PreferNotToAnswer" is present
df$Race[df$PreferNotToAnswer == "PreferNotToAnswer"] <- "PreferNotToAnswer"

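# Set Race variable to "Mixed" when more than one race column (columns 2 to 6) is filled in.
# Count non-missing cells with !is.na(); comparing the character columns with "> 0" returns NA
# for every missing cell, so rowSums() would be NA for each row and no row would be flagged.
df$Race[rowSums(!is.na(df[, 2:6])) > 1] <- "Mixed"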

The intended result is a single “Race” variable that holds each respondent’s selection from the separate columns, with “Mixed” for anyone who selected more than one race.

Autumn.OKon17 · June 24, 2023, 12:40pm

To merge the demographic columns (White, Asian, SouthAfrican, Hispanic, WestAsian, and PreferNotToAnswer) into one “Race” variable, assign each column’s label in turn and then overwrite with “Mixed” any row where more than one race column is filled in:

# Create Race variable
df$Race <- NA

# Set Race variable to "White" when "White" is present
df$Race[df$White == "White"] <- "White"

# Set Race variable to "Asian" when "Asian" is present
df$Race[df$Asian == "Asian"] <- "Asian"

# Set Race variable to "SouthAfrican" when "SouthAfrican" is present
df$Race[df$SouthAfrican == "SouthAfrican"] <- "SouthAfrican"

# Set Race variable to "Hispanic" when "Hispanic" is present
df$Race[df$Hispanic == "Hispanic"] <- "Hispanic"

# Set Race variable to "WestAsian" when "WestAsian" is present
df$Race[df$WestAsian == "WestAsian"] <- "WestAsian"

# Set Race variable to "PreferNotToAnswer" when "PreferNotToAnswer" is present
df$Race[df$PreferNotToAnswer == "PreferNotToAnswer"] <- "PreferNotToAnswer"

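# Set Race variable to "Mixed" when more than one race column (columns 2 to 6) is filled in.
# Count non-missing cells with !is.na(); comparing the character columns with "> 0" returns NA
# for every missing cell, so rowSums() would be NA for each row and no row would be flagged.
df$Race[rowSums(!is.na(df[, 2:6])) > 1] <- "Mixed"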

This code assigns the appropriate values to the “Race” variable based on the values in the separate demographic columns. If a respondent has selected multiple races, the “Race” variable is coded as “Mixed”.
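If the column-by-column assignments feel repetitive, the same idea can probably be written more compactly in base R. The sketch below rebuilds the example data so that it runs on its own. It assumes that NA always means "not selected" and that PreferNotToAnswer is never combined with a race column. The names race_cols, all_cols, n_selected and first_answer are introduced here only for illustration.

```r
# Rebuild the example data from the question so the snippet is self-contained.
df <- data.frame(
  ID = 1:8,
  White = c("White", NA, NA, NA, NA, NA, "White", "White"),
  Asian = c(NA, NA, NA, NA, NA, "Asian", NA, "Asian"),
  SouthAfrican = c(NA, "SouthAfrican", NA, NA, NA, NA, NA, "SouthAfrican"),
  Hispanic = c(NA, NA, NA, NA, "Hispanic", "Hispanic", NA, NA),
  WestAsian = c(NA, NA, NA, NA, NA, NA, "WestAsian", NA),
  PreferNotToAnswer = c(NA, NA, "PreferNotToAnswer", "PreferNotToAnswer", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Illustrative names: the race columns, and all answer columns.
race_cols <- c("White", "Asian", "SouthAfrican", "Hispanic", "WestAsian")
all_cols <- c(race_cols, "PreferNotToAnswer")

# Number of race options each respondent selected (NA counts as "not selected").
n_selected <- unname(rowSums(!is.na(df[race_cols])))

# First non-missing answer in each row; NA if the respondent selected nothing.
first_answer <- unname(apply(df[all_cols], 1, function(x) x[!is.na(x)][1]))

# More than one race selected means "Mixed"; otherwise keep the single answer.
df$Race <- ifelse(n_selected > 1, "Mixed", first_answer)

print(df[, c("ID", "Race")])
```

With the example data, respondents 6, 7 and 8 each have more than one race column filled in, so they should come out as “Mixed”, while the others keep their single answer. Listing the columns by name rather than by position (2:6) means the code keeps working if the columns are reordered.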