How to Rename Multiple Observations in a Dataframe in R

There are lots of posts that cover how to rename the columns or variables of a dataframe in R, but I didn’t come up with much when I searched for “how to rename observations of a dataframe in r.” That is mostly because I should have searched for something like “how to replace character strings in r”, but let’s pretend you know as little as I did and that’s how you find yourself here today.

I am a field ecologist, and I work with acoustic telemetry stations. I had a dataframe of station IDs and some other associated variables. I wanted to rename the stations, but the only way I could think of to do that was to turn the first column (station) into row names, and I wasn’t into that. Here’s what my dataframe looked like:

d[1:5, 1:8]

##        station     sn     datetime_pulled datetime_redeployed codemap downloaded   upgraded    battexp
## 1 Above Lisbon 115470 2013-09-05 10:27:00 2013-09-05 10:41:00     112 2013-09-05       <NA> 2014-09-05
## 2 Above Lisbon 115470 2013-12-20 14:03:00 2013-12-20 14:06:59     112 2013-12-20       <NA> 2014-09-05
## 3 Above Lisbon 115470 2014-03-13 13:47:00 2014-03-13 13:49:00     112 2014-03-13       <NA> 2014-09-05
## 4 Above Lisbon 115470 2014-07-15 11:15:00 2014-07-15 11:29:59     113 2014-07-15 2014-07-15 2015-01-15
## 5 Above Lisbon 115470 2014-08-29 10:34:59 2014-08-29 11:00:00     113       <NA>       <NA>       <NA>

I wanted to replace the station names because in the raw data, the station names had lots of spaces and parentheses, which are not very R-friendly:

s <- unique(d$station)
s

 [1] "Above Lisbon"      "Abv_rstr"          "Abv_swanston"      "Base of Toe Drain" "BCE"               "BCE2"              "BCW"               "BCW2"              "Cache Creek"       "I80(1)"            "I80(2)"            "I80(3)"            "I80(4)"            "Kgg's Ranch"       "Lisbon Weir"       "Marker at Levee"   "Putah Creek"       "Screw Trap"        "swanston"          "Wallace Weir"      "Willow Slough"

See? Terrible. So now, I’ll create a vector of the names I want the stations to have (“s2”), then swap them in for the bad versions (“s”) using mapvalues(), a handy little function from Hadley Wickham’s plyr package. mapvalues() pairs the old and new names by position, so s2 has to list the new names in the same order as s:

s2 <- c("Abv_lisbon", "Abv_rstr", "Abv_swanston", "Base_TD", "BCE", "BCE2", "BCW", "BCW2", "Cache_creek",
        "I80_1", "I80_2", "I80_3", "I80_4", "Knaggs", "Lisbon", "Levee_marker", "Putah_creek", "RSTR", "Swanston", 
        "Wallace_weir", "Willow_slough")
d$station <- plyr::mapvalues(d$station, from=s, to=s2)
unique(d$station)

 [1] "Abv_lisbon"    "Abv_rstr"      "Abv_swanston"  "Base_TD"       "BCE"           "BCE2"          "BCW"           "BCW2"          "Cache_creek"   "I80_1"         "I80_2"         "I80_3"         "I80_4"         "Knaggs"        "Lisbon"        "Levee_marker"  "Putah_creek"   "RSTR"          "Swanston"      "Wallace_weir"  "Willow_slough"
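
If you would rather not depend on plyr, the same positional mapping can be sketched in base R with a named vector used as a lookup table. This is only a minimal sketch, not what I ran on the real data: the short `station` vector and the `lookup` and `renamed` names below are made up for illustration, and it assumes the column is character rather than a factor.

```r
# Hypothetical miniature of the station column (not the real data)
station <- c("Above Lisbon", "I80(1)", "Wallace Weir", "Above Lisbon", "BCE")

# Named vector: the names are the old values, the elements are the new ones
lookup <- c("Above Lisbon" = "Abv_lisbon",
            "I80(1)"       = "I80_1",
            "Wallace Weir" = "Wallace_weir")

# Look up each station; values with no entry in the lookup come back as NA
renamed <- unname(lookup[station])

# Keep the original name wherever no replacement was defined
renamed[is.na(renamed)] <- station[is.na(renamed)]
renamed
```

Writing each old name next to its new name makes it harder to get the two vectors out of order, which is the main thing to watch for with the from/to approach.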

Sweet! Observations renamed, and now I can move on. But let’s say you only wanted to rename a single station. You can still do that with mapvalues(), but another option is str_replace() from Hadley Wickham’s stringr package:

library(dplyr)   # provides filter()
library(stringr) # provides str_replace()
ww <- filter(d, station == "Wallace Weir") # subset with just the Wallace Weir observations (assumes d still has the original station names)
ww$station <- str_replace(ww$station, "Wallace Weir", "Wallace_weir") # replace the bad station name with the good one
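
One caveat worth sketching: str_replace() treats its pattern as a regular expression by default, so a name with parentheses such as "I80(1)" would not match itself literally. stringr's fixed() wrapper asks for a literal match; the base R equivalent is `sub(..., fixed = TRUE)`, which is what the sketch below uses so that it runs without extra packages. It also shows renaming one station in place, without making a subset first. The `station` vector and the `literal` and `in_place` names are hypothetical stand-ins for the real column.

```r
# Hypothetical sample of raw station names (not the real data)
station <- c("I80(1)", "Wallace Weir", "I80(2)", "Wallace Weir")

# fixed = TRUE matches the pattern literally, so "(" and ")" are not regex syntax
literal <- sub("I80(1)", "I80_1", station, fixed = TRUE)
literal

# Rename a single station in place by overwriting only the matching elements
in_place <- station
in_place[in_place == "Wallace Weir"] <- "Wallace_weir"
in_place
```

With a dataframe, the second pattern would be applied to the column directly, e.g. assigning into `d$station` where `d$station == "Wallace Weir"`.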

Hope this is helpful!

Myfanwy Johnston

Tuesday, May 05, 2015