When you work with tagged fish, one of the first things you often need to do is generate a record of each fish’s ‘encounters’ with acoustic receivers. I’ll present two different ways to do that in R: one adapted from this Stack Overflow post, and another that uses a for-loop originally scripted by Anna Steel. In general, my approach has been to use the shorter method first, then check my work with the for-loop. For-loops can eat up processing power and time, so it might be worth subsetting the data first if you have millions of detections.
First, a reproducible dataset. This structure is representative of the raw .csvs we might export from Vemco’s VUE Software, but you can modify it to work with any acoustic telemetry dataframe. You should be able to run the script line by line, but if you want to source the whole script at once, you can download it here.
library(dplyr)
library(lubridate)
download.file('http://github.com/Myfanwy/ReproducibleExamples/raw/master/encounterhistories/sample_fishdata.RData', destfile = "sample_fishdata.Rdata")
load('sample_fishdata.Rdata')
head(d)
## DateTimeUTC Receiver TagID Station
## 1 2013-03-07 10:11:28 300557 4850 I80_1
## 2 2013-03-07 10:11:51 300557 4850 I80_1
## 3 2013-03-07 10:12:09 300557 4850 I80_1
## 4 2013-03-07 10:12:34 300557 4850 I80_1
## 5 2013-03-07 10:12:56 300557 4850 I80_1
## 6 2013-03-07 10:13:22 300557 4850 I80_1
We have a basic dataframe with Detection date/time, Receiver, TagID, and Station info in the columns. This is an example of data in long form, or ‘tidy’ format - always better to start with your data this way. We still have a little bit of work to do converting the data to the classes we need, though.
First, convert the detection times to the correct date/time format, and the TagIDs to characters so that they’re not treated as numerical (and therefore potentially continuous) variables. There are numerous ways to get dates and times into the format you want in R, but my go-to is Hadley Wickham’s lubridate package. A good starting reference for more information is here.
library(lubridate)
d$DateTimeUTC <- ymd_hms(d$DateTimeUTC)
d$TagID <- as.character(d$TagID)
str(d)
## 'data.frame': 42819 obs. of 4 variables:
## $ DateTimeUTC: POSIXct, format: "2013-03-07 10:11:28" "2013-03-07 10:11:51" ...
## $ Receiver : int 300557 300557 300557 300557 300557 300557 300557 300557 300557 300557 ...
## $ TagID : chr "4850" "4850" "4850" "4850" ...
## $ Station : chr "I80_1" "I80_1" "I80_1" "I80_1" ...
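Before moving on, it can be worth checking that every timestamp actually parsed. Here is a minimal sketch of that check, assuming a toy vector of made-up timestamps (`raw_times` is hypothetical and not part of the sample dataset); anything `ymd_hms()` can't parse comes back as `NA`.

```r
library(lubridate)

# Toy timestamps (hypothetical values); the last one is deliberately malformed
raw_times <- c("2013-03-07 10:11:28", "2013-03-07 10:11:51", "not a date")

# quiet = TRUE suppresses the parse warning so we can count the failures ourselves
parsed <- ymd_hms(raw_times, tz = "UTC", quiet = TRUE)

# Anything that failed to parse is NA
n_failed <- sum(is.na(parsed))
n_failed
```

With the real data, the equivalent check would be `sum(is.na(d$DateTimeUTC))` after the conversion.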
Now we’re ready to generate our encounter histories. We do this in two steps. First, we use dplyr to arrange the dataset by detection time, group the dataframe by tag and then by station, and filter out the duplicated stations, which leaves only the first detection of each tag at each station.
library(dplyr)
d2 <- d %>%
  arrange(DateTimeUTC) %>%
  group_by(TagID, Station) %>%
  filter(!duplicated(Station))
d2 # this dataframe is now a complete record of where each fish was detected at least once.
## Source: local data frame [95 x 4]
## Groups: TagID, Station [95]
##
## DateTimeUTC Receiver TagID Station
## (time) (int) (chr) (chr)
## 1 2013-03-07 10:11:28 300557 4850 I80_1
## 2 2013-03-07 11:17:12 300557 4844 I80_1
## 3 2013-03-07 11:50:38 300557 4848 I80_1
## 4 2013-03-07 12:27:14 300557 4842 I80_1
## 5 2013-03-07 12:51:48 300557 4859 I80_1
## 6 2013-03-07 13:56:52 300557 4862 I80_1
## 7 2013-03-07 15:58:11 300557 4858 I80_1
## 8 2013-03-07 16:30:19 300557 4855 I80_1
## 9 2013-03-07 20:36:39 300826 4844 Lisbon
## 10 2013-03-07 21:53:26 300826 4859 Lisbon
## .. ... ... ... ...
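If you'd like to sanity-check that step without dplyr, the same idea can be sketched in base R. This is only an illustration on a made-up toy data frame (`toy` and `first_det` are hypothetical names, and the values are invented), not the code used to produce the output above.

```r
# Toy detections (hypothetical values), with the same kinds of columns as d
toy <- data.frame(
  DateTimeUTC = as.POSIXct(c("2013-03-07 10:12:09", "2013-03-07 10:11:28",
                             "2013-03-07 11:17:12", "2013-03-07 20:36:39",
                             "2013-03-07 20:40:02"), tz = "UTC"),
  TagID   = c("4850", "4850", "4844", "4844", "4844"),
  Station = c("I80_1", "I80_1", "I80_1", "Lisbon", "Lisbon"),
  stringsAsFactors = FALSE
)

# Sort by detection time so the row we keep is the earliest detection
toy <- toy[order(toy$DateTimeUTC), ]

# Keep only the first row for each TagID/Station combination
first_det <- toy[!duplicated(toy[, c("TagID", "Station")]), ]
first_det
```

Each tag/station pair should appear exactly once in `first_det`, alongside the time of its first detection.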
In the second step, we collapse that dataframe into a table using the table() function. This generates the encounter history by creating a contingency table of the counts at each combination of factor levels. In this case, we either have a tag at that particular station or we don’t, so every count is 0 or 1. If we hadn’t filtered out the duplicated stations, you would see a contingency table with the total number of detections for each fish at each station, which could be useful for other things, but not for generating this particular encounter history.
enchist <- with(d2, table(TagID, Station))
enchist
## Station
## TagID Base_TD BCE BCE2 BCW BCW2 I80_1 Lisbon MAE MAW Rstr
## 4842 1 1 1 1 1 1 1 1 1 1
## 4843 1 1 1 1 1 1 1 1 1 1
## 4844 1 1 1 1 1 1 1 1 1 1
## 4845 1 0 0 0 0 1 1 0 0 1
## 4847 0 0 0 0 0 1 1 0 0 0
## 4848 0 0 0 0 0 1 1 0 0 1
## 4849 0 0 0 0 0 1 0 0 0 0
## 4850 1 1 0 1 0 1 0 0 0 1
## 4851 0 0 0 0 0 1 0 0 0 0
## 4854 0 0 0 0 0 1 0 0 0 0
## 4855 1 0 0 0 0 1 1 0 0 1
## 4857 1 1 1 1 1 1 1 0 0 1
## 4858 1 1 1 1 1 1 1 1 1 1
## 4859 1 0 0 0 0 1 1 0 0 1
## 4861 1 1 1 1 1 1 1 1 1 1
## 4862 1 1 1 1 1 1 1 0 0 1
## 4863 0 0 0 0 0 1 0 0 0 0
## 4864 0 0 0 0 0 1 0 0 0 0
## 4865 0 0 0 0 0 1 1 0 0 0
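To make the difference between detection counts and presence/absence concrete, here is a small sketch on made-up data (`toy`, `counts` and `presence` are hypothetical names). It tabulates unfiltered detections, then converts the counts to 0/1 after the fact, which is one possible alternative to filtering first.

```r
# Toy detections (hypothetical values): tag 4850 is heard twice at I80_1
toy <- data.frame(
  TagID   = c("4850", "4850", "4850", "4844", "4844"),
  Station = c("I80_1", "I80_1", "Rstr", "I80_1", "Lisbon"),
  stringsAsFactors = FALSE
)

# Without filtering, table() returns the number of detections per tag and station
counts <- with(toy, table(TagID, Station))
counts

# Any count above zero becomes a 1; everything else stays 0
presence <- (counts > 0) * 1
presence
```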
We now have our encounter history by tag in the form of a ‘table’ object. This is a useful thing on its own, but usually we are generating encounter histories in order to feed them into Program MARK or RMARK, so we need to take the final step and put them into the file format those programs need. You can do this with the following code:
ctab <- apply(enchist, 1, paste0, collapse = "") # creates a named character vector: the names are the TagIDs, the values are the collapsed encounter histories
data.frame(TagID = names(ctab), ch = ctab) # this is the object you would save as the .inp file for MARK.
## TagID ch
## 4842 4842 1111111111
## 4843 4843 1111111111
## 4844 4844 1111111111
## 4845 4845 1000011001
## 4847 4847 0000011000
## 4848 4848 0000011001
## 4849 4849 0000010000
## 4850 4850 1101010001
## 4851 4851 0000010000
## 4854 4854 0000010000
## 4855 4855 1000011001
## 4857 4857 1111111001
## 4858 4858 1111111111
## 4859 4859 1000011001
## 4861 4861 1111111111
## 4862 4862 1111111001
## 4863 4863 0000010000
## 4864 4864 0000010000
## 4865 4865 0000011000
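For completeness, here is a rough sketch of how such an object might be written to disk. It assumes the common MARK input layout of one line per animal: the history, a frequency of 1, and a closing semicolon, with the TagID carried along as a comment. That layout, the toy histories, and the file name are all assumptions on my part, so check them against the MARK documentation for your model type before relying on this.

```r
# Toy collapsed histories (hypothetical values), named by TagID like ctab
ch <- c("4842" = "1111", "4845" = "1001", "4849" = "0100")

# Assumed layout: /* TagID */ history frequency;
inp_lines <- paste0("/* ", names(ch), " */ ", ch, " 1;")

# Write to a temporary location (hypothetical file name)
inp_file <- file.path(tempdir(), "toy_enchist.inp")
writeLines(inp_lines, inp_file)

# Read it back to confirm what was written
readLines(inp_file)
```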
The second method builds the same encounter history with a for-loop. This code is modified from a script originally coded by Anna Steel. For brevity’s sake, I’ll collapse this into a single code chunk with comments:
## Begin
zm <- d # just to differentiate from code above
tag.list = as.character(unique(zm$TagID)) # create a vector of all tags (codes) detected
sta.list = as.character(unique(zm$Station)) # make a vector of the station names
# create empty data frame for filling encounter history later
enc.hist = as.data.frame(matrix(rep(NA,(length(tag.list)*length(sta.list))),
length(tag.list), length(sta.list)))
colnames(enc.hist) = sta.list
rownames(enc.hist) = tag.list
# fill in data frame using a for-loop:
for (i in seq_along(sta.list)) {
  sub <- zm[zm$Station == sta.list[i], ] # subset the data down to just the station you're currently looping over
  subtags <- unique(sub$TagID) # creates a vector of the tags present at that station
  enc.hist[, i] <- tag.list %in% subtags # fills in that column of enc.hist with TRUE or FALSE depending on whether each tag was seen
}
head(enc.hist) # you now have a data frame with TRUE (1)/FALSE (0) for encounters
## I80_1 Lisbon Rstr Base_TD BCW BCW2 BCE2 BCE MAE MAW
## 4850 TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE
## 4844 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 4848 TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 4842 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## 4859 TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## 4862 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
## Finally, use logical syntax to convert TRUE to '1' and FALSE to '0'
enc.hist[enc.hist==TRUE] <- 1
enc.hist[enc.hist==FALSE] <- 0
enc.hist
## I80_1 Lisbon Rstr Base_TD BCW BCW2 BCE2 BCE MAE MAW
## 4850 1 0 1 1 1 0 0 1 0 0
## 4844 1 1 1 1 1 1 1 1 1 1
## 4848 1 1 1 0 0 0 0 0 0 0
## 4842 1 1 1 1 1 1 1 1 1 1
## 4859 1 1 1 1 0 0 0 0 0 0
## 4862 1 1 1 1 1 1 1 1 0 0
## 4858 1 1 1 1 1 1 1 1 1 1
## 4855 1 1 1 1 0 0 0 0 0 0
## 4851 1 0 0 0 0 0 0 0 0 0
## 4865 1 1 0 0 0 0 0 0 0 0
## 4864 1 0 0 0 0 0 0 0 0 0
## 4854 1 0 0 0 0 0 0 0 0 0
## 4861 1 1 1 1 1 1 1 1 1 1
## 4863 1 0 0 0 0 0 0 0 0 0
## 4857 1 1 1 1 1 1 1 1 0 0
## 4845 1 1 1 1 0 0 0 0 0 0
## 4843 1 1 1 1 1 1 1 1 1 1
## 4849 1 0 0 0 0 0 0 0 0 0
## 4847 1 1 0 0 0 0 0 0 0 0
# Now collapse to encounter histories for RMARK:
rmarktab <- apply(enc.hist, 1, paste0, collapse="")
rmarktab
## 4850 4844 4848 4842 4859
## "1011100100" "1111111111" "1110000000" "1111111111" "1111000000"
## 4862 4858 4855 4851 4865
## "1111111100" "1111111111" "1111000000" "1000000000" "1100000000"
## 4864 4854 4861 4863 4857
## "1000000000" "1000000000" "1111111111" "1000000000" "1111111100"
## 4845 4843 4849 4847
## "1111000000" "1111111111" "1000000000" "1100000000"
data.frame(TagID = names(rmarktab), ch = rmarktab)
## TagID ch
## 4850 4850 1011100100
## 4844 4844 1111111111
## 4848 4848 1110000000
## 4842 4842 1111111111
## 4859 4859 1111000000
## 4862 4862 1111111100
## 4858 4858 1111111111
## 4855 4855 1111000000
## 4851 4851 1000000000
## 4865 4865 1100000000
## 4864 4864 1000000000
## 4854 4854 1000000000
## 4861 4861 1111111111
## 4863 4863 1000000000
## 4857 4857 1111111100
## 4845 4845 1111000000
## 4843 4843 1111111111
## 4849 4849 1000000000
## 4847 4847 1100000000
That’s it! Obviously you’d want to compare the two final .inp files to make sure they recorded the same encounter history for each fish. Note that right now, the stations are left in a different order between the two methods, so you’d have to add a line of code to re-order the stations in one of the encounter history dataframes before collapsing it to compare.
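Here is one way that comparison could look, sketched on a made-up toy dataset so it runs on its own (`toy`, `tab`, `loop` and the other names are hypothetical). The idea is to index the for-loop style result by the row and column names of the table() result, so both are in the same order before collapsing.

```r
# Toy detections (hypothetical values)
toy <- data.frame(
  TagID   = c("4850", "4850", "4844", "4844", "4844"),
  Station = c("I80_1", "Rstr", "I80_1", "Lisbon", "Rstr"),
  stringsAsFactors = FALSE
)

# Method 1: contingency table (tags and stations come out sorted)
tab <- with(unique(toy), table(TagID, Station))

# Method 2: logical matrix in order of first appearance, like the for-loop
tags <- unique(toy$TagID)
stas <- unique(toy$Station)
loop <- sapply(stas, function(s) tags %in% toy$TagID[toy$Station == s])
rownames(loop) <- tags

# Reorder method 2 to match method 1, and turn TRUE/FALSE into 1/0
loop <- loop[rownames(tab), colnames(tab)] * 1

# Collapse both and compare
ch_tab  <- apply(tab,  1, paste0, collapse = "")
ch_loop <- apply(loop, 1, paste0, collapse = "")
identical(unname(ch_tab), unname(ch_loop))
```

With the objects from this post, the analogous reordering would be something like `enc.hist[rownames(enchist), colnames(enchist)]` before collapsing.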
The next blog post will cover how to visualize these encounter histories. :)
Special thanks to Anna Steel, and to Noam Ross for help with scripting the data-loading.