Coding Encounter Histories

When you work with tagged fish, often one of the first things you need to do is to generate a record of each fish’s ‘encounters’ with acoustic receivers. I’ll present two different ways to do that in R, one adapted from this Stack Overflow post, and another one with a for-loop originally scripted by Anna Steel. In general, my approach has been to use the shorter method first, then check my work with the for-loop. For-loops can eat up some processing power and time, so it might be worth sub-setting the data first if you have millions of detections.

First, a reproducible dataset. This structure is representative of the raw .csvs we might export from Vemco’s VUE Software, but you can modify it to work with any acoustic telemetry dataframe. You should be able to run the script line by line, but if you want to source the whole script at once, you can download it here.

library(dplyr)
library(lubridate)
download.file('http://github.com/Myfanwy/ReproducibleExamples/raw/master/encounterhistories/sample_fishdata.RData', destfile = "sample_fishdata.Rdata") 
load('sample_fishdata.Rdata')
head(d)

##           DateTimeUTC Receiver TagID Station
## 1 2013-03-07 10:11:28   300557  4850   I80_1
## 2 2013-03-07 10:11:51   300557  4850   I80_1
## 3 2013-03-07 10:12:09   300557  4850   I80_1
## 4 2013-03-07 10:12:34   300557  4850   I80_1
## 5 2013-03-07 10:12:56   300557  4850   I80_1
## 6 2013-03-07 10:13:22   300557  4850   I80_1

We have a basic dataframe with Detection date/time, Receiver, TagID, and Station info in the columns. This is an example of data in long form, or ‘tidy’ format - always better to start with your data this way. We still have a little bit of work to do converting the data to the classes we need, though.

First, convert the detection times to a proper date/time class, and the TagIDs to characters so that they’re not treated as numerical (and therefore potentially continuous) variables. There are numerous ways to get dates and times into the format you want in R, but my go-to is the lubridate package by Garrett Grolemund and Hadley Wickham. A good starting reference for more information is here.

library(lubridate)
d$DateTimeUTC <- ymd_hms(d$DateTimeUTC)
d$TagID <- as.character(d$TagID)
str(d)

## 'data.frame':    42819 obs. of  4 variables:
##  $ DateTimeUTC: POSIXct, format: "2013-03-07 10:11:28" "2013-03-07 10:11:51" ...
##  $ Receiver   : int  300557 300557 300557 300557 300557 300557 300557 300557 300557 300557 ...
##  $ TagID      : chr  "4850" "4850" "4850" "4850" ...
##  $ Station    : chr  "I80_1" "I80_1" "I80_1" "I80_1" ...
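As one of the "numerous ways" mentioned above, here is a minimal base-R sketch of the same conversion that does not need lubridate. The two timestamps are toy values copied from the head() output, and the variable names (raw_times, parsed, gap_secs) are made up for illustration. The point it shows is that the time zone should be stated explicitly, because as.POSIXct() otherwise falls back to the session’s local time zone.

```r
# Toy timestamps in the same layout as the DateTimeUTC column
raw_times <- c("2013-03-07 10:11:28", "2013-03-07 10:11:51")

# Parse as UTC explicitly; without tz =, R assumes the local time zone
parsed <- as.POSIXct(raw_times, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

class(parsed)           # a POSIXct date/time vector
attr(parsed, "tzone")   # the time zone the values are stored with

# Once parsed, time arithmetic works: seconds between the two detections
gap_secs <- as.numeric(difftime(parsed[2], parsed[1], units = "secs"))
gap_secs
```

Either route should leave you with a POSIXct column, so the rest of the post works the same way.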

Method 1: dplyr + table()

Now we’re ready to generate our encounter histories. We do this in two steps: first, we use dplyr to arrange the dataset by detection time, group the dataframe by tag and then by station, and filter out the duplicated stations.

Step 1:

library(dplyr)
d2 <- d %>% 
  arrange(DateTimeUTC) %>% 
  group_by(TagID, Station) %>% 
  filter(!duplicated(Station))
d2 # this dataframe is now a complete record of where each fish was detected at least once.

## Source: local data frame [95 x 4]
## Groups: TagID, Station [95]
## 
##            DateTimeUTC Receiver TagID Station
##                 (time)    (int) (chr)   (chr)
## 1  2013-03-07 10:11:28   300557  4850   I80_1
## 2  2013-03-07 11:17:12   300557  4844   I80_1
## 3  2013-03-07 11:50:38   300557  4848   I80_1
## 4  2013-03-07 12:27:14   300557  4842   I80_1
## 5  2013-03-07 12:51:48   300557  4859   I80_1
## 6  2013-03-07 13:56:52   300557  4862   I80_1
## 7  2013-03-07 15:58:11   300557  4858   I80_1
## 8  2013-03-07 16:30:19   300557  4855   I80_1
## 9  2013-03-07 20:36:39   300826  4844  Lisbon
## 10 2013-03-07 21:53:26   300826  4859  Lisbon
## ..                 ...      ...   ...     ...

In the second step, we collapse that dataframe into a table using the table() function. This generates the encounter history by creating a contingency table of the counts at each combination of factor levels. In this case, we either have a tag at that particular station or we don’t, so every cell is 0 or 1. If we hadn’t filtered out the duplicated stations, you would see a contingency table with the total number of detections for each fish at each station, which could be useful for other things, but not for generating this particular encounter history.

Step 2:

enchist <- with(d2, table(TagID, Station)) # cross-tabulate tags (rows) against stations (columns)
enchist

##       Station
## TagID  Base_TD BCE BCE2 BCW BCW2 I80_1 Lisbon MAE MAW Rstr
##   4842       1   1    1   1    1     1      1   1   1    1
##   4843       1   1    1   1    1     1      1   1   1    1
##   4844       1   1    1   1    1     1      1   1   1    1
##   4845       1   0    0   0    0     1      1   0   0    1
##   4847       0   0    0   0    0     1      1   0   0    0
##   4848       0   0    0   0    0     1      1   0   0    1
##   4849       0   0    0   0    0     1      0   0   0    0
##   4850       1   1    0   1    0     1      0   0   0    1
##   4851       0   0    0   0    0     1      0   0   0    0
##   4854       0   0    0   0    0     1      0   0   0    0
##   4855       1   0    0   0    0     1      1   0   0    1
##   4857       1   1    1   1    1     1      1   0   0    1
##   4858       1   1    1   1    1     1      1   1   1    1
##   4859       1   0    0   0    0     1      1   0   0    1
##   4861       1   1    1   1    1     1      1   1   1    1
##   4862       1   1    1   1    1     1      1   0   0    1
##   4863       0   0    0   0    0     1      0   0   0    0
##   4864       0   0    0   0    0     1      0   0   0    0
##   4865       0   0    0   0    0     1      1   0   0    0
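To see why the de-duplication in Step 1 matters, here is a small sketch on a made-up dataset (the tags "A" and "B" and stations "S1" and "S2" are hypothetical, not from the sample data). It uses base R’s unique() in place of the dplyr pipeline so that it runs on its own.

```r
# Toy detections: tag A is heard twice at S1 and once at S2; tag B twice at S1
toy <- data.frame(
  TagID   = c("A", "A", "A", "B", "B"),
  Station = c("S1", "S1", "S2", "S1", "S1"),
  stringsAsFactors = FALSE
)

# Without de-duplication, table() counts every detection
counts <- with(toy, table(TagID, Station))
counts

# With one row per tag/station pair, every cell is 0 or 1
presence <- with(unique(toy), table(TagID, Station))
presence

# Thresholding the counts is another way to reach the same 0/1 table
presence_from_counts <- (counts > 0) * 1
```

The counts table is the "total number of detections" version described above; the presence table is the one that becomes the encounter history.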

We now have our encounter history by tag in the form of a ‘table’ object. This is a useful thing on its own, but usually we are generating encounter histories in order to feed them into Program MARK or RMark, so we need to take the final step and put them into the format those programs need. You can do this with the following code:

Step 3 (optional):

ctab <- apply(enchist, 1, paste0, collapse="") # creates a named character vector: the names are the TagIDs and the values are the collapsed encounter histories

data.frame(TagID=names(ctab),ch=ctab) # this is the object you would save as the .inp file for MARK.

##      TagID         ch
## 4842  4842 1111111111
## 4843  4843 1111111111
## 4844  4844 1111111111
## 4845  4845 1000011001
## 4847  4847 0000011000
## 4848  4848 0000011001
## 4849  4849 0000010000
## 4850  4850 1101010001
## 4851  4851 0000010000
## 4854  4854 0000010000
## 4855  4855 1000011001
## 4857  4857 1111111001
## 4858  4858 1111111111
## 4859  4859 1000011001
## 4861  4861 1111111111
## 4862  4862 1111111001
## 4863  4863 0000010000
## 4864  4864 0000010000
## 4865  4865 0000011000
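If you do want a file on disk, here is a rough sketch of writing one out. It assumes the common MARK input layout of one line per animal: an optional /* comment */, the encounter history, a frequency, and a closing semicolon. Check that against the MARK documentation for your model type before relying on it. The three rows are copied from the output above, and the file name toy_enchist.inp is made up.

```r
# Toy encounter histories standing in for the data frame built above
ch_df <- data.frame(
  TagID = c("4842", "4845", "4849"),
  ch    = c("1111111111", "1000011001", "0000010000"),
  stringsAsFactors = FALSE
)

# One line per fish: tag as a comment, history, a frequency of 1, semicolon
# (assumed layout; confirm it for your own analysis)
inp_lines <- sprintf("/* %s */ %s 1;", ch_df$TagID, ch_df$ch)

# Write to a temporary location; the file name here is hypothetical
inp_path <- file.path(tempdir(), "toy_enchist.inp")
writeLines(inp_lines, inp_path)

readLines(inp_path)
```

Keeping ch as a character column matters here, since a numeric column would drop the leading zeros from histories like 0000010000.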

Method 2: for-loop

This code is modified from a script originally coded by Anna Steel. For brevity’s sake, I’ll collapse this into a single code chunk with comments:

## Begin
zm <- d # just to differentiate from code above
tag.list <- as.character(unique(zm$TagID)) # create a vector of all tags (codes) detected
sta.list <- as.character(unique(zm$Station)) # make a vector of the station names

# create empty data frame for filling encounter history later
enc.hist <- as.data.frame(matrix(rep(NA, length(tag.list) * length(sta.list)),
                                 length(tag.list), length(sta.list)))
colnames(enc.hist) <- sta.list
rownames(enc.hist) <- tag.list

# fill in data frame using a for-loop:
for (i in seq_along(sta.list))
{
  sub <- zm[zm$Station == sta.list[i],] # subset the data down to just the station you're currently looping over
  subtags <- unique(sub$TagID) # creates vector of tags present at that station
  enc.hist[,i] <- tag.list %in% subtags # fills in the column of enc.hist with TRUE or FALSE depending on whether that tag was seen
}
head(enc.hist) # you now have a data frame with TRUE (1)/FALSE (0) for encounters

##      I80_1 Lisbon Rstr Base_TD   BCW  BCW2  BCE2   BCE   MAE   MAW
## 4850  TRUE  FALSE TRUE    TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE
## 4844  TRUE   TRUE TRUE    TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## 4848  TRUE   TRUE TRUE   FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## 4842  TRUE   TRUE TRUE    TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## 4859  TRUE   TRUE TRUE    TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## 4862  TRUE   TRUE TRUE    TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE

## Finally, convert the logical columns to integers (TRUE becomes 1, FALSE becomes 0)
enc.hist[] <- lapply(enc.hist, as.integer)
enc.hist

##      I80_1 Lisbon Rstr Base_TD BCW BCW2 BCE2 BCE MAE MAW
## 4850     1      0    1       1   1    0    0   1   0   0
## 4844     1      1    1       1   1    1    1   1   1   1
## 4848     1      1    1       0   0    0    0   0   0   0
## 4842     1      1    1       1   1    1    1   1   1   1
## 4859     1      1    1       1   0    0    0   0   0   0
## 4862     1      1    1       1   1    1    1   1   0   0
## 4858     1      1    1       1   1    1    1   1   1   1
## 4855     1      1    1       1   0    0    0   0   0   0
## 4851     1      0    0       0   0    0    0   0   0   0
## 4865     1      1    0       0   0    0    0   0   0   0
## 4864     1      0    0       0   0    0    0   0   0   0
## 4854     1      0    0       0   0    0    0   0   0   0
## 4861     1      1    1       1   1    1    1   1   1   1
## 4863     1      0    0       0   0    0    0   0   0   0
## 4857     1      1    1       1   1    1    1   1   0   0
## 4845     1      1    1       1   0    0    0   0   0   0
## 4843     1      1    1       1   1    1    1   1   1   1
## 4849     1      0    0       0   0    0    0   0   0   0
## 4847     1      1    0       0   0    0    0   0   0   0

# Now collapse to encounter histories for RMark:
rmarktab <- apply(enc.hist, 1, paste0, collapse="")
rmarktab

##         4850         4844         4848         4842         4859 
## "1011100100" "1111111111" "1110000000" "1111111111" "1111000000" 
##         4862         4858         4855         4851         4865 
## "1111111100" "1111111111" "1111000000" "1000000000" "1100000000" 
##         4864         4854         4861         4863         4857 
## "1000000000" "1000000000" "1111111111" "1000000000" "1111111100" 
##         4845         4843         4849         4847 
## "1111000000" "1111111111" "1000000000" "1100000000"

data.frame(TagID = names(rmarktab), ch = rmarktab)

##      TagID         ch
## 4850  4850 1011100100
## 4844  4844 1111111111
## 4848  4848 1110000000
## 4842  4842 1111111111
## 4859  4859 1111000000
## 4862  4862 1111111100
## 4858  4858 1111111111
## 4855  4855 1111000000
## 4851  4851 1000000000
## 4865  4865 1100000000
## 4864  4864 1000000000
## 4854  4854 1000000000
## 4861  4861 1111111111
## 4863  4863 1000000000
## 4857  4857 1111111100
## 4845  4845 1111000000
## 4843  4843 1111111111
## 4849  4849 1000000000
## 4847  4847 1100000000

That’s it! Obviously you’d want to compare the two final encounter histories to make sure they recorded the same thing for each fish. Note that right now, the two methods leave the stations in different orders: table() sorts them alphabetically, while the for-loop keeps them in the order they first appear in the data. The tags come out in a different order too. So you’d have to add a line of code to re-order the stations in one of the encounter history dataframes before collapsing it to compare.
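The re-ordering and comparison can be sketched on a toy dataset like this. The tags "T1" and "T2" are made up, and the station order in sta_order is just an assumption for the example; in practice you would pick whatever order makes sense for your receiver array and apply it to both results.

```r
# Toy detections (hypothetical tags, a few station names borrowed from above)
toy <- data.frame(
  TagID   = c("T2", "T2", "T1", "T1", "T2"),
  Station = c("Lisbon", "I80_1", "I80_1", "I80_1", "Rstr"),
  stringsAsFactors = FALSE
)

# Method 1 style: contingency table of unique tag/station pairs
m1 <- with(unique(toy), table(TagID, Station))

# Method 2 style: logical membership test, one column per station
tags <- unique(toy$TagID)
stas <- unique(toy$Station)
m2 <- sapply(stas, function(s) tags %in% toy$TagID[toy$Station == s])
rownames(m2) <- tags

# Choose one explicit row and column order and apply it to both results
sta_order <- c("I80_1", "Lisbon", "Rstr")  # assumed order, for illustration only
tag_order <- sort(tags)
m1_mat <- unclass(m1)[tag_order, sta_order]
m2_mat <- m2[tag_order, sta_order] * 1     # TRUE/FALSE to 1/0

# Collapse each row to a string and compare tag by tag
ch1 <- apply(m1_mat, 1, paste0, collapse = "")
ch2 <- apply(m2_mat, 1, paste0, collapse = "")
all(ch1 == ch2)  # should be TRUE if the two methods agree
```

Indexing by name rather than by position is the design choice that matters here, because it stays correct even if one method drops or adds a station.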

The next blog post will cover how to visualize these encounter histories. :)

Special thanks to Anna Steel, and to Noam Ross for help with scripting the data-loading.