R: count days that start at sunset
I'm analysing temporal patterns in a complex data set consisting of several environmental variables as well as activity data from various animal species. These data have been collected by multiple experimental setups, and data from each setup have been stored once per minute. The project has been running for several years now, so my data set is rather large.
The first few lines of one of my datasets look like this:
> head(setup_01)
DateTime Film_number unused PIR Wheel Temperature LightOld LightDay LightNight LightUV IDnumbers error mouse shrew vole rat frog rest extra_info odour
1 2015-03-10 12:27:10 x 0 0 13.40 1471.34 -0.97 1331.29 700.42 no error 0 0 0 0 0 0 1
2 2015-03-10 12:28:10 x 0 0 13.43 1471.38 -1.07 1291.11 731.32 no error 0 0 0 0 0 0 1
3 2015-03-10 12:29:10 x 0 0 13.31 1471.24 -1.08 1368.57 1016.02 no error 0 0 0 0 0 0 1
As I want to relate these variables to various natural cycles like sunrise and sunset throughout the seasons, I have made use of the package maptools
to calculate sunrise and sunset times
library(maptools)
gpclibPermit()
#set coordinates
crds=c(4.4900,52.1610)
# download the sunrise/sunset/etc data
setup_01$sunrise=sunriset(matrix(crds,nrow=1),dateTime=as.POSIXct(setup_01$DateTime),POSIXct.out=TRUE,direction="sunrise")
setup_01$sunset=sunriset(matrix(crds,nrow=1),dateTime=as.POSIXct(setup_01$DateTime),POSIXct.out=TRUE,direction="sunset")
#create a variable that's 0 except at sunrise, and one that's 0 except at sunset
setup_01$sunrise_act=0
setup_01$sunset_act=0
setup_01[abs(unclass(setup_01[,"DateTime"])-unclass(setup_01[,"sunrise"]$time))<30,]$sunrise_act=1
setup_01[abs(unclass(setup_01[,"DateTime"])-unclass(setup_01[,"sunset"]$time))<30,]$sunset_act=1
As the behaviour of most animals differs, depending on whether it is day or night, I have used sunset/sunrise times to data to calculate a new variable that's 0 at night and 1 at daytime:
#create a variable that's 0 at night and 1 at daytime
setup_01$daytime=0
setup_01[setup_01[,"DateTime"]>setup_01[,"sunrise"]$time & setup_01[,"DateTime"]<setup_01[,"sunset"]$time,]$daytime=1
So far, so good... it's even possible with maptools
to use the start of civil/nautical/astronomical dusk and dawn instead of sunrise and sunset.
This, however, is where my problem starts. I want to number all the days in my experiment. And rather than increasing the day counter at midnight, as is usual and easy to do, I want to increase the day counter at sunset (or possibly in future experiments another moveable time of day like sunrise, nautical dusk and dawn,...). As sunset does not happen at the same time every day, it is not - for me - a straightforward problem to solve.
I've only come up with a for
-loop, which is not a nice way of doing things. Also, given that I have more than 6 years worth of data points collected once a minute in several setups, I can go sit and watch the tectonic plates move while R runs through a whole bunch of loops like these:
setup_01$day=0
day<-1
for(i in 1:nrow(setup_01)){
setup_01[i,]$day<-day
if(setup_01[i,]$sunset_act==1){
day<-day+1
}
}
Apart from being ugly and slow, this code has one big problem: it does not deal with missing values. Sometimes, due to equipment failure, data have not been recorded at all for hours or days. If no data have been recorded during a sunset, the above code does not increase the day counter. This means I need to - somehow - incorporate the date/time codes as well. It is easy to create a variable of days since the start of the experiment:
setup_01$daynumber<-as.integer(ceiling(difftime(setup_01$DateTime, setup_01$DateTime[1], units = "days")))
Perhaps these numbers can be used, possibly in conjunction with Heroka's nice rle
-algorithm.
I have used dput
to make a few months worth of data from one setup, including a few large chunks of missing data, as well as the newly created variables (as described in this post and in Heroka's answer) available here.
I've looked for something better, nicer and especially faster, but have been unable to come up with a nice trick. I've fiddled with subsetting my dataframe, but come to the conclusion that it is probably a silly approach. I've looked at maptools
, lubridate
, and GeoLight
. I've searched Google, Stack Overflow and various books, like Hadley Wickham's fantastic Advanced R. All to no avail. Perhaps I'm missing something obvious though. I'm hoping someone here can help me.
I prefer a solution based on precalculated tables. This is slower but I find it clearer to understand. Then I use dplyr
to arrange the info I need.
Let me show what I mean. For the sake of an example I create a list of sunset times. Of course you will need to calculate the actual ones.
library(dplyr)
n.obs=1000
set.seed(10)
t0 <- as.POSIXct('2015-03-08 18:00:00')
artificial.sunsets <- data.frame(num.day= seq(0,n.obs+35)) %>% mutate(sunset=cumsum(rlnorm(length(num.day))*30)+t0 + 24*3600*num.day)
artificial.sunsets
contains the day number and exact time of sunset but may also include more information about the day.
And some artificial data:
t0 <- as.POSIXct('2015-03-10 12:27:10')
test.data <- data.frame(DateTime=t0+ seq(0, n.obs*24*3600, by=3600), observation=rnorm(24*n.obs+1))
Then one can find the previous sunset using:
find.sunset.before <- function(x){
cbind(x,artificial.sunsets %>% filter(sunset < x$DateTime) %>% tail(.,n=1))
}
data.with.sunset=test.data %>% rowwise() %>% do(find.sunset.before(.)) %>% ungroup()%>% mutate(rel.time = DateTime-sunset)
head(data.with.sunset)
The resulting table would then contain three more columns 1) the corresponding day number 2) the corresponding sunset time, and 3) the time after sunset.
This should be robust against missing measurements as the day numbering occurs in another table. You can also easily modify the algorithm to use different times and even apply several.
Update
all this can be done much quicker using data.table:
library(data.table)
dt1 <- data.table(artificial.sunsets)
dt2 <- data.table(test.data)
dt1[,DateTime:=sunset]
setkey(dt1, DateTime)
setkey(dt2, DateTime)
r <- dt1[dt2,roll=TRUE]
r[,time.diff:=DateTime-sunset]
I tried timing it with system.time for 1000 observations - the previous takes about 1m, the data.table solution is 0.011s.
我想出了一个生成0和1的解决方案(因为您已经生成了这些),并且它与runlengths一起工作。
#sunset/sunrise is series of 0's and 1's indicating night and daytime, so solution that works for random sequence
#will work for OP's dataset
set.seed(10)
sunset <- c(1,rbinom(20,1,0.5))
#counter needs to be x for sequence of 11111 (day) and 0000(night), and then increase when 0 reappears
#counter starts at 1
#intermediate step: number each half-day
rle_sunset <- rle(sunset)
period <- rep(1:length(rle_sunset$lengths),rle_sunset$lengths)
#calculate day so that each two subsequent periods are one day
day <- ceiling(period/2)
> cbind(sunset,period,day)
sunset period day
[1,] 1 1 1
[2,] 1 1 1
[3,] 0 2 1
[4,] 0 2 1
[5,] 1 3 2
[6,] 0 4 2
[7,] 0 4 2
[8,] 0 4 2
[9,] 0 4 2
[10,] 1 5 3
[11,] 0 6 3
[12,] 1 7 4
[13,] 1 7 4
[14,] 0 8 4
[15,] 1 9 5
[16,] 0 10 5
[17,] 0 10 5
[18,] 0 10 5
[19,] 0 10 5
[20,] 0 10 5
[21,] 1 11 6
链接地址: http://www.djcxy.com/p/87340.html
下一篇: R:计算在日落时开始的日子