Separate Mixed Dates and Times
I'm exporting data from a medical record platform.
The data looks like this...
Date.time TEMP HR RR SBP DBP
1 Jun-08-2015
2 1323 36.8 O – – – –
3 931 36.8 O 76 MC 22 SP 104 MC 52 MC
4 930 – – – – –
5 929 – – – – –
6 813 36.8 O 76 MC 22 SP 104 MC 52 MC
7 126 36.3 O 78 MC 23 SP 112 MC 55 MC
8 40 36.3 O 78 MC 23 SP 112 MC 55 MC
9 Jun-07-2015
10 2307 36 O 71 MC 22 SP 120 MC 57 MC
I need the date and time in a single column, in the format yyyymmddhhmm.
The values 1323, 931, 930, 929, etc. are times (hhmm, with leading zeros dropped).
My expected output is...
Date.time TEMP HR RR SBP DBP
1 201506081323 36.8 O – – – –
2 201506080931 36.8 O 76 MC 22 SP 104 MC 52 MC
3 201506080930 – – – – –
4 201506080929 – – – – –
5 201506080813 36.8 O 76 MC 22 SP 104 MC 52 MC
6 201506080126 36.3 O 78 MC 23 SP 112 MC 55 MC
7 201506080040 36.3 O 78 MC 23 SP 112 MC 55 MC
8 201506072307 36 O 71 MC 22 SP 120 MC 57 MC
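In the expected output, times such as 931 and 40 have to become exactly four digits (0931, 0040) before they can be glued onto the date. Here is a minimal base-R sketch of that padding step. It assumes the times are already integers, and `times` is a hypothetical example vector:

```r
# Hypothetical example times, as they appear in the export (hhmm, leading zeros dropped)
times <- c(1323L, 931L, 930L, 40L)

# %04d left-pads each integer with zeros to a width of 4
hhmm <- sprintf("%04d", times)
print(hhmm)
```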
Split the dates and the times into two helper columns, fill each date down to the time rows below it, then paste the date and time back together and convert the result to a date-time (POSIXct) class.
library(dplyr)
library(tidyr)
library(stringr)
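df1 %>%
  # x1 holds the date rows (e.g. "Jun-08-2015"), x2 holds the time rows (e.g. "931")
  mutate(x1 = if_else(nchar(Date.time) > 4, Date.time, NA_character_),
         x2 = if_else(nchar(Date.time) > 4, NA_character_, Date.time),
         # left-pad times to 4 digits: "931" -> "0931", "40" -> "0040"
         x2 = str_pad(x2, width = 4, side = "left", pad = "0")) %>%
  # carry each date down to the time rows below it
  fill(x1) %>%
  # drop the date-only rows
  filter(!is.na(x2)) %>%
  # paste date and time together and parse as POSIXct
  mutate(Date.time.v1 = as.POSIXct(paste(x1, x2), format = "%b-%d-%Y %H%M")) %>%
  # remove the helper columns
  select(-c(x1, x2))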
# Date.time TEMP HR RR SBP DBP Date.time.v1
# 1 1323 36.8 O - - - - 2015-06-08 13:23:00
# 2 931 36.8 O 76 MC 22 SP 104 MC 52 MC 2015-06-08 09:31:00
# 3 930 - - - - - 2015-06-08 09:30:00
# 4 929 - - - - - 2015-06-08 09:29:00
# 5 813 36.8 O 76 MC 22 SP 104 MC 52 MC 2015-06-08 08:13:00
# 6 126 36.3 O 78 MC 23 SP 112 MC 55 MC 2015-06-08 01:26:00
# 7 40 36.3 O 78 MC 23 SP 112 MC 55 MC 2015-06-08 00:40:00
# 8 2307 36 O 71 MC 22 SP 120 MC 57 MC 2015-06-07 23:07:00
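The pipeline above stops at a POSIXct column, but the question asks for yyyymmddhhmm. One way to get there is format() with a matching format string. This sketch assumes the date-times are already POSIXct, and `stamps` is a hypothetical stand-in for Date.time.v1:

```r
# Hypothetical stand-in for the Date.time.v1 column produced above.
# tz = "UTC" is an assumption: it avoids DST gaps where some local clock times do not exist.
stamps <- as.POSIXct(c("2015-06-08 13:23", "2015-06-08 00:40", "2015-06-07 23:07"),
                     format = "%Y-%m-%d %H:%M", tz = "UTC")

# Print each date-time as a compact yyyymmddhhmm string
compact <- format(stamps, "%Y%m%d%H%M")
print(compact)
```

In the dplyr pipeline this would be one extra step, e.g. mutate(Date.time.v1 = format(Date.time.v1, "%Y%m%d%H%M")). The result is character, not a date class. The as.POSIXct() call in the answer uses the local time zone, so passing tz = "UTC" there as well may avoid NA for clock times that fall in a daylight-saving gap.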
Data (the fields must be tab-separated for sep = "\t" to split them into columns):
df1 <- read.table(text = "
Date.time TEMP HR RR SBP DBP
Jun-08-2015
1323 36.8 O - - - -
931 36.8 O 76 MC 22 SP 104 MC 52 MC
930 - - - - -
929 - - - - -
813 36.8 O 76 MC 22 SP 104 MC 52 MC
126 36.3 O 78 MC 23 SP 112 MC 55 MC
40 36.3 O 78 MC 23 SP 112 MC 55 MC
Jun-07-2015
2307 36 O 71 MC 22 SP 120 MC 57 MC
", header = TRUE, sep = "\t", stringsAsFactors = FALSE, fill = TRUE)
This is what I came up with, but I still had to go back to the file in Excel to separate the dates from the times. This didn't take long at all (maybe 1 minute), and all the files I plan to work with are approximately the same length, so it's not a big deal.
After doing that I ended up with a file like this...
X Date.time TEMP HR RR SBP DBP
1 NA
2 Jun-08-2015 1323 36.8 O – – – –
3 Jun-08-2015 931 36.8 O 76 MC 22 SP 104 MC 52 MC
4 Jun-08-2015 930 – – – – –
5 Jun-08-2015 929 – – – – –
6 Jun-08-2015 813 36.8 O 76 MC 22 SP 104 MC 52 MC
7 Jun-08-2015 126 36.3 O 78 MC 23 SP 112 MC 55 MC
8 Jun-08-2015 40 36.3 O 78 MC 23 SP 112 MC 55 MC
9 NA
10 Jun-07-2015 2307 36 O 71 MC 22 SP 120 MC 57 MC
After that I used the following code. Sorry for all the comments; I need to make the code as easy to understand as possible so that everyone in my lab understands what's going on.
# Eliminate the empty (date-only) rows
SJ <- na.omit(SJ)
# Parse the date and convert the month to a number: "Jun-08-2015" -> "2015-06-08"
SJ$newdate <- as.character(as.Date(SJ$X, format = "%b-%d-%Y"))
# Eliminate the dashes from the date: "2015-06-08" -> "20150608"
SJ$newdate <- gsub("[[:punct:]]", "", SJ$newdate)
# Add a column of "0000" for later use in building the full datetime
SJ$zeros <- rep("0000", nrow(SJ))
# Combine the date and zeros columns to get a number of the correct length: "201506080000"
SJ$date <- paste(SJ$newdate, SJ$zeros, sep = "")
# Convert the time column to a number
SJ$Date.time <- as.numeric(SJ$Date.time)
# Convert the date column to a number
SJ$date <- as.numeric(SJ$date)
# Add the time to the date, e.g. 201506080000 + 1323 = 201506081323. Saves as a vector.
Datetime <- SJ$date + SJ$Date.time
# Insert Datetime as the first column
SJ <- cbind(Datetime, SJ)
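Appending "0000" to the date is the same as multiplying it by 10000, so date * 10000 + hhmm gives yyyymmddhhmm directly. Here is a shorter sketch of the same arithmetic. It assumes, as in the code above, that X holds dates like Jun-08-2015 and Date.time holds the times. The small SJ below is hypothetical example data. As with the original parsing step, %b expects English month abbreviations in the current locale.

```r
# Hypothetical example of the file after the Excel step (NA rows already removed)
SJ <- data.frame(X = c("Jun-08-2015", "Jun-08-2015", "Jun-07-2015"),
                 Date.time = c(1323, 40, 2307))

# Date as the number yyyymmdd, e.g. 20150608
ymd <- as.numeric(format(as.Date(SJ$X, format = "%b-%d-%Y"), "%Y%m%d"))

# Shift the date four digits left and add the hhmm time
SJ$Datetime <- ymd * 10000 + SJ$Date.time
print(SJ)
```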
The file now looks like this.
Datetime X Date.time TEMP HR RR SBP DBP newdate zeros date
2 201506081323 Jun-08-2015 1323 36.8 O – – – – 20150608 0000 201506080000
3 201506080931 Jun-08-2015 931 36.8 O 76 MC 22 SP 104 MC 52 MC 20150608 0000 201506080000
4 201506080930 Jun-08-2015 930 – – – – – 20150608 0000 201506080000
5 201506080929 Jun-08-2015 929 – – – – – 20150608 0000 201506080000
6 201506080813 Jun-08-2015 813 36.8 O 76 MC 22 SP 104 MC 52 MC 20150608 0000 201506080000
7 201506080126 Jun-08-2015 126 36.3 O 78 MC 23 SP 112 MC 55 MC 20150608 0000 201506080000
8 201506080040 Jun-08-2015 40 36.3 O 78 MC 23 SP 112 MC 55 MC 20150608 0000 201506080000
10 201506072307 Jun-07-2015 2307 36 O 71 MC 22 SP 120 MC 57 MC 20150607 0000 201506070000
Finally, I simply deleted the unnecessary columns: X, Date.time, newdate, zeros, and date.
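Those helper columns can also be dropped in R rather than by hand. This minimal sketch assumes SJ has the column names shown above. The two-row SJ here is hypothetical example data with most vital-sign columns omitted:

```r
# Hypothetical example with the helper columns left over from the steps above
SJ <- data.frame(Datetime = c(201506081323, 201506072307),
                 X = c("Jun-08-2015", "Jun-07-2015"),
                 Date.time = c(1323, 2307),
                 TEMP = c("36.8 O", "36 O"),
                 newdate = c("20150608", "20150607"),
                 zeros = c("0000", "0000"),
                 date = c(201506080000, 201506070000))

# Names of the columns that are no longer needed
helpers <- c("X", "Date.time", "newdate", "zeros", "date")

# Keep everything else; drop = FALSE keeps a data frame even if only one column remains
SJ <- SJ[, setdiff(names(SJ), helpers), drop = FALSE]
print(SJ)
```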
Thank you all for your help!