Separate Mixed Dates and Times

I'm exporting data from a medical record platform.

The data looks like this...

      Date.time    TEMP      HR    RR     SBP    DBP
1   Jun-08-2015                                     
2          1323  36.8 O       –     –       –      –
3           931  36.8 O   76 MC 22 SP  104 MC  52 MC
4           930       –       –     –       –      –
5           929       –       –     –       –      –
6           813  36.8 O   76 MC 22 SP  104 MC  52 MC
7           126  36.3 O   78 MC 23 SP  112 MC  55 MC
8            40  36.3 O   78 MC 23 SP  112 MC  55 MC
9   Jun-07-2015                                     
10         2307    36 O   71 MC 22 SP  120 MC  57 MC

I need the date and time combined in a single column, in the format yyyymmddhhmm.

The values 1323, 931, 930, 929, etc. are times (hhmm, without leading zeros).

My expected output is...

      Date.time    TEMP      HR    RR     SBP    DBP
1     201506081323  36.8 O       –     –       –      –
2     201506080931  36.8 O   76 MC 22 SP  104 MC  52 MC
3     201506080930       –       –     –       –      –
4     201506080929       –       –     –       –      –
5     201506080813  36.8 O   76 MC 22 SP  104 MC  52 MC
6     201506080126  36.3 O   78 MC 23 SP  112 MC  55 MC
7     201506080040  36.3 O   78 MC 23 SP  112 MC  55 MC
8     201506072307    36 O   71 MC 22 SP  120 MC  57 MC

Split the column into separate date and time columns, fill the missing dates downward, zero-pad the times, then paste date and time back together and convert the result to a date-time class (POSIXct).

library(dplyr)
library(tidyr)
library(stringr)

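df1 %>% 
  # x1 keeps the date rows (more than 4 characters), x2 keeps the time rows
  mutate(x1 = if_else(nchar(Date.time) > 4, Date.time, NA_character_),
         x2 = if_else(nchar(Date.time) > 4, NA_character_, Date.time),
         # left-pad times to four digits, e.g. "40" becomes "0040"
         x2 = str_pad(x2, width = 4, side = "left", pad = "0")) %>% 
  # carry each date down onto the time rows below it
  fill(x1) %>% 
  # drop the date-only header rows
  filter(!is.na(x2)) %>% 
  mutate(Date.time.v1 = as.POSIXct(paste(x1, x2), format = "%b-%d-%Y %H%M")) %>% 
  select(-c(x1, x2))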

#   Date.time   TEMP    HR    RR    SBP   DBP        Date.time.v1
# 1      1323 36.8 O     -     -      -     - 2015-06-08 13:23:00
# 2       931 36.8 O 76 MC 22 SP 104 MC 52 MC 2015-06-08 09:31:00
# 3       930      -     -     -      -     - 2015-06-08 09:30:00
# 4       929      -     -     -      -     - 2015-06-08 09:29:00
# 5       813 36.8 O 76 MC 22 SP 104 MC 52 MC 2015-06-08 08:13:00
# 6       126 36.3 O 78 MC 23 SP 112 MC 55 MC 2015-06-08 01:26:00
# 7        40 36.3 O 78 MC 23 SP 112 MC 55 MC 2015-06-08 00:40:00
# 8      2307   36 O 71 MC 22 SP 120 MC 57 MC 2015-06-07 23:07:00
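
The question asks for a yyyymmddhhmm value rather than a POSIXct column. As a minimal sketch, a date-time vector like Date.time.v1 can be turned into that text with format(). The vector `dt` below is a hypothetical stand-in for that column, and the UTC time zone is an assumption made only to keep the example reproducible.

```r
# Hypothetical stand-in for the Date.time.v1 column shown above
dt <- as.POSIXct(c("2015-06-08 13:23", "2015-06-08 00:40", "2015-06-07 23:07"),
                 format = "%Y-%m-%d %H:%M", tz = "UTC")

# Render each date-time as a 12-character yyyymmddhhmm string
stamp <- format(dt, "%Y%m%d%H%M")
stamp
```

Keeping the result as character preserves the fixed 12-digit width; wrap it in as.numeric() only if a number is really needed.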

data

# Fields are comma-separated here because tab separators do not survive copy and paste
df1 <- read.table(text = "
Date.time,TEMP,HR,RR,SBP,DBP
Jun-08-2015,,,,,
1323,36.8 O,-,-,-,-
931,36.8 O,76 MC,22 SP,104 MC,52 MC
930,-,-,-,-,-
929,-,-,-,-,-
813,36.8 O,76 MC,22 SP,104 MC,52 MC
126,36.3 O,78 MC,23 SP,112 MC,55 MC
40,36.3 O,78 MC,23 SP,112 MC,55 MC
Jun-07-2015,,,,,
2307,36 O,71 MC,22 SP,120 MC,57 MC
", header = TRUE, sep = ",", stringsAsFactors = FALSE)

This is what I came up with, although I still had to go back to the file in Excel to separate the dates from the times. That took very little time (maybe a minute). All the files I plan to work with are approximately the same length, so it's not a big deal.

After doing that I ended up with a file like this...

              X Date.time    TEMP      HR    RR     SBP    DBP
1                      NA                                     
2   Jun-08-2015      1323  36.8 O       –     –       –      –
3   Jun-08-2015       931  36.8 O   76 MC 22 SP  104 MC  52 MC
4   Jun-08-2015       930       –       –     –       –      –
5   Jun-08-2015       929       –       –     –       –      –
6   Jun-08-2015       813  36.8 O   76 MC 22 SP  104 MC  52 MC
7   Jun-08-2015       126  36.3 O   78 MC 23 SP  112 MC  55 MC
8   Jun-08-2015        40  36.3 O   78 MC 23 SP  112 MC  55 MC
9                      NA                                     
10  Jun-07-2015      2307    36 O   71 MC 22 SP  120 MC  57 MC
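
For anyone who would rather skip the Excel step, the same split can probably be done in base R. This is only a sketch under assumptions: `mixed` is a toy character vector standing in for the exported Date.time column, every date row looks like Mon-dd-yyyy, and the first row is a date.

```r
# Toy stand-in for the mixed Date.time column (assumed to be character)
mixed <- c("Jun-08-2015", "1323", "931", "40", "Jun-07-2015", "2307")

# Flag the rows that hold a date such as "Jun-08-2015"
is_date <- grepl("^[A-Za-z]{3}-[0-9]{2}-[0-9]{4}$", mixed)

# cumsum() numbers each date block, so every time row picks up the date above it
dates <- mixed[is_date][cumsum(is_date)]

# Keep only the time rows, with the date in its own column
split_df <- data.frame(X = dates[!is_date],
                       Date.time = mixed[!is_date],
                       stringsAsFactors = FALSE)
split_df
```

If a file does not start with a date row, the index from cumsum() starts at zero and the lengths no longer line up, so that case would need handling first.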

After that I used the following code. Sorry for all the comments; I need to make the code as easy to understand as possible so that everyone in my lab can follow what's going on.

# Eliminate empty rows
SJ <- na.omit(SJ)

# Parse the date text (e.g. "Jun-08-2015") so the month becomes a number
SJ$newdate <- strptime(as.character(SJ$X), "%b-%d-%Y")

# Eliminate dashes from the date, leaving yyyymmdd
SJ$newdate <- gsub("[[:punct:]]", "", SJ$newdate)

# Add column with "0000" for later use in proper date conversion
SJ$zeros <- rep("0000", nrow(SJ))

# Combine date column with zeros column to obtain a date number of the correct length
SJ$date <- paste(SJ$newdate, SJ$zeros, sep = "")

# Convert time column to number
SJ$Date.time <- as.numeric(SJ$Date.time)

# Convert date column to number
SJ$date <- as.numeric(SJ$date)

# Add time column to date column, resulting in the desired datetime format. Saved as a vector.
Datetime <- SJ$date + SJ$Date.time

# Insert Datetime as the first column
SJ <- cbind(Datetime, SJ)

The file now looks like this.

        Datetime           X Date.time    TEMP      HR    RR     SBP    DBP  newdate zeros         date
2   201506081323 Jun-08-2015      1323  36.8 O       –     –       –      – 20150608  0000 201506080000
3   201506080931 Jun-08-2015       931  36.8 O   76 MC 22 SP  104 MC  52 MC 20150608  0000 201506080000
4   201506080930 Jun-08-2015       930       –       –     –       –      – 20150608  0000 201506080000
5   201506080929 Jun-08-2015       929       –       –     –       –      – 20150608  0000 201506080000
6   201506080813 Jun-08-2015       813  36.8 O   76 MC 22 SP  104 MC  52 MC 20150608  0000 201506080000
7   201506080126 Jun-08-2015       126  36.3 O   78 MC 23 SP  112 MC  55 MC 20150608  0000 201506080000
8   201506080040 Jun-08-2015        40  36.3 O   78 MC 23 SP  112 MC  55 MC 20150608  0000 201506080000
10  201506072307 Jun-07-2015      2307    36 O   71 MC 22 SP  120 MC  57 MC 20150607  0000 201506070000
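
A shorter route to the same Datetime values might be to build them as text instead of adding numbers. The sketch below uses toy values copied from the table above and assumes an English locale, since %b matches abbreviated month names in the current locale.

```r
# Toy values copied from the X and Date.time columns above
X <- c("Jun-08-2015", "Jun-08-2015", "Jun-07-2015")
Date.time <- c("1323", "40", "2307")

# Date part as yyyymmdd (%b assumes English month abbreviations)
ymd <- format(as.Date(X, format = "%b-%d-%Y"), "%Y%m%d")

# Time part zero-padded to four digits, so "40" becomes "0040"
hhmm <- sprintf("%04d", as.integer(Date.time))

Datetime <- paste0(ymd, hhmm)
Datetime
```

This avoids the helper columns newdate, zeros, and date, and keeps the result as a 12-character string.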

Finally, I simply deleted the columns that were no longer needed: X, Date.time, newdate, zeros, and date.
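
Dropping those helper columns can also be done in R. A minimal sketch on a two-row toy version of the data frame (`drop_cols` and `SJ_clean` are names chosen here for illustration):

```r
# Two-row toy version of the data frame shown above
SJ <- data.frame(Datetime = c(201506081323, 201506072307),
                 X = c("Jun-08-2015", "Jun-07-2015"),
                 Date.time = c(1323, 2307),
                 TEMP = c("36.8 O", "36 O"),
                 newdate = c("20150608", "20150607"),
                 zeros = "0000",
                 date = c(201506080000, 201506070000),
                 stringsAsFactors = FALSE)

# Columns that were only needed as intermediate steps
drop_cols <- c("X", "Date.time", "newdate", "zeros", "date")

# Keep every column whose name is not in drop_cols
SJ_clean <- SJ[, !(names(SJ) %in% drop_cols), drop = FALSE]
names(SJ_clean)
```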

Thank you all for your help!
