Read multiple Excel spreadsheets into R using readxl and correct variable types
I have several Excel files that I am trying to read into R using the package readxl. The Excel files consist of several tabs, each with 60000 rows and four columns of variables. The first column is a simple integer count to track seconds from 0, 1, 2, etc. The second column is a colon-separated (:) time in HH:MM:SS. The third column is a forward-slash-separated (/) date in MM/DD/YYYY. The fourth column is a floating-point decimal (e.g. 338.6).
Using the following code I get four columns and some of the formatting is consistent, but some data appears to be misinterpreted as dates or decimal numbers instead of integers, time, or date.
> data1 <- lapply(excel_sheets("./file_name.xls"),
read_excel, path = "./file_name.xls",
col_names = FALSE)
> head(data1[[1]])
X1 X2 X3 X4
1 502342 02:12:50 02/04/2015 338.6
2 502341 02:12:49 02/04/2015 338.1
3 502340 02:12:48 02/04/2015 337.5
4 502339 02:12:47 02/04/2015 337.6
5 502338 02:12:46 02/04/2015 337.5
6 502337 02:12:45 02/04/2015 338.0
> head(data1[[2]])
X1 X2 X3 X4
1 483664 08:56:48 488774 08:52:22
2 08:49:32 08:56:47 488774 08:52:22
3 185.2 08:56:46 488774 485475
4 483663 08:56:45 488774 08:52:22
5 08:49:31 08:56:44 488774 08:52:22
6 483662 08:56:43 488774 485475
> class(data1[[2]]$X1)
[1] "character"
> mode(data1[[2]]$X1)
[1] "character"
> tail(data1[[1]])
X1 X2 X3 X4
59995 08:49:35 08:56:54 488774 08:52:22
59996 483666 08:56:53 488774 485475
59997 08:49:34 08:56:52 488774 08:51:50
59998 185.3 08:56:51 488774 08:51:50
59999 483665 08:56:50 488774 485475
60000 08:49:33 08:56:49 488774 485475
> tail(data1[[2]])
X1 X2 X3 X4
59995 09:29:17 497592 488774 488206
59996 485927 497591 488774 488206
59997 09:29:16 497590 488774 488206
59998 485926 363.0 488774 488206
59999 09:29:15 12:49:37 488774 488206
60000 485925 497588 488774 488206
I also tried using col_types to define the column types, but this returns a data frame full of NAs.
> data1 <- lapply(excel_sheets("./file_name.xls"),
read_excel, path = "./file_name.xls",
col_names = FALSE,
col_types = c("numeric", "numeric", "date","numeric"))
There were 50 or more warnings (use warnings() to see the first 50)
> head(data1[[1]])
X1 X2 X3 X4
1 NA NA <NA> NA
2 NA NA <NA> NA
3 NA NA <NA> NA
4 NA NA <NA> NA
5 NA NA <NA> NA
6 NA NA <NA> NA
Using lapply() with read_excel() returns a list of data frames. I'm not sure whether I should try to change the variable types, or how exactly to do this. The Excel files themselves look consistent in terms of variable types. I even checked row 59998 in data1[[2]], which shows 363.0 for X2, but it should be 03:42:51.
Should I try to format these data in Excel or try to change them in R? Everything currently appears to be class character. What would be the most effective way to change the variable types in R?
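For reference, here is a minimal sketch of the kind of conversion in R I have in mind. It assumes every column arrives as character in the format shown in head(data1[[1]]) above. The helper name convert_sheet and the new column names are made up for illustration, and the small data frame is only a stand-in for one sheet. It uses base R only and does not fix the values that were misread on import; any string that does not match the expected format simply becomes NA.

```r
# Hypothetical helper: convert the four character columns of one sheet.
# Assumes the strings look like the head(data1[[1]]) output shown above.
convert_sheet <- function(df) {
  data.frame(
    seconds  = as.integer(df[[1]]),
    # Combine the date (column 3) and time (column 2) into one date-time.
    datetime = as.POSIXct(paste(df[[3]], df[[2]]),
                          format = "%m/%d/%Y %H:%M:%S", tz = "UTC"),
    value    = as.numeric(df[[4]]),
    stringsAsFactors = FALSE
  )
}

# Small stand-in for one sheet, copied from the first rows shown above.
sheet <- data.frame(
  X1 = c("502342", "502341"),
  X2 = c("02:12:50", "02:12:49"),
  X3 = c("02/04/2015", "02/04/2015"),
  X4 = c("338.6", "338.1"),
  stringsAsFactors = FALSE
)

converted <- convert_sheet(sheet)
str(converted)
```

Applied to the whole list, this would be something like data1 <- lapply(data1, convert_sheet), but I'm not sure that is the right approach when some cells are already misread.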
Thanks for your help.