Big Data convert to "transactions" from arules package
The arules package in R uses the class 'transactions'. So in order to use the function apriori()
I need to convert my existing data. I've got a Matrix with 2 columns and roughly 1.6mm rows and tried to convert the data like this:
transaction_data <- as(split(original_data[,"id"], original_data[,"type"]), "transactions")
where original_data is my data matrix. Because of the amount of data I used the largest AWS Amazon machine with 64gb RAM. After a while I get
resulting vector exceeds vector length limit in 'AnswerType'
The Memory Usage of the machine was still 'only' at 60%. Is this a R-based limitation? Is there any way to work around this other than using sampling? When only using 1/4 of the data the transformation worked fine.
Edit: As pointed out, one of the variables was a factor instead of character. After changing the transformation was processed quickly and correct.
I suspect that your problem is arising because one of the functions uses integers (rather than, say, floats) to index values. In any case, the size isn't too big, so this is surprising. Maybe the data has some other issue, such as characters as factors?
In general, though, I'd really recommend using memory mapped files, via bigmemory
, which you can also split and process via bigsplit
or mwhich
. If offloading the data works for you, then you can also use a much smaller instance size and save $$. :)
上一篇: R距离矩阵和聚类的混合和大型数据集?
下一篇: 大数据从arules包转换为“交易”