Is there any way to train a sklearn model on disk-based data like HDF5?
In my problem, I have a very large dataset that does not fit in memory. I would like to train my model using data stored on disk, such as HDF5. Does sklearn support this, or is there some other alternative?
What you are asking for is called out-of-core or streaming learning. It is only possible with the subset of scikit-learn models that implement the partial_fit method for incremental fitting.
There is an example in the documentation. There is no specific utility to fit models on HDF5 data in particular, but you can adapt that example to fetch the data from any external data source (e.g. HDF5 data on the local disk, or a database over the network, for instance using the pandas SQL adapter).
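Here is a minimal sketch of that pattern, assuming an HDF5 file named train.h5 with datasets named "X" (features) and "y" (labels); the file name, dataset names, and chunk size are all hypothetical. The chunks are read with h5py and fed to SGDClassifier.partial_fit:

```python
import numpy as np
import h5py
from sklearn.linear_model import SGDClassifier

CHUNK_SIZE = 10_000  # rows per chunk; tune to your available memory

clf = SGDClassifier()

with h5py.File("train.h5", "r") as f:
    X, y = f["X"], f["y"]  # h5py datasets; slicing loads only that slice
    # partial_fit needs the full set of class labels on the first call,
    # because later chunks may not contain every class. Scanning y here
    # loads all labels into memory; labels are usually small enough, but
    # you can pass a known class list instead to avoid the full scan.
    classes = np.unique(y[:])
    for start in range(0, X.shape[0], CHUNK_SIZE):
        stop = start + CHUNK_SIZE
        clf.partial_fit(X[start:stop], y[start:stop], classes=classes)
```

The same loop works with any incremental estimator (e.g. SGDRegressor, MultinomialNB, MiniBatchKMeans); only classifiers need the classes argument, and only on the first call.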