Distributed computing instance usage with StarCluster IPython parallel plugin
I am using StarCluster with the IPython plugin. When I run KMeans clustering from an IPython notebook using the load-balanced view, the master is always at 100% CPU usage, while the other EC2 instances never take any of the load.
I have tried with a large dataset and 20 nodes, and the result is the same: all of the load stays on the master. I also tried targeting node001 directly, but even then the master took all the load.
Have I configured something incorrectly? Do I need to set DISABLE_QUEUE to True in the config? How do I distribute the load across all instances?
I cannot post images because I am a beginner on Stack Overflow. Here is htop on the master node and on node001:
Image link: https://drive.google.com/file/d/0BzfdmaY9JuagT0lmX29xY1RiUmM/view?usp=sharing
Any help would be appreciated.
Regards, Tej.
My template file:
[cluster iptemplate]
KEYNAME = ********
CLUSTER_SIZE = 2
CLUSTER_USER = ipuser
CLUSTER_SHELL = bash
REGION = us-west-2
NODE_IMAGE_ID = ami-04bedf34
NODE_INSTANCE_TYPE = m3.medium
#DISABLE_QUEUE = True
PLUGINS = pypackages,ipcluster
[plugin ipcluster]
SETUP_CLASS = starcluster.plugins.ipcluster.IPCluster
ENABLE_NOTEBOOK = True
NOTEBOOK_PASSWD = *****
[plugin ipclusterstop]
SETUP_CLASS = starcluster.plugins.ipcluster.IPClusterStop
[plugin ipclusterrestart]
SETUP_CLASS = starcluster.plugins.ipcluster.IPClusterRestartEngines
[plugin pypackages]
setup_class = starcluster.plugins.pypkginstaller.PyPkgInstaller
packages = scikit-learn, psutil, scikit-image, numpy, pyzmq
[plugin opencvinstaller]
setup_class = ubuntu.PackageInstaller
pkg_to_install = cmake
[plugin pkginstaller]
SETUP_CLASS = starcluster.plugins.pkginstaller.PackageInstaller
# list of apt-get installable packages
PACKAGES = python-mysqldb
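For reference, the queue setting commented out above would be enabled inside the [cluster iptemplate] section as shown below. The StarCluster IPCluster plugin documentation suggests disabling the SGE queue when the cluster is used only for IPython parallel; note, however, that this setting governs SGE, not how IPython's load balancer assigns tasks, so on its own it will not spread a single task across engines:

```ini
[cluster iptemplate]
# ... other settings as above ...
DISABLE_QUEUE = True
```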
Code:
from IPython import parallel

clients = parallel.Client()
rc = clients.load_balanced_view()

def clustering(X_digits):
    from sklearn.cluster import KMeans
    kmeans = KMeans(20)
    mu_digits = kmeans.fit(X_digits).cluster_centers_
    return mu_digits

rc.block = True
rc.apply(clustering, X_digits)
I have only just learned StarCluster/IPython myself, but this gist seems to agree with @thomas-k's comment: you need to structure your code so that it can be passed to a load-balanced map:
https://gist.github.com/pprett/3989337
cv = KFold(X.shape[0], K, shuffle=True, random_state=0)

# instantiate the tasks - K times the number of grid cells
# FIXME use generator to limit memory consumption or do fancy
# indexing in _parallel_grid_search.
tasks = [(i, k, estimator, params, X[train], y[train], X[test], y[test])
         for i, params in enumerate(grid)
         for k, (train, test) in enumerate(cv)]

# distribute tasks on ipcluster
rc = parallel.Client()
lview = rc.load_balanced_view()
results = lview.map(_parallel_grid_search, tasks)
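The same decomposition applies to the K-means question above: rc.apply submits a single task, so the balancer can only ever place it on one engine (which is why the master does all the work). Building a list of independent tasks and handing it to load_balanced_view.map gives the scheduler something to distribute. Below is a minimal pure-Python sketch of the pattern; the task body and names are hypothetical stand-ins, and the local map would be replaced by lview.map on the cluster:

```python
def one_task(args):
    # Each task must be self-contained (do its own imports, carry its
    # own data), because it may execute on any engine in the cluster.
    seed, data = args
    return seed, sum(data)

# Many small independent tasks give the load balancer something to
# actually distribute; a single task can only occupy one engine.
data = list(range(100))
tasks = [(seed, data) for seed in range(16)]

# Local stand-in for the cluster call. On StarCluster this would be:
#   rc = parallel.Client()
#   lview = rc.load_balanced_view()
#   results = lview.map(one_task, tasks, block=True)
results = list(map(one_task, tasks))
print(len(results))  # 16
```

For the original KMeans workload, the analogous split would be one task per restart (e.g. one random seed per task), returning the fitted centers from each.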