How to see how many nodes a process is using on a cluster with Sun Grid Engine?
I am (trying to) run R on a multicore computing cluster managed by Sun Grid Engine. I would like to run R in parallel using the MPI environment and the snow / snowfall parLapply() functions. My code works, at least on my laptop, but to make sure it also does what it is supposed to on the cluster, I have the following questions.
If I request a number of slots / nodes, say 4, how can I check whether a running process actually uses all of the requested CPUs? Is there a command that shows details about the CPU usage of a process on the requested nodes?
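
For reference, here is a minimal sketch of the kind of setup in question; the submission line and the parallel environment name "mpi" are only placeholders and will differ per site:

# Submitted under SGE with something like:  qsub -pe mpi 4 run_job.sh
# (the parallel environment name "mpi" is site-specific)
library(Rmpi)
library(snow)

# One worker per granted MPI slot, keeping one process for the master
cl <- makeMPIcluster(mpi.universe.size() - 1)

res <- parLapply(cl, 1:100, function(i) i^2)

stopCluster(cl)
mpi.quit()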
In order to verify that the cluster workers really started on the appropriate nodes, I often use the following command right after creating the cluster object:
clusterEvalQ(cl, Sys.info()['nodename'])
This should match the list of allocated nodes reported by the qstat command.
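
For example, you can tabulate where the workers ended up and compare the result with the per-slot listing from qstat for your job (qstat -g t, where supported, shows one line per occupied slot):

# Assumes the cluster object 'cl' created above
hosts <- unlist(clusterEvalQ(cl, Sys.info()['nodename']))
table(hosts)     # number of workers running on each node
length(hosts)    # total workers; should match the slots granted to the job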
To actually get details on the CPU usage, I often ssh to each node and use commands like top and ps, but that can be painful if there are many nodes to check; a small loop like the one sketched below can take some of the tedium out of it. We have the Ganglia monitoring system set up on our clusters, so I can use Ganglia's web interface to check various node statistics. You might want to check with your system administrators to see whether they have set up anything similar for monitoring.
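
Here is a sketch of such a loop, run from the master R session. It assumes passwordless ssh between the cluster nodes (typical on SGE clusters) and that the workers show up as a process named "R"; adjust the ps options and process name for your system:

# Run a one-shot ps on every node that hosts a worker
hosts <- unique(unlist(clusterEvalQ(cl, Sys.info()['nodename'])))
for (h in hosts) {
  cat('----', h, '----\n')
  system(paste('ssh', h, 'ps -o pid,pcpu,pmem,comm -C R'))
}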