Splitting a list based on another list values in Mathematica
In Mathematica I have a list of point coordinates
size = 50;
points = Table[{RandomInteger[{0, size}], RandomInteger[{0, size}]}, {i, 1, n}];
and a list of cluster indices these points belong to
clusterIndices = {1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1};
what is the easiest way to split the points into two separate lists based on the clusterIndices values?
EDIT: The solution I came up with:
pointIndices =
Map[#[[2]] &,
GatherBy[MapIndexed[{#1, #2[[1]]} &, clusterIndices], First],
{2}];
pointsByCluster = Map[Part[points, #] &, pointIndices];
It there a better way to do this?
这个怎么样?
points[[
Flatten[Position[clusterIndices, #]]
]] & /@
Union[clusterIndices]
As @High Performance Mark and @Nicholas Wilson said, I'd start with combining the two lists together via Transpose
or Thread
. In this case,
In[1]:= Transpose[{clusterIndices, points}]==Thread[{clusterIndices, points}]
Out[1]:= True
At one point, I looked at which was faster, and I think Thread
is marginally faster. But, it only really matters when you are using very long lists.
@High Performance Mark makes a good point in suggesting Select
. But, it would only allow you to pull a single cluster out at a time. The code for selecting cluster 1 is as follows:
Select[Transpose[{clusterIndices, points}], #[[1]]==1& ][[All, All, 2]]
Since you seem to want to generate all clusters, I'd suggest doing the following:
GatherBy[Transpose[{clusterIndices, points}], #[[1]]& ][[All, All, 2]]
which has the advantage of being a one liner and the only tricky part was in selecting the correct Part
of the resulting list. The trick in determining how many All
terms are necessary is to note that
Transpose[{clusterIndices, points}][[All,2]]
is required to get the points back out of the transposed list. But, the "clustered" list has one additional level, hence the second All
.
It should be noted that the second parameter in GatherBy
is a function that accepts one parameter, and it can be interchanged with any function you wish to use. As such, it is very useful. However, if you'd like to transform your data as your gathering it, I'd look at Reap
and Sow
.
Edit: Reap
and Sow
are somewhat under used, and fairly powerful. They're somewhat confusing to use, but I suspect GatherBy
is implemented using them internally. For instance,
Reap[ Sow[#[[2]], #[[1]] ]& /@ Transpose[{clusterIndices, points}], _, #2& ]
does the same thing as my previous code without the hassle of stripping off the indices from the points. Essentially, Sow
tags each point with its index, then Reap gathers all of the tags ( _
for the 2nd parameter) and outputs only the points. Personally, I use this instead of GatherBy, and I've encoded it into a function which I load, as follows:
SelectEquivalents[x_List,f_:Identity, g_:Identity, h_:(#2&)]:=
Reap[Sow[g[#],{f[#]}]&/@x, _, h][[2]];
Note: this code is a modified form of what was in the help files in 5.x. But, the 6.0 and 7.0 help files removed a lot of the useful examples, and this was one of them.
Here's a succinct way to do this using the new SplitBy
function in version 7.0 that should be pretty fast:
SplitBy[Transpose[{points, clusterIndices}], Last][[All, All, 1]]
If you aren't using 7.0, you can implement this as:
Split[Transpose[{points, clusterIndices}], Last[#]==Last[#2]& ][[All, All, 1]]
Update
Sorry, I didn't see that you only wanted two groups, which I think of as clustering, not splitting. Here's some code for that:
FindClusters[Thread[Rule[clusterIndices, points]]]
链接地址: http://www.djcxy.com/p/35498.html