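I need to do join / joinVertices or add a field to the tuple in a graph with Spark GraphX
I have an RDF graph (link) with tuples (s,p,o), and I built a property graph from it. My RDF property graph is obtained with the following code (complete code):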
val propGraph = Graph(vertexArray,edgeArray).cache()
propGraph.triplets.foreach(println(_))
The output has the following form:
((vId_src,src_att),(vId_dst,dst_att),property)
and the RDF data is:
((0,<http://umkc.edu/xPropGraph#franklin>),(1,<http://umkc.edu/xPropGraph#rxin>),<http://umkc.edu/xPropGraph#advisor>)
((1,<http://umkc.edu/xPropGraph#rxin>),(2,<http://umkc.edu/xPropGraph#jgonzal>),<http://umkc.edu/xPropGraph#collab>)
((2147483648,<http://umkc.edu/xPropGraph#peter>),(4294967295,<http://umkc.edu/xPropGraph#John>),<http://umkc.edu/xPropGraph#student>)
((6442450942,<http://umkc.edu/xPropGraph#istoica>),(0,<http://umkc.edu/xPropGraph#franklin>),<http://umkc.edu/xPropGraph#colleague>)
((0,<http://umkc.edu/xPropGraph#franklin>),(2,<http://umkc.edu/xPropGraph#jgonzal>),<http://umkc.edu/xPropGraph#pi>)
When I apply connectedComponents(), I get the graph cc, in which each vertex attribute is the ccID:
val cc = propGraph.connectedComponents().cache()
cc.triplets.foreach(println(_))
The output is:
((0,0),(2,0),<http://umkc.edu/xPropGraph#pi>)
((0,0),(1,0),<http://umkc.edu/xPropGraph#advisor>)
((1,0),(2,0),<http://umkc.edu/xPropGraph#collab>)
((2147483648,2147483648),(4294967295,2147483648),<http://umkc.edu/xPropGraph#student>)
((6442450942,0),(0,0),<http://umkc.edu/xPropGraph#colleague>)
I need to get something like this:
((vId_src,src_att),(vId_dst,dst_att),property, ccID)
That is, I need the result in this triplet/graph format:
((0,<http://umkc.edu/xPropGraph#franklin>),(2,<http://umkc.edu/xPropGraph#jgonzal>),<http://umkc.edu/xPropGraph#pi>,0)
((6442450942,<http://umkc.edu/xPropGraph#istoica>),(0,<http://umkc.edu/xPropGraph#franklin>),<http://umkc.edu/xPropGraph#colleague>,0)
((0,<http://umkc.edu/xPropGraph#franklin>),(1,<http://umkc.edu/xPropGraph#rxin>),<http://umkc.edu/xPropGraph#advisor>,0)
((1,<http://umkc.edu/xPropGraph#rxin>),(2,<http://umkc.edu/xPropGraph#jgonzal>),<http://umkc.edu/xPropGraph#collab>,0)
((2147483648,<http://umkc.edu/xPropGraph#peter>),(4294967295,<http://umkc.edu/xPropGraph#John>),<http://umkc.edu/xPropGraph#student>,2147483648)
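So my best option is probably a join. I tried something like val triplets = propGraph.joinVertices(cc.vertices)
but could not get it to work. Is there any way to get this result?
Any help is appreciated! I am new to GraphX. :)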
Since I was looking for ((vId_src,src_att),(vId_dst,dst_att),property, ccID), I used zip() on the two RDDs.
val cc: Graph[graphx.VertexId,String] = propGraph.connectedComponents().cache()
println("###GRAPH WITH CONNECTED COMPONENTS ###")
cc.triplets.foreach(println(_))
println("###VERTICES OF CONNECTED COMPONENTS GRAPH ###")
cc.vertices.foreach(println(_))
println("###EDGES OF CONNECTED COMPONENTS GRAPH ###")
cc.edges.foreach(println(_))
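/**
 * Alternative to a join operation: zip the triplets of both graphs.
 * Note: zip() pairs elements by position, so both RDDs must have the same number
 * of partitions and the same number of elements per partition, in the same order.
 */
println("###STEP-2 GETTING ONE MERGED RDD OF NEW GRAPH###")
val newGraph: RDD[String] = propGraph.triplets.map(t => s"${t.srcId},${t.srcAttr}),(${t.dstId},${t.dstAttr}),${t.attr}")
val ccID: RDD[String] = cc.triplets.map(t => t.srcAttr.toString)
val newPropGraph: RDD[(String, String)] = newGraph.zip(ccID)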
newPropGraph.collect.foreach(println(_))
After doing this, I got the following output:
(4294967296,<http://umkc.edu/xPropGraph#node1>),(2147483649,<http://umkc.edu/xPropGraph#node2>),<http://umkc.edu/xPropGraph#prop1>,0)
(2147483649,<http://umkc.edu/xPropGraph#node2>),(6442450942,<http://umkc.edu/xPropGraph#node4>),<http://umkc.edu/xPropGraph#prop5>,0)
(4294967295,<http://umkc.edu/xPropGraph#node5>),(2147483648,<http://umkc.edu/xPropGraph#node6>),<http://umkc.edu/xPropGraph#prop3>,2147483648)
(0,<http://umkc.edu/xPropGraph#node3>),(6442450942,<http://umkc.edu/xPropGraph#node4>),<http://umkc.edu/xPropGraph#prop2>,0)
(2147483649,<http://umkc.edu/xPropGraph#node2>),(0,<http://umkc.edu/xPropGraph#node3>),<http://umkc.edu/xPropGraph#prop4>,0)
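For comparison, the join that the question originally asked about could look roughly like the sketch below. joinVertices has to return the same vertex attribute type it receives, so this sketch uses outerJoinVertices instead, which may change the type and can therefore store the ccID next to the URI. The object name CcJoinSketch, the helpers sampleGraph and tripletsWithCc, and the shortened vertex and edge labels (used here in place of the full URIs) are assumptions made for illustration; they do not come from the original code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

object CcJoinSketch {
  type Row = ((VertexId, String), (VertexId, String), String, VertexId)

  // Hypothetical sample data: same vertex IDs as in the question, shortened labels.
  def sampleGraph(sc: SparkContext): Graph[String, String] = {
    val vertices = sc.parallelize(Seq[(VertexId, String)](
      (0L, "franklin"), (1L, "rxin"), (2L, "jgonzal"),
      (2147483648L, "peter"), (4294967295L, "John"), (6442450942L, "istoica")))
    val edges = sc.parallelize(Seq(
      Edge(0L, 1L, "advisor"), Edge(1L, 2L, "collab"),
      Edge(2147483648L, 4294967295L, "student"),
      Edge(6442450942L, 0L, "colleague"), Edge(0L, 2L, "pi")))
    Graph(vertices, edges)
  }

  // Attaches the component ID to each vertex by key, then reads it off the triplets.
  def tripletsWithCc(propGraph: Graph[String, String]): Array[Row] = {
    val cc = propGraph.connectedComponents()
    // New vertex attribute: (original attribute, ccID). The join matches on
    // VertexId, so it does not depend on partitioning or element order.
    val joined = propGraph.outerJoinVertices(cc.vertices) {
      (id, attr, ccOpt) => (attr, ccOpt.getOrElse(id))
    }
    // Source and destination of an edge always share a component,
    // so the source's ccID is used for the whole triplet.
    joined.triplets.map { t =>
      ((t.srcId, t.srcAttr._1), (t.dstId, t.dstAttr._1), t.attr, t.srcAttr._2)
    }.collect()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("cc-join-sketch")
    val sc = new SparkContext(conf)
    try tripletsWithCc(sampleGraph(sc)).foreach(println)
    finally sc.stop()
  }
}
```

Because connectedComponents() labels each component with its lowest vertex ID, this should yield the same ccID values as in the question (0 and 2147483648), with the result kept as typed tuples rather than concatenated strings.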