ConnectionLoss for /hbase + Connection reset by peer?
I'm running Hadoop MapReduce jobs on a local machine (pseudo-distributed) that read from and write to HBase. I intermittently get an error that disrupts the jobs, even when the machine is left alone with no other significant processes running - see the log below. The output of a ZooKeeper dump after a job dies looks like this, with the client count growing after each failed run:
HBase is rooted at /hbase
Master address: SS-WS-M102:60000
Region server holding ROOT: SS-WS-M102:60020
Region servers:
SS-WS-M102:60020
Quorum Server Statistics:
ss-ws-m102:2181
Zookeeper version: 3.3.3-cdh3u0--1, built on 03/26/2011 00:20 GMT
Clients:
/192.168.40.120:58484[1](queued=0,recved=39199,sent=39203)
/192.168.40.120:37129[1](queued=0,recved=162,sent=162)
/192.168.40.120:58485[1](queued=0,recved=39282,sent=39316)
/192.168.40.120:58488[1](queued=0,recved=39224,sent=39226)
/192.168.40.120:58030[0](queued=0,recved=1,sent=0)
/192.168.40.120:58486[1](queued=0,recved=39248,sent=39267)
My development team is currently on the CDH3U0 distribution, so HBase 0.90.1 - is this an issue that has been resolved in a more recent release? Or is there something I can do with my current setup? Should I just expect to restart ZK and kill off clients periodically? I'm open to any reasonable option that will let my jobs complete consistently.
2012-06-27 13:01:07,289 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server SS-WS-M102/192.168.40.120:2181
2012-06-27 13:01:07,289 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to SS-WS-M102/192.168.40.120:2181, initiating session
2012-06-27 13:01:07,290 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server SS-WS-M102/192.168.40.120:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
at sun.nio.ch.IOUtil.read(IOUtil.java:169)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:858)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1130)
[lines above repeat 6 more times]
2012-06-27 13:01:17,890 ERROR org.apache.hadoop.hbase.mapreduce.TableInputFormat: org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:991)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.setupZookeeperTrackers(HConnectionManager.java:302)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:293)
at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:156)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:167)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:145)
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:91)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:605)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hbase.ZooKeeperConnectionException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:147)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getZooKeeperWatcher(HConnectionManager.java:989)
... 15 more
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:902)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:133)
... 16 more
It turns out I was hitting ZooKeeper's low default limit on concurrent client connections (which I believe has been raised in more recent releases). I had tried setting a higher limit in hbase-site.xml:
<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>35</value>
</property>
but it didn't seem to take effect unless it was (also?) specified in zoo.cfg - presumably because the hbase.zookeeper.property.* values are only applied to a ZooKeeper instance that HBase itself manages, while a separately started ZK reads zoo.cfg instead:
# can put this number much higher if desired
maxClientCnxns=35
The job can now run for hours at a stretch, and my ZK client list peaks at 12 entries.
Check the following parameters:
ZooKeeper session timeout (zookeeper.session.timeout): try increasing it and re-check.
ZooKeeper tick time (tickTime): increase it and test.
Check the ulimit values (a Linux command) for the user you run Hadoop/HBase as.
For ulimit, you need to raise the following parameters:
open files: set this to 32K or more
max user processes: set this to unlimited
After making these changes, the error will most likely go away; a sketch of the corresponding settings follows below.
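A minimal sketch of where those settings live, assuming an externally managed ZooKeeper configured via zoo.cfg and a hypothetical user "hadoop" running the jobs; the values are illustrative starting points, not recommendations:
<!-- hbase-site.xml: ask for a longer ZooKeeper session timeout -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>
# zoo.cfg: the tick is ZooKeeper's basic time unit; the server clamps
# negotiated session timeouts between 2x and 20x tickTime, so this
# setting and the one above interact
tickTime=3000
# /etc/security/limits.conf: raise the per-user limits
hadoop  -  nofile  32768
hadoop  -  nproc   unlimited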
I've run into similar issues in the past. A lot of the time with HBase/Hadoop you'll see error messages that don't point to the real problem you're having, so you have to get creative with them.
Here's what I've found; it may or may not apply to you:
Are you opening a lot of connections to the table and closing them when you're done? This can happen in an MR job if you're doing Scans/Gets in the Mapper or Reducer (which I don't think you want to do, if you can avoid it).
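For instance, here is a minimal sketch (with a hypothetical table name "lookup") of opening one HTable per task in setup() and closing it in cleanup(), instead of constructing a new one per map() call - under the 0.90-era client, each HTable built from a fresh Configuration brings up its own ZooKeeper connection, which counts against maxClientCnxns:
import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.Text;

public class LookupMapper extends TableMapper<Text, Text> {

    private HTable lookupTable; // one table handle (and ZK connection) per task

    @Override
    protected void setup(Context context) throws IOException {
        // Created once per task attempt, not once per input row
        lookupTable = new HTable(HBaseConfiguration.create(), "lookup");
    }

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        // Reuse the handle opened in setup() for any side lookups
        Result side = lookupTable.get(new Get(row.get()));
        // ... emit output derived from 'value' and 'side' ...
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        lookupTable.close(); // releases the connection when the task finishes
    }
}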
Also, I sometimes hit similar problems when my Mapper or Reducer writes heavily to the same row. Try distributing your writes, or minimizing them, to reduce the problem.
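On the minimizing side, one option under the 0.90-era client is to let puts accumulate in the client-side write buffer instead of flushing each one immediately - a sketch continuing the setup() above, with an illustrative buffer size and a hypothetical table name "output":
// Inside setup():
HTable outputTable = new HTable(HBaseConfiguration.create(), "output");
outputTable.setAutoFlush(false);                  // buffer puts client-side
outputTable.setWriteBufferSize(4 * 1024 * 1024);  // flush in ~4 MB batches
// outputTable.put(...) calls now accumulate between flushes;
// outputTable.close() in cleanup() flushes whatever remains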
It would also help if you went into more detail about the nature of your MR job. What does it do? Do you have sample code?