MapReduce job fails, but not in test

I have a MapReduce job written in Python. It works when I run it in a local test, but it fails when I run it through Hadoop.

I am using the data file downloaded with:

wget http://stat-computing.org/dataexpo/2009/2008.csv.bz2
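The download is bzip2-compressed, while the test command and the Hadoop job read `*.csv` files, so the archive has to be decompressed first (for example with `bunzip2`). A minimal Python sketch of that step using the standard `bz2` module; the function name `decompress_bz2` is hypothetical:

```python
import bz2
import shutil


def decompress_bz2(src_path, dst_path):
    """Stream-decompress a .bz2 file (such as 2008.csv.bz2) into dst_path."""
    with bz2.open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        # copyfileobj copies in chunks, so the whole CSV never sits in memory
        shutil.copyfileobj(src, dst)
```

For example, `decompress_bz2('2008.csv.bz2', '2008.csv')` writes the uncompressed CSV next to the archive.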

The map job is:

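#!/usr/bin/python
import sys
for line in sys.stdin:
    values = line.split(',')
    # Skip the header row and rows without an AirTime value (column 13)
    if values[13] != 'AirTime' and values[13] != 'NA':
        # Key = values[8] + values[9] (carrier + flight number); emit one
        # tab-separated record for the flight count and one for the air time
        print('%s%s\t%s\t%s' % (values[8], values[9], 'flights', 1))
        print('%s%s\t%s\t%s' % (values[8], values[9], 'airTime', float(values[13])))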
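To inspect the mapper's output format without Hadoop, its per-line logic can be pulled into a pure function. This is a minimal sketch rather than the author's script: `map_records` is a hypothetical name, and it assumes the same column positions as `map.py` (8 and 9 for the key, 13 for AirTime). The sample row is made up but laid out like a 2008.csv row.

```python
def map_records(line):
    """Return the tab-separated records map.py would print for one CSV line.

    Hypothetical helper: mirrors map.py's column choices (values[8] and
    values[9] as the key, values[13] as AirTime) and its header/NA filter.
    """
    values = line.rstrip('\n').split(',')
    if values[13] in ('AirTime', 'NA'):
        return []  # header row or missing air time: nothing is emitted
    key = values[8] + values[9]
    return [
        '%s\t%s\t%s' % (key, 'flights', 1),
        '%s\t%s\t%s' % (key, 'airTime', float(values[13])),
    ]


# Made-up row laid out like the 2008.csv columns
row = ('2008,1,3,4,2003,1955,2211,2225,WN,335,N712SW,128,150,116,-14,8,'
       'IAD,TPA,810,4,8,0,,0,NA,NA,NA,NA,NA')
for record in map_records(row):
    print(record)
```

Each emitted line has exactly three tab-separated fields, which is the shape the reducer's unpack expects.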

The reduce job:

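#!/usr/bin/python
import sys
# Running state for the current flight: number of flights and total air time
(lastFlight, total, time) = (None, 0, 0)
for line in sys.stdin:
    # Expects exactly three tab-separated fields: flight, key, value
    (flight, key, value) = line.split('\t')
    if lastFlight and flight != lastFlight:
        # Flight changed: emit the average air time of the previous flight
        if total > 0:
            print('%s\t%f' % (lastFlight, time / total))
        lastFlight = flight
        # Restart the counters from this first record of the new flight
        if key == 'flights':
            (total, time) = (float(value), 0)
        elif key == 'airTime':
            (total, time) = (0, float(value))
    else:
        lastFlight = flight
        # Same flight: add the value to the matching counter only
        if key == 'flights':
            total += float(value)
        elif key == 'airTime':
            time += float(value)
# Emit the last flight
if lastFlight:
    if total > 0:
        print('%s\t%f' % (lastFlight, time / total))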
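Hadoop Streaming hands the reducer its input grouped by key, so the same per-flight averaging can also be written with `itertools.groupby` instead of tracking `lastFlight` by hand. This is a swapped-in technique, not the author's reducer; `reduce_lines` is a hypothetical name, and the sketch assumes every input line has the three fields `flight`, `key`, `value` that `map.py` emits.

```python
import itertools


def reduce_lines(lines):
    """Yield tab-separated (flight, average air time) lines.

    Hypothetical sketch: groups consecutive lines by flight, sums the
    'flights' and 'airTime' values separately, and emits airTime / flights.
    """
    parsed = (line.rstrip('\n').split('\t') for line in lines)
    for flight, records in itertools.groupby(parsed, key=lambda fields: fields[0]):
        total, time = 0.0, 0.0
        for _, key, value in records:
            if key == 'flights':
                total += float(value)
            elif key == 'airTime':
                time += float(value)
        if total > 0:
            yield '%s\t%f' % (flight, time / total)


# Made-up reducer input, already grouped by flight
reducer_input = [
    'AA1\tairTime\t100.0\n',
    'AA1\tflights\t1\n',
    'AA1\tairTime\t120.0\n',
    'AA1\tflights\t1\n',
    'WN335\tairTime\t116.0\n',
    'WN335\tflights\t1\n',
]
for out in reduce_lines(reducer_input):
    print(out)
```

Because the sums do not depend on order, it does not matter whether the `flights` or `airTime` records of one flight arrive first.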

The local test command:

head *.csv | ./map.py | sort | ./reduce.py > out.log 2>&1

I can see the output, and there are no errors.
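The local test runs `map.py`, `sort` and `reduce.py` once each, whereas the Hadoop command also passes `-combiner reduce.py`, so on the cluster `reduce.py` may additionally run over the map output before the reducer sees it. A minimal in-process sketch of a harness that mirrors that extra stage; `run_pipeline`, `toy_mapper` and `toy_reducer` are hypothetical and only stand in for the real scripts.

```python
def run_pipeline(lines, mapper, reducer, combiner=None):
    """Mimic a streaming job in-process: map -> sort -> [combine -> sort] -> reduce.

    Hypothetical harness: each stage is a function from an iterable of lines
    to an iterable of lines, standing in for map.py / reduce.py.
    """
    data = sorted(mapper(lines))
    if combiner is not None:
        # A combiner's output is fed to the reducer, so it must have the
        # same line format as the mapper's output.
        data = sorted(combiner(data))
    return list(reducer(data))


def toy_mapper(lines):
    # Emits three tab-separated fields per input word
    for line in lines:
        yield '%s\tcount\t1' % line.strip()


def toy_reducer(lines):
    # Expects three tab-separated fields, like reduce.py, but emits two
    counts = {}
    for line in lines:
        word, _, value = line.split('\t')
        counts[word] = counts.get(word, 0) + int(value)
    for word in sorted(counts):
        yield '%s\t%d' % (word, counts[word])


words = ['b', 'a', 'b']
print(run_pipeline(words, toy_mapper, toy_reducer))
```

Without a combiner the toy job returns one count per word; with `combiner=toy_reducer`, the reducer receives two-field lines and its three-way unpack raises `ValueError`.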

The Hadoop command:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.2.0.10.0-1.jar \
    -input /user/flight/*.csv -output /user/flight/result1 \
    -file map.py -file reduce.py \
    -mapper map.py -combiner reduce.py -reducer reduce.py

The map phase completes, but the reduce phase fails with an error that is not very specific:

16/11/26 11:17:10 INFO mapreduce.Job:  map 100% reduce 28%
16/11/26 11:17:11 INFO mapreduce.Job: 
    Task Id : attempt_1480024909550_0014_r_000000_0, 
    Status : FAILED
Error: java.lang.RuntimeException: 
    PipeMapRed.waitOutputThreads(): subprocess failed with code 1

Looking at the job logs, I get:

line 7, in <module>
(flight, key, value) = line.split('\t')
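The failing statement unpacks `line.split('\t')` into exactly three names, so any reducer input line with a different number of tab-separated fields makes it raise `ValueError`. One way to see what the reducer actually received is to write the offending line to stderr before failing; Python writes its tracebacks there too, which is how the traceback above reached the job logs. A minimal sketch; `parse_record` is a hypothetical helper, not part of Hadoop Streaming:

```python
import sys


def parse_record(line):
    """Split one reducer input line into (flight, key, value).

    Hypothetical helper: if the line does not have exactly three
    tab-separated fields, it writes the line to stderr (next to where the
    traceback is logged) and then raises ValueError.
    """
    fields = line.rstrip('\n').split('\t')
    if len(fields) != 3:
        sys.stderr.write('expected 3 tab-separated fields, got %d: %r\n'
                         % (len(fields), line))
        raise ValueError('malformed reducer input line')
    return fields[0], fields[1], fields[2]


print(parse_record('WN335\tairTime\t116.0\n'))
```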

Any ideas why the reduce part fails?

Thanks

Link address: http://www.djcxy.com/p/67713.html
