MapReduce job fails on Hadoop but not in a local test

2018-06-24 04:02:21

I have a MapReduce job written in Python. It works when I run it in a local test, but it fails when I run it through Hadoop.

I'm using a file I got from:

wget http://stat-computing.org/dataexpo/2009/2008.csv.bz2

The map job is:

#!/usr/bin/python
import sys
for line in sys.stdin:
    values = line.split(',')
    if values[13] != 'AirTime' and values[13] != 'NA':
        print '%s%s\t%s\t%s' % (values[8], values[9], 'flights', 1)
        print '%s%s\t%s\t%s' % (values[8], values[9], 'airTime', float(values[13]))

The reduce job is:

#!/usr/bin/python
import sys
(lastFlight, total, time) = (None, 0, 0)
for line in sys.stdin:
    (flight, key, value) = line.split('\t')
    if lastFlight and flight != lastFlight:
        if total > 0:
            print '%s\t%f' % (lastFlight, time/total)
        lastFlight = flight
        if key == 'flights':
            (flight, total, time) = (value, float(value), 0)
        elif key == 'airTime':
            (flight, total, time) = (value, 0, float(value))
    else:
        lastFlight = flight
        (total, time) = (total + float(value), time + float(value))
if lastFlight:
    if total > 0:
        print '%s\t%f' % (lastFlight, time/total)

The test command:

head *.csv | ./map.py | sort | ./reduce.py >out.log 2>&1

The output is generated without errors.

The Hadoop command:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.2.0.10.0-1.jar \
    -input /user/flight/*.csv \
    -output /user/flight/result1 -file map.py -file reduce.py \
    -mapper map.py -mapper map.py -combiner reduce.py -reducer reduce.py

The map phase works, but the reduce phase fails. The error is not very specific:

16/11/26 11:17:10 INFO mapreduce.Job:  map 100% reduce 28%
16/11/26 11:17:11 INFO mapreduce.Job: 
    Task Id : attempt_1480024909550_0014_r_000000_0, 
    Status : FAILED
Error: java.lang.RuntimeException: 
    PipeMapRed.waitOutputThreads(): subprocess failed with code 1

The job log shows:

line 7, in <module>
(flight, key, value) = line.split('\t')
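
The traceback is cut off before the exception type, so the exact error is unknown. One way this unpacking can fail is when a line does not split into exactly three tab-separated fields. The sketch below illustrates that case under an assumption the log does not confirm: that the reducer is reading lines already processed by reduce.py running as the combiner, which emits two fields instead of three. The helper name parse_reducer_line and the sample values are made up for illustration, and the rstrip call is not in the original script.

```python
def parse_reducer_line(line):
    """Split one reducer input line into (flight, key, value).

    Mirrors the unpacking in reduce.py; raises ValueError unless the
    line holds exactly three tab-separated fields.
    """
    # rstrip is an addition here; the original script does not strip the newline.
    flight, key, value = line.rstrip('\n').split('\t')
    return flight, key, value


# Hypothetical line shaped like map.py output: three fields, unpacks fine.
mapper_line = 'WN335\tairTime\t116.0\n'
print(parse_reducer_line(mapper_line))

# Hypothetical line shaped like reduce.py output: only two fields.
combiner_line = 'WN335\t116.000000\n'
try:
    parse_reducer_line(combiner_line)
except ValueError as err:
    # The unpacking on the first line of the loop body fails here.
    print('ValueError: %s' % err)
```

The local test pipeline runs reduce.py only once, directly on the mapper output, so it would not exercise this case.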

Any ideas why the reduce part would fail?

Thanks
