MapReduce job fails on Hadoop but not in local test
I have a MapReduce job written in Python that works when I test it locally, but fails when I run it through Hadoop.
I'm using a file I got from:
wget http://stat-computing.org/dataexpo/2009/2008.csv.bz2
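The file is a bzip2-compressed CSV whose first line is a header, and the mapper selects fields purely by position (values[8], values[9], values[13]). Below is a minimal Python 3 sketch for checking which header names sit at those positions, assuming the archive has been saved locally. header_columns is a hypothetical helper. For this dataset the names are expected to be UniqueCarrier, FlightNum and AirTime; the mapper's own comparison with 'AirTime' agrees with the last one.

```python
import bz2


def header_columns(path, positions=(8, 9, 13)):
    """Return {position: header name} for the given 0-based column positions.

    `path` is assumed to point at a bzip2-compressed CSV such as 2008.csv.bz2.
    """
    # Text mode decompresses on the fly; only the first (header) line is read.
    with bz2.open(path, "rt") as f:
        header = f.readline().rstrip("\n").split(",")
    return {i: header[i] for i in positions}
```

For example, header_columns("2008.csv.bz2") can be run right after the wget download.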
The map job is:
#!/usr/bin/python
import sys

for line in sys.stdin:
    values = line.split(',')
    if values[13] != 'AirTime' and values[13] != 'NA':
        print '%s%s\t%s\t%s' % (values[8], values[9], 'flights', 1)
        print '%s%s\t%s\t%s' % (values[8], values[9], 'airTime', float(values[13]))
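To check the record format outside Hadoop, the mapper's per-line logic can be pulled into a function. This is a minimal Python 3 sketch (the scripts above use Python 2 print statements), and map_line is a hypothetical name. It shows that every kept row yields two tab-separated records with three fields each: key (carrier + flight number), record type, value.

```python
def map_line(line):
    """Return the records map.py would print for one CSV line (possibly none)."""
    values = line.split(",")
    # Skip the header row (column 13 is literally 'AirTime') and missing values.
    if values[13] in ("AirTime", "NA"):
        return []
    key = values[8] + values[9]  # carrier code + flight number, e.g. 'WN335'
    return [
        "%s\t%s\t%s" % (key, "flights", 1),
        "%s\t%s\t%s" % (key, "airTime", float(values[13])),
    ]
```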
The reduce job is:
#!/usr/bin/python
import sys

(lastFlight, total, time) = (None, 0, 0)

for line in sys.stdin:
    (flight, key, value) = line.split('\t')
    if lastFlight and flight != lastFlight:
        if total > 0:
            print '%s\t%f' % (lastFlight, time/total)
        lastFlight = flight
        if key == 'flights':
            (flight, total, time) = (value, float(value), 0)
        elif key == 'airTime':
            (flight, total, time) = (value, 0, float(value))
    else:
        lastFlight = flight
        (total, time) = (total + float(value), time + float(value))

if lastFlight:
    if total > 0:
        print '%s\t%f' % (lastFlight, time/total)
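The reducer's intent appears to be the average air time per flight key: the sum of airTime values divided by the number of flights records. Below is a minimal Python 3 sketch of that computation, assuming input sorted by key (as after sort or Hadoop's shuffle) and exactly three tab-separated fields per line. It swaps the manual lastFlight bookkeeping for itertools.groupby, so it is an alternative technique rather than the original script, and reduce_lines is a hypothetical name. It keeps the flights and airTime totals in separate accumulators.

```python
from itertools import groupby


def reduce_lines(lines):
    """Average airTime per key from sorted 'key, record type, value' lines."""
    out = []
    # Each line is assumed to be 'key<TAB>record_type<TAB>value'.
    parsed = (line.rstrip("\n").split("\t") for line in lines)
    for key, group in groupby(parsed, key=lambda fields: fields[0]):
        flights, air_time = 0.0, 0.0
        for _, record_type, value in group:
            if record_type == "flights":
                flights += float(value)
            elif record_type == "airTime":
                air_time += float(value)
        if flights > 0:
            out.append("%s\t%f" % (key, air_time / flights))
    return out
```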
The local test command:
head *.csv | ./map.py | sort | ./reduce.py > out.log 2>&1
I can see the output is generated without errors.
The Hadoop command:
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.2.0.10.0-1.jar \
    -input /user/flight/*.csv \
    -output /user/flight/result1 \
    -file map.py -file reduce.py \
    -mapper map.py -combiner reduce.py -reducer reduce.py
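With -combiner reduce.py, Hadoop runs reduce.py on sorted map output (zero or more times) and passes whatever it prints on to the reducer. That only works if the combiner emits records in the same format the mapper does. Below is a minimal Python 3 sketch of a format-preserving combiner, assuming the mapper's three-field key/type/value lines. combine_lines is a hypothetical name, not part of Hadoop Streaming. Summing is safe to repeat because partial sums of partial sums give the same totals, which is not true of averaging averages.

```python
from itertools import groupby


def combine_lines(lines):
    """Pre-aggregate sorted mapper output without changing its format.

    Input and output are both tab-separated lines: key, record type, value.
    Assumes every record type is either 'flights' or 'airTime'.
    """
    out = []
    parsed = (line.rstrip("\n").split("\t") for line in lines)
    for key, group in groupby(parsed, key=lambda fields: fields[0]):
        totals = {"flights": 0.0, "airTime": 0.0}
        for _, record_type, value in group:
            totals[record_type] += float(value)
        # Emit one record per type, in the same three-field layout as the mapper.
        out.append("%s\tflights\t%s" % (key, totals["flights"]))
        out.append("%s\tairTime\t%s" % (key, totals["airTime"]))
    return out
```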
The map phase completes, but the reduce phase fails with an error that is not very specific:
16/11/26 11:17:10 INFO mapreduce.Job: map 100% reduce 28%
16/11/26 11:17:11 INFO mapreduce.Job: Task Id : attempt_1480024909550_0014_r_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
If I look at the task log, I get:
line 7, in <module>
    (flight, key, value) = line.split('\t')
Any ideas why the reduce phase fails?
Thanks
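A plausible reading of the traceback: reduce.py prints two tab-separated fields ('%s\t%f'), while line 7 unpacks three. Because the same script is also passed as -combiner, its two-field output can become the reducer's input, whereas the local head | map | sort | reduce pipeline never runs a combiner step. The sketch below (Python 3, hypothetical helper names, made-up example records) isolates that unpack. In Python 2 the resulting error is "need more than 2 values to unpack"; in Python 3 it is "not enough values to unpack (expected 3, got 2)".

```python
def parse_reducer_input(line):
    """Split one input line the same way reduce.py does at line 7."""
    (flight, key, value) = line.split("\t")
    return flight, key, value


def unpack_fails(line):
    """Return True if the three-way unpack raises ValueError for this line."""
    try:
        parse_reducer_input(line)
    except ValueError:
        return True
    return False


# Made-up example records: mapper output has three fields,
# reduce.py output (and therefore combiner output) has only two.
mapper_record = "WN335\tflights\t1\n"
reducer_record = "WN335\t118.000000\n"
```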