Apache Flume Twitter agent not streaming data
I am trying to stream Twitter feeds to HDFS and then query them with Hive. But the first part, streaming the data and loading it into HDFS, is not working and gives a NullPointerException.
This is what I have tried.
1. Downloaded apache-flume-1.4.0-bin.tar and extracted it. Copied all the contents to /usr/lib/flume/ . In /usr/lib/ I changed the owner of the flume directory to my user. When I run ls in /usr/lib/flume/ , it shows
bin CHANGELOG conf DEVNOTES docs lib LICENSE logs NOTICE README RELEASE-NOTES tools
2. Moved to the conf/ directory. I copied the file flume-env.sh.template to flume-env.sh and set JAVA_HOME to my Java path, /usr/lib/jvm/java-7-oracle .
3. Next I created a file called flume.conf in the same conf directory and added the following contents
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <Twitter Application API key>
TwitterAgent.sources.Twitter.consumerSecret = <Twitter Application API secret>
TwitterAgent.sources.Twitter.accessToken = <Twitter Application Access token>
TwitterAgent.sources.Twitter.accessTokenSecret = <Twitter Application Access token secret>
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, couldera, data science, data scientist, business intelligence, mapreduce, datawarehouse, data ware housing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 600
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
I created an app on Twitter, generated a token, and added all the keys to the file above. I used the API key as the consumer key.
I downloaded the flume-sources jar from the Cloudera files, as they mention here.
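One thing worth ruling out in a config like the one in step 3 is a wiring mismatch: a Flume source lists its channels under `channels` (plural) while a sink names a single `channel`, and both have to match a declared channel. The following is only a minimal sketch of such a check. The helper names `conf_value` and `check_wiring` are hypothetical, and the source and sink names `Twitter` and `HDFS` are taken from the config above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one exact key from a
# "key = value" properties file (spaces around "=" are optional).
conf_value() {
    # $1 = properties file, $2 = exact key name
    awk -F'=' -v key="$2" '
        { k = $1; gsub(/^[ \t]+|[ \t]+$/, "", k) }
        k == key {
            v = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            print v
            exit
        }
    ' "$1"
}

# Hypothetical helper: succeed only if the Twitter source and the HDFS sink
# both point at the channel declared for the agent.
# Assumes a single declared channel, as in the config above.
check_wiring() {
    # $1 = properties file, $2 = agent name
    declared=$(conf_value "$1" "$2.channels")
    src=$(conf_value "$1" "$2.sources.Twitter.channels")
    sink=$(conf_value "$1" "$2.sinks.HDFS.channel")
    [ -n "$declared" ] && [ "$src" = "$declared" ] && [ "$sink" = "$declared" ]
}
```

For example, `check_wiring /usr/lib/flume/conf/flume.conf TwitterAgent && echo wiring ok` would print `wiring ok` for the config shown above.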
4. I added the flume-sources-1.0-SNAPSHOT.jar to /usr/lib/flume/lib .
5. Started Hadoop and ran the following
hadoop fs -mkdir /user/flume/tweets
hadoop fs -chown -R flume:flume /user/flume
hadoop fs -chmod -R 770 /user/flume
6. I ran the following in /usr/lib/flume
/usr/lib/flume/conf$ bin/flume-ng agent -n TwitterAgent -c conf -f conf/flume-conf
It prints the JARs and then exits.
When I checked HDFS, there were no files: hadoop fs -ls /user/flume/tweets shows nothing.
In Hadoop, the core-site.xml file has the following configuration
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
<final>true</final>
</property>
</configuration>
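Since the sink's hdfs.path embeds the NameNode address, it can help to confirm that it agrees with the value in core-site.xml. This is a rough sketch only: `hadoop_prop` and `same_namenode` are hypothetical helper names, and the parsing assumes each `<name>` and `<value>` sits on its own line, as in the file above.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop XML config. Assumes one tag per line, as in the file above.
hadoop_prop() {
    # $1 = xml file, $2 = property name
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && /<value>/ {
            v = $0
            sub(/.*<value>/, "", v)
            sub(/<\/value>.*/, "", v)
            print v
            exit
        }
    ' "$1"
}

# Hypothetical helper: succeed if the sink path starts with the NameNode URI.
same_namenode() {
    # $1 = NameNode URI, $2 = hdfs.path value from flume.conf
    case "$2" in
        "$1"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With the values shown in this post, `hadoop_prop core-site.xml fs.default.name` prints `hdfs://localhost:8020`, which is the prefix of the hdfs.path in flume.conf, so the two agree.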
Thanks
I ran the following command and it worked
bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent
I used this command and it works
flume-ng agent --conf /etc/flume-ng/conf/ -f /etc/flume-ng/conf/flume.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent
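Compared with the command in the question, the working commands point -f at conf/flume.conf (the question used conf/flume-conf, while the file created in step 3 is flume.conf) and send the log to the console. A wrong config path may be one reason the agent exited without visible output, though that is an assumption rather than something the post confirms. A small wrapper along these lines could catch that case before launch. The function name `start_agent` is hypothetical; the flume-ng options are the ones used in the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper: refuse to start if the config file path is wrong,
# otherwise launch the agent with console logging.
start_agent() {
    # $1 = Flume home, $2 = config file name inside conf/, $3 = agent name
    conf_file="$1/conf/$2"
    if [ ! -f "$conf_file" ]; then
        echo "config file not found: $conf_file" >&2
        return 1
    fi
    "$1/bin/flume-ng" agent --conf "$1/conf" -f "$conf_file" \
        -Dflume.root.logger=DEBUG,console -n "$3"
}
```

For the layout described in the question this would be called as `start_agent /usr/lib/flume flume.conf TwitterAgent`. Using absolute paths built from the Flume home also avoids depending on the current directory.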