Get JSON elements from the web with Apache Flink
After reading several pages of Apache Flink documentation (the official documentation and dataartisans) and the examples in the official repository, I keep seeing examples whose streaming data source is a file that has already been downloaded, always connecting to localhost.
I am trying to use Apache Flink to download JSON files that contain dynamic data. I would like to set the URL of the JSON file as Flink's input source, instead of downloading the file with another system and then processing the downloaded file with Flink.
Is it possible to establish this network connection with Apache Flink?
You can define the URLs you want to download as your input DataStream and then download the documents from within a MapFunction. The following code demonstrates this:
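    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    // The class name is only illustrative; any job class works.
    public class UrlDownloadJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Each element of the input stream is a URL to download.
            DataStream<String> inputURLs = env.fromElements("http://www.json.org/index.html");

            inputURLs.map(new MapFunction<String, String>() {
                @Override
                public String map(String s) throws Exception {
                    StringBuilder builder = new StringBuilder();
                    // try-with-resources closes the reader and the underlying stream,
                    // even if reading fails. UTF-8 is assumed for the response body.
                    try (BufferedReader bufferedReader = new BufferedReader(
                            new InputStreamReader(new URL(s).openStream(), StandardCharsets.UTF_8))) {
                        String line;
                        while ((line = bufferedReader.readLine()) != null) {
                            builder.append(line).append("\n");
                        }
                    }
                    // An IOException now fails the record instead of returning partial content.
                    return builder.toString();
                }
            }).print();

            env.execute("URL download job");
        }
    }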
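The download logic inside map can also be pulled out into a plain Java helper, so it can be checked without a Flink cluster or a network connection. The following is only a minimal sketch: UrlText, readAll and fetch are hypothetical names (not Flink APIs), and UTF-8 is an assumed encoding for the response body.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Flink: isolates the download logic used in map().
final class UrlText {
    private UrlText() {}

    // Reads the whole stream as UTF-8 text, terminating every line with '\n'.
    // The stream is closed when reading finishes or fails.
    static String readAll(InputStream in) throws IOException {
        StringBuilder builder = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                builder.append(line).append('\n');
            }
        }
        return builder.toString();
    }

    // Opens the given address and returns its body as text.
    static String fetch(String address) throws IOException {
        return readAll(URI.create(address).toURL().openStream());
    }
}
```

With such a helper, the body of map could be reduced to `return UrlText.fetch(s);`, and readAll can be exercised with an in-memory stream instead of a live URL.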