Hadoop testing using MRUnit

I'm retrofitting a number of existing Hadoop unit tests, previously run against an in-memory cluster (using MiniMRCluster), to MRUnit. The existing test cases essentially provide input to the Map phase and then check the output of the Reduce phase.
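For reference, here is a minimal, dependency-free Java sketch of what such an end-to-end test looks like once the cluster is replaced by an in-process driver. Records go into a mapper, the intermediate pairs are sorted and grouped in memory, and the reducer output is what gets checked. The `LocalMapReduce` class and its word-count mapper and reducer are hypothetical illustrations, not MRUnit's API. In a real MRUnit test, a driver such as `MapReduceDriver` plays roughly this role.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical stand-in (not MRUnit, no Hadoop classes): feed records to the
// map side, check what comes out of the reduce side, and do the sort/group
// step in memory in between.
class LocalMapReduce {

    // Example mapper (word count): emits (word, 1) for every token.
    static void map(String line, BiConsumer<String, Integer> emit) {
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                emit.accept(word, 1);
            }
        }
    }

    // Example reducer: sums all counts seen for one key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Map every input line, group values by key in sorted order (TreeMap),
    // then reduce each group. Everything goes to one logical reducer:
    // no partitioner, InputFormat, or OutputFormat is exercised.
    static Map<String, Integer> run(List<String> lines) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            map(line, (k, v) -> grouped.computeIfAbsent(k, key -> new ArrayList<>()).add(v));
        }
        TreeMap<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            output.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return output;
    }
}
```

Note what the loop never touches: file-system input, the partitioner, and the output format.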

I have three questions, and the best answer to any one of them will be accepted:

1) What do I lose, architecturally, by unit testing with MRUnit instead of an in-memory cluster?

2) Is it worthwhile to break the existing test cases up into Map-only tests and Reduce-only tests or not? Are there any cases where I would have to break them up?

3) Are there any testing scenarios that MRUnit is unable to cover?


The retrofitting process has taught me some potential answers, which I'm going to post here. I would still prefer to hear what others have to say, though, so I won't accept this answer.

1) I lose at least two things. First, the MR plumbing is mocked, so there is a chance that the mocking hides a problem that would show up in the real MR job. Second, a complete MR job includes reading input from the file system and writing output to it, as well as partitioning and ordering between the map and reduce phases. MRUnit doesn't fully handle these aspects of Hadoop, so if a job depends on them, that behavior can't be tested. It is still possible to rewrite the tests to cover just the Map/Reduce logic, though.
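To illustrate the partitioning point, the sketch below assumes a job with several reduce tasks. It mirrors the arithmetic of Hadoop's default `HashPartitioner`, `(hash & Integer.MAX_VALUE) % numReduceTasks`, but uses `java.lang.String.hashCode()` as a stand-in for the key's hash, so real partition numbers for `Text` keys will differ. The `PartitionSketch` class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of the step an in-process test skips: splitting keys
// across reduce tasks. String.hashCode() stands in for the key's real hash.
class PartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner; masking the sign
    // bit keeps negative hash codes inside [0, numReduceTasks).
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Returns one sorted key list per reduce task. Keys are sorted only
    // within a partition, never across partitions.
    static List<List<String>> split(List<String> keys, int numReduceTasks) {
        List<TreeSet<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new TreeSet<>());
        }
        for (String key : keys) {
            buckets.get(partition(key, numReduceTasks)).add(key);
        }
        List<List<String>> result = new ArrayList<>();
        for (TreeSet<String> bucket : buckets) {
            result.add(new ArrayList<>(bucket));
        }
        return result;
    }
}
```

With one reduce task the result is a single, globally sorted list, which is roughly what an in-process test sees. With three, each partition (one part file each) is sorted on its own, so a test that asserts on overall output order can behave differently against a real cluster.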

2) For the most part, it isn't worthwhile to break up existing tests. If an existing test depends on a partitioner, for example, it may make sense to break it up so that the Map and Reduce logic can be tested without the partitioner involved. In general, though, it isn't worth doing "just to do it."
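If a job does rely on a custom partitioner, one way to split it out is to move the partitioning rule into a plain method and test it directly, which leaves the map and reduce tests partitioner-free. The rule below (route keys by their first letter so each partition holds an alphabetic range) and the `AlphabetRangePartitioning` class are entirely hypothetical. In a real job the rule would be called from a `Partitioner` subclass's `getPartition` method.

```java
// Hypothetical partitioning rule, pulled out of a Partitioner subclass into a
// plain static method so it can be unit-tested without MRUnit or a cluster.
class AlphabetRangePartitioning {

    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        if (key.isEmpty()) {
            return 0; // empty keys go to the first partition
        }
        char c = Character.toLowerCase(key.charAt(0));
        if (c < 'a' || c > 'z') {
            return 0; // keys not starting with a letter also go to partition 0
        }
        // Scale the letter index 0..25 onto 0..numPartitions-1.
        return (c - 'a') * numPartitions / 26;
    }
}
```

Because the rule is a pure function, its edge cases (empty keys, non-letters, a single partition) can be covered directly instead of through a full map-and-reduce test.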

3) Yes: partitioners, for one, and output formats, for another. This may not be a big deal for some people, but many of our existing jobs rely on these two features, and since the unit tests check the final output produced by the output format, I'm having to rewrite quite a few tests to get them to work.
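When the existing assertions are written against part-file text, one workaround is to render the reducer's output pairs the way the output format would and compare that instead. The sketch below assumes the job uses Hadoop's `TextOutputFormat` with its default layout: key, separator (a tab by default), value, newline, where a null key or value is dropped together with the separator and a record with both null writes nothing (Hadoop also treats `NullWritable` as null). `TextLineRendering` is a hypothetical plain-Java stand-in, not Hadoop's class.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for TextOutputFormat's default line layout, so
// expected part-file text can be compared against reducer output.
class TextLineRendering {

    static String render(List<Map.Entry<String, String>> pairs, String separator) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> pair : pairs) {
            String key = pair.getKey();
            String value = pair.getValue();
            if (key == null && value == null) {
                continue; // nothing is written for an all-null record
            }
            if (key != null) {
                out.append(key);
            }
            if (key != null && value != null) {
                out.append(separator); // separator only between two present fields
            }
            if (value != null) {
                out.append(value);
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

This only reproduces the line layout. It does not exercise the real output format's file handling, which is exactly the gap described above.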

[edit]

I just read a blog post from Cloudera that addresses the question as well:

http://www.cloudera.com/blog/2009/07/debugging-mapreduce-programs-with-mrunit/


Take a look at MRUNIT-101. In a week or so, we will be adding the ability to test real output formats.
