Limiting size of hierarchical data for reproducible example
I am trying to come up with a reproducible example (RE) for this question: Errors related to data frame columns during merging. To qualify as having an RE, the question lacks only reproducible data. However, when I tried the fairly standard approach of dput(head(myDataObj)), the output was a 14 MB file. The problem is that my data object is a list of data frames, so the head() limit does not appear to apply recursively.
I haven't found any options for dput() or head() that would let me control data size recursively for complex objects. Unless I am wrong about the above, what other approaches to creating a minimal RE dataset would you recommend in this situation?
Along the lines of @MrFlick's comment about using lapply, you can use any of the apply family of functions to apply head or sample, depending on your needs, to reduce the size for both REs and testing (I've found that working with subsets or subsamples of large data sets is preferable for debugging and even charting).
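As a minimal sketch of the lapply approach for the case in the question (a flat list of data frames): the list dfs below is a hypothetical stand-in built from R's built-in mtcars and iris data sets, not the asker's actual data.

```r
# Hypothetical stand-in for the real data: a named list of data frames.
dfs <- list(
  cars = mtcars,
  iris = iris
)

# head() on the list itself only limits the number of list elements;
# both data frames come through at full size.
sapply(head(dfs), nrow)

# Applying head() to each element trims every data frame to 3 rows.
small <- lapply(dfs, head, n = 3)
sapply(small, nrow)

# The trimmed object is now small enough to share via dput().
dput(small)
```

The n = 3 argument is passed through lapply's "..." to head, so no anonymous function is needed.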
Note that head and tail return the first or last part of a structure, but these sometimes lack sufficient variance for RE purposes and are certainly not random, which is where sample can be more useful.
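For data frames, random sampling is usually done on rows rather than on the object itself. A possible sketch, assuming a flat list of data frames; sample_rows is a hypothetical helper name, and dfs is stand-in data built from built-in data sets:

```r
set.seed(1)  # make the random subsets repeatable

# Hypothetical stand-in data: a named list of data frames.
dfs <- list(cars = mtcars, iris = iris)

# Draw up to n random rows from one data frame, keeping all columns.
sample_rows <- function(df, n = 5) {
  idx <- sample.int(nrow(df), size = min(nrow(df), n))
  df[idx, , drop = FALSE]
}

small <- lapply(dfs, sample_rows, n = 5)
sapply(small, nrow)
```

Taking min(nrow(df), n) keeps the helper from failing on data frames that have fewer rows than requested.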
Suppose we have a hierarchical tree structure (list of lists of...) and we want to subset each "leaf" while preserving the structure and labels in the tree.
x <- list(
a=1:10,
b=list( ba=1:10, bb=1:10 ),
c=list( ca=list( caa=1:10, cab=letters[1:10], cac="hello" ), cb=toupper( letters[1:10] ) ) )
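NOTE: In the following, how="replace" and how="list" give the same result, because with the default classes="ANY" every leaf is matched. They differ only when classes excludes some leaves: "replace" leaves those unchanged, while "list" replaces them with deflt.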
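A small sketch of how the two modes relate, using a cut-down list with one character leaf (the names r_replace and r_list are only for illustration). The classes and deflt arguments are part of rapply's documented signature:

```r
x <- list(a = 1:10, b = list(ba = letters[1:10], bb = 1:10))

# With the default classes = "ANY", every leaf matches, so both modes agree.
identical(
  rapply(x, head, how = "replace", n = 2),
  rapply(x, head, how = "list", n = 2)
)

# Restricting to integer leaves shows the difference:
# "replace" keeps the unmatched character leaf as it was,
# "list" swaps it for deflt (NULL by default).
r_replace <- rapply(x, head, classes = "integer", how = "replace", n = 2)
r_list    <- rapply(x, head, classes = "integer", how = "list", n = 2)
r_replace$b$ba
r_list$b$ba
```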
ALSO NOTE: This won't be great for data.frame leaf nodes, because rapply treats a data frame as a list and descends into its columns.
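If the leaves are data frames, one possible workaround is a small hand-written recursion that stops at data frames instead of descending into them. This is only a sketch: head_leaves is a hypothetical helper, and nested is stand-in data built from built-in data sets.

```r
# Hypothetical helper: walk a nested list and apply head() to every leaf,
# treating data frames as leaves instead of descending into their columns.
head_leaves <- function(obj, n = 6) {
  if (is.data.frame(obj) || !is.list(obj)) {
    head(obj, n)
  } else {
    lapply(obj, head_leaves, n = n)
  }
}

nested <- list(
  a = mtcars,
  b = list(ba = iris, bb = 1:10)
)

small <- head_leaves(nested, n = 3)
str(small, max.level = 2)
```

Because lapply keeps names, the structure and labels of the tree are preserved, and each data frame stays a valid data frame with its first n rows.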
# Set seed so the example is reproducible with randomized methods:
set.seed(1)
You can use the default head in a recursive apply in this way:
rapply( x, head, how="replace" )
Or pass an anonymous function that modifies the behavior:
# Complete anonymous function
rapply( x, function(y){ head(y,2) }, how="replace" )
# Same behavior, but using the rapply "..." argument to pass the n=2 to head.
rapply( x, head, how="replace", n=2 )
The following takes a random sample of each leaf:
# This works because we use minimum in case leaves are shorter
# than the requested maximum length.
rapply( x, function(y){ sample(y, size=min(length(y),2) ) }, how="replace" )
# Less efficient (shuffles the whole leaf first), but maybe easier to read:
rapply( x, function(y){ head(sample(y), 2) }, how="replace" )
# The following does NOT work: `sample` fails when `size` is greater than
# the length of the item being sampled (when sampling without replacement).
# Here the length-one leaf "hello" triggers the error.
rapply( x, function(y){ sample(y, size=2) }, how="replace" )
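One more pitfall worth guarding against: when a leaf is a single number such as 7, sample(7) samples from 1:7 rather than returning 7. A possible sketch that avoids both problems by sampling indices instead of values; sample_leaf is a hypothetical helper name, and the list x here is a reduced stand-in for the one above:

```r
x <- list(a = 1:10, b = list(ba = letters[1:10], bb = 7), c = "hello")

# Hypothetical helper: sample by index so a length-one numeric leaf such as 7
# is returned as-is rather than being treated as 1:7 by sample().
sample_leaf <- function(y, n = 2) {
  y[sample.int(length(y), size = min(length(y), n))]
}

set.seed(1)
res <- rapply(x, sample_leaf, how = "replace", n = 2)
str(res)
```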