Tricks to manage the available memory in an R session

What tricks do people use to manage the available memory of an interactive R session? I use the functions below (based on postings by Petr Pikal and David Hinds to the r-help list in 2004) to list and/or sort the largest objects and to occasionally rm() some of them. But by far the most effective solution was ... to run under 64-bit Linux with ample memory.

Any other nice tricks folks want to share? One per post, please.

# improved list of objects
.ls.objects <- function (pos = 1, pattern, order.by,
                        decreasing=FALSE, head=FALSE, n=5) {
    napply <- function(names, fn) sapply(names, function(x)
                                         fn(get(x, pos = pos)))
    names <- ls(pos = pos, pattern = pattern)
    obj.class <- napply(names, function(x) as.character(class(x))[1])
    obj.mode <- napply(names, mode)
    obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
    obj.size <- napply(names, object.size)
    obj.dim <- t(napply(names, function(x)
                        as.numeric(dim(x))[1:2]))
    vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
    obj.dim[vec, 1] <- napply(names, length)[vec]
    out <- data.frame(obj.type, obj.size, obj.dim)
    names(out) <- c("Type", "Size", "Rows", "Columns")
    if (!missing(order.by))
        out <- out[order(out[[order.by]], decreasing=decreasing), ]
    if (head)
        out <- head(out, n)
    out
}
# shorthand
lsos <- function(..., n=10) {
    .ls.objects(..., order.by="Size", decreasing=TRUE, head=TRUE, n=n)
}
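
As a minimal base-R sketch of the list-then-rm() workflow described above (the object names big_matrix and small_vector are made up for illustration, and the compact sapply() call is a stand-in for lsos(), not a replacement for it):

```r
# Hypothetical objects, created only for this illustration
big_matrix   <- matrix(0, nrow = 1000, ncol = 1000)  # one million doubles
small_vector <- 1:10

# The current environment (the global environment at top level)
env <- environment()

# Size in bytes of every object in that environment
sizes <- sapply(ls(envir = env),
                function(nm) as.numeric(object.size(get(nm, envir = env))))

# Show the five largest objects, largest first
print(head(sort(sizes, decreasing = TRUE), 5))

# Drop the large object once it is no longer needed, then trigger a
# garbage collection so the freed memory can be reused
rm(big_matrix)
invisible(gc())
```

Note that rm() only removes the binding; the memory is reclaimed when the garbage collector runs, which R does automatically, so the explicit gc() call is optional.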

Ensure you record your work in a reproducible script. From time to time, reopen R, then source() your script. You'll clean out anything you're no longer using, and as an added benefit you will have tested your code.


I use the data.table package. With its := operator you can:

  • Add columns by reference
  • Modify subsets of existing columns by reference, and by group by reference
  • Delete columns by reference

None of these operations copy the (potentially large) data.table at all, not even once.

Aggregation is also particularly fast because data.table uses much less working memory.

Related links:

  • News from data.table, London R presentation, 2012
  • When should I use the := operator in data.table?
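
The by-reference operations listed above might look like the following sketch. It assumes the data.table package is installed; the table dt and its column names are hypothetical, and address() is used only as a rough check that the table was not copied.

```r
library(data.table)

# Hypothetical example table
dt <- data.table(id  = 1:6,
                 grp = rep(c("a", "b"), each = 3),
                 x   = c(1, 2, 3, 4, 5, 6))

# Memory address of the table before any modification
addr_before <- address(dt)

dt[, y := x * 2]                       # add a column by reference
dt[x > 4, y := 0]                      # modify a subset of an existing column
dt[, grp_mean := mean(x), by = grp]    # assign by group, by reference
dt[, id := NULL]                       # delete a column by reference

# TRUE if the table was modified in place rather than copied
print(identical(addr_before, address(dt)))
print(dt)
```

By contrast, base-R assignments such as df$y <- df$x * 2 may copy parts of the data.frame, which is where the memory saving comes from.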

Saw this on a Twitter post and think it's an awesome function by Dirk! Following on from JD Long's answer, I would do this for user-friendly reading:

    # improved list of objects
    .ls.objects <- function (pos = 1, pattern, order.by,
                            decreasing=FALSE, head=FALSE, n=5) {
        napply <- function(names, fn) sapply(names, function(x)
                                             fn(get(x, pos = pos)))
        names <- ls(pos = pos, pattern = pattern)
        obj.class <- napply(names, function(x) as.character(class(x))[1])
        obj.mode <- napply(names, mode)
        obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
        obj.prettysize <- napply(names, function(x) {
                               format(utils::object.size(x), units = "auto") })
        obj.size <- napply(names, object.size)
        obj.dim <- t(napply(names, function(x)
                            as.numeric(dim(x))[1:2]))
        vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
        obj.dim[vec, 1] <- napply(names, length)[vec]
        out <- data.frame(obj.type, obj.size, obj.prettysize, obj.dim)
        names(out) <- c("Type", "Size", "PrettySize", "Length/Rows", "Columns")
        if (!missing(order.by))
            out <- out[order(out[[order.by]], decreasing=decreasing), ]
        if (head)
            out <- head(out, n)
        out
    }
    
    # shorthand
    lsos <- function(..., n=10) {
        .ls.objects(..., order.by="Size", decreasing=TRUE, head=TRUE, n=n)
    }
    
    lsos()
    

    Which results in something like the following:

                          Type   Size PrettySize Length/Rows Columns
    pca.res                 PCA 790128   771.6 Kb          7      NA
    DF               data.frame 271040   264.7 Kb        669      50
    factor.AgeGender   factanal  12888    12.6 Kb         12      NA
    dates            data.frame   9016     8.8 Kb        669       2
    sd.                 numeric   3808     3.7 Kb         51      NA
    napply             function   2256     2.2 Kb         NA      NA
    lsos               function   1944     1.9 Kb         NA      NA
    load               loadings   1768     1.7 Kb         12       2
    ind.sup             integer    448  448 bytes        102      NA
    x                 character     96   96 bytes          1      NA
    

    NOTE: The main part I added was (again, adapted from JD's answer):

    obj.prettysize <- napply(names, function(x) {
                               format(utils::object.size(x), units = "auto") })
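
To illustrate why the function keeps both a numeric Size column and a formatted PrettySize column, here is a small sketch (the vector x is a made-up example object):

```r
# Hypothetical example object: one million doubles
x <- numeric(1e6)

# Numeric size in bytes, suitable for order() and sorting
bytes <- as.numeric(object.size(x))

# Human-readable string, for display only
pretty <- format(object.size(x), units = "auto")

print(bytes)
print(pretty)
```

Sorting on the formatted strings would compare them as text (so "96 bytes" would sort after "771.6 Kb"), which is why order.by="Size" uses the numeric column.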
    