Accessing local variable doesn't improve performance

Clarification: I'm not looking for the fastest code or for optimization advice. I would like to understand why some code that looks unoptimized or suboptimal consistently runs faster.

The short version

Why is this code:

var index = (Math.floor(y / scale) * img.width + Math.floor(x / scale)) * 4;

More performant than this one?

var index = Math.floor(ref_index) * 4;

The long version

This week, the author of Impact js published an article about a rendering issue:

http://www.phoboslab.org/log/2012/09/drawing-pixels-is-hard

The article included the source of a function that scales an image by accessing pixels in the canvas. I wanted to suggest some traditional ways to optimize this kind of code so that the scaling would take less time at load. But after testing, my results were most of the time worse than the original function.

Guessing that the JavaScript engine was doing some smart optimization, I tried to understand a bit more of what was going on, so I ran a bunch of tests. But my results are quite confusing, and I need some help to understand them.

I have a test page here:

http://www.mx981.com/stuff/resize_bench/test.html

jsPerf: http://jsperf.com/local-variable-due-to-the-scope-lookup

To start the test, click the picture and the results will appear in the console.

There are three different versions:

The original code:

for( var y = 0; y < heightScaled; y++ ) {
    for( var x = 0; x < widthScaled; x++ ) {
        var index = (Math.floor(y / scale) * img.width + Math.floor(x / scale)) * 4;
        var indexScaled = (y * widthScaled + x) * 4;
        scaledPixels.data[ indexScaled ] = origPixels.data[ index ];
        scaledPixels.data[ indexScaled+1 ] = origPixels.data[ index+1 ];
        scaledPixels.data[ indexScaled+2 ] = origPixels.data[ index+2 ];
        scaledPixels.data[ indexScaled+3 ] = origPixels.data[ index+3 ];
    }
}

jsPerf: http://jsperf.com/so-accessing-local-variable-doesn-t-improve-performance

One of my attempts to optimize it:

var ref_index = 0;
var ref_indexScaled = 0;
var ref_step = 1 / scale;
for( var y = 0; y < heightScaled; y++ ) {
    for( var x = 0; x < widthScaled; x++ ) {
        var index = Math.floor(ref_index) * 4;
        scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ index ];
        scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ index+1 ];
        scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ index+2 ];
        scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ index+3 ];

        ref_index+= ref_step;
    }
}

The same optimized code, but recalculating the index variable each time (Hybrid):

var ref_index = 0;
var ref_indexScaled = 0;
var ref_step = 1 / scale;
for( var y = 0; y < heightScaled; y++ ) {
    for( var x = 0; x < widthScaled; x++ ) {
        var index = (Math.floor(y / scale) * img.width + Math.floor(x / scale)) * 4;
        scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ index ];
        scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ index+1 ];
        scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ index+2 ];
        scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ index+3 ];

        ref_index+= ref_step;
    }
}

The only difference between the last two is the calculation of the 'index' variable. To my surprise, the Optimized version is slower in most browsers (except Opera).
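
One thing that can be checked independently of timing is whether the Optimized loop reads the same source pixels as the original. The sketch below only reproduces the two index computations. The function names and the 2x2 image with scale 2 are assumptions chosen for illustration; they are not part of the test page.

```javascript
// Hypothetical helpers: collect the source index each variant would read.
function originalIndices(imgWidth, imgHeight, scale) {
  var widthScaled = imgWidth * scale;
  var heightScaled = imgHeight * scale;
  var out = [];
  for (var y = 0; y < heightScaled; y++) {
    for (var x = 0; x < widthScaled; x++) {
      out.push((Math.floor(y / scale) * imgWidth + Math.floor(x / scale)) * 4);
    }
  }
  return out;
}

function optimizedIndices(imgWidth, imgHeight, scale) {
  var widthScaled = imgWidth * scale;
  var heightScaled = imgHeight * scale;
  var ref_index = 0;
  var ref_step = 1 / scale;
  var out = [];
  for (var y = 0; y < heightScaled; y++) {
    for (var x = 0; x < widthScaled; x++) {
      out.push(Math.floor(ref_index) * 4);
      ref_index += ref_step; // keeps growing across rows, never reset
    }
  }
  return out;
}

console.log(originalIndices(2, 2, 2).join(','));
// 0,0,4,4,0,0,4,4,8,8,12,12,8,8,12,12
console.log(optimizedIndices(2, 2, 2).join(','));
// 0,0,4,4,8,8,12,12,16,16,20,20,24,24,28,28
```

For this input the running counter keeps growing across rows, so the Optimized variant asks for indices past the end of the 16-entry source buffer (2 x 2 pixels x 4 channels). Whether that contributes to the timing gap in a given engine is not something this sketch establishes.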

Results of personal testing (not the jsPerf tests):

  • Opera

    Original:  8668ms
    Optimized:  932ms
    Hybrid:    8696ms
    
  • Chrome

    Original:  139ms
    Optimized: 145ms
    Hybrid:    136ms
    
  • Safari

    Original:  433ms
    Optimized: 853ms
    Hybrid:    451ms
    
  • Firefox

    Original:  343ms
    Optimized: 422ms
    Hybrid:    350ms
    
After digging around, it seems a common good practice is to access mainly local variables, because of the cost of scope lookup. Since the Optimized version only reads one local variable, it should be faster than the Hybrid code, which reads multiple variables and object properties in addition to the extra operations involved.

So why is the "optimized" version slower?

I thought it might be because some JavaScript engines don't optimize the Optimized version when it is not hot enough, but after using --trace-opt in Chrome, it seems all versions are properly compiled by V8.
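
To reduce the chance that warm-up skews the numbers, each variant can be run a few times before timing starts. This is a minimal sketch, assuming a hypothetical `timeIt` helper and an arbitrary stand-in workload; it is not the harness used on the test page.

```javascript
// Hypothetical helper: run fn a few times untimed, then time the remaining runs.
function timeIt(fn, warmupRuns, timedRuns) {
  var i;
  for (i = 0; i < warmupRuns; i++) {
    fn(); // untimed: gives the engine a chance to compile the hot path
  }
  var start = new Date().getTime();
  for (i = 0; i < timedRuns; i++) {
    fn();
  }
  var end = new Date().getTime();
  return (end - start) / timedRuns; // average milliseconds per run
}

// Stand-in workload (an assumption, not one of the resize variants).
var calls = 0;
function workload() {
  calls++;
  var sum = 0;
  for (var k = 0; k < 100000; k++) {
    sum += Math.floor(k / 3);
  }
  return sum;
}

var avgMs = timeIt(workload, 5, 20);
console.log('average: ' + avgMs + 'ms over 20 timed runs');
```

The millisecond resolution of `Date` is coarse, so very short workloads need enough timed runs for the average to mean anything.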

At this point I am a bit clueless and wonder if somebody knows what is going on.

I also made some more test cases on this page:

http://www.mx981.com/stuff/resize_bench/index.html


As silly as it sounds, the Math.whatever() calls might be tricky for the JS engines to optimize and inline. Whenever possible, prefer an arithmetic operation (not a function call) to achieve the same result.

Adding the following 4th test to http://www.mx981.com/stuff/resize_bench/test.html:

    // Test 4
    console.log('- p01 -');
    start = new Date().getTime();
    for (i=0; i<nbloop; i++) {
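      var index = 0;
      var ref_indexScaled = 0;
      var ref_step = 1 / scale;

      for( var y = 0; y < heightScaled; y++ ) {
        for( var x = 0; x < widthScaled; x++ ) {
          // index << 2 truncates index to an integer and multiplies by 4,
          // replacing Math.floor(index) * 4 without a function call
          var z = index << 2;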
          scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ z++ ];
          scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ z++ ];
          scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ z++ ];
          scaledPixels.data[ ref_indexScaled++ ] = origPixels.data[ z++ ];
    
          index+= ref_step;
        }
      }
    }
    end = new Date().getTime();
    console.log((end-start)+'ms');
    

This yields the following numbers in Opera Next:

  • Original - 2311ms
  • refactor - 112ms
  • hybrid - 2371ms
  • p01 - 112ms
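
The `index<<2` in Test 4 relies on the shift operator converting its operand to a 32-bit integer before shifting. The sketch below shows where that matches `Math.floor(index) * 4` and where it does not. The helper names and sample values are assumptions for illustration.

```javascript
function floorTimes4(v) { return Math.floor(v) * 4; }
function shiftBy2(v) { return v << 2; }

// Non-negative values well below 2^30: both forms agree.
var samples = [0, 0.5, 1.75, 1023.999];
for (var i = 0; i < samples.length; i++) {
  console.log(samples[i], floorTimes4(samples[i]), shiftBy2(samples[i]));
}

// Negative values: the shift truncates toward zero, Math.floor rounds down.
console.log(floorTimes4(-0.5), shiftBy2(-0.5)); // -4 0

// Large values: the shift wraps around at 32 bits.
console.log(floorTimes4(1073741824), shiftBy2(1073741824)); // 4294967296 0
```

Pixel indices are non-negative and, for ordinary image sizes, far below the 32-bit limit, which is the range where the substitution holds.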

Using some basic techniques, you can improve performance considerably:

  • When running loops inside loops, use a reverse while loop:

    while (i--) { /* some code here */ }

    ... where i starts at a value greater than 0.

  • Cache and localize variables appropriately to minimize calculations. For larger calculations, this means computing each part at the right loop level (for example, once per row rather than once per pixel).

  • Re-use variables (re-initialization overhead can become a problem when processing large amounts of data). NOTE: This IS a bad programming design principle but a great performance principle!

  • Reduce property depth. Repeated object.property lookups in a hot loop can cost more than reading a local var that holds the same value.
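
A minimal sketch of how these points combine in a nested loop, using a hypothetical `sumGrid` function and a made-up `grid` object rather than anything from the article. Note that a nested `while (c--)` needs its counter reset on every outer pass: after the inner loop finishes the counter is -1, and since negative numbers are truthy, running `while (c--)` again without a reset would never terminate.

```javascript
// Hypothetical example: sum all cells of a row-major grid.
function sumGrid(grid) {
  var cells = grid.cells; // property lookup cached in a local
  var cols = grid.cols;
  var r = grid.rows;
  var c;
  var rowStart;
  var total = 0;

  while (r--) {
    rowStart = r * cols; // computed once per row, not once per cell
    c = cols;            // reset the inner counter for every row
    while (c--) {
      total += cells[rowStart + c];
    }
  }
  return total;
}

var grid = { rows: 2, cols: 3, cells: [1, 2, 3, 4, 5, 6] };
console.log(sumGrid(grid)); // 21
```

Whether the reverse `while` form is actually faster than a plain `for` loop depends on the engine, so it is worth measuring rather than assuming.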

Using those principles you can achieve better performance. Looking at the article this function came from, the function itself was flawed in a few ways. So, to optimize the full function instead of just the one line you quoted:

    function resize_Test5( img, scale ) {
        // Takes an image and a scaling factor and returns the scaled image
    
        // The original image is drawn into an offscreen canvas of the same size
        // and copied, pixel by pixel into another offscreen canvas with the 
        // new size.
    
        var widthScaled = img.width * scale;
        var heightScaled = img.height * scale;
    
        var orig = document.createElement('canvas');
        orig.width = img.width;
        orig.height = img.height;
        var origCtx = orig.getContext('2d');
        origCtx.drawImage(img, 0, 0);
        var origPixels = origCtx.getImageData(0, 0, img.width, img.height);
    
        var scaled = document.createElement('canvas');
        scaled.width = widthScaled;
        scaled.height = heightScaled;
        var scaledCtx = scaled.getContext('2d');
        var scaledPixels = scaledCtx.getImageData( 0, 0, widthScaled, heightScaled );
    
        // optimization start
        var old_list = origPixels.data;
        // ImageData.data is read-only, so write into the existing buffer
        // instead of assigning a new array to it afterwards
        var new_list = scaledPixels.data;
        var image_width = img.width;
        var h = heightScaled;
        var w;
        var index_old;
        var index_new;
        var h_scale;
        var pre_index_new;
    
        while(h--){
            // per-row values, computed once per row
            h_scale = Math.floor(h / scale) * image_width;
            pre_index_new = h * widthScaled;
            // reset the column counter for every row; otherwise it stays
            // negative after the first row and while(w--) never terminates
            w = widthScaled;
            while(w--){
                index_old = (h_scale + Math.floor(w / scale)) * 4;
                index_new = (pre_index_new + w) * 4;
                new_list[ index_new ]     = old_list[ index_old ];
                new_list[ index_new + 1 ] = old_list[ index_old + 1 ];
                new_list[ index_new + 2 ] = old_list[ index_old + 2 ];
                new_list[ index_new + 3 ] = old_list[ index_old + 3 ];
            }
        }
        // optimization stop
    
        scaledCtx.putImageData( scaledPixels, 0, 0 );
        return scaled;
    }
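
Because the function above depends on the canvas API, its index math is awkward to verify outside a browser. The sketch below is a DOM-free version that works on plain typed arrays and compares the hoisted reverse loops against the original per-pixel formula. The function names, the 2x2 test image, and the assumption of an integer scale are all illustrative choices, not part of the article or the test page.

```javascript
// Reference: the original per-pixel index formula.
function scaleNaive(src, width, height, scale) {
  var w = width * scale;
  var h = height * scale;
  var dst = new Uint8ClampedArray(w * h * 4);
  for (var y = 0; y < h; y++) {
    for (var x = 0; x < w; x++) {
      var index = (Math.floor(y / scale) * width + Math.floor(x / scale)) * 4;
      var indexScaled = (y * w + x) * 4;
      for (var k = 0; k < 4; k++) {
        dst[indexScaled + k] = src[index + k];
      }
    }
  }
  return dst;
}

// Reverse loops with per-row values hoisted out of the inner loop.
function scaleHoisted(src, width, height, scale) {
  var wScaled = width * scale;
  var hScaled = height * scale;
  var dst = new Uint8ClampedArray(wScaled * hScaled * 4);
  var h = hScaled;
  var w, rowOld, rowNew, iOld, iNew;
  while (h--) {
    rowOld = Math.floor(h / scale) * width; // source row offset, once per row
    rowNew = h * wScaled;                   // destination row offset
    w = wScaled;                            // reset column counter each row
    while (w--) {
      iOld = (rowOld + Math.floor(w / scale)) * 4;
      iNew = (rowNew + w) * 4;
      dst[iNew] = src[iOld];
      dst[iNew + 1] = src[iOld + 1];
      dst[iNew + 2] = src[iOld + 2];
      dst[iNew + 3] = src[iOld + 3];
    }
  }
  return dst;
}

// 2x2 RGBA test image: red, green, blue, white.
var src = new Uint8ClampedArray([
  255, 0, 0, 255,    0, 255, 0, 255,
  0, 0, 255, 255,    255, 255, 255, 255
]);
var a = scaleNaive(src, 2, 2, 2);
var b = scaleHoisted(src, 2, 2, 2);
var same = a.length === b.length;
for (var i = 0; same && i < a.length; i++) {
  same = a[i] === b[i];
}
console.log(a.length, same); // 64 true
```

Both functions produce the same output for this input, which makes the pair usable as a correctness check before timing either variant.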
    