RenderScript rendering is much slower than OpenGL rendering on Android

BACKGROUND:

I want to add a live filter to the Android camera app. The camera app's architecture is based on OpenGL ES 1.x, but I need shaders to implement custom filters, and updating the camera app to OpenGL ES 2.0 is too difficult. So I had to find some way other than OpenGL to implement live filters, and after some research I decided to use RenderScript.

PROBLEM:

I wrote a demo of a simple filter in RenderScript. Its frame rate is much lower than that of the OpenGL implementation: about 5 fps versus 15 fps.

QUESTIONS:

  • The official Android website says: "The RenderScript runtime will parallelize work across all processors available on a device, such as multi-core CPUs, GPUs, or DSPs, allowing you to focus on expressing algorithms rather than scheduling work or load balancing." Then why is the RenderScript implementation slower?

  • If RenderScript cannot satisfy my requirement, is there a better way?

CODE DETAILS:

    Hi, I am on the same team as the questioner. We want to write a RenderScript-based live-filter camera. In our test demo project, we use a simple filter: a YuvToRGB intrinsic script followed by an overlay-filter ScriptC script. In the OpenGL version, we upload the camera data as textures and do the image filtering in a shader, like this:

        GLES20.glActiveTexture(GLES20.GL_TEXTURE0);
        GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, textureYHandle);
        GLES20.glUniform1i(shader.uniforms.get("uTextureY"), 0);
        GLES20.glTexSubImage2D(GLES20.GL_TEXTURE_2D, 0, 0, 0, mTextureWidth,
                mTextureHeight, GLES20.GL_LUMINANCE, GLES20.GL_UNSIGNED_BYTE,
                mPixelsYBuffer.position(0));
    

    In the RenderScript version, we put the camera data in an Allocation and do the image filtering with script kernels, like this:

        // The following code is from onPreviewFrame(byte[] data, Camera camera), which delivers the camera frame data.
        byte[] imageData = datas[0];
        long timeBegin = System.currentTimeMillis();
        mYUVInAllocation.copyFrom(imageData);
    
        mYuv.setInput(mYUVInAllocation);
        mYuv.forEach(mRGBAAllocationA);
        // copyTo blocks until the YUV-to-RGBA conversion has finished, so the time logged below covers the whole step.
        mRGBAAllocationA.copyTo(mOutBitmap);    
        Log.e(TAG, "RS time: YUV to RGBA : " + String.valueOf((System.currentTimeMillis() - timeBegin)));   
    
        mLayerScript.forEach_overlay(mRGBAAllocationA, mRGBAAllocationB);
        mRGBAAllocationB.copyTo(mOutBitmap);    
        Log.e(TAG, "RS time: overlay : " + String.valueOf((System.currentTimeMillis() - timeBegin)));
    
        mCameraSurPreview.refresh(mOutBitmap, mCameraDisplayOrientation, timeBegin);
    

    The two problems are: (1) The RenderScript pipeline seems slower than the OpenGL one. (2) According to our timing log, the YUV-to-RGBA step, which uses an intrinsic script, is very quick and takes about 6 ms, but the overlay step, which uses ScriptC, is very slow and takes about 180 ms. Why does this happen?
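
A note on reading these numbers: both Log.e calls in the snippet above subtract the same timeBegin, so the second figure is cumulative and also includes the YUV conversion and the first copyTo. A minimal sketch of per-stage timing in plain Java follows. StageTimer and lapMillis are hypothetical names made up for illustration; they are not part of Android or of the demo project.

```java
/** Hypothetical helper: reports the time spent in each stage rather than the time since the start. */
final class StageTimer {
    // Timestamp of the previous lap (or of construction), in nanoseconds.
    private long last = System.nanoTime();

    /** Returns the milliseconds elapsed since the previous call (or construction) and restarts the clock. */
    double lapMillis() {
        long now = System.nanoTime();
        double elapsed = (now - last) / 1_000_000.0;
        last = now;
        return elapsed;
    }
}
```

In onPreviewFrame, one would create a StageTimer at the top and call lapMillis() right after each copyTo, so that the YUV-to-RGBA and overlay stages are logged separately.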

    Here is the RenderScript kernel code of the ScriptC script we use (mLayerScript):

    #pragma version(1)
    #pragma rs java_package_name(**.renderscript)
    #pragma stateFragment(parent)
    
    #include "rs_graphics.rsh"
    
    static rs_allocation layer;
    static uint32_t dimX;
    static uint32_t dimY;
    
    void setLayer(rs_allocation layer1) {
        layer = layer1;
    }
    
    void setBitmapDim(uint32_t dimX1, uint32_t dimY1) {
        dimX = dimX1;
        dimY = dimY1;
    }
    
    static float BlendOverlayf(float base, float blend) {
        return (base < 0.5 ? (2.0 * base * blend) : (1.0 - 2.0 * (1.0 - base) * (1.0 - blend)));
    }
    
    static float3 BlendOverlay(float3 base, float3 blend) {
        float3 blendOverLayPixel = {BlendOverlayf(base.r, blend.r), BlendOverlayf(base.g, blend.g), BlendOverlayf(base.b, blend.b)};
        return blendOverLayPixel;
    }
    
    uchar4 __attribute__((kernel)) overlay(uchar4 in, uint32_t x, uint32_t y) {
        float4 inPixel = rsUnpackColor8888(in);
    
        uint32_t layerDimX = rsAllocationGetDimX(layer);
        uint32_t layerDimY = rsAllocationGetDimY(layer);
    
        uint32_t layerX = x * layerDimX / dimX;
        uint32_t layerY = y * layerDimY / dimY;
    
        uchar4* p = (uchar4*)rsGetElementAt(layer, layerX, layerY);
        float4 layerPixel = rsUnpackColor8888(*p);
    
        float3 color = BlendOverlay(inPixel.rgb, layerPixel.rgb);
    
        float4 outf = {color.r, color.g, color.b, inPixel.a};
        uchar4 outc = rsPackColorTo8888(outf.r, outf.g, outf.b, outf.a);
    
        return outc;
    }
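
To sanity-check the kernel's output off the device, the overlay formula and the coordinate mapping above can be mirrored in plain Java. This is a minimal sketch, not the team's code: the class OverlayReference, its method names, and the packed ARGB pixel layout are assumptions made for illustration, and rounding with Math.round may differ by one level from what rsPackColorTo8888 produces.

```java
/** Hypothetical CPU reference for the overlay kernel, for comparing against RenderScript output. */
final class OverlayReference {

    /** Overlay blend of one channel; both inputs are expected in [0, 1]. */
    static float blendOverlay(float base, float blend) {
        return base < 0.5f
                ? 2.0f * base * blend
                : 1.0f - 2.0f * (1.0f - base) * (1.0f - blend);
    }

    /** Maps a frame coordinate to a layer coordinate with integer division, as the kernel does. */
    static int mapCoord(int coord, int layerDim, int frameDim) {
        return coord * layerDim / frameDim;
    }

    /** Blends two packed ARGB pixels (assumed layout) channel by channel, keeping the base alpha. */
    static int blendPixel(int base, int layer) {
        int out = base & 0xFF000000;
        for (int shift = 16; shift >= 0; shift -= 8) {
            float b = ((base >>> shift) & 0xFF) / 255.0f;
            float l = ((layer >>> shift) & 0xFF) / 255.0f;
            int c = Math.round(blendOverlay(b, l) * 255.0f);
            out |= Math.max(0, Math.min(255, c)) << shift;
        }
        return out;
    }
}
```

For a 640-pixel-wide frame and a 256-pixel-wide layer, mapCoord(320, 256, 640) picks layer column 128, which is the same nearest-neighbour lookup the kernel performs through rsGetElementAt.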
    

    ANSWER:

    RenderScript does not use any GPU or DSP cores. That is a common misconception encouraged by Google's deliberately vague documentation. RenderScript used to have an interface to OpenGL ES, but that has been deprecated and was never used for much beyond animated wallpapers. RenderScript will use multiple CPU cores, if available, but I suspect RenderScript will be replaced by OpenCL.

    Take a look at the Effects class and the Effects demo in the Android SDK. It shows how to use OpenGL ES 2.0 shaders to apply effects to images without writing OpenGL ES code.

    http://software.intel.com/en-us/articles/porting-opengl-games-to-android-on-intel-atom-processors-part-1

    UPDATE:

    It's wonderful when I learn more from answering a question than from asking one, and that is the case here. You can see from the lack of answers that RenderScript is hardly used outside of Google, because of its strange architecture that ignores industry standards like OpenCL and its almost non-existent documentation on how it actually works. Nonetheless, my answer did evoke a rare response from the RenderScript development team, which included only one link that actually contains any useful information about RenderScript: this article by Alexandru Voica at IMG, the PowerVR GPU vendor:

    http://withimagination.imgtec.com/index.php/powervr/running-renderscript-efficiently-with-powervr-gpus-on-android

    That article has some good information that was new to me. There are also comments posted there from people who are having trouble getting RenderScript code to actually run on the GPU.

    But I was incorrect to assume that RenderScript is no longer being developed at Google. Although my statement that RenderScript does not use any GPU or DSP cores was true until fairly recently, I have learned that this changed as of one of the Jelly Bean releases. It would have been great if one of the RenderScript developers had explained that, or if there were a public web page that explains it, lists which GPUs are actually supported, and shows how to tell whether your code actually runs on a GPU.

    My opinion is that Google will eventually replace RenderScript with OpenCL, and I would not invest time developing with it.
