
I wrote the following shader to render a pattern with a bunch of concentric circles. Eventually I want to have each rotating sphere be a light emitter to create something along these lines.

Of course right now I'm just doing the most basic part to render the different objects.

Unfortunately the shader is incredibly slow (16 fps full screen on a high-end MacBook). I'm pretty sure this is due to the numerous for loops and the branching in the shader. I'm wondering how I can pull off this geometry in a more performance-optimized way.

EDIT: you can run the shader here: https://www.shadertoy.com/view/lssyRH

One obvious optimization I am missing: currently every fragment is checked against all 24 surrounding circles of every diagram. It would be quick and easy to skip most of these checks by first testing whether the fragment lies within the outer bounds of the diagram. I'm mainly trying to get a handle on the best practice for doing something like this.

#define N 10
#define M 5
#define K 24
#define M_PI 3.1415926535897932384626433832795

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    float aspectRatio = iResolution.x / iResolution.y;

    float h = 1.0;
    float w = aspectRatio;

    vec2 uv = vec2(fragCoord.x / iResolution.x * aspectRatio, fragCoord.y / iResolution.y); 

    float radius = 0.01;
    float orbitR = 0.02;
    float orbiterRadius = 0.005;
    float centerRadius = 0.002;
    float encloseR = 2.0 * orbitR;
    float encloserRadius = 0.002;
    float spacingX = (w / (float(N) + 1.0));
    float spacingY = h / (float(M) + 1.0);
    float x = 0.0;
    float y = 0.0;
    vec4 totalLight = vec4(0.0, 0.0, 0.0, 1.0);
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            // compute the center of the diagram
            vec2 center = vec2(spacingX * (float(i) + 1.0), spacingY * (float(j) + 1.0));
            x =  center.x + orbitR * cos(iGlobalTime);
            y =  center.y + orbitR * sin(iGlobalTime);
            vec2 bulb = vec2(x,y);
            if (length(uv - center) < centerRadius) {
                // frag intersects white center marker                   
                fragColor = vec4(1.0);
                return;               
            } else if (length(uv - bulb) < radius) {
                // intersects rotating "light"
                fragColor = vec4(uv,0.5+0.5*sin(iGlobalTime),1.0);
                return;
            } else {
                // intersects one of the enclosing 24 cylinders
                for(int k = 0; k < K; k++) {
                    float theta = M_PI * 2.0 * float(k)/ float(K);
                    x = center.x + cos(theta) * encloseR;
                    y = center.y + sin(theta) * encloseR;
                    vec2 encloser = vec2(x,y);
                    if (length(uv - encloser) < encloserRadius) {
                        fragColor = vec4(uv,0.5+0.5*sin(iGlobalTime),1.0);
                        return;
                    }
                }
            }
        }
    }
}
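The early-out described in the question can be sketched as a small helper. This is a minimal sketch assuming the radii used in the shader above; the name `mayHitDiagram` is hypothetical. Everything drawn for one diagram lies within `max(orbitR + radius, encloseR + encloserRadius)` of its center, so a fragment farther away than that cannot hit any of that diagram's circles.

```glsl
// Hypothetical helper: returns true if `uv` is close enough to `center`
// that it could touch anything drawn for that diagram (center marker,
// orbiting bulb, or the ring of small circles). All radii are passed in
// so the function does not depend on the values chosen in mainImage.
bool mayHitDiagram(vec2 uv, vec2 center, float orbitR, float radius,
                   float encloseR, float encloserRadius)
{
    // Outer extent: farthest point of the bulb vs. farthest point of the ring.
    float outer = max(orbitR + radius, encloseR + encloserRadius);
    vec2 d = uv - center;
    // Squared comparison avoids the sqrt hidden inside length().
    return dot(d, d) < outer * outer;
}
```

Inside the `j` loop, right after `center` is computed, a line such as `if (!mayHitDiagram(uv, center, orbitR, radius, encloseR, encloserRadius)) continue;` would skip the center, bulb and 24-circle tests for every diagram the fragment cannot touch, so only the nearby diagram still pays for the inner loop.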
  • Can you pre-calculate those zillions of sin() and cos() calls and send them to the shader somehow, instead of calculating them inside the shader? Commented Feb 14, 2017 at 21:32
  • Your shader doesn't even work: at least for me there is a lot of artifacting, and you have lots of unused variables in it. Commented Feb 15, 2017 at 0:18
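One in-shader alternative to uploading precomputed values: the 24 ring angles are fixed multiples of 2π/24, so each direction can be obtained from the previous one by a constant rotation, with no per-iteration sin()/cos(). A minimal sketch, assuming the ring layout from the question; `hitsRing`, `RING_COUNT` and `STEP` are hypothetical names.

```glsl
// Hypothetical helper: true if `uv` lies inside one of RING_COUNT small
// circles (radius `encloserRadius`) spaced evenly on a ring of radius
// `encloseR` around `center`, the same layout as the question's K loop.
bool hitsRing(vec2 uv, vec2 center, float encloseR, float encloserRadius)
{
    const int RING_COUNT = 24;
    const float STEP = 6.28318531 / 24.0;  // 2*PI / RING_COUNT

    // Column-major mat2 with columns (cos, sin) and (-sin, cos):
    // a counter-clockwise rotation by STEP, built once per call.
    mat2 rot = mat2(cos(STEP), sin(STEP), -sin(STEP), cos(STEP));

    vec2 dir = vec2(1.0, 0.0);             // theta = 0, i.e. k = 0
    vec2 d = uv - center;
    float r2 = encloserRadius * encloserRadius;

    for (int k = 0; k < RING_COUNT; k++) {
        vec2 e = d - dir * encloseR;       // uv minus the k-th circle center
        if (dot(e, e) < r2) {
            return true;
        }
        dir = rot * dir;                   // advance theta by STEP
    }
    return false;
}
```

In the question's shader, the whole `for (int k ...)` block could become `if (hitsRing(uv, center, encloseR, encloserRadius)) { fragColor = vec4(uv, 0.5 + 0.5 * sin(iGlobalTime), 1.0); return; }`. The `cos(STEP)`/`sin(STEP)` calls have a constant argument, so a compiler can fold them, and the rounding accumulated over 24 rotations is far below the 0.002 radius being tested.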

2 Answers


Keeping in mind that you want to optimize the fragment shader, and only the fragment shader:

  1. Move sin(iGlobalTime) and cos(iGlobalTime) out of the loops: they are constant over the whole draw call, so there is no need to recompute them on every loop iteration.
  2. GPUs employ vectorized instruction sets (SIMD) where possible, so take advantage of that. You're wasting a lot of cycles on multiple scalar ops where a single vector instruction would do (see the annotated code). [Three years wiser me here: I'm not really sure this statement holds for how modern GPUs process instructions; however, it certainly helps readability and may even give the compiler a hint or two.]
  3. Do your radius checks on squared values (compare dot(d, d) with r*r) and save the sqrt hidden inside length() for when you really need it.
  4. Replace float casts of constants (your loop limits) with float constants. Smart shader compilers will already do this, but it's not something to count on.
  5. Don't have undefined behavior in your shader (nothing is written to fragColor/gl_FragColor for fragments that miss every shape).
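Items 3 and 5 can be illustrated with a minimal, self-contained Shadertoy shader. This is a sketch: the helper `insideCircle`, the single test dot and the black background are illustrative choices, not part of the original code.

```glsl
// Hypothetical helper for item 3: compare the squared distance with the
// squared radius, so no sqrt() is needed.
bool insideCircle(vec2 p, vec2 c, float r)
{
    vec2 d = p - c;
    return dot(d, d) < r * r;
}

void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    // Item 5: give fragColor a defined value up front (opaque black here,
    // an arbitrary choice) so fragments that miss every shape are defined.
    fragColor = vec4(0.0, 0.0, 0.0, 1.0);

    float aspectRatio = iResolution.x / iResolution.y;
    vec2 uv = fragCoord / iResolution.xy;
    uv.x *= aspectRatio;

    // Example use: a single white dot in the middle of the screen.
    vec2 center = vec2(0.5 * aspectRatio, 0.5);
    if (insideCircle(uv, center, 0.01)) {
        fragColor = vec4(1.0);
    }
}
```

Because `fragColor` is assigned before any test, every code path produces a defined color; in the loop-based shaders, the later assignments before each `return` would then simply override it.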

Here is an optimized and annotated version of your shader (it still contains that undefined behavior, just like the one you provided). Annotations take the form:

// annotation
// old code, if any
new code
#define N 10
// define float constant N
#define fN 10.
#define M 5
// define float constant M
#define fM 5.
#define K 24
// define float constant K
#define fK 24.
#define M_PI 3.1415926535897932384626433832795
// predefine 2 times PI
#define M_PI2 6.28318531

void mainImage( out vec4 fragColor, in vec2 fragCoord )
{
    float aspectRatio = iResolution.x / iResolution.y;

    // we don't need these separately
    // float h = 1.0;
    // float w = aspectRatio;

    // use vector ops(2 divs 1 mul => 1 div 1 mul)
    // vec2 uv = vec2(fragCoord.x / iResolution.x * aspectRatio, fragCoord.y / iResolution.y); 
    vec2 uv = fragCoord.xy / iResolution.xy;
    uv.x *= aspectRatio;

    // most of the following declarations should be predefined or marked as "const"...

    float radius = 0.01;
    // precalc squared radius
    float radius2 = radius*radius;
    float orbitR = 0.02;
    float orbiterRadius = 0.005;
    float centerRadius = 0.002;
    // precalc squared center radius
    float centerRadius2 = centerRadius * centerRadius;
    float encloseR = 2.0 * orbitR;
    float encloserRadius = 0.002;
    // precalc squared encloser radius
    float encloserRadius2 = encloserRadius * encloserRadius;

    // Use float constants and vector ops here(2 casts 2 adds 2 divs => 1 add 1 div)
    // float spacingX = w / (float(N) + 1.0);
    // float spacingY = h / (float(M) + 1.0);
    vec2 spacing = vec2(aspectRatio, 1.0) / (vec2(fN, fM)+1.);

    // calc sin and cos of global time
    // saves N*M(sin,cos,2 muls) 
    vec2 stct = vec2(sin(iGlobalTime), cos(iGlobalTime));
    vec2 orbit = orbitR * stct;

    // not needed anymore
    // float x = 0.0;
    // float y = 0.0;

    // was never used
    // vec4 totalLight = vec4(0.0, 0.0, 0.0, 1.0);

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            // compute the center of the diagram
            // Use vector ops
            // vec2 center = vec2(spacingX * (float(i) + 1.0), spacingY * (float(j) + 1.0));
            vec2 center = spacing * (vec2(i,j)+1.0);

            // Again use vector ops, use precalced time trig (orbit = orbitR * stct)
            // stct holds (sin, cos) but the original offset is (cos, sin), so swizzle with .yx
            // x = center.x + orbitR * cos(iGlobalTime);
            // y = center.y + orbitR * sin(iGlobalTime);
            // vec2 bulb = vec2(x,y);
            vec2 bulb = center + orbit.yx;
            // calculate offsets
            vec2 centerOffset = uv - center;
            vec2 bulbOffset = uv - bulb;
            // use squared length check
            // if (length(uv - center) < centerRadius) {
            if (dot(centerOffset, centerOffset) < centerRadius2) {
                // frag intersects white center marker                   
                fragColor = vec4(1.0);
                return;               
            // use squared length check
            // } else if (length(uv - bulb) < radius) {
            } else if (dot(bulbOffset, bulbOffset) < radius2) {
                // Use precalced sin global time in stct.x
                // intersects rotating "light"
                fragColor = vec4(uv,0.5+0.5*stct.x,1.0);
                return;
            } else {
                // intersects one of the enclosing 24 cylinders
                for(int k = 0; k < K; k++) {
                    // use predefined 2*PI and float K
                    float theta = M_PI2 * float(k) / fK;
                    // Use vector ops(2 muls 2 adds => 1 mul 1 add)
                    // x = center.x + cos(theta) * encloseR;
                    // y = center.y + sin(theta) * encloseR;
                    // vec2 encloser = vec2(x,y);
                    vec2 encloseOffset = uv - (center + vec2(cos(theta),sin(theta)) * encloseR);
                    if (dot(encloseOffset,encloseOffset) < encloserRadius2) {
                        fragColor = vec4(uv,0.5+0.5*stct.x,1.0);
                        return;
                    }
                }   
            }
        }
    }
}

1 Comment

Thank you LJ, this is very helpful. There are definitely a lot of improvements to be made to the computations.

After a little more thinking, I realized the best way to optimize it is to change the logic so that, before doing intersection tests on the small circles, it first checks whether the fragment is within the bounds of the whole group of circles. This got it running at 60 fps:

Example here: https://www.shadertoy.com/view/lssyRH
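A bounds check can be pushed one step further: because the diagrams sit on a regular grid, the only diagram a fragment can touch is the one whose center is nearest, and that center can be computed directly instead of looping over all N×M diagrams. This is a different technique from the linked shader, shown as a sketch; `nearestCenter` and `gridSize` are hypothetical names, and `iGlobalTime` is used as in the question (current Shadertoy exposes this uniform as `iTime`).

```glsl
// Hypothetical helper: center of the grid diagram nearest to `uv`, for
// centers at spacing * (i + 1, j + 1) with i in [0, N-1], j in [0, M-1].
vec2 nearestCenter(vec2 uv, vec2 spacing, vec2 gridSize)
{
    vec2 idx = floor(uv / spacing + 0.5);   // round to the nearest index
    idx = clamp(idx, vec2(1.0), gridSize);  // indices run 1..N and 1..M
    return spacing * idx;
}

void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    const vec2 gridSize = vec2(10.0, 5.0);  // N, M from the question
    const float radius = 0.01;
    const float orbitR = 0.02;
    const float centerRadius = 0.002;
    const float encloseR = 2.0 * orbitR;
    const float encloserRadius = 0.002;

    float aspectRatio = iResolution.x / iResolution.y;
    vec2 uv = fragCoord / iResolution.xy;
    uv.x *= aspectRatio;
    vec2 spacing = vec2(aspectRatio, 1.0) / (gridSize + 1.0);

    fragColor = vec4(0.0, 0.0, 0.0, 1.0);   // defined background

    // Only one diagram is tested per fragment.
    vec2 center = nearestCenter(uv, spacing, gridSize);
    vec2 d = uv - center;
    vec2 b = uv - (center + orbitR * vec2(cos(iGlobalTime), sin(iGlobalTime)));
    vec4 lit = vec4(uv, 0.5 + 0.5 * sin(iGlobalTime), 1.0);

    if (dot(d, d) < centerRadius * centerRadius) {
        fragColor = vec4(1.0);              // white center marker
    } else if (dot(b, b) < radius * radius) {
        fragColor = lit;                    // rotating "light"
    } else {
        for (int k = 0; k < 24; k++) {      // enclosing ring of circles
            float theta = 6.28318531 * float(k) / 24.0;
            vec2 e = d - vec2(cos(theta), sin(theta)) * encloseR;
            if (dot(e, e) < encloserRadius * encloserRadius) {
                fragColor = lit;
                break;
            }
        }
    }
}
```

This relies on diagrams never overlapping: a diagram's outer extent (encloseR + encloserRadius = 0.042) must be smaller than half the grid spacing on both axes. With the question's constants the vertical half-spacing is 1/12 ≈ 0.083 and the horizontal one is aspectRatio / 22, so the assumption holds for aspect ratios above about 0.92.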
