I have recently been generating mesh data for a spherical planet built from six subdivided planes in Unity3d (like this). I wrote an algorithm in C# that builds the meshes on the CPU, and for learning purposes I decided to port it to the GPU using a compute shader. All the vertices and normals appear to be calculated correctly, but the populated triangle array ends up half empty, which turns the mesh into an undesirable mess.
I treat each plane of the planet as a 2D grid of vertices, even though the grid later gets warped into a sphere. On the CPU I compute the triangles by looping over the width and height of the grid. For each vertex I take the quad formed by that vertex and the three vertices behind and above it (top left, top right, bottom left, current vertex) and write the appropriate vertex indices into the triangle array. On the GPU I copied my code almost line for line, since the syntax is very similar, and yet it seems to stop computing triangles after a certain point.
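To make the description above concrete, here is a minimal sketch of that per-quad indexing in C#. It is not copied from my repository: the class name `GridTriangles`, the method `Build`, the row-by-row vertex layout (`index = y * width + x`), and the winding order are all assumptions chosen for illustration.

```csharp
using System;

public static class GridTriangles
{
    // Builds the triangle index array for a width x height grid of vertices.
    // Assumes vertices are stored row by row: index = y * width + x.
    public static int[] Build(int width, int height)
    {
        // One quad per vertex that has neighbours behind and above it,
        // two triangles per quad, three indices per triangle.
        int[] triangles = new int[(width - 1) * (height - 1) * 6];

        for (int y = 1; y < height; y++)
        {
            for (int x = 1; x < width; x++)
            {
                int current = y * width + x;      // bottom right of the quad
                int bottomLeft = current - 1;
                int topRight = current - width;
                int topLeft = topRight - 1;

                // Write offset of this quad; it depends only on x and y.
                int t = ((y - 1) * (width - 1) + (x - 1)) * 6;

                // Winding order is an assumption; flip it if faces point inward.
                triangles[t + 0] = topLeft;
                triangles[t + 1] = bottomLeft;
                triangles[t + 2] = topRight;
                triangles[t + 3] = topRight;
                triangles[t + 4] = bottomLeft;
                triangles[t + 5] = current;
            }
        }
        return triangles;
    }
}
```

In this sketch each quad's write offset is computed from `x` and `y` alone rather than from a running counter, so every quad can be evaluated independently, which is the property a one-thread-per-quad port would rely on.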
To test whether the triangles or the vertices were at fault, I overwrote my GPU-calculated triangle indices with the ones calculated by the C# algorithm. This gave me the shape I wanted, indicating that the problem is indeed only with my triangles.
The mesh generated correctly on the CPU looks like this: Correct CPU implementation
The mesh generated incorrectly from the GPU looks like this: Incorrect GPU implementation
The difference between the CPU calculated triangle array and the GPU calculated triangle array can be seen in this image where the GPU is on the left, and the CPU on the right: Triangle array difference
I believe my problem is most likely a misunderstanding of how compute shaders go about performing their thread calculations and indexing, but it could also be a bug in my implementation. Any help understanding why my compute shader implementation is failing would be much appreciated. I have linked my source code on GitHub rather than putting snippets here, as they might be difficult to understand out of context.
CPU implementation source code: Source
GPU shader dispatcher source code: Source
GPU compute shader source code: Source