Vertex transformation bottleneck?

Apprentice
Posts: 10
Joined: 2008.11
Post: #1
Hi,

So after porting some OpenGL code over to the iPhone I encountered a rather interesting (but not yet frustrating :) ) performance issue.

I'm rendering 3 parallel cylinders (64 triangles each) along the Z axis (into the screen). When they're far away, it's all cool - steady 60 fps, no worries. But when they come nearer and start covering a large chunk of the screen the framerate drops to 30 and below.

I thought I must've hit the fillrate limit, but when I added some quads covering half the screen there was no noticeable impact on the framerate. When I made the cylinders shorter on the far side the framerate went up a lot compared to the freed-up screen space, which was maybe a couple thousand pixels at most. So it couldn't have been the rasterizer causing the bottleneck.

Then I decided to reduce the vertex count to 8 polys per cylinder and... here we go, steady 60 fps regardless of distance. The objects covered approximately the same screen space, so definitely no fillrate bottleneck.

This made no sense at all until I read somewhere about the tile based PowerVR MBX renderer. What I'm thinking is, when the GPU draws a tile it needs to transform all polygons covering that tile, which wouldn't be a problem if the triangles were small. But when a polygon covers a large number of tiles, it needs to be individually transformed for each tile. In my case, the tiles covering the parts of the cylinders far away would have caused the renderer to process all 64 polygons over and over again, leading to the vertex stage bottleneck.
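To make that idea concrete, here's a back-of-the-envelope cost model I put together (my own sketch, not anything from PowerVR docs; the 32-pixel tile size is an assumption). It counts (triangle, tile) pairs via bounding boxes, on the premise that each triangle gets re-processed once per tile it touches:

```python
# Rough cost model for a tile-based renderer (illustrative only):
# assume each triangle is processed once per screen tile that its
# bounding box overlaps, so per-frame vertex-stage work scales with
# the total number of (triangle, tile) pairs.

TILE = 32  # hypothetical tile size in pixels (assumption)

def tiles_covered(x0, y0, x1, y1, tile=TILE):
    """Number of tiles overlapped by an axis-aligned bounding box."""
    tx = x1 // tile - x0 // tile + 1
    ty = y1 // tile - y0 // tile + 1
    return tx * ty

def tile_triangle_pairs(boxes, tile=TILE):
    """Total (triangle, tile) pairs -- a proxy for per-tile re-processing."""
    return sum(tiles_covered(*box, tile=tile) for box in boxes)

# A cylinder running into the screen: each of its 64 triangles becomes a
# long thin sliver stretching across most of a 480-pixel-wide display.
long_slivers = [(0, y, 479, y + 4) for y in range(0, 64 * 4, 4)]

# The same 64 triangles, but compact (e.g. the cylinder far away),
# packed into a small 64x64-pixel region of the screen.
compact = [(200 + (i % 8) * 8, 200 + (i // 8) * 8,
            207 + (i % 8) * 8, 207 + (i // 8) * 8) for i in range(64)]

print(tile_triangle_pairs(long_slivers))  # many more pairs
print(tile_triangle_pairs(compact))       # far fewer pairs
```

Under this model the long thin triangles generate an order of magnitude more per-tile work than the same triangle count drawn compactly, which would match the framerate behaviour I saw.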

Are there any other plausible explanations? Any comments appreciated!
Apprentice
Posts: 10
Joined: 2008.11
Post: #2
To answer my own question, it seems I was right...

Quoting from the PowerVR Technology Overview:

Quote:The ISP processes all triangles affecting a tile one by one. Calculating the triangle equation and projecting a ray at each position in the triangle return accurate depth information for all pixels. This depth information is then compared with the values in the tile’s depth buffer to determine whether these pixels are visible or not.

What this means is that an object made of long but thin triangles will be rendered more slowly than a larger object consisting of the same number of polygons. This is because the ISP unit will be transforming and sorting the same polygons over and over again, once for each tile they touch. An extreme case would be two long straight lines consisting of, let's say, 2 and 20 polygons. The 20-poly line would cost an extra 18 * number_of_covered_tiles transformations, not just the extra 18 you'd expect. Larger objects don't suffer from this because their polygons are more evenly distributed across the tiles.
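Putting numbers on that example (the tile count is just an assumed figure for illustration):

```python
# Toy arithmetic for the 2-poly vs 20-poly line example.
tiles_per_line = 10            # tiles each line crosses (assumption)
polys_short, polys_long = 2, 20

# If every covered tile re-processes every polygon of the line:
work_short = polys_short * tiles_per_line   # 20 transformations
work_long  = polys_long  * tiles_per_line   # 200 transformations

# The extra 18 polygons cost 18 transformations PER TILE, not 18 total:
extra = (polys_long - polys_short) * tiles_per_line
print(extra)  # 180
```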

In case anyone wondered :)