glDrawElements vs. glDrawArrays - The numbers are in!

Member
Posts: 145
Joined: 2002.06
Post: #16
Quote:Originally posted by henryj
Code:
glLockArraysEXT( 0, size);
glDrawElements( GL_TRIANGLES, indices->Size(), GL_UNSIGNED_INT, indices->Data());
glUnlockArraysEXT();

is a waste of time.
FALSE.

I just ran some tests after an inexplicable framerate drop in my app. Uncommenting the lock arrays calls (used exactly as you show) increased my max polygon rate by over 300% on my GeForce2MX, and by about 100% on the Rage128. Only the iMac/RagePro didn't seem to care whether the arrays were locked or not.

"He who breaks a thing to find out what it is, has left the path of wisdom."
- Gandalf the Gray-Hat

Bring Alistair Cooke's America to DVD!
henryj
Unregistered
 
Post: #17
This is all quite interesting...

I commented out the lock calls in my game and it made NO difference at all, zero, zip.

Apple engineers have said for a while that display lists are slow on OSX, but a colleague of mine has tested them against vertex arrays and they were faster.

What does this mean?
OpenGL performance varies depending on the day, the weather, the colour of your wallpaper?
Tests designed to test performance don't reflect 'real world' conditions?
Who knows?
Best bet is to profile your own code and work from there.

I will download your test and get back to you.
henryj
Unregistered
 
Post: #18
I've given your test prog a spin and this is what I have found...

Within a single run, results for the same test varied by up to 10%.
Across different runs, results varied by up to 20%.

I'm getting around 1100k polys per second on a dual 500 G4 with a Radeon, on 10.1.5. This seems quite slow; I would expect around 4 million polys/sec.

The reason the lock calls made such a difference is that you are rendering the same geometry every frame. This is to be expected, as I said...

Quote:CVA is only going to benefit you if you are touching the same geometry multiple times, e.g. doing multiple passes for lightmapping, or if you have some objects that somehow share exactly the same vertices. You should be calling glDrawElements lots of times between your lock calls.

You are effectively caching your mesh on the video card and repeatedly calling glDrawElements, which is the ideal situation but not very representative. I don't know a lot of games that only have one object.
My game comprises over 200 different meshes and renders about 90 per frame, with about 30 different textures and about 15 different material setups. How about doing a real world test? Load 2 different meshes with different textures and render them alternately.

Good work though. It was quite interesting.
Member
Posts: 145
Joined: 2002.06
Post: #19
Quote:Originally posted by henryj
The reason why the lock calls made such a difference is because you are rendering the same geometry every frame. This is to be expected, as I said...
I lock and unlock the arrays immediately before and after each glDrawElements. Unless the driver is doing something really sneaky (checksumming a small sample of the array data?), the geometry is being re-submitted to the card for every draw. Check the source.

Quote:My game comprises over 200 different meshes and renders about 90 per frame, with about 30 different textures and about 15 different material setups. How about doing a real world test? Load 2 different meshes with different textures and render them alternately.
This wasn't designed to be a real world speed test. It was only designed to determine what the fastest way to submit polygons to the graphics card was.
henryj
Unregistered
 
Post: #20
Quote:I lock and unlock the arrays immediately before and after each glDrawElements. Unless the driver is doing something really sneaky...

The driver may be caching the geometry. How else do you explain the differences we have seen? The best way to tell for sure is to add another mesh.

Quote:This wasn't designed to be a real world speed test. It was only designed to determine what the fastest way to submit polygons to the graphics card was.

Then what's the point? People are going to make decisions about their render path based on this data. If you just wanted to see the theoretical limit, just render one static triangle strip of 1 pixel triangles. That is the fastest method.
I'm not criticising what you have done, it's really good, but why not make it so the data is actually useful? It won't take much more work to add another mesh. You could make 2 TriMesh objects from the same data and draw them alternately. This would tell us whether the driver is doing a comparison of the pointers being passed to gl*Pointer(). Being sneaky, as you said.
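A minimal sketch of the alternating-mesh test suggested here, not code from the test program. `BenchResult` and `benchAlternating` are hypothetical names, and each entry in `drawMesh` is assumed to be a callback that wraps one mesh's gl*Pointer() setup plus the glLockArraysEXT / glDrawElements / glUnlockArraysEXT sequence. The GL calls are left out so the harness compiles without a rendering context.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical result record: how many draws were issued and how long they took.
struct BenchResult {
    std::size_t drawCalls;
    double seconds;
};

// Calls the draw callbacks round-robin, one per frame, so no mesh is
// submitted twice in a row when there are two or more meshes.
// In a real test each callback would issue the GL calls for one mesh,
// and glFinish() should run before the clock is stopped so queued work is counted.
BenchResult benchAlternating(const std::vector<std::function<void()>>& drawMesh,
                             std::size_t frames) {
    assert(!drawMesh.empty());
    const auto start = std::chrono::steady_clock::now();
    std::size_t calls = 0;
    for (std::size_t f = 0; f < frames; ++f) {
        drawMesh[f % drawMesh.size()]();  // mesh 0, mesh 1, mesh 0, ...
        ++calls;
    }
    const auto end = std::chrono::steady_clock::now();
    return {calls, std::chrono::duration<double>(end - start).count()};
}
```

Running it once with two callbacks that point at the same vertex data and once with two separate copies of that data would help separate a pointer comparison in the driver from real caching of the geometry.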

I for one would be interested in the results.
Member
Posts: 145
Joined: 2002.06
Post: #21
Quote:Originally posted by OneSadCookie
That's really strange that randomizing the triangle order would be consistently better... maybe you've got something else going on (if it were Radeon/GF3, I'd say Z-buffer, but that doesn't seem to make sense for Rage Pro).
Alright, blast from the past time, but I think I've figured out why this strange behavior occurs.

Let's say it takes a non-trivial amount of CPU computation to throw out a back-facing poly, and that the video card does not have a draw queue, or has a very small one. If you sort the polygons into tri-strips, it's likely that there are long groups of sequential back-facing and front-facing polygons. During the back-facing groups the graphics card is idle while the CPU throws out the polys. During the front-facing groups the CPU is idle while it waits for the graphics card to accept the next poly.

Now, if you randomize the polygons, the chance of a sequence of more than a few front- or back-facing polygons becomes negligible. This means that while the CPU is waiting to submit the next polygon, there's a good chance that it could throw out a back-facing polygon instead. I think this slight efficiency gain might be enough to create the ~7% speed increase seen from randomizing the polygons.
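The theory can be illustrated with a toy timing model. This is a sketch under stated assumptions, not a measurement: `simulateFrame`, `cullCost` and `drawCost` are hypothetical names and values, the card is assumed to have no draw queue at all, and the CPU is assumed to stall until the card accepts each front-facing polygon.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model (assumed, not measured): the CPU spends cullCost time units
// testing each polygon's facing. Back-facing polygons are then discarded.
// Front-facing polygons are handed to the card, which needs drawCost units
// per polygon and cannot accept a new one while it is still drawing.
// Returns the time at which both the CPU and the card are finished.
long simulateFrame(const std::vector<bool>& frontFacing, long cullCost, long drawCost) {
    assert(cullCost >= 0 && drawCost >= 0);
    long cpuTime = 0;    // when the CPU finishes its current work
    long gpuFreeAt = 0;  // when the card can accept the next polygon
    for (bool front : frontFacing) {
        cpuTime += cullCost;  // facing test happens on the CPU
        if (front) {
            const long start = std::max(cpuTime, gpuFreeAt);
            cpuTime = start;               // CPU stalls until the card accepts
            gpuFreeAt = start + drawCost;  // card is busy drawing this polygon
        }
    }
    return std::max(cpuTime, gpuFreeAt);
}
```

With `cullCost = 1` and `drawCost = 2`, four front-facing polygons followed by four back-facing ones take 11 time units in this model, while the same eight polygons strictly alternated take 9, because each facing test overlaps with a draw already in progress.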

As a result, if you're really targeting the Rage Pro (and to do that you've got to be insane, working on a demo, or hopelessly stuck in the near past), it might be interesting to try ordering your polygons so that any two sequential polygons face as close to directly away from each other as possible. This would pretty much guarantee that the CPU throws out one poly while it waits for another to draw, every time.
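One way to try that ordering is a greedy pass over the face normals. This is only a sketch: `Vec3`, `dot` and `orderByOpposingNormals` are hypothetical names, the normals are assumed to be unit length, and the greedy choice is a simple O(n^2) heuristic rather than an optimal ordering. It also gives up tri-strips, since the output is an arbitrary triangle order.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec3 = std::array<float, 3>;

// Dot product of two 3-component vectors.
float dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Greedy ordering: start with polygon 0, then repeatedly pick the unused
// polygon whose normal has the smallest dot product with the previous one,
// i.e. the one facing most nearly the opposite way.
// Returns polygon indices in the new draw order.
std::vector<std::size_t> orderByOpposingNormals(const std::vector<Vec3>& normals) {
    std::vector<std::size_t> order;
    if (normals.empty()) return order;
    std::vector<bool> used(normals.size(), false);
    std::size_t current = 0;
    used[current] = true;
    order.push_back(current);
    while (order.size() < normals.size()) {
        std::size_t best = normals.size();  // sentinel: nothing chosen yet
        float bestDot = 0.0f;
        for (std::size_t i = 0; i < normals.size(); ++i) {
            if (used[i]) continue;
            const float d = dot(normals[current], normals[i]);
            if (best == normals.size() || d < bestDot) {
                best = i;
                bestDot = d;
            }
        }
        assert(best < normals.size());
        used[best] = true;
        order.push_back(best);
        current = best;
    }
    return order;
}
```

For two polygons facing +Z followed by two facing -Z, the result interleaves them, so every polygon is followed by one facing the other way.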
Luminary
Posts: 5,143
Joined: 2002.04
Post: #22
Wow, this is a real blast from the past!

Sounds like a good theory. I'm just glad it's a non-issue these days :)
Moderator
Posts: 608
Joined: 2002.04
Post: #23
Completely off topic, but speaking of the past, did anyone else notice who posted to this thread? Maybe I am seeing ghosts...