View Full Version : Fast face normals
Ian Kerr
2002.08.28, 12:25 AM
In my app I can get about 65fps for a 5.5K model with lighting, texturing etc. However I only get this when I don't update my normals (note that I am still passing the normals to OGL, just not updating them). When I do update my normals then it slows down to about 13fps.
I am calculating the normals by finding the cross product of the face. Does anyone know of a really fast way to generate face normals or is it better to precompute them and interpolate them at runtime (I'm using animated models)?
monteboyd
2002.08.28, 12:33 AM
This is just a guess but would it be quicker if you transformed the normals the same way you transform the faces?
rangaroek
2002.08.28, 04:10 AM
Originally posted by Ian Kerr
In my app I can get about 65fps for a 5.5K model with lighting, texturing etc. However I only get this when I don't update my normals (note that I am still passing the normals to OGL, just not updating them). When I do update my normals then it slows down to about 13fps.
I am calculating the normals by finding the cross product of the face. Does anyone know of a really fast way to generate face normals or is it better to precompute them and interpolate them at runtime (I'm using animated models)?
Hi, you need to update your normals when your geometry changes (character animation) or when you use scaling transforms. So if both cases are not true for you don't recalculate your normals. For character animation you can precalculate normals for the keyframes of an animation. Then you can interpolate the normals between the keyframes. (it's not exact but fast).
When you want to use scaling transforms use only uniform scaling (sx=sy=sz)
and enable GL_RESCALE_NORMALS so GL recalculates the normals for you.
By the way, when you need to transform your normals to world space (face normals for collision detection etc), you do it by multiplying them with the inverse transpose of the matrix which you use to transform your vertices.
The correct way to calc normals for a vertex is:
1: calc the unnormalized normal for every face
2: for every vertex take the sum of all face normals of faces which share the vertex
3: div the result by the count of faces the vertex share
4: normalize the result
hopes it helps
rangaroek
codemattic
2002.08.28, 09:05 AM
Originally posted by rangaroek
The correct way to calc normals for a vertex is:
1: calc the unnormalized normal for every face
2: for every vertex take the sum of all face normals of faces which share the vertex
3: div the result by the count of faces the vertex share
4: normalize the result
hopes it helps
rangaroek
I dont think you need #3. Dividing a vector by a scalar and then normalizing it would leave you with the same vector as if you only normalized it. Right? Im guessing you are doing #3 because you are thinking of *averaging* the vectors - but it doesnt seem necessary to me.
Hope I just saved you one divide per normal!
Codemattic
rangaroek
2002.08.28, 11:38 AM
Originally posted by codemattic
I dont think you need #3. Dividing a vector by a scalar and then normalizing it would leave you with the same vector as if you only normalized it. Right? Im guessing you are doing #3 because you are thinking of *averaging* the vectors - but it doesnt seem necessary to me.
Hope I just saved you one divide per normal!
Codemattic
You are completely right, step three is obsolete ... and I didn't find it in my code ... it was a little mistake of mine ... :)
bye, rangaroek
Shouldn't your normalize the normals before you sum them?
OneSadCookie
2002.08.28, 04:20 PM
Should you? The un-normalized normal for a face is going to have length roughly proportional to the area of that face.
So if all your faces are roughly similar in size, you probably won't notice the difference, and if some faces are much larger than others, you probably wanted to give them more weight than the smaller faces anyway.
That sound reasonable? (All off the top of my head...)
Hmm, that seems reasonable actually.
Btw, tip for the original poster: If you're using CW and only targeting 604s and later, you can use __frsqrte(x), which estimates 1/sqrt(x) in like 4 cycles (as opposed to several hundred for the real thing).
__fsqrt(1.000000) = 0.984375 = 1/sqrt(1.031998)
__fsqrt(0.250000) = 1.968750 = 1/sqrt(0.258000)
__fsqrt(4.000000) = 0.492188 = 1/sqrt(4.127992)
__fsqrt(2.000000) = 0.695312 = 1/sqrt(2.068426)
__fsqrt(0.500000) = 1.390625 = 1/sqrt(0.517106)
There's probably some iterative process that can produce more accurate values if you need them.
Ian Kerr
2002.08.29, 01:28 AM
What header file is __fsqrt in?
gack, I typed the printf wrong :)
It's __frsqrte(), and it's not in any header file. It's an intrinsic function that compiles to a single instruction. This only works in CodeWarrior.
OneSadCookie
2002.08.29, 06:00 PM
Or PB/GCC in Jaguar:
#include <ppc_intrinsics.h>
note that __fsqrte is for doubles, there's a separate __fsqrtes for floats.
Originally posted by OneSadCookie
note that __fsqrte is for doubles, there's a separate __fsqrtes for floats.
It's __frsqrte (http://publibn.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixassem/alangref/frsqrte.htm), and there isn't any __fsqrtes.
You're probably thinking of __fsqrt (http://publibn.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixassem/alangref/fsqrt.htm) and __fsqrts (http://publibn.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixassem/alangref/fsqrt.htm). I'm not sure which chips these are implemanted on. If they are on any, they're probably SLOW (though not as slow as sqrt()).
OneSadCookie
2002.08.29, 09:42 PM
Ooops, sorry, made the same mistake as you :p
The functions in ppc_intrinsics are called __frsqrte and __frsqrtes, but both compile to a single frsqrte instruction. Guess that means it doesn't really matter if you use __frsqrte for floats.
__frsqrte() wasn't accurate enough, I decided to make an improved version:
float inv = __frsqrte(f);
float better = 0.5*(__fres(f*inv) + inv);
This is derived from the fact that 1-k*x^2=0 when x=1/sqrt(k)
sample results:
input frsqete check improved check
1.000000 0.984375 1.031998 1.000000 1.000000
0.250000 1.968750 0.258000 2.000000 0.250000
4.000000 0.492188 4.127992 0.500000 4.000000
2.000000 0.695312 2.068426 0.707153 1.999737
0.500000 1.390625 0.517106 1.414307 0.499934
3.141593 0.558594 3.204851 0.564209 3.141376
2.718282 0.609375 2.692965 0.606445 2.719047
0.318310 1.765625 0.320777 1.772461 0.318307
0.367879 1.648438 0.368006 1.648682 0.367897
314.1592 0.056641 311.7051 0.056419 314.1546
271.8281 0.061523 264.1914 0.060654 271.8226
the check columns are 1/x^2 of the previous column
Here's an ASM version of it:
static const float gOneHalf = 0.5;
double asm fastSqrtRecip(register double x) {
frsqrte fp2,x; // inv = 1/sqrt(x)
lfs fp3,gOneHalf(RTOC);
fmuls x,x,fp2; // f*inv
fres x,x; // 1/(f*inv)
fadds x,x,fp2; // 1/(f*inv)+inv
fmuls x,x,fp3; // .5*(1/(f*inv)+inv)
blr;
}
My guess is that this takes around 20 cycles on a G3 or G4 (two cycle throughput for regular instructions, load is free if cached, fres takes 10 (yes, 2 cycle throughput for frsqrte)). Properly pipelined into the normal calculations that could be dramatically reduced.
Here's the complete derivation for those that might be interested:
x = 1/sqrt(k)
x*sqrt(k) = 1
kx^2 = 1
kx^2 - 1 = 0
if we have a guess (g) for a solution of that, we can use a technique who's name I can't remember to refine this solution.
f(x) = kx^2 - 1
f'(x) = 2kx
better guess = g - f(g)/f'(g)
= g - (kg^2 - 1)/(2kg)
= g - (g/2 - 1/2kg)
= g - g/2 + 1/2kg
= g/2 + 1/2kg
= (g + 1/kg)/2
UPDATE! I have been informed that the refinement method I refered to is known is the Newton-Rhapson Method.
kainsin
2002.08.31, 12:04 PM
Math rules.
Originally posted by kainsin
Math rules. Well put.
BTW, a friend of mine realized this just now:
(1/sqrt(x))*x = sqrt(x)
DOY!
so to summarize: static const float gOneHalf = 0.5;
// calculate 1/sqrt(x) with pretty good accuacy
// x is double to prevent frsp generation before call
// return type is float to prevent frsp generation after call
float asm fastSqrtRecip(register double x) {
frsqrte fp2,x; // inv = 1/sqrt(x)
lfs fp3,gOneHalf(RTOC);
fmuls x,x,fp2; // f*inv
fres x,x; // 1/(f*inv)
fadds x,x,fp2; // 1/(f*inv)+inv
fmuls x,x,fp3; // .5*(1/(f*inv)+inv)
blr;
}
// calculate sqrt(x) with pretty good accuacy
// x is double to prevent frsp generation before call
// return type is float to prevent frsp generation after call
float asm fastSqrt(register double x) {
frsqrte fp2,x; // inv = 1/sqrt(x) // inv = 1/sqrt(x)
lfs fp3,gOneHalf(RTOC); // load 0.5
fmuls x,x,fp2; // f*inv
fres fp2,fp2; // 1/inv
fadds x,x,fp2; // 1/inv+f*inv
fmuls x,x,fp3; // .5*(1/inv+f*inv)
blr;
}
vBulletin® v3.6.8, Copyright ©2000-2008, Jelsoft Enterprises Ltd.