View Full Version : Random Optimization Questions
wyrmmage
2007.06.11, 07:02 PM
So, I was bored, and wondering about optimizing things...here are some questions that I'm curious about; some are more theoretical, some aren't :)
When the C language was being programmed, it was programmed in assembler, right? As far as I know, assembler automatically optimizes the code for the processor that the code is compiled with, so how did the good people who made C get it to not be processor specific?
Why does C enforce concrete variable types like 'int', 'double', and 'char', while assembler does not?
Does C/C++ optimize switch statements more than if/else if/else statements? A friend of mine told me that it does, but I've never heard of that, so I thought that I'd ask...would I get any (however small)of an optimzation out of something like this:
int i = 20;
switch(i)
{
case 5:
cout << "is five";
break;
case 20:
cout << "is twenty";
break;
default:
cout << "is not five or twenty";
break;
}
over something like this:
int i = 20;
if(i==5)
{
cout << "is five";
}
else if(i==20)
{
clout << "is twenty";
}
else
{
cout << "is not five or twenty";
}
?
In C++, does setting something to 'protected' or 'private' instead of 'public' have anny optimization benefit over setting it to public, or are those purely for information hiding?
Is there any difference between these two declarations:
public void whatever();
public void whatever(void);
?
Ok, so, in C++, I have some code like this:
bool hasTexture;
bool hasMaterial;
//several other booleans
public void draw()
{
if(hasTexture)
{
//bind texture
}
if(hasMaterial)
{
//attach material
}
//several more 'if's that do things before drawing
}
so, when I instantiate my class, I set all of the boolean variables. When I draw my object, based on if the booleans are true or false, my code does something (binds the texture, etc.). This 'something' never changes. Now, couldn't you (theoretically) have something that, when you instantiated the class and set the boolean variables, went through and took all of the code that would execute if the booleans were 'true', and compile it into binary? Then, when you called your drawing code, you would just execute the binary, and save yourself the time of evaluating however boolean if statements you had. Obviously this would be a really small optimization, but it still seems like a cool thought to me ;)
Just curious...
Thanks :)
-wyrmmage
AnotherJake
2007.06.11, 07:28 PM
Sorry I don't have any specific answers for you here, I gave up worrying about little details in performance many years ago in favor of easier to write code. I am certain others will chime in with more specifics.
Pre-mature optimization is indeed the root of all evil. Trying not to do really dumb things can help, but whatever. Write it, and if it works great then it's written correctly ;) If it's a little slow, then profile and see if you can tighten up a few spots.
As to the assemblers and optimizations? All assemblers have to target a specific processor, and all processors have certain ways they work best for given situations, so optimizations across the processor spectrum are largely impossible. C is not assembly language, it's a higher-level language which sits on top of the assembler/machine code level, so that's how it can be the same across platforms.
C enforces data types because it can. It's one of the features of the language to help you write better, bug-free code. Assemblers vary on their features, but they usually assume you know what you meant, and don't try to stop you from shooting yourself in the foot if you so desire.
As I understand it, switch statements are often faster than if-else, but I don't usually care unless it makes the code prettier ;)
wyrmmage
2007.06.11, 07:38 PM
hmmm...thanks for the notes :)
I'm not actually trying to implement any of this stuff at the moment, I was just bored and thought I'd ask. I know that assembler code optimized for the specific processor, I'm just wondering how the people who made C managed to avoid that.
Thanks for the input :)
-wyrmmage
AnotherJake
2007.06.11, 07:42 PM
I know that assembler code optimized for the specific processor, I'm just wondering how the people who made C managed to avoid that.
They didn't. You can't. Not for different language architectures anyway. Often times optimizations for one processor in a *family* of processors, like the POWER architecture or X86 architecture, can affect performance across the family, but not across different architectures.
wyrmmage
2007.06.11, 07:44 PM
But...a program written in C that is compiled on an Intel processor seems to run at the same speed on an AMD processor (well, not really, because some processors run at different speeds, etc., but what I'm saying is that C code doesn't seem to be faster or slower on different processors in the same way the Assembler is; why?)
-wyrmmage
AnotherJake
2007.06.11, 07:55 PM
Oh, I see what you're asking!
The C compiler can take slight differences between the Intel processor and the AMD processor into consideration and generate [an instruction stream] which optimizes for those conditions. Writing assembler by hand to work on both of them wouldn't do that unless you specifically made the adjustments yourself. Nowadays though, you'd be pretty hard-pressed to out-geek the compilers!
Compilers always try to generate optimal instructions for the processor they are targetting. Some compilers are better at it for any given processor than others, but they are almost always better at it than anybody hand-writing with assembly nowadays.
wyrmmage
2007.06.11, 10:06 PM
ok, thanks, that answers a lot of my questions :)
So...does anyone know the answer to any of my other questions?
Thanks again :)
-wyrmmage
akb825
2007.06.11, 10:10 PM
Private/protected/public are only hints for visibility.
Having void in the middle of the parenthesis is the same as having nothing at all.
Your last thing isn't really possible without having separate functions or versions of the class to handle each case separately without if statements.
OneSadCookie
2007.06.11, 11:50 PM
Having void in the middle of the parenthesis is the same as having nothing at all.
In C++ that's true, in C it's not.
Your last thing isn't really possible without having separate functions or versions of the class to handle each case separately without if statements.
Or doing something like just-in-time compilation.
Switch statements can occasionally be optimized into a computed jump, however, modern processors don't like computed jumps, so I don't think I've ever seen the compiler do anything other than a bunch of if/else statements.
Basically, none of what you've asked about is at all relevant to optimization :p
The first thing you should optimize for is algorithms and data structures -- those usually make the biggest difference.
Once you've got the best algorithms and data structures you can think of, function inlining is usually the biggest win. You can often get away going only this far.
I'd put vectorization at the next kind of level, since it's a lot of work for a fairly unknown performance benefit, but I can see arguments for it being either higher (since it's often an algorithmic modification) or lower (since it's harder than many of the things below).
Then you get into micro-optimizations like
* avoiding expensive math operations (sqrt, /, etc)
* shadowing globals with locals
* unrolling loops and interleaving iterations
* avoiding branches
* issuing cache prefetch instructions
* etc
There's quite a lot of speed you can gather at this level if you know what you're doing, but it's usually not at all worthwhile to bother trying :)
AnotherJake
2007.06.12, 01:22 AM
The first thing you should optimize for is algorithms and data structures -- those usually make the biggest difference.
As an example: Recently I had changed a duplicate vertex removal routine to use a hash lookup (just an NSMutableDictionary) instead of a brute-force linear search through an array in straight C. Even with the overhead of Obj-C messaging to use the NSDictionary, the gain was huge! For relatively large amounts of vertices, on the order of 75,000 triangles, it went from taking around seven minutes, to taking around seven seconds. I could have optimized the linear search until the computer bled bits and it wouldn't have come remotely close to simply changing the approach. I am certain I could shave another five or six seconds off of that by moving the whole thing to straight C, using a nice balanced tree. But seven seconds to load a 75k poly model [and removing duplicate vertices [and tessellation!] at the same time] seems more than adequate to me considering that's about as much as I'd ever want to push per frame in one scene on modern hardware right now anyway.
As an example: Recently I had changed a duplicate vertex removal routine to use a hash lookup (just an NSMutableDictionary) instead of a brute-force linear search through an array in straight C. Even with the overhead of Obj-C messaging to use the NSDictionary, the gain was huge! For relatively large amounts of vertices, on the order of 75,000 triangles, it went from taking around seven minutes, to taking around seven seconds. I could have optimized the linear search until the computer bled bits and it wouldn't have come remotely close to simply changing the approach. I am certain I could shave another five or six seconds off of that by moving the whole thing to straight C, using a nice balanced tree. But seven seconds to load a 75k poly model [and removing duplicate vertices [and tessellation!] at the same time] seems more than adequate to me considering that's about as much as I'd ever want to push per frame in one scene on modern hardware right now anyway.I may show my own ignorance here but... I imagine an NSMutableDictionary is O(1) or O(n) depending on how much it sucks right? So I'm curious what your linear search was? If it was really O(n)... them's some big constant factors.
AnotherJake
2007.06.12, 02:06 AM
Ignorance? Ha! I fart in your general direction! I know NOTHING about big-O notation because I suck at math that bad. As far as I know, it went from O(alot) to O(notsomuch)...
And maybe even `linear search' isn't the proper terminology here (I'm just a dumb hobbiest with no CS training), but it went something like this:
- search for existing vertex in final vertex array by starting at index 0 and iterating through the whole thing until you find one matching the vertex, normal and texCoord
- if you find a match then stop searching and add that index to the index array
- otherwise, after you've searched the whole friggen array and didn't find a match, add the new unique vertex to the vertex array and add the new index to the index array
[for the fast search] I generate the key for the NSMutableDictionary by simply formatting an NSString with %0.3f for each value of the vertex (v3, n3, t2). I tried it with %0.6f and it worked as expected: didn't weld as many vertices because the tolerance was lower. So far I haven't seen the key hash fail to make a unique value for each vertex like this, but I haven't tested it thoroughly (maybe only a half-dozen test models). Sure would be nice if it stays solid!
I was originally planning to go the tree route but figured dictionaries are so easy to work with that it was at least worth a try. So far so good!
[edit] this is the actual [fast] search code:
- (void)addVertexDataToCurrRenderingGroup:(VertexDat a *)vertexData
{
NSString *key;
NSData *data;
NSNumber *index;
totalVertsBeforeDupeRemoval++;
key = [NSString stringWithFormat:@"%0.3f%0.3f%0.3f%0.3f%0.3f%0.3f%0.3f%0.3f",
vertexData->vertex[0], vertexData->vertex[1], vertexData->vertex[2],
vertexData->normal[0], vertexData->normal[1], vertexData->normal[2],
vertexData->texCoord[0], vertexData->texCoord[1]];
index = [vertexDataDictionary objectForKey:key];
if (index == nil)
{
data = [NSData dataWithBytesNoCopy:vertexData length:sizeof(VertexData) freeWhenDone:NO];
[vertexDataArray addObject:data];
index = [NSNumber numberWithUnsignedInt:[vertexDataArray count] - 1];
[vertexDataDictionary setObject:index forKey:key];
[vertexDataIndexArray addObject:index];
}
else
{
[vertexDataIndexArray addObject:index];
}
}
akb825
2007.06.12, 03:11 AM
For vertex removal, I believe it's usually O(n^2), since for every vertex you need to look at every other for duplicates. For that I I've used a std::map in C++, which is O(lg(n)) since it's using a tree, which improves the overall algorithm to O(n*lg(n)). When doing this, I would get improvements from several minutes to a few seconds with large meshes having hundreds of thousands of vertices. I suppose I also could have used a hash_map (should be more or less the same to NSDictionary), but I would have had to come up with a hashing function, (or could have just created a string) and also it would have the cost of rehashing constantly, which I figured would hurt it in the end. (though it would be interesting to actually test the 2)
There are oftentimes several choices such as this that you need to consider when choosing a data structure to be at the heart of your algorithm. If you have a tight loop and/or recursion with just some math then another call, then you may need to worry about things like expensive math, doing branches, etc. If you're using other things like a data structure that ends up dominating your algorithm, it isn't nearly as important.
Just as an FYI: using GCC, if you choose optimization level 3, it ends up doing a lot of optimizations for you. Things like unrolling loops, re-aranging instructions, inlining, etc. (in fact, I had one case where manually unrolling a loop made something noticeably slower with that level of optimization enabled)
AnotherJake
2007.06.12, 03:32 AM
For vertex removal, I believe it's usually O(n^2), since for every vertex you need to look at every other for duplicates.
That sounds about right in my situation -- still don't know what big-O notation means, but little models seemed to load instantaneously and it just kept getting slower and slowwer and sssllloooowwwwwerr the bigger the models got in polycount, using the `linear search' I was talking about.
Again, it just shows that higher level optimizations in algorithms and data structures can save truck-loads of time. Screw cycles, we're talking minutes in some cases!
akb825
2007.06.12, 05:38 AM
Big-O notation is a rough estimate of the number of operations you need to perform based on how many items you have. In this case, the items are vertices, so n is the number of vertices. In order to walk through an array, it O(n), since you must visit every element of the array. In this case, you walk through the array, then for each item you walk through it again. (to find the duplicates) So you have O(n) for the first level, and for each of those n steps you also have O(n) for the second level. Together they are O(n*n), or O(n^2).
In my case with a tree, a tree takes O(lg(n)) to traverse. (where lg(n) is log based 2 of n) This is because for each level of the tree you can cut out half of the remaining subtree by choosing to go left or right. You still have to visit each vertex once when you're searching for duplicates of that vertex, so on the top level you still have O(n). However, instead of a linear search for each of those n steps, you have the tree lookup, so you have O(lg(n)) on each step. This means you have O(n*lg(n)) total.
Generally, if you do a series of actions, then afterwards you do another series of actions, you add the 2 time costs together. (though for the estimate you only take the bigger of the 2, since that's the one that dominates your runtime) If you do an action within another action, such as nested loops, you multiply the time costs. As you may have noticed, this isn't an exact representation of the runtime, since constants are ignored and so are smaller terms that are added: it's meant to be an estimate on how the time grows when n grows large.
I hope that could clear some things up.
OneSadCookie
2007.06.12, 07:56 AM
To expand slightly, there are 3 functions for measuring algorithmic complexity. O(f) is an upper bound on the complexity; Ω(f) is a lower bound on the complexity, and Θ(f) is a tight bound on the complexity (which only exists where O(f) = Ω(f)).
For example, "quicksort" is Ω(n log n) but O(n^2), where "insertion sort" is Ω(n) but O(n^2) and "merge sort" is Θ(n log n).
Notice that these say nothing about constant factors, how often the best or worst cases actually get hit or anything, they're just an indication of proportionally how much longer the algorithm will take to run as its input size increases.
As a kind of throw-away line, it's usually best to keep ones algorithms inside O(n log n) if at all possible ;)
AnotherJake
2007.06.12, 12:30 PM
I think I might actually get that! Maybe...
So in my case I'm searching for duplicates for each vertex, so the first operation is O(n). Then I'm doing a hash lookup, but I don't know what the efficiency of that is. Perhaps in the best case scenario it's O(1) (O(Ω) right?) and the worst case it's O(n) (O(f) right?) as Josh suggested. Could you then say that an unknown hash algorithm could be expected to yield O(n/2) as an average of Ω and f? If so, then I guess the vertex removal would work out to O(n * n/2) or O((n^2)/2) right? But according to my results with the NSDictionary it sounds more like I got similar benefits to akb's tree route, but perhaps not quite as good. Would it then be okay to say I'd estimate that it's perhaps more like O(n 2log(n)) ?
AnotherJake
2007.06.12, 12:46 PM
Hmm... Well after doing a short Google on it, it appears that since the probability of O(n) happening in a hash look up is so small, both the best case and average case are considered O(1). That's pretty incredible, because according to how I understand this now, the vertex removal would work out to O(n * 1), or simply O(n). Wow! The only thing limiting speed is the hashing function, which in my case is a combination of formatting a string for the key (fairly slow I suppose) and whatever NSDictionary uses to hash that key behind the scenes. Oh, plus obj-c messaging overhead, but that appears to be an awful lot lower than I used to think it was!
[edit]I just realized that I don't think O(n) can't happen with an NSMutableDictionary because if an equivalent object is found in the dictionary it will be replaced with the one being added. And as I read somewhere in the Apple docs, an object is considered equal if its hash is equal, so I guess there won't be any collisions. I could be all wrong about this though...
wyrmmage
2007.06.12, 01:10 PM
Very interesting stuff...thanks for all of the answers and tips guys! ^_^
I tried to avoid the problem of fast loading times for meshes by running all of my files (which were VRML 2.0 (.wrl)) through a program that I made that 'unrolled' the format and put it into a .mesh custom format that I made. This reduces the load time considerably, and puts the waiting time on the artist/programmer when they're making the game, instead of the gamer waiting for the game to load.
Again, thanks for all of the tips :)
-wyrmmage
AnotherJake
2007.06.12, 01:41 PM
I tried to avoid the problem of fast loading times for meshes by running all of my files (which were VRML 2.0 (.wrl)) through a program that I made that 'unrolled' the format and put it into a .mesh custom format that I made. This reduces the load time considerably, and puts the waiting time on the artist/programmer when they're making the game, instead of the gamer waiting for the game to load.
I would most certainly advocate loading pre-optimized formats for any final release of a game.
In my case, I wanted to be able to load obj files from various sources, and specifically from my own 3D editors *during development*. I can't stand having to filter output into my own format every single time I want to try out a new object. A person could make the argument that you could write an export plugin for your 3D editor, and that is totally true, and something I have done as well. But the converse to that is: what about downloading models off the internet to act as stand-ins or to try them out? In that case, obj is pretty widely accepted, so I went with it for that purpose. So in this case (development) it makes lots of sense to do filtering on load. Heck I even added automatic tessellation a few days ago, so now I don't even have to bother triangulating before I export from whatever 3D editor I'm in at the time. A lot of models out there aren't made for games, and so contain un-optimized geometry such as n-sided polygons, and even complex polygons -- specifically, I have run across concave polygons in an obj file, even though I think the obj spec says there shouldn't be any concave polygons. So the bottom line is that during development, load times aren't about the gamer but me! I need to retain my sanity too. :)
[adding] I would much rather put up with seven seconds on load for an unknown 75,000 polygon model, rather than having to import that into a 3D editor, select options to triangulate it, export it back to disk, filter it through my own optimizer, and *then* load it. If I only did that once in a while it wouldn't bother me, but if you do that more than a dozen times in a day it gets old real fast!
Cochrane
2007.06.12, 02:41 PM
[edit]I just realized that I don't think O(n) can't happen with an NSMutableDictionary because if an equivalent object is found in the dictionary it will be replaced with the one being added. And as I read somewhere in the Apple docs, an object is considered equal if its hash is equal, so I guess there won't be any collisions. I could be all wrong about this though...
You are :p
If two objects are equal (as determined by isEqual:), then they have to have the same hash. However, this doesn't work in reverse. Two objects can have the same hash yet be not equal. As an example, NSString considers the first eight characters, the last eight characters and the length when creating a hash value. It's easy to construct two strings that are not equal but give equal hash values. An example can be found here: http://www.omnigroup.com/mailman/archive/macosx-dev/2003-April/045655.html (first thing that popped up in Google)
akb825
2007.06.12, 02:48 PM
The hash table would be Ω(1), but would be O(n) for a lookup. (remember: Ω is the lower bound, O is the upper bound) So your overall algorithm is Ω(n), but O(n^2). However, there are also other things you must consider, mainly when you're storing the vertices: inserting a single item is actually Ω(1) and O(n^2). This is because for every item, you can insert it directly to where the hash function says to, but it's possible that you will need to visit every item to insert it. That by itself isn't that bad, but when the hash table gets full, it must then rehash the items, or allocate a new array and insert each item individually using the hash function. That's where you get the O(n^2) from: you have n items, each of which is O(n) to insert since it's possible that it would need to visit every position in the table. Therefore, to insert each of your vertices into the hash table in the beginning, it's Ω(n), but it's O(n^3). Its average case isn't nearly as bad as n^3, probably much closer to between n and n^2 since you will need to rehash throughout your insert, but this is why I chose not to use a hash table.
I generally use a hash table when I have relatively few inserts, but lots of lookups, but a tree-based map when I have around the same number of inserts and lookups. (maybe somebody with more experience, such as OSC, can input their advice about this tactic?) When I do vertex elimination, I actually let the insert do all the work (using collisions to determine what index to use), then use an iterator to copy the vertices to an array when I'm done. In that case, I have n inserts but 0 lookups. (the time complexity is still the same as your method just with a tree instead of a hash table, though: Θ(n*lg(n)); with what I said above, your overall algorithm with a hash table would be Ω(n), but O(n^3), though the average case isn't nearly as bad as the worst case)
AnotherJake
2007.06.12, 03:03 PM
You are :p
If two objects are equal (as determined by isEqual:), then they have to have the same hash. However, this doesn't work in reverse. Two objects can have the same hash yet be not equal. As an example, NSString considers the first eight characters, the last eight characters and the length when creating a hash value. It's easy to construct two strings that are not equal but give equal hash values. An example can be found here: http://www.omnigroup.com/mailman/archive/macosx-dev/2003-April/045655.html (first thing that popped up in Google)
Thanks for pointing that out! I was wondering what the hash for strings would be. Unfortunately, that example you linked to must be out of date because I tried it and it did return two different hashes. Now I gotta go find out what the current rules are on string hashing and find out when they changed, because there could be a significant number of digits in the keys I'm generating, depending on the model, and that'd really screw things up. My confidence in the solidity of my method of using NSDictionary for duplicate vertex removal just took a big hit. :(
AnotherJake
2007.06.12, 03:09 PM
The hash table would be Ω(1), but would be O(n) for a lookup. (remember: Ω is the lower bound, O is the upper bound) So your overall algorithm is Ω(n), but O(n^2). However, there are also other things you must consider, mainly when you're storing the vertices: inserting a single item is actually Ω(1) and O(n^2). This is becau ... blahblahblahblah...
:blink:
Yeah, okay, I totally can't follow that. I guess I'm stuck in ignorance for now. Blissfully so though.:)
aarku
2007.06.12, 03:14 PM
:blink:
Yeah, okay, I totally can't follow that. I guess I'm stuck in ignorance for now. Blissfully so though.:)
It's good to know. :) No sense optimizing an implemention of an inefficient algorithm.
http://en.wikipedia.org/wiki/Polynomial_time
-Jon
AnotherJake
2007.06.12, 03:38 PM
Thanks for the link and the little extra push, Jon.
My issue at the moment is not optimization of my method though; it's that according to that link Cochrane put up, it might not work at all! Although it has worked perfectly so far, so I don't know what's up with that yet. The speed is absolutely wonderful as far as I'm concerned, and I'd be pretty hard-pressed to beat the code simplicity. But if it can break, it's pretty much just a quick hack which may or may not work, depending on it's input. Doing a tree may very well be best in this situation after all (as I initially thought it would be), but I still haven't figured out a bullet-proof way to make a search key for that either.
BTW akb, after reading your post another ten times, it does start to make a little more sense to me. Looking at it from the standpoint of a hobbiest, this whole thing on polynomial time and big-O notation is easily one of the tougher things I've ever attempted to grasp. I really appreciate everybody's attempt here to shed a bit of light on my dim understanding of it.
akb825
2007.06.12, 04:30 PM
This is the main idea with Big-O notation: Ω(f(n)) gives you the asymptotic lower bounds of an algorithm. That means that your algorithm can be no faster than the cost function f(n). O(g(n)) gives you the asymptotic upper bounds of an algorithm, which means that your algorithm can be no slower than the cost function g(n). If f(n) == g(n), then it is Θ(f(n)), so your algorithm is directly proportional to f(n). In the case of a hash table, looking up an item could hit it directly each time if you have perfect hashing, so it is Ω(1). However, if you have lots of collisions, it's possible (though highly unlikely) that you need to visit each item, so it is O(n). There are similar cases for insertion, which I mentioned in my last post.
For your algorithm not working, you could make a simple test to see if the hash table checks to see if strings with identical hash values are stored separately. Otherwise, you could use a tree. For the stl map (which is what I've used), you only need to define less-then relations. What I do is I go element by element to see if they are less than the corresponding element. For example, I first check to see if this object's x position is < the other object's x position. If so, I return true (this object < the other object). If this x position > the other x position, then I return false. Otherwise, if they are equal, I move on tot the y direction, then the z direction, then the normal's x component, and so on. This works fine if you already have exact copies of the vertices (such as loading from a file), but if you're doing any kinds of calculations where they might be off by a rounding error, you may need to add an epsilon in your compares to ensure you aren't treating 2 equal vertices as unequal by a 0.0000000001 discrepancy. :p
Cochrane
2007.06.12, 06:07 PM
Thanks for the link and the little extra push, Jon.
My issue at the moment is not optimization of my method though; it's that according to that link Cochrane put up, it might not work at all! Although it has worked perfectly so far, so I don't know what's up with that yet. The speed is absolutely wonderful as far as I'm concerned, and I'd be pretty hard-pressed to beat the code simplicity. But if it can break, it's pretty much just a quick hack which may or may not work, depending on it's input. Doing a tree may very well be best in this situation after all (as I initially thought it would be), but I still haven't figured out a bullet-proof way to make a search key for that either.
BTW akb, after reading your post another ten times, it does start to make a little more sense to me. Looking at it from the standpoint of a hobbiest, this whole thing on polynomial time and big-O notation is easily one of the tougher things I've ever attempted to grasp. I really appreciate everybody's attempt here to shed a bit of light on my dim understanding of it.
Don't worry too much about that. In fact, if I understand correctly what you are doing (i.e. using an NSDictionary instead of just using hash directly), don't worry about it at all. NSDictionary will make sure that even if the hash is the same, different objects still get treated as different objects. If you are writing your own code for this purpose, just do an isEqual once you know that the hashes are equal.
wyrmmage
2007.06.12, 06:38 PM
oh, another question: anyone know of a profiler that works well will .dylibs (I also need one that works well with .dll's on Windows) that is free?
Thanks,
-wyrmmage
AnotherJake
2007.06.12, 06:39 PM
This thread is starting to seem like it might need a split.
Cochrane: Yeah, I'm just using NSDictionary, as listed exactly in the code snippet I put up above. I'm concerned with the example you linked to because he's saying that it only takes the last eight bytes and the first eight bytes to generate the hash from an NSString. Since I tried out his example, I can say that that does not hold true on my machine. But that doesn't change the fact that I still have been unable to find out whether or not generating a key by formatting a string with floats from my vertex data will screw up. It does say in the NSString documentation that the hash value it returns can not be relied upon across different versions of OS X, but I don't care about the value, only the method. I don't even know if NSDictionary uses that hash or creates it's own. Further, how many characters (or bytes) will it use for calculating a hash? Can I rely on NSDictionary to handle this consistently across different OS versions? All I know is that it appears to work perfectly right now, even with testing up to like 250,000 unique vertices (all loading and tessellation and duplicate removal via NSDictionary done in about seven seconds). But will that break down if I have a model with say, coordinates like 32134345345345.000000? Not that I will, but I wonder how many digits I can get away with? Since it works so far, the only sensible thing I can think to do right now is just keep an eye out for failures, since I don't know what's going on behind the scenes with NSDictionary.
akb825: I'm still trying to grasp the big-O concept. I really appreciate all your help so far! :) Let me run these thoughts by you, and maybe you can tell me if I'm getting warmer or colder:
I've been seeing O(n) referred to most often, presumably because O is the value we'd be most interested in when evaluating algorithmic efficiency. IOW, it's the worst-case and we're trying to design conservatively. A tree could be Ω(1), a linear search could be Ω(1), a hash lookup could be Ω(1). But (as an example) for a linear search it could just as well be O(n) and that is the value we assume for the worst case. In the case of vertices we're searching n (number of vertices) times through a list of n unique vertices we've already added to our array. So the worst case is O(n * n), or O(n^2). Best case is Ω(n). That's how I understood it earlier for the linear search. Insertion in this case is always O(1) because each vertex would be simply tacked onto the end of the array (according to Jon's link, that's called constant time). You could say it's Θ(1) as well, since Ω(1) is also the best case, right?
"So your overall algorithm is Ω(n), but O(n^2)", which makes sense because I'm doing n lookups and if everything went perfectly on every lookup using the NSDictionary, it'd only have to iterate over one vertex every search (thus yielding Ω(n)). But if the hashing went poorly on insertion into the dictionary and they collided every time, I could be searching in one line of the table through all the existing vertices, for every lookup (thus yielding O(n*n)). Am I looking at this correctly?
Then you pointed out that this is the same case for insertion in the table as well (which I indeed did not think about). So you aren't just doing a lookup, you're also doing an insertion at some point, and that has a cost too -- specifically, it costs Ω(1) at best and O(n) at worst for each insertion. But again, since this has to be done n times, it actually winds up with a total cost of between Ω(n) and O(n^2) to build the table.
I think I'm following along okay up to here. But after that I'm hazy again.
If I have an insertion cost of between Ω(n) and O(n^2), and I have a lookup cost of Ω(n) and O(n^2), those need to be combined to get my overall cost with my NSDictionary vertex removal method don't they? If they were combined, wouldn't that be Ω(n^2) but O(n^4)? I still don't understand how you came up with Ω(n), but O(n^3) for it.
As to the float compares after calculations, so far I've been careful to structure the loader class so that I don't do calcs until after lookup. At one point in the development iteration I was doing some float compares but I was using something like if (fabs(v1.x - v2.x) < tolerance), etc. for that. Obviously that's not nearly as efficient as simply comparing the floats straight out of the file, so I re-wrote it to avoid that (at least for now).
akb825
2007.06.12, 08:38 PM
What Cochrane is saying is NSDictionary won't come up with any problems because, though hash values may be the same, it still sees if the objects themselves are the same. If they aren't, then it just moves on to the next possible place to insert it into the hash table.
For the big-O notation, technically you could say that everything is Ω(1) and O(n^n^n^n^n^n...), but the values I'm giving are the tightest bounds. In the case of the hash table, it's actually O(n^3) to build the table. This is because of the fact that when inserting a single item, it might need to rebuild the entire table if it grows to a point that the current table size is suboptimal, so inserting that single item is O(n^2). (since you're re-hashing each of the n items, each of which can possibly try every position in the hash table) Since that's the worse-case time for each insertion, and you're inserting n items, it's O(n^3) to build the table. To find the total complexity of the algorithm, since you build the table and afterwards search for the vertices, you add the complexities together. In this case it's O(n^3) + O(n^2), which is O(n^3 + n^2). Since n^3 > n^2, when n gets very large the n^3 term overshadows the n^2 term, so it's O(n^3) total. Like I said, though: this is absolute worse case, and on average it will be a lot lower, otherwise the brute-force way of nested for loops would be a faster choice. Whether the hash table or tree method is a better choice is a much harder question. For the tree you can say with certainty that the algorithm is Θ(n*lg(n)) and it will always lay in that realm. The hash table will vary based on the input, though, since the lower and upper bounds aren't the same...
AnotherJake
2007.06.12, 09:34 PM
Awesome! It looks like I get it! Thanks! :D
I see where you got the O(n^3) now -- for some reason, it didn't quite register that re-hashing is also a worst-case scenario that must be considered, even though you specifically said that, sorry.:rolleyes: If everything went to hell in a hand-basket, hashing could be an extremely costly affair, but luckily that's not what happens in reality. But even though in practice hash table use is usually wickedly fast (considered effectively O(1) as I've read in a few places), it still doesn't change the fact that it's evaluated as possibly being O(n^3) to build it and possibly O(n^2) to lookup. And I also see now that one would *add* the insertion and lookup together, not multiply them as I was thinking, since they are completely different operations.
I get the NSDictionary thing now too. It doesn't matter how it hashes. For some reason, I was thinking it would throw away my original NSString key and I'd lose the effect, but that wouldn't make sense for the case when it needs to do a finer-level discrimination during a hash collision. Two identical vertices will still hash to the same value regardless. I couldn't quite get the reason why if the objects are equal they will hash to the same value, but that doesn't need to be true in reverse (two different objects having the same hash), but I get it now. Even if that hash value is already used it'll find the right one regardless by comparing the objects (keys) -- just like the man said. Cool!
I realize I've been a bit of a bother about this today, but I really can't thank you guys enough for helping me out here. :D
akb825
2007.06.12, 09:49 PM
I was looking for a distraction from studying, so don't worry. ;) Regardless, this is a very important thing to understand when deciding how to implement algorithms, as there's often different choices each with their different strengths and weaknesses. Sometimes you can immediately choose or reject a method, but oftentimes not. Hopefully some other people will also read through this as well and find it helps them understand the finer aspects of time complexity. :)
vBulletin® v3.6.8, Copyright ©2000-2008, Jelsoft Enterprises Ltd.