Handling memory errors


Skorche
2006.11.28, 03:20 PM
Right now in my physics library I've just been using assert() calls to check that malloc() and its cousins don't return NULL.

Is it even worth the time to write code to recover from a memory error on modern operating systems? Do they actually happen in non-extreme cases?

akb825
2006.11.28, 04:33 PM
I figure that if your two choices are to exit because of a lack of memory or to simply crash because of a lack of memory, there's no point in checking every single call to malloc. Besides, with virtual memory you can technically have the entire address space available to your program, so memory should only be a problem if there's a bug that's allocating way too much memory or if the program is doing way too much stuff.

I'm sure that there are plenty of people who advocate checking every memory call, though.

OneSadCookie
2006.11.28, 04:36 PM
Most UNIX programs won't bother, assuming that if you've reached the limits of the address space (2-4GB allocated to your program on a 32-bit system, depending on your OS) then you have worse problems than just crashing, and that if space can't be allocated for the memory (e.g. the partition on which the page file resides is full) you also have worse problems than just crashing.

Lots of old-school Mac programmers find this attitude very cavalier, and like to attempt to catch and either recover from or report all out-of-memory errors.

Make your choice ;)

AnotherJake
2006.11.28, 05:04 PM
Memory was usually quite limited (per application) in Classic, although you could adjust it somewhat, so it made sense to check to see if you just ran out! I don't bother with malloc checks anymore on OS X because of the reasons already listed -- if malloc fails, then there are certainly much bigger problems going on! In fact, just this very afternoon I was cleaning up some old code and removed the malloc checks while I was at it.

Skorche
2006.11.28, 05:24 PM
OneSadCookie wrote: "Most UNIX programs won't bother, assuming that if you've reached the limits of the address space (2-4GB allocated to your program on a 32-bit system, depending on your OS) then you have worse problems than just crashing, and that if space can't be allocated for the memory (e.g. the partition on which the page file resides is full) you also have worse problems than just crashing."

I kinda figured that was the case. I guess I'll just stick with the assert statements then.

Frogblast
2006.11.29, 12:33 AM
On modern operating systems (I know that at least OS X and Linux do this, and likely almost all others), a successful malloc doesn't even guarantee that the memory is available. Memory is significantly overcommitted.

If new VM space is allocated, the pages will be marked such that you'll trap when first accessing them, and only then will the OS allocate storage. If that storage can't be allocated, you have no way to check for or recover from that situation. Most likely, the OS will terminate one or more processes to make space (possibly yours, but not necessarily).

However, if you're going to be allocating 2-4GB of VM space, it's still worth checking, because malloc will obviously fail if you have no address space left.

My rule: Don't bother checking for fixed size allocations (especially anything less than a few pages). For anything large, very high quantity, or where the user can significantly change the allocation size, then you need to check.
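As a rough illustration of that rule, here is a minimal sketch of an allocation whose size comes from user input. The helper name alloc_array_checked is hypothetical, not something from this thread. It refuses a count whose byte total would overflow size_t and otherwise hands back whatever malloc returns, NULL included, so the caller still has to check the result.

```c
#include <stdint.h>  /* SIZE_MAX */
#include <stdlib.h>  /* malloc, free */

/* Hypothetical helper: allocate `count` elements of `elem_size` bytes each,
   where `count` may come from user input. Returns NULL if the total size
   would overflow size_t or if malloc itself fails. */
static void *alloc_array_checked(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return NULL;  /* count * elem_size would wrap around */
    }
    return malloc(count * elem_size);
}
```

A caller would then test the result once, at the point where the user-controlled size enters the program, rather than after every small fixed-size allocation.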

Finally, be careful where you put your asserts. If you ever disable assertions, remember that any expression inside will go away. Once I had a lot of code like:

assert(pthread_mutex_lock(x) == 0).

Removing assertions for release caused strange things to happen...
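To make the pitfall concrete, here is a small sketch. It uses a hypothetical stand-in, take_lock, in place of pthread_mutex_lock so that it has no threading dependency; the counter exists only to show whether the call happened. The first part is compiled the way a release build would be, with NDEBUG defined.

```c
#include <stdio.h>

static int lock_calls = 0;

/* Hypothetical stand-in for pthread_mutex_lock(): counts calls, returns 0. */
static int take_lock(void)
{
    lock_calls++;
    return 0;
}

/* Compile the next two functions as a release build would. */
#define NDEBUG
#include <assert.h>

/* Broken: the call lives inside assert(), so it disappears under NDEBUG. */
static void lock_inside_assert(void)
{
    assert(take_lock() == 0);
}

/* Safer: the call always runs; only the check is compiled out. */
static void lock_then_assert(void)
{
    int rc = take_lock();
    assert(rc == 0);
    (void)rc;  /* avoids an unused-variable warning under NDEBUG */
}

/* Restore normal assertions for anything that follows. */
#undef NDEBUG
#include <assert.h>
```

With assertions disabled, calling lock_inside_assert() leaves lock_calls untouched, while lock_then_assert() still increments it.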

AnotherJake
2006.11.29, 12:50 AM
Frogblast wrote: "However, if you're going to be allocating 2-4GB of VM space, it's still worth checking..."
[chuckle] If you're allocating that kind of memory, you should probably be pretty suspicious in the first place of why your program keeps crashing! Of course, you'd first have to wonder why it takes forever and a day to load that entire DVD into memory every time you test it.

Andrew
2006.12.06, 05:41 PM
I was actually thinking about this just the other day. If you really want a fault tolerant system, you could write wrappers around the malloc family of functions:

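#include <stdlib.h>  /* malloc */
#include <unistd.h>  /* sleep */

/* Retry malloc() every 5 seconds until it succeeds (blocks indefinitely). */
void *myMalloc(size_t size)
{
    void *ptr = malloc(size);
    while (ptr == NULL) {
        sleep(5);
        ptr = malloc(size);
    }
    return ptr;
}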

While this might be useful for NASA programmers, a modern pragmatic game developer shouldn't have any need to do such a thing.

If you were developing an embedded system, you might think about combining the above code with a watchdog timer (http://en.wikipedia.org/wiki/Watchdog_timer).

OneSadCookie
2006.12.06, 06:02 PM
That only helps if your application is leak-free (if you leak, you'll hang forever holding 2-4GB of RAM, which is very unhelpful), and if you think that hanging is a better response to the system's inability to provide you with memory than crashing, which is highly debatable...

If you're NASA, you have strict real-time constraints, so this is completely inappropriate :p
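One way to keep the retry idea without the risk of hanging forever is to bound the number of attempts. This is only a sketch under that assumption; the name malloc_with_retry and its parameters are hypothetical and not from the posts above.

```c
#include <stdlib.h>  /* malloc */
#include <unistd.h>  /* sleep (POSIX) */

/* Hypothetical variant of the retry wrapper: give up after max_attempts
   tries instead of looping forever. Returns NULL if every try fails. */
static void *malloc_with_retry(size_t size, unsigned max_attempts,
                               unsigned delay_seconds)
{
    for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
        void *ptr = malloc(size);
        if (ptr != NULL) {
            return ptr;
        }
        if (attempt + 1 < max_attempts) {
            sleep(delay_seconds);  /* wait before the next try */
        }
    }
    return NULL;  /* caller decides: report the error, free caches, or exit */
}
```

Whether waiting at all is a better response than failing right away is still the debatable part; the bound only guarantees that the caller eventually gets an answer.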

unknown
2006.12.06, 06:41 PM
Code excerpt from Wolfenstein 3D


void *AllocSomeMem(LongWord Size)
{
    void *MemPtr;
    Word Stage;

    Stage = 0;
    do {
        Stage = FreeStage(Stage,Size);
        MemPtr = NewPtr(Size); /* Get some memory */
        if (MemPtr) {
            return MemPtr; /* Return it */
        }
    } while (Stage);
    if (!NoSystemMem) {
        MemPtr = NewPtrSys(Size);
    }
    return MemPtr;
}


static Word FreeStage(Word Stage,LongWord Size)
{
    switch (Stage) {
    case 1:
        PurgeAllSounds(Size); /* Kill off sounds until I can get memory */
        break;
    case 2:
        PlaySound(0); /* Shut down all sounds... */
        PurgeAllSounds(Size); /* Purge them */
        break;
    case 3:
        PlaySong(0); /* Kill music */
        FreeSong(); /* Purge it */
        PurgeAllSounds(Size); /* Make SURE it's gone! */
        break;
    case 4:
        return 0;
    }
    return Stage+1;
}


In a freaking TINY box :/
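The same "free something purgeable, then try again" pattern can be sketched in portable C. Everything below is hypothetical and much simpler than the Wolfenstein code: a small byte budget stands in for a Classic Mac OS application's fixed memory partition, and a tiny cache of "sounds" is the only thing that can be purged.

```c
#include <stddef.h>
#include <stdlib.h>

/* All names here are hypothetical. The budget simulates a fixed partition. */
#define BUDGET_BYTES 1024
#define CACHE_SLOTS  4

struct cached_block { void *data; size_t size; };

static size_t bytes_in_use = 0;
static struct cached_block sound_cache[CACHE_SLOTS];  /* purgeable data */

/* malloc() that fails once the budget would be exceeded. */
static void *budget_alloc(size_t size)
{
    void *p;
    if (size > BUDGET_BYTES - bytes_in_use) {
        return NULL;
    }
    p = malloc(size);
    if (p != NULL) {
        bytes_in_use += size;
    }
    return p;
}

/* Load purgeable data into an empty cache slot; returns 1 on success. */
static int cache_sound(int slot, size_t size)
{
    void *p;
    if (slot < 0 || slot >= CACHE_SLOTS || sound_cache[slot].data != NULL) {
        return 0;
    }
    p = budget_alloc(size);
    if (p == NULL) {
        return 0;
    }
    sound_cache[slot].data = p;
    sound_cache[slot].size = size;
    return 1;
}

/* Drop one cached block; returns 1 if something was freed, 0 if empty. */
static int purge_one_sound(void)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (sound_cache[i].data != NULL) {
            free(sound_cache[i].data);
            bytes_in_use -= sound_cache[i].size;
            sound_cache[i].data = NULL;
            sound_cache[i].size = 0;
            return 1;
        }
    }
    return 0;
}

/* Try to allocate; on failure, purge cached data and try again. */
static void *alloc_with_purge(size_t size)
{
    for (;;) {
        void *p = budget_alloc(size);
        if (p != NULL) {
            return p;
        }
        if (!purge_one_sound()) {
            return NULL;  /* nothing left to give back */
        }
    }
}
```

The design point is the same as in the excerpt: the allocator gives up only after everything that can be regenerated later has been thrown away.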

Andrew
2006.12.08, 07:35 AM
What if the reason that the system can't give you any more memory is because some OTHER process has a severe memory leak? In such a case, I think it would be appropriate to call malloc every 5 seconds until it succeeds. After all, the system is sure to do something about that runaway process at some point, right?

OneSadCookie
2006.12.08, 07:50 AM
Only if the other processes aren't written the same way... then the user will be stuck with a full hard disk and no way to kill any processes :p

Andrew
2006.12.08, 07:51 AM
According to this guy (http://www.eskimo.com/~scs/cclass/asgn.int/PS5.html),

... while actual out-of-memory conditions may be rare, other kinds of malloc failures are not so rare, especially in a program under development. If you misuse the memory which malloc gives you, perhaps by asking for 16 bytes of memory and then writing 17 characters to it (and this is all too easy to do), malloc tends to notice, or at least to be broken by your carelessness, and will return a null pointer next time you call it. When malloc returns a null pointer for this reason, it can be difficult to track down the actual error (because it typically occurred somewhere in your code before the call to malloc that failed), but if we blindly used the null pointer which malloc returned to us, we'd only defer the eventual crash even farther, and it might be quite mysterious, much more so than a definitive "out of memory" message. Therefore, using a wrapper function like chkmalloc is a definite improvement, because we get error messages as soon as malloc fails, and we don't have to scatter tests all through our code to get them.

Andrew
2006.12.08, 07:53 AM
OneSadCookie wrote: "Only if the other processes aren't written the same way... then the user will be stuck with a full hard disk and no way to kill any processes :p"

Quite true :lol:

ThemsAllTook
2006.12.08, 09:58 AM
"... but if we blindly used the null pointer which malloc returned to us, we'd only defer the eventual crash even farther ..."
Dereferencing a NULL pointer crashes immediately on Mac OS X (and other UNIX-based systems, and Mac OS 9, and Windows 2000/XP, if I'm not mistaken).

OneSadCookie
2006.12.08, 05:56 PM
I'm not against having a replacement for malloc which checks every result and exits with a message to standard error if it fails. I just don't think you should randomly hang when something that dramatic goes wrong -- assuming it's not your fault is arrogant and probably unhelpful.
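A minimal sketch of that kind of replacement might look like the following. The name checked_malloc is hypothetical; it is not the chkmalloc from the linked notes, just one plausible shape for "check every result and exit with a message".

```c
#include <stdio.h>   /* fprintf, stderr */
#include <stdlib.h>  /* malloc, exit, EXIT_FAILURE */

/* Hypothetical wrapper: returns usable memory, or prints a message to
   standard error and exits. A zero-byte request is passed through as-is,
   since malloc(0) may legitimately return NULL. */
static void *checked_malloc(size_t size)
{
    void *ptr = malloc(size);
    if (ptr == NULL && size != 0) {
        fprintf(stderr, "out of memory (requested %zu bytes)\n", size);
        exit(EXIT_FAILURE);
    }
    return ptr;
}
```

Unlike an assert(), this check stays in release builds, and callers never have to test the result themselves.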