Little Endian to Big Endian
Jones
2006.07.16, 06:31 PM
I am attempting to load a file type that is always encoded as little endian. I have all the code for opening the file type and taking the data in done; I just need to know how I should go about converting the endianness of that data. Apparently, converting it to "host" format can do this, but only sometimes, so that's not the greatest solution. I found this code on a forum:
// Swap the two bytes of a 16-bit unsigned short.
inline void endian_swap(unsigned short& x)
{
    x = (x>>8) |
        (x<<8);
}
// Reverse the four bytes of a 32-bit unsigned int.
inline void endian_swap(unsigned int& x)
{
    x = (x>>24) |
        ((x<<8) & 0x00FF0000) |
        ((x>>8) & 0x0000FF00) |
        (x<<24);
}
// Reverse the eight bytes of a 64-bit value.
// gcc spells this type unsigned long long; older MSVC needs unsigned __int64 instead.
inline void endian_swap(unsigned long long& x)
{
    x = (x>>56) |
        ((x<<40) & 0x00FF000000000000ULL) |
        ((x<<24) & 0x0000FF0000000000ULL) |
        ((x<<8)  & 0x000000FF00000000ULL) |
        ((x>>8)  & 0x00000000FF000000ULL) |
        ((x>>24) & 0x0000000000FF0000ULL) |
        ((x>>40) & 0x000000000000FF00ULL) |
        (x<<56);
}
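To show what the shifts and masks in the 32-bit version actually do, here is a sketch that splits that expression into one term per byte. It assumes a 32-bit type (std::uint32_t), and swap32_explained is a hypothetical name used only for illustration:

```cpp
#include <cstdint>

// Hypothetical helper: the same 32-bit swap as above, one term per byte.
// Comments trace the example input 0x11223344 (bytes 11 22 33 44, most significant first).
inline std::uint32_t swap32_explained(std::uint32_t x)
{
    std::uint32_t b0 = x >> 24;                 // 0x00000011: top byte moved to the bottom
    std::uint32_t b1 = (x >> 8) & 0x0000FF00u;  // 0x00002200: second byte moved down one place
    std::uint32_t b2 = (x << 8) & 0x00FF0000u;  // 0x00330000: third byte moved up one place
    std::uint32_t b3 = x << 24;                 // 0x44000000: bottom byte moved to the top
    return b0 | b1 | b2 | b3;                   // 0x44332211
}
```

Swapping twice returns the original value, which makes a handy sanity check for any of these functions.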
I'm not sure how it works, except that it converts unsigned shorts and ints, which I need, so if that works, good. But I also need to be able to read char[] data from the file. What kind of routine would I use in that case? And do I convert the data *after* it's read from the file (into designated structs)?
Also, does this process usually cause a big performance drop? If it does, I'll just write a program that takes all my game's data and converts it to big endian beforehand (before distributing the game, for example).
Any help/suggestions, please?
Thanks! :)
OneSadCookie
2006.07.16, 06:38 PM
a char is only one byte, so there are no bytes within it to reorder :p
imikedaman
2006.07.16, 06:43 PM
There are two ways to read in char arrays from what I know, although it depends on how it's being stored in the file:
1) Each "string" in the file has a short or long specifying the length of the string, then you just read in that many chars. If you need this method, make sure you byte-swap the length of the string of course!
2) If the file does not specify a length, odds are each string is null-terminated, so you read in one char at a time until you hit the null character.
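Both layouts can be sketched over a buffer already read into memory. The 16-bit little-endian length prefix in the first case is an assumption (real formats vary, so check the spec), and both function names are hypothetical. Rather than reading the length as a short and swapping it, this version assembles it from its two bytes, which has the same effect on any host:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, case 1: a 16-bit little-endian length, then that many chars.
// Returns the number of bytes consumed, or 0 if the buffer is too short.
std::size_t readLengthPrefixed(const unsigned char* buf, std::size_t size, std::string& out)
{
    if (size < 2) return 0;
    // Low byte first, so this gives the right length on either host byte order.
    std::size_t len = static_cast<std::size_t>(buf[0]) |
                      (static_cast<std::size_t>(buf[1]) << 8);
    if (size - 2 < len) return 0;
    out.assign(reinterpret_cast<const char*>(buf + 2), len);
    return 2 + len;
}

// Hypothetical helper, case 2: chars up to a terminating '\0'.
// The chars themselves are copied in file order and never swapped.
// Returns the number of bytes consumed (including the '\0'), or 0 if there is no terminator.
std::size_t readNullTerminated(const unsigned char* buf, std::size_t size, std::string& out)
{
    out.clear();
    for (std::size_t i = 0; i < size; ++i) {
        if (buf[i] == '\0') return i + 1;
        out.push_back(static_cast<char>(buf[i]));
    }
    return 0;
}
```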
Swapping endianness is very quick, so don't bother creating new data files for everything.
And finally, storing the raw data into structs depends entirely on the format of the file and your implementation of the data structures, so I can't really answer that.
Jones
2006.07.16, 07:46 PM
Thanks for your help, OneSadCookie and mikedaman; you've clarified things greatly. Is there any way of checking, in code, the endianness of the system the code is running on? (For when I want to make a universal binary for PPC and x86 Macs, or want my code to compile on both Windows and Mac.)
PS: Perhaps a silly question, but the 64-bit-ness of some processors won't affect endianness in any way, will it?
akb825
2006.07.16, 07:56 PM
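int checkEndian()
{
    int test = 1;
    // Look at the lowest-addressed byte of test: it holds the 1 only on a
    // little-endian machine. (BIG_ENDIAN and LITTLE_ENDIAN are assumed to be
    // defined elsewhere, e.g. by the system headers.)
    if (*(unsigned char *)&test == 1)
        return LITTLE_ENDIAN;
    else
        return BIG_ENDIAN;
}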
64 bit computers are the same as 32 bit as far as endianness is concerned.
Jones
2006.07.16, 08:30 PM
Thanks! I actually found that same source within a couple of minutes of searching. :p Should've googled first.
OneSadCookie mentioned that chars are one byte, so there's no byte-ordering problem. But what if I'm reading a set of chars of unknown length into an array? Would the word "hello" be stored (figuratively) as "olleh" if read on a machine of the wrong endianness? Then again, every item in the array would not actually become an array until it is loaded onto the memory stack anyway, correct?
EDIT: What if I need to swap... *gasp* a float? (That decimal point must throw in a complication... *quakes in fear*).
OneSadCookie
2006.07.16, 08:38 PM
any array is stored with element zero first, then one, etc. Only bytes within primitives may be swapped.
you never need to determine endianness at runtime; it's always known at compile time.
#if defined(__BIG_ENDIAN__)
// something like PowerPC
#else
// something like Intel
#endif
Jones
2006.07.16, 08:50 PM
But my code should be able to swap stuff only if it needs to.
(I'm not sure what you mean.)
OneSadCookie
2006.07.16, 08:57 PM
I mean you should test at compile time (with the C preprocessor and #if) to see whether you need to swap stuff, not (redundantly, since you already know the answer) at run-time.
eg. if you have a little-endian data file:
#if defined(__BIG_ENDIAN__)
readMyDataFileWithByteSwapping();
#else
readMyDataFileTheEasyWay();
#endif
Fenris
2006.07.16, 09:23 PM
First off, avoid writing pure floats to file or network, since they're a bit unreliable. Instead, convert them to a fixed 16.16 format or something like it. If you insist, read this article: http://www.gamedev.net/reference/articles/article2091.asp :)
Jones
2006.07.16, 09:41 PM
Well, I'm not the one who's encoding them. Whoever wrote the format of this file type decided it would have floats in it. (Damn them... :p ;) )
Thanks for the link tho!
Jones
2006.07.16, 09:43 PM
I mean you should test at compile time (with the C preprocessor and #if) to see whether you need to swap stuff, not (redundantly, since you already know the answer) at run-time.
eg. if you have a little-endian data file:
#if defined(__BIG_ENDIAN__)
readMyDataFileWithByteSwapping();
#else
readMyDataFileTheEasyWay();
#endif
If I wrote this:
static bool CPU_BIG = FALSE;
#if defined(__BIG_ENDIAN__)
CPU_BIG = TRUE;
#endif
And then checked what CPU_BIG was every time I read, it would be ok, si? :\
akb825
2006.07.16, 09:49 PM
To swap floats, you can do a cast such as this:
*(unsigned int *)&myFloat
and then swap it as you would an unsigned int. I've had no problems with floats, since they follow an IEEE standard. To answer your other question, the only time you need to swap bytes is when each individual piece of data is more than 1 byte. For example, if you have an array, you count each element separately. If you have a string, each element is only 1 byte, so no swapping is necessary. If you have an array of shorts, ints, floats, etc., you will need to swap each individual item accordingly.
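A sketch of the "swap each item" rule for an array of 32-bit values. The names swap32 and swapArray32 are hypothetical, and the work is done on std::uint32_t so no sign bit gets involved:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reverse the byte order of one 32-bit value.
inline std::uint32_t swap32(std::uint32_t x)
{
    return (x >> 24) |
           ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) |
           (x << 24);
}

// Hypothetical helper: each element is swapped on its own;
// the order of the elements in the array stays the same.
inline void swapArray32(std::uint32_t* values, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        values[i] = swap32(values[i]);
}
```

An array of chars needs no such loop, since each element is a single byte.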
OneSadCookie
2006.07.16, 10:11 PM
If I wrote this:
static bool CPU_BIG = FALSE;
#if defined(__BIG_ENDIAN__)
CPU_BIG = TRUE;
#endif
And then checked what CPU_BIG was every time I read, it would be ok, si? :\
Now you're doing a run-time test again, when you don't need to!
[spooky voice]Uuuse the #iiiiiiif...[/spooky voice]
Jones
2006.07.16, 11:08 PM
Wh..wh...who... who said that? *terrified* ;)
You mean... declare the same function, but in one of them it's with a conversion, and in the other it isn't? Why? That just makes the code harder to read for n00bs like me with no understanding of such things. :p
OneSadCookie
2006.07.16, 11:33 PM
I don't care where you test for endianness -- whether you have two completely separate file-loading functions, or whether you have two different implementations of a readInt32 function, or whether you have a big ugly mess with a bunch of
#if defined(__BIG_ENDIAN__)
swap32(data);
#endif
scattered everywhere, or whether you write a function like this:
static inline uint32_t uint32_little_to_host(uint32_t n)
{
#if defined(__BIG_ENDIAN__)
return ((n & 0x000000ff) << 24) |
((n & 0x0000ff00) << 8) |
((n & 0x00ff0000) >> 8) |
((n & 0xff000000) >> 24);
#else
return n;
#endif
}
or whatever strikes your fancy; just use a compile-time check, not a run-time one!
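A different technique, which needs no #if at all, is to assemble each value from its bytes once they have been read. Shifts act on values rather than on memory layout, so the same code produces the right result on PowerPC and Intel. A sketch; loadUint32LE and loadUint16LE are hypothetical names, and the caller is assumed to have already fread the raw bytes:

```cpp
#include <cstdint>

// Hypothetical helper: decode a 32-bit little-endian value from 4 raw bytes.
// p[0] is the least significant byte; no #if and no swap needed on any host.
inline std::uint32_t loadUint32LE(const unsigned char* p)
{
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: the same idea for a 16-bit little-endian value.
inline std::uint16_t loadUint16LE(const unsigned char* p)
{
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
```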
imikedaman
2006.07.17, 01:24 AM
64 bit computers are the same as 32 bit as far as endianness is concerned.
This is directed towards the original author, but keep in mind that on 64-bit computers, sizeof(float/int/bool/etc.) are 8 bytes instead of 4 (64 bits = 8 bytes). This means that when you want to read in a float or an int or something, make sure you just type a 4 instead of using sizeof or it might be unreliable.
fread(&num, 4, 1, file) <--- something like that I think
akb825
2006.07.17, 01:49 AM
You are mistaken. The only difference (at least on the Mac) is that longs are 64 bits instead of 32. bool, int, and float are all still 32 bit. (and double is still 64 bit, and long double is still undefined, but I believe it is 128 bits on my G5)
OneSadCookie
2006.07.17, 01:59 AM
For reference:
Mac, 32-bit PPC:
sizeof(char) == 1
sizeof(short) == 2
sizeof(int) == sizeof(long) == sizeof(void*) == sizeof(size_t) == sizeof(float) == sizeof(bool) == 4
sizeof(long long) == sizeof(double) == 8
Mac, 64-bit PPC:
sizeof(char) == 1
sizeof(short) == 2
sizeof(int) == sizeof(float) == sizeof(bool) == 4
sizeof(long) == sizeof(void*) == sizeof(size_t) == sizeof(long long) == sizeof(double) == 8
Mac, 32-bit i386:
sizeof(char) == sizeof(bool) == 1
sizeof(short) == 2
sizeof(int) == sizeof(long) == sizeof(void*) == sizeof(size_t) == sizeof(float) == 4
sizeof(long long) == sizeof(double) == 8
Mac, 64-bit i386:
Find out in early August ;)
Note that on Win64, long is still 32 bits, unlike Mac and Linux.
Note that sizeof(long double) on Mac OS X depends both on architecture and on compiler version. I think that sizeof(long double) == 8 (GCC 3.x/PPC), 16 (GCC 4.x/PPC) and 16 (GCC 4.x/x86), but on x86 the math is done on the x87 unit, which means it's done at only 80 bits of precision, and with a few other caveats. I can't verify those numbers now though (not on a Mac).
If you *ever* care about the size of an integer type, you should use the types in <stdint.h>:
int32_t is a 32-bit integer
uint64_t is an unsigned 64-bit integer
etc.
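One way to make such size assumptions explicit is to have the compiler check them. A sketch using C++11 static_assert, which is newer than this thread, alongside the <stdint.h> types (here via <cstdint>):

```cpp
#include <climits>
#include <cstdint>

// The exact-width types are exactly 32 and 64 bits wide wherever they exist;
// with 8-bit bytes that means sizeof 4 and 8.
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
static_assert(sizeof(std::int32_t) == 4, "int32_t must be 4 bytes");
static_assert(sizeof(std::uint64_t) == 8, "uint64_t must be 8 bytes");

// Plain types carry no such guarantee. This stops the build if float is not
// 32 bits, e.g. before relying on swapping a float through a 32-bit integer.
static_assert(sizeof(float) == 4, "this file-format code assumes a 32-bit float");
```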
imikedaman
2006.07.17, 02:15 AM
Really? If that's true, it gives me one more reason to stop listening to my ass hole computer science teachers. Thanks for the enlightenment.
akb825
2006.07.17, 02:48 AM
Either that computer science teacher has no idea what he's talking about, or he's using some obscure compiler to have those sizes.
imikedaman
2006.07.17, 11:56 AM
He says Mac OS Ex. I rest my case. :wacko:
Jones
2006.07.17, 01:33 PM
What's wrong with run time checks? Please, tell me! :blink:
ThemsAllTook
2006.07.17, 02:38 PM
No one seems to have bothered to explain themselves. Here are some reasons:
Since endianness never changes at runtime, checking it at runtime wastes CPU cycles.
Since it should never change at runtime, you'd have a potential source of obscure program failure if your endianness variable somehow got a different value written to it.
Runtime checks are unconventional. (Pretty weak reason, but worth mentioning.)
For reference, here's how I handle it:
#define swapInt32(int32) ((((int32) >> 24) & 0x000000FF) | \
(((int32) >> 8) & 0x0000FF00) | \
(((int32) << 8) & 0x00FF0000) | \
(((int32) << 24) & 0xFF000000))
#if defined(__BIG_ENDIAN__)
#define swapInt32Big(int32) (int32)
#define swapInt32Little(int32) swapInt32(int32)
#elif defined(__LITTLE_ENDIAN__)
#define swapInt32Big(int32) swapInt32(int32)
#define swapInt32Little(int32) (int32)
#else
#error Endianness unknown; cannot proceed
#endif
When I'm reading or writing a big-endian value, I call swapInt32Big. When I'm reading or writing a little-endian value, I call swapInt32Little. When/if I need other types than int32, I can easily add similar macros to support them.
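Following the same pattern, 16-bit versions might look like the sketch below. The names simply mirror the ones above and are hypothetical. One added assumption: GCC outside Apple's toolchain may not define __LITTLE_ENDIAN__, so the check also falls back on __BYTE_ORDER__, which GCC 4.6+ and Clang predefine. Pass unsigned values to these macros:

```cpp
#include <cstdint>

// Hypothetical 16-bit swap macro; the result is masked back to 16 bits.
#define swapInt16(int16) ((std::uint16_t)((((int16) >> 8) & 0x00FF) | \
                                          (((int16) << 8) & 0xFF00)))

#if defined(__BIG_ENDIAN__) || \
    (defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define swapInt16Big(int16)    ((std::uint16_t)(int16))
#define swapInt16Little(int16) swapInt16(int16)
#elif defined(__LITTLE_ENDIAN__) || defined(_WIN32) || \
    (defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#define swapInt16Big(int16)    swapInt16(int16)
#define swapInt16Little(int16) ((std::uint16_t)(int16))
#else
#error Endianness unknown; cannot proceed
#endif
```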
Fenris
2006.07.17, 03:21 PM
I just go with htonl() and ntohl() and be done with it.
Jones
2006.07.17, 03:38 PM
That's what I meant when I said to OneSadCookie, "declare the same function, but in one of them it's with a conversion, and in the other it isn't". Admittedly, it was not very clear. I guess I can understand the lost CPU cycles (I hate coding something while knowing all along its performance is less than it could be).
Very well, I shall fix it. :)
Just checking, doing this is OK, oui?
#if defined(__BIG_ENDIAN__)
swapINT(num1);
swapINT(num2);
#endif
Suppose num1 and num2 had just been loaded from a file.
ThemsAllTook
2006.07.17, 04:05 PM
Sure, that'll work. The only thing is that it clutters your file I/O code a bit. If you use a macro or a function which behaves differently depending on the endianness (as in my example), you only have to have the preprocessor checks in that one place.
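A sketch of that idea applied to filling a struct: the compile-time check lives in one inline function, and the loader just copies each field out of the raw bytes and converts it. The header layout and all names are hypothetical, and any host that is not detected as big-endian is treated as little-endian:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: the one place that checks byte order, at compile time.
inline std::uint32_t littleToHost32(std::uint32_t n)
{
#if defined(__BIG_ENDIAN__) || \
    (defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    return (n >> 24) | ((n >> 8) & 0x0000FF00u) |
           ((n << 8) & 0x00FF0000u) | (n << 24);
#else
    return n;  // assumed little-endian host: already in the right order
#endif
}

// Hypothetical file header: three little-endian 32-bit fields, 12 bytes on disk.
struct Header {
    std::uint32_t magic;
    std::uint32_t version;
    std::uint32_t numFrames;
};

// Copy the raw fields out, then convert each one. No #if needed here.
bool parseHeader(const unsigned char* buf, std::size_t size, Header& h)
{
    if (size < 12) return false;
    std::uint32_t fields[3];
    std::memcpy(fields, buf, 12);
    h.magic     = littleToHost32(fields[0]);
    h.version   = littleToHost32(fields[1]);
    h.numFrames = littleToHost32(fields[2]);
    return true;
}
```

Filling the struct field by field, instead of reading bytes straight over it, also avoids depending on how the compiler pads the struct.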
OneSadCookie
2006.07.17, 05:32 PM
I just go with htonl() and ntohl() and be done with it.
That's of no use if you're trying to read a little-endian file...
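To make the distinction concrete: ntohl() converts big-endian (network order) data to host order, so it suits a big-endian file, not this one. A sketch assuming a POSIX system, where ntohl is declared in <arpa/inet.h> (Windows declares it in <winsock2.h>); fromBigEndianBytes is a hypothetical name:

```cpp
#include <arpa/inet.h>  // ntohl (POSIX)
#include <cstdint>
#include <cstring>

// Hypothetical helper: decode 4 bytes stored big-endian, as in a big-endian file.
inline std::uint32_t fromBigEndianBytes(const unsigned char* p)
{
    std::uint32_t raw;
    std::memcpy(&raw, p, 4);  // raw holds the bytes in file order
    return ntohl(raw);        // swaps on little-endian hosts, no-op on big-endian hosts
}
```

Applied to little-endian data it is wrong on both kinds of Mac: on Intel it swaps values that were already correct, and on PowerPC it is a no-op where a swap was needed.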
Fenris
2006.07.17, 06:28 PM
Oops. I've been following a similar thread over at GameDev.net, forgot that his file format was LE. Sorry 'bout that.
Jones
2006.07.19, 01:18 PM
Is there any difference between swapping an unsigned int and a normal int? They're both 4 bytes.
I'm guessing no.
akb825
2006.07.19, 02:25 PM
There is not, but you would likely want to typecast to an unsigned int before swapping if you're using shifts, otherwise it will do nasty things with the sign bit.
Jones
2006.07.19, 05:26 PM
Not *exactly* sure what you mean, but I've got two separate functions. They both do the same thing, but one takes ints and one takes unsigned ints. Here they are:
#define TINY_ENDIAN 8956432
#define HUGE_ENDIAN 8956234
inline void swapUSHORT(unsigned short& swap_this_short)
{
    swap_this_short = (swap_this_short>>8) |
                      (swap_this_short<<8);
}
inline void swapSHORT(short& swap_this_short)
{
    swap_this_short = (swap_this_short>>8) |
                      (swap_this_short<<8);
}
/*
inline void swapUINT(unsigned int& swap_this_int)
{
    swap_this_int = (swap_this_int>>24) |
                    ((swap_this_int<<8) & 0x00FF0000) |
                    ((swap_this_int>>8) & 0x0000FF00) |
                    (swap_this_int<<24);
}
*/
inline void swapINT(int& swap_this_int)
{
    swap_this_int = (swap_this_int>>24) |
                    ((swap_this_int<<8) & 0x00FF0000) |
                    ((swap_this_int>>8) & 0x0000FF00) |
                    (swap_this_int<<24);
}
// Takes a reference so the caller's variable is actually changed.
// Only the low 32 bits are swapped, so this assumes a 32-bit long.
void swapLONG(long& swap_this_long)
{
    unsigned char sl_b1, sl_b2, sl_b3, sl_b4;
    sl_b1 = swap_this_long & 255;
    sl_b2 = ( swap_this_long>> 8 ) & 255;
    sl_b3 = ( swap_this_long>>16 ) & 255;
    sl_b4 = ( swap_this_long>>24 ) & 255;
    swap_this_long = (long)(((unsigned long)sl_b1 << 24) | ((unsigned long)sl_b2 << 16) |
                            ((unsigned long)sl_b3 << 8) | sl_b4);
}
// Takes a reference so the caller's variable is actually changed.
void swapFLOAT(float& swap_this_float)
{
    union
    {
        float sff;
        unsigned char sfb[4];
    } sf_dat1, sf_dat2;
    sf_dat1.sff = swap_this_float;
    sf_dat2.sfb[0] = sf_dat1.sfb[3];
    sf_dat2.sfb[1] = sf_dat1.sfb[2];
    sf_dat2.sfb[2] = sf_dat1.sfb[1];
    sf_dat2.sfb[3] = sf_dat1.sfb[0];
    swap_this_float = sf_dat2.sff;
}
// Run-time check: looks at the lowest-addressed byte of an int holding 1.
int getENDIAN()
{
    int ge_i = 1;
    char *ge_p = (char *) &ge_i;
    if (ge_p[0] == 1)
        return TINY_ENDIAN;
    else
        return HUGE_ENDIAN;
}
I included the entire file for a reason: is it OK to do the same with shorts, like I did?
Thanks!
OneSadCookie
2006.07.19, 05:35 PM
gah! You and your run-time checks! There is no need for a getENDIAN() function, you already know what it's going to return as you compile your program!
If you wanted to have it, just for debugging, you should still write it as
int getEndian()
{
#if defined(__BIG_ENDIAN__)
return HUGE_ENDIAN;
#else
return TINY_ENDIAN;
#endif
}
Yes, it does matter: don't byte-swap signed integers with the functions you've written. C makes no guarantee as to whether >> on a negative signed value is arithmetic or logical, and Microsoft's compiler at least (don't know about GCC; it could even be architecture-specific) treats it as arithmetic.
akb825
2006.07.19, 05:40 PM
What I mean by shifting and the sign bit is: if the number is a signed integer and is negative, the shift will try to keep the sign bit a 1 to keep it a negative number. As a result, if your number is negative to begin with (swapped or unswapped), it will be negative afterwards regardless of whether it should be.
Also, just FYI, for the float, instead of using a union you could also just do this:
unsigned int temp = *(unsigned int *)&swap_this_float;
swapUINT(temp);
swap_this_float = *(float *)&temp;
You can swap doubles the exact same way, but with long longs. BTW, your swapLONG function makes no sense to me... Also, keep in mind that longs are 32 bits on 32 bit machines and 64 bits on 64 bit machines. (on OS X, at least)
Edit: OSC, are the __BIG_ENDIAN__ and __LITTLE_ENDIAN__ flags defined in Microsoft's compiler and others? If it's only defined in gcc, and you want to target other platforms, then you may not be able to rely on compile-time checks.
OneSadCookie
2006.07.19, 05:51 PM
__BIG_ENDIAN__ is a GCC thing, but there'll always be an equivalent check for other platforms.
#if defined(__LITTLE_ENDIAN__) || defined(_WIN32)
for example ;)
OneSadCookie
2006.07.19, 05:52 PM
The union approach is better than the type-punned-pointer approach, because it doesn't break strict aliasing rules. (-fstrict-aliasing in GCC)
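A third option, not mentioned in the thread, is std::memcpy, which is well-defined in both C and C++. (C++, unlike C, does not sanction reading a union member other than the last one written, though GCC documents union type-punning as supported.) A sketch assuming a 32-bit float and a 64-bit double, as in the Mac sizes listed earlier; swapFloat and swapDouble are hypothetical names:

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4 && sizeof(double) == 8, "assumes 32-bit float, 64-bit double");

// Hypothetical helper: reverse a float's bytes by copying it into an integer and back.
inline float swapFloat(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    u = (u >> 24) | ((u >> 8) & 0x0000FF00u) | ((u << 8) & 0x00FF0000u) | (u << 24);
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Hypothetical helper: the same for double, via a 64-bit integer.
inline double swapDouble(double d)
{
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    std::uint64_t r = 0;
    for (int i = 0; i < 8; ++i) {  // lowest byte of u ends up as the highest byte of r
        r = (r << 8) | (u & 0xFFu);
        u >>= 8;
    }
    std::memcpy(&d, &r, sizeof d);
    return d;
}
```

One caveat: the swapped intermediate can be any bit pattern, including a NaN that some hardware may quietly alter when a float is loaded or returned, so where possible swap the integer first and only then copy it into the float.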
Jones
2006.07.19, 07:45 PM
What I mean by shifting and the sign bit is, if the number is a signed integer and is negative, it will try to keep the signed bit a 1 to keep it a negative number. As a result, if your number is negative to begin with (swapped or unswapped), it will be negative afterwards regardless of if it should or should not be negative.
If it was negative to begin with, wouldn't I want to keep it negative? Do you mean a case where it's not meant to be negative, but a big-endian machine reads it as negative from a little-endian file because the bytes are all Greek as far as it's concerned?
BTW, your swapLONG function makes no sense to me...
Me neither. I borrowed it from some other code, and never used it. :lol:
OneSadCookie
2006.07.19, 08:08 PM
(int)0x80000000 >> 24 may be either (int)0xffffff80 or 0x00000080, depending on the compiler's whimsy. If it's the former, you will not get the result you want. (unsigned int)0x80000000 >> 24 is always 0x00000080.
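Following that advice, a signed value can be swapped by doing the shifts on its unsigned counterpart and converting back. A sketch; the function names are hypothetical, and the final conversion back to a signed type is implementation-defined before C++20, though GCC and MSVC wrap it the expected two's-complement way:

```cpp
#include <cstdint>

// Hypothetical helper: swap a signed 32-bit value without right-shifting a negative number.
inline std::int32_t swapInt32Signed(std::int32_t v)
{
    std::uint32_t u = static_cast<std::uint32_t>(v);  // well-defined: wraps modulo 2^32
    u = (u >> 24) |                                   // logical shift: zeros shifted in
        ((u >> 8) & 0x0000FF00u) |
        ((u << 8) & 0x00FF0000u) |
        (u << 24);
    return static_cast<std::int32_t>(u);
}

// Hypothetical helper: same for shorts; cast first so the sign bit is not smeared.
inline std::int16_t swapInt16Signed(std::int16_t v)
{
    std::uint16_t u = static_cast<std::uint16_t>(v);
    return static_cast<std::int16_t>(static_cast<std::uint16_t>((u >> 8) | (u << 8)));
}
```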
akb825
2006.07.19, 08:53 PM
What I mean by "keeping negative" is that if the value before swapping (still in the wrong byte order, so not what you want) is negative, then the final value will be negative too. For example, if the pre-swapped value is 0x80000000, you would expect the final value to be 0x00000080. However, the final value would end up being 0xFFFFFF80 because the shift extends the sign bit, so you will need to make it unsigned.
The compiler is supposed to use arithmetic shifting for a signed value and logical shifting for an unsigned value. I'm pretty sure that gcc follows that. It's not like it's too difficult to follow, though, seeing that AFAIK they are 2 separate instructions. I guess that it automatically treats hex as an unsigned value, which is why you need to cast it to make it signed.
OneSadCookie
2006.07.19, 10:38 PM
No, the compiler is allowed to do whatever it wants for a right shift of a negative signed value:
http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1124.pdf
(section 6.5.7, paragraph 5)
akb825
2006.07.19, 11:55 PM
Huh, interesting. At least it's consistent with unsigned values. :p
Funny, though, saying the section like that reminds me of quoting Star Fleet regulations. ;)
Jones
2006.07.20, 03:42 PM
Ack! I still have no idea what I should do here. I think I'll just make all my ints unsigned, that'll simplify things. :)
Jones
2006.07.20, 04:07 PM
Holy... you aren't gonna believe this... but I converted all my ints to uints, and the code works great! (My MD2 class problem is solved!)
vBulletin® v3.6.8, Copyright ©2000-2008, Jelsoft Enterprises Ltd.