iDevGames Forums

Full Version: Need A Very Fast Blit Routine


I am working on a 2D isometric engine for an RPG. The core of my graphics engine is nearly finished. I'm not using OpenGL, SW, or any of those libraries for any graphics routines at all. I use CopyBits() to copy my assembled frame from an offscreen GWorld to the screen. This one bit of code takes up 85-90% of my CPU time (literally). I have put together a couple of simple blit routines in place of CopyBits() which run approximately 20 frames per second faster, but I have an aversion to drawing directly to the screen myself without fully understanding the implications of all of my code. I would very much like to avoid using OpenGL, SpriteWorld, or any other such libraries for my blit routine. Does anybody know where I can get a very fast (and simple, easy to incorporate) blit routine that I can simply code into my engine without including libraries?
Welcome to iDevGames!

If SpriteWorld's is so good, it's yours to use. It's free software. (pretty much zlib/libpng license) You're probably thinking of BlitPixie, which has the same license I believe.

Before this, though, you probably want to look into ways to reduce the amount you are copying. Have you done this? (Only copying rectangles that are "dirty" and need updating, for instance)
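
To make the "dirty rectangle" idea concrete, here is a minimal sketch in plain C. The DirtyRect type and the function names are made up for illustration and are not part of QuickDraw or any library mentioned in this thread. The sketch only does the bookkeeping: it grows one bounding rectangle as things change during a frame and clips it to the frame.

```c
#include <assert.h>

/* Hypothetical rectangle type; right and bottom are exclusive. */
typedef struct { int left, top, right, bottom; } DirtyRect;

/* An empty rectangle: the starting state at the top of each frame. */
DirtyRect EmptyDirtyRect(void)
{
    DirtyRect r = { 0, 0, 0, 0 };
    return r;
}

int DirtyRectIsEmpty(DirtyRect r)
{
    return r.right <= r.left || r.bottom <= r.top;
}

/* Grow `dirty` so that it also covers `changed` (bounding-box union). */
DirtyRect AddDirtyRect(DirtyRect dirty, DirtyRect changed)
{
    if (DirtyRectIsEmpty(changed)) return dirty;
    if (DirtyRectIsEmpty(dirty))   return changed;
    if (changed.left   < dirty.left)   dirty.left   = changed.left;
    if (changed.top    < dirty.top)    dirty.top    = changed.top;
    if (changed.right  > dirty.right)  dirty.right  = changed.right;
    if (changed.bottom > dirty.bottom) dirty.bottom = changed.bottom;
    return dirty;
}

/* Clip the accumulated rectangle to the frame before copying it. */
DirtyRect ClipDirtyRect(DirtyRect r, int frameWidth, int frameHeight)
{
    assert(frameWidth >= 0 && frameHeight >= 0);
    if (r.left   < 0)           r.left   = 0;
    if (r.top    < 0)           r.top    = 0;
    if (r.right  > frameWidth)  r.right  = frameWidth;
    if (r.bottom > frameHeight) r.bottom = frameHeight;
    return r;
}
```

Whatever rectangle is left at the end of the frame would then be used as both the source and destination rectangle of the copy (CopyBits() takes both), instead of the whole frame. If the whole screen scrolls every frame, the rectangle ends up covering everything and this saves nothing.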

And of course, you're not gaining anything by avoiding OpenGL these days.

JohnDoe555 Wrote:...but I have an aversion to drawing directly to the screen myself without fully understanding the implications of all of my code. I would like very much to avoid using OpenGL...
It isn't possible to get direct access to the screen in OS X (that I know of). Yer gonna hafta go through some lib or system API, and OpenGL is best for best performance. glTexSubImage2D is the routine to call. There are some Apple documented DMA techniques to make it quicker, but I don't have the link handy. Basically, what you wind up doing is rendering to a buffer and uploading it really fast to OpenGL as a texture on a quad that covers the screen, or window. It seems strange if you're still stuck in the old school, but that's the way it is now.
AnotherJake Wrote:It isn't possible to get direct access to the screen in OS X (that I know of).
Yeah, you can. That's what BlitPixie does. You can draw over everything...

Yeah, but it's still through a lib, not a memory address.


In response to the first person who responded to my post, I will try to clarify my situation: my engine renders tiles and then sprites to an offscreen GWorld (it does so very fast and with no speed problems whatsoever). Once the frame is fully assembled on this offscreen GWorld, it is ready to be copied to the screen, and here is my problem: I need to copy the offscreen GWorld to my screen (I am currently using CopyBits() and it is way too slow). And by the way, I have been using this forum for years under the name DrKane555 or something like that; ty for the welcome anyways.

Maybe I'm incorrect (I don't think I am), but I believe that you can draw directly to the screen in OS X (I'm using Carbon, if that matters). In the simplistic blit routine I put together, the pixels are copied from the source offscreen GWorld to the screen device's pixel map. I guess all that I am really asking for is one very optimized blit routine (like CopyBits() but optimized) which copies a pixel map (a buffer of pixels) from a source buffer (an offscreen GWorld) to a destination buffer (the screen device's pixel map). I would like to save myself the trouble of using a whole API for one single blit routine if possible.
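
For reference, the kind of routine being described boils down to something like the sketch below. It is plain C working on raw byte buffers, with a made-up function name, and it assumes the source and destination already share the same pixel format and depth. The row strides would have to come from the pixel maps themselves. It is not the BlitPixie or CopyBits() implementation, just the basic shape of a row-by-row copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `height` rows of `widthBytes` bytes each from src to dst.
   srcRowBytes and dstRowBytes are the distances in bytes between the
   starts of consecutive rows; they may be larger than widthBytes. */
void BlitRows(const uint8_t *src, size_t srcRowBytes,
              uint8_t *dst, size_t dstRowBytes,
              size_t widthBytes, size_t height)
{
    assert(src != NULL && dst != NULL);
    assert(widthBytes <= srcRowBytes && widthBytes <= dstRowBytes);

    /* Fast path: no padding on either side, so the frame is one
       contiguous block and a single memcpy moves all of it. */
    if (srcRowBytes == widthBytes && dstRowBytes == widthBytes) {
        memcpy(dst, src, widthBytes * height);
        return;
    }

    /* General path: step each pointer by its own row stride. */
    for (size_t y = 0; y < height; y++) {
        memcpy(dst, src, widthBytes);
        src += srcRowBytes;
        dst += dstRowBytes;
    }
}
```

Keeping the two strides separate matters because a screen's rows are often padded wider than the visible width, so one big copy across the whole frame would shear the image.
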
You get the pointers through QuickDraw and pass them to BlitPixie. Okay, so it's deprecated in Tiger. It works. :)
Something like this:

Code:
#include <QuickDraw.h>

GDHandle        mainGDH;
PixMapHandle    basePixMap;
Ptr             baseAddress;

// main display device -> its pixel map -> address of the first screen pixel
mainGDH = GetMainDevice();
basePixMap = (**mainGDH).gdPMap;
baseAddress = (**basePixMap).baseAddr;

Some reference:

I paste this with an asterisk. That asterisk: Ride the OpenGL walrus. Get a context, stick your tiles into textures, draw a few quads, and get a much higher performance. How can you be obsessed with the performance of a blitter when you don't look at the blindingly obvious option blinking to you on a neon scrolling marquee?

QuickDraw is ded.

JohnDoe555 - CopyBits() is way too slow now. It was the thing to use back in the Classic days, but not anymore. The best way to improve your rasterization performance is to use OpenGL. Look, you don't need to learn very much at all to use OpenGL just to texture a quad covering your window or screen. Only the very basics are required. You can do the first few lessons at NeHe and be a pro with what you're trying to do. OpenGL is part of OS X so you needn't look at it as some extra complication. What you're looking to do doesn't require more than a couple days worth of investigation and implementation if you keep your sights clean - a week tops, if you get distracted easily.
You should be using OpenGL. OpenGL is /the/ way to get fast graphics on Mac OS X. You gain nothing by avoiding it, and lose much.

You /can/ blit directly to the screen on Mac OS X (there's a function in CoreGraphics to get the base address of the screen), but assuming you know the layout of the framebuffer is, IMO, too big a guess to make. Particularly when there are faster, more flexible methods available of getting stuff to the screen.
BlitPixie is what you want. You can get it without all the extra SpriteWorld cruft. Up to X.2 it was the fastest blitter around (and I tried hard to beat it). I haven't used it since X.2, because that style of drawing is dead on the Mac. OpenGL all the way.
... particularly when you'll blit on top of iChat hovering windows, the volume/brightness overlays, and other apps that are supposed to be composited on top of your app.
... particularly when you need to accommodate the user's display color depth. Your blit routine handles 15 bpp and 32 bpp?
... particularly when you need to accommodate the user's multiple displays, and your window spanning multiple displays, each possibly with different color depths, refresh rates, aspect ratios...
... particularly when Quickdraw is old and busted and OpenGL is the new hotness.
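
To put the color depth point in concrete terms: a routine that writes straight into the framebuffer has to convert pixels itself whenever the display is not in the same format as the offscreen buffer. A rough sketch of one such conversion follows, assuming the offscreen pixels are 32-bit values laid out as 0xAARRGGBB and the display wants 16-bit words with five bits per channel. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack one 0xAARRGGBB pixel into a 16-bit x1R5G5B5 word by keeping the
   top five bits of each color channel. Alpha is discarded. */
uint16_t PackARGB8888ToRGB555(uint32_t argb)
{
    uint16_t r = (uint16_t)((argb >> 19) & 0x1F);  /* red bits 23..19  */
    uint16_t g = (uint16_t)((argb >> 11) & 0x1F);  /* green bits 15..11 */
    uint16_t b = (uint16_t)((argb >> 3) & 0x1F);   /* blue bits 7..3   */
    return (uint16_t)((r << 10) | (g << 5) | b);
}

/* Convert one row of pixels; the caller walks the rows and strides. */
void ConvertRowTo555(const uint32_t *src, uint16_t *dst, size_t pixelCount)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < pixelCount; i++)
        dst[i] = PackARGB8888ToRGB555(src[i]);
}
```

That is one extra code path per depth you want to support, and it turns the fast straight copy into a per-pixel loop.
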
AnotherJake Wrote:QuickDraw is ded.

It's deprecated.

How to do the same with CoreGraphics then:

*waves OpenGL neon scrolling marquee around some more*
aarku Wrote:It's deprecated.
It's ded.

Anyway, hair-splitting aside, I was curious about just how hard it would be to come up with a reasonably simple example of how to render into a buffer and call it an OpenGL texture. I did this a couple of years ago with another little hare-brained project I was working on, but I forgot how easy it was. I just redid it and got about 50 fps on my dual G4 867 with a GeForce4 MX. It ain't great, but it's not horrid either. A Radeon 7k on a G4 500 got around 25 fps. I tried some DMA techniques earlier too, but they didn't work out as well as I had hoped, so I left it with just the rectangle textures, which means it won't work on a Rage128, but oh well... You have to provide your own OpenGL context, but here's the meat:
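#include <string.h>
#include <OpenGL/gl.h>
#include <OpenGL/glext.h>
#include <OpenGL/glu.h>

// NOTE: This is an attempt to plan for endian issues with future
// Intel-based Macs.  Performance might be a problem but it should
// at least keep the colors consistent.
// Assumed definitions: with GL_BGRA, these packed types read the bytes
// in memory as A, R, G, B on either byte order.
#if __BIG_ENDIAN__
#define CLIENT_IMAGE_FORMAT    GL_UNSIGNED_INT_8_8_8_8_REV
#else
#define CLIENT_IMAGE_FORMAT    GL_UNSIGNED_INT_8_8_8_8
#endif

// alternatively, the colors could be swapped here for endian issues
#define BYTE_A    0
#define BYTE_R    1
#define BYTE_G    2
#define BYTE_B    3

#define BYTES_PER_PIXEL  4
#define BUFFER_WIDTH     640
#define BUFFER_HEIGHT    480
// assumes tightly packed rows (no padding at the end of each row)
#define ROW_BYTES        (BUFFER_WIDTH * BYTES_PER_PIXEL)

unsigned char    textureBuffer[ROW_BYTES * BUFFER_HEIGHT];
GLuint            textureID;

void InitGLTextureToBeUsedAsAFrameBuffer(void);
void DrawGLFrame(float viewWidth, float viewHeight);
void DrawSoftwareFrame(int rows, int rowBytes, unsigned char *frameBuffer);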

// call once from initialization code
void InitGLTextureToBeUsedAsAFrameBuffer(void)
{
    // memset fills byte by byte, so every channel (alpha included) gets 0x7f
    memset(textureBuffer, 0x7f, BUFFER_HEIGHT * ROW_BYTES);

    glGenTextures(1, &textureID);
    glBindTexture(GL_TEXTURE_RECTANGLE_EXT, textureID);
    glTexImage2D(GL_TEXTURE_RECTANGLE_EXT, 0, GL_RGBA, BUFFER_WIDTH, BUFFER_HEIGHT,
                0, GL_BGRA, CLIENT_IMAGE_FORMAT, textureBuffer);
}

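// call once every time through the drawing loop, or timer
void DrawGLFrame(float viewWidth, float viewHeight)
{
    DrawSoftwareFrame(BUFFER_HEIGHT, ROW_BYTES, textureBuffer);

    glViewport(0, 0, (GLsizei)viewWidth, (GLsizei)viewHeight);
    // reset the projection first; gluOrtho2D multiplies the current matrix
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    gluOrtho2D(0.0f, 1.0f, 0.0f, 1.0f);
    glMatrixMode(GL_MODELVIEW);
    glLoadIdentity();

    // upload the freshly drawn buffer, then draw it on a quad filling the view
    glEnable(GL_TEXTURE_RECTANGLE_EXT);
    glBindTexture(GL_TEXTURE_RECTANGLE_EXT, textureID);
    glTexSubImage2D(GL_TEXTURE_RECTANGLE_EXT, 0, 0, 0, BUFFER_WIDTH,
                    BUFFER_HEIGHT, GL_BGRA, CLIENT_IMAGE_FORMAT, textureBuffer);
    glBegin(GL_QUADS);
        // rectangle textures take texel coordinates, not 0..1
        glTexCoord2f(0.0f, 0.0f);
        glVertex2f(0.0f, 0.0f);
        glTexCoord2f(BUFFER_WIDTH, 0.0f);
        glVertex2f(1.0f, 0.0f);
        glTexCoord2f(BUFFER_WIDTH, BUFFER_HEIGHT);
        glVertex2f(1.0f, 1.0f);
        glTexCoord2f(0.0f, BUFFER_HEIGHT);
        glVertex2f(0.0f, 1.0f);
    glEnd();
}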

// fills the entire buffer with animating shade of blue from dark to light
void DrawSoftwareFrame(int rows, int rowBytes, unsigned char *frameBuffer)
{
    int           i, j, rowOffset, index;
    static int    value = 0;

    value += 10;
    if (value > 255) value = 0;
    for (i = 0; i < rows; i++)
    {
        rowOffset = (i * rowBytes);
        for (j = 0; j < rowBytes; j += BYTES_PER_PIXEL)
        {
            index = rowOffset + j;
            frameBuffer[index + BYTE_R] = 0;
            frameBuffer[index + BYTE_G] = 0;
            frameBuffer[index + BYTE_B] = value;
        }
    }
}

I did one version that used an AGP hint with TextureRange and it got up to over 200 fps, but I didn't do any software rendering, just an upload of a static texture every frame. For whatever reason, leaving out TextureRange helped speed things up when I included DrawSoftwareFrame in the mix. That helps suggest to me that a lot of this code's performance is limited by the software rendering routine, but that's still largely conjecture since I didn't bother profiling. I'd bet that threading the software rendering routine on dual proc machines would really hum along.
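
On the threading idea, a rough and untested-for-speed sketch of what I mean, using POSIX threads. The worker count of two, the BandJob struct, and the function names are all assumptions for illustration, and the fill is simplified to writing one byte value everywhere. Each worker gets its own band of rows, so no locking is needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define WORKER_COUNT 2   /* assumption: one worker per processor */

/* Hypothetical description of the band of rows one worker fills. */
typedef struct {
    unsigned char *frameBuffer;
    int            firstRow, rowCount, rowBytes;
    unsigned char  value;
} BandJob;

/* Thread entry point: fill every byte of this worker's rows. */
static void *FillBand(void *arg)
{
    BandJob *job = (BandJob *)arg;
    for (int i = job->firstRow; i < job->firstRow + job->rowCount; i++) {
        unsigned char *row = job->frameBuffer + (size_t)i * job->rowBytes;
        for (int j = 0; j < job->rowBytes; j++)
            row[j] = job->value;
    }
    return NULL;
}

/* Fill the whole buffer, giving each worker a disjoint band of rows. */
void DrawSoftwareFrameThreaded(int rows, int rowBytes,
                               unsigned char *frameBuffer, unsigned char value)
{
    pthread_t threads[WORKER_COUNT];
    BandJob   jobs[WORKER_COUNT];
    int       rowsPerBand = rows / WORKER_COUNT;

    assert(frameBuffer != NULL && rows >= 0 && rowBytes >= 0);

    for (int t = 0; t < WORKER_COUNT; t++) {
        jobs[t].frameBuffer = frameBuffer;
        jobs[t].firstRow    = t * rowsPerBand;
        /* the last worker also takes any leftover rows */
        jobs[t].rowCount    = (t == WORKER_COUNT - 1) ? rows - t * rowsPerBand
                                                      : rowsPerBand;
        jobs[t].rowBytes    = rowBytes;
        jobs[t].value       = value;
        int err = pthread_create(&threads[t], NULL, FillBand, &jobs[t]);
        assert(err == 0);
        (void)err;
    }
    /* wait for every band before the buffer is uploaded */
    for (int t = 0; t < WORKER_COUNT; t++)
        pthread_join(threads[t], NULL);
}
```

Spawning threads every frame like this has its own cost; a real engine would more likely keep the workers alive and just hand them a new band each frame.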

NOTE: Using OpenGL for all of your primitives and sprites is far faster than the method I'm presenting here.