Need help with a pixel-oriented framebuffer

Apprentice
Posts: 13
Joined: 2011.02
Post: #16
Thanks Warmi, I have a few questions about your swizzling. I read in another thread that if you use GL_APPLE_texture_format_BGRA8888 you can avoid the swizzling, but you would be stuck with the 888 format instead of 565. So 24-bit (32 with alpha) instead of 16-bit. Is there a BGRA565?

In any case, since the iPhone 3GS display is only 18-bit and the iPhone 4's is 24-bit, targeting the iPhone 4 with 565 would be a waste, as the IPS display would show banded colors where 888 could give smooth shades.

But I am straying from the topic. I want to know how much of a speed increase you get when using BGRA instead of regular RGBA. I am thinking Apple uses a non-standard sub-pixel arrangement on the iPhone screen: Blue-Green-Red Blue-Green-Red... instead of the standard Red-Green-Blue Red-Green-Blue. In that case, I would think the conversion is rearranging the sub-pixel values to match the screen.

This is getting a little technical. How does cache coherency relate to this? Does anyone (an expert) know what other twiddling needs to be done after changing to BGRA? And if this swizzling and twiddling is so slow, can one simply bypass it by using ImageUI? Maybe Apple purposely provides only an indirect way to access the hidden surface buffer API via ImageUI?
Member
Posts: 166
Joined: 2009.04
Post: #17
(Feb 28, 2011 01:34 AM)edepot Wrote:  Thanks Warmi, I have a few questions about your swizzling. [...]

You may find this blog post interesting.

http://cmgresearch.blogspot.com/2010/10/...ios40.html
Apprentice
Posts: 13
Joined: 2011.02
Post: #18
Sorry for being such a noob, but I still have a lot of questions. Yes, I am getting to the plotting code, but there is just so much complexity with all the different frameworks and how they all tie together and influence each other. I don't have a complete picture yet, just fragments of this and that, and it is slowing me down.

For example, are pixel shaders available in OpenGL ES 1.1 AND 2.0, or just 2.0? I know that the MBX only supports 1.1, and the SGX in the 3GS and 4 supports both 1.1 and 2.0. Are pixel shaders a one-way operation? I am also new to pixel shaders. Someone mentioned that you can only read the current value, but I thought pixel shaders are meant to do stuff to the pixels, not just read them. So in this case, to poke the whole screen pixel by pixel using pixel shaders:

Code:
Begin per-pixel function:
    Read current value   <- can probably be commented out
    AND with 123
    OR with 123
End per-pixel function

Wouldn't the above ignore the current pixel value and poke it with a value (123 in this instance)? And even better, can't I just drop both the AND and the OR operation and substitute a write of the value 123? Am I totally on the wrong track, and this is not how pixel shaders work? Don't tell me pixel shaders only allow you to read the value, and then you must manually issue an extra OpenGL operation after reading it? I thought you could both read and do stuff to pixels in one atomic operation?
Sage
Posts: 1,232
Joined: 2002.10
Post: #19
(Mar 1, 2011 01:01 AM)edepot Wrote:  Sorry for being such a noob, but I still have a lot of questions.

The definitive answer to all of your "how does it work?" OpenGL questions is: read the specification.

Start here. You will see that OpenGL ES is based on OpenGL (the "Difference Specification" highlights the changes, for programmers already familiar with OpenGL) so you might also want to look here.

There are hundreds of tutorials and books for learning OpenGL. All of them are distilled from the specification, usually adding the author's personal experience about "what works well".



edepot Wrote:there is just so much complexity with all the different frameworks

Nobody said it would be easy. Truthfully, programming is hard. OpenGL in particular is quite complicated.



edepot Wrote:I am also new to pixel shaders.

Fragment shaders are only in ES2.0. ES1.1 is fixed-function only. ES2.0 is shaders only. Desktop OpenGL is both.

Fragment shaders in ES2.0 can not read the current value of a pixel in the framebuffer, although this is an often-requested feature.
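For illustration, here is a minimal sketch (not something from the posts above) of what an ES2.0 fragment shader looks like, written as the C string you would hand to glShaderSource(). Note there is no input for the pixel already in the framebuffer; gl_FragColor is the single, write-only output:

Code:
/* Minimal GLSL ES 2.0 fragment shader, as a C string for glShaderSource().
   It can only write its output color; it cannot read the framebuffer. */
static const char *fragmentSrc =
    "precision mediump float;\n"
    "void main(void)\n"
    "{\n"
    "    /* write-only output, e.g. the value 123 in each channel */\n"
    "    gl_FragColor = vec4(123.0/255.0, 123.0/255.0, 123.0/255.0, 1.0);\n"
    "}\n";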

Generally speaking, the "massively parallel" execution design of GPUs is optimized for unidirectional data flow. Imagine a thousand processors, all executing the same program, and each writing out a single output (color.) Now imagine another block of hardware responsible for storing those outputs into the framebuffer (and doing a bunch of raster operations along the way, like blending and masking.) In a deeply pipelined architecture with various caches and reordering, making the previous cycle's result available as an input to each processor is a big synchronization problem.

OpenCL and modern OpenGL do support completely random-access read-write data. But this is fraught with data parallelism race conditions, and it is up to the programmer to understand the GPU execution model and use barriers appropriately to ensure correct execution. See "Shader Memory Access Ordering" in that second link for an idea of how complex this is. Of course this flexibility is also slower than traditional (unidirectional) rasterization.



Going back to your original post: you should clarify what you're really trying to do. If you want a "dumb framebuffer" then using a CGBitmapContext made from your own malloc is probably the easiest thing to do. You can twiddle the pixels manually as much as you want and then update a CALayer with the bitmap.
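A rough sketch of that idea, for illustration only (the buffer size, pixel format, and the CALayer hookup are assumptions, not something specified above):

Code:
/* Sketch: a "dumb framebuffer" backed by our own malloc'd BGRA buffer. */
#include <stdlib.h>
#include <CoreGraphics/CoreGraphics.h>

size_t width = 320, height = 480, bytesPerRow = width * 4;
void *pixels = malloc(bytesPerRow * height);           /* twiddle bytes here */

CGColorSpaceRef space = CGColorSpaceCreateDeviceRGB();
CGContextRef ctx = CGBitmapContextCreate(pixels, width, height, 8, bytesPerRow,
    space, kCGImageAlphaPremultipliedFirst | kCGBitmapByteOrder32Little); /* BGRA */

/* ... write pixel values directly into 'pixels' ... */

CGImageRef image = CGBitmapContextCreateImage(ctx);    /* snapshot of the bitmap */
/* myLayer.contents = (id)image;  -- hand the bitmap to a CALayer (Objective-C) */
CGImageRelease(image);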
Apprentice
Posts: 13
Joined: 2011.02
Post: #20
Yes, this gets to the gist of this post: speed. I've gotten conflicting comments from people on this thread, some saying glTexSubImage2D() is fastest, others saying the cache twiddling will slow it down and you should use fragment shaders instead. And still others (like you) say to use a CALayer, or the layer obtained from AVFoundation. This is in addition to the hidden surface buffer layer on jailbroken phones (which I can't use because of App Store requirements).

That's five ways to plot pixels to the display, and I can't nail down which is fastest. I just want a background layer, then plot pixels on top of it. The background layer may change, but preferably it is just a large bitmap where you change the pointer to the location of the large bitmap that gets displayed on the screen. I need double buffering, so that I can copy the background, plot some pixels on it, switch the display pointer to it, go to the other buffer and start over. I don't even know if the iPhone 4 has enough reserved main memory for three buffers (one to store the background in case it never changes, and two more for the alternating display/work buffers). Two buffers would suffice too, if there is a fast way to copy the background from main memory to video memory (which I think is reserved in main memory as well). Which would be fastest of those four (legitimate) methods? Or are there only three methods, and your method is the same as the one described in http://cmgresearch.blogspot.com/2010/10/...ios40.html

Now for the questions:

If the fragment shader cannot read from the display framebuffer, I assume it can at least read the texture pixel data of an already-displayed polygon, right? Would that be in the data section of video memory? Or is there no such thing on the SGX, and all the reserved main memory for video is for display only? Or are you saying the fragment shader can only read from main memory (via a passed array of the locations of the textures OpenGL needs)? In other words, fragment shaders cannot ever do anything to video memory except write the results of their calculations, and those shader calculations happen from the non-video main memory side (where all the data is located). This would mean once a 3D scene is displayed, you can't ever go back and modify it on the fly; you need to do the shader operations on the previous data in non-video main memory (probably arrays of data) and re-display the 3D scene.

Now for the technical question that will probably not be answered by the links you provided: does using BGRA8888 allow bypassing the cache twiddling step everyone keeps mentioning in this thread? Is it possible to substitute glTexSubImage2D() with a fragment shader approach that is faster, given the need to use a background image and an array of vertices to plot? Would that be fastest?
Member
Posts: 166
Joined: 2009.04
Post: #21
(Mar 1, 2011 09:06 AM)edepot Wrote:  Or are you saying the fragment shader can only read from main memory (via a passed array of the locations of the textures OpenGL needs)? In other words, fragment shaders cannot ever do anything to video memory except write the results of their calculations, and those shader calculations happen from the non-video main memory side (where all the data is located). This would mean once a 3D scene is displayed, you can't ever go back and modify it on the fly.

That's pretty much the case.
The way around it is to render to RTT targets (off-screen render-to-texture targets).
Most screen-space effects are done that way: you render to a texture as many times as you need, and then use the RTT-based texture as a read-only texture when rendering to the framebuffer.
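As a rough ES2.0 sketch of that setup (the texture size and variable names are placeholders; assumes a current GL context):

Code:
/* Sketch: create an off-screen render-to-texture target in OpenGL ES 2.0. */
GLuint fbo, colorTex;
glGenTextures(1, &colorTex);
glBindTexture(GL_TEXTURE_2D, colorTex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 512, 512, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, NULL);         /* allocate storage only */

glGenFramebuffers(1, &fbo);
glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, colorTex, 0);

if (glCheckFramebufferStatus(GL_FRAMEBUFFER) == GL_FRAMEBUFFER_COMPLETE) {
    /* draw the off-screen pass here, then rebind the on-screen framebuffer
       and sample colorTex as a read-only texture in the final pass */
}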
Sage
Posts: 1,232
Joined: 2002.10
Post: #22
(Mar 1, 2011 09:06 AM)edepot Wrote:  Yes, this gets to the gist of this post: speed. That's five ways to plot pixels to the display, and I can't nail down which is fastest.
Picking a rendering design based on performance advice in a forum thread sounds like premature optimization to me.

If you are concerned about performance, you should write some benchmarks. Pick two or three of those methods and benchmark them.

Because it sounds like you want to do a bunch of manual pixel manipulation on the CPU, you should probably also benchmark simple things like memcpy() and for() loops filling out pixels, so you have some baseline data on the performance of your hardware.
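For example, a trivial baseline sketch along those lines (the timing call and buffer size are placeholder choices, not a prescribed benchmark):

Code:
/* Sketch: rough baseline timing of memcpy() and a per-pixel fill loop. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <CoreFoundation/CoreFoundation.h>

int main(void)
{
    enum { W = 960, H = 640 };
    uint32_t *src = calloc(W * H, 4);
    uint32_t *dst = malloc(W * H * 4);

    CFAbsoluteTime t0 = CFAbsoluteTimeGetCurrent();
    memcpy(dst, src, W * H * 4);                    /* straight copy */
    CFAbsoluteTime t1 = CFAbsoluteTimeGetCurrent();
    for (int i = 0; i < W * H; i++)                 /* per-pixel write */
        dst[i] = 0xFF0000FFu;
    CFAbsoluteTime t2 = CFAbsoluteTimeGetCurrent();

    printf("memcpy: %.3f ms, fill loop: %.3f ms\n",
           (t1 - t0) * 1000.0, (t2 - t1) * 1000.0);
    free(src);
    free(dst);
    return 0;
}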

edepot Wrote:I just want a background layer, then plot pixels on top of it. The background layer may change, but preferably it is just a large bitmap where you change the pointer to the location of the large bitmap that gets displayed on the screen. I need double buffering, so that I can copy the background, plot some pixels on it, switch the display pointer to it, go to the other buffer and start over.
That's the most useful info you've given so far.

Can you quantify that further? How often will the background layer change? How much of it changes? How many pixels will you plot on top each frame?

How many bytes per second do you need to move from the CPU to the screen?

edepot Wrote:Does using BGRA8888 allow bypassing the cache twiddling step everyone keeps mentioning in this thread?
No. It can only bypass the component order swizzle. Not the cache twiddle.

On iOS devices, the fastest possible texture upload will be had with the PVRTC compressed formats, because they are already in GPU-friendly layout. These formats are only appropriate for static (i.e. compressed off-line) content, so they're no good for your described use case. All other texture formats will be twiddled.
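For reference, a minimal sketch of what that BGRA upload path looks like (the texture object, its dimensions, and the pixel pointer are assumptions; assumes a current GL context). The extension lets the driver skip the red/blue component swap, but the data is still twiddled into the GPU's internal layout on upload:

Code:
/* Sketch: update a texture from BGRA-ordered pixels using the
   APPLE_texture_format_BGRA8888 extension. Avoids the R/B swizzle,
   but not the tiling ("twiddle") step. */
glBindTexture(GL_TEXTURE_2D, backgroundTex);      /* assumed existing texture */
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 960, 640,
                GL_BGRA_EXT,                      /* may be spelled GL_BGRA in Apple's headers */
                GL_UNSIGNED_BYTE, pixels);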
Apprentice
Posts: 13
Joined: 2011.02
Post: #23
I am assuming RTT textures are meant to satisfy the double buffering? If not, then I don't see the point in rendering to something off-screen when it can be rendered on screen. Since all the data is in main memory and not video memory, it won't show until you actually start plotting to video memory. Or am I missing something? What is the point of RTT? Are you saying that if you render to main memory via an RTT texture, you can break through the limitation of fragment shaders not being able to read from the rendered texture? I thought there were no instructions for fragment shaders to read already-rendered stuff (like in video memory, and wouldn't this also apply to rendering to main memory via RTT?). In other words, since you have already passed everything the fragment shaders need to render to the video memory buffer, what is the point of rendering to an RTT texture outside of the display, THEN making that a read-only texture? Wouldn't it slow things down? Because then you would need to use glTexSubImage2D() on that read-only texture to copy it back to the polygon in video memory, adding the twiddle and swizzle steps fragment shaders were supposed to bypass. Or are you saying you can use an RTT texture to render something in video memory that is not shown?

As for the requirement, it would most likely be a full-screen background texture that changes each cycle. However, that background texture can be a small segment of a huge bitmap, with the display pointer simply moved to it, so there is no need to do any flash memory loading, as everything will be in main memory. The number of pixels would be: as much as possible while maintaining 30fps. So a background copy, a for loop over many pixels, then send to the display buffer via fragment shader or texture. This is on the Retina iPhone 4 display, 960x640.
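(For scale, assuming 32-bit pixels: 960 × 640 × 4 bytes ≈ 2.4 MB per frame, so 30 full-screen updates per second is roughly 70 MB/s of pixel data to push.)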
Member
Posts: 40
Joined: 2009.05
Post: #24
Why not load up your huge bitmap to the GPU as a texture and then just draw the segments you want to display from the texture?

Depending on what you mean by huge, you might need to split it into multiple textures: on pre-3GS phones the maximum texture size is 1024x1024, on later phones 2048x2048. Depending on the definition of "huge", you might also need to unload the textures for areas that aren't going to be displayed anymore and upload the parts that are about to be displayed. A rough ES1.1 sketch of drawing one segment is below.
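In the sketch (the texture object, the 2048x2048 size, and the chosen sub-region are assumptions; assumes a current GL context and a suitable projection), a segment is just a textured quad whose texture coordinates select that part of the big texture:

Code:
/* Sketch: draw a 960x640 segment out of a 2048x2048 background texture
   by choosing the matching texture coordinates for a full-screen quad. */
const GLfloat verts[] = {                      /* quad as a triangle strip */
    -1.0f, -1.0f,   1.0f, -1.0f,   -1.0f, 1.0f,   1.0f, 1.0f
};
GLfloat u0 = 0.0f, v0 = 0.0f;                  /* segment origin, in [0,1] texture space */
GLfloat u1 = u0 + 960.0f / 2048.0f;
GLfloat v1 = v0 + 640.0f / 2048.0f;
const GLfloat texcoords[] = {
    u0, v1,   u1, v1,   u0, v0,   u1, v0       /* flip v if the image comes out upside down */
};

glEnable(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, backgroundTex);   /* assumed, already uploaded */
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glVertexPointer(2, GL_FLOAT, 0, verts);
glTexCoordPointer(2, GL_FLOAT, 0, texcoords);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
glDisableClientState(GL_TEXTURE_COORD_ARRAY);
glDisableClientState(GL_VERTEX_ARRAY);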

By "changes each cycle", do you mean that the contents of the huge bitmap are changing? Or is it simply that the section of the bitmap you want in the background is changing (like a simple scrolling background)?

(Mar 2, 2011 01:10 PM)edepot Wrote:  As for the requirement, it would most likely be a full-screen background texture that changes each cycle. [...]
Member
Posts: 23
Joined: 2010.08
Post: #25
I feel this topic has gone off on a tangent. The original question was whether anybody could provide a bare-bones iPhone example of plotting a pixel to the screen.

We all agree that OpenGL ES is the lowest-level graphics API available on the iPhone - but at this point we are discussing how to abuse the API to get an interface similar to direct framebuffer access. OpenGL ES is a hardware-accelerated API and was never meant to give the user direct framebuffer access!

If you are still interested in plotting a pixel in OpenGL ES - here's the correct way:
Code:
const GLfloat pos[] = { 0, 0 };           /* point position in the current coordinate space */
glPointSize(1);                           /* one pixel */
glColor4f(1.0, 0.0, 0.0, 1.0);            /* red */
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(2, GL_FLOAT, 0, pos);
glDrawArrays(GL_POINTS, 0, 1);            /* draw a single point */
glDisableClientState(GL_VERTEX_ARRAY);

You've seen how to plot a point - now stop and don't ever think about OpenGL again in terms of pixels! Go learn how to draw quads and how to use textures in OpenGL ES. Take the time to learn the API - and stop treating it like a dumb ol' framebuffer. Use the API the way it was meant to be used - and you will achieve effects faster and simpler. Not to mention it will be hardware accelerated!

Here are some OpenGL ES tutorials to get you started:
Learning OpenGL ES will be well worth your time - I promise!

The Monkey Hustle - Now available on the App Store!