Some considerations
Marcelo E. Magallon
mmagallo@debian.org
Sun, 4 Jan 2004 20:32:18 -0600
On Sun, Jan 04, 2004 at 09:36:44PM +0000, Keith Whitwell wrote:
> This has much to do with scheduling algorithms - SGI I'm sure has
> done a lot of work to integrate the behaviour of opengl apps and the
> system scheduler so that this appearance of smooth behaviour results.
That's probably right. I don't have an SGI nearby, but AFAIR you can
start two copies of say gears, and both are going to run at the refresh
rate of the monitor. You have to start a couple more before it drops
to half the refresh rate.
OTOH, if you use sync-to-vblank with low-end NVIDIA hardware, you are
going to get something _like_ the refresh rate of the monitor. If you
start a second copy, both of them drop to half the refresh rate. And
if you look at the CPU usage, both are pegged at 50%. That means
there's a busy loop somewhere, which defeats part of the purpose of
sync-to-vblank in the first place. So you are right, the kernel
treats the tasks as CPU hogs because they _are_ CPU hogs.
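A toy model of the two behaviours described above (the model and all
the numbers in it are my own illustration, not a description of any
actual driver):

```python
# Toy model: a client renders a frame, then waits for the next vblank
# before swapping. How the wait is implemented decides everything.

def effective_fps(refresh_hz, num_clients, busy_wait):
    """Frames per second each client achieves on a single CPU.

    busy_wait=True models a driver that spins until vblank: each
    client occupies the CPU for a full frame interval, so clients
    serialize and each one swaps only on every num_clients-th vblank.

    busy_wait=False models a blocking wait: the CPU is free between
    swaps, so every client (assuming cheap rendering) can complete a
    swap on every vblank, as on the SGI.
    """
    if busy_wait:
        return refresh_hz / num_clients
    return refresh_hz

print(effective_fps(60, 1, busy_wait=True))   # 60.0 -- one copy runs at refresh
print(effective_fps(60, 2, busy_wait=True))   # 30.0 -- two copies drop to half
print(effective_fps(60, 2, busy_wait=False))  # 60   -- blocking waits don't serialize
```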
> Additionally, there's a surprising amount of work, including probably
> hardware support you need to get multiple applications running
> sync-to-vblank efficiently.
Sure. I'm not saying it's easy. The architecture of SGI graphics
hardware is very different from PC hardware. Even the latest
generation of SGI hardware (if you abstract from their latest
offerings, which are ATI parts) has dedicated texture memory for
example. IIRC only the O2 has unified memory (in fact it works much
like i810 hardware).
> My (limited) understanding is that the nvidia hardware has much of
> what's required, but who knows if it's been a priority for them at
> the driver level.
AFAIK the Quadro line has much of what's needed. I'm not really sure
about the low end parts, but it doesn't look like they do, from what
I can tell.
> If you think about it, a GL app with direct rendering probably looks
> a lot like a batch application to the scheduler.
Sure. The normal case is a CPU hog because it's constantly computing
the stuff that it's going to send to the card.
> > Much larger areas of the screen get damaged. I can imagine the
> > best solution is to render each window to a texture and then
> > render a bunch of polygons one on top of the other.
>
> Who is doing this rendering? Are you talking about doing the
> rendering on CPU to a buffer which is then handed off to the card as
> a texture?
Actually I was thinking about the proxy X server doing the rendering
using the graphics card, but yes, the rendering can happen on the CPU,
I don't see why not. That's supposedly the way OS X works and they
seem to get away with it. As long as you have some way of caching
parts of the screen I don't see why it can't work. But if the graphics
card is sitting idle (and it is -- a modern graphics card has ~ 1-10x
the computing horsepower of a modern CPU) the idea of _using_ it is
attractive.
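The "render each window to a buffer, then paste the buffers back to
front" idea amounts to painter's-algorithm compositing. A hypothetical
sketch (the function name and the one-value-per-pixel "buffers" are
made up for illustration; a real compositor would hand the cached
buffers to the card as textures instead of looping on the CPU):

```python
# Painter's algorithm: each window is a cached off-screen buffer; the
# compositor pastes them onto the screen from bottom to top, so later
# (higher) windows overwrite earlier (lower) ones where they overlap.

def composite(screen_w, screen_h, windows):
    """windows: list of (x, y, buffer) tuples, bottom-most first,
    where buffer is a list of rows of pixel values."""
    screen = [[0] * screen_w for _ in range(screen_h)]
    for x, y, buf in windows:
        for row, pixels in enumerate(buf):
            for col, px in enumerate(pixels):
                if 0 <= y + row < screen_h and 0 <= x + col < screen_w:
                    screen[y + row][x + col] = px
    return screen

# A 2x2 window (value 1) partially covered by a 1x1 window (value 2).
screen = composite(4, 4, [(0, 0, [[1, 1], [1, 1]]),
                          (1, 1, [[2]])])
print(screen[1][1])  # 2 -- the top window wins where they overlap
print(screen[0][0])  # 1 -- the bottom window shows through elsewhere
```

The point of the caching is that a window's buffer only has to be
re-rendered when its contents change, not every time something moves
in front of it.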
> > And where does that leave my OpenGL application? As long as my
> > OpenGL application is a top-level window everything is ok, but when
> > I lower it, I start to get inconsistent results, or did I miss
> > something?
>
> I'm not really sure what you're getting at - an opengl app would be
> rendering to an offscreen buffer and composited along with everything
> else.
Hmmm... tell that to the OpenGL application :-)
Sure, you could modify the driver in such a way that it allocates an
off-screen buffer instead of rendering to the framebuffer (which they
probably do anyway -- modulo single buffered applications). This is
probably implementation dependent (it certainly doesn't work on SGIs
-- not that SGIs are interesting per se, I'm just saying not every
implementation behaves like this), but if you have a fullscreen OpenGL
application and you place another OpenGL window on top of it, and you
read the framebuffer (the backbuffer actually), you get the contents of
the window that's on top. With some drivers and some cards at least.
At any rate, you have to change the driver because calling SwapBuffers
needs to do something different, not what it usually does.
I don't know, but I have a hunch that that's slow. I mean, you have
to render, copy to a texture and then render a polygon. OpenGL
programmers get pissed off when their applications get slower for no
good reason. What I'm getting at is a simple question: how do OpenGL
applications fit here without seeing their performance punished? If we
are talking about glxgears (which reports ridiculous things like 3000
fps) it's fine, but what about that visualization thing which is having
a hard time getting past 20 fps? At 20 fps one frame is 50 ms. If
you add something that takes an additional 10 ms, you are down to 17
fps. Not good (incidentally, gears drops from 3000 fps to 97 :-).
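The arithmetic above as a quick back-of-the-envelope function (a toy
model: it assumes the compositing step is a fixed per-frame cost, 10
ms in these examples):

```python
# A fixed per-frame overhead hurts slow applications much more,
# in relative terms, than fast ones.

def fps_with_overhead(fps, overhead_ms):
    frame_ms = 1000.0 / fps          # time budget of one frame
    return 1000.0 / (frame_ms + overhead_ms)

print(round(fps_with_overhead(20, 10)))    # 17 -- 50 ms + 10 ms = 60 ms/frame
print(round(fps_with_overhead(3000, 10)))  # 97 -- ~0.33 ms + 10 ms/frame
```

Note that the second line reproduces the gears drop from 3000 fps to
97: at 3000 fps almost the whole frame time *is* the overhead.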
Sure, you don't _have_ to use the Xserver, but then I see an adoption
problem (the same way gamers hate Windows XP -- or whatever it is they
hate nowadays).
If you are compositing images, you need something to composite with.
If the OpenGL application is bypassing the Xserver because it's working
in direct rendering mode, what are you going to do? glReadPixels? How
do you know when? It's not the end of the world. On SGIs you have to
play tricks to take screenshots of OpenGL apps. But there it's
actually a hardware thing. Along the same lines, you can't assign
transparency to an OpenGL window. You probably don't want to either,
but _someone_ is going to ask why (the same way people ask why you
can't take a screenshot of say, Xine).
(along the same line, what about XVideo?)
I'm not trying to punch holes in the idea for the fun of it, I'm
wondering about the feasibility.
Cheers,
Marcelo