Some considerations

Marcelo E. Magallon mmagallo@debian.org
Sun, 4 Jan 2004 20:32:18 -0600


On Sun, Jan 04, 2004 at 09:36:44PM +0000, Keith Whitwell wrote:

 > This has much to do with scheduling algorithms - SGI I'm sure has
 > done a lot of work to integrate the behaviour of opengl apps and the
 > system scheduler so that this appearance of smooth behaviour results.

 That's probably right.  I don't have an SGI nearby, but AFAIR you can
 start two copies of, say, gears, and both are going to run at the
 refresh rate of the monitor.  You have to start a couple more before it
 drops to half the refresh rate.

 OTOH, if you use sync-to-vblank with low-end NVIDIA hardware, you are
 going to get something _like_ the refresh rate of the monitor.  If you
 start a second copy, both of them drop to half the refresh rate.  And
 if you look at the CPU usage, both are pegged at 50%.  That means
 there's a busy loop somewhere, which sort of defeats part of the
 reason for sync-to-vblank in the first place.  So you are right, the
 kernel treats the tasks as CPU hogs because they _are_ CPU hogs.
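
 To make the busy-loop point concrete: a double-buffered GL app is
 basically a loop around SwapBuffers, and whether sync-to-vblank burns
 CPU depends entirely on how the driver implements the wait inside that
 call.  A rough sketch (draw_scene, dpy and win are placeholders, not
 any real API; only glXSwapBuffers is real):

    /* Typical double-buffered GLX render loop.  With sync-to-vblank
     * the driver throttles the swap to the refresh rate -- but if it
     * does that by spinning on a register inside glXSwapBuffers(),
     * the process still burns 100% CPU while "waiting", which is all
     * the scheduler gets to see. */
    #include <X11/Xlib.h>
    #include <GL/gl.h>
    #include <GL/glx.h>

    extern Display *dpy;          /* assumed set up elsewhere */
    extern GLXDrawable win;
    extern void draw_scene(void); /* stands in for the app's rendering */

    void render_loop(void)
    {
        for (;;) {
            draw_scene();
            /* Either blocks until the vertical retrace or busy-waits
             * for it -- the app can't tell the difference, but the
             * scheduler certainly can. */
            glXSwapBuffers(dpy, win);
        }
    }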

 > Additionally, there's a suprising amount of work, including probably
 > hardware support you need to get multiple applications running
 > sync-to-vblank efficiently.

 Sure.  I'm not saying it's easy.  The architecture of SGI graphics
 hardware is very different from PC hardware.  Even the latest
 generation of SGI hardware (if you abstract from their latest
 offerings, which are ATI parts) has dedicated texture memory for
 example.  IIRC only the O2 has unified memory (in fact it works much
 like i810 hardware).

 > My (limited) understanding is that the nvidia hardware has much of
 > what's required, but who knows if it's been a priority for them at
 > the driver level.

 AFAIK the Quadro line has much of what's needed.  I'm not really sure
 about the low-end parts, but it doesn't look like it does, from what I
 can tell.

 > If you think about it, a GL app with direct rendering probably looks
 > a lot like a batch application to the scheduler.

 Sure.  The normal case is a CPU hog because it's constantly computing
 the stuff that it's going to send to the card.

 > > Much larger areas of the screen get damaged.  I can imagine the
 > > best solution is to render each window to a texture and then
 > > render a bunch of polygons one on top of the other.
 > 
 > Who is doing this rendering?  Are you talking about doing the
 > rendering on CPU to a buffer which is then handed off to the card as
 > a texture?

 Actually I was thinking about the proxy X server doing the rendering
 using the graphics card, but yes, the rendering can happen on the CPU,
 I don't see why not.  That's supposedly the way OS X works and they
 seem to get away with it.  As long as you have some way of caching
 parts of the screen I don't see why it can't work.  But if the graphics
 card is sitting idle (and it is -- a modern graphics card has ~ 1-10x
 the computing horsepower of a modern CPU) the idea of _using_ it is
 attractive.
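
 Something like this is what I have in mind for the proxy server doing
 the compositing with the card -- one texture per window, one quad per
 window, drawn back to front (the comp_window bookkeeping is made up,
 only the GL calls are real):

    #include <GL/gl.h>

    struct comp_window {
        GLuint tex;        /* texture holding the window's contents */
        float x, y, w, h;  /* position and size in screen coordinates */
    };

    extern struct comp_window windows[];  /* back-to-front stacking order */
    extern int win_count;

    void composite_screen(void)
    {
        int i;

        glEnable(GL_TEXTURE_2D);
        for (i = 0; i < win_count; i++) {
            struct comp_window *cw = &windows[i];
            glBindTexture(GL_TEXTURE_2D, cw->tex);
            glBegin(GL_QUADS);
            glTexCoord2f(0, 0); glVertex2f(cw->x,         cw->y);
            glTexCoord2f(1, 0); glVertex2f(cw->x + cw->w, cw->y);
            glTexCoord2f(1, 1); glVertex2f(cw->x + cw->w, cw->y + cw->h);
            glTexCoord2f(0, 1); glVertex2f(cw->x,         cw->y + cw->h);
            glEnd();
        }
    }

 Getting the window contents _into_ those textures efficiently is of
 course the whole problem.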

 > > And where does that leave my OpenGL application?  As long as my
 > > OpenGL application is a top-level window everything is ok, but when
 > > I lower it, I start to get inconsistent results, or did I miss
 > > something?
 > 
 > I'm not really sure what you're getting at - an opengl app would be
 > rendering to an offscreen buffer and composited along with everything
 > else.

 Hmmm... tell that to the OpenGL application :-)

 Sure, you could modify the driver in such a way that it allocates an
 off-screen buffer instead of rendering to the framebuffer (which they
 probably do anyway -- modulo single buffered applications).  This is
 probably implementation dependent (it certainly doesn't work on SGIs
 -- not that SGIs are interesting per se, I'm just saying not every
 implementation behaves like this), but if you have a fullscreen OpenGL
 application and you place another OpenGL window on top of it, and you
 read the framebuffer (the backbuffer actually), you get the contents of
 the window that's on top.  With some drivers and some cards at least.
 At any rate, you have to change the driver because calling SwapBuffers
 needs to do something different, not what it usually does.

 I don't know, but I have a hunch that that's slow.  I mean, you have
 to render, copy to a texture and then render a polygon.  OpenGL
 programmers get pissed off when their applications get slower for no
 good reason.  What I'm getting at is a simple question: how do OpenGL
 applications fit here without seeing their performance punished?  If we
 are talking about glxgears (which reports ridiculous things like 3000
 fps) it's fine, but what about that visualization thing which is having
 a hard time getting past 20 fps?  At 20 fps one frame is 50 ms.  If you
 add something that's going to take an additional 10 ms, you are down to 17
 fps.  Not good (incidentally, gears drops from 3000 fps to 97 :-).
 Sure, you don't _have_ to use the Xserver, but then I see an adoption
 problem (the same way gamers hate Windows XP -- or whatever it is they
 hate nowadays).
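
 Spelling out the arithmetic above (assuming the compositing pass costs
 a flat 10 ms per frame on top of the app's own rendering):

     20 fps:   1/20 s   = 50.0 ms/frame;  50.0 + 10 ms = 60.0 ms  ->  ~16.7 fps
     3000 fps: 1/3000 s =  0.3 ms/frame;   0.3 + 10 ms = 10.3 ms  ->  ~97 fps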

 If you are compositing images, you need something to composite with.
 If the OpenGL application is bypassing the Xserver because it's working
 in direct rendering mode, what are you going to do?  glReadPixels?  How
 do you know when?  It's not the end of the world.  On SGIs you have to
 play tricks to take screenshots of OpenGL apps.  But there it's
 actually a hardware thing.  Along the same lines, you can't assign
 transparency to an OpenGL window.  You probably don't want to either,
 but _someone_ is going to ask why (the same way people ask why you
 can't take a screenshot of say, Xine).
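
 The mechanics of the readback itself are easy enough; it's the "when"
 (and the fact that it has to happen from inside the window's own GL
 context) that an external compositor doesn't control.  Roughly (width,
 height and the caller are placeholders):

    #include <GL/gl.h>
    #include <stdlib.h>

    /* Grab what's currently on screen for this context's window.
     * Has to run in the window's own GL context, at the right moment
     * relative to SwapBuffers -- which is exactly what a compositor
     * sitting outside the application doesn't get to decide. */
    unsigned char *grab_window(int width, int height)
    {
        unsigned char *buf = malloc((size_t)width * height * 4);
        if (!buf)
            return NULL;
        glReadBuffer(GL_FRONT);               /* the visible buffer */
        glPixelStorei(GL_PACK_ALIGNMENT, 1);  /* tightly packed rows */
        glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, buf);
        return buf;                           /* rows are bottom-up */
    }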

 (along the same line, what about XVideo?)

 I'm not trying to punch holes in the idea for the fun of it, I'm
 wondering about the feasibility.

 Cheers,

     Marcelo