Some considerations

Jim Gettys Jim.Gettys@hp.com
Mon, 05 Jan 2004 05:42:07 -0500


There are fundamentally two issues in avoiding tearing and
enabling "always correct" display of applications; I think we
have to keep them separate in our minds or we will become
terminally confused.

1) that of vertical retrace catching a graphics operation in
mid-flight.

This is soluble either by:
      o classic double buffering, with buffer flips occurring at
	vertical retrace time, or
      o if you can't afford full hardware double buffering, by
	scheduling the copy of the windows to vertical retrace
	time (since hardware bit-blit is fast these days).
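
Concretely, the second option might look something like the
sketch below.  drmWaitVBlank() is the DRM's interface for
blocking until a retrace; copy_window_to_front() is a purely
hypothetical stand-in for whatever accelerated blit the driver
provides.

    #include <xf86drm.h>

    extern void copy_window_to_front(void);  /* hypothetical blit */

    /* Block until the next vertical retrace, then blit the
     * already-complete offscreen window contents to the visible
     * framebuffer.  drm_fd is an open DRM device. */
    void blit_at_vblank(int drm_fd)
    {
        drmVBlank vbl;

        vbl.request.type = DRM_VBLANK_RELATIVE;
        vbl.request.sequence = 1;   /* the next vblank from now */
        vbl.request.signal = 0;
        drmWaitVBlank(drm_fd, &vbl);

        copy_window_to_front();
    }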

2) that of catching applications at intermediate states, where
they have not finished re-rendering their presentation.  Without
application knowledge, there is no way for the compositing manager
to know when it is "safe" to composite the results onto the screen.
Without this knowledge, there is no way for an external agent to
"get the right answer".

This is soluble by some amount of coordination between the
application and the compositing manager.

I think, though I am not entirely sure (the proof will be in
the implementation), that we can use the existing X Synchronization
extension to provide this coordination between clients and the
compositing manager.  When an application is "done" with a frame
and idle, it can increment a counter and await the compositing
manager further incrementing the counter before proceeding, or
some scheme like this.
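
Roughly, the client side of such a scheme might look like the
following (a sketch only, assuming the extension has already been
initialized with XSyncInitialize(); the odd/even counter
convention is an assumption, not a worked-out protocol):

    #include <X11/Xlib.h>
    #include <X11/extensions/sync.h>

    /* Tell the compositing manager that frame N is complete,
     * then have the server hold our subsequent requests until
     * the compositor acknowledges by bumping the counter. */
    void frame_done(Display *dpy, XSyncCounter counter, int frame)
    {
        XSyncValue done, acked;
        XSyncWaitCondition cond;

        /* Mark the frame finished: counter = 2*frame + 1. */
        XSyncIntToValue(&done, 2 * frame + 1);
        XSyncSetCounter(dpy, counter, done);

        /* Proceed only once the compositor sets 2*frame + 2,
         * i.e. once the frame has actually hit the screen. */
        XSyncIntToValue(&acked, 2 * frame + 2);
        cond.trigger.counter    = counter;
        cond.trigger.value_type = XSyncAbsolute;
        cond.trigger.wait_value = acked;
        cond.trigger.test_type  = XSyncPositiveComparison;
        XSyncIntToValue(&cond.event_threshold, 0);
        XSyncAwait(dpy, &cond, 1);
    }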

                               - Jim


On Mon, 2004-01-05 at 03:53, Keith Whitwell wrote:
> Marcelo E. Magallon wrote:
> > On Sun, Jan 04, 2004 at 09:36:44PM +0000, Keith Whitwell wrote:
> > 
> >  > This has much to do with scheduling algorithms - SGI I'm sure has
> >  > done a lot of work to integrate the behaviour of opengl apps and the
> >  > system scheduler so that this appearance of smooth behaviour results.
> > 
> >  That's probably right.  I don't have an SGI nearby, but AFAIR you can
> >  start two copies of say gears, and both are going to run at the refresh
> >  rate of the monitor.  You have to start a couple more before it drops
> >  to half the refresh rate.
> > 
> >  OTOH, if you use sync-to-vblank with low-end NVIDIA hardware, you are
> >  going to get something _like_ the refresh rate of the monitor.  If you
> >  start a second copy, both of them drop to half the refresh rate.  And
> >  if you look at the CPU usage, both are pegged at 50%.  That means
> >  there's a busy loop somewhere, which sort of defeats part of the
> >  reason for sync-to-vblank in the first place.  So you are right, the
> >  kernel treats the tasks as CPU hogs because they _are_ CPU hogs.
> > 
> >  > Additionally, there's a suprising amount of work, including probably
> >  > hardware support you need to get multiple applications running
> >  > sync-to-vblank efficiently.
> > 
> >  Sure.  I'm not saying it's easy.  The architecture of SGI graphics
> >  hardware is very different from PC hardware.  Even the latest
> >  generation of SGI hardware (setting aside their latest offerings,
> >  which are ATI parts) has dedicated texture memory, for
> >  example.  IIRC only the O2 has unified memory (in fact it works much
> >  like i810 hardware).
> > 
> >  > My (limited) understanding is that the nvidia hardware has much of
> >  > what's required, but who knows if it's been a priority for them at
> >  > the driver level.
> > 
> >  AFAIK the Quadro line has much of what's needed.  I'm not really sure
> >  about the low end parts, but it doesn't look like it does, from
> >  what I'm told.
> > 
> >  > If you think about it, a GL app with direct rendering probably looks
> >  > a lot like a batch application to the scheduler.
> > 
> >  Sure.  The normal case is a CPU hog because it's constantly computing
> >  the stuff that it's going to send to the card.
> > 
> >  > > Much larger areas of the screen get damaged.  I can imagine the
> >  > > best solution is to render each window to a texture and then
> >  > > render a bunch of polygons one on top of the other.
> >  > 
> >  > Who is doing this rendering?  Are you talking about doing the
> >  > rendering on CPU to a buffer which is then handed off to the card as
> >  > a texture?
> > 
> >  Actually I was thinking about the proxy X server doing the rendering
> >  using the graphics card, but yes, the rendering can happen on the CPU,
> >  I don't see why not.  That's supposedly the way OS X works and they
> >  seem to get away with it.  As long as you have some way of caching
> >  parts of the screen I don't see why it can't work.  But if the graphics
> >  card is sitting idle (and it is -- a modern graphics card has ~ 1-10x
> >  the computing horsepower of a modern CPU) the idea of _using_ it is
> >  attractive.
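
For concreteness, the "window as a texture, one polygon per
window" idea above amounts to something like this in OpenGL; a
sketch only, with the window list, texture upload, and a 2D
projection assumed to exist elsewhere:

    #include <GL/gl.h>

    struct win { GLuint tex; float x, y, w, h; };

    /* Composite a stack of windows, bottom to top, each one
     * cached as a texture, by drawing one textured quad per
     * window with alpha blending. */
    void composite(struct win *stack, int n)
    {
        int i;

        glEnable(GL_TEXTURE_2D);
        glEnable(GL_BLEND);
        glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

        for (i = 0; i < n; i++) {
            struct win *w = &stack[i];
            glBindTexture(GL_TEXTURE_2D, w->tex);
            glBegin(GL_QUADS);
            glTexCoord2f(0, 0); glVertex2f(w->x,        w->y);
            glTexCoord2f(1, 0); glVertex2f(w->x + w->w, w->y);
            glTexCoord2f(1, 1); glVertex2f(w->x + w->w, w->y + w->h);
            glTexCoord2f(0, 1); glVertex2f(w->x,        w->y + w->h);
            glEnd();
        }
    }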
> > 
> >  > > And where does that leave my OpenGL application?  As long as my
> >  > > OpenGL application is a top-level window everything is ok, but when
> >  > > I lower it, I start to get inconsistent results, or did I miss
> >  > > something?
> >  > 
> >  > I'm not really sure what you're getting at - an opengl app would be
> >  > rendering to an offscreen buffer and composited along with everything
> >  > else.
> > 
> >  Hmmm... tell that to the OpenGL application :-)
> > 
> >  Sure, you could modify the driver in such a way that it allocates an
> >  off-screen buffer instead of rendering to the framebuffer (which they
> >  probably do anyway -- modulo single buffered applications).  This is
> >  probably implementation dependent (it certainly doesn't work on SGIs
> >  -- not that SGIs are interesting per se, I'm just saying not every
> >  implementation behaves like this), but if you have a fullscreen OpenGL
> >  application and you place another OpenGL window on top of it, and you
> >  read the framebuffer (the backbuffer actually), you get the contents of
> >  the window that's on top.  With some drivers and some cards at least.
> >  At any rate, you have to change the driver because calling SwapBuffers
> >  needs to do something different, not what it usually does.
> > 
> >  I don't know, but I have the hunch that that's slow.  I mean, you have
> >  to render, copy to a texture and then render a polygon.  OpenGL
> >  programmers get pissed off when their applications get slower for no
> >  good reason.  What I'm getting at is a simple question: how do OpenGL
> >  applications fit here without seeing their performance punished?  If we
> >  are talking about glxgears (which reports ridiculous things like 3000
> >  fps) it's fine, but what about that visualization thing which is having
> >  a hard time getting past 20 fps?  At 20 fps one frame is 50 ms.  If you
> >  add something that's going to take an additional 10 ms, you are down to 17
> >  fps.  Not good (incidentally, gears drops from 3000 fps to 97 :-).
> >  Sure, you don't _have_ to use the Xserver, but then I see an adoption
> >  problem (the same way gamers hate Windows XP -- or whatever it is they
> >  hate nowadays).
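
The "copy to a texture" step being costed above is essentially a
single call; a sketch, assuming a current GL context and that tex
is an already-allocated RGBA texture at least w x h texels:

    #include <GL/gl.h>

    /* Snapshot the just-rendered back buffer into a texture so
     * that a compositor can draw with it. */
    void snapshot_backbuffer(GLuint tex, int w, int h)
    {
        glBindTexture(GL_TEXTURE_2D, tex);
        glReadBuffer(GL_BACK);
        glCopyTexSubImage2D(GL_TEXTURE_2D, 0,   /* level 0 */
                            0, 0,               /* texture offset */
                            0, 0, w, h);        /* fb region */
    }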
> 
> 
> I don't really see the problem.  In the current case there is an application 
> rendering to an offscreen buffer (its backbuffer) and there is some sort of 
> swapbuffers operation - either a copy or a pageflip.  I don't see what would 
> be different in a compositing situation.
> 
> In general, if there is a specific case that has an obvious fastpath 
> implementation, we make sure we can take advantage of it.
> 
> I'm not really convinced we need to start worrying about optimizing a system 
> which is still in its infancy, but on the other hand I agree we shouldn't 
> architect it in such a way as to preclude such optimizations.
> 
> >  If you are compositing images, you need something to composite with.
> >  If the OpenGL application is bypassing the Xserver because it's working
> >  in direct rendering mode, what are you going to do?  glReadPixels?  How
> >  do you know when?  It's not the end of the world.  On SGIs you have to
> >  play tricks to take screenshots of OpenGL apps.  But there it's
> >  actually a hardware thing.  Along the same lines, you can't assign
> >  transparency to an OpenGL window.  You probably don't want to either,
> >  but _someone_ is going to ask why (the same way people ask why you
> >  can't take a screenshot of say, Xine).
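
The glReadPixels route mentioned above would look roughly like
this; it pulls the pixels all the way back to client memory,
which is exactly why it is slow, and "when to read" remains the
open question:

    #include <GL/gl.h>
    #include <stdlib.h>

    /* Read the visible contents of a GL window back to client
     * memory.  Rows come back bottom-to-top, per GL convention. */
    unsigned char *grab_window(int w, int h)
    {
        unsigned char *pixels = malloc((size_t)w * h * 4);

        if (pixels) {
            glReadBuffer(GL_FRONT);
            glReadPixels(0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE,
                         pixels);
        }
        return pixels;
    }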
> > 
> >  (along the same line, what about XVideo?)
> > 
> >  I'm not trying to punch holes in the idea for the fun of it, I'm
> >  wondering about the feasibility.
> 
> I think you're mainly tying yourself into knots worrying about trying to 
> architect this thing without modifying either the drivers or their 
> interface.  That's not what's proposed.  The ability to do all of the 
> things you're talking about isn't precluded by using OpenGL as the 
> rendering interface, but (may be) precluded by what else is and isn't 
> exposed by the drivers. 
> Figuring out what new facilities are needed to make this work is an important 
> part of what's going on here.
> 
> Keith
> 
> 
> _______________________________________________
> Xserver mailing list
> Xserver@pdx.freedesktop.org
> http://pdx.freedesktop.org/cgi-bin/mailman/listinfo/xserver
-- 
Jim Gettys <Jim.Gettys@hp.com>
HP Labs, Cambridge Research Laboratory