Bug 1822 - libGL (and DRI drivers) should support TLS
Summary: libGL (and DRI drivers) should support TLS
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: libGL (show other bugs)
Version: DRI git
Hardware: x86 (IA32) Linux (All)
: high enhancement
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-11-10 12:15 UTC by Ian Romanick
Modified: 2005-04-12 21:00 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
Patch to add TLS support (17.01 KB, patch)
2004-11-10 12:21 UTC, Ian Romanick
no flags Details | Splinter Review
Updated TLS support patch (59.73 KB, patch)
2004-11-10 14:09 UTC, Ian Romanick
no flags Details | Splinter Review
tls-support-3.patch (59.99 KB, patch)
2004-11-10 18:43 UTC, Adam Jackson
no flags Details | Splinter Review
tls-support-4.patch (13.27 KB, patch)
2005-04-06 15:15 UTC, Ian Romanick
no flags Details | Splinter Review
tls-support-5.patch (19.22 KB, patch)
2005-04-11 16:35 UTC, Ian Romanick
no flags Details | Splinter Review

Description Ian Romanick 2004-11-10 12:15:20 UTC
Using the TLS segment to store thread-specific data, such as the current context
pointer and the current dispatch table pointer, would be much more efficient
than the current method that used pthread_{get,set}specific.
Comment 1 Ian Romanick 2004-11-10 12:21:41 UTC
Created attachment 1276 [details] [review]
Patch to add TLS support

With this patch, when GLX_USE_TLS is defined (at build time), libGL and the DRI
drivers will be built to use TLS.  Using TLS changes the libGL / DRI driver
interface in the following ways:

1. _glapi_RealDispatch no longer exists.  I don't think it was ever accessed
outside libGL.so (or outside glapi.o, for that matter), so this should have no
impact.

2. _glapi_Dispatch, _glapi_DispatchTSD, and _glapi_Context are all now
constant.  _glapi_Dispatch always points to the "threadsafe" table, and the
other two are NULL.

3. _glapi_tls_Dispatch is the new thread-local dispatch table pointer.	It
should never be NULL.

4. _glapi_tls_Context is the new thread-local context pointer.

This patch is an updated version of a much older patch that I posted to the
list a long time ago.  The old version got a lot of testing, but this version
has had only minimal testing.  It's here for review.
Comment 2 Ian Romanick 2004-11-10 14:09:29 UTC
Created attachment 1278 [details] [review]
Updated TLS support patch

This version of the patch actually compiles and links (sorry).	It add a new
feature.  If HAVE_ALIAS is defined at build time, all dispatch functions that
resolve to the same table entry (e.g., glPointParameterf, glPointParameterfARB,
glPointParameterfEXT, and glPointParameterfSGIS) will all resolve to the same
address.  In my build, this saves about 3KiB in the libGL.so (when debug info
is also removed).
Comment 3 Adam Jackson 2004-11-10 18:43:50 UTC
Created attachment 1279 [details] [review]
tls-support-3.patch

src/glx/x11/dispatch.c wouldn't build with the last patch under 'linux-dri',
needs to have glthread.h included too.	probably we don't need two copies of
dispatch.c...

builds, more or less works.  size comparison:

linux-dri, no TLS:
 437017   27572    5204  469793   72b21 libGL.so
1690845   84148  146832 1921825  1d5321 r200_dri.so

linux-dri, TLS:
 417798   27576    5176  450550   6dff6 libGL.so
1680789   84152  146832 1911773  1d2bdd r200_dri.so

linux-dri-x86, no TLS:
 406661    8920    5204  420785   66bb1 libGL.so
1777602   72004  146832 1996438  1e7696 r200_dri.so

linux-dri-x86, TLS:
 379890    8940    5176  394006   60316 libGL.so
1761114   72008  146832 1979954  1e3632 r200_dri.so

so we save about 17-23KiB on libGL, and 9KiB on r200_dri.so on all
configurations.

typical api_speed runs:

linux-dri, no TLS:
1000000 calls to	   glColor3fv required 66275372 cycles.
1000000 calls to	  glNormal3fv required 67313061 cycles.
1000000 calls to	glTexCoord2fv required 67436552 cycles.
1000000 calls to	glTexCoord3fv required 68557680 cycles.
1000000 calls to   glMultiTexCoord2fv required 78982018 cycles.
1000000 calls to    glMultiTexCoord2f required 81005552 cycles.
1000000 calls to	 glFogCoordfv required 65530834 cycles.
1000000 calls to	  glFogCoordf required 66423760 cycles.

linux-dri, TLS:
1000000 calls to	   glColor3fv required 88917879 cycles.
1000000 calls to	  glNormal3fv required 87621478 cycles.
1000000 calls to	glTexCoord2fv required 88670172 cycles.
1000000 calls to	glTexCoord3fv required 88722107 cycles.
1000000 calls to   glMultiTexCoord2fv required 93348227 cycles.
1000000 calls to    glMultiTexCoord2f required 99807264 cycles.
1000000 calls to	 glFogCoordfv required 86875659 cycles.
1000000 calls to	  glFogCoordf required 88212488 cycles.

linux-dri-x86, no TLS:
1000000 calls to	   glColor3fv required 35566732 cycles.
1000000 calls to	  glNormal3fv required 34631503 cycles.
1000000 calls to	glTexCoord2fv required 33493758 cycles.
1000000 calls to	glTexCoord3fv required 35927528 cycles.
1000000 calls to   glMultiTexCoord2fv required 43212345 cycles.
1000000 calls to    glMultiTexCoord2f required 40598730 cycles.
1000000 calls to	 glFogCoordfv required 30849407 cycles.
1000000 calls to	  glFogCoordf required 31340343 cycles.

linux-dri-x86, TLS:
1000000 calls to	   glColor3fv required 60313505 cycles.
1000000 calls to	  glNormal3fv required 60259834 cycles.
1000000 calls to	glTexCoord2fv required 59010398 cycles.
1000000 calls to	glTexCoord3fv required 59690089 cycles.
1000000 calls to   glMultiTexCoord2fv required 66056490 cycles.
1000000 calls to    glMultiTexCoord2f required 67972318 cycles.
1000000 calls to	 glFogCoordfv required 54717279 cycles.
1000000 calls to	  glFogCoordf required 54752466 cycles.

this is worrying.  why should it be this much slower?

also, the TLS libs make quake3 hang the machine.  might be able to blame that
using a hacked hyperz drm module though.
Comment 4 Ian Romanick 2004-11-10 23:48:27 UTC
I just realized why the TLS version is likely slower.  In Jakub's original TLS
patch, he modified the dispatch stubs to call a local function to get the
dispatch pointer.  In TLS mode, this means a function that does something like:

get_dispatch:
	movl	%gs:_glapi_tls_Dispatch@NTPOFF, %eax
	ret

I seem to recall that this was done to eliminate a bunch of relocs.  Perhaps
something related to prelink?  Dunno.  The code that calls it is actually a
'call get_dispatch; nop'.  The intention was the other code in libGL could copy
the get_dispatch function "inline".  This is what my patch also does.

The overhead of the CALL / RET pair is likely higher than the MOV / TEST / JNE
in the non-TLS dispatch stubs.  On my box, "inlinging" get_dispatch reduces
glColor3fv from ~107 cycles to ~61 cycles.
Comment 5 Dieter Nützel 2004-11-11 08:50:46 UTC
(In reply to comment #3) 
> Created an attachment (id=1279) [edit] 
> tls-support-3.patch 
>  
> src/glx/x11/dispatch.c wouldn't build with the last patch under 'linux-dri', 
> needs to have glthread.h included too.  probably we don't need two copies of 
> dispatch.c... 
>  
> builds, more or less works.  size comparison: 
 
Is this newer than Ian's 'first' one: 
 
DRI-TLS-01.patch 
Mesa-TLS-01.patch 
 
I use that on DRI CVS (freedesktop.org) and Mesa CVS. 
So I need separated ones. 
Comment 6 Ian Romanick 2004-11-11 10:24:31 UTC
Here is the original message from Jakub explaining why the dispatch stubs use
the extra function call:

http://marc.theaimsgroup.com/?l=dri-devel&m=105216557628538&w=2

Basically, it eliminates some relocs and makes prelink happy.
Comment 7 Ian Romanick 2005-04-06 15:15:13 UTC
Created attachment 2342 [details] [review]
tls-support-4.patch

Updates tls-support-3.patch to work against current CVS.

I think that there are a couple issues that remain before this code can start
going into CVS.

1. The build.  Should TLS builds go in $(LIB_DIR)/tls?	The whole build process
breaks down if, for example, a distro wants to build TLS and non-TLS versions
of everything.	Can we just punt on this issue for now?

2. LD_KERNEL_ASSUME=2.4.19 stuff.  Will the voodoo that's currently in
src/mesa/x86/glapi_x86.S work "as is" for other archs?	If so, we can just put
that in its own file and always link it in.

3. Run-time inlining of get_dispatch.  The original code from Jakub would copy
the code from get_dispatch into each of the dispatch functions at run-time. 
This means the dispatch stubs need to be in a wtext section.  This balanced
between run-time performance and the number of relocs.

4. There are a few things in src/glx/x11 that could be TLS-ized (e.g., the
current GLX context, etc.).  This can probably wait until after the first batch
goes into CVS.
Comment 8 Ian Romanick 2005-04-11 16:35:07 UTC
Created attachment 2394 [details] [review]
tls-support-5.patch

This patch resolves issue #3 from the previous patch.  It also updates the
_glapi_add_entrypoint path to support the TLS dispatch functions.  Unless
anyone objects, I'm going to commit this patch, minus the change to
configs/linux-dri, to Mesa CVS in the next day or two.
Comment 9 Ian Romanick 2005-04-13 14:00:57 UTC
tls-support-5.patch, minus the changes to configs/linux-dri, is now comitted to
Mesa CVS.  Closing bug.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.