Using the TLS segment to store thread-specific data, such as the current context pointer and the current dispatch table pointer, would be much more efficient than the current method that used pthread_{get,set}specific.
Created attachment 1276 [details] [review] Patch to add TLS support With this patch, when GLX_USE_TLS is defined (at build time), libGL and the DRI drivers will be built to use TLS. Using TLS changes the libGL / DRI driver interface in the following ways: 1. _glapi_RealDispatch no longer exists. I don't think it was ever accessed outside libGL.so (or outside glapi.o, for that matter), so this should have no impact. 2. _glapi_Dispatch, _glapi_DispatchTSD, and _glapi_Context are all now constant. _glapi_Dispatch always points to the "threadsafe" table, and the other two are NULL. 3. _glapi_tls_Dispatch is the new thread-local dispatch table pointer. It should never be NULL. 4. _glapi_tls_Context is the new thread-local context pointer. This patch is an updated version of a much older patch that I posted to the list a long time ago. The old version got a lot of testing, but this version has had only minimal testing. It's here for review.
Created attachment 1278 [details] [review] Updated TLS support patch This version of the patch actually compiles and links (sorry). It add a new feature. If HAVE_ALIAS is defined at build time, all dispatch functions that resolve to the same table entry (e.g., glPointParameterf, glPointParameterfARB, glPointParameterfEXT, and glPointParameterfSGIS) will all resolve to the same address. In my build, this saves about 3KiB in the libGL.so (when debug info is also removed).
Created attachment 1279 [details] [review] tls-support-3.patch src/glx/x11/dispatch.c wouldn't build with the last patch under 'linux-dri', needs to have glthread.h included too. probably we don't need two copies of dispatch.c... builds, more or less works. size comparison: linux-dri, no TLS: 437017 27572 5204 469793 72b21 libGL.so 1690845 84148 146832 1921825 1d5321 r200_dri.so linux-dri, TLS: 417798 27576 5176 450550 6dff6 libGL.so 1680789 84152 146832 1911773 1d2bdd r200_dri.so linux-dri-x86, no TLS: 406661 8920 5204 420785 66bb1 libGL.so 1777602 72004 146832 1996438 1e7696 r200_dri.so linux-dri-x86, TLS: 379890 8940 5176 394006 60316 libGL.so 1761114 72008 146832 1979954 1e3632 r200_dri.so so we save about 17-23KiB on libGL, and 9KiB on r200_dri.so on all configurations. typical api_speed runs: linux-dri, no TLS: 1000000 calls to glColor3fv required 66275372 cycles. 1000000 calls to glNormal3fv required 67313061 cycles. 1000000 calls to glTexCoord2fv required 67436552 cycles. 1000000 calls to glTexCoord3fv required 68557680 cycles. 1000000 calls to glMultiTexCoord2fv required 78982018 cycles. 1000000 calls to glMultiTexCoord2f required 81005552 cycles. 1000000 calls to glFogCoordfv required 65530834 cycles. 1000000 calls to glFogCoordf required 66423760 cycles. linux-dri, TLS: 1000000 calls to glColor3fv required 88917879 cycles. 1000000 calls to glNormal3fv required 87621478 cycles. 1000000 calls to glTexCoord2fv required 88670172 cycles. 1000000 calls to glTexCoord3fv required 88722107 cycles. 1000000 calls to glMultiTexCoord2fv required 93348227 cycles. 1000000 calls to glMultiTexCoord2f required 99807264 cycles. 1000000 calls to glFogCoordfv required 86875659 cycles. 1000000 calls to glFogCoordf required 88212488 cycles. linux-dri-x86, no TLS: 1000000 calls to glColor3fv required 35566732 cycles. 1000000 calls to glNormal3fv required 34631503 cycles. 1000000 calls to glTexCoord2fv required 33493758 cycles. 1000000 calls to glTexCoord3fv required 35927528 cycles. 1000000 calls to glMultiTexCoord2fv required 43212345 cycles. 1000000 calls to glMultiTexCoord2f required 40598730 cycles. 1000000 calls to glFogCoordfv required 30849407 cycles. 1000000 calls to glFogCoordf required 31340343 cycles. linux-dri-x86, TLS: 1000000 calls to glColor3fv required 60313505 cycles. 1000000 calls to glNormal3fv required 60259834 cycles. 1000000 calls to glTexCoord2fv required 59010398 cycles. 1000000 calls to glTexCoord3fv required 59690089 cycles. 1000000 calls to glMultiTexCoord2fv required 66056490 cycles. 1000000 calls to glMultiTexCoord2f required 67972318 cycles. 1000000 calls to glFogCoordfv required 54717279 cycles. 1000000 calls to glFogCoordf required 54752466 cycles. this is worrying. why should it be this much slower? also, the TLS libs make quake3 hang the machine. might be able to blame that using a hacked hyperz drm module though.
I just realized why the TLS version is likely slower. In Jakub's original TLS patch, he modified the dispatch stubs to call a local function to get the dispatch pointer. In TLS mode, this means a function that does something like: get_dispatch: movl %gs:_glapi_tls_Dispatch@NTPOFF, %eax ret I seem to recall that this was done to eliminate a bunch of relocs. Perhaps something related to prelink? Dunno. The code that calls it is actually a 'call get_dispatch; nop'. The intention was the other code in libGL could copy the get_dispatch function "inline". This is what my patch also does. The overhead of the CALL / RET pair is likely higher than the MOV / TEST / JNE in the non-TLS dispatch stubs. On my box, "inlinging" get_dispatch reduces glColor3fv from ~107 cycles to ~61 cycles.
(In reply to comment #3) > Created an attachment (id=1279) [edit] > tls-support-3.patch > > src/glx/x11/dispatch.c wouldn't build with the last patch under 'linux-dri', > needs to have glthread.h included too. probably we don't need two copies of > dispatch.c... > > builds, more or less works. size comparison: Is this newer than Ian's 'first' one: DRI-TLS-01.patch Mesa-TLS-01.patch I use that on DRI CVS (freedesktop.org) and Mesa CVS. So I need separated ones.
Here is the original message from Jakub explaining why the dispatch stubs use the extra function call: http://marc.theaimsgroup.com/?l=dri-devel&m=105216557628538&w=2 Basically, it eliminates some relocs and makes prelink happy.
Created attachment 2342 [details] [review] tls-support-4.patch Updates tls-support-3.patch to work against current CVS. I think that there are a couple issues that remain before this code can start going into CVS. 1. The build. Should TLS builds go in $(LIB_DIR)/tls? The whole build process breaks down if, for example, a distro wants to build TLS and non-TLS versions of everything. Can we just punt on this issue for now? 2. LD_KERNEL_ASSUME=2.4.19 stuff. Will the voodoo that's currently in src/mesa/x86/glapi_x86.S work "as is" for other archs? If so, we can just put that in its own file and always link it in. 3. Run-time inlining of get_dispatch. The original code from Jakub would copy the code from get_dispatch into each of the dispatch functions at run-time. This means the dispatch stubs need to be in a wtext section. This balanced between run-time performance and the number of relocs. 4. There are a few things in src/glx/x11 that could be TLS-ized (e.g., the current GLX context, etc.). This can probably wait until after the first batch goes into CVS.
Created attachment 2394 [details] [review] tls-support-5.patch This patch resolves issue #3 from the previous patch. It also updates the _glapi_add_entrypoint path to support the TLS dispatch functions. Unless anyone objects, I'm going to commit this patch, minus the change to configs/linux-dri, to Mesa CVS in the next day or two.
tls-support-5.patch, minus the changes to configs/linux-dri, is now comitted to Mesa CVS. Closing bug.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.