Bug 839 - Speeding up render with gcc 3.4 and MMX intrinsics
Summary: Speeding up render with gcc 3.4 and MMX intrinsics
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/General (show other bugs)
Version: git
Hardware: x86 (IA32) Linux (All)
: high normal
Assignee: Xorg Project Team
QA Contact:
URL: http://freedesktop.org/~sandmann/rpms/
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-07-08 09:45 UTC by Søren Sandmann Pedersen
Modified: 2004-07-23 15:24 UTC (History)
2 users (show)

See Also:
i915 platform:
i915 features:


Attachments
fbmmx.c (31.15 KB, text/x-csrc)
2004-07-08 09:48 UTC, Søren Sandmann Pedersen
no flags Details
fbmmx.h (4.68 KB, text/x-chdr)
2004-07-08 09:49 UTC, Søren Sandmann Pedersen
no flags Details
The patch (15.21 KB, patch)
2004-07-08 09:49 UTC, Søren Sandmann Pedersen
no flags Details | Splinter Review
The benchmark (5.21 KB, text/x-csrc)
2004-07-08 10:38 UTC, Søren Sandmann Pedersen
no flags Details

Description Søren Sandmann Pedersen 2004-07-08 09:45:22 UTC
Attached is a patch that speeds up some operations in the Render
extension by using gcc 3.4 MMX intrinsics. A benchmark rendering a
paragraph of component alpha text to a pixmap gave these results on a
1200 MHz laptop with an i830 chip running Fedora Core I:

Unmodified X server and the pixmap in system RAM:

        [ssp@localhost x]$ ./a.out
        total time: 41.394618
        average rect time: 0.683200
        worst rect: 9
        average glyph time: 3.550500

with the MMX optimizations:

        [ssp@localhost x]$ ./a.out
        total time: 22.972553
        average rect time: 0.677900
        worst rect: 9
        average glyph time: 1.692000

Ie., text rendering is more than twice as fast. The 'average glyph
time' here is the time it takes to render the entire paragraph of
text.

With the pixmap in video RAM, the speedup is not quite as
spectacular:

Unmodified X server:

        [ssp@localhost x]$ ./a.out
        total time: 95.900768
        average rect time: 0.003300
        worst rect: 1
        average glyph time: 9.693500

With MMXified compositing:

        total time: 66.559287
        average rect time: 0.015100
        worst rect: 6
        average glyph time: 6.720500

But still a nice improvement. The patch includes improved code for
these cases:

Subpixel text:
- (constant color) in (component alpha mask) over 565 destination
- (constant color) in (component alpha mask) over 32bit destination
- (32 bit component alpha) Saturate (32 bit destination)

Regular antialiased text:
- (8 bit alpha) Saturate (8 bit destination)
- (constant color) in (8 bit alpha mask) over 565 destination
- (constant color) in (8 bit alpha mask) over 32bit destination

GdkPixbuf:
- (reversed, non-premultiplied source) over 32bit destination
- (reversed, non-premultiplied source) over 565 destination

Alpha rectangle (e.g., Nautilus selection rectangle):
- (constant color) over 32bit destination
- (constant color) over 565 destination

Solid fill
- solid fill of 32 bit drawable
- solid fill of 16 bit drawable

The code can optionally be compiled to use the pshufw instruction, which
is only available on pentium III. 


One question: The patch has a bad hack where it redefines
DefaultCCOptions for all of the framebuffer code. How should this be
done properly? The problem with the existing DefaultCCOptions is that
they include -pedantic which doesn't work with the MMX intrinsics.

Attaching the patch, two new files and the benchmark.
Comment 1 Søren Sandmann Pedersen 2004-07-08 09:48:28 UTC
Created attachment 452 [details]
fbmmx.c
Comment 2 Søren Sandmann Pedersen 2004-07-08 09:49:00 UTC
Created attachment 453 [details]
fbmmx.h
Comment 3 Søren Sandmann Pedersen 2004-07-08 09:49:42 UTC
Created attachment 454 [details] [review]
The patch
Comment 4 Søren Sandmann Pedersen 2004-07-08 10:38:04 UTC
Created attachment 455 [details]
The benchmark
Comment 5 Nicholas Miell 2004-07-08 22:25:36 UTC
Is there any particular reason why you didn't use mmintrin.h ?
Comment 6 Søren Sandmann Pedersen 2004-07-09 03:54:37 UTC
> Is there any particular reason why you didn't use mmintrin.h ?

Well, that file is not installed on my system. 
Comment 7 Nicholas Miell 2004-07-09 11:43:21 UTC
(In reply to comment #6)
> > Is there any particular reason why you didn't use mmintrin.h ?
> 
> Well, that file is not installed on my system. 

I don't know how you could manage to install gcc without it's system headers;
are you sure it's not in the gcc install dir?

Try running the following:

ls $(gcc -print-search-dirs | grep install | awk '{ print $2 }')/include/*intrin.h
Comment 8 Søren Sandmann Pedersen 2004-07-09 16:24:17 UTC
Yeah, I got the file. 

I'll have to change my story then to "No particular reason, I just didn't know
about the file". It looks like it contains a lot of useful stuff that I
duplicated in fbmmx.c.
Comment 9 Jeff Muizelaar 2004-07-15 14:49:46 UTC
(In reply to comment #3)
> Created an attachment (id=454)
> The patch
> 
Instead of surrounding all of the calls to the *mmx functions in fbpict.c with
USE_GCC34_MMX why not just have fbHaveMMX() return 0 when USE_GCC34_MMX is not
defined?
Comment 10 Duraid Madina 2004-07-19 14:10:56 UTC
Hi Søren,

This looks like a great patch, but I'm wondering, why not change things so
there's an fb_asm.c (or whatever) which appropriately #includes another source
file with the MMX code? This would make things cleaner for other people who want
to do the same thing for other platforms (PowerPC's Altivec, IA64's multimedia
instructions etc).

On that note, I think the potential speed gains for text rendering will be even
more significant on these other platforms, since GCC does a worse job there, and
vendor-accelerated RENDER extensions (e.g. NVIDIA's) are nowhere to be seen.
Comment 11 Søren Sandmann Pedersen 2004-07-19 14:39:24 UTC
> Instead of surrounding all of the calls to the *mmx functions in fbpict.c with
> USE_GCC34_MMX why not just have fbHaveMMX() return 0 when USE_GCC34_MMX is not
> defined?

The problem doing that is thew all the MMX functions are not even defined if we
don't have gcc 3.4 on i386.


From Duraid Madina:

> This looks like a great patch, but I'm wondering, why not change things so
> there's an fb_asm.c (or whatever) which appropriately #includes another source
> file with the MMX code? This would make things cleaner for other people who 
> want

That might not be a bad idea, but I'd rather wait until someone actually creates
optimizations for those other platforms.
Comment 12 Søren Sandmann Pedersen 2004-07-24 08:24:57 UTC
The patch is committed now.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.