Revamping the GTK accessibility stack?
URLs
https://wiki.gnome.org/Hackfests/GTK2020/Notes
https://blog.gtk.org/2020/02/17/gtk-hackfest-2020-roadmap-and-accessibility/
https://github.com/flatpak/xdg-desktop-portal/issues/1046#issuecomment-1615146403
GTK4 accessibility issues https://gitlab.gnome.org/GNOME/gtk/-/issues?sort=updated_desc&state=opened&label_name[]=8.+Accessibility&label_name[]=GTK4
Archaeology of Accessibility https://www.youtube.com/watch?v=eNh0Xg8abj0
Discussion more or less started here (among other places):
https://gitlab.gnome.org/GNOME/gtk/issues/1739
https://gitlab.gnome.org/GNOME/gtk/merge_requests/1120
Previous discussions: https://wiki.gnome.org/Hackfests/ATK2011
https://wiki.gnome.org/Accessibility/Hackfests/ATK2012
https://wiki.gnome.org/Accessibility/Hackfests/GUADEC2013
Meta-bug for ATK TODO list: https://bugzilla.gnome.org/show_bug.cgi?id=638537
Some input questions are discussed on the Input page
A proposal for p2p connections between applications and the screen reader is on the P2P page
https://nimfsoft.art/blog/the-reality-of-wayland-input-methods-in-2022/
Coherency
For various reasons, we most probably want to stay coherent with other interfaces (while avoiding their bad ideas):
- Microsoft's UIA (https://docs.microsoft.com/en-us/windows/win32/winauto/entry-uiauto-win32) / IA2 (https://github.com/LinuxA11y/IAccessible2) / MSAA (https://docs.microsoft.com/en-us/previous-versions//ms697707(v=vs.85))
- OS X's NSAccessibility (https://developer.apple.com/documentation/appkit/nsaccessibility)
- Aria (https://w3c.github.io/aria/ https://w3c.github.io/core-aam https://w3c.github.io/aria-practices/)
- Android?
The technical details of the IPC mechanism and of the IPC interfaces vary, but the underlying principle is the same:
- The screen reader uses IPC to get information from the application's widgets.
- The toolkit used by the application implements the server-side hooks.
- The toolkit sends notifications for events that are useful to screen readers (e.g. text changes); a minimal sketch of this toolkit-side pattern follows the list.
- Some IPC calls also have effects/trigger actions in the application.
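A minimal sketch of the toolkit side of this pattern, using ATK in C; the widget name and strings are made up for illustration. The accessible object describes the widget (role, name), and a state-change notification is emitted when something ATs care about happens, which the AT-SPI bridge then forwards over the IPC.
```c
#include <atk/atk.h>

/* Describe the widget through its accessible object: role and name. */
void
describe_check_box (AtkObject *accessible)
{
  atk_object_set_role (accessible, ATK_ROLE_CHECK_BOX);
  atk_object_set_name (accessible, "Enable notifications");
}

/* Notify ATs of an event they care about: the checked state changed.
 * The AT-SPI bridge turns this into an IPC signal the screen reader
 * listens to. */
void
on_check_box_toggled (AtkObject *accessible, gboolean checked)
{
  atk_object_notify_state_change (accessible, ATK_STATE_CHECKED, checked);
}
```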
Strong issues
- AT-SPI allows any application to access information from and act on any other application, etc.
- https://gitlab.gnome.org/GNOME/at-spi2-atk/issues/12
- Only orca, gnome-shell's zoom, compiz' ezoom, the adaptive on-screen keyboards caribou/florence, the braille terminal reader brltty, the dasher text-entry tool, and accerciser should have access
- key snooping makes all keypresses go through the screen reader, thus making the whole desktop slower
- key snooping doesn't work on wayland, making screen readers very difficult to use, https://gitlab.gnome.org/GNOME/mutter/issues/9
- Stealing key events from gtk is considered wrong; it should go through other means https://gitlab.gnome.org/GNOME/gtk/issues/1739#note_458183
- On wayland, widgets can't provide an absolute screen position, only a position relative to their window, and only the compositor can provide the absolute position of the window (see the sketch below)
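To illustrate that last point, a sketch with the GTK 3 API: the toolkit can only answer "where is this widget inside its toplevel window"; turning that into an absolute screen position is up to the compositor on wayland.
```c
#include <gtk/gtk.h>

/* What the toolkit alone can answer: the widget's position relative to
 * its toplevel window.  On wayland, going from there to absolute screen
 * coordinates needs help from the compositor, which is the only party
 * that knows where the window is. */
void
get_window_relative_position (GtkWidget *widget, gint *x, gint *y)
{
  GtkWidget *toplevel = gtk_widget_get_toplevel (widget);

  gtk_widget_translate_coordinates (widget, toplevel, 0, 0, x, y);
}
```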
Issues
- The ATK key event seems too raw and too tied to X11 specifics https://bugzilla.gnome.org/show_bug.cgi?id=649559#c3
- ATK itself is cumbersome
- at-spi2-core + at-spi2-atk + atk + gtk
- Adding a feature requires adding it to them all; having just one library would make development much easier
- at-spi contains interfaces which have not really been implemented (e.g. boundary details in the AtkText interface)
- Awkward coupling between the GtkWidget and GtkAccessible classes.
Usage needs
blind people
- Getting audio feedback for the currently-focused widget:
- when moving from one widget to another, when the current widget changes its state, when a notification widget announces something.
- What the widget actually is: a checkbox to be ticked, a radio button, a button, an entry to be typed into, ... and its current state
- Getting audio feedback of what was done:
- what was typed, thus the translated key, not only the keycode.
- what is getting selected
- what text was pasted
- the new value of a spin button
- that a row or column was added in an array
- that a new tab was added
- that some text showed up (e.g. "no occurrence found")
- that a window opened up / was closed, or switched to another application
- etc.
- Speak text: current word, current sentence, current line, all the text
- Have the textview scrolled while it is spoken, in case the speech goes beyond what is visible, so a sighted co-worker can follow it.
- Speak text attributes: bold, size, etc.
- Screen reader driven through shortcuts, to e.g. spell a letter, spell a word, read the current selection, speak the current logical position (window, panel, group, widget), change feedback verbosity, etc.
- Move the caret to where the user interrupted the speech.
- Move the caret to the braille cursor routing position.
- Simulate mouse clicks (for when there is no keyboard shortcut equivalent defined)
people with low vision
- Getting zoomed area to show the currently-focused widget, and the current caret position
- Getting zoomed area to show the text currently being spoken by the screen reader
- Have the textview scrolled while it is spoken, in case the speech goes beyond what is visible
- Speak the content that is under the mouse
- And be able to act on it safely, i.e. be sure to be acting on what was spoken last
people with limited typing capabilities
- Produce an on-screen adaptive keyboard which contains only the actions that can be achieved in the software, and use one button + timing to reach them efficiently
- typing / actions from eye-tracking
- typing / actions from speech recognition
notable entailed technical needs
- key press events with the translated key, not only the keycode (see the sketch after this list)
- keyboard shortcuts
- Mouse click simulation
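A sketch of what "translated key" means in practice, using the GDK 3 keymap API: go from the hardware keycode plus modifier state and layout group to the keyval and the character the user actually typed, rather than reporting the raw keycode.
```c
#include <gdk/gdk.h>

/* Translate a hardware keycode + modifier state into the keyval and the
 * Unicode character the user actually got, taking the keyboard layout
 * (group) into account.  This, not the raw keycode, is what ATs need. */
gunichar
translate_key (GdkDisplay *display, guint hardware_keycode,
               GdkModifierType state, gint group)
{
  GdkKeymap *keymap = gdk_keymap_get_for_display (display);
  guint keyval = 0;

  gdk_keymap_translate_keyboard_state (keymap, hardware_keycode, state,
                                       group, &keyval, NULL, NULL, NULL);

  return gdk_keyval_to_unicode (keyval);
}
```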
Ideas for solutions
Just ideas randomly thrown here, no particular order/priority/applicability intended
Check how it's done in other OSes
- key snooping: on Windows, NVDA uses the win32 API, not the accessibility API
Checking authorization before allowing connection to AT-SPI
- System-provided screen readers can probably get a pass-through (policykit? see the sketch after this list)
- Others would need user-provided authorization (the prompt needs to be accessible, or at least simple enough to answer blindly, e.g. a well-known sound to which you can respond by just pressing Enter)
- wayland security modules https://github.com/mupuf/libwsm
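A sketch of what a policykit pass-through check could look like, in C with libpolkit-gobject; the action id org.a11y.atspi.connect is hypothetical, not an existing action, and where exactly the check would run (e.g. inside at-spi2) is an open question.
```c
#include <polkit/polkit.h>

/* Sketch: before letting a peer talk AT-SPI, ask polkit whether the
 * connecting process (identified by its pid) is authorized.  The action
 * id "org.a11y.atspi.connect" is hypothetical. */
gboolean
peer_may_use_atspi (gint pid)
{
  PolkitAuthority *authority = polkit_authority_get_sync (NULL, NULL);
  PolkitSubject *subject;
  PolkitAuthorizationResult *result;
  gboolean allowed = FALSE;

  if (authority == NULL)
    return FALSE;

  subject = polkit_unix_process_new_for_owner (pid, 0, -1);
  result = polkit_authority_check_authorization_sync (
      authority, subject, "org.a11y.atspi.connect", NULL,
      POLKIT_CHECK_AUTHORIZATION_FLAGS_ALLOW_USER_INTERACTION, NULL, NULL);

  if (result != NULL)
    {
      allowed = polkit_authorization_result_get_is_authorized (result);
      g_object_unref (result);
    }

  g_object_unref (subject);
  g_object_unref (authority);
  return allowed;
}
```
The ALLOW_USER_INTERACTION flag is what would trigger the user-facing authorization prompt for non-whitelisted ATs; that prompt is exactly the part that has to stay accessible.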
Separating out key press notification from keyboard shortcuts:
- key press notification can be asynchronous, thus not slowing the desktop down
- only keyboard shortcuts would have to go through the screen reader synchronously (see the sketch below)
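A sketch of that split with GDBus: key presses get broadcast as a D-Bus signal (fire and forget, the desktop never waits for the screen reader), while only registered shortcuts go through a call whose reply says whether the screen reader consumed the key. All bus/interface/member names here are hypothetical.
```c
#include <gio/gio.h>

/* Asynchronous notification: broadcast the key press and move on
 * immediately, so a slow screen reader cannot slow the desktop down.
 * Interface and signal names are made up. */
void
notify_key_press (GDBusConnection *bus, guint keyval, guint keycode)
{
  g_dbus_connection_emit_signal (bus, NULL,
                                 "/org/example/KeyFeedback",
                                 "org.example.KeyFeedback",
                                 "KeyPressed",
                                 g_variant_new ("(uu)", keyval, keycode),
                                 NULL);
}

/* Synchronous path, for registered shortcuts only: ask the screen reader
 * whether it consumes the key before the toolkit handles it. */
gboolean
deliver_shortcut (GDBusConnection *bus, guint keyval, guint modifiers)
{
  gboolean consumed = FALSE;
  GVariant *reply =
    g_dbus_connection_call_sync (bus, "org.example.ScreenReader",
                                 "/org/example/ScreenReader",
                                 "org.example.ScreenReader",
                                 "HandleShortcut",
                                 g_variant_new ("(uu)", keyval, modifiers),
                                 G_VARIANT_TYPE ("(b)"),
                                 G_DBUS_CALL_FLAGS_NONE, -1, NULL, NULL);

  if (reply != NULL)
    {
      g_variant_get (reply, "(b)", &consumed);
      g_variant_unref (reply);
    }
  return consumed;
}
```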
key press/shortcut transmission
- through XInput2? (Requires a window, but at-spi2-core could create one; wayland needs something else)
- through AT-SPI or another protocol? We want authorization to avoid arbitrary key snooping; if we add authorization over the whole of AT-SPI, that should be fine
- A bridge for dogtail was implemented on https://gitlab.gnome.org/ofourdan/gnome-ponytail-daemon but this is more of a hack
- A macro daemon is currently in development on https://gitlab.freedesktop.org/whot/macrod/
- Wayland security module https://lists.freedesktop.org/archives/wayland-devel/2015-March/020474.html
- Should be stuffed within at-spi2, so that AT developers do not have to re-do it (badly)
- Introduce something new both for Wayland and X11, or leave X11 with XInput2?
key press feedback
- Key presses
- when the keyboard layout is consistent across the desktop, the keycode alone is enough and can be translated properly
- But input method expansions also need to be reported
- make the IM emit an at-spi event to notify of this?
Implement accessibility support (such as "read current line") within the toolkit itself
- A lot of such features would need to be implemented to make working with speech feedback practical, and they would not be shared with other toolkits
- The screen reader has its own state (e.g. feedback verbosity level)
- That makes integration with the rest of the screen reader's behavior extremely difficult (we need speech to be interrupted by e.g. desktop notifications, focus changes, etc.)
Involve the compositor
- for keypresses / shortcuts?
- for getting absolute window position
- Define an interface that all compositors would have to implement (one possible shape is sketched below)
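One possible shape for such a compositor interface, written as D-Bus introspection XML embedded in C; everything here, including the org.example.Compositor.A11y name, is hypothetical.
```c
#include <gio/gio.h>

/* Hypothetical interface a compositor could implement so ATs and
 * magnifiers can map window-relative widget coordinates to absolute
 * screen coordinates. */
static const char compositor_a11y_xml[] =
  "<node>"
  "  <interface name='org.example.Compositor.A11y'>"
  "    <method name='GetToplevelPosition'>"
  "      <arg type='s' name='toplevel_handle' direction='in'/>"
  "      <arg type='i' name='x' direction='out'/>"
  "      <arg type='i' name='y' direction='out'/>"
  "    </method>"
  "    <signal name='ToplevelMoved'>"
  "      <arg type='s' name='toplevel_handle'/>"
  "      <arg type='i' name='x'/>"
  "      <arg type='i' name='y'/>"
  "    </signal>"
  "  </interface>"
  "</node>";

/* Parse the XML so the compositor can register the object on the bus
 * with g_dbus_connection_register_object(). */
GDBusNodeInfo *
load_compositor_a11y_interface (void)
{
  return g_dbus_node_info_new_for_xml (compositor_a11y_xml, NULL);
}
```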
Turn AtkObject into an interface, so implementors can decide between dedicated accessible objects or just implementing the interface (see the sketch at the end of this list)
- gtk could directly talk at-spi (like Qt does), and get rid of the atk layer
- Get that auto-generated from IPC descriptions
- Or even use an aria layer, independent of the transport technology actually used
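A sketch of the "AtkObject as an interface" idea with plain GObject; the DemoAccessible names are hypothetical. A widget would implement the interface directly instead of being paired with a separate dedicated accessible object.
```c
#include <glib-object.h>

/* Hypothetical accessible interface that widgets implement directly. */
G_DECLARE_INTERFACE (DemoAccessible, demo_accessible, DEMO, ACCESSIBLE, GObject)

struct _DemoAccessibleInterface
{
  GTypeInterface parent_iface;

  const char * (* get_name) (DemoAccessible *self);
  int          (* get_role) (DemoAccessible *self);
};

G_DEFINE_INTERFACE (DemoAccessible, demo_accessible, G_TYPE_OBJECT)

static void
demo_accessible_default_init (DemoAccessibleInterface *iface)
{
  /* Signals/properties common to every implementor would be installed here. */
}

/* Callers (e.g. an AT-SPI bridge, or generated IPC glue) go through the
 * interface, whatever object actually implements it. */
const char *
demo_accessible_get_name (DemoAccessible *self)
{
  return DEMO_ACCESSIBLE_GET_IFACE (self)->get_name (self);
}
```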