Revamping the GTK accessibility stack?
URLs
https://wiki.gnome.org/Hackfests/GTK2020/Notes
https://blog.gtk.org/2020/02/17/gtk-hackfest-2020-roadmap-and-accessibility/
https://github.com/flatpak/xdg-desktop-portal/issues/1046#issuecomment-1615146403
GTK4 accessibility issues https://gitlab.gnome.org/GNOME/gtk/-/issues?sort=updated_desc&state=opened&label_name[]=8.+Accessibility&label_name[]=GTK4
Archaeology of Accessibility https://www.youtube.com/watch?v=eNh0Xg8abj0
Discussion more or less started here (among other places):
https://gitlab.gnome.org/GNOME/gtk/issues/1739
https://gitlab.gnome.org/GNOME/gtk/merge_requests/1120
Previous discussions: https://wiki.gnome.org/Hackfests/ATK2011
https://wiki.gnome.org/Accessibility/Hackfests/ATK2012
https://wiki.gnome.org/Accessibility/Hackfests/GUADEC2013
Meta-bug for ATK TODO list: https://bugzilla.gnome.org/show_bug.cgi?id=638537
Some input questions are discussed on the Input page
A proposal for p2p connections between applications and the screen reader is on the P2P page
https://nimfsoft.art/blog/the-reality-of-wayland-input-methods-in-2022/
Coherency
For various reasons, we most probably want to stay coherent with other interfaces (while avoiding their bad ideas):
- Microsoft's UIA (https://docs.microsoft.com/en-us/windows/win32/winauto/entry-uiauto-win32) / IA2 (https://github.com/LinuxA11y/IAccessible2) / MSAA (https://docs.microsoft.com/en-us/previous-versions//ms697707(v=vs.85))
- OS X's NSAccessibility (https://developer.apple.com/documentation/appkit/nsaccessibility)
- Aria (https://w3c.github.io/aria/ https://w3c.github.io/core-aam https://w3c.github.io/aria-practices/)
- Android?
The technical details of the IPC mechanism and of the IPC interfaces vary, but the underlying principle is the same:
- The screen reader uses IPC to get information from the application's widgets.
- The toolkit used by the application implements the server-side hooks.
- The toolkit sends notifications for events that are useful to screen readers (e.g. text changes); a minimal sketch of this toolkit-side pattern follows the list.
- Some IPC calls also have effects/trigger actions in the application.
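A minimal sketch of the toolkit side of this pattern, using ATK in C; the widget name and strings are made up for illustration. The accessible object describes the widget (role, name), and a state-change notification is emitted when something ATs care about happens, which the AT-SPI bridge then forwards over the IPC.
```c
#include <atk/atk.h>

/* Describe the widget through its accessible object: role and name. */
void
describe_check_box (AtkObject *accessible)
{
  atk_object_set_role (accessible, ATK_ROLE_CHECK_BOX);
  atk_object_set_name (accessible, "Enable notifications");
}

/* Notify ATs of an event they care about: the checked state changed.
 * The AT-SPI bridge turns this into an IPC signal the screen reader
 * listens to. */
void
on_check_box_toggled (AtkObject *accessible, gboolean checked)
{
  atk_object_notify_state_change (accessible, ATK_STATE_CHECKED, checked);
}
```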
Strong issues
- AT-SPI allows any application to access information from and act on any other application, etc.
- https://gitlab.gnome.org/GNOME/at-spi2-atk/issues/12
- Only orca, gnome-shell's zoom, compiz' ezoom, the adaptive on-screen keyboards caribou/florence, the braille terminal reader brltty, the dasher text-entry tool, and accerciser should have access
- key snooping makes all keypresses go through the screen reader, thus making the whole desktop slower
- key snooping doesn't work on wayland, making screen readers very difficult to use, https://gitlab.gnome.org/GNOME/mutter/issues/9
- Stealing key events from gtk is considered wrong; it should go through other means https://gitlab.gnome.org/GNOME/gtk/issues/1739#note_458183
- On wayland, widgets can't provide an absolute screen position, only a position relative to their window, and only the compositor can provide the absolute position of the window (see the sketch below)
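To illustrate that last point, a sketch with the GTK 3 API: the toolkit can only answer "where is this widget inside its toplevel window"; turning that into an absolute screen position is up to the compositor on wayland.
```c
#include <gtk/gtk.h>

/* What the toolkit alone can answer: the widget's position relative to
 * its toplevel window.  On wayland, going from there to absolute screen
 * coordinates needs help from the compositor, which is the only party
 * that knows where the window is. */
void
get_window_relative_position (GtkWidget *widget, gint *x, gint *y)
{
  GtkWidget *toplevel = gtk_widget_get_toplevel (widget);

  gtk_widget_translate_coordinates (widget, toplevel, 0, 0, x, y);
}
```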
Issues
- The ATK key event seems too raw and too tied to X11 specifics https://bugzilla.gnome.org/show_bug.cgi?id=649559#c3
- ATK itself is cumbersome
- at-spi2-core + at-spi2-atk + atk + gtk
- Adding a feature requires adding it to them all; having just one library would make development much easier
- at-spi contains interfaces which have not really been implemented (e.g. boundary details in the AtkText interface)
- Awkward coupling between the GtkWidget and GtkAccessible classes.
Usage needs
blind people
- Getting audio feedback for the currently-focused widget:
- when moving from one widget to another, when the current widget changes its state, when a notification widget announces something.
- What the widget actually is: a checkbox to be ticked, a radio button, a button, an entry to be typed into, ... and its current state
- Getting audio feedback of what was done:
- what was typed, thus the translated key, not only the keycode.
- what is getting selected
- what text was pasted
- the new value of a spin button
- that a row or column was added in an array
- that a new tab was added
- that some text showed up (e.g. "no occurrence found")
- that a window opened up / was closed, or switched to another application
- etc.
- Speak text: current word, current sentence, current line, all the text
- Have the textview scrolled while it is spoken, in case the speech goes beyond what is visible, so a sighted co-worker can follow it.
- Speak text attributes: bold, size, etc.
- Screen reader driven through shortcuts, to e.g. spell a letter, spell a word, read the current selection, speak the current logical position (window, panel, group, widget), change feedback verbosity, etc.
- Move the caret to where the user interrupted the speech.
- Move the caret to the braille cursor routing position.
- Simulate mouse clicks (for when there is no keyboard shortcut equivalent defined)
people with low vision
- Getting zoomed area to show the currently-focused widget, and the current caret position
- Getting zoomed area to show the text currently being spoken by the screen reader
- Have the textview scrolled while it is spoken, in case the speech goes beyond what is visible
- Speak the content that is under the mouse
- And be able to act on it safely, i.e. be sure to be acting on what was spoken last
people with limited typing capabilities
- Produce an on-screen adaptive keyboard which contains only the actions that can be achieved in the software, and use one button + timing to reach them efficiently
- typing / actions from eye-tracking
- typing / actions from speech recognition
notable entailed technical needs
- key press events with the translated key, not only the keycode (see the sketch after this list)
- keyboard shortcuts
- Mouse click simulation
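A sketch of what "translated key" means in practice, using the GDK 3 keymap API: go from the hardware keycode plus modifier state and layout group to the keyval and the character the user actually typed, rather than reporting the raw keycode.
```c
#include <gdk/gdk.h>

/* Translate a hardware keycode + modifier state into the keyval and the
 * Unicode character the user actually got, taking the keyboard layout
 * (group) into account.  This, not the raw keycode, is what ATs need. */
gunichar
translate_key (GdkDisplay *display, guint hardware_keycode,
               GdkModifierType state, gint group)
{
  GdkKeymap *keymap = gdk_keymap_get_for_display (display);
  guint keyval = 0;

  gdk_keymap_translate_keyboard_state (keymap, hardware_keycode, state,
                                       group, &keyval, NULL, NULL, NULL);

  return gdk_keyval_to_unicode (keyval);
}
```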
Ideas for solutions
Just ideas randomly thrown here, no particular order/priority/applicability intended
Check how it's done in other OSes
- key snooping: on Windows, NVDA uses the win32 API, not the accessibility API
Checking authorization before allowing connection to AT-SPI
- System-provided screen readers can probably get a pass-through (policykit? see the sketch after this list)
- Others would need user-provided authorization (the prompt needs to be accessible, or at least simple enough to answer blindly, e.g. a well-known sound to which you can respond by just pressing Enter)
- wayland security modules https://github.com/mupuf/libwsm
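A sketch of what a policykit pass-through check could look like, in C with libpolkit-gobject; the action id org.a11y.atspi.connect is hypothetical, not an existing action, and where exactly the check would run (e.g. inside at-spi2) is an open question.
```c
#include <polkit/polkit.h>

/* Sketch: before letting a peer talk AT-SPI, ask polkit whether the
 * connecting process (identified by its pid) is authorized.  The action
 * id "org.a11y.atspi.connect" is hypothetical. */
gboolean
peer_may_use_atspi (gint pid)
{
  PolkitAuthority *authority = polkit_authority_get_sync (NULL, NULL);
  PolkitSubject *subject;
  PolkitAuthorizationResult *result;
  gboolean allowed = FALSE;

  if (authority == NULL)
    return FALSE;

  subject = polkit_unix_process_new_for_owner (pid, 0, -1);
  result = polkit_authority_check_authorization_sync (
      authority, subject, "org.a11y.atspi.connect", NULL,
      POLKIT_CHECK_AUTHORIZATION_FLAGS_ALLOW_USER_INTERACTION, NULL, NULL);

  if (result != NULL)
    {
      allowed = polkit_authorization_result_get_is_authorized (result);
      g_object_unref (result);
    }

  g_object_unref (subject);
  g_object_unref (authority);
  return allowed;
}
```
The ALLOW_USER_INTERACTION flag is what would trigger the user-facing authorization prompt for non-whitelisted ATs; that prompt is exactly the part that has to stay accessible.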
Separating out key press notification from keyboard shortcuts:
- key press notification can be asynchronous, thus not slowing the desktop down
- only keyboard shortcuts would have to go through the screen reader synchronously (see the sketch below)
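A sketch of that split with GDBus: key presses get broadcast as a D-Bus signal (fire and forget, the desktop never waits for the screen reader), while only registered shortcuts go through a call whose reply says whether the screen reader consumed the key. All bus/interface/member names here are hypothetical.
```c
#include <gio/gio.h>

/* Asynchronous notification: broadcast the key press and move on
 * immediately, so a slow screen reader cannot slow the desktop down.
 * Interface and signal names are made up. */
void
notify_key_press (GDBusConnection *bus, guint keyval, guint keycode)
{
  g_dbus_connection_emit_signal (bus, NULL,
                                 "/org/example/KeyFeedback",
                                 "org.example.KeyFeedback",
                                 "KeyPressed",
                                 g_variant_new ("(uu)", keyval, keycode),
                                 NULL);
}

/* Synchronous path, for registered shortcuts only: ask the screen reader
 * whether it consumes the key before the toolkit handles it. */
gboolean
deliver_shortcut (GDBusConnection *bus, guint keyval, guint modifiers)
{
  gboolean consumed = FALSE;
  GVariant *reply =
    g_dbus_connection_call_sync (bus, "org.example.ScreenReader",
                                 "/org/example/ScreenReader",
                                 "org.example.ScreenReader",
                                 "HandleShortcut",
                                 g_variant_new ("(uu)", keyval, modifiers),
                                 G_VARIANT_TYPE ("(b)"),
                                 G_DBUS_CALL_FLAGS_NONE, -1, NULL, NULL);

  if (reply != NULL)
    {
      g_variant_get (reply, "(b)", &consumed);
      g_variant_unref (reply);
    }
  return consumed;
}
```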
key press/shortcut transmission
- through XInput2? (Requires a window, but at-spi2-core could create one; wayland needs something else)
- through AT-SPI or another protocol? We want authorization to avoid arbitrary key snooping; if we add authorization over the whole of AT-SPI, that should be fine
- A bridge for dogtail was implemented on https://gitlab.gnome.org/ofourdan/gnome-ponytail-daemon but this is more of a hack
- A macro daemon is currently in development on https://gitlab.freedesktop.org/whot/macrod/
- Wayland security module https://lists.freedesktop.org/archives/wayland-devel/2015-March/020474.html
- Should be stuffed within at-spi2, so that AT developers do not have to re-do it (badly)
- Introduce something new both for Wayland and X11, or leave X11 with XInput2?
key press feedback
- Key presses
- when the keyboard layout is consistent across the desktop, the keycode alone is enough and can be translated properly
- But input method expansions also need to be reported
- make the IM emit an at-spi event to notify of this?
Implement accessibility support (such as "read current line") within the toolkit itself
- A lot of such features would need to be implemented to make working with speech feedback practical, and they would not be shared with other toolkits
- The screen reader has its own state (e.g. feedback verbosity level)
- That makes integration with the rest of the screen reader's behavior extremely difficult (we need speech to be interrupted by e.g. desktop notifications, focus changes, etc.)
Involve the compositor
- for keypresses / shortcuts?
- for getting absolute window position
- Define an interface that all compositors would have to implement (one possible shape is sketched below)
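One possible shape for such a compositor interface, written as D-Bus introspection XML embedded in C; everything here, including the org.example.Compositor.A11y name, is hypothetical.
```c
#include <gio/gio.h>

/* Hypothetical interface a compositor could implement so ATs and
 * magnifiers can map window-relative widget coordinates to absolute
 * screen coordinates. */
static const char compositor_a11y_xml[] =
  "<node>"
  "  <interface name='org.example.Compositor.A11y'>"
  "    <method name='GetToplevelPosition'>"
  "      <arg type='s' name='toplevel_handle' direction='in'/>"
  "      <arg type='i' name='x' direction='out'/>"
  "      <arg type='i' name='y' direction='out'/>"
  "    </method>"
  "    <signal name='ToplevelMoved'>"
  "      <arg type='s' name='toplevel_handle'/>"
  "      <arg type='i' name='x'/>"
  "      <arg type='i' name='y'/>"
  "    </signal>"
  "  </interface>"
  "</node>";

/* Parse the XML so the compositor can register the object on the bus
 * with g_dbus_connection_register_object(). */
GDBusNodeInfo *
load_compositor_a11y_interface (void)
{
  return g_dbus_node_info_new_for_xml (compositor_a11y_xml, NULL);
}
```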
Turn AtkObject into an interface, so implementors can decide between dedicated accessible objects or just implementing the interface (see the sketch at the end of this list)
- gtk could directly talk at-spi (like Qt does), and get rid of the atk layer
- Get that auto-generated from IPC descriptions
- Or even use an aria layer, independent of the transport technology actually used
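A sketch of the "AtkObject as an interface" idea with plain GObject; the DemoAccessible names are hypothetical. A widget would implement the interface directly instead of being paired with a separate dedicated accessible object.
```c
#include <glib-object.h>

/* Hypothetical accessible interface that widgets implement directly. */
G_DECLARE_INTERFACE (DemoAccessible, demo_accessible, DEMO, ACCESSIBLE, GObject)

struct _DemoAccessibleInterface
{
  GTypeInterface parent_iface;

  const char * (* get_name) (DemoAccessible *self);
  int          (* get_role) (DemoAccessible *self);
};

G_DEFINE_INTERFACE (DemoAccessible, demo_accessible, G_TYPE_OBJECT)

static void
demo_accessible_default_init (DemoAccessibleInterface *iface)
{
  /* Signals/properties common to every implementor would be installed here. */
}

/* Callers (e.g. an AT-SPI bridge, or generated IPC glue) go through the
 * interface, whatever object actually implements it. */
const char *
demo_accessible_get_name (DemoAccessible *self)
{
  return DEMO_ACCESSIBLE_GET_IFACE (self)->get_name (self);
}
```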