Revamping the GTK accessibility stack?

URLs

https://wiki.gnome.org/Hackfests/GTK2020/Notes

https://blog.gtk.org/2020/02/17/gtk-hackfest-2020-roadmap-and-accessibility/

https://github.com/flatpak/xdg-desktop-portal/issues/1046#issuecomment-1615146403

GTK4 accessibility issues https://gitlab.gnome.org/GNOME/gtk/-/issues?sort=updated_desc&state=opened&label_name[]=8.+Accessibility&label_name[]=GTK4

Archaeology of Accessibility https://www.youtube.com/watch?v=eNh0Xg8abj0

Discussion more or less started here (among other places):

https://gitlab.gnome.org/GNOME/gtk/issues/1739

https://gitlab.gnome.org/GNOME/gtk/merge_requests/1120

Previous discussions: https://wiki.gnome.org/Hackfests/ATK2011

https://wiki.gnome.org/Accessibility/Hackfests/ATK2012

https://wiki.gnome.org/Accessibility/Hackfests/GUADEC2013

Meta-bug for ATK TODO list: https://bugzilla.gnome.org/show_bug.cgi?id=638537

Some input questions are discussed on the Input page.

A proposal for peer-to-peer connections between applications and the screen reader is on the P2P page.

https://nimfsoft.art/blog/the-reality-of-wayland-input-methods-in-2022/

Coherency

For various reasons, we most probably want to stay coherent with the interfaces of other platforms (while avoiding their bad ideas):

The technical details of the IPC mechanism and of the IPC interfaces vary, but the underlying principle is the same:

  • The screen reader uses IPC to get information about the application's widgets.
  • The toolkit used by the application implements the server hooks.
  • The toolkit sends notifications for events that are useful to screen readers (e.g. text changes).
  • Some IPC calls trigger effects/actions in the application.
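To make the principle concrete, here is a minimal sketch of the first point, assuming the dbus-python bindings and the well-known AT-SPI bus names: an AT enumerating the accessible applications exposed on the accessibility bus.

```python
import dbus

session = dbus.SessionBus()
# The accessibility bus is separate from the session bus; its address
# is published on the session bus by org.a11y.Bus.
a11y_address = session.get_object("org.a11y.Bus", "/org/a11y/bus") \
                      .GetAddress(dbus_interface="org.a11y.Bus")
a11y_bus = dbus.bus.BusConnection(a11y_address)

# The registry daemon's root object lists all accessible applications;
# each child is a (bus name, object path) reference.
root = a11y_bus.get_object("org.a11y.atspi.Registry",
                           "/org/a11y/atspi/accessible/root")
for bus_name, path in root.GetChildren(
        dbus_interface="org.a11y.atspi.Accessible"):
    app = a11y_bus.get_object(bus_name, path)
    name = app.Get("org.a11y.atspi.Accessible", "Name",
                   dbus_interface="org.freedesktop.DBus.Properties")
    print(bus_name, name)
```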

Major issues

  • AT-SPI allows any application to access information from, and act on, any other application, with no authorization.
  • key snooping makes all keypresses go through the screen reader, slowing the whole desktop down
  • key snooping doesn't work on Wayland, making screen readers very difficult to use, https://gitlab.gnome.org/GNOME/mutter/issues/9
  • Stealing key events from GTK is considered wrong; it should go through other means https://gitlab.gnome.org/GNOME/gtk/issues/1739#note_458183
  • On Wayland, widgets can't provide absolute screen positions, only positions relative to the window, and only the compositor can provide the absolute position of the window

Issues

  • The ATK key event seems too raw and too tied to X11 specifics https://bugzilla.gnome.org/show_bug.cgi?id=649559#c3
  • ATK itself is cumbersome
    • at-spi2-core + at-spi2-atk + atk + gtk
    • Adding a feature requires touching all of them; having just one library would make development much easier
  • at-spi contains interfaces which have not really been implemented (e.g. boundary details in the AtkText interface)
  • awkward coupling between the GtkWidget and GtkAccessible classes.

Usage needs

blind people

  • Getting audio feedback for the currently-focused widget:
    • when moving from one widget to another, when the current widget changes its state, when a notification widget announces something.
    • What the widget actually is: a checkbox to be ticked, a radio button, a button, an entry to type into, ... and its current state
  • Getting audio feedback of what was done:
    • what was typed, i.e. the translated key, not only the keycode.
    • what is getting selected
    • what text was pasted
    • the new value of a spin button
    • that a row or column was added to a table
    • that a new tab was added
    • that some text showed up (e.g. "no occurrence found")
    • that a window opened up / was closed, or switched to another application
    • etc.
  • Speak text: current word, current sentence, current line, all the text
    • Have the text view scrolled while it is spoken, in case the speech goes beyond what is visible, so a sighted co-worker can follow.
  • Speak attribute of text: bold, size, etc.
  • Screen reader driven through shortcuts, to e.g. spell a letter, spell a word, read the current selection, speak the current logical position (window, panel, group, widget), change feedback verbosity, etc.
  • Move the caret to where the user interrupted the speech.
  • Move the caret to the braille cursor-routing position.
  • Simulate mouse clicks (for when no equivalent keyboard shortcut is defined)
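Most of the feedback above maps onto AT-SPI events. A minimal sketch of the listening side, assuming the pyatspi bindings; speak() is a hypothetical stand-in for a real speech engine such as speech-dispatcher.

```python
import pyatspi

def speak(text):
    print("TTS:", text)  # hypothetical: hand the text to a speech engine

def on_focus(event):
    if event.detail1 != 1:  # only react to gaining focus
        return
    widget = event.source
    # Announce what the widget is (role) and its name; a real screen
    # reader would also add state (checked, expanded, ...).
    speak("%s, %s" % (widget.name, widget.getRoleName()))

def on_text_inserted(event):
    speak(event.any_data)  # echo newly inserted/pasted text

pyatspi.Registry.registerEventListener(on_focus,
                                       "object:state-changed:focused")
pyatspi.Registry.registerEventListener(on_text_inserted,
                                       "object:text-changed:insert")
pyatspi.Registry.start()  # blocks, dispatching events to the callbacks
```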

people with low vision

  • Getting the zoomed area to show the currently-focused widget, and the current caret position
  • Getting the zoomed area to show the text currently being spoken by the screen reader
    • Have the text view scrolled while it is spoken, in case the speech goes beyond what is visible
  • Speak the content that is under the mouse
    • And be able to act on it safely, i.e. be sure to be acting on what was spoken last
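A minimal sketch (pyatspi again) of how a magnifier might find the rectangle to zoom to when the focus moves, via the AT-SPI Component interface. Note the Wayland caveat from the issues above: only window-relative coordinates are reliable there, and the compositor has to supply the window position.

```python
import pyatspi

def on_focus(event):
    if event.detail1 != 1:
        return
    component = event.source.queryComponent()
    # DESKTOP_COORDS only work reliably on X11; on Wayland the
    # compositor would have to translate window-relative coordinates.
    extents = component.getExtents(pyatspi.WINDOW_COORDS)
    print("zoom to (window-relative):",
          extents.x, extents.y, extents.width, extents.height)

pyatspi.Registry.registerEventListener(on_focus,
                                       "object:state-changed:focused")
pyatspi.Registry.start()
```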

people with limited typing capabilities

  • Produce an on-screen adaptive keyboard which contains only the actions that can be achieved in the software, and use one button + timing to reach them efficiently
  • typing / actions from eye-tracking
  • typing / actions from speech recognition

notable resulting technical needs

  • key press events with translated key, not only the keycode
  • keyboard shortcuts
  • Mouse click simulation
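Two of these needs map onto AT-SPI's existing synthesis API; a minimal sketch with pyatspi (the keysym and coordinates are arbitrary example values):

```python
import pyatspi

# Type the letter "a" by keysym (0x061), i.e. the translated key,
# rather than a layout-dependent hardware keycode.
pyatspi.Registry.generateKeyboardEvent(0x061, None, pyatspi.KEY_SYM)

# Simulate a left-button click ("b1c") at absolute coordinates
# (100, 200); on Wayland this is exactly where the compositor
# would have to get involved.
pyatspi.Registry.generateMouseEvent(100, 200, "b1c")
```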

Ideas for solutions

Ideas just thrown out here at random; no particular order, priority, or applicability intended.

  • Check how it's done in other OSes
    • key snooping: on Windows, NVDA uses the win32 API, not the accessibility API
  • Checking authorization before allowing a connection to AT-SPI (see the sketch below)
    • System-provided screen readers can probably get a pass-through (policykit?)
    • Others would need user-provided authorization; the prompt needs to be accessible, or at least simple enough to answer blindly (e.g. a well-known sound, to which one can respond by just pressing Enter)
    • wayland security modules https://github.com/mupuf/libwsm
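As a hedged sketch of what the check could look like: before serving a client, the registry (or a broker in front of it) could look up the caller's credentials with the standard org.freedesktop.DBus methods, then consult a policy. is_trusted_at() below is a hypothetical policy hook, not an existing API, and /proc/pid/comm is trivially spoofable; a real policy would go through polkit or a portal.

```python
import os
import dbus

bus = dbus.SessionBus()
dbus_iface = dbus.Interface(
    bus.get_object("org.freedesktop.DBus", "/org/freedesktop/DBus"),
    "org.freedesktop.DBus")

def is_trusted_at(comm):
    # Hypothetical policy hook: system screen readers pass through,
    # anything else requires explicit (accessible!) user consent.
    return comm in ("orca",)

def authorize(sender_unique_name):
    # Standard D-Bus credential lookups for the connecting peer.
    pid = dbus_iface.GetConnectionUnixProcessID(sender_unique_name)
    uid = dbus_iface.GetConnectionUnixUser(sender_unique_name)
    with open("/proc/%d/comm" % pid) as f:
        comm = f.read().strip()
    return uid == os.getuid() and is_trusted_at(comm)
```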
  • Separating out key press notification from keyboard shortcuts (see the sketch below):
    • key press notification can be asynchronous, thus not slowing the desktop down
    • only keyboard shortcuts would have to go through the screen reader
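AT-SPI's existing keystroke listener already has the relevant knobs; a sketch with pyatspi of the proposed split: an asynchronous, non-preemptive listener for mere key-press feedback, and a synchronous, preemptive one reserved for the screen reader's own shortcuts.

```python
import pyatspi

def on_any_key(event):
    # Asynchronous feedback path: the desktop does not wait for us.
    print("key:", event.event_string)
    return False  # never consume

def on_shortcut(event):
    # Synchronous path, only for the screen reader's own shortcuts.
    print("screen-reader shortcut:", event.event_string)
    return True  # consume the event

pyatspi.Registry.registerKeystrokeListener(
    on_any_key, kind=(pyatspi.KEY_PRESSED_EVENT,),
    synchronous=False, preemptive=False)
# A real screen reader would pass key_set= to restrict this listener
# to its own shortcuts instead of grabbing every key.
pyatspi.Registry.registerKeystrokeListener(
    on_shortcut, kind=(pyatspi.KEY_PRESSED_EVENT,),
    synchronous=True, preemptive=True)
pyatspi.Registry.start()
```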
  • key press/shortcut transmission
  • key press feedback
    • Key presses
    • when the keyboard layout is consistent across the desktop, only the keycode is needed, and it can be translated properly
    • But also input method expansions
    • make the IM emit an at-spi event to notify of this?
  • Implement accessibility support (such as "read current line") within the toolkit itself
    • A lot of such features would need to be implemented to make speech feedback workable, and they would not be shared with other toolkits
    • The screen reader has its own state (e.g. feedback verbosity level)
    • That makes integration with the rest of the screen reader's behavior extremely difficult (we need it to be interrupted by e.g. desktop notifications, focus changes, etc.)
  • Involve the compositor
    • for keypresses / shortcuts?
    • for getting the absolute window position
    • Define an interface that all compositors would have to implement
  • Turn AtkObject into an interface, so implementors can decide between dedicated objects or just implementing the interface
  • gtk could directly talk at-spi (like Qt does), and get rid of the atk layer (see the sketch below)
    • Get that auto-generated from the IPC interface descriptions
  • Or even use an aria-like layer, independent of the transport technology actually used
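As a toy sketch of the "talk at-spi directly" idea (dbus-python again): the application exports objects implementing the org.a11y.atspi interfaces itself, with no ATK layer in between. Only a tiny fragment of the Accessible interface is shown; a real implementation covers many more methods, properties, and signals, and registers itself with the registry daemon.

```python
import dbus
import dbus.service

class DirectAccessible(dbus.service.Object):
    """One widget's accessible, exported straight onto the a11y bus."""

    def __init__(self, conn, path, name, role_name):
        super().__init__(conn, path)
        self._name = name
        self._role_name = role_name

    @dbus.service.method("org.a11y.atspi.Accessible", out_signature="s")
    def GetRoleName(self):
        return self._role_name

    @dbus.service.method("org.freedesktop.DBus.Properties",
                         in_signature="ss", out_signature="v")
    def Get(self, interface, prop):
        if interface == "org.a11y.atspi.Accessible" and prop == "Name":
            return self._name
        raise dbus.exceptions.DBusException("unknown property")
```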