Privileged Wayland Clients
When Wayland arrived on the scene some time around 2013, it promised increased security: in X11, any client can monitor the inputs and outputs of other clients. This makes it possible to implement useful features such as screenshot tools and virtual keyboards, but also malicious tools like screen scrapers and keyloggers. So in the name of security, Wayland did away with all of that.
This resulted in a bit of a divide in the community: some projects (like GNOME) chose to implement the missing features directly in their compositor. Other projects (like wlroots) wanted to go with a more modular approach where users could mix and match components. So they went ahead and defined Wayland protocol extensions for the missing features.
To avoid reintroducing all the security issues of X11, these protocol extensions are intended to only be used by "privileged clients". How does that work exactly?
Identifying Clients
Well, it's complicated.
Say sway wants to delegate screenshots to flameshot. It could use wl_client_get_credentials() to get the PID of each client and then use wl_display_set_global_filter() to block access to the relevant interfaces unless /proc/PID/exe links to the flameshot binary.
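While this sounds good in theory, it has a couple of issues: the /proc/PID/exe check can be bypassed using LD_PRELOAD, which loads arbitrary code into a process that still runs the trusted binary, and accessing /proc/ is vulnerable to race conditions, because the PID may refer to a different process by the time the link is read.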
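To make the check more concrete, here is a minimal sketch of it outside of any compositor, assuming Linux. On Linux, wl_client_get_credentials() is backed by the SO_PEERCRED socket option, which the sketch queries directly. The helper names peer_pid and is_trusted_binary are made up for illustration, and this is not how sway or any other compositor actually implements it.

```python
import os
import socket
import struct


def peer_pid(conn: socket.socket) -> int:
    """Return the PID recorded for the peer of a Unix socket (Linux only)."""
    # SO_PEERCRED yields a struct ucred: pid (int), uid and gid (unsigned int).
    size = struct.calcsize("iII")
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    pid, _uid, _gid = struct.unpack("iII", raw)
    return pid


def is_trusted_binary(pid: int, trusted_path: str) -> bool:
    """Check whether /proc/PID/exe points at trusted_path."""
    try:
        exe = os.readlink(f"/proc/{pid}/exe")
    except OSError:
        # The process is gone or we are not allowed to look at it.
        return False
    # Race: between the credentials lookup and this readlink, the process
    # may have exited and its PID may have been reused by another program.
    # The check also says nothing about code injected via LD_PRELOAD.
    return exe == trusted_path
```

A compositor-like caller would combine the two, for example is_trusted_binary(peer_pid(conn), "/usr/bin/flameshot"), where the path is only an example. Both comments in is_trusted_binary mark the spots where the weaknesses mentioned above come in.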
Discussion quickly shifted towards sandboxing, because processes that are not sandboxed can read all of the user's files, which is already the worst-case scenario, and can change the compositor's configuration, which nullifies any restrictions defined there. It was decided that any efforts to restrict access to privileged interfaces should focus on sandboxed applications. Unsandboxed applications would always be privileged.
This led to the security-context protocol extension, which allows sandboxing engines like Flatpak to use a separate, tagged socket for sandboxed applications.[1] Most compositors simply block access to all privileged interfaces for such clients.
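The resulting policy is simple enough to model in a few lines. The following is only a toy model, not compositor code: the Client class and the may_bind function are hypothetical, and the set of privileged interfaces is an illustrative selection rather than a complete list.

```python
from dataclasses import dataclass
from typing import Optional

# Interfaces treated as privileged in this sketch (illustrative selection).
PRIVILEGED_INTERFACES = frozenset({
    "ext_session_lock_manager_v1",
    "zwlr_screencopy_manager_v1",
    "zwp_virtual_keyboard_manager_v1",
})


@dataclass(frozen=True)
class Client:
    """Hypothetical stand-in for the compositor's per-client state."""
    pid: int
    # Set when the client connected through a security-context socket,
    # e.g. "flatpak"; None for clients on the regular socket.
    sandbox_engine: Optional[str] = None


def may_bind(client: Client, interface: str) -> bool:
    """Return True if the client may see and bind the given global."""
    if interface not in PRIVILEGED_INTERFACES:
        return True
    # Sandboxed clients lose all privileged interfaces;
    # every unsandboxed client keeps them.
    return client.sandbox_engine is None
```

In this model, Client(pid=1) may bind zwlr_screencopy_manager_v1, while Client(pid=2, sandbox_engine="flatpak") may not. The last line of may_bind is where every unsandboxed process ends up privileged.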
While this is progress, there are still a lot of unsandboxed applications, and I do not see that changing any time soon. I think we can do better!
Meaningful Interaction
Another option would be to ask the user for confirmation every time a privileged interface is accessed. However, Matthias Clasen argued in 2019 that a simple confirm dialog would just annoy users. So instead he recommends requiring meaningful interaction.
For example, when you try to pick a color in GIMP, the meaningful interaction would be to select a pixel on the screen. In technical terms, GIMP could talk to xdg-desktop-portal, which talks to a portal backend, which in turn has privileged access to the compositor. In exchange for the privileged access, the portal backend guarantees to prevent abuse by requiring a user interaction.
The compositor still needs a reliable way to identify the portal backend. So user interaction is not a solution to the privileged client issue. Instead, it is a prerequisite that privileged clients have to satisfy.
Is there a better way?
When I press the "print" key on my keyboard, my compositor spawns a screenshot tool (in my case grim), which then connects back to the compositor. So far, we have tried to decide whether that tool should get access to privileged interfaces when it connects, either based on its identity or on user interaction. But at that point, the meaningful interaction has already happened. The meaningful interaction was pressing the "print" key.
This really clicked for me when I realized that all privileged clients are spawned by the compositor, either on user interaction or on startup (in which case they require meaningful user interaction themselves). We just need to find a way to pass additional privileges to the process when it is spawned, without allowing other processes to steal them.
I found the answer in a great article by Martin Roukala from 2014 (!): Usually, Wayland clients use the WAYLAND_DISPLAY environment variable to find the socket that they connect to. But there is also WAYLAND_SOCKET, which tells them to use an already connected file descriptor instead. This way you can pass an anonymous socket that cannot be accessed by any other process.
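The mechanism itself can be sketched without any Wayland code. The example below is an assumption-laden illustration, not the labwc implementation: spawn_with_private_socket is a made-up helper, and the "client" is a stand-in Python one-liner that only proves the inherited descriptor is usable. A real client would let libwayland pick up WAYLAND_SOCKET.

```python
import os
import socket
import subprocess
import sys


def spawn_with_private_socket(argv):
    """Spawn argv with one end of an anonymous socket pair in WAYLAND_SOCKET.

    Returns (process, compositor_end). The socket has no filesystem path,
    so other processes have nothing to connect to.
    """
    compositor_end, client_end = socket.socketpair(
        socket.AF_UNIX, socket.SOCK_STREAM
    )
    env = dict(os.environ, WAYLAND_SOCKET=str(client_end.fileno()))
    # pass_fds keeps the descriptor open, under the same number, in the child.
    proc = subprocess.Popen(argv, env=env, pass_fds=[client_end.fileno()])
    # The parent no longer needs the client's end.
    client_end.close()
    return proc, compositor_end


# Stand-in for a Wayland client: wraps the inherited descriptor and
# sends a greeting over it.
DEMO_CLIENT = (
    "import os, socket;"
    "s = socket.socket(fileno=int(os.environ['WAYLAND_SOCKET']));"
    "s.sendall(b'hello'); s.close()"
)

if __name__ == "__main__":
    proc, conn = spawn_with_private_socket([sys.executable, "-c", DEMO_CLIENT])
    print(conn.recv(16))
    proc.wait()
    conn.close()
```

Because the compositor creates the connection itself, it knows exactly which keybinding or startup entry the connection belongs to and can attach privileges to that connection instead of guessing the client's identity later.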
Unfortunately, there is one major issue with this approach: The privileges only apply to the first client, not any of its children. For example, swayidle timeout 600 swaylock would not work because only swayidle would get the privileged socket. swaylock would use the regular socket, without access to the privileged ext_session_lock_manager_v1 interface.
Anyway, I implemented this approach for labwc in #3089. I have been using this branch for a couple of days now, and it seems to work fine. I don't expect that it will be merged though because it contains quite a few breaking changes.
Conclusion
The status quo is that wlroots-based compositors implement privileged interfaces, and any non-sandboxed processes can access them. That is way too much attack surface for my taste.
The big question for me is whether the WAYLAND_SOCKET approach I described above is a step in the right direction or mere security theater. I have a hard time deciding either way.
Next I would like to look more into sandboxing to see if I can apply more general restrictions, especially to terminal applications.
[1] way-secure is another, standalone implementation.