Creating a Wayland protocol so a cat can chase my cursor
This post assumes general familiarity with C (pointers, manual memory management, etc.), Rust, and the idea of a display server, but no Wayland-specific knowledge.
I'm not a Wayland expert by any means; this is just stuff I figured out by reading source code and documentation. Don't use any of this as a template.
Also, note that throughout this post, the word 'pointer' will generally refer to a specific kind of Wayland object, but occasionally to a C pointer. I'll try to disambiguate when necessary.
intro
I've been working on a "desktop toy" that lets a cat (or other things) hang out on your desktop in the style of the classic neko, partly to kill time and partly because I want to mess around with writing relatively low-level Wayland code. One of the features I wanted to implement was letting the cat chase your cursor like the original neko (but across your entire screen, not just a specific window). But I ran into a problem: in Wayland, there's no way to monitor the motion of the pointer across the entire screen without also 'grabbing' all pointer events, which makes everything else uninteractable, and none of the extensions outside the 'standard' protocol support it either. So I was left with a few options:
- Have a way to turn on 'chase mode', which receives pointer events and causes the cat to chase the cursor, and then turn it back off again, which causes the cat to stop chasing the cursor and lets you use your computer. This is what I did first, since it's simple.
- Communicate outside of Wayland. sway has a separate IPC mechanism, which boils down to pushing JSON messages over a socket, that it already uses for notifications that aren't part of the core Wayland protocol. This would require me to add another event stream to openbonzi (since right now it only uses Wayland protocol events), and it wouldn't be fun.
- Modify sway so that some clients always get pointer motion information even if the pointer isn't over a surface they own. I considered this, but in my early implementations I noticed that it violated assumptions that the codebase made about pointer motion and would send spurious surface enter/leave events. This would also merge both 'normal' pointer events and 'global' pointer events, which can be subpar since an application might want to handle those separately.
- Add an entirely new type of object to the Wayland protocol. This is guaranteed not to interfere with any existing code, and it means that client code can easily distinguish between global motion and 'normal' motion. This would also require the most work (read: fun).
Given that this blog post exists, I clearly chose option 4.
defining the protocol
Wayland isn't a specific piece of software; instead, it's a protocol to allow clients (i.e. applications) to ask the display server to do things and for the display server to notify those clients when something happens (e.g. the mouse moved). The IPC design is fairly typical: a simple binary format over a Unix socket (along with passing file descriptors for things like drag-and-drop). The protocol is described by XML files such as xdg-shell.xml, which various programs use to generate bindings for a specific language or library. These XML files and the associated generated code are also known as 'protocols', and so the statement "sway supports the xdg_activation_v1 protocol" means that sway exposes objects via the Wayland protocol necessary for clients to use the functionality described by that file. You can browse both the core protocol and all the common extensions (and most of the uncommon ones) on the Wayland Explorer, which renders the XML files as some nice, browsable HTML.
The upshot of this is that display servers that support Wayland will all implement the core 'basic' protocols, but various extensions may or may not be supported. This means that not all clients will work on all display servers, but it also means that it's possible to create new functionality that only certain clients need. Which is exactly what we want.
For more information, see the Wayland Book; some of the APIs have changed somewhat, but it's still a very good conceptual reference.
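To make the "simple binary format" a little more concrete, here is a rough sketch of how a single message is laid out on the wire, following the wire-format description in the Wayland documentation: two 32-bit header words (the target object's ID, then the total size in bytes in the upper 16 bits and the opcode in the lower 16), followed by the arguments. The encode_message helper is hypothetical and only handles one-word arguments; libwayland's real marshalling code also deals with strings and arrays, and file descriptors travel separately as socket ancillary data rather than in the byte stream.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper sketching the Wayland wire format. Every message
 * starts with two 32-bit words in the host's native byte order:
 *   word 0: ID of the object the message is addressed to (or sent from)
 *   word 1: total size in bytes, header included (upper 16 bits)
 *           | opcode of the request or event (lower 16 bits)
 * Arguments follow. This sketch only handles one-word arguments such as
 * uint, int, object, and fixed.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t encode_message(uint8_t *buf, size_t buf_len, uint32_t object_id,
                      uint16_t opcode, const uint32_t *args, size_t nargs)
{
	size_t size = 8 + 4 * nargs;
	if (size > buf_len || size > 0xffff)
		return 0;

	uint32_t header[2] = { object_id, ((uint32_t)size << 16) | opcode };
	memcpy(buf, header, sizeof header);
	if (nargs > 0)
		memcpy(buf + 8, args, 4 * nargs);
	return size;
}
```

Because the size lives in the header, the receiver can always find where one message ends and the next begins, even for opcodes it doesn't understand.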
Before we can write any server code, we have to write the XML protocol definition file. In this case, I started by thinking about how the client should use the protocol. In the Wayland model, when declaring an interface (think 'type'), you can also declare that it can receive various kinds of events. This is how the server notifies clients of things that are happening: a moving pointer generates motion events on a wl_pointer object, changing the resolution of a display generates mode events on a wl_output, and so on. So it makes sense for us to define a new kind of object that receives events whenever the pointer moves, regardless of where it is.
I originally called these objects 'pointer monitors', but 'monitor' looks too close to 'manager' (a word we'll see soon), so I renamed them to 'pointer spies'. Each pointer spy monitors exactly one pointer and should receive events whenever the corresponding pointer moves. The corresponding Wayland XML description looks like so:
Interface description: "While this exists, it will receive events every time the pointer moves."
Motion event: "Sent whenever the pointer moves."
Destroy request: "Destroys the pointer monitor object."
Note the wp_ prefix, indicating that this is a 'general-purpose' protocol, and the _v1 suffix, used to allow for backwards-incompatible future changes; both of these are specified by the naming convention. We can also request that the spy destroy itself if we no longer need it; this isn't strictly necessary for my use case, but is good practice.
We also need a way to actually get one of these pointer spies; every request the client sends has to be part of an interface implemented by some object. The typical solution to this is to use a "manager" object that has a get_foo request that constructs a new instance of the object. While there's nothing stopping you from implementing other methods on the manager, in our case we only need to be able to get pointer spies:
Manager interface description: "This interface allows clients to know the position of the pointer at all times. This is intended for 'desktop toy'-type applications. This is purely an experiment and isn't intended for serious usage."
Spy creation request: "Create a new pointer monitor object."
One thing to note is that, instead of the server returning the ID of the created object, the client sends the ID as part of the creation request (which is safe since object IDs are per-client). Setting the argument's type to new_id tells code generators that this is effectively a 'constructor' method and so it should return some representation of the pointer spy object.
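A small sketch of why client-chosen IDs are safe, assuming the ID ranges given in the Wayland documentation (clients allocate from 1 to 0xFEFFFFFF, the server from 0xFF000000 upwards, and 0 means "no object"). The id_allocator type and its functions are hypothetical; libwayland's own client-side object map also reuses freed IDs, which this sketch doesn't bother with.

```c
#include <stdbool.h>
#include <stdint.h>

/* Object ID ranges from the Wayland documentation: clients allocate IDs
 * in [1, 0xFEFFFFFF], the server in [0xFF000000, 0xFFFFFFFF], and 0 means
 * "no object". */
#define CLIENT_ID_MAX   0xfeffffffu
#define SERVER_ID_START 0xff000000u

/* Hypothetical client-side allocator. ID 1 is always the wl_display, so
 * the first ID available for a new object is 2. (libwayland also reuses
 * freed IDs; this sketch only counts upwards.) */
struct id_allocator {
	uint32_t next;
};

void id_allocator_init(struct id_allocator *a)
{
	a->next = 2;
}

/* Returns the ID to put in a request's new_id argument, or 0 once the
 * client range is used up. Each client has its own ID space, so two
 * clients picking the same number never collide. */
uint32_t id_allocator_next(struct id_allocator *a)
{
	if (a->next > CLIENT_ID_MAX)
		return 0;
	return a->next++;
}

bool is_server_allocated(uint32_t id)
{
	return id >= SERVER_ID_START;
}
```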
But this just pushes the question one level up: how do we get a manager object? The answer is that Wayland has the idea of a "global object"; server code can register an object as a global, and when you connect to a Wayland server, you can get access to a wl_registry object that receives an event for each global object specifying what interface it implements as well as an arbitrary numeric "name". So our client code will look for a global implementing the wp_pointer_spy_manager_v1 interface and then bind its name (which allocates an ID number on the wire protocol).
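The client side of that registry lookup can be sketched as a handler with the same signature as the global callback in libwayland's struct wl_registry_listener. The spy_client_state struct and the handler names are hypothetical, and the actual wl_registry_bind call is left as a comment so that the sketch compiles without libwayland or the generated client header.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Opaque in libwayland; this sketch only ever passes pointers to it. */
struct wl_registry;

/* Hypothetical client state filled in while scanning the registry. */
struct spy_client_state {
	bool found;
	uint32_t manager_name;    /* the global's numeric "name" */
	uint32_t manager_version;
};

/* Same signature as the `global` callback in libwayland's
 * struct wl_registry_listener: called once per advertised global. A real
 * client would call
 *   wl_registry_bind(registry, name, &wp_pointer_spy_manager_v1_interface, 1);
 * here, which allocates a new object ID for the manager. */
void registry_handle_global(void *data, struct wl_registry *registry,
                            uint32_t name, const char *interface,
                            uint32_t version)
{
	struct spy_client_state *state = data;
	(void)registry;
	if (strcmp(interface, "wp_pointer_spy_manager_v1") == 0) {
		state->found = true;
		state->manager_name = name;
		state->manager_version = version;
	}
}

/* Matches the `global_remove` callback: forget the manager if it goes away. */
void registry_handle_global_remove(void *data, struct wl_registry *registry,
                                   uint32_t name)
{
	struct spy_client_state *state = data;
	(void)registry;
	if (state->found && state->manager_name == name)
		state->found = false;
}
```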
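Overall, the process looks like this (with the process of acquiring the wl_pointer omitted):
- the client receives a wl_registry event announcing a global that implements wp_pointer_spy_manager_v1, and binds it
- the client sends the manager's creation request, passing a new ID for the spy along with the wl_pointer to watch
- the server sends the spy a motion event every time that pointer moves, until the client destroys the spy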
And that's it! We just need to wrap those two together in a <protocol> tag and do a bit of other boring throat-clearing, and we're done writing the protocol itself. If you want examples of other similar protocols, the relative pointer protocol has a very similar API (and is what I based the pointer spy protocol on).
server implementation
wayland-scanner can't actually generate the implementation, just stubs for us to use. The next step is to actually add support for it. There are three 'layers' to sway:
- libwayland defines code for serializing messages to the wire protocol and deserializing them. It's pretty low-level code. Many Wayland compositors and clients are going to want to use this (through language-specific FFI bindings), although there's nothing stopping you from writing your own. In C, Wayland types will start with wl_.
- wlroots, whose README describes it as "pluggable, composable, unopinionated modules for building a Wayland compositor", provides abstractions over reading from input hardware, implementations for many interfaces, and support for Xwayland (running X11 applications on Wayland).
- sway is the compositor itself, and consists of code for things like arranging windows on screen and higher-level functionality.
The C types these codebases implement are all namespaced in the obvious way: wl_foo for libwayland, wlr_foo for wlroots, and sway_foo for sway. We're mostly going to be working with libwayland and wlroots types here, and it's important to keep in mind that wl_foo types generally correspond to objects in the protocol, whereas sway_foo and wlr_foo types correspond to actual physical state. This will come up in a bit.
If I was going to actually try to upstream support for this, I'd want to add it to wlroots; protocol implementation is too high-level for libwayland, and there's no reason that this has to be sway-specific. But I don't have a burning desire to do so, and it's simpler to only have to patch sway than it would be to have to patch wlroots and sway.
generated code
First, let's look at the generated code from the above. In this case, wayland-scanner generates a .h and .c file for the server and a .h for the client (since the client code is just types and inline functions, both of which can be put in headers). We don't care about client code on the server, and we don't need to know the implementation of the server code, so we can just look at the .h, which the build system put in build/protocol/wp-pointer-spy-v1-protocol.h. Here it is, with extraneous things removed:
extern const struct wl_interface wp_pointer_spy_manager_v1_interface;
extern const struct wl_interface wp_pointer_spy_v1_interface;
The wl_interface type contains data about the interface such as its name, the signature of its requests, and so on. This is used by the serialization/deserialization machinery, and additionally allows us to check that a wl_resource object actually corresponds to the right kind of object (since you can ask a resource for the interface it implements).
These structs are used to define implementations of the interface, with each request having a corresponding member (a function pointer to its handler). C allows you to have values and struct types with the same name, so this can get a bit confusing, but you can easily disambiguate them by looking for the keyword struct immediately beforehand. If it helps, you can think of these as being called wp_pointer_spy_v1_implementation and wp_pointer_spy_manager_v1_implementation.
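Since the generated structs themselves aren't reproduced here, this is a hedged, self-contained sketch of the naming situation: a value and a struct type that share the name wp_pointer_spy_v1_interface. The struct wl_interface below is a simplified stand-in for libwayland's (which also records the request and event signatures), and spy_handle_destroy, destroy_calls, and spy_impl are hypothetical names. The destroy member follows the shape wayland-scanner uses for server-side request handlers: a client and a resource, then any request arguments.

```c
struct wl_client;
struct wl_resource;

/* Simplified stand-in: libwayland's real struct wl_interface also
 * describes every request and event signature. */
struct wl_interface {
	const char *name;
	int version;
};

/* A *value* named wp_pointer_spy_v1_interface, describing the protocol... */
const struct wl_interface wp_pointer_spy_v1_interface = {
	"wp_pointer_spy_v1", 1,
};

/* ...and a *struct type* with the same name, holding one function pointer
 * per request. Struct tags live in a separate namespace from ordinary
 * identifiers in C, so both declarations are legal. */
struct wp_pointer_spy_v1_interface {
	void (*destroy)(struct wl_client *client, struct wl_resource *resource);
};

static int destroy_calls = 0;

static void spy_handle_destroy(struct wl_client *client,
                               struct wl_resource *resource)
{
	(void)client;
	(void)resource;
	/* Real code would call wl_resource_destroy(resource) here. */
	destroy_calls++;
}

/* The server's "implementation" of the interface. In real code, a pointer
 * to this is passed to wl_resource_set_implementation() when a spy is
 * created, and libwayland dispatches incoming requests through it. */
static const struct wp_pointer_spy_v1_interface spy_impl = {
	.destroy = spy_handle_destroy,
};
```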
static inline void
This is just a simple function to actually send the event. We get one of these per event, and it exists to make the API slightly nicer.
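The post doesn't show the motion event's argument types. Assuming they are wl_fixed_t, like the coordinates in wl_pointer's motion event and in the relative pointer protocol, the values on the wire are signed 24.8 fixed-point numbers. The helpers below are modelled on libwayland's wl_fixed_from_double and wl_fixed_to_double; the fixed24_8 type name is made up for the sketch.

```c
#include <stdint.h>

/* Assumed argument type for the spy's motion event: wl_fixed_t, a signed
 * 24.8 fixed-point number (the value multiplied by 256, stored in an
 * int32_t). fixed24_8 and these helpers are stand-ins modelled on
 * libwayland's wl_fixed_t, wl_fixed_from_double and wl_fixed_to_double. */
typedef int32_t fixed24_8;

fixed24_8 fixed_from_double(double d)
{
	/* The cast truncates toward zero, discarding precision finer than 1/256. */
	return (fixed24_8)(d * 256.0);
}

double fixed_to_double(fixed24_8 f)
{
	return f / 256.0;
}
```

A generated send function is a thin wrapper around wl_resource_post_event with the event's opcode, so under this assumption the cursor's double coordinates would be converted like this before being passed in.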
server code
The generated code is purely for working with the protocol, and it doesn't help us actually write the implementation (and it can't, since libwayland isn't tied to any particular implementation method). But in order to explain something subtle about the design, I need to explain what a "seat" is.
Seats are a Wayland abstraction that ties together keyboard, mouse, and touchscreen input, analogous to a single user sitting at the computer and using it. Most setups will only have one seat; even if the user has multiple keyboards or a mouse and drawing tablet, they'll still be the same seat. All the different devices are abstracted behind a single pointer input stream (though there are APIs for cases where the difference between devices matters, such as the tablet API).
This confused me for a while, since wl_seat::get_pointer will return a new wl_pointer every time. If a seat effectively only has one virtual pointer, what does it mean to have multiple wl_pointer objects? The answer is that
- input events really belong to seats, not pointers
- any input event is 'broadcast' to all wl_pointers on that seat
That is, you can effectively think of a wl_pointer as a 'handle' to a seat that exposes pointer-specific requests and events. All wl_pointers for a specific seat will receive the same set of events at the same time, and if you want to broadcast an event to a pointer (or similar wrapper), you actually need to broadcast it to every pointer belonging to the same seat.
With that out of the way, here's the API I defined in include/sway/input/pointer_spy.h and sway/input/pointer_spy.c:
// in the .h
struct wp_pointer_spy_manager_v1 *
// in the .c
void
This is a fairly simple API: a constructor and a 'method' on the returned object. The constructor takes a wl_display *, which is effectively the 'root' of a running Wayland instance (the type for a monitor or other 'physical display' is called wl_output); the actual implementation is mostly just boring bookkeeping, registering handlers to properly destroy and free data structures when necessary, and so on.
The wp_pointer_spy_manager_v1_send_motion implementation is also very simple; the manager keeps track of all the spies that were created with it, and when we want to send a motion event, we send it to every spy with the given seat.
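A minimal sketch of that broadcast logic, using hypothetical stand-in types (seat, spy, spy_manager) rather than sway's real wlr_seat and wl_resource: every spy whose seat matches gets the event, which is the "send to every pointer on the seat" rule from the seat discussion. The real implementation walks a wl_list of spies; this sketch uses a fixed-size array to stay self-contained, and spy_send_motion stands in for the generated per-spy sender.

```c
#include <stddef.h>

/* Hypothetical stand-ins: a seat is just an identity here, and "sending"
 * an event to a spy just records it. */
struct seat {
	const char *name;
};

struct spy {
	struct seat *seat;      /* seat of the wl_pointer this spy watches */
	double last_x, last_y;
	int events;
};

struct spy_manager {
	struct spy *spies[16];
	size_t count;
};

/* Stand-in for the generated per-spy event sender. */
static void spy_send_motion(struct spy *spy, double x, double y)
{
	spy->last_x = x;
	spy->last_y = y;
	spy->events++;
}

/* Every wl_pointer on a seat sees the same events, so one motion on a seat
 * goes to every spy created for any pointer on that seat. */
void spy_manager_send_motion(struct spy_manager *m, struct seat *seat,
                             double x, double y)
{
	for (size_t i = 0; i < m->count; i++) {
		if (m->spies[i]->seat == seat)
			spy_send_motion(m->spies[i], x, y);
	}
}
```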
wl_list_for_each implementation details
If you don't care how wl_list_for_each works, you can skip this section; it's not at all necessary, and the semantics are obvious. But I thought this was cool.
The way the manager stores the list of spies uses a trick that I hadn't seen before in my limited C experience, but is pretty common in systems programming. A wl_list is a doubly linked list, but rather than each node storing a pointer to its value, the node is embedded inside the data structure itself.
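Here, the manager's wl_list acts as the list head: manager.spies.next points to the .spies member of the first spy in its list, and manager.spies.prev points to that of the last. The list is circular, so the last spy's .spies.next points back to the manager's head rather than to NULL, and an empty list is just a head whose next and prev both point to itself.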
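A sketch of the head/link layout, reimplementing the logic of libwayland's wl_list_init and wl_list_insert under made-up names (list_init, list_insert) so the pointers can be checked directly. The manager and spy structs are hypothetical reductions of the real ones, keeping only the embedded link.

```c
#include <stddef.h>

/* Same layout as libwayland's struct wl_list: just two link pointers. */
struct list {
	struct list *prev;
	struct list *next;
};

/* Like wl_list_init(): an empty list is a head that points at itself. */
void list_init(struct list *head)
{
	head->prev = head;
	head->next = head;
}

/* Like wl_list_insert(): link elm in directly after pos. Passing the head
 * prepends; passing head->prev appends. */
void list_insert(struct list *pos, struct list *elm)
{
	elm->prev = pos;
	elm->next = pos->next;
	pos->next = elm;
	elm->next->prev = elm;
}

/* Hypothetical structs: the link lives *inside* each spy, so no separate
 * node allocation is needed. */
struct manager {
	struct list spies;
};

struct spy {
	int id;
	struct list spies;
};
```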
But how do you actually get at the spy from its wl_list link? Pointer math!
Given a pointer to a member, the name of the member, and the type of the containing struct (here provided by using typeof on sample), you can use the offsetof macro to find the pointer to the enclosing structure: subtract the member's offset from the member's address. This is why wl_list_for_each takes three parameters: the name of the variable to store the iterated value into (which is also used to pass the type into wl_container_of), the list to iterate over, and the name of the list's member in the structure.
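Written out without the macro, the pointer math is a single subtraction. This sketch uses a hypothetical spy struct with its link placed after other fields, and a hypothetical spy_from_link helper, to show that stepping back by offsetof(struct spy, spies) lands on the start of the spy.

```c
#include <stddef.h>

struct list {
	struct list *prev, *next;
};

/* Hypothetical spy struct with the link embedded after other fields. */
struct spy {
	int id;
	double x, y;
	struct list spies;
};

/* Recover the spy from a pointer to its embedded link by stepping back the
 * link's offset within struct spy. The cast to char * makes the
 * subtraction count bytes. This is the arithmetic wl_container_of()
 * performs, minus the typeof trick for inferring the type. */
struct spy *spy_from_link(struct list *link)
{
	return (struct spy *)((char *)link - offsetof(struct spy, spies));
}
```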
This does mean that you need one wl_list in a structure for each list you want it to be a part of, but in practice this isn't a problem since most structs will only be part of one list anyway.
Oh, and wl_list_for_each is just a macro that expands into a fragment of a for-loop, which is why you can just put the loop body in braces afterwards.
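A self-contained sketch of the two macros, modelled on libwayland's wl_container_of and wl_list_for_each but renamed (container_of, list_for_each) so they can't be confused with the real ones. It relies on __typeof__, a GCC/Clang extension (C23 also has a standard typeof), and sum_ids is a hypothetical helper showing the loop body written in braces.

```c
#include <stddef.h>

struct list {
	struct list *prev, *next;
};

/* Modelled on wl_container_of(): __typeof__ pulls the struct type out of
 * `sample` without evaluating it, so `sample` may be uninitialised. */
#define container_of(ptr, sample, member) \
	((__typeof__(sample))((char *)(ptr) - \
	                      offsetof(__typeof__(*(sample)), member)))

/* Modelled on wl_list_for_each(): only the for(...) header, so the loop
 * body follows in braces. It stops when it wraps back round to the head. */
#define list_for_each(pos, head, member) \
	for (pos = container_of((head)->next, pos, member); \
	     &pos->member != (head); \
	     pos = container_of(pos->member.next, pos, member))

struct spy {
	int id;
	struct list spies;
};

/* Hypothetical helper: sum the ids of every spy on the list. */
int sum_ids(struct list *head)
{
	struct spy *spy;
	int total = 0;
	list_for_each(spy, head, spies) {
		total += spy->id;
	}
	return total;
}
```

The loop terminates because the list is circular: once pos->member is the head again, iteration stops, so no NULL check is needed.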
Now that we have it defined, we just need to initialize the manager on server startup (not shown, since it's pretty boring) and make sure to call wp_pointer_spy_manager_v1_send_motion in the right spot, and we're good to go!
// in sway/input/cursor.c
void
sway_cursor is sway's representation of the cursor (the graphic image shown on-screen and its current position); we get the corresponding wlr_seat *, the cursor's current x and y coordinates, and then send the events.
client code
We've done the server half, but now we need to actually be able to use it. Since this is a protocol I just defined, it's clearly not going to be part of smithay-client-toolkit or wayland-client, but we can use wayland-scanner ourselves:
// in src/protocols.rs
pub
Then we just need to set up an event handler:
We can test this easily without having to recompile and restart my running sway instance, since if you start sway from inside an existing instance you can get a second sway instance inside a window.
# running `systemctl --user import-environment` (like I do in my config file)
# will mess things up, so run with an empty config to avoid that
&
# run the demo client
WAYLAND_DISPLAY=wayland-2
And it works!
(Note that this is running sway-in-sway, unlike the video at the top.)
If you want to play around with it yourself, download a patched copy of the sway source code (or the raw patch, which applies against 1a3cfc50) and a demo program, and run them.