Creating a Wayland protocol so a cat can chase my cursor

A virtual cat running around a window, chasing a pointer. I'm not sure this is what "pointer chasing" is supposed to be.

This post assumes general familiarity with C (pointers, manual memory management, etc.), Rust, and the idea of a display server, but no Wayland-specific knowledge.

I'm not a Wayland expert by any means; this is just stuff I figured out by reading source code and documentation. Don't use any of this as a template.

Also, note that throughout this post, the word 'pointer' will generally refer to a specific kind of Wayland object, but occasionally to a C pointer. I'll try to disambiguate when necessary.

intro

I've been working on a "desktop toy" that lets a cat (or other things) hang out on your desktop in the style of the classic neko, partly to kill time and partly because I want to mess around with writing relatively low-level Wayland code. One of the features I wanted to implement was letting the cat chase your cursor like the original neko does (but across your entire screen, not just a specific window). But I ran into a problem: in Wayland, there's no way to monitor the pointer's motion across the entire screen without also 'grabbing' all pointer events, which makes everything else uninteractable. Even outside the 'standard' protocol, none of the extensions support it. So I was left with a few options:

1. Have a way to turn on 'chase mode', which grabs pointer events and makes the cat chase the cursor, and then turn it back off again, which stops the chase and lets you use your computer. This is what I did first, since it's simple.
2. Communicate outside of Wayland. sway already has a separate IPC mechanism for notifications that aren't part of the core Wayland protocol, which boils down to pushing JSON messages over a socket. This would require me to add another event stream to openbonzi (which right now only uses Wayland protocol events), and it wouldn't be fun.
3. Modify sway so that some clients always get pointer motion information, even if the pointer isn't over a surface they own. I considered this, but in my early implementations I noticed that it violated assumptions the codebase made about pointer motion and would send spurious surface enter/leave events. It would also merge 'normal' and 'global' pointer events into one stream, which is undesirable since an application might want to handle them separately.
4. Add an entirely new type of object to the Wayland protocol. This is guaranteed not to interfere with any existing code, and it means that client code can easily distinguish between global motion and 'normal' motion. This would also require the most ~~work~~ fun.

Given that this blog post exists, I clearly chose option 4.

defining the protocol

Wayland isn't a specific piece of software; instead, it's a protocol that lets clients (i.e. applications) ask the display server to do things, and lets the display server notify those clients when something happens (e.g. the mouse moved). The IPC design is fairly typical: a simple binary format over a Unix socket (along with passing file descriptors for things like drag-and-drop). The protocol is described by XML files such as xdg-shell.xml, which various programs use to generate bindings for a specific language or library. These XML files and the associated generated code are also known as 'protocols', so the statement "sway supports the xdg_activation_v1 protocol" means that sway exposes the objects clients need in order to use the functionality described by that file. You can browse both the core protocol and all the common extensions (and most of the uncommon ones) on the Wayland Explorer, which renders the XML files as nice HTML.

The upshot of this is that display servers that support Wayland will all implement the core 'basic' protocols, but various extensions may or may not be supported. This means that not all clients will work on all display servers, but it also means that it's possible to create new functionality that only certain clients need. Which is exactly what we want.

For more information, see the Wayland Book; some of the APIs have changed somewhat, but it's still a very good conceptual reference.

Before we can write any server code, we have to write the XML protocol definition file. I started by thinking about how the client should use the protocol. In the Wayland model, when declaring an interface (think 'type'), you can also declare the kinds of events it can receive. This is how the server notifies clients of things that are happening: a moving pointer generates motion events on a wl_pointer object, changing the resolution of a display generates mode events on a wl_output, and so on. So it makes sense for us to define a new kind of object that receives events whenever the pointer moves, regardless of where it is.

I originally called these objects 'pointer monitors', but 'monitor' looks too close to 'manager' (a word we'll see soon), so I renamed them to 'pointer spies'. Each pointer spy monitors exactly one pointer and should receive events whenever the corresponding pointer moves. The corresponding Wayland XML description looks like so:

<interface name="wp_pointer_spy_v1" version="1">
  <description summary="monitors a pointer's position">
    While this exists, it will receive events every time the pointer moves.
  </description>

  <event name="motion">
    <description summary="pointer motion">
      Sent whenever the pointer moves.
    </description>

    <arg name="time" type="uint" summary="timestamp with millisecond granularity" />
    <arg name="x" type="fixed" summary="x coordinate" />
    <arg name="y" type="fixed" summary="y coordinate" />
  </event>

  <request name="destroy" type="destructor">
    <description summary="destroy the pointer_spy object">
      Destroys the pointer monitor object.
    </description>
  </request>
</interface>

Note the wp_ prefix, indicating that this is a 'general-purpose' protocol, and the _v1 suffix, used to allow for backwards-incompatible future changes; both of these are specified by the naming convention. We can also request that the spy destroy itself if we no longer need it; this isn't strictly necessary for my use case, but is good practice.

We also need a way to actually get one of these pointer spies; every request the client sends has to be part of an interface implemented by some object. The typical solution to this is to use a "manager" object that has a get_foo request that constructs a new instance of the object. While there's nothing stopping you from implementing other methods on the manager, in our case we only need to be able to get pointer spies:

<interface name="wp_pointer_spy_manager_v1" version="1">
  <description summary="monitor a pointer's position across an output">
    This interface allows clients to know the position of the pointer at all
    times. This is intended for 'desktop toy'-type applications. This is
    purely an experiment and isn't intended for serious usage.
  </description>

  <request name="get_pointer_spy">
    <description summary="start monitoring a pointer">
      Create a new pointer monitor object.
    </description>
    <arg name="id" type="new_id" interface="wp_pointer_spy_v1"/>
    <arg name="pointer" type="object" interface="wl_pointer"/>
  </request>
</interface>

One thing to note is that, instead of returning the ID of the generated object, the client sends the ID as part of the creation request (which is safe since object IDs are per-client). Setting the type to new_id tells code generators that this is effectively a 'constructor' method and so it should return some representation of the pointer spy object.
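To make the "client picks the ID" idea concrete, here is a minimal sketch of what a get_pointer_spy request could look like on the wire. It assumes the standard Wayland message layout: a 32-bit ID of the object the request is sent to, then a 32-bit word holding the total message size in bytes (upper 16 bits) and the opcode (lower 16 bits), followed by the arguments as 32-bit words. It also assumes a simple counting allocator: ID 1 always belongs to wl_display, clients allocate from the range up to 0xFEFFFFFF, and IDs from 0xFF000000 up are reserved for server-created objects. The id_allocator type and encode_get_pointer_spy function are hypothetical names for illustration; in real code the generated client stub and libwayland do all of this for you (and libwayland also reuses IDs of destroyed objects, which this sketch doesn't).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-side ID counter. ID 1 is always wl_display, and
 * client-allocated IDs must stay at or below 0xFEFFFFFF. */
struct id_allocator {
	uint32_t next;
};

static void id_allocator_init(struct id_allocator *alloc)
{
	alloc->next = 2; /* 1 is already taken by wl_display */
}

/* Returns a fresh ID, or 0 (never a valid object ID) if the range is used up.
 * Unlike libwayland, this never reuses IDs of destroyed objects. */
static uint32_t id_allocator_new(struct id_allocator *alloc)
{
	if (alloc->next > 0xFEFFFFFFu)
		return 0;
	return alloc->next++;
}

/* get_pointer_spy is the first (and only) request of
 * wp_pointer_spy_manager_v1 in the XML above, so its opcode is 0. */
#define GET_POINTER_SPY_OPCODE 0u

/* Fills buf with the encoded request and returns its size in bytes. */
static size_t encode_get_pointer_spy(uint32_t buf[4], uint32_t manager_id,
				     uint32_t new_spy_id, uint32_t pointer_id)
{
	const uint32_t size = 4 * sizeof(uint32_t); /* 8-byte header + 2 args */

	assert(new_spy_id != 0);
	buf[0] = manager_id;                            /* object receiving the request */
	buf[1] = (size << 16) | GET_POINTER_SPY_OPCODE; /* size and opcode share a word */
	buf[2] = new_spy_id; /* new_id argument: picked by the client, not the server */
	buf[3] = pointer_id; /* object argument: the wl_pointer to spy on */
	return size;
}
```

The server never has to tell the client which ID it used; it just records that, for this client, the spy now lives at the ID the client sent.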

But this just pushes the question one level up: how do we get a manager object? The answer is that Wayland has the idea of a "global object": server code can register an object as a global, and when you connect to a Wayland server, you can get access to a wl_registry object that receives an event for each global object, specifying the interface it implements, its version, and an arbitrary numeric "name". So our client code will look for a global implementing the wp_pointer_spy_manager_v1 interface and then bind its name (which allocates an ID number on the wire protocol).
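The lookup itself is simple: collect the global events as they arrive and pick the one whose interface string matches. Below is a hedged sketch in which a hypothetical advertised_global array stands in for the received events; in a real C client the matching would happen inside the wl_registry listener, and the matching name would then be passed to wl_registry_bind. (In the Rust client later in this post, globals.bind does this matching for us.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the arguments of one wl_registry.global event. */
struct advertised_global {
	uint32_t name;         /* opaque numeric name chosen by the server */
	const char *interface; /* e.g. "wp_pointer_spy_manager_v1" */
	uint32_t version;      /* highest version the server supports */
};

/* Looks for a global implementing `interface` at version >= min_version.
 * On success, stores its name in *name_out (the value a real client would
 * pass to wl_registry_bind) and returns true. */
static bool find_global(const struct advertised_global *globals, size_t count,
			const char *interface, uint32_t min_version,
			uint32_t *name_out)
{
	assert(name_out != NULL);
	for (size_t i = 0; i < count; i++) {
		if (strcmp(globals[i].interface, interface) == 0 &&
		    globals[i].version >= min_version) {
			*name_out = globals[i].name;
			return true;
		}
	}
	return false;
}
```

Checking the version matters because a client must not bind a higher version than the server advertises.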

Overall, the client binds the wp_pointer_spy_manager_v1 global through the registry, sends get_pointer_spy with a wl_pointer (acquiring the wl_pointer is omitted here), and then receives motion events on the resulting wp_pointer_spy_v1.

And that's it! We just need to wrap those two interfaces in a <protocol> tag and do a bit of other boring throat-clearing, and we're done writing the protocol itself. If you want examples of other similar protocols, the relative pointer protocol has a very similar API (and is what I based the pointer spy protocol on).

server implementation

wayland-scanner can't generate the implementation itself, only declarations and stubs for us to build on. The next step is to actually add support for the protocol to sway, which has three 'layers':

libwayland implements serializing and deserializing messages for the wire protocol. It's pretty low-level code. Many Wayland compositors and clients are going to want to use it (through language-specific FFI bindings), although there's nothing stopping you from writing your own. In C, its types start with wl_.
wlroots, whose README describes it as "pluggable, composable, unopinionated modules for building a Wayland compositor", provides abstractions over reading from input hardware, implementations for many interfaces, and support for Xwayland (running X11 applications on Wayland).
sway is the compositor itself, and consists of code for things like arranging windows on screen and higher-level functionality.

The C types these codebases implement are all namespaced in the obvious way: wl_foo for libwayland, wlr_foo for wlroots, and sway_foo for sway. We're mostly going to be working with libwayland and wlroots types here, and it's important to keep in mind that wl_foo types generally correspond to objects in the protocol, whereas sway_foo and wlr_foo types correspond to actual physical state. This will come up in a bit.

If I were going to actually try to upstream support for this, I'd want to add it to wlroots; protocol implementation is too high-level for libwayland, and there's no reason that this has to be sway-specific. But I don't have a burning desire to do so, and it's simpler to only have to patch sway than it would be to have to patch wlroots and sway.

generated code

First, let's look at the generated code from the above. In this case, wayland-scanner generates a .h and .c file for the server and a .h for the client (since the client code is just types and inline functions, both of which can be put in headers). We don't care about client code on the server, and we don't need to know the implementation of the server code, so we can just look at the .h, which the build system put in build/protocol/wp-pointer-spy-v1-protocol.h. Here it is, with extraneous things removed:

extern const struct wl_interface wp_pointer_spy_manager_v1_interface;
extern const struct wl_interface wp_pointer_spy_v1_interface;

The wl_interface type contains data about the interface such as its name, the signature of its requests, and so on. This is used for serialization/deserialization machinery, and additionally allows us to check that a wl_resource object actually corresponds to the right kind of object (since you can ask a resource for the interface it implements).

struct wp_pointer_spy_manager_v1_interface {
	void (*get_pointer_spy)(struct wl_client *client,
				struct wl_resource *resource,
				uint32_t id,
				struct wl_resource *pointer);
};

struct wp_pointer_spy_v1_interface {
	void (*destroy)(struct wl_client *client,
			struct wl_resource *resource);
};

These structs are used to define implementations of the interface, with each request having a corresponding function pointer. C allows you to have values and struct types with the same name (struct tags live in a separate namespace), so this can get a bit confusing, but you can easily disambiguate them by looking for the keyword struct immediately beforehand. If it helps, you can think of these as being called wp_pointer_spy_v1_implementation and wp_pointer_spy_manager_v1_implementation.
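Here is a small self-contained sketch of both ideas: a variable and a struct tag sharing one name, and an implementation struct whose function pointers are picked by opcode. All the names (interface_meta, demo_interface, demo_impl, dispatch) are hypothetical. libwayland's real dispatcher is generic (it decodes arguments using the signatures stored in the wl_interface), but the core idea of selecting a handler by the request's position in the XML is the same.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct wl_interface: metadata describing an interface. */
struct interface_meta {
	const char *name;
	int request_count;
};

/* Stand-in for a generated implementation struct: one function pointer
 * per request, in XML order, so a request's opcode is its position. */
struct demo_interface {
	void (*get_thing)(uint32_t new_id);
	void (*destroy)(void);
};

/* Same identifier as the struct tag above. This is legal because struct
 * tags and ordinary identifiers live in separate namespaces. */
static const struct interface_meta demo_interface = { "demo", 2 };

static uint32_t last_new_id;
static int destroyed;

static void handle_get_thing(uint32_t new_id) { last_new_id = new_id; }
static void handle_destroy(void) { destroyed = 1; }

/* What a compositor fills in: its handler for each request. */
static const struct demo_interface demo_impl = {
	.get_thing = handle_get_thing,
	.destroy = handle_destroy,
};

/* Route an incoming request to the right handler based on its opcode. */
static void dispatch(const struct demo_interface *impl, uint32_t opcode,
		     uint32_t arg)
{
	switch (opcode) {
	case 0:
		impl->get_thing(arg);
		break;
	case 1:
		impl->destroy();
		break;
	default:
		assert(!"unknown opcode");
	}
}
```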

#define WP_POINTER_SPY_V1_MOTION 0

static inline void
wp_pointer_spy_v1_send_motion(struct wl_resource *resource_, uint32_t time, wl_fixed_t x, wl_fixed_t y)
{
	wl_resource_post_event(resource_, WP_POINTER_SPY_V1_MOTION, time, x, y);
}

This is just a simple function to actually send the event. We get one of these per event, and it exists to make the API slightly nicer.
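Note that x and y are wl_fixed_t rather than double: the wire format only carries 32-bit words, and wl_fixed_t is a signed 24.8 fixed-point number (a sign bit, 23 bits of integer part, 8 bits of fraction). Converting is essentially multiplying or dividing by 256. The sketch below uses hypothetical names rather than libwayland's wl_fixed_from_double and wl_fixed_to_double, whose exact rounding may differ; this version truncates toward zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for wl_fixed_t: a signed 24.8 fixed-point value. */
typedef int32_t fixed_24_8;

static fixed_24_8 fixed_from_double(double d)
{
	/* The value times 256 must fit in 32 bits. */
	assert(d >= -8388608.0 && d < 8388608.0);
	return (fixed_24_8)(d * 256.0); /* truncates toward zero */
}

static double fixed_to_double(fixed_24_8 f)
{
	return f / 256.0;
}

static fixed_24_8 fixed_from_int(int i)
{
	return (fixed_24_8)(i * 256);
}
```

The practical upshot is that pointer coordinates are only precise to 1/256 of a unit, which is far finer than any cat needs.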

server code

The generated code is purely for working with the protocol; it doesn't help us write the actual implementation (and it can't, since libwayland isn't tied to any particular implementation strategy). But before getting to the implementation, I need to explain what a "seat" is, since it affects a subtle part of the design.

Seats are a Wayland abstraction that ties together keyboard, mouse, and touchscreen input, analogous to a single user sitting at the computer and using it. Most setups will only have one seat: even if the user has multiple keyboards, or a mouse and a drawing tablet, they'll still belong to the same seat. All the pointing devices are abstracted behind a single pointer input stream (though there are APIs for cases where the difference between devices matters, such as the tablet API).

This confused me for a while, since wl_seat::get_pointer will return a new wl_pointer every time. If a seat effectively only has one virtual pointer, what does it mean to have multiple wl_pointer objects? The answer is that

input events really belong to seats, not pointers
any input event is 'broadcast' to all wl_pointers on that seat

That is, you can effectively think of a wl_pointer as a 'handle' to a seat that exposes pointer-specific requests and events. All wl_pointers for a specific seat will receive the same set of events at the same time, so if you want to send an event 'to a pointer' (or to a similar wrapper object), you actually need to send it to every such object belonging to the same seat.

With that out of the way, here's the API I defined in include/sway/input/pointer_spy.h and sway/input/pointer_spy.c:

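// in the .h
#include <stdint.h>
#include <wayland-server-core.h>

struct wlr_seat;

struct wp_pointer_spy_v1 {
  struct wl_resource *resource;
  struct wlr_seat *seat;
  struct wl_list link; // entry in wp_pointer_spy_manager_v1.spies
  // other irrelevant fields omitted...
};

struct wp_pointer_spy_manager_v1 {
  struct wl_list spies; // list of wp_pointer_spy_v1.link
  // other irrelevant fields omitted
};

// implementation omitted because it's pretty boring
struct wp_pointer_spy_manager_v1 *
wp_pointer_spy_manager_v1_create(struct wl_display *display);

void wp_pointer_spy_manager_v1_send_motion(
    struct wp_pointer_spy_manager_v1 *manager, struct wlr_seat *seat,
    uint32_t time, double x, double y);

// in the .c
#include "sway/input/pointer_spy.h"
#include "wp-pointer-spy-v1-protocol.h" // generated; declares wp_pointer_spy_v1_send_motion

void wp_pointer_spy_manager_v1_send_motion(
    struct wp_pointer_spy_manager_v1 *manager, struct wlr_seat *seat,
    uint32_t time, double x, double y) {
  struct wp_pointer_spy_v1 *spy;
  // every spy created by this manager, regardless of seat
  wl_list_for_each(spy, &manager->spies, link) {
    if (spy->seat != seat) {
      continue; // the motion happened on a different seat
    }
    wp_pointer_spy_v1_send_motion(spy->resource, time, wl_fixed_from_double(x),
                                  wl_fixed_from_double(y));
  }
}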

This is a fairly simple API: a constructor and a 'method' on the returned object. The constructor takes a wl_display *, which is effectively the 'root' of a running Wayland instance (the type for a monitor or other 'physical display' is called wl_output); the actual implementation is mostly boring bookkeeping: registering handlers to properly destroy and free data structures when necessary, and so on.

The wp_pointer_spy_manager_v1_send_motion implementation is also very simple; the manager keeps track of all the spies that were created with it, and when we want to send a motion event, we send it to every spy with the given seat.

wl_list_for_each implementation details

If you don't care how wl_list_for_each works, you can skip this section; it's not at all necessary, and the semantics are obvious. But I thought this was cool.

The way the manager stores its list of spies uses a trick I hadn't seen before in my limited C experience, but which is pretty common in systems programming: an intrusive linked list. A wl_list is a doubly linked list, but rather than each node storing a pointer to its value, the node is embedded inside the data structure itself:

struct wl_list {
  struct wl_list* prev;
  struct wl_list* next;
};

struct wp_pointer_spy_manager_v1 {
  struct wl_global *global;
  struct wl_list spies; // list head
  // rest omitted
};

struct wp_pointer_spy_v1 {
  struct wl_resource *resource;
  struct wlr_seat *seat;
  struct wl_list link; // node embedded in each spy
  // rest omitted
};

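Here, the list is circular and the manager's wl_list acts as a sentinel head: manager.spies.next points to the link member of the first spy in the list, and manager.spies.prev points to the link member of the last one. When the list is empty, both point back to manager.spies itself rather than being NULL, which is why wl_list_for_each stops when it arrives back at the head.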

But how do you actually get at the spy from its wl_list link? Pointer math!

#define wl_container_of(ptr, sample, member)                            \
        (WL_TYPEOF(sample))((char *)(ptr) -                             \
                             offsetof(WL_TYPEOF(*sample), member))

Given a pointer to a member, the name of that member, and the type of the containing struct (here obtained by applying typeof to sample), you can subtract the member's offset (computed with the offsetof macro) from the member pointer to get a pointer to the containing struct. This is why wl_list_for_each takes three parameters: you need the name of the variable to store the iterated value into (which is also used to pass the type into wl_container_of), the list to iterate over, and the name of the list's member in the structure.

This does mean that you need one wl_list in a structure for each list you want it to be a part of, but in practice this isn't a problem since most structs will only be part of one list anyway.

Oh, and wl_list_for_each is just a macro that expands into a fragment of a for-loop, which is why you can just put the loop body in braces afterwards:

#define wl_list_for_each(pos, head, member)				\
	for (pos = wl_container_of((head)->next, pos, member);	\
	     &pos->member != (head);					\
	     pos = wl_container_of(pos->member.next, pos, member))
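To see all the pieces working together, here is a self-contained sketch of a circular intrusive list in the style of wl_list, plus a loop that mirrors wp_pointer_spy_manager_v1_send_motion. Everything here is a hypothetical stand-in: list_node, list_append, and list_for_each replace wl_list, wl_list_insert, and wl_list_for_each, and the int seat and motion_events fields replace struct wlr_seat * and the real send call. Two differences from libwayland: list_for_each takes the element type explicitly instead of using typeof, and list_append inserts at the tail (just before the head), whereas wl_list_insert(list, elm) inserts elm right after list.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
	struct list_node *prev;
	struct list_node *next;
};

/* An empty list is a head whose prev and next point at itself. */
static void list_init(struct list_node *head)
{
	head->prev = head;
	head->next = head;
}

/* Insert elm at the end of the list, i.e. just before the head. */
static void list_append(struct list_node *head, struct list_node *elm)
{
	assert(head != NULL && elm != NULL);
	elm->prev = head->prev;
	elm->next = head;
	head->prev->next = elm;
	head->prev = elm;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as wl_list_for_each, but with the element type passed in. */
#define list_for_each(pos, type, head, member)               \
	for (pos = container_of((head)->next, type, member); \
	     &pos->member != (head);                         \
	     pos = container_of(pos->member.next, type, member))

/* Hypothetical stand-ins for the sway structs. */
struct spy {
	int seat;          /* stands in for struct wlr_seat * */
	int motion_events; /* counts calls instead of sending events */
	struct list_node link;
};

struct spy_manager {
	struct list_node spies;
};

/* Model of wp_pointer_spy_manager_v1_send_motion: only spies on the
 * matching seat get the event. */
static void manager_send_motion(struct spy_manager *manager, int seat)
{
	struct spy *spy;
	list_for_each(spy, struct spy, &manager->spies, link) {
		if (spy->seat != seat)
			continue;
		spy->motion_events++;
	}
}
```

Because the head is a sentinel rather than NULL, an empty list needs no special case: the first candidate is the head itself, so the loop condition fails immediately.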

Now that we have it defined, we just need to initialize the manager on server startup (not shown, since it's pretty boring) and make sure to call wp_pointer_spy_manager_v1_send_motion in the right spot, and we're good to go!

// in sway/input/cursor.c
void pointer_motion(struct sway_cursor *cursor, uint32_t time_msec,
                struct wlr_input_device *device, double dx, double dy,
                double dx_unaccel, double dy_unaccel) {
        // ...rest is unchanged
        wp_pointer_spy_manager_v1_send_motion(
                server.pointer_spy_manager, cursor->seat->wlr_seat,
                time_msec, cursor->cursor->x, cursor->cursor->y);
        // ...rest is unchanged
}

sway_cursor is sway's representation of the cursor (the image shown on-screen and its current position); we get the corresponding wlr_seat * and the cursor's current x and y coordinates (in layout coordinates, which span all outputs), and then send the events.

client code

We've done the server half, but now we need to actually be able to use it. Since this is a protocol I just defined, it's clearly not going to be part of smithay-client-toolkit or wayland-client, but we can use wayland-scanner ourselves:

// in src/protocols.rs
#![allow(clippy::all, non_upper_case_globals, non_camel_case_types)]
pub mod pointer_spy {
    use wayland_client;
    use wayland_client::protocol::*;

    pub mod __interfaces {
        use wayland_client::protocol::__interfaces::*;
        wayland_scanner::generate_interfaces!("protocols/wp-pointer-spy-v1.xml");
    }
    use self::__interfaces::*;

    wayland_scanner::generate_client_code!("protocols/wp-pointer-spy-v1.xml");
}

Then we just need to set up an event handler:

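use smithay_client_toolkit::reexports::calloop::EventLoop;
use smithay_client_toolkit::reexports::calloop_wayland_source::WaylandSource;
use smithay_client_toolkit::seat::SeatState;
use wayland_client::{
    delegate_noop, globals::registry_queue_init, protocol::wl_pointer::WlPointer, Connection,
    Dispatch, Proxy, QueueHandle,
};

use crate::protocols::pointer_spy::{
    wp_pointer_spy_manager_v1::WpPointerSpyManagerV1, wp_pointer_spy_v1::WpPointerSpyV1,
};

struct AppState {
    seat_state: SeatState,
    spy_manager: WpPointerSpyManagerV1,
    pointer: Option<WlPointer>,
    pointer_spy: Option<WpPointerSpyV1>,
}

fn main() {
    let mut event_loop: EventLoop<AppState> = EventLoop::try_new().unwrap();
    let conn = Connection::connect_to_env().unwrap();
    let (globals, event_queue) = registry_queue_init(&conn).unwrap();
    let queue_handle = event_queue.handle();
    WaylandSource::new(conn.clone(), event_queue)
        .insert(event_loop.handle())
        .unwrap();
    let mut state = AppState {
        spy_manager: globals.bind(&queue_handle, 1..=1, ()).unwrap(),
        seat_state: SeatState::new(&globals, &queue_handle),
        pointer: None,
        pointer_spy: None,
    };
    loop {
        event_loop.dispatch(None, &mut state).unwrap();
    }
}

// The manager has no events, so it doesn't need a real handler.
delegate_noop!(AppState: WpPointerSpyManagerV1);

// The SeatHandler impl and the registry/seat delegate boilerplate aren't
// shown here; the seat handler is where a wl_pointer gets passed to
// spy_manager.get_pointer_spy().

impl Dispatch<WpPointerSpyV1, ()> for AppState {
    fn event(
        _state: &mut Self,
        _proxy: &WpPointerSpyV1,
        event: <WpPointerSpyV1 as Proxy>::Event,
        _data: &(),
        _conn: &Connection,
        _qhandle: &QueueHandle<Self>,
    ) {
        // Log every motion event (time, x, y).
        eprintln!("{event:?}");
    }
}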

We can test this without having to recompile and restart my running sway instance: if you start sway from inside an existing sway session, it runs as a second, nested instance inside a window.

# running `systemctl --user import-environment` (like I do in my config file)
# will mess things up, so run with an empty config to avoid that
sway --config /dev/null &
# run the demo client against the nested instance
WAYLAND_DISPLAY=wayland-2 cargo run

And it works!

(Note that this is running sway-in-sway, unlike the video at the top.)

If you want to play around with it yourself, download a patched copy of the sway source code (or the raw patch which applies against 1a3cfc50) and a demo program and run them.