Why did we choose Clay for our desktop app?

Out of all the ways to write cross-platform apps, why did we choose Clay for our desktop app? Let’s start by looking at which features the ViaAudio desktop app actually needs.

The core feature of ViaAudio is streaming audio wirelessly from a PC to a phone. So, intuitively, the first thing the app needs is a way to record audio on the PC. Then the audio has to be sent over the network, so the app needs access to socket APIs. Finally, it needs a UI that lets the user enter the IP address to stream to and pick the sound card to capture from.

Why not Electron, Flutter, etc.

The main reason we went with C and Clay instead of something like Electron, React Native, or Flutter is simple: ViaAudio needs low-level access to system APIs. For audio capture specifically, we use miniaudio, which is written in C. miniaudio provides cross-platform access to low-level audio APIs like PulseAudio on Linux and WASAPI on Windows, and it hides all of that complexity behind a clean C API. It’s hard to get this level of system access from Electron or Flutter without writing some native binding code anyway.

And honestly, we don’t need a fancy UI. The whole interface is just a start/stop button, a text input for the IP address, and a list of audio devices to choose from. That’s it. No animations, no complex state management, no navigation stack. Using React Native or Flutter for this would be massive overkill.

What Clay actually does

So what is Clay? Clay is a layout library. You define your UI layout in a way similar to flexbox, and Clay translates that into a list of draw commands.
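To make the flexbox comparison concrete, here is a heavily simplified, hypothetical sketch of what a layout pass does for a single left-to-right row: given a container, a padding, and a gap between children, it computes a bounding box for each child. The types and the `layout_row` function are made up for illustration; Clay’s real layout engine also handles sizing rules, alignment, text wrapping, and nested containers, and it outputs full render commands rather than bare rectangles.

```c
#include <stddef.h>

/* Simplified stand-ins for illustration only; these are NOT Clay's types. */
typedef struct { float x, y, width, height; } Rect;

typedef struct {
    float padding;   /* space between the container edge and its children */
    float child_gap; /* space between adjacent children */
} RowLayout;

/* Lay out `count` fixed-size children left to right inside `container`,
 * like a flexbox row with padding and a gap. Writes one bounding box per
 * child into `out` (a stand-in for a list of draw commands). */
void layout_row(Rect container, RowLayout cfg,
                const float *widths, const float *heights,
                size_t count, Rect *out)
{
    float cursor_x = container.x + cfg.padding;
    for (size_t i = 0; i < count; i++) {
        out[i].x = cursor_x;
        out[i].y = container.y + cfg.padding;
        out[i].width = widths[i];
        out[i].height = heights[i];
        /* advance past this child plus the gap before the next one */
        cursor_x += widths[i] + cfg.child_gap;
    }
}
```

With a padding of 8 and a gap of 4, children of width 100, 60, and 40 land at x = 8, 112, and 176.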

Here are the render command types that Clay produces:

CLAY_RENDER_COMMAND_TYPE_RECTANGLE - A rectangle should be drawn, configured with .config.rectangleElementConfig
CLAY_RENDER_COMMAND_TYPE_BORDER - A border should be drawn, configured with .config.borderElementConfig
CLAY_RENDER_COMMAND_TYPE_TEXT - Text should be drawn, configured with .config.textElementConfig
CLAY_RENDER_COMMAND_TYPE_IMAGE - An image should be drawn, configured with .config.imageElementConfig
CLAY_RENDER_COMMAND_TYPE_SCISSOR_START - Named after glScissor, this indicates that the renderer should begin culling any subsequent pixels that are drawn outside the .boundingBox of this render command.
CLAY_RENDER_COMMAND_TYPE_SCISSOR_END - Only ever appears after a matching CLAY_RENDER_COMMAND_TYPE_SCISSOR_START command, and indicates that the scissor has ended.
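Scissoring is essentially a rectangle intersection: anything outside the scissor box is discarded. A GPU renderer would normally let the hardware do this (sokol_gl, for instance, exposes `sgl_scissor_rect`), but the hypothetical helper below shows the same idea on the CPU for a single element’s bounding box. `Box` and `clip_to_scissor` are illustrative names, not part of Clay.

```c
#include <stdbool.h>

typedef struct { float x, y, width, height; } Box;

/* Intersect an element's bounding box with the active scissor box.
 * Returns false when nothing is left to draw (the element is fully culled);
 * otherwise writes the visible part into *out. */
bool clip_to_scissor(Box scissor, Box in, Box *out)
{
    float left   = in.x > scissor.x ? in.x : scissor.x;
    float top    = in.y > scissor.y ? in.y : scissor.y;
    float right  = (in.x + in.width) < (scissor.x + scissor.width)
                       ? in.x + in.width : scissor.x + scissor.width;
    float bottom = (in.y + in.height) < (scissor.y + scissor.height)
                       ? in.y + in.height : scissor.y + scissor.height;
    if (right <= left || bottom <= top)
        return false;
    out->x = left;
    out->y = top;
    out->width = right - left;
    out->height = bottom - top;
    return true;
}
```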

Clay only produces lists of render commands. To actually put the UI on screen, we need a renderer, and we chose sokol for that. sokol is a set of cross-platform libraries: sokol_app sets up the window and the OpenGL context, and the sokol renderer for Clay takes Clay’s render commands and uses sokol_gl to draw the UI elements on screen. It’s a pretty clean separation of concerns: Clay handles the “what and where”, sokol handles the “how”.
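The renderer’s job boils down to walking the command list once per frame and switching on each command’s type. The sketch below is a simplified, hypothetical version of that loop: `Command`, `CommandType`, and `Backend` are stand-ins invented for this example (Clay’s real `Clay_RenderCommand` carries colors, text, and per-type config), and the counters take the place of actual sokol_gl drawing calls so that the example compiles on its own.

```c
#include <stddef.h>

/* Simplified mirror of the command types listed above. */
typedef enum {
    CMD_RECTANGLE,
    CMD_BORDER,
    CMD_TEXT,
    CMD_IMAGE,
    CMD_SCISSOR_START,
    CMD_SCISSOR_END
} CommandType;

typedef struct { float x, y, width, height; } BBox;

typedef struct {
    CommandType type;
    BBox box;       /* where the element goes, computed by the layout pass */
} Command;

/* Counters stand in for real drawing so the example stays self-contained. */
typedef struct {
    int rects, borders, texts, images;
    int scissor_depth;   /* how many scissor regions are currently open */
} Backend;

void render_commands(Backend *b, const Command *cmds, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        switch (cmds[i].type) {
        case CMD_RECTANGLE:     b->rects++;         break; /* fill a quad */
        case CMD_BORDER:        b->borders++;       break; /* draw an outline */
        case CMD_TEXT:          b->texts++;         break; /* draw glyphs */
        case CMD_IMAGE:         b->images++;        break; /* textured quad */
        case CMD_SCISSOR_START: b->scissor_depth++; break; /* set clip rect */
        case CMD_SCISSOR_END:   b->scissor_depth--; break; /* reset clip */
        }
    }
}
```

Keeping the backend behind one switch like this is what lets a different renderer be swapped in without touching the layout code.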

The beauty of single-header C libraries

Here’s where things get really nice: Clay, sokol, and miniaudio are all single-header libraries. That means no complicated build system, no CMake hell, and no figuring out how to link against different .so files. On Linux, the entire build command for this project is just gcc main.c -lm -lGL -lXi -lX11 -lXcursor. That’s it: one command, one source file, done.
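Single-header libraries usually follow the same pattern: every file can include the header to get declarations, and exactly one file defines an opt-in macro before including it to compile the implementation (miniaudio uses MINIAUDIO_IMPLEMENTATION, Clay uses CLAY_IMPLEMENTATION). The sketch below shows the pattern with a made-up library, `tinylib`, whose "header" is inlined so the example compiles on its own.

```c
/* In a real project the section between the markers would live in tinylib.h,
 * and exactly one .c file would do:
 *     #define TINYLIB_IMPLEMENTATION
 *     #include "tinylib.h"
 */
#define TINYLIB_IMPLEMENTATION

/* ---- tinylib.h ---- */
#ifndef TINYLIB_H
#define TINYLIB_H
int tinylib_add(int a, int b);   /* declaration: visible to every includer */
#endif

#ifdef TINYLIB_IMPLEMENTATION
int tinylib_add(int a, int b)     /* definition: compiled in one file only */
{
    return a + b;
}
#endif
/* ---- end tinylib.h ---- */
```

Because the implementation is compiled as part of main.c, there is nothing extra to build or link for the library itself; only system libraries like -lGL remain on the command line.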

The platform-specific parts

Of course, nothing is completely cross-platform. Still, we can reuse most of the code: the entire UI, all of the audio capture logic, and most of the networking code work identically on Windows and Linux.

But some things are inevitably platform-specific. Getting the config file path is different on each platform (on Linux it’s ~/.config/viaaudio, on Windows it’s the roaming AppData directory). The socket API is also slightly different: on Windows you need to initialize Winsock with WSAStartup() and use closesocket(), while on Linux you just use close(). These are minor annoyances, and they’re confined to small #ifdef _WIN32 blocks scattered throughout the code.
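One way to keep those #ifdef _WIN32 blocks small is to wrap each platform difference in a single helper, as in this minimal sketch. The function names (`config_dir`, `net_init`, `net_close`, `net_shutdown`) are hypothetical, and the sketch assumes the config directory is derived from the `HOME` environment variable on Linux and `APPDATA` (the roaming AppData directory) on Windows; a production version might also honor `XDG_CONFIG_HOME`. Windows builds also need to link against ws2_32.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_t;
#else
#include <unistd.h>
typedef int socket_t;
#endif

/* Write the config directory path into buf. Returns 0 on success, -1 if the
 * base directory is unknown or the buffer is too small. */
int config_dir(char *buf, size_t len)
{
#ifdef _WIN32
    const char *base = getenv("APPDATA");   /* roaming AppData directory */
    if (base == NULL || strlen(base) == 0) return -1;
    int n = snprintf(buf, len, "%s\\viaaudio", base);
#else
    const char *base = getenv("HOME");
    if (base == NULL || strlen(base) == 0) return -1;
    int n = snprintf(buf, len, "%s/.config/viaaudio", base);
#endif
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Winsock must be initialized before any socket call; POSIX needs nothing. */
int net_init(void)
{
#ifdef _WIN32
    WSADATA wsa;
    return WSAStartup(MAKEWORD(2, 2), &wsa) == 0 ? 0 : -1;
#else
    return 0;
#endif
}

void net_close(socket_t s)
{
#ifdef _WIN32
    closesocket(s);
#else
    close(s);
#endif
}

void net_shutdown(void)
{
#ifdef _WIN32
    WSACleanup();   /* pairs with WSAStartup */
#endif
}
```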

The downside: no components provided

Now for the biggest pain point with Clay: it provides absolutely no UI components. It gives you flexboxes and text rendering, and that’s it. Want a button? Build it yourself with a colored rectangle and some hover detection. Want a text input? Well, good luck writing your own.
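Here is a minimal, hypothetical sketch of the “colored rectangle plus hover detection” approach in immediate-mode style: each frame, the button checks whether the mouse is inside its bounding box, picks a fill color accordingly, and reports a click when the mouse button is released over it. The `Input` struct, the colors, and the function names are assumptions; in a real app, the mouse state would come from the windowing library’s events and the fill color would go into the rectangle’s config.

```c
#include <stdbool.h>

/* Hypothetical per-frame mouse snapshot. */
typedef struct {
    float mouse_x, mouse_y;
    bool mouse_down;       /* button held this frame */
    bool mouse_was_down;   /* button held last frame */
} Input;

typedef struct { float x, y, width, height; } Area;
typedef struct { unsigned char r, g, b; } Color;

static bool point_in_area(Area a, float px, float py)
{
    return px >= a.x && px < a.x + a.width &&
           py >= a.y && py < a.y + a.height;
}

/* Choose this frame's fill color and report whether the button was clicked.
 * A click fires on release while the cursor is still over the button. */
bool button(Area a, const Input *in, Color *fill)
{
    bool hovered = point_in_area(a, in->mouse_x, in->mouse_y);
    Color idle  = { 60, 60, 60 };
    Color hover = { 90, 90, 90 };
    *fill = hovered ? hover : idle;
    return hovered && in->mouse_was_down && !in->mouse_down;
}
```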

The text input box for the IP address was particularly fun to implement. We render each character as a separate text element with its own click handler, so when the user clicks a character we know exactly where to move the cursor. Then we listen for keyboard input and handle insertion and backspace. It sounds simple, and for our use case (just entering IP addresses), it actually is.
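The editing side of such an input box can be sketched as a byte buffer plus a cursor index. This is an illustrative reconstruction, not our actual code: `TextInput`, `INPUT_CAP`, and the function names are hypothetical, and each character’s click handler is assumed to call `input_click_char` with that character’s index.

```c
#include <stddef.h>
#include <string.h>

#define INPUT_CAP 64   /* hypothetical capacity, plenty for an IP address */

typedef struct {
    char text[INPUT_CAP];  /* always NUL-terminated */
    size_t len;            /* bytes in text, excluding the NUL */
    size_t cursor;         /* insertion point, 0..len */
} TextInput;

/* Called from the click handler of the character at index i:
 * place the cursor in front of that character. */
void input_click_char(TextInput *t, size_t i)
{
    t->cursor = i < t->len ? i : t->len;
}

/* Insert one byte at the cursor, shifting the tail right. */
int input_insert(TextInput *t, char c)
{
    if (t->len + 1 >= INPUT_CAP)
        return -1;                                  /* buffer full */
    memmove(&t->text[t->cursor + 1], &t->text[t->cursor],
            t->len - t->cursor + 1);                /* +1 moves the NUL too */
    t->text[t->cursor] = c;
    t->len++;
    t->cursor++;
    return 0;
}

/* Delete the byte before the cursor, shifting the tail left. */
void input_backspace(TextInput *t)
{
    if (t->cursor == 0)
        return;
    memmove(&t->text[t->cursor - 1], &t->text[t->cursor],
            t->len - t->cursor + 1);                /* +1 moves the NUL too */
    t->len--;
    t->cursor--;
}
```

Clicking the fourth character of “192.168.1.100” (index 3) and pressing backspace removes the “2”, leaving “19.168.1.100” with the cursor at index 2.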

But this approach would be extremely hard to scale to a fully featured text input. The current implementation doesn’t support input method editors (so no Chinese pinyin input), and it doesn’t handle Unicode properly. But for typing “192.168.1.100”, what we have works perfectly fine.
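To illustrate one of the Unicode problems: a backspace that deletes one byte at a time removes only the last byte of a multi-byte UTF-8 character, leaving invalid text behind. A first step toward fixing that, sketched below under the assumption that the buffer holds valid UTF-8, is to step back over continuation bytes (bit pattern 10xxxxxx) to the previous code point boundary. `utf8_prev_boundary` is a hypothetical helper; full Unicode support would still need grapheme-cluster handling and input method composition, which this does not address.

```c
#include <stddef.h>

/* Return the index where the UTF-8 code point ending just before `cursor`
 * starts. Continuation bytes match 10xxxxxx, so step back over them until a
 * lead byte is reached. Assumes `text` is valid UTF-8. */
size_t utf8_prev_boundary(const char *text, size_t cursor)
{
    if (cursor == 0)
        return 0;
    size_t i = cursor - 1;
    while (i > 0 && ((unsigned char)text[i] & 0xC0) == 0x80)
        i--;
    return i;
}
```

For “aé” (bytes 61 C3 A9), calling it with the cursor at 3 returns 1, so a backspace would delete both bytes of “é”.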

Was it worth it?

So was choosing Clay worth it? For this specific project, absolutely. We got a lightweight, cross-platform app with minimal dependencies that produces a small binary. The development experience was pretty smooth once we got past the initial learning curve. Sure, we had to implement some UI components from scratch, but for such a simple interface it wasn’t a big deal. And having direct access to system APIs through C made the audio capturing and networking code straightforward.

Would I recommend Clay for a more complex app with lots of UI components? Probably not. The lack of built-in components would become a serious productivity killer. But for small utilities that need system access and simple interfaces, the Clay + sokol + miniaudio stack is actually pretty sweet.