Yelp develops two major applications, Yelp & Yelp for Business, for Web (Desktop & Mobile), iOS, and Android platforms. That’s eight unique clients! Keeping a fresh, consistent UI on all these clients is a major challenge. Server-driven UI (SDUI) has become a standard industry technique for managing UI on multiple platforms. At Yelp, many product teams created SDUI frameworks for their features. Though successful, these frameworks were expensive to develop and maintain, and no single SDUI framework supported all our clients. In late 2021, we began building a unified SDUI framework called CHAOS or “Content Hosting Architecture with Optimization Strategies”.

Why CHAOS?

CHAOS is a backronym. Initially, we thought it would make a good blog post! But we found deeper meaning in the name. According to chaos theory, small changes to a system can dramatically alter its state. CHAOS would simplify the process of deploying major UI changes on our clients, leading to our slogan: “Small changes have big results.”

Though we chose CHAOS quickly, we went through many proposals for the phrase behind the acronym:

  • Creative and Humorous Acronym for Our System
  • Content Helps Accelerate Our Success
  • Components Help Accelerate Our Screens

We eventually settled on “Content Hosting Architecture with Optimization Strategies”.

“Content Hosting Architecture” made sense. UI is the content the user sees and interacts with. We were building an architecture for hosting interactive content. The content could be anything from an entire mobile screen or desktop browser page to a single UI element, often called a component.

We added “Optimization Strategies” because we planned to use machine learning (ML) to optimize content. For example, some consumers prefer to see photos when searching for businesses while others prefer to see reviews. Sometimes, the consumer’s preference changes depending on the type of business; photos might be more important for finding a good restaurant and reviews more important for finding a plumber. An ML model could select the best search experience automatically.

What is SDUI?

SDUI is a popular technique for managing UI on multiple platforms. In a standard UI, the client developer writes both presentation and data fetching logic. Updating the UI requires changing the client. For mobile clients, changes require going through the platform’s app release process and waiting for users to upgrade to the new version. If multiple clients require the same UI changes, the cost of making the changes increases dramatically.

In SDUI, the backend developer writes the presentation and data fetching logic, returning the configured UI to the client. The backend code can be updated without requiring changes to the client, and a single backend change can update the UI on multiple clients.

At Yelp, we’ve built many successful SDUI frameworks. An earlier post, “Building a server-driven platform for mobile app development”, described one such framework, the Biz Native Foundation (BNF), which manages the UX on the iOS and Android versions of Yelp for Business.

The BNF has a very typical server-driven architecture for mobile clients. It supports server-driven mobile screens that host a list of components. Interacting with a component, such as tapping a button, triggers an action that updates the UI directly or indirectly through a property – a piece of observable application state.

While the BNF was being developed, several other major SDUI frameworks were being developed for Yelp’s consumer clients, and more teams were considering SDUI for their use cases. We organized an internal SDUI community to foster knowledge sharing and collaboration. Still, each SDUI framework was an independent effort. Some clients even had multiple SDUI frameworks controlling different aspects of the UI. A single product request might require changes to multiple SDUI frameworks!

Having a single, cross-platform SDUI framework would eliminate duplicate effort and simplify UI changes across multiple clients. We started CHAOS as a community-driven effort to build that framework.

REST vs. GraphQL for SDUI

Historically, we’ve built and maintained multiple REST APIs for our clients. Having different APIs, each with its own Swagger spec and backend Python service, was a big reason why we couldn’t unify our SDUI frameworks.

Fortunately, for the last several years, we’ve been switching all Yelp clients to a unified GraphQL API, so using GraphQL was a requirement for CHAOS. Had we chosen REST for SDUI, our clients would have needed to support both REST and GraphQL, and Yelp introduced GraphQL with the goal of replacing REST entirely.

We were initially excited about using GraphQL for SDUI. We thought we could evolve our SDUI graph more easily than a REST API, which requires explicit versioning. We thought the explicitness of client queries would help maintain backwards compatibility because each request would document the supported types and fields. As we’ll discuss in the next section, GraphQL presented some challenges when designing the CHAOS API, and we ultimately embedded some REST objects for pragmatic reasons.

Designing CHAOS

We’ll start by outlining the original requirements for CHAOS, then discuss the use model and how it was translated into a GraphQL API.

Requirements

  • Use GraphQL
  • Support a variety of use cases on web and mobile clients
  • Handle forwards & backwards compatibility when making changes

Use model

A view is a piece of UI managed by CHAOS. Every view has a unique name and a layout, which arranges a set of components. Components can trigger actions to implement side-effects. Every layout, component, or action has a unique versioned type.

For example, a product manager wants a simple view to help new Yelp users find local businesses. The initial design requires a single column layout with text, illustration, and button components. Clicking the button opens a deep link to a Yelp search.

We can easily extend CHAOS to support more use cases by adding more layouts, components, and actions. Layouts can be a single column, a row, or a full web page/mobile screen with multiple sections. Components can be a single piece of text, a button, or an entire section. Actions can open URLs, log analytics, or update application state.

A Yelp client queries the CHAOS GraphQL API for a view. The GraphQL API loads the view by calling a standardized REST API on a CHAOS backend implemented as a Python service.

There’s no single CHAOS backend for all views. Rather, CHAOS backends are microservices for UI. They can be responsible for a single view or multiple related views, and the CHAOS API dispatches client queries based on the view name.
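The dispatch step can be pictured as a lookup from view name to the backend that owns it. The following is only a sketch under stated assumptions: the registry contents, the host names, the `/chaos/views/` path, and the function name are hypothetical and not taken from the real CHAOS API.

```python
from typing import Dict

# Hypothetical registry: view name -> base URL of the CHAOS backend that owns it.
# One backend may own a single view or several related views.
VIEW_BACKENDS: Dict[str, str] = {
    "consumer.welcome": "http://consumer-onboarding.internal",
    "biz.support": "http://biz-support.internal",
    "biz.support.chat": "http://biz-support.internal",
}


def backend_url_for_view(view_name: str) -> str:
    """Return the (assumed) REST URL the GraphQL layer would call for a view."""
    try:
        base_url = VIEW_BACKENDS[view_name]
    except KeyError:
        raise LookupError(f"No CHAOS backend registered for view {view_name!r}") from None
    return f"{base_url}/chaos/views/{view_name}"


print(backend_url_for_view("biz.support"))
```

Keeping the mapping keyed on the view name means a new backend can be added without touching the clients, which only ever see the GraphQL API.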

CHAOS provides React, Android, and iOS client libraries for making GraphQL queries and rendering views. CHAOS provides a Python package for building views in CHAOS backends.

Dream Query

At Yelp, when building new GraphQL APIs, we start by writing a Dream Query. We need a query to fetch a CHAOS view by its unique name:

query GetChaosView($name: String!) {
    chaosView(name: $name) {
        views {
            identifier
            layout
        }
        initialViewId
        components
        actions
    }
}

The query returns a ChaosConfiguration with an array of views and an initial view ID. Though many CHAOS use cases have a single view, some use cases have a sequence of related views. We could always fetch subsequent views with additional GraphQL queries, but they would require extra round trips over a potentially slow and unreliable network connection. Consequently, CHAOS supports returning multiple views within the same configuration for better performance and reliability.

Each view has a layout that arranges components by ID. Layouts are represented by the ChaosLayout union type:

union ChaosLayout = ChaosSingleColumn | ChaosMobilePhoneScreen

CHAOS supports a single column layout that arranges components in a vertical stack, which is great for adding some SDUI to an existing web page or mobile screen.

type ChaosSingleColumn {
    rows: [String!]!
}

CHAOS also supports a layout for controlling an entire mobile phone screen, a common use case for many of our existing SDUI frameworks.

type ChaosMobilePhoneScreen {
    toolBar: String
    main: [String!]!
    footer: String
}

We’ve been experimenting with layouts for entire web pages and will report on those efforts in subsequent blog posts. More commonly, our web clients use single column layouts to add some SDUI content to a page that otherwise uses traditional data fetching and presentation logic.

Layouts refer to components by ID, and all components in a ChaosConfiguration are stored in the top-level components field. Similarly, components refer to actions by ID, and all actions are stored in the top-level actions field.

Storing components and actions in the top-level configuration has some practical benefits. First, it reduces response size when components or actions are referenced multiple times. Second, it improves readability because layouts are compact and focused on how components are arranged.

Modeling components & actions

Initially, we planned to use explicit GraphQL types to model each component and action. We defined interfaces that all components and actions must satisfy. Because we reference components and actions by ID, they must have a unique string identifier. The other fields depend on the particular component or action.

Let’s say CHAOS supports a single component (ChaosButton) and action (ChaosOpenUrl) with the following GraphQL types:

type ChaosButton implements ChaosComponent {
    identifier: String!
    text: String!
    onClick: [String!]!
}

type ChaosOpenUrl implements ChaosAction {
    identifier: String!
    url: String!
}

The client’s query uses fragments to specify the supported component and action types:

query GetChaosView($name: String!) {
    chaosView(name: $name) {
        views {
            identifier
            layout {
                ... on ChaosSingleColumn {
                    rows
                }
            }
        }
        components {
            ... on ChaosButton {
                identifier
                text
                onClick
            }
        }
        actions {
            ... on ChaosOpenUrl {
                identifier
                url
            }
        }
        initialViewId
    }
}

Though this seems like a sensible approach, we found a number of issues in practice.

First, components and actions aren’t like traditional GraphQL types for data fetching. A main selling point for GraphQL is that clients fetch only the fields they require. Well, the client can’t query some button fields and not others; the button won’t work without onClick!

Second, adding new fields must be done carefully. Let’s add a new style parameter to control the appearance of the button:

type ChaosButton implements ChaosComponent {
    identifier: String!
    text: String!
    style: ChaosButtonStyle
    onClick: [String!]!
}

Unfortunately, we’ve already released the original button to mobile clients, and there are older app versions that don’t support style. How do we communicate to the CHAOS backend that the mobile client supports the new field?

The GraphQL server knows whether the client’s query includes the new field. We use Apollo Server, and it supplies an info argument to the component’s resolver with an abstract syntax tree (AST) representing the query. But we need to traverse through several nested arrays and objects to find whether style is part of the ChaosButton fragment.
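To give a feel for that traversal, here is a rough sketch in Python rather than in the resolver's own language. It assumes a hand-built dictionary that mimics the shape of a parsed query AST (Field and InlineFragment nodes with nested selection sets); the helper names are hypothetical, and named fragments are ignored for brevity.

```python
from typing import Any, Dict, Iterator

Node = Dict[str, Any]


def name_node(value: str) -> Node:
    return {"kind": "Name", "value": value}


def field(name: str, *selections: Node) -> Node:
    """Build a Field node, with a selection set if it has children."""
    node: Node = {"kind": "Field", "name": name_node(name)}
    if selections:
        node["selectionSet"] = {"kind": "SelectionSet", "selections": list(selections)}
    return node


def inline_fragment(type_name: str, *selections: Node) -> Node:
    """Build an InlineFragment node for `... on type_name { ... }`."""
    return {
        "kind": "InlineFragment",
        "typeCondition": {"kind": "NamedType", "name": name_node(type_name)},
        "selectionSet": {"kind": "SelectionSet", "selections": list(selections)},
    }


def walk(node: Node) -> Iterator[Node]:
    """Yield a node and every node nested in its selection sets."""
    yield node
    for child in node.get("selectionSet", {}).get("selections", []):
        yield from walk(child)


def fragment_selects_field(root: Node, type_name: str, field_name: str) -> bool:
    """True if an inline fragment on type_name directly selects field_name."""
    for node in walk(root):
        if node["kind"] != "InlineFragment":
            continue
        if node["typeCondition"]["name"]["value"] != type_name:
            continue
        for selection in node["selectionSet"]["selections"]:
            if selection["kind"] == "Field" and selection["name"]["value"] == field_name:
                return True
    return False


# Hand-built AST for `components { ... on ChaosButton { identifier text style onClick } }`.
components_field = field(
    "components",
    inline_fragment(
        "ChaosButton",
        field("identifier"), field("text"), field("style"), field("onClick"),
    ),
)

print(fragment_selects_field(components_field, "ChaosButton", "style"))
```

Even in this simplified form, the check has to be repeated for every field and every type the backend cares about, which hints at the maintenance cost.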

We also need to communicate to the CHAOS backend that the field is available. We’ll be constantly adding and (less frequently) removing fields. Do we send a list of supported fields for every component and action to the backend? That would add a considerable amount of overhead to each request.

Third, adding a new type has the same problem. Let’s add a new component to represent a block of styled text:

type ChaosText implements ChaosComponent {
    identifier: String!
    text: String!
    textStyle: ChaosTextStyle
    textAlignment: ChaosTextAlignment
}

The client’s query must be updated to support the new component type:

query GetChaosView($name: String!) {
    chaosView(name: $name) {
        views {
            identifier
            layout {
                ... on ChaosSingleColumn {
                    rows
                }
            }
        }
        components {
            ... on ChaosButton {
                identifier
                text
                style
                onClick
            }
            ... on ChaosText {
                identifier
                text
                textStyle
                textAlignment
            }
        }
        actions {
            ... on ChaosOpenUrl {
                identifier
                url
            }
        }
        initialViewId
    }
}

To determine if the query includes the ChaosText fragment, the component’s GraphQL resolver must delve deep into the AST, then pass that information along to the CHAOS backend in a list of supported components (and actions).

In the end, we decided that explicit, unversioned GraphQL types weren’t practical. We’d spend too much time and effort maintaining our GraphQL layer without much real benefit. The clients would be writing large queries, and the server would be parsing them. Instead, we modeled each component or action as a versioned REST object in JSON format.

Every component or action has a unique type string with an integer version number, such as chaos.button.v1 and chaos.open-url.v1. GraphQL doesn’t natively support JSON or map fields, so parameters are stored in a stringified JSON object.

type ChaosJsonComponent implements ChaosComponent {
    identifier: String!
    componentType: String!
    parameters: String!
}

type ChaosJsonAction implements ChaosAction {
    identifier: String!
    actionType: String!
    parameters: String!
}

For example, a button component in our GraphQL response looks like:

{
    "identifier": "primacy-cta",
    "componentType": "chaos.button.v1",
    "parameters": "{\"text\": \"Find local businesses\", \"onClick\": [\"open-search-url\"]}",
    "__typename": "ChaosJsonComponent"
}

Clearly, the stringified JSON isn’t very readable. We’ve created developer tools to edit and debug CHAOS configurations.
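As an illustration of the round trip such tooling has to handle, the sketch below encodes a component's parameters into a JSON string and decodes them again for display. The helper names are hypothetical; only the field names (identifier, componentType, parameters) come from the types above.

```python
import json
from typing import Any, Dict


def json_component(identifier: str, component_type: str,
                   parameters: Dict[str, Any]) -> Dict[str, str]:
    """Build a ChaosJsonComponent-shaped dict with stringified parameters."""
    return {
        "identifier": identifier,
        "componentType": component_type,
        "parameters": json.dumps(parameters),
    }


def readable_parameters(component: Dict[str, str]) -> str:
    """Decode the parameters string and pretty-print it for debugging."""
    return json.dumps(json.loads(component["parameters"]), indent=2)


button = json_component(
    "find-local-businesses-button",
    "chaos.button.v1",
    {"text": "Find local businesses", "onClick": ["open-search-url"]},
)
print(readable_parameters(button))
```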

We still use GraphQL types for views and layouts. These types change less frequently and contain the high-level structure of the UI, so direct readability is more useful. Internally, we still associate layouts with a unique versioned type string, e.g. chaos.single-column.v1, and we may switch to embedded REST objects for layouts, too. We’re still figuring out the right balance between GraphQL and REST, but we’ve been using the approach in production for more than two years without revisiting the decision.

Here’s a complete CHAOS configuration to see how everything comes together:

{
  "data": {
    "chaosView": {
      "views": [
        {
          "identifier": "consumer.welcome",
          "layout": {
            "__typename": "ChaosSingleColumn",
            "rows": [
              "welcome-to-yelp-header",
              "welcome-to-yelp-illustration",
              "find-local-businesses-button"
            ]
          },
          "__typename": "ChaosView"
        }
      ],
      "components": [
        {
          "__typename": "ChaosJsonComponent",
          "identifier": "welcome-to-yelp-header",
          "componentType": "chaos.text.v1",
          "parameters": "{\"text\": \"Welcome to Yelp\", \"textStyle\": \"heading1-bold\", \"textAlignment\": \"center\"}"
        },
        {
          "__typename": "ChaosJsonComponent",
          "identifier": "welcome-to-yelp-illustration",
          "componentType": "chaos.illustration.v1",
          "parameters": "{\"dimensions\": {\"width\": 375, \"height\": 300}, \"url\": \"https://media.yelp.com/welcome-to-yelp.svg\"}"
        },
        {
          "__typename": "ChaosJsonComponent",
          "identifier": "find-local-businesses-button",
          "componentType": "chaos.button.v1",
          "parameters": "{\"text\": \"Find local businesses\", \"style\": \"primary\", \"onClick\": [\"open-search-url\"]}"
        }
      ],
      "actions": [
        {
          "__typename": "ChaosJsonAction",
          "identifier": "open-search-url",
          "actionType": "chaos.open-url.v1",
          "parameters": "{\"url\": \"https://yelp.com/search\"}"
        }
      ],
      "initialViewId": "consumer.welcome",
      "__typename": "ChaosConfiguration"
    }
  }
}
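To show how the ID references fit together, here is a minimal sketch of how a consumer of this configuration might resolve the initial view. It is not the CHAOS client library: the function names are hypothetical, the payload is a trimmed copy of the one above, and only the single column layout is handled.

```python
import json
from typing import Any, Dict, List

# Trimmed copy of the `chaosView` payload above (header and button only).
chaos_view: Dict[str, Any] = {
    "views": [
        {
            "identifier": "consumer.welcome",
            "layout": {
                "__typename": "ChaosSingleColumn",
                "rows": ["welcome-to-yelp-header", "find-local-businesses-button"],
            },
        }
    ],
    "components": [
        {
            "identifier": "welcome-to-yelp-header",
            "componentType": "chaos.text.v1",
            "parameters": json.dumps({"text": "Welcome to Yelp"}),
        },
        {
            "identifier": "find-local-businesses-button",
            "componentType": "chaos.button.v1",
            "parameters": json.dumps(
                {"text": "Find local businesses", "onClick": ["open-search-url"]}
            ),
        },
    ],
    "actions": [
        {
            "identifier": "open-search-url",
            "actionType": "chaos.open-url.v1",
            "parameters": json.dumps({"url": "https://yelp.com/search"}),
        }
    ],
    "initialViewId": "consumer.welcome",
}


def index_by_id(items: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Index views, components, or actions by their identifier."""
    return {item["identifier"]: item for item in items}


def resolve_initial_view(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Follow layout rows -> components -> actions for the initial view."""
    views = index_by_id(config["views"])
    components = index_by_id(config["components"])
    actions = index_by_id(config["actions"])
    layout = views[config["initialViewId"]]["layout"]
    resolved = []
    for component_id in layout["rows"]:
        component = components[component_id]
        parameters = json.loads(component["parameters"])
        resolved.append({
            "type": component["componentType"],
            "parameters": parameters,
            # Action IDs in onClick are looked up in the top-level actions list.
            "onClick": [actions[a]["actionType"] for a in parameters.get("onClick", [])],
        })
    return resolved


for row in resolve_initial_view(chaos_view):
    print(row["type"], row["onClick"])
```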

Versioning components & actions

When changing a component or action, we increment the version. For example, adding style to the CHAOS button introduces chaos.button.v2.

Clients have their own internal component libraries and use factories associated with each component type to map the CHAOS component to the internal component’s interface. Actions go through a similar mapping process.
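The factory idea can be sketched as a registry keyed on the versioned type string. The example is written in Python purely for illustration (the real clients are React, iOS, and Android); the internal component classes, the default alignment, and the choice to return None for unknown types are all assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class InternalButton:
    """Stand-in for a client's own button component (hypothetical)."""
    label: str
    action_ids: List[str]


@dataclass
class InternalText:
    """Stand-in for a client's own text component (hypothetical)."""
    content: str
    alignment: str


# One factory per versioned component type; each maps CHAOS parameters
# onto the internal component's interface.
FACTORIES: Dict[str, Callable[[Dict[str, Any]], object]] = {
    "chaos.button.v1": lambda p: InternalButton(label=p["text"], action_ids=p["onClick"]),
    "chaos.text.v1": lambda p: InternalText(
        content=p["text"], alignment=p.get("textAlignment", "left")
    ),
}


def build_component(component_type: str, parameters: Dict[str, Any]) -> Optional[object]:
    """Return the internal component, or None if this client lacks a factory."""
    factory = FACTORIES.get(component_type)
    if factory is None:
        return None
    return factory(parameters)


print(build_component("chaos.button.v1",
                      {"text": "Find local businesses", "onClick": ["open-search-url"]}))
```

Because the version is part of the key, a client can register chaos.button.v2 alongside chaos.button.v1 without changing the existing factory.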

CHAOS backends use a YAML config file to determine what component or action types can be used in a CHAOS configuration. The GraphQL layer passes information about the platform (React, iOS, or Android) to the CHAOS backend. For mobile clients, the GraphQL layer also passes the app version.

We can update all our React clients simultaneously using Gondola, Yelp’s PaaS for front-end deployment. Therefore, we use web: true to indicate that a type is available for web clients.

For mobile clients, we can’t update older versions. We also have distinct apps for consumers & business owners on each platform. Therefore, we use start: <app version> to indicate the first app version that supports a type, and each app/platform combination has its own value.

components:
  - type: chaos.button.v1
    web: true
    consumer-ios:
      start: 22.1.0
    consumer-android:
      start: 22.3.0
    biz-ios:
      start: 22.1.0
    biz-android:
      start: 22.6.0

actions:
  - type: chaos.open-url.v1
    web: true
    consumer-ios:
      start: 22.1.0
    consumer-android:
      start: 22.3.0
    biz-ios:
      start: 22.1.0
    biz-android:
      start: 22.6.0
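A backend could evaluate this config roughly as sketched below. This is an assumption-laden illustration, not the CHAOS Python package: the dictionary stands in for the parsed YAML, the function names are hypothetical, and version strings are assumed to be dot-separated integers.

```python
from typing import Any, Dict, Optional, Tuple

# Stand-in for the parsed YAML above, keyed by component type.
AVAILABILITY: Dict[str, Dict[str, Any]] = {
    "chaos.button.v1": {
        "web": True,
        "consumer-ios": {"start": "22.1.0"},
        "consumer-android": {"start": "22.3.0"},
        "biz-ios": {"start": "22.1.0"},
        "biz-android": {"start": "22.6.0"},
    },
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "22.10.0" into (22, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_supported(type_name: str, client: str, app_version: Optional[str] = None) -> bool:
    """Check whether a client (and app version, for mobile) supports a type."""
    entry = AVAILABILITY.get(type_name, {})
    if client == "web":
        return entry.get("web", False) is True
    rule = entry.get(client)
    if rule is None or app_version is None:
        return False
    return parse_version(app_version) >= parse_version(rule["start"])


print(is_supported("chaos.button.v1", "consumer-android", "22.2.0"))
print(is_supported("chaos.button.v1", "consumer-android", "22.10.0"))
```

Comparing parsed tuples rather than raw strings matters here: as strings, "22.10.0" would sort before "22.3.0".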

Use Cases

We shipped the first CHAOS use case to production in early 2022, only a few months after starting development. Since then, we’ve been regularly shipping new use cases. CHAOS development is entirely use-case driven. We add new layouts, components, and actions when they are required.

CHAOS isn’t intended to replace traditional UI development. We use CHAOS where it makes sense. Usually, a good use case for CHAOS satisfies one or more of the following conditions:

  • It must be consistent across multiple clients.
  • It has dynamic, highly contextual content.
  • It must be updated quickly on mobile clients.

For example, CHAOS manages the Yelp for Business support flow on web and mobile clients. When a business owner opens the support flow, we show a CHAOS view with a list of support options.

Some business owners use multiple clients, and some businesses are managed by multiple owners who use different clients. Therefore, we want to show consistent support options on all clients.

Support options are also dynamic and highly contextual. Live chat or phone support isn’t available 24/7, and the phone number depends on location.

Finally, if there’s a technical issue such as an outage, we want to update our mobile clients quickly without waiting for an app release. By adding a note that we’re aware of the issue and working on it, we can keep business owners informed and avoid unnecessary support calls.

With CHAOS, the support options can be updated on all clients by deploying a change to a single backend service.

Future Projects

As we adopt CHAOS more broadly within Yelp, we’ve identified some key areas for future investment.

Automated previews

To verify changes to a CHAOS view, a backend developer tests each client manually.

Though testing web clients is relatively straightforward – everyone has access to a browser – testing mobile clients requires access to simulators or physical devices. Before Yelp switched to remote work, we maintained a mobile device library in each engineering office. After the switch, we integrated with a cloud-based testing solution from a vendor. Even so, manual testing is cumbersome for a backend developer who needs to verify multiple platforms or app versions.

In the future, we plan to support automated previews. When a backend developer publishes a GitHub PR with changes to a CHAOS view, we’ll automatically generate previews for each platform and attach them to the PR when ready.

No-code configuration updates

Currently, when a product manager or designer wants to change a CHAOS view, they must ask a backend developer. The backend developer changes the Python code that configures the CHAOS view, creates a PR, gets it approved, and deploys the changes to production. Even simple changes, such as changing copy, can take anywhere from 30 minutes to several hours.

In the future, we plan to support no-code configuration updates for product managers and designers through internal editing tools.

Optimization strategies

Although optimization strategies are a core part of the CHAOS backronym, we haven’t implemented any for CHAOS content yet. Selecting, ordering, and configuring CHAOS content must be done manually in Python code.

In the future, we plan to use ML to automatically select, order, and configure some CHAOS content.

More CHAOS?

This is the first in a series of blog posts about CHAOS. In upcoming blog posts, our client engineers will explain how CHAOS works on Web, iOS, and Android clients, and our backend engineers will explain how to build a CHAOS backend in Python.
