Notes on GraphQL

The last week has been my first foray into GraphQL, using the GitHub GraphQL API endpoints. I now have OpinionsTM.

The promise is fantastic: query for everything you need, but nothing more. Get it all in one go.

But the reality is somewhat... different.

What I found was that you end up with a lot of garbage data structures that you then, on the client side, need to decipher and massage, unpacking edges, nodes, and whatnot. I ended up having to do almost a dozen array_column(), array_map(), and array_reduce() operations on the returned data to get a structure I can actually use.

The final data I needed looked like this:

[
  {
    "name": "zendframework/zend-expressive",
    "tags": [
      {
        "name": "3.0.2",
        "date": "2018-04-10"
      }
    ]
  }
]

To fetch it, I needed a query like the following:

query showOrganizationInfo(
  $organization:String!
  $cursor:String!
) {
  organization(login:$organization) {
    repositories(first: 100, after: $cursor) {
      pageInfo {
        startCursor
        hasNextPage
        endCursor
      }
      nodes {
        nameWithOwner
        tags:refs(refPrefix: "refs/tags/", first: 100, orderBy:{field:TAG_COMMIT_DATE, direction:DESC}) {
          edges {
            tag: node {
              name
              target {
                ... on Commit {
                  pushedDate
                }
                ... on Tag {
                  tagger {
                    date
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

Which gave me data like the following:

{
  "data": {
    "organization": {
      "repositories: {
        "pageInfo": {
          "startCursor": "...",
          "hasNextPage": true,
          "endCursor": "..."
        },
        "nodes": [
          {
            "nameWithOwner": "zendframework/zend-expressive",
            "tags": {
              "edges": [
                "tag": {
                  "name": "3.0.2",
                  "target": {
                    "tagger": {
                      "date": "2018-04-10"
                    }
                  }
                }
              ]
            }
          }
        ]
      }
    }
  }
}

How did I discover how to create the query? I'd like to say it was by reading the docs. I really would. But these gave me almost zero useful examples, particularly when it came to pagination, ordering results sets, or what those various "nodes" and "edges" bits were, or why they were necessary. (I eventually found the information, but it's still rather opaque as an end-user.)

Additionally, see that pageInfo bit? This brings me to my next point: pagination sucks, particularly if it's not at the top-level. You can only fetch 100 items at a time from any given node in the GitHub GraphQL API, which means pagination. And I have yet to find a client that will detect pagination data in results and auto-follow them. Additionally, the "after" property had to be something valid... but there were no examples of what a valid value would be. I had to resort to StackOverflow to find an example, and I still don't understand why it works.

I get why clients cannot unfurl pagination, as pagination data could appear anywhere in the query. However, it hit me hard, as I thought I had a complete set of data, only to discover around half of it was missing once I finally got the processing correct.

If any items further down the tree also require pagination, you're in for some real headaches, as you then have to fetch paginated sets depth-first.

So, while GraphQL promises fewer round trips and exactly the data you need, my experience so far is:

  • I end up having to be very careful about structuring my queries, paying huge attention to pagination potential, and often sending multiple queries ANYWAYS. A well-documented REST API is often far easier to understand and work with immediately.

  • I end up doing MORE work client-side to make the data I receive back USEFUL. This is because the payload structure is based on the query structure and the various permutations you need in order to get at the data you need. Again, a REST API usually has a single, well-documented payload, making consumption far easier.

I'm sure I'm probably mis-using GraphQL, or missing a number of features to make this stuff easier, but so far, I'm left wishing I could just have a number of useful REST endpoints that I can hit consistently in order to aggregate the data I need.

Before anybody suggests it, yes, I am very aware that GitHub also offers a REST API, and the v3 API has endpoints for most of what I needed. However, I had to rely on tags, not releases, as not all of our tags have associated releases. However, the data returned for tags does not include the commit date; for that, you need to fetch the associated commit, and then the date may be under either the author or the committer. This approach would have meant literally thousands of calls to get the data I need, which would have had me hitting rate limits, and potentially taking hours to complete.

My point: perhaps instead of GraphQL, aggregating a bit more data in REST resources (e.g., including commit data with tags), or providing endpoints that allow merging specific resource types could have solved the problem easily. This is where having a developer relations team that finds out what data consumers are needing comes in handy, instead of simply mandating graphql all the things to allow infinite flexibility (and the frustrations of such flexibility, both for the API developer and consumer).

Updates
  • 2018-09-19: syntax highlighting fixes.