
September 22, 2020 | AWS, Blog, Cassandra, Cloud, DevOps, Open Source

Decision time with AWS Keyspaces

With the upcoming Cassandra 4.0 release, there is a lot to look forward to. Most excitingly, and following a refreshing realignment of the Open Source community around Cassandra, the next release promises to focus on fundamentals: stability, repair, observability, performance and scaling.

We must set this against the fact that Cassandra ranks pretty highly on Stack Overflow's list of most dreaded databases, and against the reality that Cassandra is expensive to configure, operate and maintain. Finding people who have the requisite skills to do so is challenging.

WRITTEN BY

Guy Richardson


Lead Consultant


At OpenCredo, we have been working with Cassandra for years and we have a good understanding of its pros and cons. The raw write performance of Cassandra cannot be denied but the overwhelming complexity of its operations – when combined with the mental adjustments required of developers to design appropriate data and query models – means that in many cases, we can no longer recommend operating it yourself.

This is borne out by the rise of several managed service providers offering Cassandra-as-a-service. There are Cassandra-compatible offerings from Instaclustr, Datastax, Aiven and Scylla. As of April 2020, AWS also has a generally available offering: Amazon Keyspaces.

In this blog post, we’ll look at Amazon’s offering – how it differs from open source Cassandra and what use cases it might be suitable for.

What is AWS Keyspaces?

AWS Keyspaces is a fully managed, serverless, Cassandra-compatible service. Note the wording: Cassandra-compatible. We can assume that it's not actually vanilla Cassandra under the hood, and this has been unofficially confirmed.

Unsurprisingly, it integrates with DynamoDB technology. Cassandra and Dynamo share a common heritage and DynamoDB has been well worn-in over the years so we won’t necessarily consider this problematic.

What is more interesting is that it is serverless and autoscaling: there are no operations to consider. No compaction, no incremental repair, no rebalancing of the ring, no scaling issues. For any under-resourced data operations team this must be the main selling point: Keyspaces provides an SLA, and if that SLA is acceptable to your internal consumers of Cassandra, then Keyspaces can be used in place of your internal cluster.

AWS Keyspaces is presented as a nine-node Cassandra 3.11.2 cluster; that is, it is compatible with tools and drivers for Cassandra 3.11.2, including the Datastax Java Driver. Whilst we have no control over the underlying operations and configuration, we do have full control over the data plane: keyspaces, tables and read-write operations.

How does it differ from Cassandra?

So this all looks pretty promising; what are the compromises? At the moment, the following limitations apply relative to vanilla Cassandra 3.*:

  • Only single datacenter deployments are possible, within a single AWS region.
  • Writes are always replicated 3 times – across AWS availability zones – and acknowledged using LOCAL_QUORUM.
  • Reads are only available at consistency levels ONE, LOCAL_ONE and LOCAL_QUORUM.
  • No indexes, logged batches, user-defined types, triggers, user-defined functions or materialised views.
  • As usual with AWS, the service is subject to quotas, which you'll need to keep an eye on.

These compromises are not particularly onerous. We would generally want to ensure that our writes are QUORUM-consistent anyway, although we may miss consistency level ONE for fast, risk-tolerant writes. Likewise, with ONE and LOCAL_QUORUM reads available, we can choose between fast reads (for, say, immutable data) and consistent(ish) reads respectively.
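To make this concrete, here is a minimal sketch of choosing consistency levels per statement with the Datastax Java Driver 4.x. The demo.readings table, its schema and the session are hypothetical placeholders, not anything AWS prescribes:

```java
import java.time.Instant;
import java.util.UUID;

import com.datastax.oss.driver.api.core.ConsistencyLevel;
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class ConsistencyExamples {

    // Writes must use LOCAL_QUORUM: Keyspaces rejects other write consistency levels.
    static void writeReading(CqlSession session, UUID sensorId, double value) {
        SimpleStatement write = SimpleStatement.builder(
                "INSERT INTO demo.readings (sensor_id, ts, value) VALUES (?, ?, ?)")
            .addPositionalValues(sensorId, Instant.now(), value)
            .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM)
            .build();
        session.execute(write);
    }

    // Reads may trade consistency for speed with LOCAL_ONE.
    static void readLatest(CqlSession session, UUID sensorId) {
        SimpleStatement read = SimpleStatement.builder(
                "SELECT ts, value FROM demo.readings WHERE sensor_id = ? LIMIT 1")
            .addPositionalValues(sensorId)
            .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE)
            .build();
        session.execute(read).forEach(row ->
            System.out.printf("%s -> %f%n", row.getInstant("ts"), row.getDouble("value")));
    }
}
```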

The missing functionality is either not central to typical Cassandra use cases or is arguably an anti-pattern. For example, materialised views are best suited to scenarios where write throughput is low; but if write throughput is low, why are you using Cassandra and not something else?

Do we get anything extra?

Being integrated with AWS, Keyspaces offers some additional functions that we can take advantage of. These essentially take the place of native Cassandra features:

  • Authentication and authorization are outsourced to AWS IAM which, by now, is a mature and proven system. We can set granular, least-privilege resource policies at the keyspace and table level. DDL operations are logged in CloudTrail.
  • Encryption at rest is automatically configured, with data encrypted using AES-256 under a single AWS KMS customer master key (CMK) per keyspace. Communications in transit between the client and Keyspaces are protected by TLS. We'll assume that node-to-node communications are secure, but this remains opaque, as with many cloud services.
  • Keyspaces provides point-in-time backup and recovery to the nearest second for up to 35 days (see the sketch after this list).
  • CloudWatch provides relevant metrics. These are far fewer than open source Cassandra exposes, but this reflects the serverless nature of the service, where we don't need to wrestle with the complex multivariate health indicators provided natively.
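For example, point-in-time recovery is toggled per table through a Keyspaces-specific custom_properties CQL extension. A minimal sketch, reusing the hypothetical demo.readings table:

```java
import com.datastax.oss.driver.api.core.CqlSession;

public class EnablePitr {
    // Keyspaces-specific custom property; not valid CQL on open source Cassandra.
    static void enablePitr(CqlSession session) {
        session.execute(
            "ALTER TABLE demo.readings WITH custom_properties = "
          + "{'point_in_time_recovery': {'status': 'enabled'}}");
    }
}
```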

These features remove a class of operational challenges. There are open source tools to help, like Medusa for backups, but for an architecture already integrated into the AWS ecosystem, the native equivalents are better aligned.

How do we implement Keyspaces?

As of September 2020, unfortunately there is no support yet for AWS Keyspaces in the Terraform AWS provider, although the issue appears to have been raised. Likewise, it doesn't appear to be available in the AWS Python SDK (boto3) or the AWS CLI, though it is supported in CloudFormation. On this basis, we will need to be temporarily expedient about how Keyspaces is provisioned.
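One expedient in the meantime is to provision the schema through CQL itself, since the data plane is fully accessible. A minimal sketch, assuming an authenticated CqlSession (connecting is covered below); the demo keyspace and readings table are hypothetical, while SingleRegionStrategy is the replication class the Keyspaces documentation specifies:

```java
import com.datastax.oss.driver.api.core.CqlSession;

public class ProvisionSchema {
    static void createSchema(CqlSession session) {
        // Keyspaces uses its own replication class rather than
        // SimpleStrategy or NetworkTopologyStrategy.
        session.execute(
            "CREATE KEYSPACE IF NOT EXISTS demo "
          + "WITH replication = {'class': 'SingleRegionStrategy'}");

        // DDL in Keyspaces is asynchronous; a new table can take a short
        // while to become active before it accepts reads and writes.
        session.execute(
            "CREATE TABLE IF NOT EXISTS demo.readings ("
          + "sensor_id uuid, ts timestamp, value double, "
          + "PRIMARY KEY (sensor_id, ts))");
    }
}
```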

Architecturally, there is support for Interface VPC Endpoints, which should be used to keep traffic private to the AWS network and will allow fine-grained control over access to the VPC endpoint and over what that endpoint can access (preventing classes of exfiltration attack, a vector often neglected).

Once set up, we need to connect to Keyspaces. The main way to do this is probably the Datastax Java Driver, which supports a range of features including connection pooling, load balancing and the control connection. It's a bit ceremonial to get started but well documented.
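A minimal connection sketch with Datastax Java Driver 4.x and AWS's SigV4 authentication plugin (aws-sigv4-auth-cassandra-java-driver-plugin), assuming IAM credentials in the default credential chain and the London region:

```java
import java.net.InetSocketAddress;
import javax.net.ssl.SSLContext;

import com.datastax.oss.driver.api.core.CqlSession;
import software.aws.mcs.auth.SigV4AuthProvider;

public class KeyspacesConnect {
    public static void main(String[] args) throws Exception {
        try (CqlSession session = CqlSession.builder()
                // Regional service endpoint; TLS is mandatory on port 9142.
                .addContactPoint(new InetSocketAddress("cassandra.eu-west-2.amazonaws.com", 9142))
                // For Keyspaces, the "datacenter" is simply the AWS region.
                .withLocalDatacenter("eu-west-2")
                // SigV4 signs requests with IAM credentials from the default chain.
                .withAuthProvider(new SigV4AuthProvider("eu-west-2"))
                .withSslContext(SSLContext.getDefault())
                .build()) {
            // Sanity check: report the advertised Cassandra version (3.11.2).
            System.out.println(session.execute("SELECT release_version FROM system.local")
                    .one().getString("release_version"));
        }
    }
}
```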

Out of the box, each TCP connection to Keyspaces supports up to 3,000 queries per second, so that's up to 27,000 across the 9 nodes. With the Datastax driver (on a recent version of the driver and of Cassandra) we default to one shared connection per node. This can be increased, and must be where we are looking for higher throughput from Keyspaces; there is no limit on the number of TCP connections that can be made to Keyspaces nodes. So we can scale up through TCP connection configuration rather than by resizing the cluster.
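With driver 4.x, the pool size can be raised in application.conf or programmatically. A sketch of the programmatic form; the figure of four connections per node is an arbitrary illustration:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import com.datastax.oss.driver.api.core.config.DriverConfigLoader;

public class PooledSession {
    static CqlSession build() {
        // Four connections per node across 9 nodes at ~3,000 queries/second
        // each raises the theoretical ceiling to ~108,000 qps.
        DriverConfigLoader loader = DriverConfigLoader.programmaticBuilder()
                .withInt(DefaultDriverOption.CONNECTION_POOL_LOCAL_SIZE, 4)
                .build();
        return CqlSession.builder()
                .withConfigLoader(loader)
                // contact point, region and auth as in the earlier connection sketch
                .build();
    }
}
```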

We must also choose the pricing model: either on-demand, where capacity auto-scales to demand, or provisioned, where capacity is fixed (but easily changed) with some cost advantages.
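Conveniently, the capacity mode can be switched per table through the same custom_properties extension. A sketch, again against the hypothetical demo.readings table, with arbitrary illustrative capacity units:

```java
import com.datastax.oss.driver.api.core.CqlSession;

public class SwitchCapacityMode {
    // Switches the table from the default on-demand mode to provisioned capacity.
    static void useProvisionedCapacity(CqlSession session) {
        session.execute(
            "ALTER TABLE demo.readings WITH custom_properties = "
          + "{'capacity_mode': {'throughput_mode': 'PROVISIONED', "
          + "'read_capacity_units': 3000, 'write_capacity_units': 3000}}");
    }
}
```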

What’s the catch?

So, our feature survey above indicates that there is a lot of upside and only a few downsides to using AWS Keyspaces. Let's take a step back and evaluate it again:

We’ve already discussed the lack of support from standard infrastructure-as-code tools like Terraform and the AWS CLI. We also note that, as of writing (September 2020), AWS Keyspaces does not appear in the AWS Services in Scope by Compliance Program list. So its compliance status seems to be presently “undefined” and presumably “in-progress”. For sensitive workloads this places a delay of unknown length on adoption (which, of course, might fruitfully be employed in evaluation).

This lack of contextual maturity is also reflected in how relatively unproven the system is. Particularly given that this is a new cloud service, it is critical that you benchmark the performance of your data and query model against Keyspaces. The under-the-hood assumptions made to provide a SaaS service may, or may not, suit your workloads.

A final concern is pricing, which seems to be roughly DynamoDB + 15%. To illustrate: on-demand write pricing in the London region is roughly $1.7221 per million write request units. If we sustain roughly 10,000 writes of up to 1KB per second (that is, 0.01 million writes per second), then over a year this costs:

$1.7221 × 0.01 × 60 × 60 × 24 × 365 ≈ $543,081 per year

This doesn’t even include reads, storage, point-in-time recovery, VPC endpoints or data transfer. Expensive stuff.

Having said this, we need to take into account the TCO of running Cassandra yourself. It is generally considered that you need at least two staff dedicated to a non-trivial Cassandra cluster, such as one ingesting 10,000 writes per second. These staff, with skills in distributed systems, caching, operating system configuration, JVM tuning and so on, are in high demand and consequently expensive to hire and retain.

Conclusions

AWS Keyspaces is a slick offering that is well integrated into the AWS stack and provides a tidy, usable UI. It removes entire classes of pain from using Cassandra, freeing up staff and providing them with the opportunity to pursue activities which might be more profitable to your business.

As with all new AWS services, time will tell: the compliance boxes must be ticked, performance better understood and the service matured before it becomes a sensible option for organisations who are not especially risk-tolerant.

Who would use it? Given the pricing model, it makes sense for:

  • Organisations who are dipping their toes in the cloud NoSQL space: who want to understand how NoSQL could support new or existing types of workload and want to use something which could be migrated to Open Source in the future.
  • Organisations who have rolled out Cassandra and are finding that the support burden of Cassandra is significantly inhibiting their ability to execute in other areas.

Web-scale and big data organisations who have mature teams, processes and significant workloads are unlikely to find much of interest here. The costs of migration are simply too great when compared to running on, say, i3 instances on EC2, and the promises of Cassandra 4.0 are too attractive.

For startups and other organisations where the priority is go-to-market, Keyspaces could be an extremely valuable starting point for supporting big data and streaming workloads without the hassle.

 

This blog is written exclusively by the OpenCredo team. We do not accept external contributions.
