Introduction

JWorks group picture JWorks at Kubecon 2023

Another year, another KubeCon | CloudNativeCon EU edition. And, of course, JWorks was also present at this year’s edition in Amsterdam in the RAI. We wanted to hear more about all the new features, frameworks, tools, ideas and concepts that the Kubernetes and cloud world have to offer. And we got what we wanted. We attended many interesting talks, talked to very intriguing people at the event, and had a lot of fun while doing so. Next to the talks, there were also many booths from companies from all over the world (AWS, Azure, Canonical, …) to engage with other people and to talk and promote their newest products for the developer market. You can look at some of the talks we attended, which were very interesting. You can click on the talk title to go to the recorded version on YouTube, which CNCF provides.

Metrics at Full Throttle: Intro and Deep Dive Into Thanos - Saswata Mukherjee & Filip Petkovski, Shopify

Thanos enables a highly available Prometheus setup. It replaces parts of the Prometheus deployment model that are hard to scale with regular Prometheus.

This talk is a great introduction to Thanos. Saswata and Filip start by explaining what difficulties you can have with scaling a regular Prometheus setup. They explain what components of Thanos solve the different issues with scaling Prometheus. The following components of Thanos are introduced: Sidecar, Ruler, Receive, Query, Compactor, and Store gateway.

Where the first half of the talk is mainly targeted towards people who don’t know Thanos or don’t know all capabilities of Thanos yet, the second half was surely targeted towards experienced Thanos users. In the second half of the talk, Saswata and Filip highlight recent, since the KubeCon Detroit, improvements to Thanos.

They discuss 5 new features in Thanos. The first boils down to optimizations to the store gateway to remove the high IO requirements to run the store gateway. The new store gateway implementation doesn’t require the high IO disk anymore as the information is stored in memory instead of on disk.

Three new features were announced to optimize the querying throughout the different components in Thanos.

First, quality of service limits was added as Thanos configuration options. This allows teams managing Thanos to define the limits to tune the Thanos performance and prevent a single query from overloading the system.

The second improvement related to query performance was a newly implemented, Thanos-specific, PromQL engine. With the new query engine, the query will be analyzed up front and an optimal (parallel!) execution will be determined. Since not all operators are supported yet, the new engine will be used when possible with a fallback to the standard Prometheus PromQL engine when needed.

The third improvement is the distributed execution of queries. This mechanism allows queries to be executed by the nodes that have direct access to the needed data, preventing costly data transfers between components in the Thanos architecture. If you want more information on this amazing feature, definitely check out this talk when it’s available on YouTube!

A new hash ring mechanism was implemented for the Receiver to prevent overloading single Receiver instances.

They end by showing real-world performance graphs from a Thanos deployment and how different versions of Thanos caused visible improvements.

PlayStation and Kubernetes: How to Solve a Problem Like Real-Time - Joseph Irving, PlayStation

Since its birth in 1994, (Sony) PlayStation has been a global pioneer in the gaming industry. As PlayStation grew, its infrastructure needed to grow as well. Over time, PlayStation had to scale its game servers so that it could handle their demand. This had to be done in a way where game servers could be scaled in a compatible way with the game clients. This means that simply spinning up more instances of the game server might not always work.

This talk, presented by Joseph Irving, went over the kind of problems they faced, different types of real-time game servers, and their advantages and limitations (peer-to-peer, dedicated game servers, etc.).

Their solution was to use a project called Agones, which is created and maintained by Google in collaboration with Ubisoft. Agones is an open-source platform that provides a native way of running game servers on Kubernetes without worrying about the infrastructure. It provides compatibility with game server connections by using GameServers and connecting those GameServers with Fleets. Joseph introduces the Agones framework and talks about how they use it and how it helped them to scale their gaming servers and to provide multi-regional operability.

He concludes the talk by talking about Matchmakers, which implements matchmaking (finding matches for people that are searching for a lobby in a game) through Kubernetes.

Automating Configuration and Permissions Testing for GitOps with OPA Conftest - Eve Ben Ezra & Michael Hume, The New York Times

Open Policy Agent is a tool to add policy-based control to cloud-native environments. OPA is used in many different tools and systems as a system to validate configuration before it’s deployed to cloud environments.

In this talk, Eve and Michel explain how OPA is used in the New York Times internal developer platform to control what their developers deploy. Eve starts by showing what problems they experienced at the New York Times with allowing developers to use the internal platform. As with all validation (aka testing) mechanisms, shifting left is the focus. They continue to explain how Conftest helped them in providing feedback to the developer as soon as they write a single line of code. Validating the configuration using the same definition along every set in the process, makes it very transparent to developers where issues are introduced and what they can do to fix them.

Next, a great introduction to OPAs rule language Rego is shown by Eve. They take the audience through a complete example, explaining what is defined and how it’s interpreted, including the quirks, using Rego.

Finally, Michel shows how Kubeconform can be used to help with Kubernetes version migrations, including CRDs. They show a real example and explain how the output of the tool can be used to quickly identify what’s wrong with a manifest.

If anything, this talk is a brilliant introduction to Rego. If you have an interest in policy management in your Kubernetes cluster or if you have experienced hard-to-find bugs in Kubernetes manifests, this is a must-watch talk from the New York Times.

State of the Mop: Cloud Custodian in 2023

During the presentation, Kapil Thangavelu provided a concise update on Cloud Custodian, an open-source rules engine designed for account and resource management on AWS, Azure, and GCP, based on the Rego language. Cloud Custodian can effectively scale from small businesses to large enterprises.

Since Kubecon Detroit in October 2022, Cloud Custodian has added support for two new providers. Additionally, they’ve included Terraform support to enable users to check their Terraform source code while running inside a pipeline.

Looking ahead, Cloud Custodian’s roadmap for the current year includes adding a new K8s admission controller, support for AWS CloudFormation, preventative support for AWS, and an improved authoring experience through the addition of policy testing, policy tracing, a policy debugger, and more.

Cloud Custodian’s open-source nature, flexibility, and upcoming roadmap make it a tool to watch for organizations managing cloud resources. It provides a powerful and customizable way to improve security, cost optimization, and compliance across different cloud providers.

Scaling Databases at Activision - Greg Smith & Vladimir Kovacik, Activision/Blizzard

This talk goes over how Activision / Blizzard scaled their databases over time. They introduce the session by talking about hosting and operating their databases in the past, which included bare metal, many virtual machines & containers, … As they progressed and more people joined their games (especially on launch days), they quickly discovered that there was a need for better scaling, especially in the number of shards, due to performance issues.

As the whole company was internally moving to Kubernetes, they wondered if there was a way to have their MySQL databases hosted on Kubernetes natively. This is where they introduce Vitess, a native way to run and scale MySQL-compatible databases. They wanted to use something that supported MySQL since they had plenty of MySQL experts employed and because they used MySQL before adopting Vitess.

They divided the migration to Vitess into three adoption stages: MVP, Load Testing & Production Readiness.

MVP

In the MVP stage, they wanted to find out how easy or hard it was to migrate to Vitess. They needed to verify that Vitess works for them, which they could quickly find out by running their database tests. Along the way, they were able to fix all the issues that the migration caused, which included talking with the Vitess community to find solutions.

Load Testing

In the Load Testing stage, they needed to make sure that this new solution was able to scale successfully and handle their workload (again, especially on launch days). This could be done by checking if shards scaled successfully and that there were no performance bottlenecks during the load testing. The developers were able to confirm that scaling was done as how they expected it to scale.

Production Readiness

In the Production Readiness stage, they had to prove that their solution worked and that their previous stage turned out as a success. However, they wanted more confirmation about how Vitess works as they needed to be sure that it was the right solution. After some chaos testing and implementing other Vitess features, they confirmed that the scaling solution was stable.

One of their most important changes was the move from application sharding to a single endpoint. Where there previously was a separate sharding configuration in the application configuration, they now moved to a single database endpoint.

They concluded the talk by stating that Vitess does work, and right now there are approximately 60 Vitess clusters in development and production environments. They are also building an internal team around the company to support Vitess, and it has become the default database solution for any new services that the company will implement.

It was a very long process, but eventually, they were able to successfully adopt Vitess.

Sponsors / Vendors / Projects with stands

At Kubecon, there were a lot of booths where companies and separate projects were able to promote their product, introduce new features, and offer some fun gadgets to the attendees of Kubecon. Some major players in the industry were there, and of course, a lot of people were interested in what they had to offer to make their lives more simple.

AWS

AWS had an impressive presence, offering attendees a wide range of experiences and insights into their latest developments. The AWS stand provided a sneak peek of upcoming features and integrations planned for their EKS portal. The visuals on display highlighted their focus on monitoring and improving the overall user experience.

In addition to the exciting demos, AWS provided a fun and challenging mini-golf game for conference-goers to play. Participants had the opportunity to win AWS-branded pyjama pants, adding a touch of fun and excitement to the event.

AWS didn’t stop at the game, as they also offered an array of other goodies for attendees throughout the conference. Whether it was swag bags or other branded merchandise, AWS had something for everyone.

Finally, AWS brought a team of experts who were available to answer any questions attendees had about EKS. Their knowledgeable team provided attendees with valuable insights and guidance on using the platform effectively.

Conclusion

Kubecon was a fun experience, and we got a load of interesting topics and talks. Amsterdam is beautiful and tourist-friendly, with most of the residents speaking English fluently. The RAI was a huge venue, which meant to go from talk to talk some (power) walking was needed. Some talks were too popular to fit into the rooms that were assigned to them. The live stream was always a backup, but it doesn’t have that same effect, even if you sit around a smartphone screen with ten people to watch a talk. The food and especially the coffee was great this edition. KubeCon | CloudNativeCon EU was a great experience, and we can’t wait until next year’s edition. See you in Paris!

JWorks Tech Blog

KubeCon + CloudNativeCon 2023