The problems with implicit AWS resources

When a simple change leads to broken deployments

Tamás Sallai

Logging for services

Some resources in AWS are helpfully created when needed. The prime example is CloudWatch Log Groups: when a service such as Lambda or AppSync first wants to send a log message, it creates the Log Group first. This is possible when the service's execution role has the logs:CreateLogGroup permission, which the AWS-managed policies contain for both services.

This is convenient to start with: add the permission and logging works automatically. Moreover, clicking the button on the Console brings you to the correct group where you can see all the log messages for the function or API.

Unmanaged resources

One downside of this behavior is the defaults: the Log Group is created with no message expiration, meaning all log messages are kept forever. This is a safe starting point, as you'll have the logs when you eventually need to investigate something. But usually not all log messages are needed forever, and every byte stored incurs a cost. More than that, these log groups are not removed when the function is, so messages are stored forever even when the resource that used them no longer exists.
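To see which groups are affected, the output of aws logs describe-log-groups can be scanned for entries that lack retentionInDays, the field CloudWatch Logs leaves out when a group never expires. A minimal sketch, assuming the JSON is already parsed; findNeverExpiringGroups and the sample data are hypothetical and only for illustration:

```typescript
// Relevant part of the `aws logs describe-log-groups` output.
interface LogGroupSummary {
	logGroupName: string;
	storedBytes?: number;
	// Absent when the group keeps its messages forever.
	retentionInDays?: number;
}

interface DescribeLogGroupsOutput {
	logGroups: LogGroupSummary[];
}

// Hypothetical helper: names of all groups without a retention setting.
function findNeverExpiringGroups(output: DescribeLogGroupsOutput): string[] {
	return output.logGroups
		.filter((group) => group.retentionInDays === undefined)
		.map((group) => group.logGroupName);
}

// Made-up sample: one implicitly created group, one managed group with 14 days.
const sample: DescribeLogGroupsOutput = {
	logGroups: [
		{ logGroupName: "/aws/appsync/apis/example-api-id", storedBytes: 0 },
		{ logGroupName: "/aws/lambda/example-function", storedBytes: 0, retentionInDays: 14 },
	],
};

console.log(findNeverExpiringGroups(sample));
```

Only the first group in the sample is reported, since the second one has an explicit retention period.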

While the lack of an expiration time only affects costs, there are functionality problems as well. Since the Log Group is created when the service first puts a log message, there is a period when it does not exist yet. Adding a metric filter during that period, for example, fails.

And this can prevent a stack from deploying in some cases. In this article, we'll look into a case where a simple change in a CDK-managed AppSync API breaks new deployments.

Fortunately, Lambda recently got a feature that lets you configure which Log Group the function logs to (see the AWS blog post and the docs). There, you can simply manage the Log Group with CDK or Terraform, which makes sure it is created or updated with the correct configuration.

Managing an AppSync API

A simple API resource managed by the CDK:

const api = new aws_appsync.GraphqlApi(this, "Api", {
	name: "test-api",
	definition: {
		schema: aws_appsync.SchemaFile.fromAsset(path.join(__dirname, "schema.graphql")),
	},
	authorizationConfig: {
		defaultAuthorization: {
			authorizationType: aws_appsync.AuthorizationType.IAM,
		}
	},
	logConfig: {
		fieldLogLevel: aws_appsync.FieldLogLevel.ALL,
	},
});

When deployed, it creates the API:

$ aws appsync list-graphql-apis | more
{
	"graphqlApis": [
		{
			"name": "test-api",
			"apiId": "mk4geyw2kna4pkhph7zcwlqvv4",
			"authenticationType": "AWS_IAM",
			"logConfig": {
				"fieldLogLevel": "ALL",
				"cloudWatchLogsRoleArn": "arn:aws:iam::278868411450:role/DeleterCustomResourceStack-ApiApiLogsRole90293F72-UYoU4ARI8uF1",
				"excludeVerboseContent": false
			},
			...
		}
	]
}

After the first request, a Log Group is created:

$ aws logs describe-log-groups
{
	"logGroups": [
		{
			"logGroupName": "/aws/appsync/apis/mk4geyw2kna4pkhph7zcwlqvv4",
			"creationTime": 1701196713911,
			"metricFilterCount": 0,
			"arn": "...",
			"storedBytes": 0
		}
	]
}

Adding a dependent resource

All good, let's add a metric filter to that:

api.logGroup.addMetricFilter("metric1", {
	filterPattern: {
		logPatternString: "ERROR",
	},
	metricName: "test",
	metricNamespace: "test",
})

Deployment is successful, as expected:

$ aws logs describe-log-groups
{
	"logGroups": [
		{
			"logGroupName": "/aws/appsync/apis/mk4geyw2kna4pkhph7zcwlqvv4",
			"creationTime": 1701196713911,
			"metricFilterCount": 1,
			"arn": "...",
			"storedBytes": 0
		}
	]
}

But this simple change broke all new deployments. Imagine a new developer is joining the team and you want to set up a lab environment. In the new account, you try to deploy the stack but get an error:

$ npm run cdk deploy

...

DeleterCustomResourceStack: deploying... [1/1]
DeleterCustomResourceStack: creating CloudFormation changeset...
[█████████████████████████████·····························] (3/6)

7:44:15 PM | CREATE_FAILED        | AWS::Logs::MetricFilter     | Api/LogGroup/metric1
Resource handler returned message: "The specified log group does not exist. (Service: CloudWatchLogs, Status Code: 400, Request ID: 98ef7913-7388-4ab2-8fab-3b10211861ce)" (
RequestToken: 6bdcba86-1d96-983c-95a3-3e3aabe973f4, HandlerErrorCode: NotFound)
7:44:16 PM | ROLLBACK_IN_PROGRESS | AWS::CloudFormation::Stack  | DeleterCustomResourceStack
The following resource(s) failed to create: [ApiLogGroupmetric1913D2056, ApiSchema510EECD7]. Rollback requested by user.
7:44:16 PM | ROLLBACK_IN_PROGRESS | AWS::CloudFormation::Stack  | DeleterCustomResourceStack
The following resource(s) failed to create: [ApiLogGroupmetric1913D2056, ApiSchema510EECD7]. Rollback requested by user.

This is not unexpected: the Log Group does not exist because no requests have been sent to the API yet. This is not a problem with an existing API because the Log Group was already created there. But when the API is being created for the first time, it does not work.
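The difference between the two accounts can be illustrated without deploying anything. The sketch below derives the implicit group name from the API ID, following the naming seen in the describe-log-groups output earlier, and checks it against a parsed response. Both appSyncLogGroupName and logGroupExists are hypothetical helpers, not part of any AWS SDK:

```typescript
type LogGroupList = { logGroups: { logGroupName: string }[] };

// Name of the Log Group that AppSync creates implicitly for an API.
function appSyncLogGroupName(apiId: string): string {
	return `/aws/appsync/apis/${apiId}`;
}

// Hypothetical helper: does the parsed response contain the named group?
function logGroupExists(output: LogGroupList, groupName: string): boolean {
	return output.logGroups.some((group) => group.logGroupName === groupName);
}

const groupName = appSyncLogGroupName("mk4geyw2kna4pkhph7zcwlqvv4");

// Fresh account: no request has reached the API yet, so there is no group.
const freshAccount: LogGroupList = { logGroups: [] };
// Existing account: the first request already created the group.
const existingAccount: LogGroupList = { logGroups: [{ logGroupName: groupName }] };

console.log(logGroupExists(freshAccount, groupName)); // false
console.log(logGroupExists(existingAccount, groupName)); // true
```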

Managed Log Group

The solution is to manage the Log Group with the CDK instead of relying on implicit creation. That means a resource is needed:

const logs = new aws_logs.LogGroup(this, "AppSyncLogGroup", {
	logGroupName: `/aws/appsync/apis/${api.apiId}`,
	retention: aws_logs.RetentionDays.TWO_WEEKS,
	removalPolicy: RemovalPolicy.DESTROY,
});
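
With the group managed explicitly, the metric filter is probably best attached to this construct rather than to api.logGroup, so that the filter references the managed resource and CloudFormation creates the group before the filter. A minimal sketch under that assumption; addErrorMetricFilter and the structural MetricFilterTarget interface are hypothetical names, typed loosely so the helper needs no CDK import:

```typescript
// Minimal structural view of what the helper needs from a CDK Log Group.
interface MetricFilterTarget {
	addMetricFilter(
		id: string,
		props: {
			filterPattern: { logPatternString: string };
			metricName: string;
			metricNamespace: string;
		},
	): unknown;
}

// Hypothetical helper: the same filter as before, on whichever group is passed in.
function addErrorMetricFilter(logGroup: MetricFilterTarget): void {
	logGroup.addMetricFilter("metric1", {
		filterPattern: { logPatternString: "ERROR" },
		metricName: "test",
		metricNamespace: "test",
	});
}
```

In the stack, this would be called as addErrorMetricFilter(logs), with logs being the Log Group defined above, in place of the api.logGroup.addMetricFilter call.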

To be extra safe, you can also remove the logs:CreateLogGroup permission from AppSync, so that even if a request somehow reaches the API during the deployment, it won't create the Log Group:

const logsRole = new aws_iam.Role(this, "LogsRole", {
	assumedBy: new aws_iam.ServicePrincipal("appsync.amazonaws.com"),
});
logsRole.addToPolicy(new aws_iam.PolicyStatement({
	effect: aws_iam.Effect.ALLOW,
	resources: ["arn:aws:logs:*:*:*"],
	actions: [
		"logs:CreateLogStream",
		"logs:PutLogEvents",
	],
}));
const api = new aws_appsync.GraphqlApi(this, "Api", {
	// ...
	logConfig: {
		role: logsRole,
		fieldLogLevel: aws_appsync.FieldLogLevel.ALL,
	},
});

This can then be deployed and updated as well.

December 12, 2023