Verify PagerDuty reaches On-Call by Cron

We had a few incidents where on-call devs missed their calls because of spam-blocking setups or “do not disturb” settings.
We now run a small service that sends everyone a test notification once a month to make sure notifications get through. Each notification goes out shortly before that person's “do not disturb” window ends, so we don't wake anyone in the middle of the night but still test a realistic situation.
Our setup has more logging, stats, etc., but it goes something like this:

# configure user schedule
require 'yaml'
users = YAML.load <<~YAML
- name: "John Doe"
  id: ABCD
#  cron: "* * * * * America/Los_Angeles" # every minute ... for local testing
  cron: "55 6 * * 2#1 America/Los_Angeles" # every first Tuesday of the month at 6:55am
# ... more users here
YAML
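
The `2#1` in the day-of-week field is what makes the schedule land on the first Tuesday of the month. To sanity-check which date that is for a given month without starting the scheduler, here is a minimal sketch using only Ruby's standard library. `first_weekday_of_month` is a hypothetical helper made up for illustration; it is not part of fugit or serial_scheduler.

```ruby
require 'date'

# Hypothetical helper: date of the first given weekday in a month
# (0 = Sunday ... 6 = Saturday), i.e. what "2#1" means for Tuesday.
def first_weekday_of_month(year, month, wday)
  first = Date.new(year, month, 1)
  # Ruby's % always returns a non-negative offset here (0..6 days)
  first + ((wday - first.wday) % 7)
end

puts first_weekday_of_month(2024, 1, 2) # first Tuesday of January 2024
```

Comparing a few of these dates against the on-call calendar is a cheap way to catch a typo in a user's cron line before it pages the wrong morning.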

# code to notify users
require 'json'
require 'faraday'
def create_test_incident(user)
  connection = Faraday.new
  response = nil
  2.times do
    response = connection.post do |req|
      req.url "https://api.pagerduty.com/incidents"
      req.headers['Content-Type'] = 'application/json'
      req.headers['Accept'] = 'application/vnd.pagerduty+json;version=2'
      req.headers['From'] = 'realusers@email.com' # must be the email of a real PagerDuty user (the incident owner)
      req.headers['Authorization'] = "Token token=#{ENV.fetch("PAGERDUTY_TOKEN")}"
      req.body = {
        incident: {
          type: "incident",
          title: "Pagerduty Tester: Incident for #{user.fetch("name")}, press resolve",
          service: {
            id: ENV.fetch("SERVICE_ID"),
            type: "service_reference"
          },
          assignments: [{
            assignee: {
              id: user.fetch("id"),
              type: "user_reference"
            }
          }]
        }
      }.to_json
    end
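    if response.status == 429 # pagerduty rate-limits to 6 incidents/min/service
      sleep 60 # wait for the rate-limit window to pass, then retry once
      next
    end
    raise "Request failed #{response.status} -- #{response.body}" if response.status >= 300
    break # success: stop here so we do not create a second incident
  end
  raise "Still rate-limited after retrying" if response.status == 429
  JSON.parse(response.body).fetch("incident").fetch("id")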
end
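
The retry-on-429 part is the easiest piece to get subtly wrong, so it can help to pull it out and test it without touching the network. The following is only a sketch of that idea, not how our service is actually structured: `with_rate_limit_retry`, its `sleeper` parameter, and `FakeResponse` are all hypothetical names, and the block just needs to return something that responds to `status`.

```ruby
# Hypothetical helper: call the block, pausing and retrying while it returns
# an HTTP 429 status, for at most max_attempts calls in total.
def with_rate_limit_retry(max_attempts: 2, wait: 60, sleeper: method(:sleep))
  max_attempts.times do |attempt|
    response = yield
    return response unless response.status == 429
    raise "Still rate-limited after #{max_attempts} attempts" if attempt == max_attempts - 1
    sleeper.call(wait) # injectable so tests do not really sleep
  end
end

# Stand-in for a Faraday response, for illustration only
FakeResponse = Struct.new(:status, :body)
responses = [FakeResponse.new(429, ""), FakeResponse.new(201, "{}")]
slept = []
result = with_rate_limit_retry(sleeper: ->(seconds) { slept << seconds }) { responses.shift }
puts result.status
```

In the real method the block would wrap the `connection.post` call, and the default `sleeper` would actually wait out the rate limit.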

# run on a schedule (no threading / forking)
require 'serial_scheduler'
require 'fugit'
scheduler = SerialScheduler.new
users.each do |user|
  scheduler.add("Notify #{user.fetch("name")}", cron: user.fetch("cron"), timeout: 90) do # timeout has to outlast the 60s rate-limit sleep
    user_id = user.fetch("id")
    incident_id = create_test_incident(user)
    puts "Created incident for #{user_id} https://#{ENV.fetch('SUBDOMAIN')}.pagerduty.com/incidents/#{incident_id}"
  rescue StandardError => e
    puts "Creating incident for #{user_id} failed #{e}"
  end
end
scheduler.run
