Advent of Code: Fetching Puzzle Input using the API

When solving Advent of Code problems, my first step is always to access the full puzzle input and copy it into a file on my local system. This doesn't actually take very long, but it's still fun to see how we can automate it! In today's article, we'll write some simple Haskell code to make a network request to fetch this data.

We'll write a function that can take a particular year and day (like 2022 Day 5), and save the puzzle input for that day to a file that the rest of our code can use.

As a note, there's a complete Advent of Code API that allows you to do much more than access the puzzle input. You can submit your answers, view leaderboards, and all kinds of other things. There's an existing Haskell library for all this, written in 2019. But we'll just be writing a small amount of code from scratch in this article, rather than using this library.

Authentication

In order to get the puzzle input for a certain day, you must be authenticated with Advent of Code. This typically means logging in with GitHub or another service. This saves a session cookie in your browser that is sent with every request you make to the site.

Our code needs to access this cookie somehow. It's theoretically possible to do this in an automated way by accessing your browser's data. For this example though, I found it easier to just copy the session token manually and save it as an environment variable. The token doesn't change as long as you don't log out, so you can keep reusing it.

This GitHub issue gives a good explanation with images for how to access this token using a browser like Google Chrome. At a high level, these are the steps:

  1. Log in to Advent of Code and access any puzzle input page (e.g. http://adventofcode.com/2022/day/1/input)
  2. Right click the page and click "inspect"
  3. Navigate to the "Network" tab
  4. Click on any request, and go to the "Headers" tab
  5. Search through the "Request Headers" for a header named cookie.
  6. You should find one value that starts with session=, followed by a long string of hexadecimal characters. Copy the whole value, starting with session= and including all the hex characters until you hit a semicolon.
  7. Save this value as an environment variable on your system using the name AOC_TOKEN.

The rest of the code will assume you have this session token (starting with the string session=) saved as the variable AOC_TOKEN in your environment. So for example, on my Windows Subsystem for Linux setup, I have a line like this in my .bashrc:

export AOC_TOKEN="session=12345abc..."
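
Since the token is copied by hand, it's easy to leave off the session= prefix. Here's a minimal sketch of a sanity check we could run before using it. The names validateToken and readToken are hypothetical helpers for illustration; they aren't part of the code in the rest of this article.

```haskell
import Data.List (isPrefixOf)
import System.Environment (lookupEnv)

-- | Hypothetical helper: accept only values shaped like "session=<something>".
validateToken :: String -> Either String String
validateToken raw
  | not ("session=" `isPrefixOf` raw) =
      Left "AOC_TOKEN should start with \"session=\""
  | length raw <= length "session=" =
      Left "AOC_TOKEN has nothing after \"session=\""
  | otherwise = Right raw

-- | Hypothetical helper: read AOC_TOKEN from the environment and check its shape.
readToken :: IO (Either String String)
readToken = do
  mToken <- lookupEnv "AOC_TOKEN"
  pure $ maybe (Left "AOC_TOKEN is not set") validateToken mToken
```

This only checks the shape of the value. It can't tell us whether the session is still valid on the site.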

We're now ready to start writing some code!

Caching

Now before we jump into any shenanigans with network code, let's first write a caching function. All this will do is see if a specified file already exists and has data. We don't want to send unnecessary network requests (the puzzle input never changes), so once we have our data locally, we can short circuit our process.

So this function will take our FilePath and just return a boolean value. We first ensure the file exists.

checkFileExistsWithData :: FilePath -> IO Bool
checkFileExistsWithData fp = do
  exists <- doesFileExist fp
  if not exists
    then return False
    ...

As long as the file exists, we'll also ensure that it isn't empty.

import System.Directory (doesFileExist, getFileSize)

checkFileExistsWithData :: FilePath -> IO Bool
checkFileExistsWithData fp = do
  exists <- doesFileExist fp
  if not exists
    then return False
    else do
      size <- getFileSize fp
      return $ size > 0

If there's any data there, we return True. Otherwise, we need to fetch the data from the API!
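
To see the three possible cases side by side, here's a small sketch that restates the function (so the block stands on its own) and runs it against a missing file, an empty file, and a file with data. The cacheDemo function and the cache_demo_*.txt file names are made up for this illustration.

```haskell
import System.Directory (doesFileExist, getFileSize)

-- Same check as above, repeated so this sketch is self-contained.
checkFileExistsWithData :: FilePath -> IO Bool
checkFileExistsWithData fp = do
  exists <- doesFileExist fp
  if not exists
    then return False
    else do
      size <- getFileSize fp
      return $ size > 0

-- | Hypothetical demo: create an empty file and a non-empty file,
-- then run the check on a missing path and on both files.
cacheDemo :: IO [Bool]
cacheDemo = do
  writeFile "cache_demo_empty.txt" ""
  writeFile "cache_demo_full.txt" "1000\n2000\n"
  mapM checkFileExistsWithData
    [ "cache_demo_missing.txt" -- assumed not to exist
    , "cache_demo_empty.txt"
    , "cache_demo_full.txt"
    ]
```

Provided no file named cache_demo_missing.txt exists in the current directory, running cacheDemo should give [False,False,True]. Only the last case counts as cached.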

Setting Up the Function

Before we dive into the specifics of sending a network request, let's specify what our function will do. We'll take 3 inputs for this function:

  1. The problem year (e.g. 2022)
  2. The problem day (1-25)
  3. The file path to store the data

Here's what that type signature looks like:

fetchInputToFile :: (MonadLogger m, MonadThrow m, MonadIO m)
  => Int -- Year
  -> Int -- Day
  -> FilePath -- Destination File
  -> m ()

We'll need MonadIO for reading and writing to files, as well as reading environment variables. Using a MonadLogger allows us to tell the user some helpful information about whether the process worked, and MonadThrow is needed by our network library when parsing the route.

Now let's kick this function off with some setup tasks. We'll first run our caching check, and we'll also look for the session token as an environment variable.

fetchInputToFile :: (MonadLogger m, MonadThrow m, MonadIO m) => Int -> Int -> FilePath -> m ()
fetchInputToFile year day filepath = do
  isCached <- liftIO $ checkFileExistsWithData filepath
  token' <- liftIO $ lookupEnv "AOC_TOKEN"
  case (isCached, token') of
    (True, _) -> logDebugN "Input is cached!"
    (False, Nothing) -> logErrorN "Not cached but didn't find session token!"
    (False, Just token) -> ...

If it's cached, we can just return immediately. The file should already contain our data. If it isn't cached and we don't have a token, we're still forced to "do nothing" but we'll log an error message for the user.

Now we can move on to the network specific tasks.

Making the Network Request

Now let's prepare to actually send our request. We'll do this using the Network.HTTP.Simple module, which comes from the http-conduit package. We'll use four of its functions to create, send, and parse our request.

parseRequest :: MonadThrow m => String -> m Request

addRequestHeader :: HeaderName -> ByteString -> Request -> Request

httpBS :: MonadIO m => Request -> m (Response ByteString)

getResponseBody :: Response a -> a

Here's what these do:

  1. parseRequest generates a base request using the given route string (e.g. http://www.adventofcode.com)
  2. addRequestHeader adds a header to the request. We'll use this for our session token.
  3. httpBS sends the request and gives us a response containing a ByteString.
  4. getResponseBody just pulls the main content out of the Response object.

When using this library for other tasks, you'd probably use httpJSON to translate the response to any object you can parse from JSON. However, the puzzle input pages are luckily just raw data we can write to a file, without having any HTML wrapping or anything like that.

So let's pick our fetchInputToFile function back up where we left off, and start by creating our "base" request. We determine the proper "route" for the request using the year and the day. Then we use parseRequest to make this base request.

fetchInputToFile :: (MonadLogger m, MonadThrow m, MonadIO m) => Int -> Int -> FilePath -> m ()
fetchInputToFile year day filepath = do
  isCached <- liftIO $ checkFileExistsWithData filepath
  token' <- liftIO $ lookupEnv "AOC_TOKEN"
  case (isCached, token') of
    ...
    (False, Just token) -> do
      let route = "https://adventofcode.com/" <> show year <> "/day/" <> show day <> "/input"
      baseRequest <- parseRequest route
      ...
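
As a side note, the route string is the one piece of this step we can check without touching the network. Here's a sketch that pulls it out into a pure function. The name inputRoute is hypothetical, and rejecting days outside 1-25 is an extra check I've added for illustration; the function in this article just builds the string inline.

```haskell
-- | Hypothetical helper: build the puzzle input URL for a year and day.
-- Returns Nothing when the day falls outside the 1-25 range.
inputRoute :: Int -> Int -> Maybe String
inputRoute year day
  | day < 1 || day > 25 = Nothing
  | otherwise =
      Just $ "https://adventofcode.com/" <> show year <> "/day/" <> show day <> "/input"
```

For example, inputRoute 2022 5 gives Just "https://adventofcode.com/2022/day/5/input", while inputRoute 2022 26 gives Nothing.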

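Now we need to modify the request to incorporate the token we fetched from the environment. We add it using the addRequestHeader function with the cookie field. Note we have to pack our token into a ByteString. The string literal "cookie" also relies on the OverloadedStrings extension, since the header name is not a plain String.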

import Data.ByteString.Char8 (pack)

fetchInputToFile :: (MonadLogger m, MonadThrow m, MonadIO m) => Int -> Int -> FilePath -> m ()
fetchInputToFile year day filepath = do
  isCached <- liftIO $ checkFileExistsWithData filepath
  token' <- liftIO $ lookupEnv "AOC_TOKEN"
  case (isCached, token') of
    ...
    (False, Just token) -> do
      let route = "https://adventofcode.com/" <> show year <> "/day/" <> show day <> "/input"
      baseRequest <- parseRequest route
      {- Add Request Header -}
      let finalRequest = addRequestHeader "cookie" (pack token) baseRequest 
      ...

Finally, we send the request with httpBS to get its response. We unwrap the response with getResponseBody, and then write that output to a file.

{-# LANGUAGE OverloadedStrings #-}

import Control.Monad.Catch (MonadThrow)
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Logger (MonadLogger, logDebugN, logErrorN)
import qualified Data.ByteString
import Data.ByteString.Char8 (pack)
import Network.HTTP.Simple (addRequestHeader, getResponseBody, httpBS, parseRequest)
import System.Environment (lookupEnv)

fetchInputToFile :: (MonadLogger m, MonadThrow m, MonadIO m) => Int -> Int -> FilePath -> m ()
fetchInputToFile year day filepath = do
  isCached <- liftIO $ checkFileExistsWithData filepath
  token' <- liftIO $ lookupEnv "AOC_TOKEN"
  case (isCached, token') of
    (True, _) -> logDebugN "Input is cached!"
    (False, Nothing) -> logErrorN "Not cached but didn't find session token!"
    (False, Just token) -> do
      let route = "https://adventofcode.com/" <> show year <> "/day/" <> show day <> "/input"
      baseRequest <- parseRequest route
      let finalRequest = addRequestHeader "cookie" (pack token) baseRequest
      {- Send request, retrieve body from response -}
      response <- getResponseBody <$> httpBS finalRequest
      {- Write body to the file -}
      liftIO $ Data.ByteString.writeFile filepath response

And now we're done! We can bring this function up in a GHCi session and run it a couple times!

>> import Control.Monad.Logger
>> runStdoutLoggingT (fetchInputToFile 2022 1 "day_1_test.txt")

This results in the puzzle input (for Day 1 of this past year) appearing in the "day_1_test.txt" file in our current directory! We can run the function again and we'll find that it is cached, so no network request is necessary:

>> runStdoutLoggingT (fetchInputToFile 2022 1 "day_1_test.txt")
[Debug] Input is cached!

Now we've got a neat little function we can use each day to get the input!
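
One caveat: the function writes whatever body comes back. If the request fails (say, because the token has expired), the file would end up holding an error message, and our caching check would then treat that as valid input. As far as I know, a request built with parseRequest doesn't throw on a non-2xx status, so we'd have to check the status ourselves. Network.HTTP.Simple provides getResponseStatusCode for this. Here's a minimal sketch of the decision as a pure function. The name bodyIfOk is hypothetical and isn't part of the code above.

```haskell
-- | Hypothetical helper: keep the body only when the status code is 2xx.
-- The status would come from getResponseStatusCode and the body from
-- getResponseBody; both are passed in so this stays a pure function.
bodyIfOk :: Int -> body -> Either String body
bodyIfOk status body
  | status >= 200 && status < 300 = Right body
  | otherwise = Left ("Request failed with status " <> show status)
```

With something like this, fetchInputToFile could log the Left message as an error and only write the file in the Right case.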

Conclusion

To see all this code online, you can read the file on GitHub. This will be the last Advent of Code article for a little while, though I'll be continuing with video walkthrough catch-up on Thursdays. I'm sure I'll come back to it before too long, since there's a lot of depth to explore, especially with the harder problems.

If you're enjoying this content, make sure to subscribe to Monday Morning Haskell! If you sign up for our mailing list, you'll get our monthly newsletter, as well as access to our Subscriber Resources!
