Visualizing how code is used across the organization is a vital part of our engineers’ day-to-day workflow - and we have a *lot* of code to search through! This blog post details our journey of adopting Sourcegraph at Yelp to help our engineers maintain and dig through the tens of gigabytes of data in our git repos!


Here at Yelp, we maintain hundreds of internal services and libraries that power our website and mobile apps. Examples include our mission-critical “emoji service” which helps translate and localize emojis, as well as our “homepage service” which… you guessed it, serves our venerable homepage, yelp.com!

Yelp homepage

Don’t Break the Website

Imagine you’re a developer tasked with implementing an exciting new feature. Perhaps you need to change the interface of the “getBusinesses” API endpoint to power a dedicated Find Desserts Near Me button on the homepage. “Piece of cake!” you say to yourself, as you add new parameters to alter the response of the shared resource. In order to not break the rest of the website though, you figure it’s best to see where other code is calling this endpoint so you can create a design that works for all use cases and doesn’t break existing call sites.

We have over 100,000 Python files alone to power Yelp - that’s a lot of code to search through! In order to figure out a safe rollout plan, we need to scan through all of our existing code to understand where and how the method is being called across multiple git repositories. So how can we do this?

Combined, our git repositories amount to tens of gigabytes of data, so cloning everything locally whenever you want to perform a search is not a viable solution. Instead, a scheduled background process, powered by all-repos, keeps clones up to date on a subset of our development machines. Some folks search these clones directly, stringing together xargs, git grep, and similar tools in homegrown bash scripts. For most people, though, a web interface (historically cgit and OpenGrok) is a more convenient go-to tool for browsing and searching code.
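To make the homegrown-script approach concrete, here is a minimal Python sketch of the same idea: run git grep in every checkout under one directory and collect the hits per repository. It is an illustration rather than the scripts we actually use; the function name `grep_repos` and the assumption that all clones sit directly under a single root directory are made up for this example.

```python
import subprocess
from pathlib import Path


def grep_repos(root: Path, pattern: str) -> dict[str, list[str]]:
    """Run `git grep` in each git checkout directly under `root`.

    Hypothetical helper: assumes every repository is already cloned
    as an immediate subdirectory of `root`.
    """
    matches: dict[str, list[str]] = {}
    for repo in sorted(p for p in root.iterdir() if (p / ".git").exists()):
        result = subprocess.run(
            # -n prints line numbers, -I skips binary files.
            ["git", "-C", str(repo), "grep", "-n", "-I", "-e", pattern],
            capture_output=True,
            text=True,
        )
        # git grep exits with 0 when it finds a match and 1 when it finds none;
        # any other exit code is treated as a real failure.
        if result.returncode == 0:
            matches[repo.name] = result.stdout.splitlines()
        elif result.returncode != 1:
            raise RuntimeError(f"git grep failed in {repo}: {result.stderr.strip()}")
    return matches
```

Calling `grep_repos(Path("/path/to/clones"), "getBusinesses")` would return a mapping from repository name to `path:line:text` hits. It works, but every search walks every checkout on disk, which is part of why a dedicated, indexed search tool is so appealing.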

Tools like this are essential to our workflow. And since we’re always on the lookout for ways we can improve the developer experience at Yelp, we want the best-in-class tool for the job!

We first heard about Sourcegraph at a React meetup hosted at Yelp. There was a discussion around how different companies view and search code, and Sourcegraph was introduced as an interesting-looking new search tool. One of the participants pulled up sourcegraph.com to demonstrate its capabilities. We tried a couple of searches using the repo and file regex filters and jumped around the codebase using the Jump to Definition feature. Coming from other tools and homegrown scripts, this was a huge step up in the developer experience! It stood out as a clear win on that front, so we decided to look into it further and see how we could bring Sourcegraph to Yelp.

To see whether the idea was worth pursuing, we first set it up locally. Sourcegraph is conveniently distributed as a Docker image, so we were able to get a proof-of-concept running quickly and share it with a small group of people. The feedback was positive! After using it regularly for a few weeks, we felt that the code browsing experience had improved significantly, and we pushed on to roll it out to the rest of Yelp!

Productionizing Sourcegraph

At Yelp, we run a biannual Hackathon, an opportunity for engineers to “scratch their creative itch” on projects outside of their day-to-day work. It was during one of these Hackathons that we started to productionize Sourcegraph at Yelp, which meant graduating the Sourcegraph instance from running on a local machine to being deployed on our PaaS, PaaSTA. By the end of the three days, we had Sourcegraph ready for the whole company to try out.

The feedback was great, and Sourcegraph was well received. We even won an award!

A coveted Hackathon trophy

Showing off Sourcegraph to Yelpers at the Hackathon “Science Fair”

Once Sourcegraph was up and running at Yelp, we had to decide whether we wanted to invest more in the product to get features such as Code Intelligence. To inform this decision, we surveyed developers on how they liked Sourcegraph compared to the other code search and viewing tools we were using, and the results heavily favored Sourcegraph. 70% of developers rated Sourcegraph as very good, and 51% were already using Sourcegraph exclusively as their preferred code analysis tool. As a result of this feedback, we decided to make Sourcegraph the single supported tool at Yelp for code search and viewing!

Shipping Code Faster with Sourcegraph

Sourcegraph empowers developers at Yelp to ship code faster and more reliably than ever before. Code intelligence features such as Go-to-Definition and Find References are heavily used and help developers understand the plethora of microservices and libraries in our codebase. When making large changes, Sourcegraph is the way to discover how your code is being called throughout the rest of the codebase. Sourcegraph has also been helpful for onboarding new hires and introducing them to the codebase.

Sourcegraph has proven to be one of the most useful tools for making mass code migrations and deprecations. A quick search can help scope out the magnitude of the change and the difficulty of implementing it, while also providing an easy way to track the progress of long-running migrations and deprecations.

Sourcegraph’s GraphQL API has also proved to be useful for tooling we have built in-house. Developers at Yelp have used the Sourcegraph API to power services such as our internal npm registry and flaky test analysis engine, both of which heavily utilize source control metadata.
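As a rough illustration of what talking to the GraphQL API can look like, the Python sketch below builds a code search request and extracts repository and file names from a response. It is not our internal tooling: the helper names, the example host, and the token are placeholders, and the query shape (a `search` field returning `FileMatch` results, sent to `/.api/graphql` with a `token` authorization header) reflects the Sourcegraph API as we understand it and may differ between versions.

```python
import json
import urllib.request

# Assumed query shape; check it against the schema of your Sourcegraph version.
SEARCH_QUERY = """
query ($query: String!) {
  search(query: $query) {
    results {
      matchCount
      results {
        __typename
        ... on FileMatch {
          repository { name }
          file { path }
        }
      }
    }
  }
}
"""


def build_search_request(base_url: str, token: str, search: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a code search."""
    body = json.dumps({"query": SEARCH_QUERY, "variables": {"query": search}})
    return urllib.request.Request(
        base_url.rstrip("/") + "/.api/graphql",
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
    )


def file_matches(response: dict) -> list[tuple[str, str]]:
    """Return (repository, path) pairs for the file matches in a decoded response."""
    results = response["data"]["search"]["results"]["results"]
    return [
        (r["repository"]["name"], r["file"]["path"])
        for r in results
        if r["__typename"] == "FileMatch"
    ]
```

To run a search, pass the request to `urllib.request.urlopen`, decode the body with `json.load`, and hand the result to `file_matches`. Keeping request construction separate from the network call makes the pieces easy to test without a live Sourcegraph instance.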

Daily active users of Sourcegraph at Yelp

Future Work

We are evaluating running Sourcegraph as a clustered deployment. While we are currently able to serve all Sourcegraph usage on a single host, we are looking into running each of Sourcegraph’s services individually. This would allow us to scale up the more resource-intensive services independently of the rest. We are planning to run it on Kubernetes, an initiative that is underway for a lot of Yelp’s infrastructure.

Written By

  • Mark Larah, Software Engineer (@mark_larah)
  • Dennis Coldwell, Engineering Manager
  • Kevin Chen, Software Engineer
