Fixing Thread-Safety Bugs with Nate Berkopec

Can you tell if a Ruby gem is really thread-safe or not? And how do you fix what looks like a thread-safety issue but could be something else entirely?

We had no idea. So we asked Nate Berkopec to help us. Nate is an expert in Ruby performance.

The verdict: nuking all shared global mutable state in your Ruby code is a bad idea if you don’t know what you’re doing!

Listen to this episode to learn:

  • How and why faker-ruby became thread-unsafe, especially for Puma users
  • Questions to ask yourself when trying to debug thread-safety issues
  • Shared global mutable state is not always the villain, and is not the source of all thread-safety issues
  • Nate’s “watch-out” list of things that can cause undesired behavior when running multi-threaded Ruby applications: Constants, Class Variables, and Rack Middleware.

Apple Podcasts | Spotify

About Nate

Nate Berkopec runs Speedshop, a software performance company specializing in Ruby. He’s also the maintainer of Puma, the popular Ruby webserver.

Episode Links

Thanks Valentino Stoll, nfstern02, and Gregg P for sponsoring hexdevs!

Enjoy!

Transcript

[00:00:00] Stefanni: Hey friends, Stefanni here. Welcome back to another hexdevs episode. Before we go to the episode, we want to thank Nate Berkopec for joining us on this episode about building thread-safe Ruby code. As some of you might know, we help out at Faker Ruby, and we had an open issue that was related to thread safety.

[00:00:28] Stefanni: And we were not quite sure how to fix that bug. So we sent a message to Nate and he kindly accepted the invite. And not only did he help us understand the issue and how to fix it, but he also gave us a lesson on building thread-safe Ruby code. Something else that we want to share is that this episode, and actually all of our episodes, are sponsored by Get to Senior, a program that we developed to help experienced Ruby developers take their careers to the next level.

[00:01:05] Stefanni: And we also have our GitHub sponsor page. We have two sponsors, Valentino Stoll and Gregg P, who have been supporting us. So thank you. And if you, dear listener, want to sponsor our work, go to hexdevs.com/get-to-senior or to our GitHub page to see our sponsor page.

[00:01:35] Thiago: This is the hexdevs podcast. I’m Thiago.

[00:01:38] Stefanni: And I’m Stefanni.

[00:01:39] Thiago: Today our guest is Nate Berkopec. Nate makes Rails apps go faster. He’s an expert in Ruby on Rails performance. And he runs a company called Speedshop, a software performance company specializing in Ruby. He’s the maintainer of Puma, one of the most popular Ruby web servers out there.

[00:02:00] Thiago: And he is also the author of the book, the Complete Guide to Ruby on Rails Performance. Thank you so much for joining us, Nate.

[00:02:06] Nate Berkopec: Hey, thank you very much.

[00:02:08] Stefanni: Yeah, we just gave Nate some homework to do. We sent him an email.

[00:02:16] Nate Berkopec: Yeah. What the heck, man? I thought this was a podcast, not like a class. I had to go take,

[00:02:24] Stefanni: Well, we had an issue that was opened on faker not being thread-safe. So we didn’t know exactly what to do, how to solve the issue, and we thought it would be a great opportunity to bring Nate, who’s an expert in all of that, to talk more about why faker is not thread-safe. And then we could also use this example to understand more how to build thread-safe Ruby apps.

[00:02:59] Stefanni: So thank you so much Nate. And sorry to give you homework.

[00:03:05] Stefanni: Yeah. So I would like to get started with a heads up about what caught your attention about the issue. Why is faker not thread-safe?

[00:03:18] Nate Berkopec: Yeah. So you brought me in, uh, and sent me this link to this report. So this was actually after you thought you had fixed a thread-safety issue on faker.

[00:03:33] Nate Berkopec: So if you’re not familiar with faker, faker is like this extremely commonly used gem for, uh, generating fake data, usually in tests. So you can, like, ask for last names, first names, like, anything you can think of, faker can pretty much generate a random version of it, right?

[00:03:55] Nate Berkopec: Mm-hmm. So, Faker also works with locales. So you can have, you know, Japanese data, English data, whatever. Right? So you need to tell it what locale it’s gonna generate that data for. So there is a setting, and that locale setting, I think, has always been, I don’t think this was changed ever, ‘Faker::Config.locale =’. Okay, so that was how you set it. You set that to a symbol, I believe, you know, en-GB or ja or whatever your locale is. So there was a PR included in 2.23 which changed how this worked. And originally, and this was your PR, Thiago? The PR that I’m about to describe, that change, or is it not?

[00:04:53] Thiago: No, I just approved it.

[00:04:55] Nate Berkopec: Oh, you just approved it. Okay.

[00:04:56] Thiago: I did the review and then I’m pretty sure someone else worked on that. Yeah. Oh yeah, it was someone else, but I gave my thumbs up.

[00:05:04] Nate Berkopec: It’s still your fault. I see. I can, yeah, for sure. Oh, everybody’s fault.

[00:05:09] Nate Berkopec: Blame the reviewer. Um, so the way that this originally worked was Faker::Config, the module, had a, uh, locale class variable. So you would set the class variable with locale equals, and you would read it with locale, right? So that works. That’s fine. You know, it means that all threads and everything, like if you create a new thread, it’s still gonna read that same value, right?

[00:05:41] Nate Berkopec: So that all works. But if more than one thread tries to write that value, we start to have problems. Right? So somebody basically made an issue that was like, I use faker in QA. And in QA we’re sending different requests that have different locales. And so faker is, like, setting the locale in one thread, and then a different thread is reading that locale, which has now been overwritten, and we don’t want that to happen.
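The overwrite Nate describes can be sketched in a few lines. `FakeConfig` below is a hypothetical stand-in for a gem's config module, not Faker's actual source; it only assumes a locale stored in a class variable, as described above.

```ruby
# Hypothetical stand-in for a gem config module; not Faker's real source.
module FakeConfig
  @@locale = :en

  def self.locale
    @@locale
  end

  def self.locale=(value)
    @@locale = value
  end
end

# "Request A" sets the locale it wants.
FakeConfig.locale = :ja

# "Request B" runs on another thread and overwrites the shared value.
Thread.new { FakeConfig.locale = :"en-GB" }.join

# Back in "request A": the locale it set has been replaced.
seen_by_a = FakeConfig.locale
puts seen_by_a.inspect
```

There is one `@@locale` for the whole process, so whichever thread wrote last wins for everyone.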

[00:06:13] Nate Berkopec: So basically the request was that faker should be thread safe. But to rephrase this as more of like a story, the desired behavior is, and this is what I’m a little unclear on, so you can correct me if I’m wrong about this. I think the desired behavior is that we should set a default locale, uh, in configuration, like when we start the app.

[00:06:38] Nate Berkopec: And if we don’t say anything else, this will be the default locale, and then threads should be able to, in a thread-safe way, change the locale on a per-thread basis. Okay, so if this feature was available, you could, in Puma or any other multi-threaded web server, set the locale in maybe a Rack middleware or something, I don’t know, like in a before action or in an around action in Rails. You would change the locale in faker and then do whatever work you wanted to do, you know, run the rest of the action, and

[00:07:16] Nate Berkopec: in an ensure block, you could say, set the locale back to what it was before. And, um, that would be how you could change that locale. And then you could have, like, per-request locales in a thread-safe way. Is that correct? That sounded like the desired behavior to me.
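The "set it, run the action, restore it in an ensure block" pattern might look like the sketch below. The helper name `with_thread_locale` and the key `:demo_locale` are made up for illustration; in a Rails app the same shape would sit inside an `around_action` callback.

```ruby
# Hypothetical helper; the key name :demo_locale is invented for this sketch.
def with_thread_locale(locale)
  previous = Thread.current[:demo_locale]
  Thread.current[:demo_locale] = locale
  yield
ensure
  # Runs whether the block returns normally or raises.
  Thread.current[:demo_locale] = previous
end

results = {}

with_thread_locale(:ja) do
  results[:inside] = Thread.current[:demo_locale]
end
results[:after] = Thread.current[:demo_locale]

# The previous value is restored even when the block raises.
begin
  with_thread_locale(:fr) { raise "boom" }
rescue RuntimeError
  results[:after_error] = Thread.current[:demo_locale]
end

p results
```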

[00:07:32] Thiago: Yeah, that’s the desired behavior. I think we had that before, but it wasn’t thread safe. And then someone wanted to have different threads and not have one overriding the locale on the other thread, and then that was fixed. But then the other person said, hey, but now your fix broke my setup, where I run this, I guess in production, I think. Mm-hmm.

[00:07:53] Thiago: When someone else runs it in production. And now if I use faker in production and I set the locale, it only exists, like, in one of the requests and then the others don’t see that anymore. And so it’s kind of like one fix broke someone else’s setup, and then we have to fix both things now.

[00:08:13] Nate Berkopec: Right. So the change that was made to try to implement this story was that, uh, locale is no longer just a class attribute, uh, a class variable. It sets Thread.current[:faker_config_locale]. So we’re writing and reading Thread.current. Well, we thought it was a thread variable, and then, you know, you learn later in the issue, as people read the docs,

[00:08:52] Nate Berkopec: it’s actually fiber-local. So that was the change. The locale instance, uh, I keep saying instance, it’s not a class instance variable. It’s a class variable. That class variable locale was removed, completely removed. And there’s some other stuff in here, which I actually haven’t looked at.

[00:09:08] Nate Berkopec: You have some other, like, Thread.current use. Like, you’ve removed basically several class variables and replaced them with Thread.current, is kind of the theme here. Uh, this locale setting is kinda the one I’m just going to talk about, cause it’s the one I looked at and it’s a good example of everything else here.

[00:09:26] Nate Berkopec: So then the bug report comes in, and this person’s like, this broke something else for me. So, uh, you set up a Rails app in Puma or another threaded web server, you set Faker::Config.locale in an initializer, make a request, and then you get the fake data for the default locale for faker, which is en. So the Faker::Config.locale setting broke for all Puma users using more than one thread.

[00:09:58] Nate Berkopec: Um, or actually I think it should be any thread at all. So basically it broke it for Puma completely. So let’s talk about why that happened. Cause I think if you understand why this happened, then, like, the fix here becomes more obvious. How Puma works: so Puma, when it starts up, it has one process and one thread, uh, when it starts itself.

[00:10:23] Nate Berkopec: And depending on the, uh, mode you have it set to, it may or may not start other processes. But the important thing here is that when Puma processes requests, it sends the request into what we call the, um, our thread pool, which actually has a different name? Uh, no, I guess we just call it the thread pool.

[00:10:45] Nate Berkopec: Yeah. So it’s the Puma thread pool. And, um, that thread pool has anywhere from one to X number of threads where, you know, you set it to whatever you want. So the application request is always processed in a different thread than the thread that Puma was started with. So we start the Puma server, we initialize your application, run all the initializers, and then we create the threads for the thread pool.

[00:11:17] Nate Berkopec: Okay? So that timing is important here, because the initializer is run before we create the, uh, application threads that actually run the application requests. So that Faker::Config.locale is called before those, uh, thread pool threads are created. The behavior of Thread.current, uh, sorry, not Thread.current, but the, this is the bracket method

[00:11:46] Nate Berkopec: on a thread instance, an instance method on a thread, you could say. So this is accessing thread, no, fiber-local variables, as, uh, the documentation points out. So, um, in case you don’t know, all threads have fibers. Fibers are a lower-level concurrency unit than a thread. So, um, all processes have at least one thread, and all threads have at least one fiber in Ruby.

[00:12:14] Nate Berkopec: Okay? So Puma doesn’t do this, and, uh, your application might create new threads, uh, create new fibers, I should say. But Puma doesn’t actually create new fibers. So for our purposes in this conversation, it is a thread variable, because we don’t have multiple fibers here.
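The fiber-local versus thread-local distinction can be seen with core Ruby alone. This sketch contrasts the bracket method (`Thread#[]`) with `Thread#thread_variable_get`; the key `:locale` is just an example name.

```ruby
# Thread#[]= stores a fiber-local value, despite living on Thread.
Thread.current[:locale] = :ja

# thread_variable_set stores a value shared by every fiber in the thread.
Thread.current.thread_variable_set(:locale, :ja)

# Read both from inside a new fiber running on the same thread.
in_fiber = Fiber.new do
  [Thread.current[:locale], Thread.current.thread_variable_get(:locale)]
end.resume

p in_fiber
```

Inside the new fiber the bracket lookup returns `nil`, while the thread variable is still visible. As noted in the conversation, this only matters when something actually creates extra fibers.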

[00:12:37] Nate Berkopec: So the thing is, when you create these fiber-local variables with, uh, the bracket method, they’re not inherited. So there’s a really great reproduction in this issue, um, when you scroll down a bit, where someone made, like, the minimally required reproduction. And all they did was, in the current thread, set the, uh, faker locale, then create a new thread inside of that thread, set it again, and then try to read what that was.

[00:13:08] Nate Berkopec: And, uh, it reproduced the bug in 10 lines of code. Really, first of all, as an open source maintainer, that’s exactly what we wanna read, right? We wanna read the 10-line reproduction. So good job to that person that wrote that. So these variables are not inherited. So, like, Puma creates the, uh, thread pool, like, the threads that actually run the app.

[00:13:31] Nate Berkopec: And now this locale fiber variable is no longer set. So it goes back to the default value of English. So that was the source of the bug here. So does that all make sense? I’ve been talking a lot now. I’m done talking.
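This is not the reproduction from the issue, only a minimal sketch of the same effect using core Ruby. The key `:demo_locale` and the `:en` fallback are assumptions that mirror the behaviour described above.

```ruby
# What an initializer would do on the thread that boots the app.
Thread.current[:demo_locale] = :ja

# A server's pool threads are created later; each one starts with no value.
seen_in_worker = Thread.new { Thread.current[:demo_locale] }.value

# Hypothetical reader that falls back to a default when nothing is set.
locale_in_worker = seen_in_worker || :en

p seen_in_worker
p locale_in_worker
```

The value set on the boot thread never reaches the worker thread, so the worker ends up with the fallback.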

[00:13:49] Thiago: Yeah, for sure. That makes a lot of sense.

[00:13:52] Stefanni: Mm-hmm. Yeah. That, that was Matheus, uh, he has been helping a lot with, with this issue, like I am learning.

[00:14:04] Thiago: Yeah. I’m curious to know about, because you mentioned that the fiber-local variable is not inherited, right, by the other threads. Mm-hmm. But then if we talk about the class variable, right? If I set that class variable in the initializer, right, it would be available to all the other threads. Is that, yes, how it works?

[00:14:29] Nate Berkopec: Yes. Yep. And this is kind of the, uh, to me, like the, uh, fine-grained part of this issue. When it comes to thread safety, it’s just like people are like, okay, we gotta get rid of all shared mutable state. Shared mutable state is bad, get rid of it. And that’s, like, what the PR did, right?

[00:14:51] Nate Berkopec: Like it nuked the shared mutable state. No more class variables don’t do it. But like you actually do want shared mutable state. In this story, like you want to be able to change the locale for all threads at a particular point in time, right? You want to do that during initialization when there aren’t, when, when you know there aren’t multiple threads, probably trying to read this value and do stuff with it.

[00:15:21] Nate Berkopec: Like there’s not, like you, you, you wanna do that during a time when it’s probably safe to do it and then later you want the, uh, private state you want thread private state, right? So it’s kind of like a, uh, a little bit complicated there, where like, you want both things, right? Like you want to be able to override this value for all threads.

[00:15:42] Nate Berkopec: You just wanna be able to do it at a particular time. Um, so the original PR was, uh, thread safe, but just also didn’t work. So you, you, you gained thread safety while breaking the feature. Yeah.
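One way to get "both things", a shared default written at boot plus a private per-thread override, is sketched below. `DemoConfig` and its key name are hypothetical; this is not presented as the fix that went into Faker.

```ruby
# Hypothetical config module: global default plus per-thread override.
module DemoConfig
  @default_locale = :en

  class << self
    # Written once during initialization, read by every thread.
    attr_accessor :default_locale

    # Per-thread override wins; otherwise fall back to the shared default.
    def locale
      Thread.current[:demo_config_locale] || default_locale
    end

    def locale=(value)
      Thread.current[:demo_config_locale] = value
    end
  end
end

# Initializer, running on the boot thread.
DemoConfig.default_locale = :ja

worker_a = Thread.new do
  DemoConfig.locale = :"en-GB"   # override for this thread only
  DemoConfig.locale
end
worker_b = Thread.new { DemoConfig.locale }   # no override: sees the default

p worker_a.value
p worker_b.value
```

The shared default is only safe here on the assumption that it is written during boot, before the worker threads exist, which is the timing described in the conversation.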

[00:15:56] Thiago: Yeah. It’s interesting that maybe thread safety is not always the goal. Depending on your feature, maybe you want to be able to share state between threads, and maybe if you want to mutate the global state, then you have to worry about thread safety and how you would approach that.

[00:16:17] Thiago: But in this case, you broke the feature by doing that. Right. Which is kind of interesting.

[00:16:23] Nate Berkopec: Yes, and sometimes it’s also a little bit complicated here because, in this story, right, you don’t actually really care about the thread safety of setting this locale for all threads. Like, you could write this in such a way that, like, the class variable access, the writing and reading, is thread safe.

[00:16:51] Nate Berkopec: So like we could use, um, something from concurrent-ruby, for example. It’s a library, a Ruby gem that’s used in Rails. And, um, we could set this up so that it’s thread safe to change the locale for all threads, but you don’t really actually care about that because the only time you’re gonna do that is during app initialization.

[00:17:11] Nate Berkopec: And so, or we could write our own thing with a mutex or whatever to do this, but. You don’t really actually care about the thread safety there, because that happens during initialization where, where there’s only one thread running anyway. So like you could add that to like, you know, check all your thread safety boxes here.

[00:17:25] Nate Berkopec: But what you really wanted was, like, private thread state. Like, you want to be able to change this value on a per-thread basis and not have it affect other threads. Okay. So to me that’s, like, a little bit different than thread safety. There’s a thread-safe way to implement this that would still not satisfy the story, if we just took the original behavior of the locale class variable and made that access, um, thread safe.

[00:17:57] Nate Berkopec: What would happen is every request would change the locale for all other running requests. So, like, halfway through rendering a response, the faker locale could be changing because other threads are changing that locale value. You don’t want that, right? Like, you don’t want global state, you want per-thread state.

[00:18:18] Nate Berkopec: So, like, thread safety is more complicated than just “is it thread-safe or not”, depending on what, uh, behavior you desire.
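The "thread-safe but still wrong for this story" case can be sketched with a `Mutex` around a global value. `LockedConfig` is hypothetical, and the two queues exist only to force a deterministic interleaving of the two simulated requests.

```ruby
# Hypothetical: a global locale whose reads and writes are serialized.
module LockedConfig
  LOCK = Mutex.new
  @locale = :en

  def self.locale
    LOCK.synchronize { @locale }
  end

  def self.locale=(value)
    LOCK.synchronize { @locale = value }
  end
end

request_a_started = Queue.new
request_b_done = Queue.new

request_a = Thread.new do
  LockedConfig.locale = :ja
  request_a_started << true
  request_b_done.pop      # "halfway through rendering", request B runs
  LockedConfig.locale     # read the locale again
end

request_b = Thread.new do
  request_a_started.pop
  LockedConfig.locale = :fr
  request_b_done << true
end

request_b.join
seen_by_a = request_a.value
p seen_by_a
```

Every individual read and write is synchronized, yet request A still ends up with request B's locale, because the state is global rather than per-thread.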

[00:18:26] Stefanni: Yeah, and I also think it’s really easy to. Create those bugs because I don’t think most of us are aware of those details.

[00:18:39] Nate Berkopec: Um, I think in general, this doesn’t happen in application development because most of this. It’s just like very uncommon to need to write or need to use class variables, for example. So the common sources of of thread safety issues, generally you don’t reach for these like tools that cause these problems in application development, but it does happen all the time in library development. So the most common causes of thread safety issues are class variables. Global variables, which used to be a thing that people did, but I don’t know, mostly people don’t even reach for them anymore and just use constants. Um, constants and uh, rack middleware. So those three things. Of those three things, really, constants is the only thing I see people making mistakes with in application development.

[00:19:34] Nate Berkopec: But when you’re doing libraries, class variables and in rack middleware, it’s very common to have those, those things. And so you can really. You need to know more about threadsafety when you’re, when you’re writing libraries, I think, than, than writing rails applications, for example. Mm-hmm.

[00:19:49] Thiago: For sure. I’m curious about the cases you mentioned about, like, constants being a problem. Is that because people try to modify, uh, the value of a constant, yes, at runtime?

[00:20:03] Nate Berkopec: Yeah, exactly. Like, um, setting a constant to, like, a collection, like an array or a hash. Those accesses are not, uh, thread safe. Even, uh, Samuel Williams, the maintainer of, uh, Falcon, the web server, has also had this demo where he’s shown that hash access is not, uh, thread safe.

[00:20:26] Nate Berkopec: Just hash access and writing is not thread safe, you can even, like, corrupt the hash, uh, end up with all these crazy behaviors. So yeah, setting constants to collections and then modifying those collections. So there was like a trend, uh, I don’t know, like two years ago, no, more than that, I don’t know, four or five years ago, to, like, freeze constants.

[00:20:50] Nate Berkopec: And we used to do this mostly as a memory-saving measure. So, like, when you freeze an object, Ruby internally allows everyone that uses that constant to, like, point to the same object. So we used to, like, just put a lot of things in constants and freeze them. Uh, now there’s, like, this RuboCop rule that tells you to freeze everything you put into a constant.

[00:21:10] Nate Berkopec: And freezing is nice from a memory usage perspective, but it’s even better from a, uh, thread safety perspective, because now you get an error, right, if someone tries to modify the object. You can still get these problems though with freezing, because you can satisfy the RuboCop rule by calling freeze on an array inside of a constant.

[00:21:32] Nate Berkopec: But if you have an array inside of that array, RuboCop won’t say anything. And you still have a thread-safety issue because you’re modifying this unfrozen array inside of the constant. So anything that’s accessible from that constant really can lead to a thread-safety issue. So I think I see that issue sometimes in app development where people create, um, caches that they want to use. That’s kind of the most common thing: they put a cache inside of a constant and then they end up with a thread-safety issue there. Or the other one I see is, uh, database connections. So if you put a database connection inside of a constant, for example, if you just do, like, REDIS = Redis.new, the issue you’re gonna get there is everybody accessing that constant is getting the same exact database connection.
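On the nested-array point: `freeze` is shallow, which a short sketch makes concrete. The constant names here are examples only.

```ruby
# Satisfies a "freeze your constants" lint rule, but only the outer array is frozen.
ALLOWED = [:en, [:ja, :fr]].freeze

# Mutating the outer array raises.
outer_error =
  begin
    ALLOWED << :de
    nil
  rescue FrozenError => e
    e.class
  end

# The inner array was never frozen, so this mutation silently succeeds.
ALLOWED[1] << :de

# Freezing each level closes the gap.
DEEP = [:en, [:ja, :fr].freeze].freeze

p outer_error
p ALLOWED
p DEEP[1].frozen?
```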

[00:22:21] Nate Berkopec: So if you have two threads accessing the same database connection, you can end up with issues where one thread gets the response for another thread. Um, so you don’t wanna do that and uh, there’s like a lot of gems that help you with this.

[00:22:34] Nate Berkopec: But basically you need to set it up so that each thread is getting its own connection out of the, uh, connection pool. Um, so those are the most common ones I see in day-to-day app development.
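To illustrate "each thread gets its own connection out of the pool", here is a deliberately tiny pool built on core Ruby's thread-safe `Queue`. `TinyPool` and `FakeConnection` are invented for this sketch; a real application would more likely use an existing pooling gem than hand-roll one.

```ruby
# Stand-in for a real client object such as a database connection.
class FakeConnection; end

# Minimal pool: connections live in a thread-safe Queue.
class TinyPool
  def initialize(size)
    @available = Queue.new
    size.times { @available << yield }
  end

  # Check a connection out for the duration of the block, then return it.
  def with
    conn = @available.pop   # blocks until a connection is free
    yield conn
  ensure
    @available << conn if conn
  end
end

POOL = TinyPool.new(2) { FakeConnection.new }

checked_out = Queue.new
release = Queue.new

threads = 2.times.map do
  Thread.new do
    POOL.with do |conn|
      checked_out << conn
      release.pop           # hold the connection until told to let go
    end
  end
end

# Both threads are holding a connection at this point.
held = [checked_out.pop, checked_out.pop]
2.times { release << true }
threads.each(&:join)

puts held[0].equal?(held[1])
```

Because a connection is removed from the queue while in use, two threads can never hold the same one at the same time, which is what prevents one thread from reading another thread's response.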

[00:22:44] Thiago: That’s interesting. A good rule of thumb, maybe if you’re working with constants and you’re trying to do something weird or adding some hashes or arrays to the constant, you gotta be careful what you’re doing.

[00:22:59] Thiago: Yeah. Or a database connection. Yeah. Cool. Mm-hmm.

[00:23:04] Stefanni: And, like, since you are talking about those common things that you see happening in development, um, is there something that we as developers could change in how we see things when we are implementing them? Like, how can we start paying more attention to those potential thread-safety, uh, issues?

[00:23:33] Stefanni: Because like you mentioned, it’s not something that we do every day, for example. Nowadays, I think it’s more common for us to know, oh, this is gonna have an N+1 query or something like that. So how can we start changing our ways of working to start identifying those issues? Yeah.

[00:23:53] Nate Berkopec: My biggest, um, recommendation is always to make your tests multi-threaded.

[00:24:00] Nate Berkopec: So, uh, if you are using Minitest, um, you can run each test inside of its own thread. It’s called Minitest, oh man, now I’m gonna forget, uh, let me get this. Yeah, parallelize is what it’s called. So, um, if you require minitest/parallel on a test, uh, or no, include it. Uh, now I’m not gonna remember it, but look it up.

[00:24:30] Nate Berkopec: Um, but yeah, you can set up Minitest so it runs each test in a different thread. So that covers your unit tests and makes all your unit tests multi-threaded. Um, so if you’re just running Minitest, uh, I suggest turning that on. If you’re using RSpec, you’re outta luck. Sorry. Uh, RSpec isn’t multi-threaded, never will be.

[00:24:50] Nate Berkopec: So you’re stuck. Uh, your only option for multi-threaded tests is to convert them to Minitest. Um, so, you know, good luck.

[00:24:59] Stefanni: Oh, I was gonna say, okay. What about RSpec? Yeah.

[00:25:03] Nate Berkopec: Uh, for integration and system tests: so, system tests start a Puma server, or integration tests, you know, you can set up to do whatever you want, right?

[00:25:15] Nate Berkopec: But you should set up your integration tests to start a Puma server and run that Puma server with multiple threads. So that will also potentially flush out, uh, threading bugs. Now this is also gonna make your test suite less stable. Like, with a threading bug, usually you can’t cause it, like, a hundred percent of the time, so you’re gonna start getting flakes, probably, where they’re caused by threads.

[00:25:44] Nate Berkopec: Like, most of the time you can’t just write a test that always triggers the thread bug. So, um, it will probably make your tests less stable, but like, you know, that’s kind of the price you’re gonna pay here. That to me is, like, the best possible thing: make your tests multi-threaded.

[00:26:01] Nate Berkopec: So you are actually testing, uh, thread safety. Second thing is like all you can really do is look for those three different sources of, of threading bugs that I talked about. Anytime you’re writing your own rack middleware. So the, the, the threadsafety issue here is that, um, there is only one rack middleware stack for, for an application, right?

[00:26:25] Nate Berkopec: So the objects that are created to actually run your Rack middleware, there’s only one of those for every application, um, or every, you know, uh, process. So your application runs in different threads, but they’re all using the same Rack middleware objects. So if you have an instance variable inside of a Rack middleware, you can end up with a thread safety issue.

[00:26:50] Nate Berkopec: The fix is actually really easy. There is a, uh, middleware freezer that, uh, Samuel Williams wrote called rack-freeze, and it basically ensures that your Rack middleware are thread safe by freezing all of your instance variables. Um, and so you can’t possibly cause a problem and it’ll blow up if you try to do thread-unsafe things.
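The shared-middleware problem, and why freezing exposes it, can be sketched without the rack gem, since a Rack-style middleware is any object that responds to `call(env)`. The classes and the `"demo.user"` key are hypothetical, and the sketch uses plain `Object#freeze`; it does not attempt to show how rack-freeze itself works.

```ruby
# One instance of each middleware serves every request in the process.
class UnsafeTagger
  def initialize(app)
    @app = app
  end

  def call(env)
    @user = env["demo.user"]   # BUG: per-request data on the shared instance
    status, headers, body = @app.call(env)
    [status, headers.merge("x-user" => @user), body]
  end
end

class SafeTagger
  def initialize(app)
    @app = app
  end

  def call(env)
    user = env["demo.user"]    # local variable: nothing shared between requests
    status, headers, body = @app.call(env)
    [status, headers.merge("x-user" => user), body]
  end
end

app = ->(_env) { [200, {}, ["ok"]] }

# Freezing the instance turns the unsafe write into an immediate error.
frozen_unsafe = UnsafeTagger.new(app).freeze
unsafe_error =
  begin
    frozen_unsafe.call("demo.user" => "alice")
    nil
  rescue FrozenError => e
    e.class
  end

# The safe version never writes to itself, so it works while frozen.
safe_response = SafeTagger.new(app).freeze.call("demo.user" => "alice")

p unsafe_error
p safe_response
```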

[00:27:18] Nate Berkopec: So take a look at that for Rack middleware. For constants, I think probably everybody should audit constants created in initializers, um, for this issue that I talked about. Basically, make sure you know what you’re putting into constants, uh, is not a collection that’s going to be modified and is not, uh, you know, just a straight-up database connection.

[00:27:43] Nate Berkopec: For database connections, there’s a gem called connection_pool. This is, uh, I think still maintained by Mike Perham. Yeah, so Mike wrote it. I think he wrote it originally anyway, but it’s mperham, from Sidekiq: M P E R H A M slash connection underscore pool. This is like a generic connection pooler that works with any, uh, underlying database connection gem.

[00:28:07] Nate Berkopec: So you can get thread-safe connection pools that will work with, um, with threads. So you would assign that to a constant instead of, uh, just like Redis.new. And then for class variables, those are a little bit easier because hopefully you can just find them by looking for the @@, like, you know, control-F your code base for that.

[00:28:30] Nate Berkopec: But, um, in the original, uh, PR actually you had class << self and then it was @locale =. So, like, since you can always do that, it’s kind of hard to just grep through a code base for class variables. Um, but, uh, if you see a file that has a class variable in it, you know that is shared global mutable state. So, you know, it’s only thread-unsafe when someone tries to write to it from multiple threads.

[00:28:59] Nate Berkopec: So just because these exist doesn’t mean, I think, that you should be replacing them all the time. Often, like, one thing that, uh, happens is someone needs to write a value to a, uh, class variable, and then multiple threads want to read that value. If every thread will just write the same thing

[00:29:26] Nate Berkopec: to the class variable, like initialize a default value, and then everybody reads the same value after that, that’s not really a thread safety issue, because every thread is trying to write the same value, right? So it doesn’t matter that they could possibly access the, uh, class variable at the same time, because they’re gonna try to write the same thing.

[00:29:46] Nate Berkopec: So who cares. So it’s like there can be shared global state without there being a thread safety issue, but I think you have to be aware every time you see a class variable or class instance variable. Um, think about what is trying to write to this, and when is it trying to write to this, and could there possibly be an issue there?
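The two spellings of class-level state mentioned here, and the benign "everyone writes the same value" case, can be sketched as follows. `Settings` is a hypothetical class.

```ruby
class Settings
  # Class variable: easy to find by searching for "@@".
  @@shared = :en

  def self.shared
    @@shared
  end

  class << self
    # Class-level instance variable: same kind of shared state, no "@@" to grep for.
    attr_accessor :locale
  end
end

Settings.locale = :en

# Every thread writes the same value, so the order of writes cannot matter.
8.times.map { Thread.new { Settings.locale = :en } }.each(&:join)
final = Settings.locale

p final
p Settings.class_variable_defined?(:@@shared)
p Settings.instance_variable_defined?(:@locale)
```

Both forms are shared by every thread in the process; whether that is a problem depends on who writes, when, and whether the written values can differ.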

[00:30:06] Stefanni: Yeah. I like the questions to ask before you go out there and try to replace everything.

[00:30:14] Nate Berkopec: Because this is complicated, right? Like, yeah. Especially with class variables. Um, you know, writing the mutex.synchronize stuff, like, you’ve probably never written anything with a mutex before. You know, pulling something out of concurrent-ruby that you’ve never used before and using, like, a data structure out of concurrent-ruby, like, it’s not the easiest thing in the world.

[00:30:35] Nate Berkopec: So, um, you know, definitely try to avoid it if you can. Um, so yeah, I mean, and people smarter than all of us have done that and then made a mistake anyway, so yeah, it’s, uh, it’s not easy stuff.

[00:30:50] Thiago: Maybe one thing that exacerbates the problem is that we are very used to thinking that in Ruby it’s just one thread and you don’t have to use other threads or anything like that, cuz compared to maybe other languages, when we say, oh, in Java, be careful with static variables and things like that.

[00:31:11] Thiago: But in Ruby, we don’t talk about that a lot, uh, about concurrency. At least in Rails. Like, you don’t have to worry too much about that: a request is one thread and you don’t have to worry about those things. Yeah. But then when you run into those weird bugs, you’re not sure what to do. You just think, oh, I don’t know what that is.

[00:31:34] Thiago: I have no idea why this is happening. But if you try again, you’re not gonna have the problem. And so I’m curious about what kind of things people can do so that when they run into a weird problem, they can point and say, oh, maybe this is a thread safety bug instead of something else.

[00:31:55] Thiago: So maybe, like, some strategies or some characteristics of these sorts of bugs.

[00:32:01] Nate Berkopec: It’s pretty easy. Yeah. Like, every time I hear, uh, oh, request A got the response for request B. So every time I hear, like, oh, someone is getting someone else’s responses. And that’s obviously a security issue, right?

[00:32:22] Nate Berkopec: That’s kind of how this usually comes up: like, oh no, somebody got authorized to someone else’s account because they got someone else’s cookie header, something like that. Right? So anytime I hear, uh, user A got the response for user B, it’s like, thread safety issue, immediately.

[00:32:39] Nate Berkopec: So that, that’s probably the most common one at the application level that I, that I hear about. The Faker issue specifically, I think maybe you, we kind of all knew because it was like, oh, it was this change. Or like someone realized it was only in Puma. I guess that’s the other, yeah, if it, if it, if switching to Unicorn fixes the issue, then you know it’s a thread-safety issue.

[00:33:02] Nate Berkopec: Right. So, um, cuz in unicorn you don’t even have, there’s no, um, like for example, so I talked about how Puma starts essentially even in the, in the, in the simplest case, it has to start two threads. It has the thread that starts Puma and boots your app, and then the thread that actually runs the application.

[00:33:21] Nate Berkopec: Right. Technically we’re running your application single-threaded there. Like the actual, uh, every request that comes in to that puma process will always be processed by the same thread. So yes, like it technically is single threaded, but we kind of have this like thread issue, right? With faker even in that scenario because that fiber local variable was not inherited.
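The "not inherited" behavior described here can be seen in plain Ruby. The sketch below is an assumption-laden illustration, not Puma's or Faker's actual code: the main thread stands in for the thread that boots the app, a second thread stands in for the one that serves requests, and the `:locale` key and the `locale_seen_by_new_thread` helper are made up for the example. In Ruby, `Thread#[]` is fiber-local storage and `Thread#thread_variable_get` is thread-local storage, and a newly created thread starts with neither populated.

```ruby
# The main thread stands in for the thread that boots the app.
Thread.current[:locale] = :"pt-BR"                    # fiber-local (Thread#[]=)
Thread.current.thread_variable_set(:locale, :"pt-BR") # thread-local

# A separate thread stands in for the one that serves requests.
# Returns what that thread can see of the values set above.
def locale_seen_by_new_thread
  Thread.new do
    {
      fiber_local: Thread.current[:locale],
      thread_local: Thread.current.thread_variable_get(:locale)
    }
  end.value
end

# Both values are nil: a new thread inherits neither kind of storage.
p locale_seen_by_new_thread

# The thread that set the value still sees it.
p Thread.current[:locale]
```

So configuration written to `Thread.current` during boot is invisible to the thread that handles requests, even when only one thread ever runs application code.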

[00:33:48] Nate Berkopec: If moving to Puma causes the issue and, and getting off of it fixes it, then you know you have a thread safety issue. Um, but yeah, I think generally like any issue where state is kind of correct for one person but not correct for someone else, and it’s flaky and random, uh, then your, your thread safety issue spider sense should be tingling.

[00:34:17] Thiago: Yeah. It’s not Puma’s fault either. It’s just the way Yeah.

[00:34:21] Nate Berkopec: I mean, it’s your fault, right? Puma’s thread-safe. You’re not. So it wasn’t me, man. Yeah.

[00:34:31] Stefanni: Yeah, yeah. That was, that was a hard one. And we were like, I think we should ask someone who knows how to fix this issue. Yeah. Cause we, we were not sure. And I think it’s also something that I want to see more: people, uh, well, developers saying that they don’t know things, right, and asking for help.

[00:34:55] Stefanni: So I thought this would be a, a good way to, to do that.

[00:35:03] Nate Berkopec: Yeah, and like when I started maintaining Puma, I knew nothing. So like mm-hmm. In 2016 when Evan Phoenix, the original author, like, asked me to start maintaining Puma, like I didn’t know anything about threads, thread safety, or all the other, like, kind of specific things that Puma needs to run. Like, um, knowing about sockets, TCP, UDP, like the deep specifics of HTTP, um, C extensions.

[00:35:36] Nate Berkopec: Like, I didn’t know any of that when I started maintaining Puma. And, uh, now I, I know a lot more, but um, when I started I didn’t know anything. So like, we all start not knowing anything. So, um, you know, everyone that you ask a question about thread-safety or whatever, at one point they didn’t know the answer to that either.

[00:35:56] Nate Berkopec: So yeah, I don’t think you should feel, uh, intimidated about asking, uh, asking questions like that.

[00:36:02] Thiago: And it’s also a cool opportunity for contributors. So for example, Matheus, who’s taking a look at that issue, he said, oh, I don’t know anything about threads but I’ll try to learn something. And then he learned a couple of things and shared.

[00:36:19] Thiago: And so it’s just a nice way to, to learn more because you don’t really have to know before you get started on an issue. And then eventually if you continue working on that, you, you’re gonna figure it out and then we can have nice conversations about that. It’s kinda cool. Yeah.

[00:36:36] Nate Berkopec: Yep. And I think, um, one thing that, I’ll bring it up again that Matheus did that was just like really important for that was to get the minimal reproducing case.

[00:36:46] Nate Berkopec: So when he had that 10 line example that reproduced the issue, that’s so important for learning because then you have this little experiment that you can, that you can try things on. So you can say, oh, if I change this over here, how does that change the behavior in my, in my example? Um, if you don’t have the minimal reproducible example, it’s much more difficult to learn because you don’t have a little tool that you can change things on and see what happens.

[00:37:11] Nate Berkopec: So getting to that minimal example was so important, I think, for where he went in the rest of the issue. So, um, if I have any advice with that, it is to, to emulate that behavior, to, you know, try to get to the 10 line example that reproduces the problem.

[00:37:29] Stefanni: Yeah, I love that. It’s, it’s a very underrated way to get started.

[00:37:37] Stefanni: With not only contributing to open source, but I think almost anything related to development, let’s say. Because you, you get to just try. You’re not trying to fix anything. You’re just trying to find what is going on. And you learn a lot about things.

[00:37:53] Nate Berkopec: In Puma, on GitHub, uh, we have a needs repro label, uh, as in needs reproduction, and I put that on any issue where the original poster has not provided a similarly simple example that can just be run and, and uh, and reproduce it.

[00:38:13] Nate Berkopec: Um, in Puma, if you’d like to contribute. Um, that’s one way to do it is you go to the needs repro label and just try to reproduce people’s issues. And I can tell you as a maintainer, it’s also helpful if you can’t reproduce it and you leave a comment and tell us, you’re like, Hey, I looked at this for three hours.

[00:38:31] Nate Berkopec: I couldn’t reproduce it. That is super helpful for me because now I know, okay, someone tried to do this for three hours and they still couldn’t get it. Maybe this is not reproducible, maybe this isn’t actually a problem with Puma. So it’s, it’s really helpful for an open source project to find issues which currently don’t have a reproducible case and to try to, try to find one.

[00:38:54] Nate Berkopec: So I highly, highly encourage that.

[00:38:56] Stefanni: Yeah, and, and just to emphasize, I don’t think, well, I believe I can say that for you, but correct me if I’m wrong. Mm-hmm. We’re not saying that if you don’t know how to reproduce, you can’t report the bug, but, oh, yeah. If, right. But if you,

[00:39:13] Nate Berkopec: I mean, you have a bug, right? So Yeah. You should report it.

[00:39:15] Stefanni: Yeah. So everyone can contribute on their ways. Um, yeah. But yeah, reproducing is really great. I think that’s also how we got started with Ruby on Rails, and we actually copied the reproduction script for Faker. Oh. Which is really, really helpful.

[00:39:32] Nate Berkopec: Yeah. If, uh, if anyone listening is not aware of that, there’s a, like the, the Rails bug reproduction script is really, uh, very good and I think those are available if you go to like the Rails contributing guide on Rails guides and then like you kind of go down to the bug report section, you can find the links to all of them and it’s really cool and it’ll show you kind of in 30 lines.

[00:39:54] Nate Berkopec: How they set up a Rails app to reproduce an issue in the most minimal way possible, depending on what part of rails you’re reporting the bug to. And it’s a really good example of how to make a, a minimal reproducible case, not only for a Rails app, but for really any, any Ruby project. Um, so yeah, I’ve done the same thing. I’ve copied that script multiple times.

[00:40:15] Thiago: I guess even at your own job, if you’re not contributing or anything, maybe there’s a way to use those kinds of scripts to reproduce a bug. So you don’t have, um, what is it called again, when you, you don’t want the bug to appear again?

[00:40:31] Stefanni: Ah, regress regression. Yeah.

[00:40:32] Thiago: Regressions. Mm-hmm. You don’t want regressions, so this is really important, so mm-hmm. Add that little test there so you don’t have regressions. Mm-hmm. A cool habit sometimes to have. Yeah, absolutely.
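One way to act on this is to keep the reproduction script around as a regression test once the bug is fixed. The sketch below uses Minitest and is illustrative only: `TinyConfig` is a hypothetical stand-in for a library's configuration object, not Faker's real implementation, and the mutex-guarded process-wide setting is just one assumed way such a bug might be fixed. The test pins the behavior that matters: a locale set in the booting thread must be visible from a thread started afterwards.

```ruby
require "minitest/autorun"

# Hypothetical stand-in for a library's configuration; not Faker's real code.
# The setting is stored once per process and guarded by a mutex, so every
# thread reads the same value.
module TinyConfig
  @lock = Mutex.new
  @locale = :en

  def self.locale
    @lock.synchronize { @locale }
  end

  def self.locale=(value)
    @lock.synchronize { @locale = value }
  end
end

class LocaleVisibilityTest < Minitest::Test
  def teardown
    TinyConfig.locale = :en # reset shared state between tests
  end

  # Mirrors the repro: configure in one thread, read in another.
  def test_locale_set_at_boot_is_visible_in_worker_thread
    TinyConfig.locale = :"pt-BR"

    seen_by_worker = Thread.new { TinyConfig.locale }.value

    assert_equal :"pt-BR", seen_by_worker
  end
end
```

If someone later moves the setting back into per-thread storage, this test fails right away instead of the bug resurfacing in production.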

[00:40:45] Stefanni: Well, I think we got to, to the end of it, so we only have 10 minutes left. Is there anything else, Nate, that you would like to share about the issue, either about the issue or about the conversation we were having?

[00:41:00] Nate Berkopec: Uh, nope. Um, I would say that, uh, if someone listening to this is interested in learning more about working in a multi-threaded environment, um, I do have a product called Sidekiq in Practice that has a number of live code examples that talk about thread safety and have other, it’s intended to be a manual, like how to actually scale Sidekiq.

[00:41:28] Nate Berkopec: And, uh, you know, cuz it’s Sidekiq. Uh, there’s a lot of things in there about threading and, and, um, how threads work in Sidekiq, why threads are important. It talks about the global VM lock, which we didn’t really discuss at all today. But, uh, if, if you are looking to learn more about threads and scaling the threaded environment, um, I do, I do sell something to help you with that. So go check it out.

[00:41:51] Stefanni: Absolutely. It’s in my reading list. I really need to get it. Which one should I read first? That one or the Rails one, The Complete Guide to Rails Performance? I don’t know.

[00:42:07] Nate Berkopec: Um, I mean, yeah, it just depends on what, uh, your goal is first. They’re not, they’re not, um, intended, like, you don’t have to read one to read the other.

[00:42:18] Nate Berkopec: So if you’re interested in a more general perspective of how do I make a Rails app feel faster, how do I make it more scalable, like that’s The Complete Guide to Rails Performance. If you were specifically having issues with Sidekiq and, uh, scaling Sidekiq, I suggest reading that first.

[00:42:35] Thiago: Yeah, that would be a nice episode to talk just about Sidekiq, because there are so many things to talk about with Sidekiq performance. Mm-hmm. We could do that in the future.

[00:42:46] Stefanni: Yeah. Like how to log your workers, your scheduled workers, like logging and, and retrying jobs. Yeah.

[00:42:57] Nate Berkopec: Yeah. I’ve had a lot of fun. My, my, my current client is, um, Gusto, which is a huge, um, payroll company in the United States, and they’ve got 600 plus engineers working on Sidekiq.

[00:43:10] Nate Berkopec: And it’s been a really interesting experience to me to see kind of how Sidekiq scales like as an organization. So like what, what, what happens when 600 engineers all have queues and workers and like, that’s been a whole new side of Sidekiq that I’ve learned a lot about at, uh, at Gusto.

[00:43:27] Thiago: Yeah, it sounds really exciting work, you know, a lot of problems and, and challenges to solve, which is kinda cool.

[00:43:34] Thiago: Yeah, it’s been really cool

[00:43:35] Stefanni: And yeah, and I think we’re supposed to call them jobs now and not workers. I’m still getting used to the new terminology. I need to catch up. I think I read something about that.

[00:43:46] Nate Berkopec: Oh, that’s changed? I feel like I should know that, but I don’t.

[00:43:48] Stefanni: I, I don’t know. I remember seeing a comment about that, like instead of workers, you, you like change the folder in Rails or something?

[00:43:57] Stefanni: Yeah, yeah. To jobs. No, I’m not really sure. I have to catch up. I just remember, uh, reading about that and I was like, oh, I’ll probably need to read this at some point. Hmm. Um, yeah, so I think we’re at the end, and I would like to be respectful of your time, Nate, but thank you so, so much. I learned a lot and I know it was a bit of homework for you, but I hope it was fun.

[00:44:23] Nate Berkopec: It was fun. Yeah, it was fun. It was fun. Nice to talk to you.

[00:44:27] Thiago: Yeah. Yeah. It was, it was a very specific problem with a very specific solution, so it was nice to learn from that. Mm-hmm. I’ve learned a lot, so yeah. Thanks so much for, for sharing your expertise with us today.

[00:44:40] Nate Berkopec: Great. My pleasure.

[00:44:42] Stefanni: Everyone, make sure to check out Puma and Nate’s books, the one about Sidekiq and Rails performance.

[00:44:48] Stefanni: We’ll leave the links in the description notes if people want to know what you’re doing or wanna buy your books or your workshops as well. Where should they go?

[00:44:59] Nate Berkopec: Uh, speedshop.co. Uh, this is where I have links to all that stuff.

[00:45:05] Stefanni: Awesome. Cool. Cool. Thank you, Nate. Have a good weekend.

[00:45:10] Nate Berkopec: Thank you.

[00:45:11] Thiago: Thank you so much, Nate, for joining us today.

[00:45:14] Thiago: And if you’ve learned something from this episode, please share with a friend and check out our newsletter at www.hexdevs.com/newsletter. I hope you’ve enjoyed this episode. See you on the next one.