Extract Data From Columns of Text with awk in Bash

Cameron Nokes
Published 4 years ago
Updated 3 years ago

awk is a specialized tool and programming language for querying and transforming text. In this lesson, we’ll scratch the surface of the possibilities it provides by leveraging it in one of its most common use cases: extracting data from columns of text. Awk makes it easy to select and print just the columns and rows that we want from commands, like ps, that return a table of data.

Cameron Nokes: [0:00] Do ps aux, pipe that to less so we can see the column headers at the top. We have the USER, the process ID, the CPU usage percentage, and so on. Let's say I want to print the CPU usage percentage column, which is the third column. Let's see how we do that.

[0:18] We run the command again and pipe it to awk. Then here, we'll do opening and closing braces, and inside them we're going to do print and then $3, because it was the third column. What I did here is invoke awk, and what's in the single quotes is my awk script. I put it in single quotes to prevent Bash from interpreting the awk script; otherwise Bash would try to expand $3 itself.

[0:39] Awk automatically splits each line into columns for you and assigns each column to a variable based on its position, so to get the third column, I use $3. From here, I'll pipe this to head so I don't get too much data. Cool. We see that worked.
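
As a minimal sketch of the command from this step: sample_ps below is a hypothetical stand-in for ps aux with made-up rows, so the result is predictable. On a real system you would pipe ps aux itself, and the values would differ.

```shell
# sample_ps is a hypothetical stand-in for `ps aux`; the rows are made up.
sample_ps() {
  cat <<'EOF'
USER   PID  %CPU %MEM    VSZ   RSS TT  STAT STARTED    TIME COMMAND
alice  101   3.5  1.2 410000 52000 ??  S    9:00AM  0:12.34 /usr/bin/node server.js
root     1   0.0  0.1  40000  3000 ??  Ss   8:00AM  0:01.00 /sbin/launchd
bob    202  12.1  4.0 800000 90000 ??  R    9:30AM  1:02.03 /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
EOF
}

# Print only the third whitespace-separated column (%CPU),
# and keep just the first three lines with head.
first_rows=$(sample_ps | awk '{ print $3 }' | head -n 3)
printf '%s\n' "$first_rows"
```

With the sample rows, this prints the %CPU header followed by 3.5 and 0.0.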

[0:53] The basic syntax of awk is this. We have the awk command that we invoke, then optionally any flags, which we don't have here. Then, in single quotes, we have the awk script. In there we can optionally have a condition, which filters the lines that get handed off to the command statements in braces. Those statements then operate only on the lines the condition selected.
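
To make that shape concrete, here is a small sketch with all three parts present. The lesson itself uses no flags; -F (which sets the field separator) is included only as an illustration, and the comma-separated input is made up.

```shell
# General shape (the flags and the condition are both optional):
#   awk [flags] 'condition { commands }'

# Flag:      -F, splits each line on commas instead of whitespace.
# Condition: $2 > 2 keeps lines whose second field is greater than 2.
# Command:   print $1 prints the first field of each line that was kept.
names=$(printf 'node,3.5\nlaunchd,0.0\nchrome,12.1\n' | awk -F, '$2 > 2 { print $1 }')
printf '%s\n' "$names"
```

Here the output is node and chrome, each on its own line.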

[1:15] Let's look at some conditions we can use. Let's say we want to filter to processes that have CPU usage greater than 2 percent, so we run our command again, column three. Let's do greater than 2 percent, and then I'll print. Cool. That looks like that's working.
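
A sketch of that filter, assuming the same kind of input as before: sample_ps is a hypothetical stand-in for ps aux with made-up rows.

```shell
# sample_ps is a hypothetical stand-in for `ps aux`; the rows are made up.
sample_ps() {
  cat <<'EOF'
USER   PID  %CPU %MEM    VSZ   RSS TT  STAT STARTED    TIME COMMAND
alice  101   3.5  1.2 410000 52000 ??  S    9:00AM  0:12.34 /usr/bin/node server.js
root     1   0.0  0.1  40000  3000 ??  Ss   8:00AM  0:01.00 /sbin/launchd
bob    202  12.1  4.0 800000 90000 ??  R    9:30AM  1:02.03 /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
EOF
}

# The condition $3 > 2 comes before the braces; only lines where the
# third column is greater than 2 reach the print statement.
high_cpu=$(sample_ps | awk '$3 > 2 { print $3 }')
printf '%s\n' "$high_cpu"
```

With the sample rows, only 3.5 and 12.1 are printed. The header line drops out too, because %CPU is not a number and does not pass the comparison.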

[1:33] Let's print the process name column as well, which was column 11. I can do comma, column 11 there, and then I'll print them out with a space between them. Cool. That looks like that's working.
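
The two-column version might look like this sketch. Again, sample_ps is a hypothetical stand-in for ps aux with made-up rows.

```shell
# sample_ps is a hypothetical stand-in for `ps aux`; the rows are made up.
sample_ps() {
  cat <<'EOF'
USER   PID  %CPU %MEM    VSZ   RSS TT  STAT STARTED    TIME COMMAND
alice  101   3.5  1.2 410000 52000 ??  S    9:00AM  0:12.34 /usr/bin/node server.js
root     1   0.0  0.1  40000  3000 ??  Ss   8:00AM  0:01.00 /sbin/launchd
bob    202  12.1  4.0 800000 90000 ??  R    9:30AM  1:02.03 /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
EOF
}

# The comma between $3 and $11 tells print to join the two values
# with awk's output separator, which is a single space by default.
cpu_and_name=$(sample_ps | awk '$3 > 2 { print $3, $11 }')
printf '%s\n' "$cpu_and_name"
```

With the sample rows, this prints "3.5 /usr/bin/node" and "12.1 /Applications/Google".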

[1:48] Interestingly, we can see that it doesn't print all of column 11 because there can be spaces in a process name, so it's getting split out. Awk isn't going to be perfect in every case, just something to be aware of.
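
One possible workaround, which is not part of the lesson, is to join every field from column 11 through the last one, using awk's built-in NF (the number of fields on the current line). This is only a sketch: sample_ps is a hypothetical stand-in for ps aux, and joining with single spaces will collapse any wider gaps inside the original name.

```shell
# sample_ps is a hypothetical stand-in for `ps aux`; the rows are made up.
sample_ps() {
  cat <<'EOF'
USER   PID  %CPU %MEM    VSZ   RSS TT  STAT STARTED    TIME COMMAND
alice  101   3.5  1.2 410000 52000 ??  S    9:00AM  0:12.34 /usr/bin/node server.js
root     1   0.0  0.1  40000  3000 ??  Ss   8:00AM  0:01.00 /sbin/launchd
bob    202  12.1  4.0 800000 90000 ??  R    9:30AM  1:02.03 /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
EOF
}

# Start with field 11, then append fields 12..NF, separated by spaces,
# so a process name containing spaces is printed in full.
full_names=$(sample_ps | awk '$3 > 2 {
  cmd = $11
  for (i = 12; i <= NF; i++) cmd = cmd " " $i
  print $3, cmd
}')
printf '%s\n' "$full_names"
```

With the sample rows, the second line now ends in "Google Chrome" rather than stopping at "/Applications/Google".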

[1:59] Note that if we want to print the whole row, we can leave out the print statement altogether and give just the condition. You can see it's working, and it's a lot of information. This is the process name column and it's pretty long. Then, here's a new row.
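
A sketch of the condition-only form, using the same hypothetical sample_ps stand-in for ps aux (made-up rows):

```shell
# sample_ps is a hypothetical stand-in for `ps aux`; the rows are made up.
sample_ps() {
  cat <<'EOF'
USER   PID  %CPU %MEM    VSZ   RSS TT  STAT STARTED    TIME COMMAND
alice  101   3.5  1.2 410000 52000 ??  S    9:00AM  0:12.34 /usr/bin/node server.js
root     1   0.0  0.1  40000  3000 ??  Ss   8:00AM  0:01.00 /sbin/launchd
bob    202  12.1  4.0 800000 90000 ??  R    9:30AM  1:02.03 /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
EOF
}

# With no braces at all, awk's default action is to print the whole
# line for every line that matches the condition.
rows=$(sample_ps | awk '$3 > 2')
printf '%s\n' "$rows"
```

With the sample rows, the full alice and bob lines are printed unchanged, including the complete process name.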
