2021-07-21
|~3 min read
|404 words
Continuing my exploration of slicing and dicing data from the command line, today I want to write about awk
. In many ways awk
is similar to grep
, however, it seems more ergonomic for certain types of conditional filtering to me.
For example, one of the questions I asked in Combining sed, cut, and xargs To Clean Data was “What is the list of primary keys where the attribute is true?”
As a reminder, I was dealing with an example.csv
where the data looked like this:1
"pk","fk","hasAttribute","_count"
"69870","80576","true","12"
"142927","80776","true","6"
"98738","57138","false","4496"
"98828","57138","true","200"
awk
Fundamentally, the way awk
works is that it will break a file up into space delimited records (i.e., a row).
Using the -F
flag, we can change the delimiter however:
awk -F, # comma delimited
awk -F\; # semicolon delimited
awk -F\| # pipe delimited
Once you have the delimiters in place, awk will separate a record by variable placeholders (similar to SQL). $0
is the full row, while $1
, $2
, etc. are each individual attribute.
So, in our example, for the first record:
$0
is "pk","fk","hasAttribute","_count"
$1
is "pk"
$2
is "fk"
$3
is "hasAttribute"
$4
is "_count"
awk
Now that we have the basics down, let’s use awk
to filter our file to only show the foreign keys where the attribute is true.
% awk -F, '$3~/true/' example.csv
"69870","80576","true","12"
"142927","80776","true","6"
"98828","57138","true","200"
But that’s not just the column we care about - to do that, we’d pass in $2
to the action print
:
% awk -F, '$3~/true/ {print $2}' example.csv
"80576"
"80776"
"57138"
With this list in hand, we can return to what we learned about using sed
and xargs
to create a single list of comma separated keys. (We don’t need cut
because awk
handled that with the print
action):
% awk -F, '$3~/true/ {print $2}' example.csv |
sed 's/"//g' |
xargs echo |
sed 's/ /,/g'
80576,80776,57138
Et voila! We have our filtered list!
grep
, but that’s a feature of the fact that there’s only one boolean flag in a row. If the data were more complicated, awk
’s column specificity would prove invaluable.Hi there and thanks for reading! My name's Stephen. I live in Chicago with my wife, Kate, and dog, Finn. Want more? See about and get in touch!