Regular expression challenge for awk and grep

Linux

Guest
I've inherited a massive orders archive that I need to extract a total from. Fortunately, I only need a grand total, so I don't need to worry about separating products and getting totals per product - just the total number of products appearing.

The archive is a giant text file. Each line has shipping address info, date info, a series of products, and a quantity for each product. I need to extract just the quantities.

The problem is that the number of products is not consistent among lines, so the number of quantities on each line is arbitrary - making it difficult (or maybe impossible) to use awk to just print the necessary fields.

The one thing all the quantities have in common is that each is a single digit, preceded by a colon and followed by a less-than (<) sign. (There is never a quantity greater than 9, so it's always one digit.)

A regular expression that matches them all is /:[0-9]</

How can I create output that consists of every match of that expression, even when there are multiple matches per line, with one number per line of output?

Ideally, the output would look something like this:

3
2
0
0
1
3
2
... ad nauseam.

If the output includes the : and < for each number, that's fine too, since I can easily use awk at that point to strip them out.

My first instinct is to try grep or awk - but grep will just print the matching lines (which is exactly what I already have), and awk would require me to know the exact field numbers (or would it?).

This is on a Linux box, by the way.

Any ideas?

Thanks,
Wayne
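One possible sketch, assuming GNU grep is available: its -o flag prints each match on its own line rather than the whole matching line, which is exactly the shape of output described above. The sample orders.txt data here is hypothetical, invented only to illustrate the format:

```shell
# Hypothetical sample data in the described format: products as name:digit<
printf 'addr1 2024-01-05 widget:3< gadget:2<\naddr2 2024-01-06 gizmo:0<\n' > orders.txt

# -o emits every match on its own line, even multiple matches per input line;
# tr strips the surrounding : and < so only the digit remains
grep -o ':[0-9]<' orders.txt | tr -d ':<'
# prints:
# 3
# 2
# 0
```

Since the end goal is a grand total, the stream of digits could then be summed with a short awk one-liner such as `| awk '{s += $1} END {print s}'`.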