Wednesday, August 31, 2022
We could have skipped writing a program
A bit of background: “Project: Sippy-Cup” uses data from a single column of a database to do its job. Because we have a tight deadline, it doesn't query the database directly. Instead, there's a custom binary file containing around 100,000,000 records, each with a unique key and a 32-bit value. It doesn't matter what the key or the value is, just that this file exists. So, with that out of the way …
I was at lunch today with some fellow cow-orkers. Talk turned towards a QA engineer who was tasked by my friend TS (a senior QA engineer) to write a program to scan the data file used by “Project: Sippy-Cup” and count each unique value. I had written such a program in Lua (which worked by directly reading the binary file itself—easy enough since “Project: Sippy-Cup” is in Lua and has to read the binary file). TS wrote one in Python to do the work from a text dump of the binary file. The text output is just:
unique-key-1 = value
unique-key-2 = value
It's not hard to parse; it's just that the text dump is 100,000,000 lines long.
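For comparison, the counting part of such a program can be quite small. The following is a minimal sketch, not TS's actual code. It assumes every dump line looks exactly like "key = value" with whitespace around the "=", and the function name count_values is made up for illustration:

```python
from collections import Counter
from typing import Iterable


def count_values(lines: Iterable[str]) -> Counter:
    """Count how often each value appears in a "key = value" text dump.

    Assumes (hypothetically) that every line is "key = value" separated
    by whitespace; lines with fewer than three fields are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        # "unique-key-1 = value".split() -> ["unique-key-1", "=", "value"];
        # the value is the third field, the same one awk's $3 picks out.
        fields = line.split()
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts
```

Passing sys.stdin as the lines argument and printing count_values(...).most_common() would list the values from most to least frequent. Counting in a dictionary avoids sorting all 100,000,000 lines; memory grows with the number of distinct values rather than with the number of records.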
The QA engineer in question couldn't get his program to work.
It was only after lunch that I realized none of us had to write a program. No, all it would have taken was running:
GenericUnixPrompt> dump-proprietary-data -s Project-Sippy-Cup.data \
	| awk '{print $3}' \
	| sort \
	| uniq -c \
	| sort -rn \
	> /tmp/report.out
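The pipeline relies on one detail that is easy to miss: uniq -c only counts runs of adjacent identical lines, which is why the first sort has to come before it. A rough Python model of the middle of the pipeline makes the difference visible. The function names are made up for illustration, and the model ignores any locale-dependent collation that sort may apply:

```python
from itertools import groupby
from typing import Iterable, List, Tuple


def uniq_c(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Model `uniq -c`: collapse runs of ADJACENT identical lines into (count, line)."""
    return [(len(list(run)), line) for line, run in groupby(lines)]


def sort_uniq_sort(values: Iterable[str]) -> List[Tuple[int, str]]:
    """Model `sort | uniq -c | sort -rn` on a sequence of values."""
    # Sort first so equal values end up next to each other for uniq_c.
    counted = uniq_c(sorted(values))
    # Then order by count, largest first, like `sort -rn`.
    return sorted(counted, key=lambda pair: pair[0], reverse=True)
```

Without the first sort, uniq_c(["1", "2", "1"]) reports "1" twice with a count of one each; with it, sort_uniq_sort(["1", "2", "1"]) gives a single count of two for "1".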
Sigh.