Thursday, November 29, 2018

Summing numbers at the end of lines in a text file

I wrote this one-liner to sum the counts of the various attacks that the firewall blocked and logged during an automated scan.

Let's say your log looks something like this.

Contents of log.txt

Microsoft Windows win.ini Access Attempt Detected 30851 vulnerability 782
HTTP Cross Site Scripting Attempt 32658 vulnerability 288
Generic HTTP Cross Site Scripting Attempt 31475 vulnerability 94
HTTP /etc/passwd Access Attempt 35107 vulnerability 82
HTTP SQL Injection Attempt 30514 vulnerability 52
PHP CGI Query String Parameter Handling Information Disclosure Vulnerability 34804 vulnerability 28
Generic HTTP Cross Site Scripting Attempt 31476 vulnerability 24
Apache Tomcat URIencoding Directory Traversal Vulnerability 35298 vulnerability 13
Export RSA cipher suite detected 37493 vulnerability 11
HTTP SQL Injection Attempt 33338 vulnerability 10
Squid HTTP Header Parsing Assertion Failure Denial of Service Vulnerability 39682 vulnerability 10
Oracle 9i Application Server Dynamic Monitoring Services Anonymous Access 33756 vulnerability 8
HTTP SQL Injection Attempt 35823 vulnerability 6
PHP-Charts PHP Code Execution Vulnerability 37008 vulnerability 6
Microsoft Internet Explorer Cached Objects Zone Bypass Vulnerability 33813 vulnerability 4
Advantech Studio NTWebServer Arbitrary File Access Vulnerability 35784 vulnerability 2
Generic HTTP Cross Site Scripting Attempt 30847 vulnerability 2
Microsoft IIS ServerVariables_JScript.asp Information Disclosure 33073 vulnerability 2
Microsoft IIS 5.0 Form_JScript.asp XSS Vulnerability 32775 vulnerability 2
Joomla HTTP User Agent Object Injection Vulnerability 38519 vulnerability 1
OpenSSL Status Extension Memory Leak Denial of Service Vulnerability 39926 vulnerability 1

The five-digit number before the word "vulnerability" is an ID, and the number at the end of each line is the count of how many attempts were blocked. We could sit here with a calculator and add up all the numbers at the ends of the lines, but the one-liner below will do it for you.

grep -oP '\d{1,4}$' log.txt | xargs | tr ' ' + | bc

When run, it will output: 1428

Let's break down the one-liner to understand what each part is doing.

grep -oP

  • -o tells grep to only output the matched text, not the whole line. 
  • -P tells grep to use Perl-compatible regular expressions (PCRE) instead of its default POSIX basic regular expressions. The \d shorthand in the pattern needs it.
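
To see what -o changes, here is a small sketch that runs the same pattern with and without it on one line copied from the sample log. The variable names (line, whole, matched) are made up for this illustration, and it assumes a GNU grep built with -P support.

```shell
# One line copied from the sample log above.
line='HTTP SQL Injection Attempt 30514 vulnerability 52'

# Without -o, grep prints the entire matching line.
whole=$(printf '%s\n' "$line" | grep -P '\d{1,4}$')

# With -o, grep prints only the part of the line the pattern matched.
matched=$(printf '%s\n' "$line" | grep -oP '\d{1,4}$')

echo "without -o: $whole"
echo "with -o:    $matched"
```

The first echo shows the whole log line, the second shows just 52.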


'\d{1,4}$'

  • \d tells grep we're looking for a digit.
  • {1,4} tells grep we're looking for a number that is between one and four digits long.
  • $ tells grep these one- to four-digit numbers must be at the end of the line.

Technical note: Just in case someone has an issue with how I phrased the above, what we're really telling grep is to match a run of one to four consecutive digits that ends at the end of the line. The pattern doesn't check what comes before that run, so a count with five or more digits would match on its last four digits only.
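
The sketch below shows how the pattern behaves when a count is longer than four digits. The three sample lines are hypothetical, not taken from the real log, and the \b\d+$ variant is just one possible way to require the whole number, not part of the original one-liner.

```shell
# Hypothetical sample lines, not taken from the real log.
sample='Example Attack 30001 vulnerability 7
Example Attack 30002 vulnerability 1234
Example Attack 30003 vulnerability 12345'

# {1,4} with nothing anchoring the left side: the five-digit count
# is cut down to its last four digits.
loose=$(printf '%s\n' "$sample" | grep -oP '\d{1,4}$' | xargs)

# \b\d+$ requires the run of digits to start at a word boundary,
# so the whole number is kept regardless of its length.
strict=$(printf '%s\n' "$sample" | grep -oP '\b\d+$' | xargs)

echo "loose:  $loose"    # loose:  7 1234 2345
echo "strict: $strict"   # strict: 7 1234 12345
```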

log.txt is the name of the log file.

| pipes the output of the previous command, grep in this case, into the next command.

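xargs takes all the one- to four-digit numbers grep found, one per line, and joins them into a single space-separated line. With no command given, xargs runs echo on its input, which produces a string like this: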
782 288 94 82 52 28 24 13 11 10 10 8 6 6 4 2 2 2 2 1 1

tr ' ' + will replace all the spaces between the numbers with the plus sign like this:
782+288+94+82+52+28+24+13+11+10+10+8+6+6+4+2+2+2+2+1+1

bc is a command-line calculator that evaluates the expression produced by tr above and prints the sum, 1428 in this case.

If you needed to subtract instead of add, you'd just change the '+' in the tr command to '-'. If your counts could be five-digit numbers, change '{1,4}' to '{1,5}'. Or, if your counts will always be at least three digits but no more than five, change it to '{3,5}'.
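
If you'd rather not tune the digit range at all, one alternative is to let awk add up the last field of every line. This is a different technique from the one-liner above, sketched here on three lines copied from the sample log. It assumes every non-blank line ends with the count, and the temporary file and variable names are made up for this illustration.

```shell
# Three lines copied from the sample log, written to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
HTTP /etc/passwd Access Attempt 35107 vulnerability 82
HTTP SQL Injection Attempt 30514 vulnerability 52
Joomla HTTP User Agent Object Injection Vulnerability 38519 vulnerability 1
EOF

# $NF is the last whitespace-separated field on a line.
# The NF condition skips blank lines; sum + 0 prints 0 for an empty file.
total=$(awk 'NF { sum += $NF } END { print sum + 0 }' "$log")
echo "$total"    # 135

rm -f "$log"
```

Because awk splits on whitespace, this version doesn't care how many digits the count has and isn't thrown off by trailing spaces at the end of a line.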