rank grep result by entries' timestamp - linux

I would like to rank log entries by the timestamp of each entry.
Let's say my grep result looks like this, with each entry having a different number of fields and the time in a different column:
a, 3, time:123
b, time:124, 4
c, time:122, 5
How should I pipe the result so that it looks like this?
c, time:122, 5
a, 3, time:123
b, time:124, 4

Would you try the following:
while IFS= read -r line; do
[[ $line =~ time:([0-9]+) ]] && printf "%s\t%s\n" "${BASH_REMATCH[1]}" "$line"
done < file | sort -n | cut -f 2-
It first extracts the time after the time: substring.
Then it prepends the time before the line using a tab as a delimiter.
It numerically sorts the lines.
Finally it cuts off the 1st field.

A general solution is:
for each line:
detect log format
extract timestamp column based on detected format
convert timestamp into sortable-form
print sortable-form + column delimiter + original line
pipe output of previous stage into something that sorts on the new first column
pipe output of previous stage into something that strips off the new first column
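The steps above can be sketched as a small decorate-sort-undecorate pipeline. This is only a minimal sketch: the function name sort_by_time is hypothetical, and it assumes a single log format in which the timestamp is a plain integer directly after the literal text time:. Lines without such a timestamp are dropped.

```shell
# sort_by_time (hypothetical name): read lines on stdin, sort them by "time:<digits>".
# Assumption: one format only, with an integer timestamp right after "time:".
sort_by_time() {
  awk '
    # match() sets RSTART and RLENGTH; skip the 5 characters of "time:"
    match($0, /time:[0-9]+/) {
      print substr($0, RSTART + 5, RLENGTH - 5) "\t" $0   # timestamp, tab, original line
    }
  ' | sort -n | cut -f 2-   # numeric sort, then strip the prepended timestamp
}

printf '%s\n' 'a, 3, time:123' 'b, time:124, 4' 'c, time:122, 5' | sort_by_time
```

Supporting several log formats would mean replacing the single match() rule with one rule per format, each converting its timestamp into the same sortable form.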

Related

Linux SHELL script, read each row for different number of columns

I have file and for example values in it:
1 value1.1 value1.2
2 value2.1
3 value3.1 value3.2 value3.3
I need to read values from it using a shell script, but the number of columns in each row is different.
I know that if for example I want to read second column I will do it like this (for row number as input parameter)
$ awk -v key=1 '$1 == key { print $2 }' input.txt
value1.1
But as I mentioned number of columns is different for each row.
How to make this read dynamic?
For example:
if input parameter is 1 it means I should read columns from the first row so output should be
value1.1 value1.2
if input parameter is 2 it means I should read columns from the second row so output should be
value2.1
if input parameter is 3 it means I should read columns from the third row so output should be
value3.1 value3.2 value3.3
The point is that the number of columns is not static and I should read columns from that specific row until the end of the row.
Thank you
Then you can simply say:
awk -v key=1 'NR==key' input.txt
UPDATED
If you want to process with the column data, there will be several ways.
With awk you can say something like:
awk -v key=3 'NR==key {
for (i=1; i<=NF; i++)
printf "column %d = %s\n", i, $i
}' input.txt
which outputs:
column 1 = value3.1
column 2 = value3.2
column 3 = value3.3
In awk you can access each column value by $1, $2, $3 directly or by $i indirectly where variable i holds either of 1, 2, 3.
If you prefer going with bash, try something like:
line=$(awk -v key=3 'NR==key' input.txt)
set -- $line # split into columns
for ((i=1; i<=$#; i++)); do
echo column $i = ${!i}
done
which outputs the same results.
In bash the indirect access is a little bit complex and you need to say ${!i} where i is a variable name.
Hope this helps.
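If the indirect ${!i} expansion feels awkward, one possible alternative is to loop over the positional parameters directly. The following is a sketch only: the function name print_columns is hypothetical, and it assumes the fields are separated by whitespace.

```shell
# print_columns FILE N (hypothetical helper): print every field of line N of FILE.
# Assumption: fields are separated by whitespace.
print_columns() {
  line=$(sed -n "${2}p" "$1")   # fetch only line N
  set -f                        # disable globbing so a field such as * stays literal
  set -- $line                  # word-split the line into $1, $2, ...
  set +f
  i=1
  for field in "$@"; do
    printf 'column %d = %s\n' "$i" "$field"
    i=$((i + 1))
  done
}

# Example call: print_columns input.txt 3
```

Iterating over "$@" avoids indirection entirely, at the cost of keeping a separate counter.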

Linux split a file in two columns

I have the following file that contains 2 columns :
A:B:IP:80 apples
C:D:IP2:82 oranges
E:F:IP3:84 grapes
How is possible to split the file in 2 other files, each column in a file like this:
File1
A:B:IP:80
C:D:IP2:82
E:F:IP3:84
File2
apples
oranges
grapes
Try:
awk '{print $1>"file1"; print $2>"file2"}' file
After running that command, we can verify that the desired files have been created:
$ cat file1
A:B:IP:80
C:D:IP2:82
E:F:IP3:84
And:
$ cat file2
apples
oranges
grapes
How it works
print $1>"file1"
This tells awk to write the first column to file1.
print $2>"file2"
This tells awk to write the second column to file2.
Perl 1-liner using (abusing) the fact that print goes to STDOUT, i.e. file descriptor 1, and warn goes to STDERR, i.e. file descriptor 2:
# perl -n means loop over the lines of input automatically
# perl -e means execute the following code
# chomp means remove the trailing newline from the expression
perl -ne 'chomp(my @cols = split /\s+/); # Split each line on whitespace
print $cols[0] . "\n";
warn $cols[1] . "\n"' <input 1>col1 2>col2
You could, of course, just use cut (for example cut -d' ' -f1 for the first column), but then you would need to read the file twice.
Here's an awk solution that'll work with any number of columns:
awk '{for(n=1;n<=NF;n++)print $n>"File"n}' input.txt
This steps through each field on the line and prints the field to a different output file based on the column number.
Note that blank fields -- or rather, lines with fewer fields than other lines, will cause line numbers to mismatch. That is, if your input is:
A 1
B
C 3
Then File2 will contain:
1
3
If this is a concern, mention it in an update to your question.
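If the mismatch is a concern, one way around it is to loop up to a fixed column count instead of NF, so that short rows still write an empty line to every output file. This is a sketch under assumptions: the function name split_columns and its arguments are hypothetical, and the caller must know the maximum number of columns in advance.

```shell
# split_columns FILE NCOLS PREFIX (hypothetical helper): write column n to PREFIXn.
# Rows with fewer than NCOLS fields contribute an empty line, keeping line numbers aligned.
split_columns() {
  awk -v ncols="$2" -v prefix="$3" '
    {
      for (n = 1; n <= ncols; n++) {
        val = (n <= NF) ? $n : ""   # empty placeholder for a missing field
        print val > (prefix n)      # parentheses are needed around the concatenation
      }
    }
  ' "$1"
}

# Example call: split_columns input.txt 2 File
```

With the three-line input shown above, File2 would then contain 1, an empty line, and 3.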
You could of course do this in bash alone, in a number of ways. Here's one:
while read -r line; do
a=($line)
for m in "${!a[@]}"; do
printf '%s\n' "${a[$m]}" >> File$((m+1))
done
done < input.txt
This reads each line of input into $line, then word-splits $line into values in the $a[] array. It then steps through that array, printing each item to the appropriate file, named for the index of the array (plus one, since bash arrays start at zero).

Uniqing a delimited file based on a subset of fields

I have data such as below:
1493992429103289,207.55,207.5
1493992429103559,207.55,207.5
1493992429104353,207.55,207.5
1493992429104491,207.6,207.55
1493992429110551,207.55,207.5
Due to the nature of the last two columns, their values change throughout the day and repeat regularly. By grouping the data as shown in my desired output (below), I can see each time their values changed (with the epoch time in the first column). Is there a way to achieve the desired output shown below:
1493992429103289,207.55,207.5
1493992429104491,207.6,207.55
1493992429110551,207.55,207.5
So I consolidate the data by the second two columns. However, the consolidation is not completely unique (as can be seen by 207.55, 207.5 being repeated)
I have tried:
uniq -f 1
However the output gives only the first line and does not go on through the list
The awk solution below does not allow the occurrence which happened previously to be outputted again and so gives the output (below the awk code):
awk '!x[$2 $3]++'
1493992429103289,207.55,207.5
1493992429104491,207.6,207.55
I do not wish to sort the data by the second two columns. However, since the first is epoch time, it may be sorted by the first column.
You can't set delimiters with uniq; fields have to be separated by white space. With the help of tr you can convert the commas first:
tr ',' ' ' <file | uniq -f1 | tr ' ' ','
1493992429103289,207.55,207.5
1493992429104491,207.6,207.55
1493992429110551,207.55,207.5
You can use an Awk statement as below,
awk 'BEGIN{FS=OFS=","} s != $2 || t != $3 {print} {s=$2;t=$3}' file
which produces the output as you need.
1493992429103289,207.55,207.5
1493992429104491,207.6,207.55
1493992429110551,207.55,207.5
The idea is to store the second and third column values in the variables s and t respectively, and to print a line only if at least one of those values differs from the previous line's.
I found an answer which is not as elegant as Inian's but satisfies my purpose.
Since my first column is always epoch time in microseconds and its length never changes, I can use the following uniq command:
uniq -s 17
You can try to manually (with a loop) compare the current line with the previous line.
prev_line=""
# start at the first line
i=1
# strip the first column, which does not need to be compared
sed 's#^[0-9][0-9]*,##' ./data_file > ./transform_data_file
# make sure the list of line numbers to delete exists and is empty
: > ./line_to_be_suppress
# for every line of the file without its first column
for current_line in $(cat ./transform_data_file)
do
# if the previous line is the same as the current line
if [ "x$prev_line" = "x$current_line" ]
then
# record the line number so it can be deleted later
echo "$i" >> ./line_to_be_suppress
fi
# remember the current line as the previous line
prev_line=$current_line
# increment the current line number
i=$(( i + 1 ))
done
# delete the recorded lines, last first so earlier line numbers stay valid
for line_to_suppress in $(tac ./line_to_be_suppress) ; do sed -i "${line_to_suppress}d" ./data_file ; done
rm line_to_be_suppress
rm transform_data_file
Since your first field seems to have a fixed length of 17 characters (16 digits plus the , delimiter), you could use the -s option of uniq, which would be more efficient for larger files:
uniq -s 17 file
Gives this output:
1493992429103289,207.55,207.5
1493992429104491,207.6,207.55
1493992429110551,207.55,207.5
From man uniq:
-f num
Ignore the first num fields in each input line when doing comparisons.
A field is a string of non-blank characters separated from adjacent fields by blanks.
Field numbers are one based, i.e., the first field is field one.
-s chars
Ignore the first chars characters in each input line when doing comparisons.
If specified in conjunction with the -f option, the first chars characters after
the first num fields will be ignored. Character numbers are one based,
i.e., the first character is character one.
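The answers above share one idea: compare each line with the previous one while ignoring the timestamp. A minimal awk sketch of that idea, which depends neither on the width of the first field nor on whitespace delimiters, could look like this. The function name dedup_runs is hypothetical, and the sketch assumes exactly the comma-separated three-column layout from the question.

```shell
# dedup_runs (hypothetical name): keep the first line of each run of consecutive
# lines whose 2nd and 3rd comma-separated fields are identical.
dedup_runs() {
  awk -F, '
    { key = $2 FS $3 }                  # comparison key: everything but the timestamp
    NR == 1 || key != prev { print }    # first line, or key changed since previous line
    { prev = key }
  '
}

printf '%s\n' \
  '1493992429103289,207.55,207.5' \
  '1493992429103559,207.55,207.5' \
  '1493992429104353,207.55,207.5' \
  '1493992429104491,207.6,207.55' \
  '1493992429110551,207.55,207.5' | dedup_runs
```

Because only adjacent lines are compared, a pair of values that reappears later in the day is printed again, which is what the question asks for.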

Bash- sum values from an array in one line

I have this array:
array=(1 2 3 4 4 3 4 3)
I can get the largest number with:
echo "num: $(printf "%d\n" "${array[@]}" | sort -nr | head -n 1)"
#outputs 4
But I want to get all the 4's and sum them up, meaning I want it to output 12 (there are 3 occurrences of 4) instead. Any ideas?
dc <<<"$(printf '%d\n' "${array[@]}" | sort -n | uniq -c | tail -n 1) * p"
sort to get max value at end
uniq -c to get only unique values, with a count of how many times they appear
tail to get only the last line (with the max value and its count)
dc to multiply the value by the count
I picked dc for the multiplication step because it's RPN, so you don't have to split up the uniq -c output and insert anything in the middle of it - just add stuff to the end.
Using awk:
$ printf "%d\n" "${array[@]}" | sort -nr | awk 'NR>1 && p!=$0{print x;exit;}{x+=$0;p=$0;}'
12
Using sort, the numbers are sorted(-n) in reverse(-r) order, and the awk keeps summing the numbers till it finds a number which is different from the previous one.
You can do this with awk:
awk -v RS=" " '{sum[$0]+=$0; if($0>max) max=$0} END{print sum[max]}' <<<"${array[@]}"
Setting RS (record separator) to space allows you to read your array entries as separate records.
sum[$0]+=$0; means sum is a map of cumulative sums for each input value; if($0>max) max=$0 tracks the largest number seen so far; END{print sum[max]} prints, at the end, the sum for the largest number seen.
<<<"${array[@]}" is a here-string that feeds a string (in this case all elements of the array, joined by spaces) to awk on stdin.
This way there is no piping or looping involved - a single command does all the work.
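Using only bash, you can sum every element of the array (all elements, not just the occurrences of the maximum):
(IFS=+; echo "$(( ${array[*]} ))")
With IFS set to +, ${array[*]} expands to the elements joined by plus signs, and the double-parentheses expression evaluates the result. The subshell keeps the IFS change local.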
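As a further sketch, the maximum and the number of times it occurs can be tracked in a single awk pass with no sorting. The function name sum_of_max is hypothetical, and the sketch assumes one integer per input line, as produced by printf '%d\n' "${array[@]}".

```shell
# sum_of_max (hypothetical name): read one integer per line and print
# the maximum multiplied by its number of occurrences.
sum_of_max() {
  awk '
    NR == 1 || $1 + 0 > max { max = $1 + 0; count = 0 }   # new maximum: restart the count
    $1 + 0 == max           { count++ }                   # another occurrence of the maximum
    END { if (NR > 0) print max * count }                 # print nothing for empty input
  '
}

printf '%d\n' 1 2 3 4 4 3 4 3 | sum_of_max   # prints 12
```

Resetting the count whenever a larger value appears means no per-value table of sums is needed.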

Separate comma delimited cells to new rows with shell script

I have a table with comma delimited columns and I want to separate the comma delimited values in my specified column to new rows. For example, the given table is
Name Start Name2
A 1,2 X,a
B 5 Y,b
C 6,7,8 Z,c
And I need to separate the comma delimited values in column 2 to get the table below
Name Start Name2
A 1 X,a
A 2 X,a
B 5 Y,b
C 6 Z,c
C 7 Z,c
C 8 Z,c
I am wondering if there is any solution with shell script, so that I can create a workflow pipe.
Note: the original table may contain more than 3 columns.
Assuming the format of your input and output does not change:
awk 'BEGIN{FS="[ ,]"} {print $1, $2, $NF; print $1, $3, $NF}' input_file
Input:
input_file:
A 1,2 X
B 5,6 Y
Output:
A 1 X
A 2 X
B 5 Y
B 6 Y
Explanation:
awk: invoke awk, a tool for manipulating lines (records) and fields
'...': content enclosed by single-quotes are supplied to awk as instructions
BEGIN{FS="[ ,]"}: before reading any lines, tell awk to use both space and comma as delimiters; FS stands for Field Separator.
{print $1, $2, $NF; print $1, $3, $NF}: For each input line read, print the 1st, 2nd and last field on one line, and then print the 1st, 3rd, and last field on the next line. NF stands for Number of Fields, so $NF is the last field.
input_file: supply the name of the input file to awk as an argument.
In response to updated input format:
awk 'BEGIN{FS="[ ,]"} {print $1, $2, $4","$5; print $1, $3, $4","$5}' input_file
After Runner's modification of the original question another approach might look like this:
#!/bin/sh
# Usage $0 <file> <column>
#
FILE="${1}"
COL="${2}"
# tokens separated by linebreaks
IFS="
"
for LINE in `cat ${FILE}`; do
# get number of columns
COLS="`echo ${LINE} | awk '{print NF}'`"
# get actual field by COL, this contains the keys to be splitted into individual lines
# replace comma with newline to "reuse" newline field separator in IFS
KEYS="`echo ${LINE} | cut -d' ' -f${COL}-${COL} | tr ',' '\n'`"
COLB=$(( ${COL} - 1 ))
COLA=$(( ${COL} + 1 ))
# get text from columns before and after actual field
if [ ${COLB} -gt 0 ]; then
BEFORE="`echo ${LINE} | cut -d' ' -f1-${COLB}` "
else
BEFORE=""
fi
AFTER=" `echo ${LINE} | cut -d' ' -f${COLA}-`"
# echo "-A: $COLA ($AFTER) | B: $COLB ($BEFORE)-"
# iterate keys and re-build original line
for KEY in ${KEYS}; do
echo "${BEFORE}${KEY}${AFTER}"
done
done
With this shell file you might do what you want. This will split column 2 into multiple lines.
./script.sh input.txt 2
If you'd like to pass input through standard input using pipes (e.g. to split multiple columns in one go), you could replace the FILE="${1}" line with:
if [ "${1}" = "-" ]; then
FILE="/dev/stdin"
else
FILE="${1}"
fi
And run it this way:
./script.sh input.txt 1 | ./script.sh - 2 | ./script.sh - 3
Note that cut is very sensitive to the field separators. So if the line starts with a space character, column 1 would be "" (empty). If the fields were separated by a mixture of spaces and tabs, this script would have other issues too. In this case (as explained above), filtering the input so that fields are separated by exactly one space character should do it. If this is not possible, or the data in each column contains space characters too, the script might get more complicated.
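For tables with any number of columns, the same split can also be sketched in awk alone, which sidesteps the sensitivity of cut to repeated separators. The function name split_column is hypothetical. The sketch assumes whitespace-separated fields with no spaces inside a value; it joins output fields with single spaces and drops blank lines.

```shell
# split_column FILE COL (hypothetical helper): print one row per comma-separated
# value found in column COL, leaving all other columns unchanged.
split_column() {
  awk -v col="$2" '
    {
      n = split($col, parts, ",")   # parts[1..n] hold the individual values
      for (i = 1; i <= n; i++) {
        $col = parts[i]             # assigning to a field rebuilds $0 with OFS
        print
      }
    }
  ' "$1"
}

# Example call: split_column input.txt 2
```

As with the script above, several columns could be split in turn by writing each result to a file and running the function again on it.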

Resources