awk max value of column two for dates in column one - linux

I am trying to print only max values of column two for dates in column one.
My file is:
2014-04-09,135303
2014-04-09,416400
2014-04-15,143684
2014-04-15,156011
2014-04-15,184406
2014-04-16,1123083
2014-04-16,167486
2014-04-16,862196
2014-04-17,963023
2014-04-19,583844
Required Output:
2014-04-09,416400
2014-04-15,184406
2014-04-16,1123083
2014-04-17,963023
2014-04-19,583844
I tried sort, but it does not work:
cat file|sort -k2 -r | sort --unique --stable -k1
Please suggest how this can be done using awk or sort.

kent$ awk -F, '{a[$1]=$2>a[$1]?$2:a[$1]}END{for(x in a)print x "," a[x]}' file
2014-04-15,184406
2014-04-16,1123083
2014-04-17,963023
2014-04-09,416400
2014-04-19,583844
If you want the result ordered by date, pipe the output of the command above to sort:
awk -F, '{a[$1]=$2>a[$1]?$2:a[$1]}END{for(x in a)print x "," a[x]}' file|sort
2014-04-09,416400
2014-04-15,184406
2014-04-16,1123083
2014-04-17,963023
2014-04-19,583844
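Since the question also asks about sort, here is a sketch of a sort-based variant, assuming the separator is a comma and the values are plain integers. The attempt in the question does not pass -t, so sort treats each whole line as a single field and -k2 is empty. The temporary file and the variable name result are only for illustration.

```shell
# Sample data from the question, written to a temporary file (illustrative only).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
2014-04-09,135303
2014-04-09,416400
2014-04-15,143684
2014-04-15,156011
2014-04-15,184406
2014-04-16,1123083
2014-04-16,167486
2014-04-16,862196
2014-04-17,963023
2014-04-19,583844
EOF

# Sort by date ascending, then by value descending (numeric),
# and keep only the first line seen for each date.
result=$(sort -t, -k1,1 -k2,2nr "$tmp" | awk -F, '!seen[$1]++')
printf '%s\n' "$result"
rm -f "$tmp"
```

Because the largest value of each date comes first after sorting, keeping the first line per date yields the maximum, and the output is already ordered by date.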

Related

How to merge column output to the end of a row in the previous column?

I have a .csv file containing three columns and I need to merge the value of column 2 with the end of the row of column 1.
The .csv file contains thousands of rows and this needs to be done for each row.
I've tried using awk, but I'm finding it difficult to get the code correct:
cat file.csv | awk '{print $1, $2}'
awk '{if ($2!= " ") {print $1+$2 }}'
These of course don't work
Sample input:
The command used to produce the actual output is simply:
cat test.csv
[2,4,5,6,2,34,61,32,34,54,34, 22] 0.144354
[3,4,6,4,5,6,7,1,2,3,4,53,23, 34] 0.332453
[2,43,6,2,1,2,5,8,9,0,8,6,34, 21] 0.347643
Desired Output:
col1 col2
[2,4,5,6,2,34,61,32,34,54,34,22] 0.144354
[3,4,6,4,5,6,7,1,2,3,4,53,23,34] 0.332453
[2,43,6,2,1,2,5,8,9,0,8,6,34,21] 0.347643
Replace "comma followed by one or more spaces" with "comma":
sed 's/, \{1,\}/,/' file.csv
sed 's/, */,/g' file.csv
Print columns $1 and $2 as $1 (optionally separate with a tab):
awk '{print $1 $2, $3}' OFS='\t' file.csv
You can try:
awk '{printf("%s%s\t%s\n",$1,$2,$3)}' file.csv
I only see spaces after a comma when you don't want them.
$: sed -E 's/,\s+/,/' file.csv
[2,4,5,6,2,34,61,32,34,54,34,22] 0.144354
[3,4,6,4,5,6,7,1,2,3,4,53,23,34] 0.332453
[2,43,6,2,1,2,5,8,9,0,8,6,34,21] 0.347643
Add -i (after the -E) to make it an in-place edit.
$: sed -Ei 's/,\s+/,/' file.csv
$: cat file.csv
[2,4,5,6,2,34,61,32,34,54,34,22] 0.144354
[3,4,6,4,5,6,7,1,2,3,4,53,23,34] 0.332453
[2,43,6,2,1,2,5,8,9,0,8,6,34,21] 0.347643

find records longer/shorter than a particular col

this is my file: FILEABC.txt
Name|address|age|country
john|london|12|UK
adam|newyork|39|US|X12|123
jake|madrid|45|ESP
ram|delhi
joh|cal|34|US|788
I wanted to find the header count in the file, so I have this command:
cat FILEABC.txt | awk --field-separator='|' '{print NF}' | sort -n |uniq -c
The result I get from this command is:
1 2
3 4
1 5
1 6
My question is: how do I find the records in my file that have only 2 fields, 4 fields and so on?
For example,
if I want to see the records having only 2 columns:
ram|delhi
and if I want to see the records having more than 4 columns:
adam|newyork|39|US|X12|123
If you want to print only the records that have 2 fields, the following may help.
awk -F"|" 'NF==2' Input_file
If you need the lines that have more than 4 fields, change the condition above to NF>4; for more than 5 fields, use NF>5.
Explanation: -F"|" sets the field separator to a pipe. NF is a built-in awk variable that holds the total number of fields in the current line, so NF==2 checks whether the line has exactly 2 fields. awk works on condition/action pairs; since no action is written here, the default action (print the current line) runs whenever the condition is true.
In awk, the variable NF gives the total number of fields in the current record (row). By default awk splits fields on runs of whitespace; if you change FS, NF is calculated using the separator you specify, so you can do:
awk -v FS='|' 'NF==2' infile
Which is same as
# Usual Syntax : awk 'condition { action }' infile
awk -v FS='|' 'NF==2{ print }' infile
For more than 4 fields,
awk -v FS='|' 'NF > 4' infile
You can also use grep to filter the 2-column records:
grep '^[^|]*|[^|]*$' FILEABC.txt
It will output:
ram|delhi
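As a sketch of how the same NF test can flag every malformed record at once, the following compares each line with the header's field count instead of a hard-coded number. It assumes the first line is the header; the temporary file, the variable name result and the report format are only for illustration.

```shell
# Sample data from the question, written to a temporary file (illustrative only).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Name|address|age|country
john|london|12|UK
adam|newyork|39|US|X12|123
jake|madrid|45|ESP
ram|delhi
joh|cal|34|US|788
EOF

# Remember the field count of the header, then report every later record
# whose field count differs, with its line number and its field count.
result=$(awk -F'|' 'NR == 1 { n = NF; next }
                    NF != n { print NR ": " NF " fields: " $0 }' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

For the sample file this reports lines 3, 5 and 6, which have 6, 2 and 5 fields against a header of 4.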

Using awk to extract data and count

How do I use awk on a file that looks like this:
abcd Z
efdg Z
aqbs F
edf F
aasd A
I want to extract the number of times each letter of the alphabet occurs in the second column, so output should be:
Z 2
F 2
A 1
If you want the output in the same order as Input_file, the following may help.
awk 'FNR==NR{A[$2]++;next} A[$2]{print $2,A[$2];delete A[$2]}' Input_file Input_file
If you don't care about the order of $2, the following may help.
awk '{A[$2]++} END{for(i in A){print i,A[i]}}' Input_file
The first solution reads Input_file twice. On the first pass it builds an array A indexed by $2 and increments the count for each occurrence. On the second pass it prints $2 and its count the first time each value is seen, then deletes that entry so it is not printed again.
The second solution builds the same array A, indexed by $2, and increments its value for each line. Then, in the END section, it loops over A and prints each index with its count.
I would use sort | uniq for this purpose as these two utils are designed specifically for this kind of task:
cat <<END |
abcd Z
efdg Z
aqbs F
edf F
aasd A
END
awk '{print $2}' | sort -r | uniq -c | awk '{printf "%s %d\n", $2, $1}'
Would produce exactly the desired output
Z 2
F 2
A 1
Here awk '{print $2}' is used to get the second column from a document with fields separated by one or more whitespace characters. If we knew the width of the columns is fixed, we could use a faster cut utility instead.
sort -r | uniq -c is doing the main algorithmic part of the task - sort the letters in reverse order and count the number of occurrences of each letter.
awk '{printf "%s %d\n", $2, $1}' does some reformatting of the uniq -c output to match the required format exactly.
Update: awk has powerful array support, so this can be done with awk alone (asorti and its sort-order argument are GNU awk extensions):
cat <<END |
abcd Z
efdg Z
aqbs F
edf F
aasd A
END
awk '{a[$2]++}
END {n=asorti(a,b,"@ind_str_desc");
for (k=1;k<=n;k++) {printf "%s %d\n", b[k], a[b[k]]} }'
We use the array a that is indexed with letters found in the input stream, and on each line the element indexed by the corresponding letter gets incremented.
In the END clause we sort the indices in descending string order into array b and print each index with its count.
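If GNU awk is not available, a portable sketch is to count in awk and leave the ordering to sort. This assumes the desired order is highest count first, with ties in reverse alphabetical order, which happens to match the output requested in the question; the variable names count and result are only for illustration.

```shell
# Count occurrences of column 2 in plain awk, then order the result with sort:
# by count descending, ties broken by letter in reverse order.
result=$(printf '%s\n' 'abcd Z' 'efdg Z' 'aqbs F' 'edf F' 'aasd A' |
  awk '{ count[$2]++ } END { for (k in count) print k, count[k] }' |
  LC_ALL=C sort -k2,2nr -k1,1r)
printf '%s\n' "$result"
```

The order of for (k in count) is unspecified in awk, which is why the final sort is needed to get a deterministic result.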

Linux sort: how to sort numerically but leave empty cells to the end

I have this data to sort. The 1st column is the item ID. The 2nd column is the numerical value. Some items do not have a numerical value.
03875334 -4.27
03860156 -7.27
03830332
19594535 7.87
01542392 -5.74
01481815 11.45
04213946 -10.06
03812865 -8.67
03831625
01552174 -9.28
13540266 -8.27
03927870 -7.25
00968327 -8.09
I want to use the Linux sort command to sort the items numerically in the ascending order of their value, but leave those empty items to the end. So, this is the expected output I want to obtain:
04213946 -10.06
01552174 -9.28
03812865 -8.67
13540266 -8.27
00968327 -8.09
03860156 -7.27
03927870 -7.25
01542392 -5.74
03875334 -4.27
19594535 7.87
01481815 11.45
03830332
03831625
I tried "sort -k2n" and "sort -k2g", but neither yielded the output I want. Any idea?
Here is a simple Schwartzian transform based on the assumption that all actual values are smaller than 123456789.
awk '{ printf "%s\t%s\n", ($2 == "" ? 123456789 : $2), $0 }' file |
sort -n | cut -f2- >output
Assuming data is in d.txt and blanks have 4 spaces at the end
egrep " $" d.txt > blanks.txt ; egrep -v " $" d.txt | sort -n -k2 | cat - blanks.txt
This should work:
awk '$2 ~ /[0-9]$/' d.txt | sort -k2g && awk '$2 !~ /[0-9]$/' d.txt
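The split-and-concatenate idea in the answers above can also be sketched as a single awk pass that pipes rows with a value into sort and holds back the rows without one. This assumes a row counts as empty when it has fewer than two whitespace-separated fields; the names cmd, blank and result are only for illustration.

```shell
# Sample data from the question, written to a temporary file (illustrative only).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
03875334 -4.27
03860156 -7.27
03830332
19594535 7.87
01542392 -5.74
01481815 11.45
04213946 -10.06
03812865 -8.67
03831625
01552174 -9.28
13540266 -8.27
03927870 -7.25
00968327 -8.09
EOF

result=$(awk -v cmd='LC_ALL=C sort -k2,2g' '
  NF >= 2 { print | cmd; next }   # rows with a value are piped to sort
  { blank[++n] = $0 }             # rows without a value are held back
  END {
    close(cmd)                    # wait until sort has written its output
    for (i = 1; i <= n; i++) print blank[i]
  }' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Closing the pipe in the END block makes awk wait for sort to finish, so the sorted rows appear before the held-back empty rows, which keep their original relative order.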

output the 2nd column of a file

Given a file with two columns, separated by standard white space:
a b
c d
f g
  h
how do I output the second column
cut -d' ' -f2
awk '{print $2}'
Because the last line of your example data has no first column you'll have to parse it as fixed width columns:
awk 'BEGIN {FIELDWIDTHS = "2 1"} {print $2}'
Use cut with byte offsets:
cut -b 3
Use sed to remove the first two characters of each line:
sed 's/..//'
cut -c3 listdir