I have written a shell script to get any column.
The script is:
#!/bin/sh
awk '{print $c}' c=${1:-1}
So that I can call it as
ls -l | column 2
But how do i implement it for multiple columns?
Say, if I want something like:
ls -l | column 2 3
In this case, I wouldn't use awk at all:
columns() { tr -s '[:blank:]' ' ' | cut -d ' ' -f "$(IFS=,; echo "$*")"; }
This uses tr to squeeze all sequences of whitespace to a single space, then cut to extract the fields you're interested in. The function joins its arguments with commas, so both columns 2 3 and columns 1,5,9- work.
ls -l | columns 1,5,9-
Note, you shouldn't parse the output of ls.
awk -v c="3,5" '
BEGIN{ split(c,a,/[^[:digit:]]+/) }
{ for(i=1;i in a;i++) printf "%s%s", (i==1?"":OFS), $(a[i]); print "" }
'
Use any non-digit(s) you like as the column number separator (e.g. comma or space).
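For example, here is a quick run on made-up input, using a space as the separator in c to show that any non-digit works:
$ printf 'a1 a2 a3 a4 a5\nb1 b2 b3 b4 b5\n' | awk -v c="3 5" '
BEGIN{ split(c,a,/[^[:digit:]]+/) }
{ for(i=1;i in a;i++) printf "%s%s", (i==1?"":OFS), $(a[i]); print "" }
'
a3 a5
b3 b5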
ls -l | nawk -v r="3,5" 'BEGIN{split(r,a,",")}{for(i=1;i in a;i++)printf "%s ",$a[i];print ""}'
Now you can simply change the r variable in your shell script and pass it on,
or you can configure it in your shell script and use r=$your_list_var.
Whatever field numbers are present in $your_list_var will be printed by the awk command.
The example above prints the 3rd and 5th fields of the ls -l output.
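As a sketch, a small wrapper script (named cols here purely as an example) could join its arguments into that list so that both cols 2 3 and cols 2,3 work:
#!/bin/sh
# join all arguments with commas: "cols 2 3" and "cols 2,3" both become r="2,3"
r=$(echo "$@" | tr ' ' ',')
awk -v r="$r" 'BEGIN{split(r,a,",")}{for(i=1;i in a;i++)printf "%s ",$a[i];print ""}'
Then ls -l | cols 2 3 prints the 2nd and 3rd columns.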
Please, how can I count the list of numbers in a file?
awk '{for(i=1;i<=NF;i++){if($i>=0 && $i<=25){print $i}}}'
Using the command above I can display the numbers in that range on the terminal, but if there are many of them it will be difficult to count. How can I show the count of each number on the terminal? For example:
1-20,
2-22,
3-23,
4-24,
etc
I know I can use wc but I don't know how to infuse it into the command above
awk '
{ for(i=1;i<=NF;i++) if (0<=$i && $i<=25) cnts[$i]++ }
END { for (n in cnts) print n, cnts[n] }
' file
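For example, fed a single invented line 3 1 3, it prints each value with its count (the for (n in cnts) loop makes no ordering guarantee, so pipe to sort -n if you need sorted output):
$ echo '3 1 3' | awk '
{ for(i=1;i<=NF;i++) if (0<=$i && $i<=25) cnts[$i]++ }
END { for (n in cnts) print n, cnts[n] }
'
1 1
3 2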
Pipe the output to sort -n and uniq -c
awk '{for(i=1;i<=NF;i++){if($i>=0 && $i<=25){print $i}}}' filename | sort -n | uniq -c
You need to sort first because uniq requires all the same elements to be consecutive.
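A quick illustration of why the sort matters, with invented values:
$ printf '3\n1\n3\n' | uniq -c
      1 3
      1 1
      1 3
$ printf '3\n1\n3\n' | sort -n | uniq -c
      1 1
      2 3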
While I'm personally an awk fan, you might be glad to learn about grep -o functionality. I'm using grep -o to match all numbers in the file, and then awk can be used to pick all the numbers between 0 and 25 (inclusive). Last, we can use sort and uniq to count the results.
grep -o "[0-9][0-9]*" file | awk ' $1 >= 0 && $1 <= 25 ' | sort -n | uniq -c
Of course, you could do the counting in awk with an associative array as Ed Morton suggests:
egrep -o "[0-9]+" file | awk ' $1 >= 0 && $1 <= 25 ' | awk '{cnt[$1]++} END { for (i in cnt) printf("%s-%s\n", i,cnt[i] ) } '
I modified Ed's code (typically not a good idea - I've been reading his code for years now) to show a modular approach - one awk script for filtering numbers in the range 0 to 25 and another awk script for counting a list (of anything).
I also made another subtle change from my first script: egrep instead of grep, whose extended syntax allows the + repetition (note that \d is not supported in POSIX EREs, hence the [0-9] class).
To be honest, the second awk script generated some unexpected output at first, but I wanted to share an example of a more general approach. EDIT: I applied Ed's suggestion to correct the unexpected output - it's fine now.
I've created a CSV from shell. Then I need to filter the information by column. I used this command:
$ cut -d ';' -f 12,22 big_file.csv
The input looks like:
ACT;XXXXXX;MCD;881XXXX;881017XXXXXX;ABCD;BMORRR;GEN;88XXXXXXXXXX;00000;01;2;000008608008602;AAAAAAAAAAA;0051;;;;;;093505;
ACT;XXXXXX;MCD;881XXXX;881017XXXXXX;ABCD;BMORRR;GEN;88XXXXXXXXXX;00000;01;3;000008608008602;AAAAAAAAAAA;0051;;;;;;085000;anl#mail.com
The output is:
ID CLIENT;email
00000xxxxxxxxx
00000000xxxxxx;anl#mail.com
As you can see, the last column does not appear (note that the semicolon is missing in the first line). I want this:
ID CLIENT;email
00000xxxxxxxxx;
00000000xxxxxx;anl#mail.com
I have another CSV file with information and it works. I've reviewed the csv and the columns exist.
There doesn't seem to be a way to make cut do this. The next step up in expressivity is awk, which does it easily:
$ cat testfile
one;two;three;four
1;2;3
first;second
only
$ awk -F';' '{ OFS=FS; print $1, $3 }' < testfile
one;three
1;3
first;
only;
$
You don't get the semicolon in the output of your second line, because your second line contains just 21 fields (the first contains 23 fields).
You can check that using:
(cat bigfile.csv | tr -d -c ";\n" ; echo "1234567890123456789012") | cat -n | grep -v -E ";{22}"
This will output all lines from bigfile.csv with fewer than 22 semicolons, along with their line numbers.
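If you have awk at hand anyway, a simpler sketch of the same check is to print the field count of every short line (assuming, as in the check above, that complete lines have 22 semicolons, i.e. 23 fields):
awk -F';' 'NF < 23 { print NR ": only " NF " fields" }' bigfile.csv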
To fix that, you can add a bunch of empty fields at the end of each line and pipe the result to cut like this:
sed -e 's|$|;;;;;;;;;;;;;;;;;;;;;;;;|' bigfile.csv | cut -d ';' -f 12,22
The result is:
XXXXXXXXYYY;XXXNNN
XXXXYYYYXXXXX;
I would like to know the count of unique values in a column using Linux commands. The column has values like the ones below (the data is edited from the previous examples). I need to ignore the .M, .Q and .A at the end and just count the unique number of plants.
"series_id":"ELEC.PLANT.CONS_EG_BTU.56855-ALL-ALL.M"
"series_id":"ELEC.PLANT.CONS_EG_BTU.56855-ALL-ALL.Q"
"series_id":"ELEC.PLANT.CONS_EG_BTU.56855-WND-ALL.A"
"series_id":"ELEC.PLANT.CONS_EG_BTU.56868-LFG-ALL.Q"
"series_id":"ELEC.PLANT.CONS_EG_BTU.56868-LFG-ALL.A"
"series_id":"ELEC.PLANT.CONS_EG_BTU.56841-WND-WT.Q"
"series_id":"ELEC.CONS_TOT.COW-GA-2.M"
"series_id":"ELEC.CONS_TOT.COW-GA-94.M"
I've tried this code but I'm not able to avoid those suffixes:
cat ELEC.txt | grep 'series_id' | cut -d, -f1 | wc -l
For the above sample, the expected count should be 6, but I get 8.
This should do the job:
grep -Po "ELEC.PLANT.*" FILE | cut -d. -f -4 | sort | uniq -c
You first grep for the "ELEC.PLANT." part,
then cut -d. -f -4 removes the trailing .Q/.A/.M,
and sort | uniq -c removes duplicates and counts.
EDIT:
for the new data it should only be necessary to do the following:
grep -Po "ELEC.*" FILE | cut -d. -f -4 | sort | uniq -c
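If all you need is the total rather than the per-plant tallies, count the unique keys instead; for the sample above this prints the expected 6:
$ grep -Po "ELEC.*" FILE | cut -d. -f -4 | sort -u | wc -l
6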
When you have to do some counting, you can easily do it with awk. Awk is an extremely versatile tool and I strongly recommend you have a look at it. Maybe start with Awk one-liners explained.
That said, you can easily do some conditional counting here:
What you want is to count all unique lines which have series_id in them.
awk '/series_id/ && !($0 in a) { c++; a[$0] } END {print c}'
This essentially states: if my line contains "series_id" and I have not stored the line in my array a yet, it means I have not encountered the line before, so I increase the counter c by 1. At the END of the program, I print the count c. (Mind the parentheses: ! $0 in a would parse as (!$0) in a, which is not what we want.)
Now you want to clean things up a bit. Your lines of interest essentially look like
"something":"something else"
So we are interested in something else, which is in the 4th field if " is the field separator, and we only care about it if something is series_id, located in field 2.
awk -F'"' '($2=="series_id") && (! $4 in a ) { c++; a[$4] } END {print c}'
Finally, you don't care about the last letter of the fourth field, so we need to make a small substitution:
awk -F'"' '($2=="series_id") { str=$4; gsub(/.$/,"",str); if (! str in a) {c++; a[str] } } END {print c}'
You could also rewrite this differently as:
awk -F'"' '($2 != "series_id" ) { next }
{ str=$4; gsub(/.$/,"",str) }
( str in a ) { next }
{ c++; a[str] }
END { print c }'
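For the record, running that last version against the sample (saved as ELEC.txt) prints the expected count:
$ awk -F'"' '($2 != "series_id" ) { next }
{ str=$4; gsub(/.$/,"",str) }
( str in a ) { next }
{ c++; a[str] }
END { print c }' ELEC.txt
6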
My standard way to count unique values is to make sure I have the list of values (using grep and cut in your case), and then add the following commands behind a pipe:
| sort -n | uniq -c
The sort sorts the values numerically, while uniq collapses them into unique entries (the -c stands for "count").
Do this: cat ELEC.txt | grep 'series_id' | cut -f1-4 -d. | uniq | wc -l
-f1-4 keeps everything up to the fourth . on each line, which drops the trailing .M/.Q/.A. (If identical entries are not adjacent, add sort before uniq.)
Here is a possible solution using awk:
awk 'BEGIN{FS="[:.\"]+"} /^"series_id":/{print $6}' \
ELEC.txt |sort -n |uniq -c
The output for the sample you posted will be something like this:
1 56841-WND-WT
2 56855-ALL-ALL
1 56855-WND-ALL
2 56868-LFG-ALL
If you need the entire string, you can print the other fields as well:
awk 'BEGIN{FS="[:.\"]+"; OFS="."} /^"series_id":/{print $3,$4,$5,$6}' \
ELEC.txt |sort -n | uniq -c
And the output will be something like this:
1 ELEC.PLANT.CONS_EG_BTU.56841-WND-WT
2 ELEC.PLANT.CONS_EG_BTU.56855-ALL-ALL
1 ELEC.PLANT.CONS_EG_BTU.56855-WND-ALL
2 ELEC.PLANT.CONS_EG_BTU.56868-LFG-ALL
This line:
echo $(grep Uid /proc/1/status) | cut -d ' ' -f 2
Produces output:
0
This line:
grep Uid /proc/1/status | cut -d ' ' -f 2
Produces output:
Uid: 0 0 0 0
My goal was the first output. My question is: why does the second command not produce the output I expected? Why am I required to echo it?
One way to do this is to change the IFS, or Internal Field Separator, variable in the bash shell:
IFSOLD="$IFS" # IFS for Internal field separator
IFS=$'\t'
grep 'Uid' /proc/1/status | cut -f 2
0 # Your result
IFS="$IFSOLD"
or the easy way
grep 'Uid' /proc/1/status | cut -d $'\t' -f 2
Note: by the way, tab is the default delimiter for cut.
Use awk
awk '/Uid/ { print $2; }' /proc/1/status
You should almost never need to write something like echo $(...) - it's almost equivalent to calling ... directly. Try echo "$(...)" (which you should always use) instead, and you'll see it behaves like ....
The reason is because when the $() command substitution is invoked without quotes the resulting string is split by Bash into separate arguments before being passed to echo, and echo outputs each argument separated by a single space, regardless of the whitespace generated by the command substitution (in your case tabs).
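You can see that word splitting directly in a minimal sketch (the tabs are written with printf so the example is copy-pastable):
$ x="$(printf 'Uid:\t0\t0\t0\t0')"
$ echo $x        # unquoted: split into words, re-joined with single spaces
Uid: 0 0 0 0
$ echo "$x"      # quoted: the tabs survive
Uid:    0       0       0       0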
As sjsam suggested, if you want to cut tab-delimited output, just specify tabs as the delimiter instead of spaces:
cut -d $'\t' -f 2
grep Uid /proc/1/status | sed -r 's/\s+/ /g' | awk '{print $3}'
Output
0
I don't know what the problem with this code is:
#! /bin/bash
File1=$1
for (( j=1; j<=3; j++ ))
{
output=$(`awk -F; 'NR=='$j'{print $3}' "${File1}"`)
echo ${output}
}
File1 looks like this :
Char1;2;3;89
char2;9;6;66
char5;3;77;8
I want to extract field 3 from every line in the loop,
so the result will be:
3
6
77
It should be like this:
#! /bin/bash
File1=$1
for (( j=1; j<=3; j++ ))
{
output=$(awk -F ';' 'NR=='$j' {print $3}' "${File1}")
echo ${output}
}
It works well on my CentOS.
You are mixing single quotes and backticks all over the place and not escaping them.
You shouldn't splice shell variables into an awk script; pass them in with the -v flag instead.
awk already loops over its input, so there is no reason to wrap it in a shell loop.
Just:
awk -F";" '{print $3}' "${file1}"
Will do exactly what your entire script is trying to do now.
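For instance, with the sample File1 above:
$ awk -F";" '{print $3}' File1
3
6
77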
Even easier, use the cut utility: cut -d';' -f3 will produce the result you want, where -d specifies the delimiter to use and -f the field/column to extract (1-indexed).
If you simply want to extract a column out from a structured file like the one you have, use the cut utility.
cut will allow you to specify what the delimiter is in your data (;) and what column(s) you'd like to extract (column 3).
cut -d';' -f3 "$file1"
If you would like to loop over the result of this, use a while loop and read the values one by one:
cut -d';' -f3 "$file1" |
while read -r data; do
echo "data is $data"
done
If you would like the values in a variable, do this:
var=$( cut -d';' -f3 "$file1" | tr '\n' ' ' )
The tr '\n' ' ' bit replaces newlines with spaces, so you would get 3 6 77 as a string.
To get them into an array:
declare -a var=( $( cut -d';' -f3 "$file1" ) )
(the tr is not needed here)
You may then access the values as ${var[0]}, ${var[1]} etc.
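And if you want to walk the array, a small sketch:
for v in "${var[@]}"; do
    echo "$v"
done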