I have a file with, for example, these values in it:
1 value1.1 value1.2
2 value2.1
3 value3.1 value3.2 value3.3
I need to read values from it in a shell script, but the number of columns in each row is different.
I know that if, for example, I want to read the second column, I can do it like this (with the row number as an input parameter):
$ awk -v key=1 '$1 == key { print $2 }' input.txt
value1.1
But as I mentioned, the number of columns is different for each row.
How can I make this read dynamic?
For example:
if the input parameter is 1, I should read the columns from the first row, so the output should be
value1.1 value1.2
if the input parameter is 2, I should read the columns from the second row, so the output should be
value2.1
if the input parameter is 3, I should read the columns from the third row, so the output should be
value3.1 value3.2 value3.3
The point is that the number of columns is not static, and I should read the columns of that specific row until the end of the row.
Thank you
If you just want to print the whole row, you can simply say:
awk -v key=1 'NR==key' input.txt
UPDATED
If you want to further process the column data, there are several ways.
With awk you can say something like:
awk -v key=3 'NR==key {
    for (i=1; i<=NF; i++)
        printf "column %d = %s\n", i, $i
}' input.txt
which outputs:
column 1 = value3.1
column 2 = value3.2
column 3 = value3.3
In awk you can access each column value directly by $1, $2, $3, or indirectly by $i, where the variable i holds one of 1, 2, 3.
If you prefer going with bash, try something like:
line=$(awk -v key=3 'NR==key' input.txt)
set -- $line # split into columns
for ((i=1; i<=$#; i++)); do
    echo "column $i = ${!i}"
done
which outputs the same results.
In bash the indirect access is a little more involved: you need to write ${!i}, where i is the name of a variable holding the index.
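If you would rather avoid the indirection, you can read the fields into a bash array instead; a small sketch reusing the same line variable from above:
read -r -a cols <<< "$line"    # split the line into an array on whitespace
for ((i = 0; i < ${#cols[@]}; i++)); do
    echo "column $((i + 1)) = ${cols[$i]}"
done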
Hope this helps.
I am trying to filter a text file with columns based on two conditions. Due to the size of the file, I cannot use the column numbers (there are thousands of columns and they are unnumbered) and need to use the column names instead. I have searched and tried multiple ways to do this, but nothing is returned to the command line.
Here are a few things I have tried:
awk '($colname1==2 && $colname2==1) { count++ } END { print count }' file.txt
to filter the rows based on the two column conditions
and
head -1 file.txt | tr '\t' | cat -n | grep "COLNAME"
to try to return the column number corresponding to the column name.
An example file would be:
ID ad bd
1 a fire
2 b air
3 c water
4 c water
5 d water
6 c earth
Output would be:
2 (count of ad=c and bd=water)
With your input file and the implied conditions, this should work:
$ awk -v c1='ad' -v c2='bd' 'NR==1{n=split($0,h); for(i=1;i<=n;i++) col[h[i]]=i}
$col[c1]=="c" && $col[c2]=="water"{count++} END{print count+0}' file
2
Or you can replace c1 and c2 with the literal values in the script instead.
To find the column indices you can run:
$ awk -v cols='ad bd' 'BEGIN{n=split(cols,c); for(i=1;i<=n;i++) colmap[c[i]]}
NR==1{for(i=1;i<=NF;i++) if($i in colmap) print $i,i; exit}' file
ad 2
bd 3
Or perhaps with this pipeline:
$ sed 1q file | tr -s ' ' \\n | nl | grep -E 'ad|bd'
2 ad
3 bd
although this may produce false positives, since grep does a regex match.
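One way to avoid that is to match the column names exactly rather than as regular expressions, for example:
$ sed 1q file | tr -s ' ' '\n' | nl | awk '$2 == "ad" || $2 == "bd"'
     2  ad
     3  bd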
You can rewrite the awk to be more succinct
$ awk -v cols='ad bd' '{while(++i<=NF) if(FS cols FS ~ FS $i FS) print $i,i;
exit}' file
ad 2
bd 3
As I mentioned in an earlier comment, the answer at https://unix.stackexchange.com/a/359699/133219 shows how to do this:
awk -F'\t' '
NR==1 {
for (i=1; i<=NF; i++) {
f[$i] = i
}
}
($(f["ad"]) == "c") && ($(f["bd"]) == "water") { cnt++ }
END { print cnt+0 }
' file
2
I'm assuming your input is tab-separated because of the tr '\t' in the command in your question, which looks like you're trying to convert tabs to newlines to map column names to numbers. If I'm wrong and the fields are just separated by runs of white space, then remove -F'\t' from the above.
Use the Miller toolkit (mlr) to manipulate tab-delimited files using column names. Below is a one-liner that filters a tab-delimited file (the delimiter is specified using --tsv) and writes the results to STDOUT together with the header. The header is removed using tail and the lines are counted with wc.
mlr --tsv filter '$ad == "c" && $bd == "water"' file.txt | tail -n +2 | wc -l
Prints:
2
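Alternatively, Miller can do the counting itself by chaining verbs with then, which avoids the tail/wc post-processing (a sketch, assuming a Miller version that provides the count verb):
mlr --tsv filter '$ad == "c" && $bd == "water"' then count file.txt
This should print a one-row table with a count column holding 2.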
SEE ALSO:
miller manual
Note that miller can be easily installed, for example, using conda, like so:
conda create --name miller miller
For years it bugged me that there is no succinct way in Unix to do this sort of thing, although Miller is a pretty good tool for it. Recently I wrote pick to choose columns by name, and additionally to modify, combine and add them by name, as well as to filter rows by clauses using column names. The solution to the above with pick is:
pick -h #ad=c #bd=water < data.txt | wc -l
By default pick prints the header of the selected columns; -h omits it. To print columns you simply name them on the command line, e.g.
pick ad water < data.txt | wc -l
Pick has many modes, all of them focused on manipulating columns and selecting/filtering rows with a minimal amount of syntax.
I would like to rank log entries by the timestamp of each entry.
Let's say my grep result is like this, with each entry having a different number of fields and the time appearing in a different column:
a, 3, time:123
b, time:124, 4
c, time:122, 5
How should I pipe the result so that it looks like this?
c, time:122, 5
a, 3, time:123
b, time:124, 4
Would you try the following:
while IFS= read -r line; do
    [[ $line =~ time:([0-9]+) ]] && printf "%s\t%s\n" "${BASH_REMATCH[1]}" "$line"
done < file | sort -n | cut -f 2-
It first extracts the time after the time: substring.
Then it prepends that time to the line, using a tab as the delimiter.
It numerically sorts the lines.
Finally it cuts off the 1st field.
A general solution is:
for each line:
    detect the log format
    extract the timestamp column based on the detected format
    convert the timestamp into a sortable form
    print the sortable form + a column delimiter + the original line
pipe the output of the previous stage into something that sorts on the new first column
pipe the output of that stage into something that strips off the new first column
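A concrete sketch of that pipeline for the time:<digits> format shown above (the extraction step is the part that would change with the log format):
awk '{
    # decorate: pull the digits after "time:" and prepend them as a tab-separated sort key
    # (lines without a timestamp reuse the previous key in this sketch)
    if (match($0, /time:[0-9]+/))
        t = substr($0, RSTART + 5, RLENGTH - 5)
    print t "\t" $0
}' file | sort -n | cut -f2-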
I have a CSV file with columns A,B,C,D. Column D contains values on a scale of 0 to 1. I want to use AWK to write a new column E based on the values in column D.
For example:
if the value in column D < 0.7, the value in column E = 0.
if the value in column D >= 0.7, the value in column E = 1.
I am able to print the output of column E but am not sure how to write it to a new column. It's possible to write the output of my code to a new file and then paste it back into the old file, but I was wondering if there was a more efficient way. Here is my code:
awk -F"," 'NR>1 {if ($3>=0.7) $4= "1"; else if ($3<0.7) $4= "0"; print $4;}' test_file.csv
The awk command below should give you the intended output:
awk -F "," '{if($4>=0.7) print $0",1"; else if($4<0.7) print $0",0"}' yourfile.csv > test_file.csv
You can use:
awk -F, 'NR>1 {$0 = $0 FS (($4 >= 0.7) ? 1 : 0)} 1' test_file.csv
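If the file has a header row and you also want the new column to carry a name (calling it E here is just an assumption about the desired label), a variation:
awk -F, 'NR==1 { print $0 FS "E"; next } { print $0 FS (($4 >= 0.7) ? 1 : 0) }' test_file.csv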
I have a table in which some columns contain comma-delimited values, and I want to separate the comma-delimited values in a specified column into new rows. For example, the given table is
Name Start Name2
A 1,2 X,a
B 5 Y,b
C 6,7,8 Z,c
And I need to separate the comma delimited values in column 2 to get the table below
Name Start Name2
A 1 X,a
A 2 X,a
B 5 Y,b
C 6 Z,c
C 7 Z,c
C 8 Z,c
I am wondering if there is any solution with a shell script, so that I can build a workflow pipeline.
Note: the original table may contain more than 3 columns.
Assuming the format of your input and output does not change:
awk 'BEGIN{FS="[ ,]"} {print $1, $2, $NF; print $1, $3, $NF}' input_file
Input:
input_file:
A 1,2 X
B 5,6 Y
Output:
A 1 X
A 2 X
B 5 Y
B 6 Y
Explanation:
awk: invoke awk, a tool for manipulating lines (records) and fields
'...': content enclosed by single-quotes are supplied to awk as instructions
'BEGIN{FS="[ ,]"}': before reading any lines, tell awk to use both space and comma as delimiters; FS stands for Field Separator.
{print $1, $2, $NF; print $1, $3, $NF}: For each input line read, print the 1st, 2nd and last field on one line, and then print the 1st, 3rd, and last field on the next line. NF stands for Number of Fields, so $NF is the last field.
input_file: supply the name of the input file to awk as an argument.
In response to the updated input format:
awk 'BEGIN{FS="[ ,]"} {print $1, $2, $4","$5; print $1, $3, $4","$5}' input_file
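For reference, a more general sketch that splits whichever column you pass as col on commas, however many values it holds (note the rebuilt output fields end up separated by single spaces):
awk -v col=2 '{
    n = split($col, parts, ",")    # split the chosen column on commas
    for (i = 1; i <= n; i++) {
        $col = parts[i]            # substitute one value at a time
        print                      # print the rebuilt line
    }
}' input_file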
After Runner's modification of the original question, another approach might look like this:
#!/bin/sh
# Usage $0 <file> <column>
#
FILE="${1}"
COL="${2}"
# tokens separated by linebreaks
IFS="
"
for LINE in `cat ${FILE}`; do
    # get number of columns
    COLS="`echo ${LINE} | awk '{print NF}'`"
    # get the actual field selected by COL; it contains the keys to be split into individual lines
    # replace comma with newline to "reuse" the newline field separator in IFS
    KEYS="`echo ${LINE} | cut -d' ' -f${COL}-${COL} | tr ',' '\n'`"
    COLB=$(( ${COL} - 1 ))
    COLA=$(( ${COL} + 1 ))
    # get text from the columns before and after the actual field
    if [ ${COLB} -gt 0 ]; then
        BEFORE="`echo ${LINE} | cut -d' ' -f1-${COLB}` "
    else
        BEFORE=""
    fi
    AFTER=" `echo ${LINE} | cut -d' ' -f${COLA}-`"
    # echo "-A: $COLA ($AFTER) | B: $COLB ($BEFORE)-"
    # iterate over the keys and re-build the original line
    for KEY in ${KEYS}; do
        echo "${BEFORE}${KEY}${AFTER}"
    done
done
With this shell script you can do what you want. This call will split column 2 into multiple lines:
./script.sh input.txt 2
If you'd like to pass input through standard input using pipes (e.g. to split multiple columns in one go), you could change the FILE="${1}" line to:
if [ "${1}" = "-" ]; then
FILE="/dev/stdin"
else
FILE="${1}"
fi
And run it this way:
./script.sh input.txt 1 | ./script.sh - 2 | ./script.sh - 3
Note that cut is very sensitive about the field separators. So if the line starts with a space character, column 1 would be "" (empty). If the fields were separated by a mixture of spaces and tabs, this script would have other issues too. In this case (as explained above) filtering the input (so that fields are separated by only one space character) should do it. If this is not possible, or the data in a column contains space characters itself, the script would need to get more complicated.
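If normalizing the input is acceptable, a pre-filter along these lines (using the stdin variant shown above) could squeeze tabs and runs of spaces down to single spaces and strip leading blanks before the script sees the data; treat it as a sketch:
tr -s ' \t' ' ' < input.txt | sed 's/^ *//' | ./script.sh - 2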
I have a CSV file that looks like this:
A,B,C
1,2,3
4,4,4
1,2,6
3,6,9
Is there an easy way to grep all the rows in which the B column is 2, and keep the header? For example, I want the output to be like
A,B,C
1,2,3
1,2,6
I am working under Linux.
Using awk:
awk -F, 'NR==1 || $2==2' file
NR==1 -> true for the first line,
$2==2 -> true when the second column equals 2. A line is printed if either condition is true (printing is awk's default action when a pattern has no action block).
To choose the column using the header column name:
awk -F, -v col="B" 'NR==1{for(i=1;i<=NF;i++)if($i==col)break;print;next}$i==2' file
Replace B with the appropriate name of the column which you want to check against.
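With the sample file above, that should print:
$ awk -F, -v col="B" 'NR==1{for(i=1;i<=NF;i++)if($i==col)break;print;next}$i==2' file
A,B,C
1,2,3
1,2,6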
You can use addresses in sed:
sed -n '1p;/^[^,]*,2/p'
It means:
1p Print the first line.
/ Start a match.
^ Match the beginning of a line.
[^,] Match anything but a comma
* zero or more times.
, Match a comma.
2 Match a 2.
/p End of match, if it matches, print.
If the header can contain the value you are looking for, you should be more careful:
sed -n '1p;1!{/^[^,]*,2/p}'
1!{ ... } just means "Do the following for lines other than the first one".
For column number n>2, you can add a quantifier:
sed -n '1p;1!{/^\([^,]*,\)\{M\}2/p}'
where M=n-1. The quantifier just means repetition, so the non-comma-0-or-more-times-comma thing is repeated M times.
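For example, to keep rows where the third column (n=3, so M=2) equals 3 in the sample file above, that would be:
$ sed -n '1p;1!{/^\([^,]*,\)\{2\}3/p}' file
A,B,C
1,2,3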
For true CSV files where a value can contain a comma, switch to Perl and Text::CSV.
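A minimal sketch of that route, assuming Text::CSV is installed and the column of interest is still the second field:
perl -MText::CSV -e '
    my $csv = Text::CSV->new({ binary => 1, auto_diag => 1, eol => "\n" });
    my $hdr = $csv->getline(\*STDIN);          # keep the header row
    $csv->print(\*STDOUT, $hdr);
    while (my $row = $csv->getline(\*STDIN)) {
        # column B is index 1; quoted fields with embedded commas are handled correctly
        $csv->print(\*STDOUT, $row) if $row->[1] eq "2";
    }
' < file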
$ awk -F, 'NR==1 { for (i=1;i<=NF;i++) h[$i] = i; print; next } $h["B"] == 2' file
A,B,C
1,2,3
1,2,6
By the way, sed is an excellent tool for simple substitutions on a single line; for anything else, just use awk - the code will be clearer and MUCH easier to enhance in the future if necessary.