Change values from a dataset variable in Bash - linux

I am new to Bash and I am trying to change the values of a column in the file data.csv, which is comma-delimited.
In the dataset I have the variable sex with only two possible values, 'Female' and 'Male', and I would like to transform Male into 'm' and Female into 'f'.
I have tried this:
#!/bin/bash
sex=$(cut -d , -f 5 data.csv) # I select column 5, related to the variable sex
for i in $sex; do
    if [[$i='Female']]; then
        $i='f'
    fin
done
The code is wrong and I do not know how to modify it.
Besides, I would like to update my data.csv with the new values in sex.

# awk
# without header
awk -F, 'BEGIN{OFS=FS} {$5 = ($5=="Male" ? "m" : "f")} 1' data.csv > output1.csv
# with header
awk -F, 'BEGIN{OFS=FS} NR!=1 {$5 = ($5=="Male" ? "m" : "f")} 1' data.csv > output1.csv
# bash
while IFS= read -r line
do
    line=${line/,Male,/,m,}
    line=${line/,Female,/,f,}
    echo "$line"
done < data.csv > output2.csv
# sed
sed 's/,Male,/,m,/; s/,Female,/,f,/' data.csv > output3.csv
Note that the bash and sed variants match ,Male, and ,Female, with the surrounding commas, so they assume the sex column is neither the first nor the last field.

awk -F , -v OFS=, '
$5 == "Female" {$5 = "f"}
$5 == "Male" {$5 = "m"} 1' data.csv

Related

Linux, bash: Determine the row number of a cell that is in a specific column and has specific content

Determine the row number of a cell that is in a specific column and has specific content.
Remarks:
The heading of a column counts as a line.
An empty field in a column counts as a row.
The fields of the CSV are separated by commas.
Given:
The following CSV file is given:
file.csv
col_o2g,col_dgjdhu,col_of_interest,,
1234567890,tg75fjksfh,$kj56hahb,,
dsewsf,1234567890,,,
khhhdg,5gfj578fj,1234567890,,
,57ijf6ehg,46h%sgf,,
ubthfgfv,zts576fufj,256hf%(",,
Given variables:
# col variable
col=col_of_interest
# variable with the value of the field of interest
value_of_interest=1234567890
# output variable
# that's the part I am looking for
wanted_line_number=
What I have:
LINE_CNT=$(awk -F '[\t ]*,[\t ]*' -v col="${col}" '
    FNR==1 {
        for(i=1; i<=NF; ++i) {
            if($i == col) {
                col = i;
                break;
            }
        }
        if(i>NF) {
            exit 1;
        }
    }
    FNR>1 {
        if($col) maxc=FNR;
    }
    END{
        print maxc;
    }' file.csv)
echo "line count of lines from column $col"
echo "$LINE_CNT"
Wanted output:
echo "The wanted line number are:"
echo $wanted_line_number
output:4
I have been trying to decipher your question, so let me know whether I got it right. I guess in your case you don't know how many columns are present in the CSV file, and you also don't know whether the first line is a header or not.
For the second remark I have no automatic solution, so you need to indicate whether line 1 is a header or not via an input parameter.
Let me show you with a test case
$ more test.csv
col_1,col_2,col_3,col_4
1234567890,tg75fjksfh,kj56hahb,dkdkdkd
dsewsf,1234567890,,dkdkdk
khhhdg,5gfj578fj,1234567890,akdkdkd
ubthfgfv,zts576fufj,256hf,,
Then you want to know the position of the column of interest in your CSV and also the line where the value of interest is located. Here is my example script (which can be improved). Keep in mind that I hardcoded the test.csv file name into the script.
$ cat check_csv.sh
column_of_interest=$1
value_of_interest=$2
with_header=$3
# check which column is the one of interest
if [[ $with_header = "Y" ]];
then
    num_cols=$(cat test.csv | awk --field-separator="," "{ print NF }" | head -n 1)
    echo "csv contains $num_cols columns"
    iteration=0
    for i in $(cat test.csv | head -n 1 | tr ',' '\n')
    do
        iteration=$(expr $iteration + 1)
        counter=$(echo $i | egrep -i "$column_of_interest" | wc -l)
        if [ $counter -eq 1 ]
        then
            echo "Column of interest $i is located on number $iteration"
            export my_col_is=$iteration
        fi
    done
    # find the line that contains the value of interest
    iteration=0
    while IFS= read -r line
    do
        iteration=$(expr $iteration + 1)
        if [[ $iteration -gt 1 ]];
        then
            is_there=$(echo $line | awk -v temp=$my_col_is -F ',' '{print $temp}' | egrep -i "$value_of_interest" | wc -l)
            if [ $is_there -gt 0 ];
            then
                echo "Value of interest $value_of_interest is present on line $iteration"
            fi
        fi
    done < test.csv
fi
Running the example: I want to know the position of column col_2 and the lines where the value 1234567890 appears in that column. I use an option to indicate that the file has a header.
$ more test.csv
col_1,col_2,col_3,col_4
1234567890,tg75fjksfh,kj56hahb,dkdkdkd
dsewsf,1234567890,,dkdkdk
khhhdg,5gfj578fj,1234567890,akdkdkd
ubthfgfv,zts576fufj,256hf,,
$ ./check_csv.sh col_2 1234567890 Y
csv contains 4 columns
Column of interest col_2 is located on number 2
Value of interest 1234567890 is present on line 3
With lines duplicated
$ more test.csv
col_1,col_2,col_3,col_4
1234567890,tg75fjksfh,kj56hahb,dkdkdkd
dsewsf,1234567890,,dkdkdk
khhhdg,5gfj578fj,1234567890,akdkdkd
ubthfgfv,zts576fufj,256hf,,
dsewsf,1234567890,,dkdkdk
dsewsf,1234567890,,dkdkdk
$ ./check_csv.sh col_2 1234567890 Y
csv contains 4 columns
Column of interest col_2 is located on number 2
Value of interest 1234567890 is present on line 3
Value of interest 1234567890 is present on line 6
Value of interest 1234567890 is present on line 7
$
If you want to handle files without a header, you only need to duplicate the code without the head -n 1 parts; but in that case you cannot get the column names, so you won't know which column to search by name.
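A minimal sketch for that header-less case, assuming the column of interest is instead passed as a numeric index (col_index is a hypothetical parameter, not something from the question):
col_index=3           # hypothetical: the column position must be known up front
value_of_interest=1234567890
# print the line number of every row whose field $col_index equals the value
awk -F ',' -v c="$col_index" -v voi="$value_of_interest" '$c == voi {print NR}' file.csv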
col="col_of_interest"
value_of_interest="1234567890"
awk -v FS="," -v coi="$col" -v voi="$value_of_interest" \
'NR==1{
for(i=1; i<=NF; i++){
if(coi==$i){
y=i
}
}
next
}
{if($y==voi){print NR}}' file
Output:
4
See: GNU awk: String-Manipulation Functions (split), Arrays in awk, 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR and man awk
file=./input.csv
d=,
# get the column number for col_of_interest
c=$(head -n1 "$file" | grep -oE "[^$d]+" | grep -niw "$col" | cut -d: -f1)
# print the column with cut and get the line numbers for 1234567890
[ "$c" -gt 0 ] && wanted_line_number=$(cut -d "$d" -f "$c" "$file" | grep -niw "$value_of_interest" | cut -d: -f1)
printf "The wanted line number is: %b\n" "$wanted_line_number"

awk: reading an Excel file (CSV) and making another

I am using the awk command to read my CSV file. I want to apply a condition, and if the condition holds I want to take the row, but not with all the columns, and write it to another CSV file.
For example, the CSV file:
fname lname id address street phone telephone
row1: myfname mylname 123 serlanka j12street 05666355 02365410
row2...
row3...
The condition: if the row has the id "123", then I want just the fname, lname and id columns in the new CSV.
I've used an awk command in my code.
Code:
zcat "$FileName" | awk -F'\t' '(($4 >= 400) && ($4 <= 599)) {Str="HTTP Error: " $4; print Str >> "New.csv"}'
How can I write the row, or some information from the same row?
Thanks.
More or less add this to the end of your script:
$3 ~ /123/ {print $1,$2,$3 >> "New.csv"}
Try this if the field delimiter is just spaces:
awk '$3 == "123" {print $1,$2,$3}'
or this if the field delimiter is a ,
awk -F, '$3 == "123" {print $1,$2,$3}'
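One caveat: print $1,$2,$3 joins the fields with the default output separator, a space. If the new file should stay comma-separated, set OFS and redirect; a sketch (input.csv and New.csv are placeholder names):
awk -F, -v OFS=, '$3 == "123" {print $1, $2, $3}' input.csv > New.csv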

Convert number from text file

I have a file:
id name date
1 paul 23.07
2 john 43.54
3 marie 23.4
4 alan 32.54
5 patrick 32.1
I want to print names that start with "p" and have an odd-numbered id.
My command:
grep "^p" filename | cut -d ' ' -f 2 | ....
result:
paul
patrick
Awk can do it all:
$ awk 'NR > 1 && $2 ~ /^p/ && ($1 % 2) == 1 { print $2 }' op.txt
paul
patrick
EDIT
To use : as the field separator:
$ awk -F: 'NR > 1 && $2 ~ /^p/ && ($1 % 2) == 1 { print $2 }' op.txt
NR > 1
Skip the header
$2 ~ /^p/
Name field starts with p
$1 % 2 == 1
ID field is odd
If all of the above are true:
{ print $2 }
Print the name field
How about a little awk?
awk '{if ($1 % 2 == 1 && substr($2, 1, 1) == "p") print $2}' filename
In awk the fields are split by spaces, tabs and newlines by default, so your id is available as $1, the name as $2, etc. The if is quite self-explanatory: when the condition is true, the name is printed; otherwise nothing is done. awk and its syntax are far friendlier than people usually think.
Just remember the basic pattern:
BEGIN {
# ran once in the beginning
}
{
# done for each line
}
END {
# ran once in the end
}
If you need a more complex parsing, you can keep the script clear and readable in a separate file and call it like this:
awk -f script.awk filename
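For instance, the filter above could live in its own file (script.awk is just an illustrative name):
# script.awk: print names that start with "p" and have an odd id
NR > 1 && $2 ~ /^p/ && ($1 % 2) == 1 {
    print $2
}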
You might try this:
grep -oe "^[0-9]*[13579]\s\+p[a-z]\+" text | tr -s ' ' | cut -d ' ' -f 2
An odd number is easily represented by a regex, which here we write as
[0-9]*[13579]
anchored with ^ so it only matches the id at the start of the line.
If you run this command on a sample file named text
file: text
id name date
1 paul 23.07
2 john 43.54
3 marie 23.4
5 patrick 32.1
38 peter 21.44
10019 peyton 12.02
you will get the output:
paul
patrick
peyton
Note that tr -s ' ' is used to squeeze runs of spaces, making sure the delimiter is always a single space.

formatting text using awk

Hi, I have the following text and I need to use awk or sed to print 3 separate columns.
11/13/14 101 HUDSON AUBONPAINJERSEY CITY NJ $4.15
11/22/14 MTAMVM*110TH ST/CATNEW YORK NY $19.05
11/22/14 DUANE READE #14226 0NEW YORK NY $1.26
So I'd like to produce a file containing all the dates, another file containing all the descriptions, and a third file containing all the prices.
I can use awk to print the first column (print $1) and then use the -F '[$]' option to print the last column, but I'm not able to print just the middle column, as it contains spaces. Can I ignore the spaces, or is there a better way of doing this?
Thanking you in advance.
Try doing this :
$ awk '
{
    print $1 > "dates"; $1=""
    print $NF > "prices"; $NF=""
    print $0 > "desc"
}
' file
or, splitting on runs of two or more spaces (note this assumes the three columns are separated by at least two spaces; with single spaces, as in the sample above, the description itself would be split apart):
awk -F'  +' '
{
    print $1 > "dates"
    print $2 > "desc"
    print $3 > "prices"
}
' file
Then :
$ cat dates
$ cat desc
$ cat prices
Wasn't fast enough to be the first to give an awk solution, so here's one with grep and sed...
grep -o '^.*/.*/1.' file # first column (relies on the year field starting with 1)
sed 's/^.*\/.*\/1.//;s/\$.*//' file # middle column
grep -o '\$.*$' file # last column

Count specific numbers from a column from an input file linux

I was trying to read a file, count a specific number at a specific place and show how many times it appears. For example:
the 1st field is a number, the 2nd field the brand name, the 3rd field the group they belong to; the 4th and 5th are not important.
1:audi:2:1990:5
2:bmw:2:1987:4
3:bugatti:3:1988:19
4.buick:4:2000:12
5:dodge:2:1999:4
6:ferrari:2:2000:4
As output, I want to search by column 3, group together the 2's (by brand name) and count how many of them I have.
The output I am looking for should look like this:
1:audi:2:1990:5
2:bmw:2:1987:4
5:dodge:2:1999:4
6:ferrari:2:2000:4
4 -> showing how many lines there are.
I have tried this approach but can't figure it out:
file="cars.txt"; sort -t ":" -k3 $file #sorting by the 3rd field
grep -c '2' cars.txt # this counts all the 2's in the file including number 2.
I hope you understand. and thank you in advance.
I am not sure exactly what you mean by "group together by brand name", but the following will get you the output that you describe.
awk -F':' '$3 == 2' Input.txt
If you want a line count, you can pipe that to wc -l.
awk -F':' '$3 == 2' Input.txt | wc -l
I guess line 4 is 4:buick and not 4.buick. Then I suggest this (using an exact comparison; a regex match like $3~2 would also hit 12 or 25):
$ awk 'BEGIN{FS=":"} $3==2{total++; print} END{print "TOTAL --- " total}' Input.txt
Plain bash solution:
#!/bin/bash
count=0
while IFS=":" read -ra line; do
    if (( ${line[2]} == 2 )); then
        IFS=":" && echo "${line[*]}"
        (( count++ ))
    fi
done < file
echo "Count = $count"
Output:
1:audi:2:1990:5
2:bmw:2:1987:4
5:dodge:2:1999:4
6:ferrari:2:2000:4
Count = 4
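If you later want the count for every value of column 3, not just the 2's, an awk associative array does it in one pass; a sketch against the cars.txt file from the question:
awk -F: '{count[$3]++} END{for (g in count) print "group " g ": " count[g]}' cars.txt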
