Substitute the second digit of a column with a constant number - linux

I have a table like:
Person age
name1 45
name2 13
name3 28
name4 89
I would like, in an automated way (since it's a big table), to replace the second digit of the second column with a 0, so that I have decade groups instead of exact ages:
Person age
name1 40
name2 10
name3 20
name4 80
What is the neatest way to do that? Thanks!

Try this script, it does the trick (just remove the header line from your input file first):
#!/bin/bash
file_path="/home/mobaxterm/Desktop/f.txt"
dest_file="/home/mobaxterm/Desktop/f2.txt"
echo "Person age" > "$dest_file"
while read -r p; do
    name=$(echo "$p" | cut -f1 -d' ')
    # keep the first digit of the age, replace the second with 0
    age=$(echo "$p" | cut -f2 -d' ' | sed 's/\(.\)./\10/')
    echo "$name $age" >> "$dest_file"
done < "$file_path"
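If awk is available, the same thing can be done in one pass without the per-line cut and sed subshells; a minimal sketch, assuming whitespace-separated columns and two-digit ages (the file names are just the example paths from above):
awk 'NR == 1 { print; next }              # pass the header through unchanged
     { $2 = substr($2, 1, 1) "0"; print } # keep the first digit, zero the second
' f.txt > f2.txt
Unlike the script above, this passes the header line through, so there is no need to strip it first.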

Related

Use values in a column to separate strings in another column in bash

I am trying to separate a column of strings using the values from another column, maybe an example will be easier for you to understand.
The input is a table, with the strings in column 2 separated by commas (,).
The third column is the field number that should be output, using , as the delimiter in the second column.
Ben mango,apple 1
Mary apple,orange,grape 2
Sam apple,melon,* 3
Peter melon 1
The output should look like this, where records whose selected field is an asterisk are not output (so the Sam row is dropped):
Ben mango
Mary orange
Peter melon
I am able to generate the desired output using a for loop, but I think it is quite cumbersome:
IFS=$'\n'
for i in $(cat input.txt)
do
F=`echo $i | cut -f3`
paste <(echo $i | cut -f1) <(echo $i | cut -f2 | cut -d "," -f$F) | grep -v "\*"
done
Is there any one-liner to do it maybe using sed or awk? Thanks in advance.
The key to doing it in awk is the split() function, which populates an array based on a regular expression that matches the delimiters to split a string on:
$ awk '{ split($2, fruits, /,/); if (fruits[$3] != "*") print $1, fruits[$3] }' input.txt
Ben mango
Mary orange
Peter melon
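As a side note, split() also returns the number of elements it produced, so the same one-liner extends naturally to guard against an out-of-range field number; a hedged variant of the answer above:
awk '{ n = split($2, fruits, /,/); if ($3 <= n && fruits[$3] != "*") print $1, fruits[$3] }' input.txt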

Bash: Reading a CSV file and sorting column based on a condition

I am trying to read a CSV text file and print all entries of one column (sorted), based on a condition.
The input sample is as below:
Computer ID,User ID,M
Computer1,User3,5
Computer2,User5,8
computer3,User4,9
computer4,User10,3
computer5,User9,0
computer6,User1,11
The user ID (second column) needs to be printed if the hours (third column) are greater than zero. However, the printed data should be sorted by user ID.
I have written the following script:
while IFS=, read -r col1 col2 col3 col4 col5 col6 col7 || [[ -n $col1 ]]
do
if [ $col3 -gt 0 ]
then
echo "$col2" > login.txt
fi
done < <(tail -n+2 user-list.txt)
The output of this script is:
User3
User5
User4
User10
User1
I am expecting the following output:
User1
User3
User4
User5
User10
Any help would be appreciated. TIA
awk -F, 'NR == 1 { next } $3 > 0 { match($2,/[[:digit:]]+/);map[$2]=substr($2,RSTART) } END { PROCINFO["sorted_in"]="#val_num_asc";for (i in map) { print i } }' user-list.txt > login.txt
Set the field delimiter to commas with -F,. Ignore the header with NR == 1 { next }. When the third delimited field is greater than 0, set an entry of the array map, indexed by the user, to the numeric part of the user field (found with the match function). In the END block, set the iteration order to ascending numeric value (PROCINFO["sorted_in"], a GNU awk feature) and loop through the map array.
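For readability, here is the same GNU awk program laid out over several lines (identical logic, just reformatted):
awk -F, '
NR == 1 { next }                 # skip the header
$3 > 0  {                        # only rows with hours greater than zero
    match($2, /[[:digit:]]+/)    # locate the numeric part of the user ID
    map[$2] = substr($2, RSTART) # e.g. map["User10"] = "10"
}
END {
    PROCINFO["sorted_in"] = "#val_num_asc" # gawk: iterate by numeric value, ascending
    for (i in map) print i
}' user-list.txt > login.txt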
The problem with your script (and, I presume, with the "sorting isn't working") is the place where you redirect: echo "$col2" > login.txt truncates login.txt on every iteration, and there is nowhere left to sort. The following variant of your own script redirects (and sorts) once, after the loop:
#!/bin/bash
while IFS=, read -r col1 col2 col3 col4 col5 col6 col7 || [[ -n $col1 ]]
do
if [ $col3 -gt 0 ]
then
echo "$col2"
fi
done < <(tail -n+2 user-list.txt) | sort > login.txt
Edit 1: Match the new requirement
Sure, we can fix the sorting: sort -k1.5,1.7n > login.txt
Of course, that, too, will only work if your user IDs are all 4 alphas and n digits ...
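With GNU sort, version sort avoids the fixed-width assumption entirely; a sketch of the same final line, assuming GNU coreutils (sort -V orders embedded numbers naturally):
done < <(tail -n+2 user-list.txt) | sort -V > login.txt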
Sort ASCIIbetically:
tail -n +2 user-list.txt | perl -F',' -lane 'print if $F[2] > 0;' | sort -t, -k2,2
computer6,User1,11
computer4,User10,3
Computer1,User3,5
computer3,User4,9
Computer2,User5,8
Or sort numerically by the user number:
tail -n +2 user-list.txt | perl -F',' -lane 'print if $F[2] > 0;' | sort -t, -k2,2V
computer6,User1,11
Computer1,User3,5
computer3,User4,9
Computer2,User5,8
computer4,User10,3
Using awk for condition handling and sort for ordering:
$ awk -F, ' # comma delimiter
FNR>1 && $3 { # skip header and accept only non-zero hours
a[$2]++ # count instances for duplicates
}
END {
for(i in a) # all stored usernames
for(j=1;j<=a[i];j++) # remove this if there are no duplicates
print i | "sort -V" # send output to sort -V
}' file
Output:
User1
User3
User4
User5
User10
If there are no duplicated usernames, you can replace a[$2]++ with just a[$2] and remove the inner for loop. Also, there is no real need for sort to run inside the awk program; you could just as well pipe the data from awk to sort, like:
$ awk -F, 'FNR>1&&$3{a[$2]++}END{for(i in a)print i}' file | sort -V
FNR>1 && $3 skips the header and processes records where hours column is not null. If your data has records with negative hours and you only want positive hours, change it to FNR>1 && $3>0.
Or you could use grep with PCRE and sort:
$ grep -Po "(?<=,).*(?=,[1-9])" file | sort -V

Find the unique values in a column and how many times they appear

I have a csv file with
value name date sentence
0000 name1 date1 I want apples
0021 name2 date1 I want bananas
0212 name3 date2 I want cars
0321 name1 date3 I want pinochio doll
0123 name1 date1 I want lemon
0100 name2 date1 I want drums
1021 name2 date1 I want grape
2212 name3 date2 I want laptop
3321 name1 date3 I want Pot
4123 name1 date1 I want WC
2200 name4 date1 I want ramen
1421 name5 date1 I want noodle
2552 name4 date2 I want film
0211 name6 date3 I want games
0343 name7 date1 I want dvd
I want to find the unique values in the name column (I know I have to use -f 2), but I also want to know how many times each appears, i.e. the number of sentences they made.
eg: name1,5
name2,3
name3,2
name4,2
name5,1
name6,1
name7,1
Then, afterwards, I want to build another dataset of how many people there are per appearance count:
1 appearance, 3
2 appearance, 2
3 appearance, 1
4 appearance, 0
5 appearance, 1
The answer to the first part uses awk, as below:
awk -F" " 'NR>1 { print $2 } ' jerome.txt | sort | uniq -c
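Note that uniq -c prints "count name" rather than the requested "name,count"; if the exact format matters, one extra awk stage reorders it; a small sketch:
awk -F" " 'NR>1 { print $2 }' jerome.txt | sort | uniq -c | awk '{ print $2 "," $1 }'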
For the second part, you can pipe it through Perl and get the results as below
> awk -F" " 'NR>1 { print $2 } ' jerome.txt | sort | uniq -c | perl -lane '{$app{$F[0]}++} END {@c=sort keys %app; foreach($c[0] ..$c[$#c]) {print "$_ appearance,",defined($app{$_})?$app{$_}:0 }}'
1 appearance,3
2 appearance,2
3 appearance,1
4 appearance,0
5 appearance,1
>
EDIT1:
Second part using a Perl one-liner
> perl -lane '{$app{$F[1]}++ if $.>1} END {$app2{$_}++ for(values %app);@c=sort keys %app2;foreach($c[0] ..$c[$#c]) {print "$_ appearance,",$app2{$_}+0}}' jerome.txt
1 appearance,3
2 appearance,2
3 appearance,1
4 appearance,0
5 appearance,1
>
For the 1st report, you can use:
tail -n +2 file | awk '{print $2}' | sort | uniq -c
5 name1
3 name2
2 name3
2 name4
1 name5
1 name6
1 name7
For the 2nd report, you can use:
tail -n +2 file | awk '{print $2}'| sort | uniq -c | awk 'BEGIN{max=0} {map[$1]+=1; if($1>max) max=$1} END{for(i=1;i<=max;i++){print i" appearance,",(i in map)?map[i]:0}}'
1 appearance, 3
2 appearance, 2
3 appearance, 1
4 appearance, 0
5 appearance, 1
The complexity here comes from the fact that you wanted a 0 for missing counts and custom text ("appearance") in the output.
What you are after is a classic example of combining a set of Linux core tools in a pipeline:
This solves your first problem:
$ awk '(NR>1){print $2}' file | sort | uniq -c
5 name1
3 name2
2 name3
2 name4
1 name5
1 name6
1 name7
This solves your second problem:
$ awk '(NR>1){print $2}' file | sort | uniq -c | awk '{print $1}' | uniq -c
1 5
1 3
2 2
3 1
You'll notice that the formatting is a bit off, but this essentially solves your problem.
Of course you can do it in one go in awk, but I do believe that you should try to understand the above pipeline first. Have a look at man sort and man uniq. The awk solution is:
Problem 1:
awk '(NR>1){a[$2]++}END{ for(i in a) print i "," a[i] }' file
name6,1
name7,1
name1,5
name2,3
name3,2
name4,2
name5,1
Problem 2:
awk '(NR>1){a[$2]++; m=(a[$2]<m?m:a[$2])}
END{ for(i in a) c[a[i]]++;
for(i=1;i<=m;++i) print i, "appearance,", c[i]+0
}' file
1 appearance, 3
2 appearance, 2
3 appearance, 1
4 appearance, 0
5 appearance, 1

Swap column x of tab-separated values file with column x of second tsv file

Let's say I have:
file1.tsv
Foo\tBar\tabc\t123
Bla\tWord\tabc\tqwer
Blub\tqwe\tasd\tqqq
file2.tsv
123\tzxcv\tAAA\tqaa
asd\t999\tBBB\tdef
qwe\t111\tCCC\tabc
And I want to overwrite column 3 of file1.tsv with column 3 of file2.tsv to end up with:
Foo\tBar\tAAA\t123
Bla\tWord\tBBB\tqwer
Blub\tqwe\tCCC\tqqq
What would be a good way to do this in bash?
Take a look at this awk:
awk 'FNR==NR{a[NR]=$3;next}{$3=a[FNR]}1' OFS='\t' file{2,1}.tsv > output.tsv
FNR==NR is only true while the first file argument (file2.tsv, courtesy of the file{2,1}.tsv brace expansion) is being read, so its third column is saved in a; for each line of file1.tsv, $3 is then overwritten and the trailing 1 prints the rebuilt line.
If you want to use just bash, with a little more effort:
while IFS=$'\t' read -r a1 a2 _ a4; do
IFS=$'\t' read -ru3 _ _ b3 _
printf '%s\t%s\t%s\t%s\n' "$a1" "$a2" "$b3" "$a4"
done <file1.tsv 3<file2.tsv >output.tsv
Output:
Foo Bar AAA 123
Bla Word BBB qwer
Blub qwe CCC qqq
Another way to do this, with a correction as pointed out by @PesaThe:
paste -d$'\t' <(cut -d$'\t' -f1,2 file1.tsv) <(cut -d$'\t' -f3 file2.tsv) <(cut -d$'\t' -f4 file1.tsv)
The output will be:
Foo Bar AAA 123
Bla Word BBB qwer
Blub qwe CCC qqq
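Since the question title asks about an arbitrary column x, the awk idiom above can be parameterised; a sketch, where col is an assumed shell variable holding the 1-based column number, with FS set explicitly so tab-separated fields containing spaces survive:
col=3
awk -v c="$col" 'BEGIN{FS=OFS="\t"} FNR==NR{a[FNR]=$c;next}{$c=a[FNR]}1' file2.tsv file1.tsv > output.tsv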

bash: Find irregular values in strings

I'm looking to fetch a value after a match in a string.
Let's say I have two strings:
string1="Name: John Doe Age: 28 City: Oklahoma City"
string2="Name: Jane Age: 29 Years City: Boston"
Now I want to set three parameters: Name, Age and City.
If I were to do:
name=$(echo "$string1" | awk '{ print $2, $3 }')
age=$(echo "$string1" | awk '{ print $5 }')
city=$(echo "$string1" | awk '{ print $7, $8 }')
It would work for string1, but obviously not for string2.
After some googling I believe I should put it in some kind of array, but I do not really know how to proceed.
Basically, I want everything after Name: and before Age: to be parameter $name. Everything between Age: and City: to be $age, and so on.
Best regards
Needs bash version 3 or higher:
if [[ $string1 =~ ^Name:\ (.*)\ Age:\ (.*)\ City:\ (.*) ]] ; then
name=${BASH_REMATCH[1]}
age=${BASH_REMATCH[2]}
city=${BASH_REMATCH[3]}
fi
You might need Age:\ ([0-9]*).*\ City: if you do not want "Years" to be included in $age.
awk is the best solution for this because you can set the field separator to a regex, and then your fields are $2, $3 and $4:
name=$(awk -F'[[:alpha:]]+: ' '{print $2}' <<<"$string1")
age=$(awk -F'[[:alpha:]]+: ' '{print $3}' <<<"$string1")
city=$(awk -F'[[:alpha:]]+: ' '{print $4}' <<<"$string1")
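All three assignments can also be collapsed into a single awk call by printing the fields tab-separated and splitting them with read; a sketch, assuming none of the values contain a tab (trailing spaces from the regex split may linger):
IFS=$'\t' read -r name age city < <(awk -F'[[:alpha:]]+: ' 'BEGIN{OFS="\t"} {print $2, $3, $4}' <<<"$string1")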
Perl solution (taken partly from my answer here):
Capture name:
name=`perl -ne 'print $1 if /Name: ([a-zA-Z ]+) Age:/' <<< $string`
Capture age:
age=`perl -ne 'print $1 if /Age: ([0-9a-zA-Z ]+) City:/' <<< $string`
-ne tells perl to loop the specified one-liner over the input file or standard input without printing anything by default (you could call it awk emulation mode).
The parens in the regexes specify the bits you're interested in capturing. The other fragments act as delimiters.
After running both of these through $string1 of your example I get 'John Doe' and '28'.
Edit: replaced echo $string with <<< $string, which is nice.
Something like this might work:
string1="Name: John Doe Age: 28 City: Oklahoma City"
string1ByRow=$(echo "$string1" | perl -pe 's/(\w+:)/\n$1\n/g' | sed '/^$/d' | sed 's/^ *//')
string1Keys=$(echo "$string1ByRow" | grep ':$' | sed 's/:$//')
string1Vals=$(echo "$string1ByRow" | grep -v ':$')
echo "$string1Keys"
Name
Age
City
echo "$string1Vals"
John Doe
28
Oklahoma City
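To pair each key with its value line by line, paste can zip the two variables back together; a small sketch using the names defined above:
paste -d: <(echo "$string1Keys") <(echo "$string1Vals")
Name:John Doe
Age:28
City:Oklahoma City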
Consider these commands:
name=$(awk -F": |Age" '{print $2}' <<< $string1)
age=$(awk -F": |City|Years" '{print $3}' <<< $string1)
city=$(awk -F"City: " '{print $2}' <<< $string1)
You can use three Perl one-liners to assign values to your variables:
name=$(perl -pe 's/.*(?<=Name: )([A-Za-z ]+)(?=Age).*/\1/' file)
age=$(perl -pe 's/.*(?<=Age: )([A-Za-z0-9 ]+)(?=City).*/\1/' file)
OR
age=$(perl -pe 's/.*(?<=Age: )([0-9 ]+)(?=Years|City).*/\1/' file)
city=$(perl -pe 's/.*(?<=City: )([A-Za-z ]+)"/\1/' file)
Test File:
[jaypal:~/Temp] cat file
string1="Name: John Doe Age: 28 City: Oklahoma City"
string2="Name: Jane Age: 29 Years City: Boston"
Name:
[jaypal:~/Temp] perl -pe 's/.*(?<=Name: )([A-Za-z ]+)(?=Age).*/\1/' file
John Doe
Jane
Age:
[jaypal:~/Temp] perl -pe 's/.*(?<=Age: )([A-Za-z0-9 ]+)(?=City).*/\1/' file
28
29 Years
OR
if you just want the age and not years then
[jaypal:~/Temp] perl -pe 's/.*(?<=Age: )([0-9 ]+)(?=Years|City).*/\1/' file
28
29
City:
[jaypal:~/Temp] perl -pe 's/.*(?<=City: )([A-Za-z ]+)"/\1/' file
Oklahoma City
Boston
I propose a generic solution:
keys=() values=()
for word in $string; do
wlen=${#word}
if [[ ${word:wlen-1:wlen} = : ]]; then
keys+=("${word:0:wlen-1}") values+=("")
else
alen=${#values[@]}
values[alen-1]=${values[alen-1]:+${values[alen-1]} }$word
fi
done
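The loop above only fills the two parallel arrays keys and values; a short usage sketch to print what was parsed (run after the loop, with string set to e.g. $string1):
for i in "${!keys[@]}"; do
    printf '%s: %s\n' "${keys[$i]}" "${values[$i]}"
done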
bash-3.2$ cat sample.log
string1="Name: John Doe Age: 28 City: Oklahoma City"
string2="Name: Jane Age: 29 Years City: Boston"
Using GNU awk's match built-in function (the three-argument form, which captures groups into an array):
awk ' { match($0,/Name:([A-Za-z ]*)Age:/,a); match($0,/Age:([ 0-9]*)/,b); match($0,/City:([A-Za-z ]*)/,c); print a[1]":" b[1]":"c[1] } ' sample.log
Output:
John Doe : 28 : Oklahoma City
Jane : 29 : Boston
