Convert number from text file - linux

I have a file:
id name date
1 paul 23.07
2 john 43.54
3 marie 23.4
4 alan 32.54
5 patrick 32.1
I want to print the names that start with "p" and have an odd-numbered id.
My command:
grep "^p" filename | cut -d ' ' -f 2 | ....
result:
paul
patrick

Awk can do it all:
$ awk 'NR > 1 && $2 ~ /^p/ && ($1 % 2) == 1 { print $2 }' op.txt
paul
patrick
EDIT
To use : as the field separator:
$ awk -F: 'NR > 1 && $2 ~ /^p/ && ($1 % 2) == 1 { print $2 }' op.txt
NR > 1
Skip the header
$2 ~ /^p/
Name field starts with p
$1 % 2 == 1
ID field is odd
If all of the above are true, { print $2 } prints the name field.

How about a little awk?
awk '{if ($1 % 2 == 1 && substr($2, 1, 1) == "p") print $2}' filename
In awk, fields are split by spaces, tabs and newlines by default, so your id is available as $1, the name as $2 and so on. The if is quite self-explanatory: when the condition is true, the name is printed out; otherwise nothing is done. awk and its syntax are far friendlier than people usually think.
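A quick way to see the default splitting in action (a throwaway one-liner for illustration):
$ echo '5 patrick 32.1' | awk '{ print $1, $2 }'
5 patrick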
Just remember the basic pattern:
BEGIN {
# ran once in the beginning
}
{
# done for each line
}
END {
# ran once in the end
}
If you need more complex parsing, you can keep the script clear and readable in a separate file and call it like this:
awk -f script.awk filename
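For example, a script.awk for this task (a minimal sketch reusing the one-liner from the first answer) could be:
# script.awk: print names starting with p that have an odd id
NR > 1 && $2 ~ /^p/ && ($1 % 2) == 1 { print $2 }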

You might try this
grep -e "[0-9]*[13579]\s\+p[a-z]\+" -o text | tr -s ' ' | cut -d ' ' -f 2
An odd number is easy to express as a regex, written here as
[0-9]*[13579]
If you run this command against a sample file named text
file: text
id name date
1 paul 23.07
2 john 43.54
3 marie 23.4
5 patrick 32.1
38 peter 21.44
10019 peyton 12.02
you will get this output:
paul
patrick
peyton
Note that tr -s ' ' is used to squeeze repeated spaces, making sure that the delimiter is always a single space.
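If you are worried about digits elsewhere on a line matching by accident, anchoring the id at the start of the line is a safer variant of the same idea:
grep -oE '^[0-9]*[13579][[:space:]]+p[a-z]+' text | tr -s ' ' | cut -d ' ' -f 2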

Related

Change values from a dataset variable in Bash

I am new to Bash and I am trying to change the values of a column in the comma-delimited file data.csv.
In the dataset I have the variable sex with only two possible values, 'Female' and 'Male', and I would like to transform Male into 'm' and Female into 'f'.
I have tried this:
#!/bin/bash
sex=$(cut -d , -f 5 data.csv) #I select column 5 related with the variable sex
for i in $sex; do
if [[$i='Female']]; then
$i='f'
fin
done
The code is wrong and I do not know how to modify it.
Besides, I would like to update my data.csv with the new values in sex.
# awk
# without header
awk -F, 'BEGIN{OFS=FS}{$5=="Male" ? $5="m" : $5="f"}1' data.csv > output1.csv
# with header
awk -F, 'BEGIN{OFS=FS} NR!=1{$5=="Male" ? $5="m" : $5="f"}1' data.csv > output1.csv
# bash
while IFS= read -r line
do
line=${line/,Male,/,m,}
line=${line/,Female,/,f,}
echo "$line"
done < data.csv > output2.csv
# sed
sed 's/,Male,/,m,/; s/,Female,/,f,/' data.csv > output3.csv
awk -F , -v OFS=, '
$5 == "Female" {$5 = "f"}
$5 == "Male" {$5 = "m"} 1' data.csv

Print line numbers of duplicate entries

I have a file in the following format:
ABRA CADABRA
ABRA CADABRA
boys
girls
meds toys
I'd like to have the line number returned of any duplicate lines, so the results would look like the following:
1
2
I'd prefer a short one-line command with linux tools. I've tried experimenting with awk and sed but have not had success as of yet.
This would work:
nl file.txt | uniq -f 1 -D | cut -f 1
nl prepends a line number to each line
uniq finds duplicates
-f 1 ignores the first field, i.e., the line number
-D prints (only) the lines that are duplicate
cut -f 1 shows only the first field (the line number)
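Run against the sample file, it prints the following (GNU coreutils shown; nl pads the numbers with spaces):
$ nl file.txt | uniq -f 1 -D | cut -f 1
     1
     2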
With a combination of sort and uniq you can at least list which lines are duplicated (note that this prints the duplicated text, not the line numbers the question asks for):
sort File_Name | uniq -d
Here (note that uniq -d only detects adjacent duplicates, so the file should be sorted first):
uniq -d < "$file" | while read -r line; do grep -hn "$line" "$file"; done
Do this:
perl -e 'my $l = 0; while (<STDIN>) { chomp; $l++; if (exists $f{$_}) { if ($f{$_}->[0]++ == 1) { print "$f{$_}->[1]\n"; print "$l\n"; } } else { $f{$_} = [1,$l]; } }' < FILE
Ugly, but works for unsorted files.
$ cat in.txt
ABRA CADABRA
ABRA CADABRA
boys
girls
meds toys
girls
$ perl -e 'my $l = 0; while (<STDIN>) { chomp; $l++; if (exists $f{$_}) { if ($f{$_}->[0]++ == 1) { print "$f{$_}->[1]\n"; print "$l\n"; } } else { $f{$_} = [1,$l]; } }' < in.txt
1
2
4
6
$
EDIT: Actually it can be shortened slightly:
perl -ne '$l++; if (exists $f{$_}) { if ($f{$_}->[0]++ == 1) { print "$f{$_}->[1]\n"; print "$l\n"; } } else { $f{$_} = [1,$l]; }' < in.txt
To get all of the "different" duplicates in the file you can try:
nl input.txt | sort -k 2 | uniq -D -f 1 | sort -n
This will give you not just the line numbers but also the duplicated text found on those lines. Omit the last sort to keep the duplicates grouped together.
Also try running:
nl input.txt | sort -k 2 | uniq --all-repeated=separate -f 1
This will group the various duplicates by adding an empty line between groups of duplicates.
Pipe the results through
| cut -f 1 | sed 's/ \+//g'
to get only the line numbers, as in the combined pipeline below.
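Putting the pieces together, a sketch of the full pipeline for bare line numbers:
nl input.txt | sort -k 2 | uniq -D -f 1 | sort -n | cut -f 1 | sed 's/ \+//g'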
$ awk '{a[$0]=($0 in a ? a[$0] ORS : "") NR} END{for (i in a) if (a[i]~ORS) print a[i]}' file
1
2

Bash script to isolate words in a file

Here is my initial input data to be extracted:
david ex1=10 ex2=12 quiz1=5 quiz2=9 exam=99
judith ex1=8 ex2=16 quiz1=4 quiz2=10 exam=90
sam ex1=8 quiz1=5 quiz2=11 exam=85
song ex1=8 ex2=20 quiz2=11 exam=87
How do I extract each word so it is formatted in this way:
david
ex1=10
ex2=12
etc...
As I eventually want to have output like this:
david 12 99
judith 16 90
sam 0 85
song 20 87
when I run my program with the command:
./marks ex2 exam < file
Supposing your input file is named input.txt, just replace each space character with a newline character using the tr command-line tool:
tr ' ' '\n' < input.txt
For your second request, you may have to extract specific fields on each line, so the cut and awk commands may be useful (note that my example is certainly improvable):
while read -r p; do
echo -n "$(echo "$p" | cut -d ' ' -f1) " # name
echo -n "$(echo "$p" | cut -d ' ' -f3 | cut -d '=' -f2) " # ex2 val
echo -n "$(echo "$p" | awk -F"exam=" '{ print $2 }')" # exam val
echo
done < input.txt
This script does what you want:
#!/bin/bash
a="$*" # join all script arguments (the field names) into one string, e.g. "ex2 exam"
awk -v a="$a" -F'[[:space:]=]+' '
BEGIN {
split(a, b) # split field names into array b
}
{
printf "%s ", $1 # print first field
for (i=1; i in b; i++) { # loop through the field names, in argument order
f = 0 # unset "found" flag
for (j=2; j<=NF; j+=2) # loop though remaining fields, 2 at a time
if ($j == b[i]) { # if field matches value in array
printf "%s ",$(j+1)
f = 1 # set "found" flag
}
if (!f) printf "0 " # add 0 if field not found
}
print "" # add newline
}' file
Testing it out
$ ./script.sh ex2 exam
david 12 99
judith 16 90
sam 0 85
song 20 87

formatting text using awk

Hi, I have the following text and I need to use awk or sed to print 3 separate columns:
11/13/14 101 HUDSON AUBONPAINJERSEY CITY NJ $4.15
11/22/14 MTAMVM*110TH ST/CATNEW YORK NY $19.05
11/22/14 DUANE READE #14226 0NEW YORK NY $1.26
So I'd like to produce a file containing all the dates, another file containing all the descriptions, and a third file containing all the numbers.
I can use awk to print the first column with print $1 and then use the -F '[$]' option to print the last column, but I'm not able to print just the middle column as there are spaces etc. Can I ignore the spaces? Or is there a better way of doing this?
Thanking you in advance
Try doing this:
$ awk '
{
print $1 > "dates"; $1=""
print $NF > "prices"; $NF=""
print $0 > "desc"
}
' file
or, if the three columns are separated by runs of two or more spaces (gawk shown):
awk -F' {2,}' '
{
print $1 > "dates"
print $2 > "desc"
print $3 > "prices"
}
' file
Then:
$ cat dates
$ cat desc
$ cat prices
Wasn't fast enough to be the first to give an awk solution, so here's one with grep and sed...
grep -o '^.*/.*/1.' file #first col
sed 's/^.*\/.*\/1.//;s/\$.*//' file #middle col
grep -o '\$.*$' file #last col
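Since the question asks for three separate files, redirect each command's output:
grep -o '^.*/.*/1.' file > dates
sed 's/^.*\/.*\/1.//;s/\$.*//' file > desc
grep -o '\$.*$' file > prices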

unix - breakdown of how many lines with number of character occurrences

Is there an inbuilt command to do this or has anyone had any luck with a script that does it?
I am looking to get counts of how many lines had how many occurrences of a specific character, sorted descending by the number of occurrences.
For example, with this sample file:
gkdjpgfdpgdp
fdkj
pgdppp
ppp
gfjkl
Suggested invocation (for the 'p' character):
bash/perl some_script_name "p" samplefile
Desired output:
occs count
4 1
3 2
0 2
Update:
How would you write a solution that worked off a 2-character string such as 'gd', not just a single character such as 'p'?
$ sed 's/[^p]//g' input.txt | awk '{print length}' | sort -nr | uniq -c | awk 'BEGIN{print "occs", "count"}{print $2,$1}' | column -t
occs count
4 1
3 2
0 2
You could give the desired character as the field separator for awk, and do this:
awk -F 'p' '{ print NF-1 }' samplefile |
sort -k1nr |
uniq -c |
awk -v OFS="\t" 'BEGIN { print "occs", "count" } { print $2, $1 }'
For your sample data, it produces:
occs count
4 1
3 2
0 2
If you want to count occurrences of multi-character strings, just give the desired string as the separator, e.g., awk -F 'gd' ... or awk -F 'pp' ....
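For the sample file above, counting occurrences of 'gd' should produce the following (gkdjpgfdpgdp and pgdppp each contain one gd):
awk -F 'gd' '{ print NF-1 }' samplefile | sort -k1nr | uniq -c | awk -v OFS="\t" 'BEGIN { print "occs", "count" } { print $2, $1 }'
occs	count
1	2
0	3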
#!/usr/bin/env perl
use strict; use warnings;
my $seq = shift @ARGV;
die unless defined $seq;
my %freq;
while ( my $line = <> ) {
last unless $line =~ /\S/;
my $occurances = () = $line =~ /(\Q$seq\E)/g;
$freq{ $occurances } += 1;
}
for my $occurances ( sort { $b <=> $a} keys %freq ) {
print "$occurances:\t$freq{$occurances}\n";
}
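Saved as count.pl (the file name is just for illustration), a run over the sample file looks like this:
$ perl count.pl p samplefile
4:	1
3:	2
0:	2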
If you want short, you can always use:
#!/usr/bin/env perl
$x=shift;/\S/&&++$f{$a=()=/(\Q$x\E)/g}while<>
;print"$_:\t$f{$_}\n"for sort{$b<=>$a}keys%f;
or, perl -e '$x=shift;/\S/&&++$f{$a=()=/(\Q$x\E)/g}while<>;print"$_:\t$f{$_}\n"for sort{$b<=>$a}keys%f' inputfile, but now I am getting silly.
Pure Bash ($infile is assumed to hold the input file name):
declare -a count
while read ; do
cnt=${REPLY//[^p]/} # remove non-p characters
((count[${#cnt}]++)) # use length as array index
done < "$infile"
for idx in ${!count[*]} # iterate over existing indices
do echo -e "$idx ${count[idx]}"
done | sort -nr
Output as desired:
4 1
3 2
0 2
You can do it in one gawk process (well, with a sort coprocess):
gawk -F p -v OFS='\t' '
{ count[NF-1]++ }
END {
print "occs", "count"
coproc = "sort -rn"
for (n in count)
print n, count[n] |& coproc
close(coproc, "to")
while ((coproc |& getline) > 0)
print
close(coproc)
}
'
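With the sample file name appended after the closing quote, this should print:
occs	count
4	1
3	2
0	2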
Shortest solution so far:
perl -nE'say tr/p//' | sort -nr | uniq -c |
awk 'BEGIN{print "occs","count"}{print $2,$1}' |
column -t
For multiple characters, use a regex pattern:
perl -ple'$_ = () = /pg/g' | sort -nr | uniq -c |
awk 'BEGIN{print "occs","count"}{print $2,$1}' |
column -t
This one handles overlapping matches (e.g. it finds 3 "pp" in "pppp" instead of 2):
perl -ple'$_ = () = /(?=pp)/g' | sort -nr | uniq -c |
awk 'BEGIN{print "occs","count"}{print $2,$1}' |
column -t
Original cryptic but short pure-Perl version:
perl -nE'
++$c{ () = /pg/g };
}{
say "occs\tcount";
say "$_\t$c{$_}" for sort { $b <=> $a } keys %c;
'
