Command to find how many different names there are in a file - linux

I have a file with the name, gender and year of birth of a group of kids (5000+) and I need to find how many different names there are. Here's a sample of the file:
2008 fille Avah
2008 fille Carleigh
2008 fille Kenley
2000 garçon Michael
2000 garçon Joseph
I tried this command (cat prenoms.txt | cut -c 12-30 | uniq | wc -l), but when I cut from character 12 or 13 it never gives me only the names, because the gender words have different lengths. Can anyone help?
Thank you in advance.

Use a space as the delimiter, as shown below. uniq only removes adjacent duplicates, so the names are sorted first.
$ cut -d" " -f3 sample.txt | sort | uniq
Avah
Carleigh
Joseph
Kenley
Michael
Or you can use awk:
$ awk '{print $3}' sample.txt | sort | uniq
Avah
Carleigh
Joseph
Kenley
Michael
Please try it and let us know the result. Cheers.
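Since the question asks for the number of distinct names rather than the list, the pipelines above can be finished with a count. A minimal sketch, assuming the fields are separated by whitespace as in the sample; the temporary file and the extra duplicate row (2001 garçon Michael) are made up here only for illustration:

```shell
# Recreate the sample from the question in a temporary file.
# The last row is a made-up duplicate name, added to show deduplication.
names_file=$(mktemp)
cat > "$names_file" <<'EOF'
2008 fille Avah
2008 fille Carleigh
2008 fille Kenley
2000 garçon Michael
2000 garçon Joseph
2001 garçon Michael
EOF

# Print the third field, collapse duplicates, then count the remaining lines.
# tr strips the padding that some wc implementations add.
distinct=$(awk '{print $3}' "$names_file" | sort -u | wc -l | tr -d ' ')
echo "$distinct"
```

Here sort -u stands in for sort | uniq; on the real file, replace "$names_file" with prenoms.txt.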

Related

Finding repeated names in a file

Hi, I have a text file with the last names and first names of people, and I want to use egrep to display only the first names of people who share a last name. I have no idea how to do this. Thanks for the help.
My text file looks like this:
snow john
snow jack
miller george
mcconner jenny
and the output should be:
john
jack
I've currently tried running:
cat names.txt | cut -d " " -f 1 | awk 'seen[$]++'
...but this fails with an error:
awk: syntax error at source line 1
context is
>>> seen[$] <<<
awk: bailing out at source line 1
You can use a typical 2-pass approach with awk:
awk 'NR == FNR {freq[$1]++; next} freq[$1]>1{print $2}' file file
john
jack
Reference: Effective AWK Programming
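To see why the file is named twice: awk reads it once to count each last name (NR == FNR holds only while the first copy is being read), then reads it again and prints the first name wherever the count exceeds one. A small sketch against the sample from the question; the temporary file is an assumption of this example:

```shell
# Sample data from the question, written to a temporary file.
names_file=$(mktemp)
cat > "$names_file" <<'EOF'
snow john
snow jack
miller george
mcconner jenny
EOF

# Pass 1 (NR == FNR): count occurrences of each last name, field 1.
# Pass 2: print the first name, field 2, for last names seen more than once.
repeated=$(awk 'NR == FNR { freq[$1]++; next } freq[$1] > 1 { print $2 }' \
    "$names_file" "$names_file")
echo "$repeated"
```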
awk is your friend. With a single-pass approach, you can get the same result by remembering, for each last name, the first name from the most recent record in an array.
Given an input file as follows:
$ cat file
snow john
snow jack
miller tyler
snow leopard
kunis ed
snow jack
snow miller
snow miller
sofo mubu
sofo gubu
...the following shell command uses a single awk pass to generate correct output:
$ awk 'count1[$1]==1 && ++count2[name[$1]]==1{print name[$1]} # same check as the next rule, applied to the stored previous record
count1[$1]++ && ++count2[$2]==1{print $2} # main logic: print a first name once when its last name was seen before
{name[$1]=$2} # keep a copy of the current first name for later records
' file
john
jack
leopard
miller
mubu
gubu
Note: The final answer includes the suggestion from @ordoshsen in the comments. For more on awk, refer to the manual.

How to use cut and paste commands as a single-line command without using grep, sed, awk, perl?

NOTE: avoid the commands grep, sed, awk, perl.
In Unix, I am trying to write a sequence of cut and paste commands (saving the result of each command in a file) that inverts every name in the file shortlist (below) and places a comma after the last name (for example, bill johnson becomes johnson, bill).
Here is my file shortlist:
2233:charles harris :g.m. :sales :12/12/52: 90000
9876:bill johnson :director :production:03/12/50:130000
5678:robert dylan :d.g.m. :marketing :04/19/43: 85000
2365:john woodcock :director :personnel :05/11/47:120000
5423:barry wood :chairman :admin :08/30/56:160000
I am able to cut from shortlist, but I am not sure how to paste the result into my filenew file in the same command line. Here is my code for cut:
cut -d: -f2 shortlist
result:
charles harris
bill johnson
robert dylan
john woodcock
barry wood
Now I want this to be pasted in my filenew file and when I cat filenew, result should look like below,
harris, charles
johnson, bill
dylan, robert
woodcock, john
wood, barry
Please guide me through this. Thank you.
With awk (which the question asks to avoid), splitting the name field and printing it in reverse order:
awk -F: '{split($2, n, " "); print n[2] ", " n[1]}' file
With cut and paste only (and process substitution <(cmd)):
$ paste -d, <(cut -d: -f2 file | cut -d' ' -f2) <(cut -d: -f2 file | cut -d' ' -f1)
harris,charles
johnson,bill
dylan,robert
woodcock,john
wood,barry
If process substitution is not available in your shell (it is not defined in POSIX, but is supported in bash, zsh and ksh), you can use named pipes or, more simply, save the intermediate results to files (first holding the first names, last holding the last names):
$ cut -d: -f2 file | cut -d' ' -f1 >first
$ cut -d: -f2 file | cut -d' ' -f2 >last
$ paste -d, last first
If you also need a space between the last and first name, you can paste from three sources, the middle one being an empty source such as /dev/null or, shorter, <(:) (the null command in a process substitution), and cycle through a list of two delimiters (comma and space):
$ paste -d', ' <(cut -d: -f2 file | cut -d' ' -f2) <(:) <(cut -d: -f2 file | cut -d' ' -f1)
harris, charles
johnson, bill
dylan, robert
woodcock, john
wood, barry
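The question also asks for the result to end up in filenew. A sketch of the file-based variant, assuming the single-space layout shown in the sample; it uses /dev/null as the empty middle source, so it should not depend on process substitution. The temporary working directory is an assumption of this example:

```shell
# Work in a scratch directory so no existing files are overwritten.
workdir=$(mktemp -d)
cat > "$workdir/shortlist" <<'EOF'
2233:charles harris :g.m. :sales :12/12/52: 90000
9876:bill johnson :director :production:03/12/50:130000
5678:robert dylan :d.g.m. :marketing :04/19/43: 85000
2365:john woodcock :director :personnel :05/11/47:120000
5423:barry wood :chairman :admin :08/30/56:160000
EOF

# Field 2 (colon-delimited) is "first last"; split it again on spaces.
cut -d: -f2 "$workdir/shortlist" | cut -d' ' -f1 > "$workdir/first"
cut -d: -f2 "$workdir/shortlist" | cut -d' ' -f2 > "$workdir/last"

# paste cycles through the delimiter list: "," after the last name, then " "
# after the empty /dev/null column, which yields "last, first".
paste -d', ' "$workdir/last" /dev/null "$workdir/first" > "$workdir/filenew"
cat "$workdir/filenew"
```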

Extracting a string from a string in linux

Extract the value for OWNER in the following:
{{USERID 9898}}{{OWNER Wayne, Daniel}}{{EMAIL danielwayne#blah.com}}
To get this string I am using grep on a text file. In all other cases only one value is contained on each line, so they are not an issue.
My problem is removing the text after OWNER but before the }} brackets, leaving me with only the string 'Wayne, Daniel'.
So far I have begun looking into writing a for loop to go through the text one character at a time, but I feel there is a more elegant solution than my limited knowledge of Unix allows.
With grep
> cat file
{{USERID 9898}}{{OWNER Wayne, Daniel}}{{EMAIL danielwayne#blah.com}}
> grep -Po '(?<=OWNER )[\w, ]*' file
Wayne, Daniel
Try cat file.txt | perl -n -e'/OWNER ([^\}]+)/ && print $1'
You can use this awk:
awk -F '{{|}}' '{sub(/OWNER +/, "", $4); print $4}' file
Wayne, Daniel
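Try this. I use cut:
INPUT="{{USERID 9898}}{{OWNER Wayne, Daniel}}{{EMAIL danielwayne#blah.com}}"
# Third space-separated field: "Wayne,"
SUBSTRING=$(echo "$INPUT" | cut -d' ' -f3)
# Everything after the comma, then trimmed at the first closing brace: " Daniel"
SUBSTRING2=$(echo "$INPUT" | cut -d',' -f2)
SUBSTRING2=$(echo "$SUBSTRING2" | cut -d'}' -f1)
# Quoting keeps the space after the comma
echo "$SUBSTRING$SUBSTRING2"
It may not be the most elegant way, but it works.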
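sed is another option that has not been shown yet. A minimal sketch, assuming the OWNER value never contains a closing brace; the variable names are only for illustration:

```shell
input='{{USERID 9898}}{{OWNER Wayne, Daniel}}{{EMAIL danielwayne#blah.com}}'

# Capture everything between "{{OWNER " and the next "}", and print only
# lines where the substitution succeeded (-n together with the p flag).
owner=$(printf '%s\n' "$input" | sed -n 's/.*{{OWNER \([^}]*\)}}.*/\1/p')
echo "$owner"
```

To run it against a file instead of a variable, pass the file name to sed as its last argument.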

Comparing files based on the first x characters

I have two text files that both have data that look like this:
Mon-000101,100.27242,9.608597,11.082,10.034,0.39,I,0.39,I,31.1,31.1,,double with 1355,,,,,,,,
Mon-000171,100.2923,9.52286,14.834,14.385,0.45,I,0.45,I,33.7,33.7,,,,,,,,,,
Mon-000174,100.27621,9.563802,11.605,10.134,0.95,I,1.29,I,30.8,30.8,,,,,,,,,,
I want to compare the two files based on the ID at the start of each line (Mon-000101, for example) to see where they differ. I tried some diff commands that I found in another question, but they didn't work. I'm out of ideas, so I'm turning to anybody with more experience than myself.
Thanks.
HazMatt:Desktop m$ diff NGC2264_classI_h7_notes.csv /Users/m/Downloads/allfitclassesI.txt
1c1
Mon-000399,100.25794,9.877631,12.732,12.579,0.94,I,-1.13,I,9.8,9.8,,"10,000dn vs 600dn brighter source at 6 to 12"" Mon-000402,100.27347,9.59Mon-146053,100.23425,9.571719,12.765,11.39,1.11,I,1.04,I,16.8,16.8,,"double 3"" confused with 411, appears brighter",,,,,,,,
\ No newline at end of file
---
Mon-146599 Mon-146599 4.54 I 4.54 III
\ No newline at end of file
This was my attempt and the output. The thing is, I know the files differ by eleven lines, corresponding to eleven mismatched values. I don't want to do this by hand (who would?). Maybe I'm misreading the diff output, but I'd expect more than this.
Have you tried the following? diff expects file arguments, so the grep output is passed through process substitution:
diff <(grep Mon-00010 file_1) <(grep Mon-00010 file_2)
First sort both files, then use diff:
sort file1 > file1_sorted
sort file2 > file2_sorted
diff file1_sorted file2_sorted
Sorting arranges both files by the leading ID field, so that you don't get spurious mismatches.
I am not sure what you are searching, but I'll try to help. Otherwise you could give some examples of input files and desired output.
My input-files are:
prompt> cat in1.txt
Mon-000101,100.27242,9.608597,11.082,10.034,0.39,I,0.39,I,31.1,31.1,,double with 1355,,,,,,,,
Mon-000171,100.2923,9.52286,14.834,14.385,0.45,I,0.45,I,33.7,33.7,,,,,,,,,,
Mon-000174,100.27621,9.563802,11.605,10.134,0.95,I,1.29,I,30.8,30.8,,,,,,,,,
and
prompt> cat in2.txt
Mon-000101,111.27242,9.608597,11.082,10.034,0.39,I,0.39,I,31.1,31.1,,double with 1355,,,,,,,,
Mon-000172,100.2923,9.52286,14.834,14.385,0.45,I,0.45,I,33.7,33.7,,,,,,,,,,
Mon-000174,122.27621,9.563802,11.605,10.134,0.95,I,1.29,I,30.8,30.8,,,,,,,,,,
If you are just interested in the "ID" (whatever that means), you have to separate it. I assume the ID is the tag before the first comma, so it is possible to cut everything except the ID and compare:
prompt> diff <(cut -d',' -f1 in1.txt) <(cut -d',' -f1 in2.txt)
2c2
< Mon-000171
---
> Mon-000172
If the ID is more complicated, you can extract it with grep and a regular expression.
Additionally, diff -y gives a side-by-side view of which lines differ. You can use it to compare the complete files, or combine it with the cut approach shown above:
prompt> diff -y <(cut -d',' -f1 in1.txt) <(cut -d',' -f1 in2.txt)
Mon-000101 Mon-000101
Mon-000171 | Mon-000172
Mon-000174 Mon-000174
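If the goal is to see which records differ for the same ID, rather than only which IDs differ, the lines can be keyed on the first field. A sketch using awk, assuming IDs are unique within each file and that the ID is the text before the first comma; the shortened rows and the temporary directory below are made up for illustration:

```shell
# Two small made-up files: Mon-000101 differs, Mon-000171/172 are unmatched,
# Mon-000174 is identical.
workdir=$(mktemp -d)
cat > "$workdir/in1.txt" <<'EOF'
Mon-000101,100.27242,9.608597
Mon-000171,100.2923,9.52286
Mon-000174,100.27621,9.563802
EOF
cat > "$workdir/in2.txt" <<'EOF'
Mon-000101,111.27242,9.608597
Mon-000172,100.2923,9.52286
Mon-000174,100.27621,9.563802
EOF

report=$(awk -F, '
    NR == FNR { first[$1] = $0; next }   # file 1: remember each line by its ID
    !($1 in first) { print "only in second: " $1; next }
    { seen[$1] = 1 }                      # ID exists in both files
    first[$1] != $0 { print "differs: " $1 }
    END {
        for (id in first)
            if (!(id in seen)) print "only in first: " id
    }
' "$workdir/in1.txt" "$workdir/in2.txt")
echo "$report"
```

Unlike plain diff, this does not depend on the two files listing the IDs in the same order.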

Insert character in a file with bash

Hello, I have a problem in bash.
I have a file and I am trying to insert a period at the end of each line:
cat file | sed s/"\n"/\\n./g > salida.csv
but it does not work.
I need this because I want to count the lines that contain a word, specifically the lines with the same country. If I use grep, a search for colombia matches both colombia and colombias.
My other question: how can I count the lines with the same country?
For example, given this input:
1 colombia
2 brazil
3 ecuador
4 colombias
5 colombia
the expected output is:
colombia 2
colombias 1
ecuador 1
brazil 1
How about:
cut -f2 -d' ' salida.csv | sort | uniq -c
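uniq -c prints the count before the value. To get the "country count" order from the question, the two columns can be swapped afterwards. A sketch, assuming single-space separated input as in the example; the temporary file stands in for the real one:

```shell
# Sample data from the question, written to a temporary file.
countries_file=$(mktemp)
cat > "$countries_file" <<'EOF'
1 colombia
2 brazil
3 ecuador
4 colombias
5 colombia
EOF

# Count identical country names, then print "name count" instead of "count name".
counts=$(cut -d' ' -f2 "$countries_file" | sort | uniq -c | awk '{print $2, $1}')
echo "$counts"
```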
Since a sed solution was posted (probably the best tool for this task), I'll contribute an awk one:
awk '$NF=$NF"."' file > salida.csv
Update:
$ cat input
1 colombia
2 brazil
3 ecuador
4 colombias
5 colombia
$ awk '{a[$2]++}END{for (i in a) print i, a[i]}' input
brazil 1
colombias 1
ecuador 1
colombia 2
...and, please stop updating your question with different questions...
Your command line has a few problems. Some that matter, some that are style choices, but here's my take:
Unnecessary cat. sed can take a filename as an argument.
Your sed command doesn't need the g. Since each line only has one end, there's no reason to tell it to look for more.
Don't look for the newline character, just match the end of line with $.
That leaves you with:
sed 's/$/./' file > salida.csv
Edit:
If your real question is "How do I grep for colombia, but not match colombias?", you just need to use the -w flag to match whole words:
grep -w colombia file
If you want to count them, just add -c:
grep -c -w colombia file
Read the grep(1) man page for more information.
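The effect of -w is easy to check on the sample data. A small sketch; the temporary file and variable names are assumptions of this example:

```shell
# Sample data from the question, written to a temporary file.
countries_file=$(mktemp)
cat > "$countries_file" <<'EOF'
1 colombia
2 brazil
3 ecuador
4 colombias
5 colombia
EOF

# Without -w, "colombias" also counts as a match for "colombia".
plain=$(grep -c colombia "$countries_file")
# With -w, only lines containing "colombia" as a whole word are counted.
whole=$(grep -c -w colombia "$countries_file")
echo "substring matches: $plain, whole-word matches: $whole"
```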

Resources