Hi, I have a txt file with the last name and first name of some people, and I want to use egrep (or a similar tool) to display only the first names of the people who share a last name. I have no idea how I could do this. Thanks for the help.
My txt file looks like this:
snow john
snow jack
miller george
mcconner jenny
and the output should be:
john
jack
I've currently tried running:
cat names.txt | cut -d " " -f 1 | awk 'seen[$]++'
...but this fails with an error:
awk: syntax error at source line 1
context is
>>> seen[$] <<<
awk: bailing out at source line 1
You can use a typical 2-pass approach with awk:
awk 'NR == FNR {freq[$1]++; next} freq[$1]>1{print $2}' file file
john
jack
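In case the NR == FNR idiom is unfamiliar: NR counts records across all input files while FNR resets for each file, so NR == FNR is true only while awk is reading the first copy of the file. A quick way to convince yourself (the file name is illustrative):

```shell
# Recreate the sample input from the question.
cat > names.txt <<'EOF'
snow john
snow jack
miller george
mcconner jenny
EOF

# Pass 1 (NR == FNR): count occurrences of each last name, then skip the record.
# Pass 2: print the first name whenever its last name occurred more than once.
awk 'NR == FNR {freq[$1]++; next} freq[$1] > 1 {print $2}' names.txt names.txt
```

This prints john and jack, matching the expected output above.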
Reference: Effective AWK Programming
awk is your friend. With a single-pass approach, you can achieve your result using a memory technique where you store the previous record for each last name in an array.
Given an input file as follows:
$ cat file
snow john
snow jack
miller tyler
snow leopard
kunis ed
snow jack
snow miller
snow miller
sofo mubu
sofo gubu
...the following shell command uses a single awk pass to generate correct output:
$ awk 'count1[$1]==1 && ++count2[name[$1]]==1{print name[$1]} # replay of the next rule using the previous record's values
count1[$1]++ && ++count2[$2]==1{print $2} # our main logic
{name[$1]=$2} # keep a copy of the current record for later lines
' file
john
jack
leopard
miller
mubu
gubu
Note: the final answer includes a suggestion from ordoshsen made in the comments. For more on awk, refer to the manual.
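For comparison, here is a rough single-pass sketch of the same memory technique that may be easier to follow: remember the first record seen for each last name, flush it once that last name repeats, and use a seen[] array so each first name is printed only once (the array and file names are illustrative):

```shell
# Recreate the sample input from the answer above.
cat > file <<'EOF'
snow john
snow jack
miller tyler
snow leopard
kunis ed
snow jack
snow miller
snow miller
sofo mubu
sofo gubu
EOF

awk '
$1 in first {                       # this last name has been seen before
    if (first[$1] != "") {          # flush the stored first occurrence once
        if (!seen[first[$1]]++) print first[$1]
        first[$1] = ""
    }
    if (!seen[$2]++) print $2       # print the current first name if new
    next
}
{ first[$1] = $2 }                  # remember the first record per last name
' file
```

This produces the same six names as the command above.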
I have a file with the name, gender and year of birth of a group of kids (+5xxx) and I need to find how many different names there are. Here's a sample of the file:
2008 fille Avah
2008 fille Carleigh
2008 fille Kenley
2000 garçon Michael
2000 garçon Joseph
I tried this command (cat prenoms.txt | cut -c 12-30 | uniq | wc -l), but the problem is that cutting by character position never gives me only the names, because the gender words are different lengths. Can anyone help?
Thank you in advance.
Use space as the delimiter, like below:
$ cat sample.txt | cut -d" " -f3 | sort | uniq
Avah
Carleigh
Joseph
Kenley
Michael
or you can use awk:
$ awk '{print $3}' sample.txt | sort | uniq
Avah
Carleigh
Joseph
Kenley
Michael
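Since the question asks how many different names there are, pipe the unique list into wc -l to get the count (sort -u is shorthand for sort | uniq):

```shell
# Recreate the sample data from the question.
cat > sample.txt <<'EOF'
2008 fille Avah
2008 fille Carleigh
2008 fille Kenley
2000 garçon Michael
2000 garçon Joseph
EOF

# Print the third column, de-duplicate, and count the remaining lines.
awk '{print $3}' sample.txt | sort -u | wc -l
```

For this sample the count is 5.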
Please try it and let us know the result. Cheers.
I have created a simple script that prints the contents of a text file using the cat command. Now I want to print each line along with its line number, while ignoring blank lines. The following format is desired:
1 George Jones Berlin 2564536877
2 Mike Dixon Paris 2794321976
I tried using
cat -n catalog.txt | grep -v '^$' catalog.txt
But I get the following results:
George Jones Berlin 2564536877
Mike Dixon Paris 2794321976
I have managed to get rid of the blank lines, but the line numbers are not printed. What am I doing wrong?
Here are the contents of catalog.txt:
George Jones Berlin 2564536877
Mike Dixon Paris 2794321976
Your solution doesn't work because grep was given catalog.txt as an argument, so it reads the file directly and never sees the numbered output of cat -n.
You can pipe grep's output to cat -n:
grep -v '^$' yourFile | cat -n
Example:
test.txt:
Hello
how
are
you
?
$ grep -v '^$' test.txt | cat -n
1 Hello
2 how
3 are
4 you
5 ?
At first glance, you should drop the file name in the command line to grep to make grep read from stdin:
cat -n catalog.txt | grep -v '^$'
^^^
In your code, you supplied catalog.txt to grep, which made it read from the file and ignore its standard input. So you're basically grepping from the file instead of the output of cat piped to its stdin.
To correctly ignore blank lines and then prepend line numbers, switch the order of grep and cat:
grep -v '^$' catalog.txt | cat -n
Another awk
$ awk 'NF{$0=FNR " " $0}NF' 48488182
1 George Jones Berlin 2564536877
3 Mike Dixon Paris 2794321976
The second line was blank in this case.
A single, simple, basic awk solution could help you here.
Solution 1st:
awk 'NF{print FNR,$0}' Input_file
Solution 2nd: The above prints the original line numbers, so the numbering still counts the blank lines. If you instead want consecutive numbers that skip the blank lines entirely, the following may help:
awk '!NF{FNR--;next} NF{print FNR,$0}' Input_file
Solution 3rd: Using only grep, though the output will have a colon between the line number and the line:
grep -v '^$' Input_file | grep -n '.*'
Explanation of Solution 1st:
NF: This condition checks whether NF (awk's built-in variable holding the number of fields on the current line) is non-zero, i.e. the line is not empty; if it is TRUE, the action that follows is executed.
{print FNR,$0}: This uses awk's print function to output FNR (awk's built-in variable holding the current line number) followed by $0, the current line.
This satisfies both of the OP's requirements: skipping empty lines and printing each line with its number. I hope this helps.
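A slightly simpler way to get consecutive numbering, without decrementing FNR, is to keep a separate counter that only advances on non-blank lines (a sketch; the sample file name is illustrative):

```shell
# Sample input with a blank line in the middle.
printf 'George Jones Berlin 2564536877\n\nMike Dixon Paris 2794321976\n' > catalog.txt

# NF is non-zero only for non-blank lines, so ++n counts just those.
awk 'NF {print ++n, $0}' catalog.txt
```

This prints the two non-blank lines numbered 1 and 2.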
http://www.somesite/play/episodes/xyz/fred-episode-110
http://www.somesite/play/episodes/abc/simon-episode-266
http://www.somesite/play/episodes/qwe/mum-episode-39
http://www.somesite/play/episodes/zxc/dad-episode-41
http://www.somesite/play/episodes/asd/bob-episode-57
I have many URLs saved in a txt file, as shown above.
I want a script that copies everything after the 6th slash onto a line above it.
The text after the 6th slash is the title, and it is always different.
I need to extract the title so I can play it.
So I need it to look like this:
fred-episode-110
http://www.somesite/play/episodes/xyz/fred-episode-110
simon-episode-266
http://www.somesite/play/episodes/abc/simon-episode-266
mum-episode-39
http://www.somesite/play/episodes/qwe/mum-episode-39
dad-episode-41
http://www.somesite/play/episodes/zxc/dad-episode-41
bob-episode-57
http://www.somesite/play/episodes/asd/bob-episode-57
I have sed, awk and wget available. Can this be done?
Use this command:
awk -F/ '{print $7; print $0}'
E.g.:
awk -F/ '{print $7; print $0}' < file.txt > new-file.txt
Just to add to this:
awk -F/ '{print $7; print $0}' < file.txt > new-file.txt
Is there any way to remove all the hyphens from just the title and replace them with spaces? Some of the titles have lots of hyphens, and that makes them a bit hard to read.
change these
simon-episode-2-playing-football-in-the-park
fred-episode-110-the-big-clash-tonight
bob-episode-57
to
simon episode 2 playing football in the park
fred episode 110 the big clash tonight
bob episode 57
Thanks for your expertise and time.
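Building on the command above, one way to strip the hyphens from just the title line while leaving the URL untouched (a sketch; the file name is illustrative): copy the 7th /-separated field into a variable, replace its hyphens with spaces using gsub, and print it before the original line.

```shell
# A couple of sample URLs.
cat > file.txt <<'EOF'
http://www.somesite/play/episodes/xyz/fred-episode-110
http://www.somesite/play/episodes/abc/simon-episode-266
EOF

# $7 is the title because splitting on / yields: http:, "", host, play, episodes, xyz, title.
awk -F/ '{t = $7; gsub(/-/, " ", t); print t; print $0}' file.txt
```

Each URL is now preceded by its title with the hyphens turned into spaces.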
Linux: what single sed command can be used to output matches from /etc/passwd that have Smith or Jones in their description (5th field) to a file called smith_jones.txt?
I wouldn't use sed, but it looks like you're referencing a standard /etc/passwd file, so something that may do what you're looking for is this:
cat /etc/passwd | awk -F ":" '{if ($5 ~ /Smith/ || $5 ~ /Jones/) print}'
So awk '{print $5}' is commonly used to print the 5th column of whatever is piped to it, in this case the /etc/passwd file. However, since the fields are colon-separated rather than whitespace-separated, I've supplied the -F argument with ":" as the delimiter.
It's then a fairly simple if statement, essentially saying: if the 5th field contains Smith OR Jones anywhere, print the whole line.
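If the assignment really does require a single sed command, something along these lines may work (note the \| alternation is a GNU sed extension): skip four colon-delimited fields, then require Smith or Jones before the next colon, and print only matching lines into smith_jones.txt. The sample data below is made up for illustration.

```shell
# Made-up passwd-style sample data.
cat > passwd.sample <<'EOF'
jsmith:x:1001:1001:John Smith:/home/jsmith:/bin/bash
djones:x:1002:1002:Davy Jones:/home/djones:/bin/bash
alice:x:1003:1003:Alice Brown:/home/alice:/bin/bash
EOF

# Four "field:" groups anchor us at the 5th field; [^:]* keeps the
# match inside that field, so Smith/Jones elsewhere won't match.
sed -n '/^\([^:]*:\)\{4\}[^:]*\(Smith\|Jones\)/p' passwd.sample > smith_jones.txt
cat smith_jones.txt
```

Against the real /etc/passwd you would replace passwd.sample with /etc/passwd.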
Hello, I have a problem in bash.
I have a file and I am trying to append a period to the end of each line:
cat file | sed s/"\n"/\\n./g > salida.csv
but it doesn't work =(.
I need this because I want to count the lines containing a given word: I need to count the lines with the same country, and if I just grep for colombia, it matches colombias too.
One more question: how can I count the lines with the same country?
for example
1 colombia
2 brazil
3 ecuador
4 colombias
5 colombia
and the expected output is:
colombia 2
colombias 1
ecuador 1
brazil 1
How about:
cut -f2 -d' ' salida.csv | sort | uniq -c
Since a sed solution was posted (probably the best tool for this task), I'll contribute an awk one:
awk '$NF=$NF"."' file > salida.csv
Update:
$ cat input
1 colombia
2 brazil
3 ecuador
4 colombias
5 colombia
$ awk '{a[$2]++}END{for (i in a) print i, a[i]}' input
brazil 1
colombias 1
ecuador 1
colombia 2
...and, please stop updating your question with different questions...
Your command line has a few problems. Some that matter, some that are style choices, but here's my take:
Unnecessary cat. sed can take a filename as an argument.
Your sed command doesn't need the g. Since each line only has one end, there's no reason to tell it to look for more.
Don't look for the newline character, just match the end of line with $.
That leaves you with:
sed 's/$/./' file > salida.csv
Edit:
If your real question is "How do I grep for colombia, but not match colombias?", you just need to use the -w flag to match whole words:
grep -w colombia file
If you want to count them, just add -c:
grep -c -w colombia file
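Putting it together on the sample data from the question:

```shell
# Recreate the sample input.
cat > file <<'EOF'
1 colombia
2 brazil
3 ecuador
4 colombias
5 colombia
EOF

# -w matches whole words only, so colombias is excluded; -c counts the matches.
grep -c -w colombia file
```

This prints 2, matching the expected count above.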
Read the grep(1) man page for more information.