How to grep only one of each address (Linux)

Okay, so let's say I have a list of addresses in a text file like this:
https://www.amazon.com
https://www.google.com
https://www.msn.com
https://www.google.com
https://www.netflix.com
https://www.amazon.com
...
There is a whole bunch of other stuff in there, but the issue I am having is that after running this:
grep "https://" addresses.txt | cut -d"/" -f3
I get www.amazon.com and www.google.com twice. I want to get each of them only once, but I don't know how to make the pipeline output only unique entries.

Pipe your output to sort and uniq:
grep "https://" addresses.txt | cut -d"/" -f3 | sort | uniq

You can use sort for this purpose.
Just add another pipe to your command and use sort's unique option (-u) to remove duplicates:
grep 'https://' addresses.txt | cut -d"/" -f3 | sort -u
EDIT: you can use sed instead of grep and cut, which reduces the command to
sed -n 's#https://\([^/]*\).*#\1#p' < addresses.txt | sort -u
Here # is just an alternative delimiter for the s command (so the slashes in https:// don't need escaping), and \1 prints the captured hostname.

I would filter the results post-grep.
e.g. using sort -u to sort and then produce a set of unique entries.
You can also use uniq for this, but the input has to be sorted in advance.
This is the beauty of being able to pipe these utilities together. Rather than have a single grepping/sorting/uniq(ing) tool, you get the distinct executables, and you can chain them together how you wish.
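For example, if you also wanted to know how many times each domain occurs, you can swap uniq for uniq -c and add one more sort to rank them, a small extension of the same pipeline:
grep "https://" addresses.txt | cut -d"/" -f3 | sort | uniq -c | sort -rn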

grep "https://" addresses.txt | cut -d"/" -f3 | sort | uniq is what you want

With awk you can use a single Unix command instead of four commands and three pipes:
awk 'BEGIN {FS="://"}; { myfilter = match($1,/https/); if (myfilter) loggeddomains[$2]=0} END {for (mydomains in loggeddomains) {print mydomains}}' addresses.txt
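The same idea is often written with the shorter !seen[]++ idiom (one of the answers further down uses it too); a minimal sketch assuming the same ://-separated input:
awk -F'://' '$1 == "https" && !seen[$2]++ {print $2}' addresses.txt
Unlike the for-in loop above, this prints each domain in the order it is first seen.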

Related

Use grep and cut to filter a text file to display only usernames that start with ‘m’, ‘w’, or ‘s’, and their home directories

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
sys:x:3:1:sys:/dev:/usr/sbin/nologin
games:x:5:2:games:/usr/games:/usr/sbin/nologin
mail:x:8:5:mail:/var/mail:/usr/sbin/nologin
www-data:x:33:3:www-data:/var/www:/usr/sbin/nologin
backup:x:34:2:backup:/var/backups:/usr/sbin/nologin
nobody:x:65534:1337:nobody:/nonexistent:/usr/sbin/nologin
syslog:x:101:1000::/home/syslog:/bin/false
whoopsie:x:109:99::/nonexistent:/bin/false
user:x:1000:1000:edco8700,,,,:/home/user:/bin/bash
sshd:x:116:1337::/var/run/sshd:/usr/sbin/nologin
ntp:x:117:99::/home/ntp:/bin/false
mysql:x:118:999:MySQL Server,,,:/nonexistent:/bin/false
vboxadd:x:999:1::/var/run/vboxadd:/bin/false
This is the /etc/passwd file I need to run the command on. So far I have:
cut -d: -f1,6 testPasswd.txt | grep ???
That displays all the usernames and their associated home directories, but I'm stuck on how to find only the ones that start with m, w, or s and print the whole line.
I've tried grep -o '^[mws]*' and different variations of it, but none have worked.
Any suggestions?
Try variations of
cut -d: -f1,6 testPasswd.txt | grep '^m\|^w\|^s'
Or to put it more concisely,
cut -d: -f1,6 testPasswd.txt | grep '^[mws]'
That's neater especially if you have a lot of patterns to match.
But of course the awk solution is much better if you aren't constrained to grep and cut.
Easier to do with awk:
awk 'BEGIN{FS=OFS=":"} $1 ~ /^[mws]/{print $1, $6}' testPasswd.txt
sys:/dev
mail:/var/mail
www-data:/var/www
syslog:/home/syslog
whoopsie:/nonexistent
sshd:/var/run/sshd
mysql:/nonexistent
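Note that you can also flip the order and grep before cutting, since the username starts the line either way:
grep '^[mws]' testPasswd.txt | cut -d: -f1,6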

Find duplicate entries in a text file using shell

I am trying to find duplicate *.sh entries mentioned in a text file (test.log) and delete them, using a shell program. Since the paths differ, uniq -u still prints every entry, even though first_prog.sh appears twice in the file:
cat test.log
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/first_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
Desired output:
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
I tried a couple of approaches using a few commands, but I have no idea how to get the above output:
rev test.log | cut -f1 -d/ | rev | sort | uniq -d
Any clue on this?
You can use awk for this by splitting fields on / and using $NF (last field) in an associative array:
awk -F/ '!seen[$NF]++' test.log
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
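For readability, here is roughly what that !seen[$NF]++ one-liner expands to:
awk -F/ '{
    if (!($NF in seen))  # first time this file name (the last /-field) appears?
        print            # print the whole line
    seen[$NF]++          # remember the name for later lines
}' test.log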
awk shines for these kinds of tasks, but here is a non-awk solution:
$ sed 's|.*/|& |' file | sort -k2 -u | sed 's|/ |/|'
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
or, if your paths are balanced (the same number of components for every file):
$ sort -t/ -k5 -u file
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh
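You could also build on your own rev attempt: reverse each line so the file name comes first, deduplicate on that first field, and reverse back. A sketch (the output comes back sorted by the reversed file name, and which of the duplicate paths survives is unspecified):
rev test.log | sort -t/ -k1,1 -u | rev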
Or, hardcoding away the one known duplicate:
awk '!/my_shellprog\/test\/first/' file
/mnt/abc/shellprog/test/first_prog.sh
/mnt/abc/shellprog/test/second_prog.sh
/mnt/abc/my_shellprog/test/third_prog.sh

how to use the sort, cut, and uniq commands in a pipe

I was wondering how you use the cut, sort, and uniq commands in a pipeline to give a command line that indicates how many users are using each of the shells mentioned in /etc/passwd.
I'm not sure if this is right, but:
cut -f1 -d':' /etc/passwd | sort -n | uniq
?
Summarizing the answers excruciatingly hidden in comments:
You were close, except that:
- as tripleee noticed, the shell is in the seventh field, not the first
- as shellter noticed, the shells are not numbers, so -n is useless
- as shellter noticed, for the counting there's uniq -c
That gives
cut -f7 -d: /etc/passwd | sort | uniq -c
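And if you want the most-used shells listed first, one more pipe sorts by the count:
cut -f7 -d: /etc/passwd | sort | uniq -c | sort -rn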

How to extract a version from a single command line in Linux?

I have a product which has a command called db2level, and I need to extract 8.1.1.64 from its output. So far I came up with
db2level | grep "DB2 v" | awk '{print$5}'
which gave me the output v8.1.1.64", (the stray quote and comma come from the original line).
Please help me fetch 8.1.1.64. Thanks.
grep alone can do that (note that -P, Perl-compatible regexes, requires GNU grep):
db2level| grep -oP '(?<="DB2 v)[\d.]+(?=", )'
Just with awk:
db2level | awk -F '"' '$2 ~ /^DB2 v/ {print substr($2,6)}'
db2level | grep "DB2 v" | awk '{print$5}' | sed 's/[^0-9\.]//g'
This removes everything except digits and dots.
sed is your friend for general extraction tasks:
db2level | sed -n -e 's/.*tokens are "DB2 v\([0-9.]*\)".*/\1/p'
The sed command prints no lines (because of -n) except those where a replacement with the given regexp succeeds. The .* at the beginning and end of the pattern ensure that the whole line is matched and replaced.
Try grep with the -o option:
db2level | grep -E -o "[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"
Another sed solution:
db2level | sed -n -e '/v[0-9]/{s/.*DB2 v//;s/".*//;p}'
This one doesn't rely on the number being in a particular format, just on it appearing in a particular place in the output.
db2level | grep -o "v[0-9.]*" | tr -d v
Try something like db2level | grep "DB2 v" | cut -d'"' -f2 | cut -d'v' -f2
cut splits the input into parts separated by the -d delimiter and outputs the field number given by -f.
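Since the db2level output itself isn't reproduced above, you can sanity-check any of these pipelines by echoing the fragment the sed answers assume (the token string here is a hypothetical stand-in):
echo 'Informational tokens are "DB2 v8.1.1.64", "s040811"' | sed -n -e 's/.*tokens are "DB2 v\([0-9.]*\)".*/\1/p'
8.1.1.64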

Under the Mac terminal: list all of the users who have at least one running process

How do I list all of the users who have at least one running process?
The user names should not be duplicated, and they should be sorted.
$ ps xau | cut -f1 -d " " | sort | uniq | tail -n +2
You may want to weed out names starting with _ as well, like so:
ps xau | cut -f1 -d " " | sort | uniq | grep -v '^_' | tail -n +2
users does what is requested. From the man page:
users lists the login names of the users currently on the system, in
sorted order, space separated, on a single line.
Try this:
w -h | cut -d' ' -f1 | sort | uniq
w -h displays the logged-in users without the header line. The cut part strips everything except the username, and sort puts duplicate names next to each other so that uniq can remove them.
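If your ps supports -o with an empty header (macOS and most Linux systems do), a shorter sketch that needs no header-stripping at all:
ps -eo user= | sort -u
Bear in mind that w and users only report logged-in users, while the ps variants cover every user that owns a process, which is what the question asks for.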
