How to grep only the content that contains x and y? - linux

I have 2mill lines of content and all lines look like this:
--username:orderID:email:country
I already added a -- prefix to all usernames.
What I need now is to get ONLY the usernames from the file. I think its possible with grep file starting with "--" ending with ":", but I have absolutely no idea.
So output should be:
usernameThank you all for the help.
THIS WORKED:
cut -d: -f1

Even without adding the prefix, you should be able to get the usernames with cut:
cut -d: -f1
-d says what the delimiter is, -f says which field(s) to return.

Try this:
cat YOUR_FILE | sed "s/:/\n/g" | grep "\-\-"

Related

How to find a substring from some text in a file and store it in a bash variable?

I have a file named config.txt which has following data:
ABC_PATH=xxx/xxx
IMAGE=docker.name.net:3000/apache:1.8.109.1
NAMESPACE=xxx
Now I am running a shell script in which I want to store 1.8.109.1 (this value may differ, rest will remain same) in a variable, maybe using sed, awk or any other linux tool.
How can I achieve that?
The following will work.
ver="$(cat config.txt | grep apache: | cut -d: -f3)"
grep apache: will find the line that has the text 'apache:' in it.
-d specifies what delimiters to use. In this case : is set as the delimiter.
-f is used to select the specific field (array index, starting at 1) of the resulting list obtained after delimiting by :
Thus, -f3 selects the 3rd occurence of the delimited list.
The version info is now captured in the variable $ver
I think this should work:
cat config.txt | grep apache: | cut -d: -f3

Use grep and cut to filter text file to only display usernames that start with ‘m’ ‘w’ or ‘s’ and their home directories

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
sys:x:3:1:sys:/dev:/usr/sbin/nologin
games:x:5:2:games:/usr/games:/usr/sbin/nologin
mail:x:8:5:mail:/var/mail:/usr/sbin/nologin
www-data:x:33:3:www-data:/var/www:/usr/sbin/nologin
backup:x:34:2:backup:/var/backups:/usr/sbin/nologin
nobody:x:65534:1337:nobody:/nonexistent:/usr/sbin/nologin
syslog:x:101:1000::/home/syslog:/bin/false
whoopsie:x:109:99::/nonexistent:/bin/false
user:x:1000:1000:edco8700,,,,:/home/user:/bin/bash
sshd:x:116:1337::/var/run/sshd:/usr/sbin/nologin
ntp:x:117:99::/home/ntp:/bin/false
mysql:x:118:999:MySQL Server,,,:/nonexistent:/bin/false
vboxadd:x:999:1::/var/run/vboxadd:/bin/false
this is an /etc/passwd file I need to do this command on. So far I have:
cut -d: -f1,6 testPasswd.txt | grep ???
that will display all the usernames and the folder associated, but I'm stuck on how to find only the ones that start with m,w,s and print the whole line.
I've tried grep -o '^[mws]*' and different variations of it, but none have worked.
Any suggestions?
Try variations of
cut -d: -f1,6 testPasswd.txt | grep '^m\|^w\|^s'
Or to put it more concisely,
cut -d: -f1,6 testPasswd.txt | grep '^[mws]'
That's neater especially if you have a lot of patterns to match.
But of course the awk solution is much better if doing it without constraints.
Easier to do with awk:
awk 'BEGIN{FS=OFS=":"} $1 ~ /^[mws]/{print $1, $6}' testPasswd.txt
sys:/dev
mail:/var/mail
www-data:/var/www
syslog:/home/syslog
whoopsie:/nonexistent
sshd:/var/run/sshd
mysql:/nonexistent

using linux cat and grep command

I am having following syntax for one of my file.Could you please anyone explain me what is this command doing
path = /document/values.txt
where we have different username specified e.g username1 = john,username2=marry
cat ${path} | grep -e username1 | cut -d'=' -f2`
my question here is cat command is reading from the file value of username1 but why why we need to use cut command?
Cat is printing the file. The file has username1=something in one of the lines. The cut command splits this and prints out the second argument.
your command was not written well. the cat is useless.
you can do:
grep -e pattern "$path"|cut ...
you can of course do it with single process with awk if you like. anyway the line in your question smells not good.
awk example:
awk -F'=' '/pattern/{print $2}' inputFile
cut -d'=' -f2`
This cut uses -d'=' that means you use '=' as 'field delimiter' and -f2 will take only de second field.
So in this case you want only the value after the "=" .

Unix cut except last two tokens

I'm trying to parse file names in specific directory. Filenames are of format:
token1_token2_token3_token(N-1)_token(N).sh
I need to cut the tokens using delimiter '_', and need to take string except the last two tokens. In above examlpe output should be token1_token2_token3.
The number of tokens is not fixed. I've tried to do it with -f#- option of cut command, but did not find any solution. Any ideas?
With cut:
$ echo t1_t2_t3_tn1_tn2.sh | rev | cut -d_ -f3- | rev
t1_t2_t3
rev reverses each line.
The 3- in -f3- means from the 3rd field to the end of the line (which is the beginning of the line through the third-to-last field in the unreversed text).
You may use POSIX defined parameter substitution:
$ name="t1_t2_t3_tn1_tn2.sh"
$ name=${name%_*_*}
$ echo $name
t1_t2_t3
It can not be done with cut, However, you can use sed
sed -r 's/(_[^_]+){2}$//g'
Just a different way to write ysth's answer :
echo "t1_t2_t3_tn1_tn2.sh" |rev| cut -d"_" -f1,2 --complement | rev

How to extract distinct part of a string from a file in linux

I'm using the following command to extract distinct urls that contain .com extension and may contain .us or whatever country extension.
grep '\.com' source.txt -m 700 | uniq | sed -e 's/www.//'
> dest.txt
The problem is that, it extracts urls in the same doamin, the thing tht I don't want. Ex:
abc.yahoo.com
efg.yahoo.com
I only need the yahoo.com. How can I using grep or any other command extract distinct domain names only ?
Maybe something like this?
egrep -io '[a-z0-9\-]+\.[a-z]{2,3}(\.[a-z]{2})?' source.txt
Have you tried using awk in instead of sed and specify "." as the delimiter and only print out the two last fields.
awk -F "." '{ print $(NF-1)"."$NF }'
Perhaps something like this should help:
egrep -o '[^.]*.com' file

Resources