Extracting a string from a string in Linux

Extract the value for OWNER in the following:
{{USERID 9898}}{{OWNER Wayne, Daniel}}{{EMAIL danielwayne#blah.com}}
To get this string I am using grep on a text file. In all other cases only one value is contained on each line, so they are not an issue.
My problem is extracting the text after OWNER but before the }} brackets, leaving me with only the string 'Wayne, Daniel'.
So far I have begun looking into writing a for loop to go through the text a character at a time, but I feel there is a more elegant solution than my limited knowledge of Unix has turned up.

With grep
> cat file
{{USERID 9898}}{{OWNER Wayne, Daniel}}{{EMAIL danielwayne#blah.com}}
> grep -Po '(?<=OWNER )[\w, ]*' file
Wayne, Daniel
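Note that -P (PCRE) support is not compiled into every grep build (macOS grep, for one, lacks it). Where it is missing, a sed sketch that does the same extraction:
> sed -n 's/.*{{OWNER \([^}]*\)}}.*/\1/p' file
Wayne, Daniel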

Try cat file.txt | perl -n -e'/OWNER ([^\}]+)/ && print $1'

You can use this awk, where -F '{{|}}' splits each line on {{ or }} so that $4 is the field holding 'OWNER Wayne, Daniel':
awk -F '{{|}}' '{sub(/OWNER +/, "", $4); print $4}' file
Wayne, Daniel

Try this. I use cut:
INPUT="{{USERID 9898}}{{OWNER Wayne, Daniel}}{{EMAIL danielwayne#blah.com}}"
SUBSTRING=$(echo "$INPUT" | cut -d' ' -f3)        # "Wayne,"
SUBSTRING2=$(echo "$INPUT" | cut -d',' -f2)       # " Daniel}}{{EMAIL ...}}"
SUBSTRING2=$(echo "$SUBSTRING2" | cut -d'}' -f1)  # " Daniel"
echo $SUBSTRING$SUBSTRING2
Maybe it's not the most elegant way, but it works.
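For completeness, a pure-bash sketch using the =~ operator and BASH_REMATCH, so no external commands are needed at all (the variable name here is just illustrative):
line='{{USERID 9898}}{{OWNER Wayne, Daniel}}{{EMAIL danielwayne#blah.com}}'
if [[ $line =~ \{\{OWNER\ ([^}]+)\}\} ]]; then
    echo "${BASH_REMATCH[1]}"    # prints: Wayne, Daniel
fi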


Bash: Flip strings to the other side of the delimiter

Basically, I have a file formatted like
ABC:123
And I would like to flip the strings around the delimiter, so it would look like this
123:ABC
I would prefer to do this with bash/linux tools.
Thanks for any help!
That's reasonably easy with internal bash commands, assuming two fields, as per the following transcript:
pax:~$ x='abc:123'
pax:~$ echo "${x#*:}:${x%:*}"
123:abc
The first substitution ${x#*:} removes everything from the start up to the colon. The second, ${x%:*}, removes everything from the colon to the end.
Then you just re-join them with the colon in-between.
It doesn't matter for your particular data but % and # use the shortest possible pattern. The %% and ## variants will give you the longest possible pattern (greedy).
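To see the difference, a quick sketch with a three-field string (names purely illustrative):
x='a:b:c'
echo "${x#*:}"     # b:c  (shortest prefix match removed)
echo "${x##*:}"    # c    (longest prefix match removed)
echo "${x%:*}"     # a:b  (shortest suffix match removed)
echo "${x%%:*}"    # a    (longest suffix match removed)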
As an aside, this is ideal if you're doing it for one string at a time, since you don't need to spin up an external process to do the work for you. But if you're processing an entire file, there are better ways to do it, such as with awk:
pax:~$ printf "abc:123\ndef:456\nghi:789\n" | awk -F: '{print $2 FS $1}'
123:abc
456:def
789:ghi
#!/bin/sh -x
var1=$(echo -e 'ABC:123' | cut -d':' -f1)
var2=$(echo -e 'ABC:123' | cut -d':' -f2)
echo -e "${var2}":"${var1}"
I use cut to split the string into two parts, and store both of those parts as variables.
From there, it's possible to use echo to re-arrange the variables as you see fit.
Using sed, where the two capture groups grab the text before and after the colon and \2:\1 writes them back swapped:
sed -E 's/(.*):(.*)/\2:\1/' file.txt
Using paste and cut with process substitution.
paste -d: <(cut -d : -f2 file.txt) <(cut -d : -f1 file.txt)
A slower shell-only solution, especially on a large set of data/files:
while IFS=: read -r left right; do printf '%s:%s\n' "$right" "$left"; done < file.txt

Use grep and cut to filter text file to only display usernames that start with ‘m’ ‘w’ or ‘s’ and their home directories

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
sys:x:3:1:sys:/dev:/usr/sbin/nologin
games:x:5:2:games:/usr/games:/usr/sbin/nologin
mail:x:8:5:mail:/var/mail:/usr/sbin/nologin
www-data:x:33:3:www-data:/var/www:/usr/sbin/nologin
backup:x:34:2:backup:/var/backups:/usr/sbin/nologin
nobody:x:65534:1337:nobody:/nonexistent:/usr/sbin/nologin
syslog:x:101:1000::/home/syslog:/bin/false
whoopsie:x:109:99::/nonexistent:/bin/false
user:x:1000:1000:edco8700,,,,:/home/user:/bin/bash
sshd:x:116:1337::/var/run/sshd:/usr/sbin/nologin
ntp:x:117:99::/home/ntp:/bin/false
mysql:x:118:999:MySQL Server,,,:/nonexistent:/bin/false
vboxadd:x:999:1::/var/run/vboxadd:/bin/false
this is an /etc/passwd file I need to do this command on. So far I have:
cut -d: -f1,6 testPasswd.txt | grep ???
that will display all the usernames and their associated folders, but I'm stuck on how to keep only the ones that start with m, w or s and print the whole line.
I've tried grep -o '^[mws]*' and different variations of it, but none have worked.
Any suggestions?
Try variations of
cut -d: -f1,6 testPasswd.txt | grep '^m\|^w\|^s'
Or to put it more concisely,
cut -d: -f1,6 testPasswd.txt | grep '^[mws]'
That's neater especially if you have a lot of patterns to match.
But of course the awk solution is much better if you're doing it without constraints.
Easier to do with awk:
awk 'BEGIN{FS=OFS=":"} $1 ~ /^[mws]/{print $1, $6}' testPasswd.txt
sys:/dev
mail:/var/mail
www-data:/var/www
syslog:/home/syslog
whoopsie:/nonexistent
sshd:/var/run/sshd
mysql:/nonexistent
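Since the username is the first field, you could equally grep first and cut afterwards; the output is the same as above:
grep '^[mws]' testPasswd.txt | cut -d: -f1,6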

using linux cat and grep command

I have the following command for one of my files. Could anyone please explain what it is doing?
path = /document/values.txt
where we have different usernames specified, e.g. username1=john, username2=marry
cat ${path} | grep -e username1 | cut -d'=' -f2
My question is: cat and grep already read the value of username1 from the file, so why do we need the cut command?
cat is printing the file. The file has username1=something on one of its lines. The cut command splits that line on '=' and prints the second field.
Your command is not written well; the cat is useless.
You can do:
grep -e pattern "$path" | cut ...
You can of course do it in a single process with awk if you like. Either way, the line in your question smells bad.
awk example:
awk -F'=' '/pattern/{print $2}' inputFile
cut -d'=' -f2
This cut uses -d'=', which means '=' is used as the field delimiter, and -f2 takes only the second field.
So in this case you get only the value after the '='.
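As a quick sketch (values.txt here is a hypothetical file with one assignment per line):
$ cat values.txt
username1=john
username2=marry
$ grep -e username1 values.txt | cut -d'=' -f2
john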

How to find the last field using 'cut'

Without using sed or awk, only cut, how do I get the last field when the number of fields are unknown or change with every line?
You could try something like this:
echo 'maps.google.com' | rev | cut -d'.' -f 1 | rev
Explanation
rev reverses "maps.google.com" to be moc.elgoog.spam
cut uses dot (ie '.') as the delimiter, and chooses the first field, which is moc
lastly, we reverse it again to get com
Use a parameter expansion. This is much more efficient than any kind of external command, cut (or grep) included.
data=foo,bar,baz,qux
last=${data##*,}
See BashFAQ #100 for an introduction to native string manipulation in bash.
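For instance, a quick sketch extracting both the last and the first field:
data=foo,bar,baz,qux
echo "${data##*,}"    # qux (everything up to the last comma removed)
echo "${data%%,*}"    # foo (everything from the first comma removed)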
It is not possible using just cut. Here is a way using grep:
grep -o '[^,]*$'
Replace the comma with whatever delimiter you need.
Explanation:
-o (--only-matching) only outputs the part of the input that matches the pattern (the default is to print the entire line if it contains a match).
[^,] is a character class that matches any character other than a comma.
* matches the preceding pattern zero or more times, so [^,]* matches zero or more non-comma characters.
$ matches the end of the string.
Putting this together, the pattern matches zero or more non-comma characters at the end of the string.
When there are multiple possible matches, grep prefers the one that starts earliest. So the entire last field will be matched.
Full example:
If we have a file called data.csv containing
one,two,three
foo,bar
then grep -o '[^,]*$' < data.csv will output
three
bar
Without awk?...
But it's so simple with awk:
echo 'maps.google.com' | awk -F. '{print $NF}'
AWK is a way more powerful tool to have in your pocket.
-F is for the field separator
NF is the number of fields (and also stands for the index of the last)
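Applied to the data.csv sample from the grep answer above:
awk -F, '{print $NF}' data.csv
three
bar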
There are multiple ways. You may use this too.
echo "Your string here"| tr ' ' '\n' | tail -n1
> here
Obviously, the blank-space argument to tr should be replaced with the delimiter you need.
This is the only solution possible for using nothing but cut:
echo "s.t.r.i.n.g." | cut -d'.' -f2-
[repeat_following_part_forever_or_until_out_of_memory:] | cut -d'.' -f2-
Using this solution, the number of fields can indeed be unknown and vary from time to time. However, since line length must not exceed LINE_MAX characters, including the newline character, an arbitrary number of fields can never actually occur, so repeating the cut a finite number of times always suffices.
Yes, a very silly solution, but the only one that meets the criteria, I think.
If your input string doesn't contain forward slashes then you can use basename and a subshell:
$ basename "$(echo 'maps.google.com' | tr '.' '/')"
This doesn't use sed or awk, but it also doesn't use cut either, so I'm not quite sure it qualifies as an answer to the question as it's worded.
This doesn't work well if processing input strings that can contain forward slashes. A workaround for that situation would be to replace forward slash with some other character that you know isn't part of a valid input string. For example, the pipe (|) character is also not allowed in filenames, so this would work:
$ basename "$(echo 'maps.google.com/some/url/things' | tr '/' '|' | tr '.' '/')" | tr '|' '/'
The following implements a friend's suggestion:
#!/bin/bash
# Recursively cut away the first field until no delimiter is left,
# leaving only the last field.
rcut(){
    nu="$( echo "$1" | cut -d"$DELIM" -f 2- )"
    if [ "$nu" != "$1" ]
    then
        rcut "$nu"
    else
        echo "$nu"
    fi
}
$ export DELIM=.
$ rcut a.b.c.d
d
An alternative using perl would be:
perl -pe 's/(.*) (.*)$/$2/' file
where you may change the space in the pattern to whatever delimiter the file uses (e.g. \t for tabs)
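For example, a sketch for a comma-delimited file (the greedy first group swallows everything up to the last comma):
perl -pe 's/(.*),(.*)$/$2/' file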
It is better to use awk when working with tabular data; you don't have to master the command, and if the job can be done with awk, why not use it? Don't waste your precious time; a handful of commands will get the job done.
Example (ll being the common alias for ls -l):
# $NF refers to the last column in awk
ll | awk '{print $NF}'
If you have a file named filelist.txt that is a list of paths, such as the following:
c:/dir1/dir2/file1.h
c:/dir1/dir2/dir3/file2.h
then you can do this:
rev filelist.txt | cut -d"/" -f1 | rev
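For the sample above this prints:
file1.h
file2.h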
Adding an approach to this old question just for the fun of it:
$ cat input.file # file containing input that needs to be processed
a;b;c;d;e
1;2;3;4;5
no delimiter here
124;adsf;15454
foo;bar;is;null;info
$ cat tmp.sh # showing off the script to do the job
#!/bin/bash
delim=';'
while read -r line; do
    # quoted "$delim" makes =~ a literal substring test, not a regex match
    while [[ "$line" =~ "$delim" ]]; do
        line=$(cut -d"$delim" -f 2- <<<"$line")
    done
    echo "$line"
done < input.file
$ ./tmp.sh # output of above script/processed input file
e
5
no delimiter here
15454
info
Besides bash, only cut is used.
Well, and echo, I guess.
choose -1
choose supports negative indexing (the syntax is similar to Python's slices).
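A sketch, assuming the third-party choose utility is installed and that its -f flag sets a custom (regex) field separator:
echo 'maps.google.com' | choose -f '\.' -1
com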
I realized that if we just ensure a trailing delimiter exists, it works. In my case I have comma and whitespace delimiters, so I add a space at the end:
$ ans="a, b"
$ ans+=" "; echo "${ans}" | tr ',' ' ' | tr -s ' ' | cut -d' ' -f2
b
