How to take a text between "/" with Awk / cut? [duplicate] - linux

This question already has answers here:
shell script to extract text from a variable separated by forward slashes
(3 answers)
Closed 4 years ago.
I have this command in a script:
find /home/* -type d -name dev-env 2>&1 | grep -v 'Permiso' >&2 > findPath.txt
which returns:
/home/user/project/dev-env
I need to take the second path component between "/" (user) so I can save it in a variable later. I cannot find a way to pick up just the "user" text.

Using cut:
echo "/home/user/project/dev-env" | cut -d'/' -f3
Result:
user
This tells cut to use / as the delimiter and return the 3rd field. (The 1st field is blank/empty, the 2nd field is home.)
Using awk:
echo "/home/user/project/dev-env" | awk -F/ '{print $3}'
This tells awk to use / as the field separator and print the 3rd field.

Assuming that the path resulting from the grep is always an absolute path:
second_component=$(find .... -type d -name dev-env 2>&1 | grep -v 'Permiso' | cut -d / -f 3)
However, your approach suffers from several other problems:
You use /home/* as the starting point for find. This only works if there is exactly one subdirectory below /home, which is not a very likely scenario.
Even then, it only works if grep outputs exactly one line. This is a semantic problem: if you get more than one line, which one are you interested in? Assuming you always want the first line, you can solve this by piping the result through head -n 1.
Next, you redirect find's stderr to stdout, which means any error from find goes unnoticed; you just get some weird result. It would be better to let error messages from find be displayed, and to evaluate the exit codes of find and grep instead.
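Putting those suggestions together, a minimal sketch (variable names are illustrative; it assumes the first match is the one wanted, and leaves stderr visible on the terminal instead of mixing it into the data):
path=$(find /home -type d -name dev-env | head -n 1)   # errors still reach the terminal
if [ -n "$path" ]; then
    second_component=$(printf '%s\n' "$path" | cut -d / -f 3)
    echo "second component: $second_component"
else
    echo "no dev-env directory found under /home" >&2
    exit 1
fi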

... | cut -d/ -f3
"Third field, as cut by slash delimiter"


Preserve '\n' newline in returned text over ssh

If I execute a find command, with grep and sort etc., on the local command line, I get back lines like so:
# find ~/logs/ -iname 'status' | xargs grep 'last seen' | sort --field-separator=: -k 4 -g
0:0:line:1
0:0:line:2
0:0:line:3
If I execute the same command over ssh, the returned text prints without newlines, like so:
# VARcmdChk="$(ssh ${VARuser}@${VARserver} "find ~/logs/ -iname 'status' | xargs grep 'last seen' | sort --field-separator=: -k 4 -g")"
# echo ${VARcmdChk}
0:0:line:1 0:0:line:2 0:0:line:3
I'm trying to understand why ssh is sanitising the returned text so that newlines are converted to spaces. I have not yet tried outputting to a file and then using scp to pull that back; that seems a waste, since I just want to view the remote results locally.
When you echo the variable VARcmdChk, you should enclose it in double quotes:
$ VARcmdChk=$(ssh ${VARuser}@${VARserver} "find tmp/ -iname status -exec grep 'last seen' {} \; | sort --field-separator=: -k 4 -g")
$ echo "${VARcmdChk}"
last seen:11:22:33:44:55:66:77:88:99:00
last seen:00:99:88:77:66:55:44:33:22:11
Note that I've replaced your xargs with -exec.
OK, the question is a duplicate of Why does shell Command Substitution gobble up a trailing newline char?, so it is partly answered.
I say partly because the answers there explain why this happens, but the only clue to a solution is a small answer right at the end.
The solution is to quote the echo argument, as that answer suggests:
# VARcmdChk="$(ssh ${VARuser}@${VARserver} "find ~/logs/ -iname 'status' | xargs grep 'last seen' | sort --field-separator=: -k 4 -g")"
# echo "${VARcmdChk}"
0:0:line:1
0:0:line:2
0:0:line:3
but there is no explanation of why this works, since the assumption is that the variable is a string and so should print as expected. However, reading Expansion of variable inside single quotes in a command in Bash provides the clue about preserving newlines etc. in a string: placing the variable to be printed by echo inside quotes preserves its contents exactly, and you get the expected output.
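A quick way to see this in isolation (a minimal demonstration, not from the original thread):
$ v=$(printf 'line1\nline2\nline3\n')
$ echo $v
line1 line2 line3
$ echo "$v"
line1
line2
line3
Unquoted, the shell word-splits the expansion on the newlines and echo joins the words with spaces; quoted, the embedded newlines reach echo untouched.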
Echoing the variable unquoted is why it all ends up on one line. Running the following command will output the results as expected:
ssh ${VARuser}@${VARserver} "find ~/logs/ -iname 'status' | xargs grep 'last seen' | sort --field-separator=: -k 4 -g"
To get the command output to place each result on a new line, as it does when you run the command locally, you can use awk to split the results onto new lines.
awk '{print $1"\n"$2}'
This method can be appended to your command like this:
echo ${VARcmdChk} | awk '{print $1"\n"$2"\n"$3"\n"$4}'
Alternatively, you can put quotes around the variable as per your answer:
echo "${VARcmdChk}"

Use grep and get text after the pattern [duplicate]

This question already has answers here:
How to grep for contents after pattern?
(8 answers)
Closed 4 years ago.
I need to get the IP from a log: I need to grep for true-client, and after that grep true-client-ip=[191.168.171.15] and get just the IP.
2019.02.14-08:26:06:713,asd:1234:chan,0.000,asd,S,request-begin-site,POST,{remoteHost=1.2.3.4,remoteAddr=1.2.3.4,requestType=POST,serverName=api=[text/html],accept-charset=[iso-12345-15, utf-8;q=0.5, *;q=0.5],accept-encoding=[gzip],server-origin=[5],cache-control=[no-cache, max-age=0],pragma=[no-cache],program-header=[true],te=[chunked;q=1.0],true-client-ip=[191.168.171.15],true-host=[www.server.com]
I was trying grep -o "true-client-ip=[^ ]*," but it brings me:
true-client-ip=[191.168.171.15],true-host=[www.server.com]
I need just true-client-ip=[191.168.171.15] so that I can cut it afterwards to get the IP, e.g. true-client-ip=[191.168.171.15] | cut -d= -f2
Using grep's -P flag, if available:
grep -oP 'true-client-ip=\[\K[^]]*'
Perl's \K meta-character discards what precedes when displaying the result, so it will match the "true-client-ip=[" part but only display the IP.
If grep -P isn't available, I would use sed:
sed -nE 's/.*true-client-ip=\[([^]]*).*/\1/p'
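For instance, with the log line from the question in a variable (line is a hypothetical name, and the content is abbreviated here), both commands extract the same IP:
$ line='...,true-client-ip=[191.168.171.15],true-host=[www.server.com]'
$ printf '%s\n' "$line" | grep -oP 'true-client-ip=\[\K[^]]*'
191.168.171.15
$ printf '%s\n' "$line" | sed -nE 's/.*true-client-ip=\[([^]]*).*/\1/p'
191.168.171.15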
If you have GNU grep, you can do it like this:
$ grep -oP "(?<=true-client-ip=\[)[^\]]*" file
191.168.171.15
The (?<=...) construct is called a positive lookbehind; you can find the related doc here.
The backslash in [^\]] is actually unnecessary; I just felt like adding it to make the pattern more intuitive and less prone to misreading :-)
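If -P is not available at all, a sketch of the pipeline the question itself suggested also works, with tr to strip the brackets (this assumes the log is in a file named file and that your grep supports -o):
$ grep -o 'true-client-ip=\[[^]]*]' file | cut -d= -f2 | tr -d '[]'
191.168.171.15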

Recursively grep unique pattern in different files

Sorry, the title is not very clear.
So let's say I'm grepping recursively for urls like this:
grep -ERo '(http|https)://[^/"]+' /folder
and in the folder there are several files containing the same URL. My goal is to output this URL only once. I tried piping the grep to | uniq or sort -u, but that doesn't help
example result:
/www/tmpl/button.tpl.php:http://www.w3.org
/www/tmpl/header.tpl.php:http://www.w3.org
/www/tmpl/main.tpl.php:http://www.w3.org
/www/tmpl/master.tpl.php:http://www.w3.org
/www/tmpl/progress.tpl.php:http://www.w3.org
If you only want the address and not the file it was found in, grep has the -h option to suppress file names in the output; the list can then be piped to sort -u to make sure every address appears only once:
$ grep -hERo 'https?://[^/"]+' folder/ | sort -u
http://www.w3.org
If you don't want the https?:// part, you can use Perl regular expressions (-P instead of -E) with variable length look-behind (\K):
$ grep -hPRo 'https?://\K[^/"]+' folder/ | sort -u
www.w3.org
If the structure of the output is always:
/some/path/to/file.php:http://www.someurl.org
you can use the cut command:
cut -d ':' -f 2- should work. Basically, it cuts each line into fields separated by a delimiter (here ":") and you select the 2nd and following fields (-f 2-).
After that, you can use sort -u to filter out duplicates (plain uniq only removes adjacent ones).
Pipe to Awk:
grep -ERo 'https?://[^/"]+' /folder |
awk -F: '!a[substr($0,length($1)+2)]++'
The basic Awk idiom !a[key]++ is true the first time we see key, and forever false after that. Extracting the URL (or a reasonable approximation) into the key requires a bit of additional trickery.
This prints the whole input line if the key is one we have not seen before, i.e. it will print the file name and the URL for the first occurrence of each URL from the grep output.
Doing the whole thing in Awk should not be too hard, either.
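For instance, a sketch of that idea, with find supplying the recursion (note that with -exec ... + the seen array only deduplicates within each awk invocation, which is normally a single one):
find /folder -type f -exec awk '
match($0, /https?:\/\/[^\/"]+/) {
    url = substr($0, RSTART, RLENGTH)   # the text that matched the URL pattern
    if (!seen[url]++) print url         # print each URL only on first sighting
}' {} +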

extract date from a file name in unix using shell scripting

I am working on a shell script. I want to extract the date from a file name.
The file name is: abcd_2014-05-20.tar.gz
I want to extract date from it: 2014-05-20
echo abcd_2014-05-20.tar.gz |grep -Eo '[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}'
Output:
2014-05-20
grep gets its input from echo via stdin; you can also use the cat command if you have these strings in a file.
-E Interpret PATTERN as an extended regular expression.
-o Show only the part of a matching line that matches PATTERN.
[[:digit:]] It matches a single digit in the input.
{N} It matches exactly N repetitions, i.e. 4 for the year and 2 for the month and day.
Most importantly, it matches without relying on any separators like "_" and ".", which is why it's the most flexible solution.
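Since the original goal is a shell script, the match can be captured straight into a variable (names here are just illustrative):
file=abcd_2014-05-20.tar.gz
date=$(echo "$file" | grep -Eo '[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}')
echo "$date"   # prints 2014-05-20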
Using awk with a custom field separator, it is quite simple:
echo 'abcd_2014-05-20.tar.gz' | awk -F '[_.]' '{print $2}'
2014-05-20
Use grep:
$ ls -1 abcd_2014-05-20.tar.gz | grep -oP '[\d]+-[\d]+-[\d]+'
2014-05-20
-o causes grep to print only the matching part
-P interprets the pattern as perl regex
[\d]+-[\d]+-[\d]+: stands for one or more digits followed by a dash (3 times) that matches your date.
Here are a few more examples.
Using the cut command (cut offers readability, much like the awk command):
echo "abcd_2014-05-20.tar.gz" | cut -d "_" -f2 | cut -d "." -f1
Output is:
2014-05-20
Using the grep command:
echo "abcd_2014-05-20.tar.gz" | grep -Eo "[0-9]{4}\-[0-9]{2}\-[0-9]{2}"
Output is:
2014-05-20
Another advantage of the grep format is that it will also fetch multiple dates, like this:
echo "ab2014-15-12_cd_2014-05-20.tar.gz" | grep -Eo "[0-9]{4}\-[0-9]{2}\-[0-9]{2}"
Output is:
2014-15-12
2014-05-20
I would use some kind of regular expression with the grep command, depending on how your file name is constructed.
If your date always comes after the "_" character, I would use something like this:
ls -l | grep '_[REGEXP]'
Where REGEXP is your regular expression according to your date format.
Take a look here http://www.linuxnix.com/2011/07/regular-expressions-linux-i.html
Multiple ways you could do it:
echo abcd_2014-05-20.tar.gz | sed -n 's/.*_\(.*\)\.tar\.gz/\1/p'
sed extracts the date and prints it.
Another way:
filename=abcd_2014-05-20.tar.gz
temp=${filename#*_}
date=${temp%.tar.gz}
Here temp holds the part of the file name after the "_", i.e. 2014-05-20.tar.gz.
Then you can extract the date by removing .tar.gz from the end.
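The same two expansions can be wrapped in a small function (extract_date is a made-up name):
extract_date() {
    local temp=${1#*_}                 # strip through the first _
    printf '%s\n' "${temp%.tar.gz}"    # strip the .tar.gz suffix
}
extract_date abcd_2014-05-20.tar.gz    # prints 2014-05-20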

Extract text between two given different delimiters in a given text in bash [duplicate]

This question already has answers here:
Print text between delimiters using sed
(2 answers)
Closed 2 years ago.
I have a line of text which looks like hh^ay-pau+h#ow. I want to extract the text between - and +, which in this case is pau. This should be done in bash. Any help would be appreciated.
EDIT: I want to extract the text between the first occurrence of each token.
PS: My Google search didn't take me anywhere. I apologize if this question has already been asked.
The way to do this in pure bash is by using parameter expansion:
$ a=hh^ay-pau+h#ow
$ b=${a%%+*}
$ c=${b#*-}
$ echo $c
pau
b: removes everything from the first + onward, including the +
c: removes everything up to and including the first -
More info about substring removal in bash parameter expansion
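As a follow-up sketch, the pair of expansions generalizes to a tiny helper (between is a hypothetical name; arguments are the string, the left delimiter, and the right delimiter):
between() {
    local s=${1%%"$3"*}          # cut everything from the first right delimiter on
    printf '%s\n' "${s#*"$2"}"   # then drop everything through the first left delimiter
}
between 'hh^ay-pau+h#ow' - +     # prints pau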
Try
grep -Po "(?<=\-).*?(?=\+)"
For example,
echo "hh^ay-pau+h#ow" | grep -Po "(?<=\-).*?(?=\+)"
If you have only one occurrence of - and +, you can use cut:
$ echo "hh^ay-pau+h#ow" | cut -d "-" -f 2 | cut -d "+" -f 1
pau
Assuming one occurrence of + and -, you can stick to bash:
IFS=+- read -r _ x _ <<<'hh^ay-pau+h#ow'
echo $x
pau
If you're guaranteed to only have one - and one +:
% echo "hh^ay-pau+h#ow" | sed -e 's/.*-//' -e 's/+.*//'
pau
echo "hh^ay-pau+h#ow" | awk -F'-' '{print $2}' |awk -F'+' '{print $1}'
