Shell Script get text between 2 special characters - string

I have read a few things out there but can't seem to work out this particular problem. I am writing a shell script. I am reading a file to a variable using
LOCAL_CONFIG=`cat local-config.php`
Which has lines like this
define( 'DB_USER', 'abcxyz' );
define( 'DB_PASSWORD', 'qwerty' );
How can I get the abcxyz and the qwerty parts of this??
Thanks in advance

Using awk
$ awk -F"'" '/^define\(/ {print $4}' local-config.php
abcxyz
qwerty
Explanation:
-F"'"
This defines the field separator as the single quote.
/^define\(/
This selects the lines that start with define(
print $4
For those selected lines, this prints the fourth field.
Using sed
$ sed -rn "/^define\(/ {s/([^']*'){3}//; s/'.*//; p;}" local-config.php
abcxyz
qwerty
-rn
This turns on extended regex syntax and turns off automatic printing.
/^define\(/
This selects the lines that start with define(
{
This starts a group. Commands in this group are executed only for the selected lines.
s/([^']*'){3}//
This removes all text up through and including the third quote.
s/'.*//
This removes all text after the next remaining quote.
p
This prints the line.
}
This ends the group.
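Since the question stores values in shell variables, the awk approach above could be wrapped in a small helper that looks up one key at a time. This is only a sketch: the function name get_define and the temporary sample file are assumptions made for illustration, not part of the original script.

```shell
# Hypothetical sample file mirroring the format shown in the question.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
<?php
define( 'DB_USER', 'abcxyz' );
define( 'DB_PASSWORD', 'qwerty' );
EOF

# get_define NAME FILE: print the value of the define() whose key is NAME.
# With ' as the field separator, $2 is the key and $4 is the value.
get_define() {
    awk -F"'" -v key="$1" '/^define\(/ && $2 == key { print $4 }' "$2"
}

db_user=$(get_define DB_USER "$cfg")
db_pass=$(get_define DB_PASSWORD "$cfg")
echo "$db_user $db_pass"

rm -f "$cfg"
```

Matching on $2 means a line is used only when its key is exactly the requested name, so other define() lines are ignored.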

Use grep along with -P parameter to enable perl-regexp mode.
$ grep -oP "\bdefine\( *'[^']*' *, *'\K[^']*(?=' *\);)" file
abcxyz
qwerty
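\K discards everything matched so far, so only the text matched after it is printed. The (?=...) lookahead requires the closing quote and ); to follow, without including them in the output.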

The cut command does this more simply.
Command:
cut -d "'" -f4 local-config.php
output:
abcxyz
qwerty
Explanation:
With ' as the delimiter, the value we want is the fourth field (-f4) of each line.
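One caveat with cut: lines that contain no delimiter at all are printed unchanged, so a real config file with other lines may produce extra output. A minimal sketch of the difference, assuming a sample file that starts with a <?php line (the file contents and variable names are made up for illustration):

```shell
# Hypothetical sample file: one line has no single quote in it.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
<?php
define( 'DB_USER', 'abcxyz' );
define( 'DB_PASSWORD', 'qwerty' );
EOF

# cut alone passes through lines that have no delimiter (here "<?php").
cut_only=$(cut -d "'" -f4 "$cfg")

# Pre-filtering with grep keeps only the lines that start with define(.
cut_filtered=$(grep '^define(' "$cfg" | cut -d "'" -f4)

printf '%s\n' "$cut_filtered"

rm -f "$cfg"
```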

Related

Fetching the value of variable stored in a file

I am trying to fetch the output of a variable stored in a file in another shell script.
Example:
cat abc.log
var1=2
var2=2
var3=25
I am writing a script to fetch the value of var3.
Thank you in advance.
awk -F= '$1 ~ /^[[:space:]]*var3/ { print $2 }' abc.log
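Set the field delimiter to = and then, where the first field begins with var3 (optionally preceded by whitespace), print the second field. Note that this pattern would also match a name such as var30.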
Alternatively, you could:
source abc.log
and then:
echo $var3
Using sed, you can isolate the 25 precisely with:
sed -n '/^[[:space:]]*var3=/s/^[^=]*=//p' file
Explanation
This is the general substitution form s/find/replace/ with a matching expression preceding it. The total form is /match/s/find/replace/. The option -n suppresses the normal printing of pattern-space and the p at the end tells sed to print the line where the match and substitution took place. Specifically,
/match/ locates a line with any number of preceding whitespace characters followed by var3=. The POSIX [:space:] character class matches any whitespace,
the /find/ is all characters anchored from the '^' beginning that are not the [^=] character and then match the literal '=' character, and finally
the /replace/ is the empty-string leaving the 25 alone which is printed.
Example Use/Output
$ sed -n '/^[[:space:]]*var3=/s/^[^=]*=//p' file
25
A grep one-liner, if your grep has support for Perl-compatible regular expressions (the -P option; not all greps support that)
grep -Po '^\s*var3=\K.*' abc.log
or,
grep -Po '^\s*var3=\K.*' abc.log | tail -n1
to get the last value of var3, if var3 may be assigned more than once.
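The lookups above can also be wrapped in a reusable function that avoids executing the file with source. The following is a sketch only: get_var is a hypothetical helper name, and it assumes plain name=value lines with no leading whitespace and no = inside the value.

```shell
# Hypothetical sample file with the same contents as abc.log in the question.
log=$(mktemp)
printf '%s\n' 'var1=2' 'var2=2' 'var3=25' > "$log"

# get_var NAME FILE: print the value from the last "NAME=value" line in FILE.
# The name is compared exactly, so var3 does not match var30.
get_var() {
    awk -F= -v name="$1" '
        $1 == name { val = $2; found = 1 }
        END        { if (found) print val }
    ' "$2"
}

var3_value=$(get_var var3 "$log")
echo "$var3_value"

rm -f "$log"
```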

Extracting key word from a log line

I have a log with lines like this:
.....client connection.....remote=/xxx.xxx.xxx.xxx]].......
I need to extract all lines in the log which contain the above, and print just the IP after remote=. This would be something like:
grep "client connection" xxx.log | sed -e ....
Using grep:
grep -oP '(?<=remote=/)[^\]]+' file
-o prints only the matched text instead of the entire line.
-P enables Perl-style regex. In this case we are using a positive lookbehind: it matches a run of characters other than "]" that is preceded by remote=/
grep -oP 'client connection.*remote=/\K.*?(?=])' input
Prints anything between remote=/ and closest ] on the lines which contain client connection.
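Or use sed with back-references. Here the line is divided into three groups, each enclosed in ( and ), which can later be referred to as \1, \2 and \3. The IP address is the 2nd group, so the whole line is replaced by \2. POSIX extended regex has no lazy .*? quantifier, so the 2nd group uses [^]]* (anything except ]) instead, and -n together with the p flag prints only the lines where the substitution succeeded.
sed -rn '/client connection/ s_(^.*remote=/)([^]]*)]](.*)_\2_p' input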
Or using awk :
awk -F'/|]]' '/client connection/{print $2}' input
Try this:
grep 'client connection' test.txt | awk -F'[/\\]]' '{print $2}'
Test case
test.txt
---------
abcd
.....client connection.....remote=/10.20.30.40]].......
abcs
.....client connection.....remote=/11.20.30.40]].......
.....client connection.....remote=/12.20.30.40]].......
Result
10.20.30.40
11.20.30.40
12.20.30.40
Explanation
grep narrows the input down to the lines matching client connection. awk's -F flag sets the delimiter used to split each line. We ask awk to split on / and ]. To use more than one delimiter, we place the delimiters inside [ and ]. For example, to split text on = and :, we'd use [=:].
However, in our case one of the delimiters is ] itself, since my intent is to extract the IP from /x.x.x.x] by splitting the text on / and ]. So we escape it as \\]. The IP is the 2nd item after splitting.
A more robust way, improving on the answer above, is to use GNU grep in PCRE mode (-P) and match both patterns mentioned in the question.
grep -oP "client connection.*remote=/\K(\d{1,3}\.){3}\d{1,3}" file
10.20.30.40
11.20.30.40
12.20.30.40
Here, client connection.*remote=/ matches both patterns on the line. \K is PCRE syntax that drops everything matched up to that point, so only the text matched after it is printed.
(\d{1,3}\.){3}\d{1,3}
This matches the IP: three groups of 1 to 3 digits, each followed by a dot, and then the 4th octet.
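If grep -P is not available, the same extraction can be sketched with plain sed. This assumes the log format shown in the test case (an IP between remote=/ and the first ]); the temporary file and the variable name ips are made up for illustration.

```shell
# Hypothetical sample log in the format used in the test case.
log=$(mktemp)
cat > "$log" <<'EOF'
abcd
.....client connection.....remote=/10.20.30.40]].......
abcs
.....client connection.....remote=/11.20.30.40]].......
EOF

# Only on lines containing "client connection": keep what sits between
# "remote=/" and the first "]". -n plus the p flag prints only lines
# where the substitution succeeded.
ips=$(sed -n '/client connection/s|.*remote=/\([^]]*\)].*|\1|p' "$log")

printf '%s\n' "$ips"

rm -f "$log"
```

Using | as the s command delimiter avoids having to escape the / in remote=/.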

Use tr to replace single new lines but not multiple new lines

Hi I have a file with data in the following format:
262353824192
Motley Crue Too Fast For Love Vinyl LP Leathur Records LR123 rare 3rd pressing
http://www.ebay.co.uk/itm/Motley-Crue-Too-Fast-Love-Vinyl-LP-Leathur-Records-LR123-rare-3rd-pressing-/262353824192
301870324112
TRAFFIC Same UK 1st press vinyl LP in gatefold / booklet sleeve Island pink eye
http://www.ebay.co.uk/itm/TRAFFIC-Same-UK-1st-press-vinyl-LP-gatefold-booklet-sleeve-Island-pink-eye-/301870324112
141948187203
NOW That's What I Call Music LP'S Joblot 2-14 MINT CONDITION Vinyl
http://www.ebay.co.uk/itm/NOW-Thats-Call-Music-LPS-Joblot-2-14-MINT-CONDITION-Vinyl-/141948187203
I would like to replace the single new lines with a pipe, but leave the double new lines as they are. I have tried:
tr '\n' '|' < text.txt
But this replaces all new lines with | so the separate products are no longer on different lines. I basically want a | delimiter between the product number, title and url, but each separate product on a different line. How can I achieve this?
Use tr and a little bit of sed:
tr "\n" "|" < text.txt | sed 's/||\+/\n/g'
You could use awk to do this:
awk '/^$/ { print } /./ { printf("%s|", $0) } END { print "" }' text.txt
This finds any blank line and prints it as-is. If it finds any value on the line, it uses printf and appends a pipe. At the end of processing it prints a newline character to finish up.
This has already been partially answered HERE, but not completely.
I would add an additional transform to change double newlines to some character (hash in this case), then replace the hashes with a newline (or two if you want to go back to the original formatting of those) after changing the single newlines to be pipes.
sed -e ':a' -e 'N' -e '$!ba' -e 's/\n\n/#/g' -e 's/\n/|/g' -e 's/#/\n/g'
This gives the output:
262353824192|Motley Crue Too Fast For Love Vinyl LP Leathur Records LR123 rare 3rd pressing|http://www.ebay.co.uk/itm/Motley-Crue-Too-Fast-Love-Vinyl-LP-Leathur-Records-LR123-rare-3rd-pressing-/262353824192
301870324112|TRAFFIC Same UK 1st press vinyl LP in gatefold / booklet sleeve Island pink eye|http://www.ebay.co.uk/itm/TRAFFIC-Same-UK-1st-press-vinyl-LP-gatefold-booklet-sleeve-Island-pink-eye-/301870324112
141948187203|NOW That's What I Call Music LP'S Joblot 2-14 MINT CONDITION Vinyl|http://www.ebay.co.uk/itm/NOW-Thats-Call-Music-LPS-Joblot-2-14-MINT-CONDITION-Vinyl-/141948187203
awk to the rescue!
awk -F'\n' -v RS= -v OFS='|' '{$1=$1;printf "%s", $0 RT}' file
This preserves the spacing between paragraphs (3 lines, as in the original file). Note that RT is a gawk extension.
I made a very specific solution to your problem with awk (specific because it assumes you always have the same number of new lines between the groups of records).
awk 'BEGIN {RS="\n\n\n"; FS="\n"; OFS="|"} {print $1,$2,$3}' < text.txt
It sets the record separator to 3 newlines, the field separator to one newline, and the output field separator to a pipe. Then for each record (every block separated by 3 newlines), it prints the first 3 fields (which are separated by one newline), joined on output by a pipe.
Just use sed:
sergey#x50n:~> cat in.txt | tr '\n' '|' | sed -e 's/||\+/\n\n/g; s/|$/\n/'
262353824192|Motley Crue Too Fast For Love Vinyl LP Leathur Records LR123 rare 3rd pressing|http://www.ebay.co.uk/itm/Motley-Crue-Too-Fast-Love-Vinyl-LP-Leathur-Records-LR123-rare-3rd-pressing-/262353824192
301870324112|TRAFFIC Same UK 1st press vinyl LP in gatefold / booklet sleeve Island pink eye|http://www.ebay.co.uk/itm/TRAFFIC-Same-UK-1st-press-vinyl-LP-gatefold-booklet-sleeve-Island-pink-eye-/301870324112
141948187203|NOW That's What I Call Music LP'S Joblot 2-14 MINT CONDITION Vinyl|http://www.ebay.co.uk/itm/NOW-Thats-Call-Music-LPS-Joblot-2-14-MINT-CONDITION-Vinyl-/141948187203
First we replace all newlines with a pipe using tr as in your example.
Then the first expression in the sed command (i.e. s/||\+/\n\n/g;) replaces every run of two or more pipes with two newlines. You may also replace them with a single newline if you do not want blank lines between the lines of output. The second sed expression replaces the trailing pipe with a newline to produce more readable output (or a more "conventional" newline at the end of the file).
Also note that \+ in sed regex is a GNU extension. Thus if you are using non-GNU implementation of sed (FreeBSD, AIX or so), use standard syntax: |||* instead of ||\+.
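Another option that does not depend on GNU extensions is awk's paragraph mode: with RS set to the empty string, records are separated by blank lines. The sketch below assumes the products are separated by at least one blank line; the shortened sample data and the variable name joined are placeholders, not the real listing.

```shell
# Hypothetical, shortened sample data: three lines per product,
# products separated by a blank line.
data=$(mktemp)
cat > "$data" <<'EOF'
111
First title
http://example.com/1

222
Second title
http://example.com/2
EOF

# RS=""   : paragraph mode, each blank-line-separated block is one record
# FS="\n" : each line of the block is one field
# $1 = $1 : forces awk to rebuild the record using OFS ("|")
joined=$(awk 'BEGIN { RS = ""; FS = "\n"; OFS = "|" } { $1 = $1; print }' "$data")

printf '%s\n' "$joined"

rm -f "$data"
```

This prints one pipe-delimited line per product and drops the blank lines between them.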

Filter out only matched values from a text file in each line

I have a file "test.txt" with the lines below, and also a lot of extra stuff after the "version"
soainfra_metrics{metric_group="sca_composite",partition="test",is_active="true",state="on",is_default="true",composite="test123"} map:stats version:1.0
soainfra_metrics{metric_group="sca_composite",partition="gello",is_active="true",state="on",is_default="true",composite="test234"} map:stats version:1.8
soainfra_metrics{metric_group="sca_composite",partition="bolo",is_active="true",state="on",is_default="true",composite="3415"} map:stats version:3.1
soainfra_metrics{metric_group="sca_composite",partition="solo",is_active="true",state="on",is_default="true",composite="hji"} map:stats version:1.1
I tried:
egrep -r 'partition|is_active|state|is_default|composite' test.txt
It displays every line in full, but I need only the specific fields mentioned, as shown below, ignoring the rest of the data or lines.
In a nutshell, I want to display only these fields from each line, not the rest:
partition="test",is_active="true",state="on",is_default="true",composite="test123"
partition="gello",is_active="true",state="on",is_default="true",composite="test234"
partition="bolo",is_active="true",state="on",is_default="true",composite="3415"
partition="solo",is_active="true",state="on",is_default="true",composite="hji"
If your version of grep supports Perl-style regular expressions, then I'd use this:
grep -oP '.*?,\K[^}]+' file
It removes everything up to the first comma (\K kills any previous output) and prints everything up to the }.
Alternatively, using awk:
awk -F'}' '{ sub(/[^,]+,/, ""); print $1 }' file
This sets the field separator to } so the part you're interested in is the first field. It then uses sub to remove the part up to the first comma.
For completeness, you could also use sed:
sed 's/[^,]*,\([^}]*\).*/\1/' file
This captures the part after the first , up to the } and replaces the content of the line with it.
After the grep to pick out the lines you want, use sed to edit the lines:
sed 's/.*\(partition[^}]*\)} map.*/\1/'
This means: "whenever you see anything (.*), followed by partition and
any number of non-} characters, then } map and anything else, grab the part
from partition up to but not including the brace as group 1, \(...\)."
The replacement text is just group 1, \1.
Use a pipe | to connect the output of egrep to the input of sed:
egrep ... | sed ...
As far as I understand, your file might have more lines you don't want to see, so I would use:
sed -n 's/.*\(partition.*\)}.*/\1/p' file
We use -n together with the p flag to show only the lines where a substitution was made. The substitution replaces the whole line with the captured part you need.
This might work for you (GNU sed):
sed -r 's/(partition|is_active|state|is_default|composite)="[^"]*"/\n&\n/g;s/[^\n]*\n([^\n]*)\n[^\n]*/\1,/g;s/,$//' file
Treat the problem as if it were a "decomposed club sandwich". Identify the fillings, remove the bread and tidy up.
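If the wanted fields are not guaranteed to be contiguous, they can also be selected by name with awk. This is a sketch under the assumption that the labels sit between { and } and that values never contain a comma; the array names want and kv and the variable picked are illustrative only.

```shell
line='soainfra_metrics{metric_group="sca_composite",partition="test",is_active="true",state="on",is_default="true",composite="test123"} map:stats version:1.0'

picked=$(printf '%s\n' "$line" | awk '
BEGIN {
    # Build a lookup table of the keys we want to keep.
    n = split("partition is_active state is_default composite", w, " ")
    for (i = 1; i <= n; i++) want[w[i]] = 1
}
{
    lb = index($0, "{"); rb = index($0, "}")
    if (lb == 0 || rb <= lb) next            # skip lines without {...}
    body = substr($0, lb + 1, rb - lb - 1)   # text between the braces
    m = split(body, kv, ",")
    out = ""
    for (i = 1; i <= m; i++) {
        key = kv[i]
        sub(/=.*/, "", key)                  # keep only the part before =
        if (key in want) out = out (out == "" ? "" : ",") kv[i]
    }
    print out
}')

printf '%s\n' "$picked"
```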

find words in two quotes unix

I would like to display the last word on each of these lines. I tried searching for the word value, but got no result, so I thought of looking for the words between quotes; however, my file contains other quoted words that I do not need. What I actually want is to display the values of the select tag in my HTML file.
grep '*' hosts.html | awk '{print $NF}'
For example:
value='www.visit-tunisia.com'>www.visit-tunisia.com
value='www.watania1.tn'>www.watania1.tn
value='www.watania2.tn'>www.watania2.tn
I would have
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
You need to set the field separator to > you do this with the -F option:
$ awk -F'>' '{print $NF}' hosts.html
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
Note: I'm not sure what you are trying to achieve by grep '*' hosts.html?
Interpreting the comment liberally, you have input lines which might contain:
value='www.visit-tunisia.com'>www.visit-tunisia.com
value='www.watania1.tn'>www.watania1.tn
value='www.watania2.tn'>www.watania2.tn
and you would like the names which are repeated on a line as the output:
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
This can be done using sed and capturing parentheses.
sed -n -e "s/.*'\([^']*\)'.*\1.*/\1/p"
The -n says "don't print unless I say to do so". The s///p command prints if the substitute works. The pattern looks for a stream of 'anything' (.*), a single quote, captures what's inside up to the next single quote ('\([^']*\)') followed by any text, the captured text (the first \1), and anything. The replacement text is what was captured (the second \1).
Example:
$ cat data
www and wotnot
value='www.visit-tunisia.com'>www.visit-tunisia.com
blah
value='www.watania1.tn'>www.watania1.tn
hooplah
value='www.watania2.tn'>www.watania2.tn
if 'nothing' is required, nothing will be done.
$ sed -n -e "s/.*'\([^']*\)'.*\1.*/\1/p" data
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
nothing
$
Clearly, you can refine the [^']* part of the match if you want to. I used double quotes around the expression since the pattern matches on single quotes. Life is trickier if you need to allow both single and double quotes; at that point, I'd put the script into a file and run sed -f script data to make life easier.
sed 's/.*>\(.*\)/\1/g' your_file
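As one way of refining the match, the pattern can be anchored on the value= attribute so that other quoted words in the file are ignored. A minimal sketch, assuming every wanted line contains value='...' in single quotes (the sample file and the variable name hosts are made up for illustration):

```shell
# Hypothetical sample input, including a line with an unrelated quoted word.
html=$(mktemp)
cat > "$html" <<'EOF'
www and wotnot
value='www.visit-tunisia.com'>www.visit-tunisia.com
if 'nothing' is required, nothing will be done.
value='www.watania1.tn'>www.watania1.tn
EOF

# Capture the text between value=' and the next single quote;
# -n plus the p flag prints only the lines where that pattern matched.
hosts=$(sed -n "s/.*value='\([^']*\)'.*/\1/p" "$html")

printf '%s\n' "$hosts"

rm -f "$html"
```

Unlike the back-reference version, the line containing 'nothing' produces no output here because it has no value= attribute.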
