Get text only within parenthesis from a file in linux terminal [duplicate] - linux

I have a large log file that I need to sort, and I want to extract the text between parentheses. The format is something like this:
<#44541545451865156> (example#6144) has left the server!
How would I go about extracting "example#6144"?

This sed should work here:
sed -E -n 's/.*\((.*)\).*$/\1/p' file_name

There are many ways to skin this cat.
Assuming you always have only one lexeme in parentheses, you can use bash parameter expansion:
while IFS= read -r t; do t=${t#*(}; echo "${t%)*}"; done <logfile
The first substitution, ${t#*(}, cuts off everything up to and including the left parenthesis, leaving you with example#6144) has left the server!; the second one, ${t%)*}, cuts off the right parenthesis and everything after it.
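To see the two expansions in isolation, here is a minimal sketch that applies them to the sample line from the question. The function name between_parens is made up for illustration, and like the loop above it assumes exactly one parenthesised group per line.

```shell
# between_parens is a hypothetical helper name, not a standard command.
between_parens() {
  # Drop the shortest prefix that ends in "(".
  rest=${1#*(}
  # Drop the shortest suffix that starts with ")".
  printf '%s\n' "${rest%)*}"
}

between_parens '<#44541545451865156> (example#6144) has left the server!'
# prints: example#6144
```

Because this uses only shell parameter expansion, no external process is started per line, which can matter on a large log.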
Alternatively, you can also use awk:
awk -F'[)(]' '{print $2}' logfile
-F'[)(]' tells awk to use either parenthesis as the field delimiter, so it splits the input string into three tokens: <#44541545451865156>, example#6144, and has left the server!; then {print $2} instructs it to print the second token.
cut would also do:
cut -d'(' -f 2 logfile | cut -d')' -f 1
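If a line can contain several parenthesised groups, the commands above return only one of them. A possible sketch for that case, assuming a grep that supports -o (GNU and BSD grep do, though POSIX does not require it) and no nested parentheses; extract_parens is a hypothetical name:

```shell
# extract_parens is a hypothetical helper: it reads lines on stdin and
# prints every parenthesised group on its own line.
extract_parens() {
  # grep -o prints each match separately; in a basic regex, ( and ) are literal.
  # sed then strips the surrounding parentheses from each match.
  grep -o '([^()]*)' | sed 's/^(//; s/)$//'
}

printf '%s\n' '<#44541545451865156> (example#6144) has left the server!' | extract_parens
# prints: example#6144
```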

Try this:
sed -e 's/^.*(\([^()]*\)).*$/\1/' <logfile
The /^.*(\([^()]*\)).*$/ is a regular expression or regex. Regexes are hard to read until you get used to them, but are most useful for extracting text by pattern, as you are doing here.

Related

How to use sed or awk or something similar to replace every odd occurrence of character? [duplicate]

I have the following string:
"1,0,2,0,3,0,4,0,5,0,6,0,13,05,24233,55".
How to use awk, or sed to get
"1.0,2.0,3.0,4.0,5.0,6.0,13.05,24233.55"?
I tried to use
sed 's/,/./g' <<< "1,0,2,0,3,0,4,0,5,0,6,0,13,05,24233,55"
1.0.2.0.3.0.4.0.5.0.6.0.13.05.24233.55
and also
sed 's/,/./2' <<< "1,0,2,0,3,0,4,0,5,0,6,0,13,05,24233,55"
1,0.2,0,3,0,4,0,5,0,6,0,13,05,24233,55
Which replaced the second item only. I need every odd occurrence changed.
For future reference, what would be the code to replace every odd occurrence of , with . ?
Thanks for your help
With any sed that supports EREs via -E, e.g. GNU sed and OSX/BSD sed:
$ echo "1,0,2,0,3,0,4,0,5,0,6,0,13,05,24233,55" | sed -E 's/,([^,]+(,|$))/.\1/g'
1.0,2.0,3.0,4.0,5.0,6.0,13.05,24233.55
The above was inspired by @PesaThe's comment to my original answer.
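For comparison, the same result can be had with awk by splitting on commas and choosing the separator by position. This is only a sketch; the function name odd_commas_to_dots is invented here, and it assumes the whole record fits on one line.

```shell
# odd_commas_to_dots is a hypothetical helper: it rewrites the 1st, 3rd,
# 5th, ... comma of each input line as a dot.
odd_commas_to_dots() {
  awk -F, '{
    out = $1
    # The separator in front of field i is comma number i-1,
    # so it is an odd occurrence exactly when i is even.
    for (i = 2; i <= NF; i++)
      out = out ((i % 2 == 0) ? "." : ",") $i
    print out
  }'
}

echo '1,0,2,0,3,0,4,0,5,0,6,0,13,05,24233,55' | odd_commas_to_dots
# prints: 1.0,2.0,3.0,4.0,5.0,6.0,13.05,24233.55
```

Changing the test to i % 3 == 1 (or similar) would adapt the idea to every n-th occurrence.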
try this:
for the end:
sed 's/[,]$/?/' YourFile
Putting the , between [] makes sed treat it as a literal character, which switches off most regex special meanings (a few characters, such as ^, still need to be handled differently inside brackets).
The $ anchors the match to the end of the line.
The g in your test means "change every occurrence", but you only wanted one, at the end.
for the internal:
sed -e 's/,/./1;p' \
-e ':a' \
-e 's/^\(\([^.]*[.][^,]*,\)*\)\([^,]*\),\([^,]*\)/\1\3.\4/
/[^,]*,[^,.]*,/ ta' YourFile
You need a loop and a special test because the commas to replace alternate with commas to keep.

Substring in linux based on first occurrence [duplicate]

I have raw, unformatted strings like the ones below in a file.
"],"id":"1785695Jkc","vector":"profile","
"],"id":"jashj24231","vector":"profile","
"],"id":"3201298301","vector":"profile","
"],"id":"1123798749","vector":"profile","
I want to extract only the id values, like this:
1785695Jkc
I tried the below command
grep -o -P '(?<="],"id":").*(?=",")' myfile.txt >new.txt
but that matches up to the last occurrence of the "," like below
1785695Jkc","vector":"profile
but I need it to stop at the first occurrence only.
To extract only the id values like above, which seem to be alphanumeric strings of length 10, use:
$ awk 'match($0,/[[:alnum:]]{10}/){print substr($0,RSTART,RLENGTH)}' file
1785695Jkc
jashj24231
3201298301
1123798749
If the definition of values like is not correct, please be more specific on the requirement.
Btw, changing your grep a bit works also:
$ grep -o -P '(?<="],"id":")[^"]*' file
sed 's/"],"id":"\(.*\)","vector.*/\1/' myfile.txt
That assumes that all lines will start with "],"id":" as your input shows.
Oh, and this is GNU sed, by the way. If your sed uses extended regular expressions, drop the backslashes in front of the parentheses.
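The over-matching in the question comes from .* being greedy. A negated character class sidesteps that without relying on what follows the id. A small sketch, assuming the id value never contains a double quote; extract_ids is a made-up name:

```shell
# extract_ids is a hypothetical helper: it prints the value of "id" from
# each input line that has one, and nothing for lines that do not.
extract_ids() {
  # [^"]* cannot cross the closing quote, so the match stops at the first one.
  sed -n 's/.*"id":"\([^"]*\)".*/\1/p'
}

printf '%s\n' '"],"id":"1785695Jkc","vector":"profile","' | extract_ids
# prints: 1785695Jkc
```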
You can extract just the column you want using cut:
cut -f 2 -d , myfile.txt | cut -f 2 -d : | tr -d '"'
The first cut will take the id-value pair ("id":"jashj24231") and the second one extracts from that just the value ("jashj24231"). Finally tr removes the enclosing quotes.

I have a requirement of searching a pattern from a file and displaying the pattern only in the screen,not the whole line .How can I do it in linux? [duplicate]

I have a requirement of searching a pattern like x=<followed by any values> from a file and displaying the pattern i.e x=<followed by any values>, only in the screen, not the whole line. How can I do it in Linux?
I have 3 answers, from simple (but with caveats) to complex (but foolproof):
1) If your pattern never appears more than once per line, you could do this (assuming your shell is
PATTERN="x="
sed "s/.*\($PATTERN\).*/\1/g" your_file | grep "$PATTERN"
2) If your pattern can appear more than once per line, it's a bit harder. One easy but hacky way to do this is to use a special character that will not appear on any line that has your pattern, e.g. "#":
PATTERN="x="
SPECIAL="#"
grep "$PATTERN" your_file | sed "s/$PATTERN/$SPECIAL/g" \
| sed "s/[^$SPECIAL]//g" | sed "s/$SPECIAL/$PATTERN/g"
(This won't separate the matches within a line, e.g. you'll see x=x=x= if a source line contained "x=" three times. This is easy to fix by adding a space in the last sed.)
3) Something that always works no matter what:
PATTERN="x="
awk "NF>1{for(i=1;i<NF;i++) printf FS; print \"\"}" \
FS="$PATTERN" your_file
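If your grep supports -o (GNU and BSD grep do), it prints only the matching part of each line, one match per output line, which may be all that is needed here. A sketch under the assumption that the value after x= runs up to the next whitespace; show_matches is a hypothetical name:

```shell
# show_matches is a hypothetical helper: it prints every "x=<value>" found
# on stdin, one per line, where <value> is taken to end at whitespace.
show_matches() {
  grep -o 'x=[^[:space:]]*'
}

printf '%s\n' 'a x=1 b x=22' 'no match here' | show_matches
# prints:
# x=1
# x=22
```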

find words in two quotes unix

I would like to display the last word on each of these lines. I tried searching for the word value, without success, so I thought of looking for the words between quotes, but my file contains other quoted words that I do not need. What I actually want is to display the values of the select tag in my HTML file.
grep '*' hosts.html | awk '{print $NF}'
For example:
value='www.visit-tunisia.com'>www.visit-tunisia.com
value='www.watania1.tn'>www.watania1.tn
value='www.watania2.tn'>www.watania2.tn
I would like to get:
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
You need to set the field separator to >, which you do with the -F option:
$ awk -F'>' '{print $NF}' hosts.html
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
Note: I'm not sure what you are trying to achieve by grep '*' hosts.html?
Interpreting the comment liberally, you have input lines which might contain:
value='www.visit-tunisia.com'>www.visit-tunisia.com
value='www.watania1.tn'>www.watania1.tn
value='www.watania2.tn'>www.watania2.tn
and you would like the names which are repeated on a line as the output:
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
This can be done using sed and capturing parentheses.
sed -n -e "s/.*'\([^']*\)'.*\1.*/\1/p"
The -n says "don't print unless I say to do so". The s///p command prints if the substitute works. The pattern looks for a stream of 'anything' (.*), a single quote, captures what's inside up to the next single quote ('\([^']*\)') followed by any text, the captured text (the first \1), and anything. The replacement text is what was captured (the second \1).
Example:
$ cat data
www and wotnot
value='www.visit-tunisia.com'>www.visit-tunisia.com
blah
value='www.watania1.tn'>www.watania1.tn
hooplah
value='www.watania2.tn'>www.watania2.tn
if 'nothing' is required, nothing will be done.
$ sed -n -e "s/.*'\([^']*\)'.*\1.*/\1/p" data
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
nothing
$
Clearly, you can refine the [^']* part of the match if you want to. I used double quotes around the expression since the pattern matches on single quotes. Life is trickier if you need to allow both single and double quotes; at that point, I'd put the script into a file and run sed -f script data to make life easier.
sed 's/.*>\(.*\)/\1/g' your_file
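Another option is to read the name out of the value attribute itself, so that lines without one are skipped. This is a sketch only, assuming single-quoted attributes and at most one value= per line (with several, the greedy .* keeps the last); option_values is an invented name:

```shell
# option_values is a hypothetical helper: it prints the content of a
# single-quoted value='...' attribute for each input line that has one.
option_values() {
  sed -n "s/.*value='\([^']*\)'.*/\1/p"
}

printf '%s\n' "value='www.watania1.tn'>www.watania1.tn" 'blah' | option_values
# prints: www.watania1.tn
```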

Removing a portion of a string that has forward slashes in it

I'm stumped with how to remove a portion of a string that has forward slashes and question marks in it.
Example: /diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN
and I need the output to be RXMWANT8WFYJNF7K6DXXXJLJVN
I've tried tr and sed but tr removes some of the characters I need in the output. sed is giving me trouble because of the forward slashes.
What's a quick method to remove the /diag/PeerManager/list?deviceid= portion of my string?
thanks!
echo "/diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN" | sed -n 's:/[a-zA-Z]*/[a-zA-Z]*/[a-zA-Z]*?[a-zA-Z]*=::p'
This should do the trick. I chose the colon as the delimiter as it will not cause any issues with the forward slash. This makes a lot of assumptions about the type of input it will be receiving, specifically that it contains exactly three forward slashes, each followed by a run of lower- and uppercase letters, then a question mark, another run of letters, and an equals sign. This then removes those items and prints the remaining characters (your device id).
This worked for me:
sed 's/.*deviceid=\([^&]*\).*/\1/'
Example:
$ echo '/diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN' | sed 's/.*deviceid=\([^&]*\).*/\1/'
RXMWANT8WFYJNF7K6DXXXJLJVN
This is not the most robust solution, but if you have a fixed set of input that will never change, it's probably good enough.
One way using awk, if there is only a single occurrence of an = on each line:
awk -F= '{ print $2 }' file.txt
Results:
RXMWANT8WFYJNF7K6DXXXJLJVN
Use Equals Sign as Field Delimiter
If you know that your GET query string will always have only one parameter (in this case, deviceid) then you can just use the equals sign as a field delimiter with the standard cut utility. For example:
$ echo '/diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN' |
cut -d= -f2-
RXMWANT8WFYJNF7K6DXXXJLJVN
How about:
$ echo '/diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN' | sed 's/^.*=//'
RXMWANT8WFYJNF7K6DXXXJLJVN
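If the string is already in a shell variable, parameter expansion avoids the slash-quoting problem altogether, since no sed delimiter is involved. A minimal sketch; device_id is a hypothetical function name, and stripping at the first & is an assumption about how further query parameters would look:

```shell
# device_id is a hypothetical helper: it prints the value of the
# deviceid query parameter from the URL path given as its argument.
device_id() {
  # Remove everything up to and including "deviceid=".
  value=${1#*deviceid=}
  # Remove any following "&other=..." parameters, if present.
  printf '%s\n' "${value%%&*}"
}

device_id '/diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN'
# prints: RXMWANT8WFYJNF7K6DXXXJLJVN
```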
