Substring in linux based on first occurrence [duplicate] - linux

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 5 years ago.
I have raw, unformatted strings like the ones below in a file.
"],"id":"1785695Jkc","vector":"profile","
"],"id":"jashj24231","vector":"profile","
"],"id":"3201298301","vector":"profile","
"],"id":"1123798749","vector":"profile","
I wanted to extract only the id values like below
1785695Jkc
I tried the below command
grep -o -P '(?<="],"id":").*(?=",")' myfile.txt >new.txt
but that matches up to the last occurrence of the "," as shown below
1785695Jkc","vector":"profile
but I would need to split on the first occurrence only.

To extract only the id values like those above, which seem to be alphanumeric strings of length 10, use:
$ awk 'match($0,/[[:alnum:]]{10}/){print substr($0,RSTART,RLENGTH)}' file
1785695Jkc
jashj24231
3201298301
1123798749
If that definition of the values is not correct, please be more specific about the requirement.
Btw, changing your grep a bit also works:
$ grep -o -P '(?<="],"id":")[^"]*' myfile.txt
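A minimal sketch of why the original pattern over-matched, using plain sed instead of grep -P so it runs without PCRE support (the variable names are only for illustration). The .* is greedy and runs to the last "," on the line, while [^"]* cannot cross a double quote and so stops at the end of the id.

```shell
line='"],"id":"1785695Jkc","vector":"profile","'

# Greedy: .* inside the group extends to the LAST '","' on the line.
greedy=$(printf '%s\n' "$line" | sed 's/.*"id":"\(.*\)",".*/\1/')

# Negated class: [^"]* stops at the FIRST double quote after the id.
first=$(printf '%s\n' "$line" | sed 's/.*"id":"\([^"]*\)".*/\1/')

printf 'greedy: %s\n' "$greedy"
printf 'first:  %s\n' "$first"
```

The same idea carries over to the lookbehind version: replacing .* with [^"]* is what limits the match to the first occurrence.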

sed 's/"],"id":"\(.*\)","vector.*/\1/' myfile.txt
That assumes that all lines start with "],"id":" as your input shows.
Note that this uses basic regular expressions, where the capture group is written \( \); if you run sed with extended regular expressions (-E), drop the backslashes before the parentheses.

You can extract just the column you want using cut:
cut -f 2 -d , <filename> | cut -f 2 -d : | tr -d '"'
The first cut will take the id-value pair ("id": "jashj24231") and the second one extracts from that just the value ("jashj24231"). Finally tr removes the enclosing quotes.

Related

How to use sed or awk or something similar to replace every odd occurrence of character? [duplicate]

This question already has answers here:
Replace every n'th occurrence in huge line in a loop
(4 answers)
Closed 4 years ago.
I have the following string:
"1,0,2,0,3,0,4,0,5,0,6,0,13,05,24233,55".
How to use awk, or sed to get
"1.0,2.0,3.0,4.0,5.0,6.0,13.05,24233.55"?
I tried to use
sed 's/,/./g' <<< "1,0,2,0,3,0,4,0,5,0,6,0,13,05,24233,55"
1.0.2.0.3.0.4.0.5.0.6.0.13.05.24233.55
and also
sed 's/,/./2' <<< "1,0,2,0,3,0,4,0,5,0,6,0,13,05,24233,55"
1,0.2,0,3,0,4,0,5,0,6,0,13,05,24233,55
Which replaced the second item only. I need every odd occurrence changed.
For future reference, what would be the code to replace every odd occurrence of , with . ?
Thanks for your help
With any sed that supports EREs via -E, e.g. GNU sed and OSX/BSD sed:
$ echo "1,0,2,0,3,0,4,0,5,0,6,0,13,05,24233,55" | sed -E 's/,([^,]+(,|$))/.\1/g'
1.0,2.0,3.0,4.0,5.0,6.0,13.05,24233.55
The above was inspired by @PesaThe's comment to my original answer.
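One way to read the command above: each match takes a comma, the field after it, and the following comma (or the end of the line), so only every other comma is rewritten. A rough awk equivalent that makes the odd/even counting explicit might look like this; the variable names s and out are assumptions for the example.

```shell
s='1,0,2,0,3,0,4,0,5,0,6,0,13,05,24233,55'

out=$(printf '%s\n' "$s" | awk -F, '{
    line = $1
    for (i = 2; i <= NF; i++) {
        # The separator in front of field i is comma number i-1,
        # so it is an odd-numbered comma exactly when i is even.
        sep = (i % 2 == 0) ? "." : ","
        line = line sep $i
    }
    print line
}')

printf '%s\n' "$out"
```

Changing the test on i (for example i % 3 == 0) would be one way to target a different occurrence pattern.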
try this:
for the end:
sed 's/[,]$/?/' YourFile
Putting the , between [] makes most characters lose their special regex meaning and be taken literally (except for some characters such as ^, which need to be handled another way).
The $ anchors the match to the end of the string.
The g in your test means "change every occurrence"; you only wanted one, at the end.
for the internal:
sed -e 's/,/./1;p' \
-e ':a' \
-e 's/^\(\([^.]*[.][^,]*,\)*\)\([^,]*\),\([^,]*\)/\1\3.\4/
/[^,]*,[^,.]*,/ ta' YourFile
You need a loop and a special test because the replaced and unreplaced commas alternate.

Get text only within parenthesis from a file in linux terminal [duplicate]

This question already has an answer here:
How can I extract the content between two brackets?
(1 answer)
Closed 4 years ago.
I have a large log file I need to sort, and I want to extract the text between parentheses. The format is something like this:
<#44541545451865156> (example#6144) has left the server!
How would I go about extracting "example#6144"?
This sed should work here:
sed -E -n 's/.*\((.*)\).*$/\1/p' file_name
There are many ways to skin this cat.
Assuming you always have only one lexeme in parentheses, you can use bash parameter expansion:
while IFS= read -r t; do t=${t#*(}; printf '%s\n' "${t%)*}"; done <logfile
The first substitution: ${t#*(} cuts off everything up to and including the left parenthesis, leaving you with example#6144) has left the server!; the second one: ${t%)*} cuts off the right parenthesis and everything after that.
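The two substitutions can be tried on a single line as a small sketch. The variable names after_open and inside are made up for the example, and the parentheses are backslash-escaped here only as a precaution so the patterns stay literal even if extended globbing is enabled.

```shell
t='<#44541545451865156> (example#6144) has left the server!'

# Step 1: drop the shortest prefix ending in "(".
after_open=${t#*\(}

# Step 2: drop the shortest suffix starting with ")".
inside=${after_open%\)*}

printf '%s\n' "$after_open"
printf '%s\n' "$inside"
```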
Alternatively, you can also use awk:
awk -F'[)(]' '{print $2}' logfile
-F'[)(]' tells awk to use either parenthesis as the field delimiter, so it splits the input string into three tokens: <#44541545451865156>, example#6144, and has left the server!; then {print $2} instructs it to print the second token.
cut would also do:
cut -d'(' -f 2 logfile | cut -d')' -f 1
Try this:
sed -e 's/^.*(\([^()]*\)).*$/\1/' <logfile
The /^.*(\([^()]*\)).*$/ is a regular expression or regex. Regexes are hard to read until you get used to them, but are most useful for extracting text by pattern, as you are doing here.

how to use sed command properly to replace values containing / delimiter [duplicate]

This question already has answers here:
Using different delimiters in sed commands and range addresses
(3 answers)
Closed 5 years ago.
File: abc.properties
tomcat.home=/opt/tomcat
To set it to /usr/local/tomcat, the following command works:
sed -i "/tomcat.home=/ s/=.*/="usr\\/local\\/tomcat"/" abc.properties
To set it to $WORKSPACE/tomcat, the following command does NOT work, because the value of $WORKSPACE contains / characters, which clash with the sed delimiter:
sed -i "/tomcat.home=/ s/=.*/="$WORKSPACE\\/tomcat"/" abc.properties
Does anyone have an idea how to make the above command work?
Thank you and appreciate your support...
Sed lets you use any character you want as the delimiter. Whatever follows the s is used as the separator:
sed -Ee 's/foo/bar/'
sed -Ee 's|foo|bar|'
sed -Ee 's#foo#bar#'
^- All of those are equivalent.
The other option is to escape all your / as \/, but that gets nightmarish fast. Prefer to just pick a separator character that doesn't collide with characters you're trying to use for something else.
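Applied to the question, a sketch with | as the delimiter could look like the following. The WORKSPACE value is hypothetical, and the example works on a string rather than editing abc.properties in place; it also assumes $WORKSPACE contains no |, & or backslash, which would still need escaping.

```shell
# Hypothetical value, for illustration only.
WORKSPACE=/var/lib/jenkins/workspace/demo
props='tomcat.home=/opt/tomcat'

# With | as the s-command delimiter, the slashes in $WORKSPACE need no escaping.
result=$(printf '%s\n' "$props" | sed "/^tomcat\.home=/ s|=.*|=$WORKSPACE/tomcat|")

printf '%s\n' "$result"
```

The same s|...|...| expression can be used with sed -i on the properties file once the output looks right.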

I have a requirement of searching a pattern from a file and displaying the pattern only in the screen,not the whole line .How can I do it in linux? [duplicate]

This question already has answers here:
Can grep show only words that match search pattern?
(15 answers)
Closed 5 years ago.
I have a requirement of searching a pattern like x=<followed by any values> from a file and displaying the pattern i.e x=<followed by any values>, only in the screen, not the whole line. How can I do it in Linux?
I have 3 answers, from simple (but with caveats) to complex (but foolproof):
1) If your pattern never appears more than once per line, you could do this (assuming your shell is
PATTERN="x="
sed "s/.*\($PATTERN\).*/\1/g" your_file | grep "$PATTERN"
2) If your pattern can appear more than once per line, it's a bit harder. One easy but hacky way to do this is to use a special character that will not appear on any line that has your pattern, e.g. "#":
PATTERN="x="
SPECIAL="#"
grep "$PATTERN" your_file | sed "s/$PATTERN/$SPECIAL/g" \
| sed "s/[^$SPECIAL]//g" | sed "s/$SPECIAL/$PATTERN/g"
(This won't separate the output patterns within a line: you'll see x=x=x= if a source line contained "x=" three times. This is easy to fix by adding a space in the last sed.)
3) Something that always works no matter what:
PATTERN="x="
awk "NF>1{for(i=1;i<NF;i++) printf FS; print \"\"}" \
FS="$PATTERN" your_file
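If the goal is to print the pattern together with its value (x= followed by whatever comes next), a sketch using grep -o may be enough. It assumes a grep that supports -o (GNU and BSD grep do) and that a value runs up to the next whitespace; the sample line is made up for illustration.

```shell
sample='a=1 x=foo b=2 x=bar'

# -o prints each match on its own line instead of the whole input line.
matches=$(printf '%s\n' "$sample" | grep -o 'x=[^[:space:]]*')

printf '%s\n' "$matches"
```

Note that this pattern would also match the tail of a longer name such as max=3; anchoring it further depends on what the real input looks like.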

Count the number of occurrences in a string. Linux

Okay so what I am trying to figure out is how do I count the number of periods in a string and then cut everything up to that point but minus 2. Meaning like this:
string="aaa.bbb.ccc.ddd.google.com"
number_of_periods="5"
number_of_periods=`expr $number_of_periods-2`
string=`echo $string | cut -d"." -f$number_of_periods`
echo $string
result: "aaa.bbb.ccc.ddd"
The way that I was thinking of doing it was sending the string to a text file and then just greping for the number of times like this:
grep -c "." infile
The reason I don't want to do that is because I want to avoid creating another text file for I do not have permission to do so. It would also be simpler for the code I am trying to build right now.
EDIT
I don't think I made it clear but I want to make finding the number of periods more dynamic because the address I will be looking at will change as the script moves forward.
If you don't need to count the dots, but just remove the penultimate dot and everything afterwards, you can use Bash's built-in string manipulation.
${string%substring}
Deletes shortest match of $substring from back of $string.
Example:
$ string="aaa.bbb.ccc.ddd.google.com"
$ echo ${string%.*.*}
aaa.bbb.ccc.ddd
Nice and simple and no need for sed, awk or cut!
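A short sketch contrasting the shortest-match form with its longest-match counterpart %%, which is easy to confuse with it. The variable names short and long are only for the example.

```shell
string="aaa.bbb.ccc.ddd.google.com"

# %  removes the SHORTEST suffix matching .*.* (the last two dot-separated parts).
short=${string%.*.*}

# %% removes the LONGEST suffix matching .*.* (everything from the first dot on).
long=${string%%.*.*}

printf '%s\n' "$short"
printf '%s\n' "$long"
```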
What about this:
echo "aaa.bbb.ccc.ddd.google.com"|awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
(further shortened by helpful comment from @steve)
gives:
aaa.bbb.ccc.ddd
The awk command:
awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
works by separating the input line into fields (FS) by ., then joining them as output (OFS) with ., but the number of fields (NF) has been reduced by 2. The final 1 in the command is responsible for the print.
This will reduce a given input line by eliminating the last two period separated items.
This approach is "shell-agnostic" :)
Perhaps this will help:
#!/bin/sh
input="aaa.bbb.ccc.ddd.google.com"
number_of_fields=$(echo $input | tr "." "\n" | wc -l)
interesting_fields=$(($number_of_fields-2))
echo $input | cut -d. -f-${interesting_fields}
grep -o "\." <<<"aaa.bbb.ccc.ddd.google.com" | wc -l
5
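Putting the counting and the cutting together without a temporary file, a minimal sketch could look like this. The variable names periods, keep and trimmed are assumptions for the example, and it relies on the question's rule that the last two dot-separated parts are the ones to drop.

```shell
string="aaa.bbb.ccc.ddd.google.com"

# Delete every character except dots, then count what is left.
# Wrapping in $(( )) strips any padding some wc implementations print.
periods=$(( $(printf '%s' "$string" | tr -cd '.' | wc -c) ))

# Number of fields is periods + 1; keep all but the last two.
keep=$(( periods + 1 - 2 ))

trimmed=$(printf '%s\n' "$string" | cut -d. -f1-"$keep")

printf '%s dots, result: %s\n' "$periods" "$trimmed"
```

Because the count is recomputed from $string each time, the same lines can be reused inside a loop as the address changes.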
