Linux sed expression to convert the camelCase keys to underscore strings - linux

I could not get a regex to convert only the key of a key-value pair from camel case to an underscore-separated string.
Expressions like sed -E 's/\B[A-Z]/_\U&/g' convert the whole line, including the value, but I would like to limit the conversion to the key.
$ echo UserPoolId="eu-west-1_6K6Q2bT9c" | sed -E 's/\B[A-Z]/_\U&/g'
User_Pool_Id=eu-west-1_6_K6_Q2b_T9c
But I would like to get User_Pool_Id=eu-west-1_6K6Q2bT9c

With GNU awk for the 3rd arg to match() and gensub():
$ echo 'UserPoolId="eu-west-1_6K6Q2bT9c"' |
awk 'match($0,/([^=]+=)"(.*)"/,a) { $0=gensub(/([[:lower:]])([[:upper:]])/,"\\1_\\2","g",a[1]) a[2]} 1'
User_Pool_Id=eu-west-1_6K6Q2bT9c
I don't know if it's what you'd want for this case or not but anyway:
$ echo 'UserPoolID="eu-west-1_6K6Q2bT9c"' |
awk 'match($0,/([^=]+=)"(.*)"/,a) { $0=gensub(/([[:lower:]])([[:upper:]])/,"\\1_\\2","g",a[1]) a[2]} 1'
User_Pool_ID=eu-west-1_6K6Q2bT9c
Note that ID remains as _ID and isn't converted to _I_D.
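If GNU awk is not available, the same idea can probably be expressed with POSIX awk features only. This is a minimal sketch rather than the answerer's method: the function name to_snake_key is made up for illustration, the key is assumed to be everything before the first =, and unlike the gawk version above it leaves the quotes around the value in place.

```shell
# Hypothetical helper: underscore the camelCase key of each key=value line on stdin.
to_snake_key() {
  awk '{
    eq = index($0, "=")                 # position of the first "="
    if (eq == 0) { print; next }        # no "=": pass the line through unchanged
    key  = substr($0, 1, eq - 1)
    rest = substr($0, eq)               # "=" plus the untouched value
    out  = ""
    for (i = 1; i <= length(key); i++) {
      c = substr(key, i, 1)
      # insert "_" only at a lower-case to upper-case boundary
      if (i > 1 && c ~ /[A-Z]/ && prev ~ /[a-z]/) out = out "_"
      out  = out c
      prev = c
    }
    print out rest
  }'
}

printf '%s\n' 'UserPoolID="eu-west-1_6K6Q2bT9c"' | to_snake_key
# prints: User_Pool_ID="eu-west-1_6K6Q2bT9c"
```

As in the gawk answer, ID stays together because an underscore is only inserted after a lower-case letter.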

If you have only one = sign and you want to modify the camel case before the = sign, with GNU sed you can iterate until all substitutions are done:
echo UserPoolId="eu-west-1_6K6Q2bT9c" | sed -E ':a;s/([a-z])([A-Z].*=.*)/\1_\2/;ta'
User_Pool_Id=eu-west-1_6K6Q2bT9c
:a sets label a, and ta branches to label a if the previous s command substituted something. The s command in the loop inserts a _ between a lower-case letter and an upper-case letter before the equals sign.
In your example this will first insert a _ between User and Pool, and then between Pool and Id.
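One caveat: .*= is greedy, so if the value itself contains an = sign, camel case in the value before that later = is rewritten too. A possible variant, assuming the key is everything before the first =, anchors the match with ^[^=]* so the loop can never reach past the first equals sign. The function name snake_key_sed is hypothetical, and only POSIX sed features are used.

```shell
# Hypothetical helper: same loop idea, but restricted to the text before the first "=".
snake_key_sed() {
  sed -e ':a' \
      -e 's/^\([^=]*[[:lower:]]\)\([[:upper:]][^=]*=\)/\1_\2/' \
      -e 'ta'
}

printf '%s\n' 'UserPoolId=someValue=otherValue' | snake_key_sed
# prints: User_Pool_Id=someValue=otherValue
```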

Doing this in sed is somewhat challenging because you need a more complex regex and a more complex script. Perhaps a better solution would be to use the shell's substitution facilities to isolate the part you want to operate on.
string='UserPoolId="eu-west-1_6K6Q2bT9c"'
prefix=${string%%=*}
suffix=${string#"$prefix"}
sed -E -e 's/\B[A-Z]/_\U&/g' -e "s/\$/$suffix/" <<<"$prefix"
Bash also has built-in parameter expansion to convert the first character of a string to upper case, but perhaps this is sufficient to solve your immediate problem.
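A variant of the same split-then-convert idea that avoids pasting $suffix into a sed script, which would break if the value contained /, & or a backslash. This is only a sketch: convert_key is a made-up name, the key is assumed to be everything before the first =, and an underscore is inserted only at a lower-case to upper-case boundary (so the GNU-only \B and \U are not needed).

```shell
# Hypothetical helper: convert only the key of one key=value string.
convert_key() {
  line="$1"
  key="${line%%=*}"        # everything before the first "="
  rest="${line#"$key"}"    # the "=" and everything after it, left untouched
  key=$(printf '%s\n' "$key" | sed 's/\([[:lower:]]\)\([[:upper:]]\)/\1_\2/g')
  printf '%s%s\n' "$key" "$rest"
}

convert_key 'UserPoolId="eu-west-1_6K6Q2bT9c"'
# prints: User_Pool_Id="eu-west-1_6K6Q2bT9c"
```

The value is printed with printf rather than substituted by sed, so its content never gets interpreted as part of a sed command.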

This might work for you (GNU sed):
sed 's/=/&\n/;h;s/\B[[:upper:]]/_&/g;G;s/\n.*\n//' file
Introduce a newline after the = and copy the result to the hold space.
Insert underscores in the required places.
Append the copy to the current line and remove the middle, leaving the answer.

Related

Line numbering in Grep

I have this grep command:
cat nastava.html | grep '<td>[A-Z a-z]*</td><td>[0-9/]*</td>' | sed 's/[ \t]*<td>\([A-Z a-z]*\)<\/td><td>\([0-9]\{1,3\}\)\/[0-9]\{2\}\([0-9]\{2\}\)<\/td>.*/\1 mi\3\2 /'
|sort|grep -n ".*" | sed -r 's/(.*):(.*)/\1. \2/' >studenti.txt
I don't understand the second line. sort is OK, and grep -n numbers the sorted list, but why do we use ".*" here? It won't work without it, and I don't understand why.
The grep is used purely for the side effect of the line numbering with the -n option here, so the main thing is really to use a regular expression which matches all the input lines. As such, .* is not very elegant -- ^ would work without scanning every line, and $ trivially matches every line as well. Since you know the input lines are not empty, thus contain at least one character, the simple regular expression . would work perfectly, too.
However, as the end goal is to perform line numbering, a better solution is to use a dedicated tool for this purpose.
... | sort | nl -ba -s '. '
The -ba option specifies to number all lines (the default is to only add a line number to non-empty lines; we know there are no empty lines, so it's not strictly necessary here, but it's good to know) and the -s option specifies the separator string to put after the number.
A possible minor complication is that the line number format is whitespace-padded, so in the end, this solution may not work for you if you specifically want unpadded numbers. (But a sed postprocessor to fix that up is a lot simpler than the postprocessor for grep you have now -- just sed 's/^ *//' will remove leading whitespace).
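To make the numbering step concrete, here is a small sketch with made-up sample lines. number_with_nl follows the nl plus sed combination described above; number_with_awk is an alternative that is not from the answer and simply prints awk's record counter NR. Both function names are hypothetical.

```shell
# nl numbers every line (-ba) with ". " as separator; sed strips the padding.
number_with_nl() {
  nl -ba -s '. ' | sed 's/^ *//'
}

# Alternative: awk prints its line counter, so there is no padding to remove.
number_with_awk() {
  awk '{ print NR ". " $0 }'
}

printf '%s\n' 'Petar mi12005' 'Ana mi11001' | sort | number_with_nl
# prints:
# 1. Ana mi11001
# 2. Petar mi12005
```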
... As an aside, the ugly cat | grep | sed pipeline can be abbreviated to just
sed -n 's%[ \t]*<td>\([A-Z a-z]*\)</td><td>\([0-9]\{1,3\}\)/[0-9]\{2\}\([0-9]\{2\}\)</td>.*%\1 mi\3\2 %p' nastava.html
The cat was never necessary in the first place, and the sed script can easily be refactored to only print when a substitution was performed (your grep regular expression was not exactly equivalent to the one you have in the sed script but I assume that was the intent). Also, using a different separator avoids having to backslash the slashes.
... And of course, if nastava.html is your own web page, the whole process is umop apisdn. You should have the students' results in a machine-readable form, and generate a web page from that, rather than the other way around.
grep needs a regular expression to match. You can't run grep with no expression at all. If you want to number all the lines, just specify an expression that matches anything. I'd probably use ^ instead of .*.

how to extract the data using sed command

I have data like this:
&mac=1E-30-6C-A2-47-5F&ip=172.16.1.127&msk=255.255.255.0&gw=172.16.1.1&pdns=0.0.0.0&sdns=0.0.0.0&Speed=0&PortNo=10001&PerMatFram=0&ComPort=0
I want to extract values from this string and store them in variables using a sed command, like
ip=172.16.1.127
mac=xyz
How can I use sed with the above string?
I tried this:
IP=`echo "$QUERY_STRING" | sed -n '/&ip=/,/&)/g'
but it is not giving any data.
Probably easier to do it in two steps, first trimming the left side, and then the right side.
sed 's/.*mac=//' | sed 's/\&.*//'
This will:
Step 1:
Replace anything up until (and including) "mac=" with nothing
Step 2:
Replace anything after (and including) the first ampersand (&) it encounters with nothing.
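The two steps can also be folded into a single substitution that captures everything between key= and the next &. A minimal sketch, assuming the key contains no regex metacharacters and the values are not URL-encoded; get_param is a hypothetical helper name. Matching on &key= rather than on key= alone keeps a short key from matching the tail of a longer parameter name.

```shell
# Hypothetical helper: print the value of one parameter from a query string.
# $1 = query string, $2 = parameter name
get_param() {
  # A leading "&" is prepended so the first parameter is matched like the others.
  printf '%s\n' "&$1" | sed -n "s/.*&$2=\([^&]*\).*/\1/p"
}

QUERY_STRING='&mac=1E-30-6C-A2-47-5F&ip=172.16.1.127&msk=255.255.255.0&gw=172.16.1.1&pdns=0.0.0.0&sdns=0.0.0.0&Speed=0&PortNo=10001&PerMatFram=0&ComPort=0'
ip=$(get_param "$QUERY_STRING" ip)
mac=$(get_param "$QUERY_STRING" mac)
printf 'ip=%s\nmac=%s\n' "$ip" "$mac"
# prints:
# ip=172.16.1.127
# mac=1E-30-6C-A2-47-5F
```

With -n and the p flag, nothing is printed when the parameter is absent, so the variable ends up empty instead of holding the whole string.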
If your data does not contain any special characters, you can use the following:
eval $( echo "$QUERY_STRING" | sed 's/&/ /g' )
That would create variables directly from the query string as mac=1E-30-6C-A2-47-5F etc. Be careful, though, as an attacker might request a page with the following query string:
&IFS=.&PWD=/&UID=0
See also How to parse $QUERY_STRING from a bash CGI script.
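One way to sidestep the eval risk is to read the pairs in a loop and assign only the names you expect. This is a sketch under assumptions, not a complete CGI parser: the accepted names (mac and ip) are just the ones from the question, and values are taken as-is without URL decoding.

```shell
# Sample input including the hostile parameters mentioned above.
QUERY_STRING='&mac=1E-30-6C-A2-47-5F&ip=172.16.1.127&IFS=.&PWD=/'

mac='' ip=''
# Split on "&" into lines, then split each line at the first "=".
while IFS='=' read -r key value; do
  case $key in
    mac) mac=$value ;;
    ip)  ip=$value ;;
    *)   ;;            # anything else, including IFS or PWD, is ignored
  esac
done <<EOF
$(printf '%s\n' "$QUERY_STRING" | tr '&' '\n')
EOF

printf 'mac=%s ip=%s\n' "$mac" "$ip"
# prints: mac=1E-30-6C-A2-47-5F ip=172.16.1.127
```

Because the loop reads from a here-document rather than from a pipe, the assignments happen in the current shell and are still set after the loop.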

find words in two quotes unix

I would like to display the last word on each of these lines. I tried searching for, e.g., the word value, but got no result, so I thought of looking for the words between quotes; however, my file contains other quoted words that I do not need. What I actually want is to display the values of the select tag in my HTML file.
grep '*' hosts.html | awk '{print $NF}'
For example:
value='www.visit-tunisia.com'>www.visit-tunisia.com
value='www.watania1.tn'>www.watania1.tn
value='www.watania2.tn'>www.watania2.tn
I would like to get:
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
You need to set the field separator to >; you do this with the -F option:
$ awk -F'>' '{print $NF}' hosts.html
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
Note: I'm not sure what you are trying to achieve by grep '*' hosts.html?
Interpreting the comment liberally, you have input lines which might contain:
value='www.visit-tunisia.com'>www.visit-tunisia.com
value='www.watania1.tn'>www.watania1.tn
value='www.watania2.tn'>www.watania2.tn
and you would like the names which are repeated on a line as the output:
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
This can be done using sed and capturing parentheses.
sed -n -e "s/.*'\([^']*\)'.*\1.*/\1/p"
The -n says "don't print unless I say to do so". The s///p command prints if the substitute works. The pattern looks for a stream of 'anything' (.*), a single quote, captures what's inside up to the next single quote ('\([^']*\)') followed by any text, the captured text (the first \1), and anything. The replacement text is what was captured (the second \1).
Example:
$ cat data
www and wotnot
value='www.visit-tunisia.com'>www.visit-tunisia.com
blah
value='www.watania1.tn'>www.watania1.tn
hooplah
value='www.watania2.tn'>www.watania2.tn
if 'nothing' is required, nothing will be done.
$ sed -n -e "s/.*'\([^']*\)'.*\1.*/\1/p" data
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
nothing
$
Clearly, you can refine the [^']* part of the match if you want to. I used double quotes around the expression since the pattern matches on single quotes. Life is trickier if you need to allow both single and double quotes; at that point, I'd put the script into a file and run sed -f script data to make life easier.
sed 's/.*>\(.*\)/\1/g' your_file

Replace a phrase in a file with a string which contains special Characters

I am using sed -e "s/foo/$bar/" -e "s/some/$text/" file.whatever to replace a phrase in a certain file. The problem is that the $bar string contains multiple special characters like /. So when I try to replace something in a text file using the following code...
#!/bin/bash
bar="http://stackoverflow.com/"
sed -e "s/foo/$bar/" -e "s/some/$text/" file.whatever
...then I get an error saying: sed: unknown option to s. Is there anything I can do about it?
You can use any delimiter. s#some#SOME# for example. Another good delimiter is vertical-bar. Other chars can work but have special significance for some contexts such as regular expressions.
You can get this difficulty in sed regardless of what delimiters you use, especially if you don't know what the string contains. I'd pick a different method for passing the shell variables into the helper interpreter.
awk -v rep1="$bar" -v rep2="$text" '{sub(/foo/, rep1); sub(/some/, rep2); print}'
or
perl -spe 's/foo/$rep1/; s/some/$rep2/' -- -rep1="$bar" -rep2="$text"
Correctness trumps brevity in this case.
(reference for Perl example)
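One thing to keep in mind with the awk version: values passed with -v have backslash escapes processed, and & in the replacement of sub() stands for the matched text, so a URL containing & would still be altered. A possible workaround, sketched here under the assumption that both the search text and the replacement should be treated literally, passes them through the environment and uses index() and substr() instead of sub(). The names replace_literal, FIND_TEXT and REPL_TEXT are made up for illustration.

```shell
# Hypothetical helper: replace the first literal occurrence of $1 with $2 on each line of stdin.
replace_literal() {
  FIND_TEXT=$1 REPL_TEXT=$2 awk '
    BEGIN { find = ENVIRON["FIND_TEXT"]; repl = ENVIRON["REPL_TEXT"] }
    {
      i = index($0, find)               # plain string search, no regex
      if (find != "" && i > 0)
        $0 = substr($0, 1, i - 1) repl substr($0, i + length(find))
      print
    }'
}

bar='http://stackoverflow.com/?a=1&b=2'
printf '%s\n' 'visit foo today' | replace_literal foo "$bar"
# prints: visit http://stackoverflow.com/?a=1&b=2 today
```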

Escape file name for use in sed substitution

How can I fix this:
abc="a/b/c"; echo porc | sed -r "s/^/$abc/"
sed: -e expression #1, char 7: unknown option to `s'
The substitution of variable $abc is done correctly, but the problem is that $abc contains slashes, which confuse sed. Can I somehow escape these slashes?
Note that sed(1) allows you to use different characters for your s/// delimiters:
$ abc="a/b/c"
$ echo porc | sed -r "s|^|$abc|"
a/b/cporc
$
Of course, if you go this route, you need to make sure that the delimiters you choose aren't used elsewhere in your input.
The GNU manual for sed states that "The / characters may be uniformly replaced by any other single character within any given s command."
Therefore, just use another character instead of /, for example a colon (:):
abc="a/b/c"; echo porc | sed -r "s:^:$abc:"
Do not use a character that can be found in your input. We can use : above, since we know that the input (a/b/c) doesn't contain :.
Be careful of character-escaping.
If using "", Bash will interpret some characters specially, e.g. ` (used for inline execution), ! (used for accessing Bash history), $ (used for accessing variables).
If using '', Bash will take all characters literally, even $.
The two approaches can be combined, depending on whether you need escaping or not, e.g.:
abc="a/b/c"; echo porc | sed 's!^!'"$abc"'!'
You don't have to use / as pattern and replace separator, as others already told you. I'd go with : as it is rather rarely used in paths (it's a separator in PATH environment variable). Stick to one and use shell built-in string replace features to make it bullet-proof, e.g. ${abc//:/\\:} (which means replace all : occurrences with \: in ${abc}) in case of : being the separator.
$ abc="a/b/c"; echo porc | sed -r "s:^:${abc//:/\\:}:"
a/b/cporc
Escape each slash with a backslash:
abc='a\/b\/c'
As for the escaping part of the question, I had the same issue and resolved it with a double sed, which can possibly be optimized.
escaped_abc=$(echo "$abc" | sed "s/\//\\\AAA\//g" | sed "s/AAA//g")
The triple A is used because otherwise the forward slash following its escaping backslash is never placed in the output, no matter how many backslashes you put in front of it.
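With a single-quoted sed script the shell leaves the backslashes alone, so the placeholder may not be needed. The following is a sketch, not a drop-in guarantee: escape_replacement is a hypothetical name, it assumes / is the delimiter of the later s command, and it assumes the value is a single line. It escapes the three characters that are special in a replacement: the backslash, the delimiter and &.

```shell
# Hypothetical helper: escape a string for use as the replacement in s/.../.../
escape_replacement() {
  # "|" is used as the delimiter here so that "/" needs no escaping in the bracket.
  printf '%s\n' "$1" | sed 's|[\\/&]|\\&|g'
}

abc='a/b/c'
echo porc | sed "s/^/$(escape_replacement "$abc")/"
# prints: a/b/cporc
```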
