regex replace in linux for multiple files - linux

I have 200k files in a single folder on a Linux server, and I need to transform these files using a regex.
Here is the regex to find in each text file:
([^\s]+?.*)=((.*(?=,$))+|.*).*
and I need to replace each match with the following substitution:
"$1":"$2",
The above regex works fine when I use it in Python, but the server I am working on does not support Python, so I need to use bash commands. I tried the following command, but it does not work:
sed -r 's/([^\s]+?.*)=((.*(?=,$))+|.*).*/"$1":"$2",/g' *20200502*

Fixed your regex:
sed -E 's/([^[:space:]]+?.*)=((.*(=?,$))+|.*).*/"\1":"\2",/g' *20200502*
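Replaced [^\s] by its POSIX ERE equivalent [^[:space:]]. In a POSIX bracket expression, \s does not mean "whitespace"; it is just a backslash and an s.
Replaced the lookahead (?=,$), which sed does not support, with (=?,$). Note that this is not equivalent: a trailing comma is now captured into \2 instead of being excluded from it.
Fixed the capture group references: sed writes them as \1 and \2, so "$1":"$2" becomes "\1":"\2".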
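Since sed has no lookahead, one way to get the Python behaviour (trailing comma excluded from the value) is to delete the trailing comma first and then rewrite the line. This is a minimal sketch, assuming GNU sed, one key=value pair per line with no leading whitespace, and that the key runs up to the last = on the line (as the greedy .* in the original pattern implies). The name convert is hypothetical.

```shell
#!/usr/bin/env bash
# convert: hypothetical helper emulating the Python substitution with GNU sed.
# 1) On lines containing '=', drop a trailing comma (stands in for the (?=,$) lookahead).
# 2) Rewrite key=value as "key":"value", (the key runs up to the last '=').
# Lines without '=' are printed unchanged.
convert() {
  sed -E '/=/{
    s/,$//
    s/^(.*)=(.*)$/"\1":"\2",/
  }' "$@"
}

# Demo on invented input.
printf 'name=foo,\nid=42\nno equals here\n' | convert
```

To edit the files in place, add -i to the sed options. With 200k files a shell glob can fail with "Argument list too long"; find . -maxdepth 1 -type f -name '*20200502*' -exec sed -i -E '...' {} + passes the file names to sed in batches instead.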

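It is hard to test without your data, but based on your approach, something like this might work:
sed -i -E 's/([^\s]+?.*)=((.*(?=,$))+|.*).*/"\1":"\2",/' *20200502*
\1 and \2 in the replacement are references to the groups captured in the pattern; the single quotes keep the shell from expanding anything in the script.
-E enables extended regular expressions (+ and grouping without backslashes), and -i edits the files in place.
Note that sed does not treat [^\s] as "non-whitespace" and does not support the (?=,$) lookahead, so those parts of the pattern still need to be rewritten.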

Related

how to escape file path in bash script variable

I would like to escape a file path that is stored in a variable in a bash script.
I have read several threads about escaping backticks and the like, but nothing seems to work as it should.
I have this variable, whose value is entered by the user as a parameter while the bash script runs:
CONFIG="/home/teams/blabla/blabla.yaml"
I need to change this to: \/home\/teams\/blabla\/blabla.yaml
How can I do that within the script, via sed or similar (not manually)?
With GNU bash and its Parameter Expansion:
echo "${CONFIG//\//\\/}"
Output:
\/home\/teams\/blabla\/blabla.yaml
Using the solution from this question, in your case it will look like this:
CONFIG=$(echo "/home/teams/blabla/blabla.yaml" | sed -e 's/[]\/$*.^[]/\\&/g')
echo "/home/teams/blabla/blabla.yaml" | sed 's/\//\\\//g'
\/home\/teams\/blabla\/blabla.yaml
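Explanation:
A backslash changes how the following character is interpreted: it turns a special character (such as the / delimiter) into a literal one or, in some cases, gives an ordinary character a special meaning. A double backslash \\ stands for a literal backslash.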
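The sed expression from the first solution escapes the characters that are special on the pattern side of s///. Here is a minimal sketch of using it to match a path literally, assuming GNU sed. escape_sed_regex is a hypothetical helper name, and, like the one-liner it wraps, it does not escape backslashes in the input.

```shell
#!/bin/sh
# escape_sed_regex: hypothetical helper that prefixes ] / $ * . ^ [ with a
# backslash so the string matches literally as a sed BRE pattern.
# Backslashes already present in $1 are not handled.
escape_sed_regex() {
  printf '%s\n' "$1" | sed -e 's/[]\/$*.^[]/\\&/g'
}

CONFIG="/home/teams/blabla/blabla.yaml"
pattern=$(escape_sed_regex "$CONFIG")
echo "$pattern"

# Use the escaped path on the pattern side to replace it literally.
echo "config=$CONFIG" | sed "s/$pattern/NEW_PATH/"
```

Because the dot is escaped too, the pattern will not accidentally match a path such as /home/teams/blabla/blablaXyaml.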
Why does that need escaping? Is this an XY problem?
If the issue is that you are trying to use that variable in a substitution regex, then the examples given should work, but you might benefit from avoiding some of the "leaning toothpick syndrome", which many tools let you do simply by using a different match delimiter. sed, for example:
$: sed "s,SOME_PLACEHOLDER_VALUE,$CONFIG," <<< SOME_PLACEHOLDER_VALUE
/home/teams/blabla/blabla.yaml
Be very careful about this, though. Commas are perfectly valid characters in a filename, as is almost anything except a NUL byte. Know your data.
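If you cannot rule out the delimiter appearing in the value, an alternative is to escape the value before putting it on the replacement side of s///. There only the backslash, & and the delimiter are special (plus a newline, which this sketch does not handle). escape_sed_replacement is a hypothetical helper name.

```shell
#!/bin/sh
# escape_sed_replacement: hypothetical helper that makes a string safe for the
# REPLACEMENT side of s/.../.../ when '/' is the delimiter.
# Order matters: double existing backslashes first, then escape / and &.
# Embedded newlines are not handled.
escape_sed_replacement() {
  printf '%s\n' "$1" | sed -e 's|\\|\\\\|g' -e 's|/|\\/|g' -e 's|&|\\&|g'
}

CONFIG="/home/teams/blabla/blabla.yaml"
escaped=$(escape_sed_replacement "$CONFIG")
echo "SOME_PLACEHOLDER_VALUE" | sed "s/SOME_PLACEHOLDER_VALUE/$escaped/"
```

Unlike switching delimiters, this keeps working when the value contains commas, slashes or ampersands.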

convert this linux statement into a statement which is supported by windows command prompt

This is my command, which works in a Unix environment:
"cat document.xml | grep \'<w:t\' | sed \'s/<[^<]*>//g\' | grep -v \'^[[:space:]]*$\'"
But I want to run it in the Windows command prompt.
How do I do that, and which Windows commands are similar to cat, grep and sed?
Please tell me the exact equivalent code for Windows.
The double quotes around the pipeline in your question are a syntax error, and the backslashed single quotes should apparently really not have backslashes, but I assume it's just an artefact of a slightly imprecise presentation.
Here's what the code does.
cat document.xml |
This is a useless use of cat, but its purpose is to feed the contents of this file into the pipeline.
grep '<w:t' |
This looks for lines containing the literal string <w:t (probably the start of a tag in the XML format in the file). The single quotes quote the string so that it is not interpreted by the shell (otherwise the < would be interpreted as a redirection operator); they are consumed by the shell, and not passed through to grep.
sed 's/<[^<]*>//g' |
This replaces every tag, i.e. each pair of opening and closing angle brackets together with whatever is between them, with an empty string. The regular expression [^<]* matches zero or more occurrences of any character except <. If the XML is well-formed, the brackets should always occur in pairs, so this effectively removes all XML tags.
grep -v '^[[:space:]]*$'
This removes any line which is empty or consists entirely of whitespace.
Because sed is a superset of grep, the program could easily be rephrased as a single sed script. Perhaps the easiest solution for your immediate problem would be to obtain a copy of sed for your platform.
sed -e '/<w:t/!d' -e 's/<[^<]*>//g' -e '/[^[:space:]]/!d' document.xml
I understand quoting rules on Windows may be different; try with double quotes instead of single, or put the script in a file and use sed -f file document.xml where file contains the script itself, like this:
/<w:t/!d
s/<[^<]*>//g
/[^[:space:]]/!d
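Wherever a sed is available (a Unix shell or a sed port on Windows), you can check that the single sed script behaves like the original pipeline. This is a small sketch run on an invented fragment of document.xml with one <w:t> element per line; extract_text is a hypothetical helper name.

```shell
#!/bin/sh
# extract_text: hypothetical wrapper around the single sed script.
#   /<w:t/!d            keep only lines containing <w:t
#   s/<[^<]*>//g        strip every tag on those lines
#   /[^[:space:]]/!d    drop lines that are now blank
extract_text() {
  sed -e '/<w:t/!d' -e 's/<[^<]*>//g' -e '/[^[:space:]]/!d' "$@"
}

# Invented sample input.
printf '%s\n' \
  '<w:p>' \
  '  <w:r><w:t>Hello</w:t></w:r>' \
  '  <w:r><w:t> </w:t></w:r>' \
  '  <w:r><w:t>world</w:t></w:r>' \
  '</w:p>' | extract_text
```

This prints the two non-blank text runs, Hello and world, each keeping its leading indentation; the blank <w:t> run is dropped just as grep -v '^[[:space:]]*$' would drop it.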
This is a rather crude way to extract the text content from an XML document, anyway; perhaps an XML processor would be the proper way forward. For example, xmlstarlet appears to be available for Windows. Unlike the line-based script, an XML processor works even if the XML input doesn't have the opening and closing <w:t> tags on the same line. (In fact, parsing XML with line-oriented tools is a massive antipattern.)
You could try PowerShell.
It is included in Windows since Windows 7, and certainly on Windows 10.
I've just tested a cat command and it works (in PowerShell, cat is an alias for Get-Content).
grep doesn't, but you can adapt it with Select-String, as described here:
PowerShell equivalent to grep -f
and
https://communary.wordpress.com/2014/11/10/grep-the-powershell-way/
The equivalent of grep on Windows would be findstr, and the equivalent of cat would be type.

replace unknown line in file linux command

I am trying to change a line matching a pattern in a text file using Linux bash.
I tried the sed command:
sed -i 's/old/new/' file.txt
The issue with this command is that I have to specify the exact "old" word. I want to change thousands of files where the old word follows a pattern like this: old1(, old2(, old3(, ..., old10000(
I would like to change every oldxxx( in all files to old1(.
Any ideas how to do this?
You can use something like:
sed -i 's/old[0-9]\{1,\}(/old1(/' file.txt
This matches "old" followed by one or more digits and a "(" and replaces it with "old1(".
If your version of sed supports extended regular expressions, you can use:
sed -r -i 's/old[0-9]+\(/old1(/' file.txt
instead, which does the same thing. On some versions of sed, the -E switch is used instead of -r.
If you have more than one instance of the pattern "oldXX(" on the same line, you may also want to add the g modifier (s/.../.../g) to do a global replacement.
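This is a small sketch comparing the two spellings on an invented sample line. With the g flag, both rewrite every oldNNN( on the line; without it, only the first one changes.

```shell
#!/bin/sh
line='x = old42(a) + old7(b)'

# BRE: \{1,\} means "one or more"; a bare ( is a literal parenthesis.
printf '%s\n' "$line" | sed 's/old[0-9]\{1,\}(/old1(/g'

# ERE (-E; GNU sed also accepts -r): + means "one or more", and a bare (
# would start a group, so a literal parenthesis is written \(.
printf '%s\n' "$line" | sed -E 's/old[0-9]+\(/old1(/g'

# Without g, only the first match on the line is replaced.
printf '%s\n' "$line" | sed 's/old[0-9]\{1,\}(/old1(/'
```

The first two commands print x = old1(a) + old1(b); the last prints x = old1(a) + old7(b).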

Replacing strings with special characters with linux sed

I've read lots of posts about how to correctly escape whitespace and special characters inside strings with sed, but I still can't get it to work. Here's what I'm trying to achieve.
I have a file containing strings like this one:
JAVA_OPTS="$JAVA_OPTS -Dorg.apache.catalina.jsessionid=some_value"
and I'm trying to replace 'some_value' using the following:
sed -i "s/^\(JAVA_OPTS=\"\$JAVA_OPTS[ \t]*-Dorg\.apache\.catalina\.jsessionid*=\s*\).*\$/\1$DORG_APACHE_CATALINA_JSESSIONID/" $JBOSS_CONFIGURATION/jboss.configuration
$JBOSS_CONFIGURATION is a variable containing an absolute Linux path.
jboss.configuration is the target file for the replace operations.
$DORG_APACHE_CATALINA_JSESSIONID contains the value I want instead of 'some_value'.
Please note that the pattern:
JAVA_OPTS="$JAVA_OPTS -D
is always present, and org.apache.catalina.jsessionid is just one example of a property whose value I'm trying to replace with this script.
What's missing or wrong? I also tried matching whitespace with \s, without success,
and echoing the whole expression gives me the following:
echo "s/^\(JAVA_OPTS=\"\$JAVA_OPTS[ \t]*-Dorg\.apache\.catalina\.jsessionid*=\s*\).*\$/\1$DORG_APACHE_CATALINA_JSESSIONID/"
s/^\(JAVA_OPTS="$JAVA_OPTS[ \t]*-Dorg\.apache\.catalina\.jsessionid*=\s*\).*$/\1/
Is echo interpreting the search pattern the same way sed does?
Any info, help or alternative ways of doing it are highly welcome.
Thank you all
echo 'JAVA_OPTS="$JAVA_OPTS -Dorg.apache.catalina.jsessionid=some_value"' | (export DORG_APACHE_CATALINA_JSESSIONID=FOO/BAR/FOOBAR; sed "s/^\(JAVA_OPTS=\"\$JAVA_OPTS[ \t]*-Dorg\.apache\.catalina\.jsessionid*=\s*\).*\$/\1${DORG_APACHE_CATALINA_JSESSIONID////\/}\"/")
Note the bash expansion (to escape any / that may trip up sed) and the extra \" after $DORG_APACHE_CATALINA_JSESSIONID, which restores the closing double quote consumed by .*. Other than that, your sed expression works for me, and the above command outputs the following result:
JAVA_OPTS="$JAVA_OPTS -Dorg.apache.catalina.jsessionid=FOO/BAR/FOOBAR"
You can use sed like this:
sed -r '/\$JAVA_OPTS -D/{s/^(.+=)[^"]*/\1'"$DORG_APACHE_CATALINA_JSESSIONID"'/;}' $JBOSS_CONFIGURATION/jboss.configuration
You can specify a pattern that'll match the desired string rather than trying to specify it exactly.
The following should work for you:
sed -i 's#^\(JAVA_OPTS.*Dorg.apache.catalina.jsessionid\)=\([^"]*\)"#\1='"$DORG_APACHE_CATALINA_JSESSIONID"'"#' $JBOSS_CONFIGURATION/jboss.configuration
sed 's/=\w[^"]*/='"$DORG_APACHE_CATALINA_JSESSIONID"'/' $JBOSS_CONFIGURATION/jboss.configuration
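Since org.apache.catalina.jsessionid is only an example, here is a sketch that takes both the property name and the new value as arguments and escapes each one for its side of the s command. set_java_opt is a hypothetical helper. It assumes GNU sed (-i without a suffix), a property name made only of letters, digits, underscores and dots, and a value without newlines that ends at the next double quote.

```shell
#!/usr/bin/env bash
# set_java_opt FILE PROPERTY VALUE  (hypothetical helper, GNU sed assumed)
# Rewrites  JAVA_OPTS="$JAVA_OPTS ... -D<PROPERTY>=<old>"  in place,
# keeping everything from the closing double quote onwards.
set_java_opt() {
  local file=$1 prop=$2 value=$3 prop_re value_esc
  # Pattern side: escape the dots so they match literally.
  prop_re=$(printf '%s\n' "$prop" | sed 's/\./\\./g')
  # Replacement side: escape backslashes, then '&' and the '#' delimiter.
  value_esc=$(printf '%s\n' "$value" | sed -e 's/\\/\\\\/g' -e 's/[&#]/\\&/g')
  # '#' as delimiter, so slashes in the value need no escaping.
  sed -i "s#^\(JAVA_OPTS=\"\\\$JAVA_OPTS.*-D${prop_re}=\)[^\"]*#\1${value_esc}#" "$file"
}

# Demo on a temporary copy of the example line.
tmp=$(mktemp)
printf '%s\n' 'JAVA_OPTS="$JAVA_OPTS -Dorg.apache.catalina.jsessionid=some_value"' > "$tmp"
set_java_opt "$tmp" org.apache.catalina.jsessionid 'FOO/BAR&1'
cat "$tmp"
rm -f "$tmp"
```

The [^"]* part stops at the closing double quote, so the quote survives the substitution without having to re-add it in the replacement.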

Replace version number in file with sed in Bash script

In my project.pro file I have:
DEFINES += VERSION=\\\"1.13.1\\\"
I'd like to replace whatever the current version number is, with a new one in a Bash script:
VERSION_MAJOR=1
VERSION_MINOR=14
VERSION_PATCH=1
sed -i "s/\([0-9]+.[0-9]+.[0-9]+\)/\1${VERSION_MAJOR}.${VERSION_MINOR}.${VERSION_PATCH}/" project.pro
Why is that not working?
So far I have managed to get either no matches at all or some weird substitutions that replace only the last number.
You may use this sed:
sed -i.bak -E "s/[0-9]+\.[0-9]+\.[0-9]+/$VERSION_MAJOR.$VERSION_MINOR.$VERSION_PATCH/" project.pro
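A few problems with your attempt:
Without extended regex mode (-E), an unescaped + matches a literal plus sign rather than meaning "one or more".
The dot needs to be escaped in a regex; unescaped, it matches any character.
There is no need for a capture group and back-reference; in fact, \1 would put the old version number back in front of the new one.
PS: -i.bak keeps a copy of the original file with the .bak extension, so you can restore it in case of a wrong substitution.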
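The command above replaces the first x.y.z on every line of project.pro. If the file may contain other dotted numbers, a sketch that restricts the substitution to lines containing VERSION= could look like this. bump_version is a hypothetical helper name, and the sample input is invented.

```shell
#!/usr/bin/env bash
# bump_version MAJOR MINOR PATCH [FILE...]  (hypothetical helper)
# Prints the input with the first x.y.z on each VERSION= line replaced.
bump_version() {
  local new="$1.$2.$3"
  shift 3
  sed -E "/VERSION=/s/[0-9]+\.[0-9]+\.[0-9]+/$new/" "$@"
}

VERSION_MAJOR=1
VERSION_MINOR=14
VERSION_PATCH=1

printf '%s\n' 'QT_MIN = 5.12.0' 'DEFINES += VERSION=\\\"1.13.1\\\"' \
  | bump_version "$VERSION_MAJOR" "$VERSION_MINOR" "$VERSION_PATCH"
```

Only the DEFINES line changes (to 1.14.1); the 5.12.0 line is left alone. To edit project.pro in place with a backup, use sed -i.bak -E with the same expression.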
