Replace control characters and spaces with escape sequences

I want to replace control characters (ASCII 0-31) and spaces (ASCII 32) with hex escape codes. For example:
$ escape 'label=My Disc'
label=My\x20Disc
$ escape $'multi\nline\ttabbed string'
multi\x0Aline\x09tabbed\x20string
$ escape '\'
\\
For context, I'm writing a script which reports the status of a DVD drive. Its output is designed to be parsed by another program. My idea is to print each piece of info as a separate space-separated word. For example:
$ ./discStatus --monitor
/dev/dvd: no-disc
/dev/dvd: disc blank writable size=0 capacity=2015385600
/dev/dvd: disc not-blank not-writable size=2015385600 capacity=2015385600
I want to add the disc's label to this output. To fit with the parsing scheme I need to escape spaces and newlines. I might as well do all the other control characters as well.
I'd prefer to stick to bash, sed, awk, tr, etc., if possible. I can't think of a really elegant way to do this with those tools, though. I'm willing to use perl or python if there's no good solution with basic shell constructs and tools.

Here's a Perl one-liner I came up with. It uses /e to run code in the replacements.
perl -pe 's/([\x00-\x20\\])/sprintf("\\x%02X", ord($1))/eg'
A slight deviation from the example in my question: it emits \x5C for backslashes instead of \\.

I would use a higher-level language. There are three different types of replacement going on (single character to multicharacter for the control characters and space, identity for other printable characters, and the special case of doubling the backslash), which I think is too much for awk, sed, and the like to handle simply.
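Here's my approach in Python:
def translate(c):
    cp = ord(c)
    if cp in range(33):  # control characters (0-31) and space (32)
        return '\\x%02X' % (cp,)
    elif c == '\\':  # double the backslash
        return r'\\'
    else:  # every other character passes through unchanged
        return c

if __name__ == '__main__':
    import sys
    print(''.join(map(translate, sys.argv[1])))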
If speed is a concern, you can replace the translate function with a prebuilt dictionary mapping each character to its desired string representation.
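The prebuilt-dictionary idea could look roughly like the sketch below. It assumes Python 3 and uses str.translate, which accepts a mapping from code points to replacement strings; the names ESCAPES and escape are made up for illustration and are not from the answer above.

```python
# Prebuilt table: code point -> replacement string, the form str.translate expects.
# ESCAPES and escape are hypothetical names chosen for this sketch.
ESCAPES = {cp: "\\x%02X" % cp for cp in range(33)}  # control characters and space
ESCAPES[ord("\\")] = "\\\\"                          # double the backslash


def escape(text):
    """Return text with control characters, spaces and backslashes escaped."""
    # Characters missing from the table are left unchanged by str.translate.
    return text.translate(ESCAPES)
```

The table is built once at import time, so each call is a single pass over the string with no per-character Python function call.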

Wow, it looks like a fairly trivial sed script along the lines of 's|\n|\\n|' for each character you want to substitute.

Related

Replace each column with different spacing using sed

I am trying to replace a different pattern for each column of my input file.
Input file
this- START
this-          START
Result I want
/this/ -START-
/this/ -START-
My code
sed 's|^\([a-zA-Z]*\)-\s\([a-zA-Z]*\)$|/\1/ -\2-|' inputfile
Output
/this/ -START-
this-          START
The first input works but the 2nd input with a huge amount of spaces does not. How can I deal with both of them using the same line of code?

sed uses POSIX Basic Regular Expressions, which are, like the name suggests, very basic, without a lot of the syntactical sugar or features of other RE packages you might be more used to. But they can still handle this:
$ cat input.txt
this- START
this-          START
$ sed 's!^\([a-zA-Z]*\)-[[:space:]]\{1,\}\([a-zA-Z]*\)$!/\1/ -\2-!' input.txt
/this/ -START-
/this/ -START-
The key here is the [[:space:]]\{1,\} portion: [:space:] inside a [] character class matches any whitespace character, like \s in other RE implementations, and \{1,\} matches one or more of the preceding atom, like + in pretty much every other flavor (which also support this notation, though without needing the backslashes). Combined, it matches one or more whitespace characters. And since regular expressions are greedy, it matches the longest sequence of whitespace characters instead of stopping after seeing just one.
If you only have spaces between columns, not spaces and/or tabs, it can be simplified to a literal space followed by \{1,\} (the leading space is easy to miss when written inline). You can also use [[:alpha:]] instead of [a-zA-Z] to match all alphabetic characters, which makes a difference when matching non-English text. And you might want to use \{1,\} instead of * after the letter classes to avoid matching 0-length/missing columns if they can show up in your input.
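For comparison only, here is a rough sketch of the same substitution using Python's re module, where \s+ plays the role of [[:space:]]\{1,\}. The names PATTERN and reformat are invented for this sketch; the sed command above remains the actual answer.

```python
import re

# \s+ means "one or more whitespace characters", like [[:space:]]\{1,\} in sed.
# PATTERN and reformat are hypothetical names used only in this sketch.
PATTERN = re.compile(r"^([a-zA-Z]*)-\s+([a-zA-Z]*)$")


def reformat(line):
    """Rewrite 'word-   WORD' as '/word/ -WORD-'; other lines pass through unchanged."""
    return PATTERN.sub(r"/\1/ -\2-", line)
```

Because + is greedy, the run of whitespace is consumed in full no matter how long it is, which is the same reason the sed version handles both input lines.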

How can I escape all non-alphanumeric characters in AWK?

I inherited a very large AWK script that matches against .csv files, and I've found it does not match some non-alphanumeric characters, especially + ( ).
While I realize this would be easy in sed:
sed 's/\([^A-Za-z0-9]\)/\\\1/g'
I can't seem to find a way to call on the matched character the same way in AWK.
For instance a sample input is:
select.awk 'Patient data +(B/U)'
I would like to escape the non-alphanumeric characters, and turn the line into:
Patient\ data\ \+\(B\/U\)
I have seen some people pass very obscure non-alphanumeric characters as well, which I would like to escape.

gsub(/[^[:alnum:]]/, "\\\\&", arg)

The GNU variant (gawk) has more features, such as gensub:
awk '{n=gensub(/[^[:alnum:]]/,"\\\\&","g"); print n}' d.csv
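As a cross-check on what the awk calls are expected to produce, the same transformation can be sketched in Python. This assumes only ASCII letters and digits count as alphanumeric; escape_non_alnum is a hypothetical name for this sketch.

```python
import re


def escape_non_alnum(text):
    """Prefix every character that is not an ASCII letter or digit with a backslash."""
    # The lambda plays the role of "&" in awk: it re-emits the matched character.
    return re.sub(r"[^A-Za-z0-9]", lambda m: "\\" + m.group(0), text)
```

Applied to the sample input from the question, this should yield the escaped line the asker wants, which is useful for comparing against the awk output.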

How can I remove a newline (\n) at the end of a string?

The problem
I have multiple property lines in a single string separated by \n like this:
LINES2="Abc1.def=$SOME_VAR\nAbc2.def=SOMETHING_ELSE\n"$LINES
The LINES variable might contain an arbitrary set of characters and may be empty. If it is empty, I want to avoid the trailing \n.
I am open for any command line utility (sed, tr, awk, ... you name it).
Attempts
I tried this, to no avail:
sed -z 's/\\n$//g' <<< $LINES2
I also had no luck with tr, since it does not accept regex.
Idea
There might be an approach to convert the \n to something else. But since $LINES can contain arbitrary characters, this might be dangerous.
Sources
I skim read through the following questions
How can I replace a newline (\n) using sed?
sed with literal string--not input file

Here's one solution:
LINES2="Abc1.def=$SOME_VAR"$'\n'"Abc2.def=SOMETHING_ELSE${LINES:+$'\n'$LINES}"
The syntax ${name:+value} means "insert value if the variable name exists and is not empty." So in this case, it inserts a newline followed by $LINES if $LINES is not empty, which seems to be precisely what you want.
I use $'\n' because "\n" is not a newline character. A more readable solution would be to define a shell variable whose value is a single newline.
It is not necessary to quote strings in shell assignment statements, since the right-hand side of an assignment does not undergo word-splitting nor glob expansion. Not quoting would make it easier to interpolate a $'\n'.
It is not usually advisable to use UPPER-CASE for shell variables because the shell and the OS use upper-case names for their own purposes. Your local variables should normally be lower case names.
So if I were not basing the answer on the command in the question, I would have written:
lines2=Abc1.def=$someVar$'\n'Abc2.def=SOMETHING_ELSE${lines:+$'\n'$lines}

linux bash replace placeholder with unknown text which can contain any characters

If I want to replace for example the placeholder {{VALUE}} with another string which can contain any characters, what's the best way to do it?
Using sed s/{{VALUE}}/$(value)/g might fail if $(value) contains a slash...

oldValue='{{VALUE}}'
newValue='new/value'
echo "${var//$oldValue/$newValue}"
Note that oldValue here is not a regexp; it is matched like a glob pattern. Alternatively, with sed:
echo "$var" | sed 's/{{VALUE}}/'"${newValue//\//\/}"'/g'

Sed also works with other delimiters, for example 's|something|someotherthing|g', but if you can't control the input string, you'll have to use some function to escape it before passing it to sed.

The question asked basically duplicates How can I escape forward slashes in a user input variable in bash?, Escape a string for sed search pattern, Using sed in a makefile; how to escape variables?, Use slashes in sed replace, and many other questions. “Use a different delimiter” is the usual answer. Pianosaurus's answer and Ben Blank's answer list characters (backslash and ampersand) that need to be escaped in the shell, besides whatever character is used as an alternate delimiter. However, they don't address the quoting-a-quote problem that will occur if your “string which can contain any characters” contains a double quote. The same kind of problem can affect the ${parameter/pattern/string} shell variable expansion mentioned in a previous answer.
Some other questions besides the few mentioned above suggest using awk, and that is usually a good approach to changes that are more complicated than are easy to do with sed. Also consider perl and python. Besides single- and double-quoted strings, python has u'...' Unicode quoting, r'...' raw quoting, ur'...' quoting, and triple quoting with ''' or """ delimiters. The question as stated doesn't provide enough context for specific awk/perl/python solutions.
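Without more context, the Python route can only be sketched in general terms. The version below assumes the template is already held in a string and relies on str.replace, which treats both the placeholder and the new value as literal text, so slashes, ampersands, backslashes and quotes need no escaping. The function name fill_placeholder and its parameters are invented for illustration.

```python
def fill_placeholder(template, value, placeholder="{{VALUE}}"):
    """Replace every occurrence of placeholder in template with value, literally."""
    # str.replace does no pattern matching, so no character in value is special.
    return template.replace(placeholder, value)
```

A script could read the template from a file or standard input and the value from an environment variable or argument, which also sidesteps the shell quoting problem described above because the value never passes through sed syntax.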

How to implement Caesar cipher-like text substitution in Vim?

I was doing some puzzle where each English letter is replaced by the one two letters down the alphabet. For example, the word apple is to be transformed into crrng, as a + 2 → c, b + 2 → d, etc.
In Python, I was able to implement this transformation using the maketrans() string method. I wonder: Is it possible to do the same via search and replace in Vim?

1. If the alphabetic characters are arranged sequentially in the target
encoding (as is the case for ASCII and some alphabets in UTF-8, like
English), one can use the following substitution command:
:%s/./\=nr2char(char2nr(submatch(0))+2)/g
(Before running the command, make sure that the encoding option
is set accordingly.)
However, this replacement implements a non-circular letter shift.
A circular shift can be implemented by two substitutions separately
handling lowercase and uppercase letters:
:%s/\l/\=nr2char(char2nr('a') + (char2nr(submatch(0)) - char2nr('a') + 2) % 26)/g
:%s/\u/\=nr2char(char2nr('A') + (char2nr(submatch(0)) - char2nr('A') + 2) % 26)/g
2. Another way is to translate characters using the tr() function.
Let us assume that the variable a contains lowercase characters
of an alphabet arranged in correct order, and the variable a1 holds
the string of characters corresponding to those in a (below is
an example for English letters).
:let a = 'abcdefghijklmnopqrstuvwxyz'
:let a1 = a[2:] . a[:1]
To avoid typing the whole alphabet by hand, the value of a can be
produced as follows:
:let a = join(map(range(char2nr('a'), char2nr('z')), 'nr2char(v:val)'), '')
Then, to replace each letter on a line by the letter two positions down
the alphabet, one can use the following substitution:
:%s/.*/\=tr(submatch(0), a . toupper(a), a1 . toupper(a1))
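For reference, the maketrans() approach mentioned in the question maps onto the tr() technique almost one to one. The following is a sketch assuming Python 3 and English letters only; shift_letters is a hypothetical name.

```python
import string


def shift_letters(text, shift=2):
    """Shift ASCII letters circularly by shift positions; leave other characters alone."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    k = shift % 26
    # Source alphabet and rotated alphabet, analogous to a and a1 in the Vim example.
    table = str.maketrans(lower + upper,
                          lower[k:] + lower[:k] + upper[k:] + upper[:k])
    return text.translate(table)
```

Rotating the alphabet string, rather than adding to character codes, is what makes the shift circular, so y and z wrap around to a and b.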

Yes, \= will execute the function
%s/\(.\)/\=nr2char(char2nr(submatch(1)) + 2)/g

Can't think of anything in vim, but you could use the unix command line utility 'tr' (stands for translate, I believe).

The puzzle you describe is widely known as the Caesar cipher, and is normally implemented via the tr command or sed -e y/. Since y is not available in vim, you'll need a pretty dirty hack like ib proposed, but calling tr is much nicer work.
Especially considering the corner case of y and z: I assume these should be mapped to a and b, respectively?
