Using special characters in parameters and variables in batch, without using an external file

Before you go marking this as a duplicate, hear me out.
My question:
A: has different requirements than all the others (which are basically "what's an escape character?"), including not having to use an external file to pass parameters to functions
B: questions the existence of this mess rather than accepting 'no' or 'it's complicated' as the answer
C: understands that escape characters already exist and that there are ways to work around the problem
D: comes from a different skill level and isn't a 2-7 year old question
E: requires the use of quotes rather than something like [ because quotes are the only thing that works with strings containing spaces
Also, before y'all say I didn't try stuff: I read these (all of them, including comments and such):
Batch character escaping
http://www.robvanderwoude.com/escapechars.php
https://blogs.msdn.microsoft.com/oldnewthing/20091029-00/?p=16213
using batch echo with special characters
Escape angle brackets in a Windows command prompt
Pass, escape and recognize Special Character in Windows Batch File
I didn't understand all of that fully, because to fully understand it I'd have to be much better at batch, but here is what I gleaned:
There's a whole table of escape sequences, ^ is the most used one, and you can use delayed expansion so that the characters don't get parsed immediately but are only 'expanded' at runtime (see the sketch after this list). But enabling delayed expansion doesn't always work, because with pipe characters the other files/commands being piped to/from don't inherit the expansion setting, so:
A: you have to enable it there too
B: that forces you to use that expansion
C: it requires multiple escape characters for each parsing pass of the CLI, which is apparently hard to determine and ugly to look at.
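For illustration, here is a minimal sketch of the difference delayed expansion makes with a special character in a variable (my own example, not from any of the linked pages):
@echo off
setlocal EnableDelayedExpansion
set "var=a & b"
rem with plain %var% the & would be parsed as a command separator and the
rem line would break; !var! is expanded at execution time, so it stays literal
echo !var!
endlocal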
This all seems rather ridiculous. Why wasn't there some way to mark a string of odd input as literal rather than having its characters processed? Why isn't it just a simple flag plus some super duper special character (think alt character) that would almost never appear unless you set the font to Wingdings? Why does each parsing pass of piped commands strip the escape characters? That makes everything insane, because the user now has to know how many times that string will be parsed. Why hasn't a tool been developed to automatically scan odd input and auto-escape it? We have a table of the rules, is it really that hard? Has it been done already? What would be the 'hard' part?
Down the rabbit hole we go
How did I get here, you ask? Well, it all started when I made a simple trimming function and happened upon one of the biggest problems in batch: escaping characters in received input. The problem is that a lot of the inputs to my trimming function had quotes. Quotes are escaped by using "" in place of ", so something like
::SETUP
::the parentheses are just to delineate where the value goes,
::they aren't actually in the code
set "var=(stuff goes here)"
call :TrimFunc "%var%",var
goto :EOF
:TrimFunc
::the + are just to show the spacing, otherwise I can't tell
echo beginning param1 is +%~1+
::code for the function goes here
goto :EOF
::END SETUP
::NOTE the + characters aren't part of the actual value, just the
::display when I run this function
set "var=""a"""
::got +"a"+
will work but
set "var="a "
::got +"a+
::expected +"a +
set "var="a ""
::got +"a+
::expected +"a "+
set "var="a " "
::got +"a+
::expected +"a " +
set "var="a"
::got +"a",var+
::expected +"a+
will not work as expected. Oddly,
set "var="a""
::got +"a"+
seems to work despite not being fully escaped. Adding any spaces seems to break this edge case.
Oddly enough I've tried doing:
set 'var="a"'
::got ++
::expected +"a"+
But I have no idea what using ' instead of " actually does when it's the outer delimiter that contains the argument (as opposed to the quotes that are supposed to be literal). I only tried it to see what would happen.
What I want:
Surely there must be some sort of universal escape character such that I can do this (assume the special character is *)
set *var=""something " "" " """*
call :TrimFunc "%var%",var
echo +%~1+
would net me
+""something " "" " """+
with no problems. In fact, why can't I have some universal escape character that takes everything inside it literally instead of the command line trying to process it? Perhaps I'm thinking about this wrong, but this seems to be a recurring problem with weird inputs all over. I just want my variables, pipes, strings and all that to STAY LITERAL WHEN THEY'RE SUPPOSED TO. I just want a way to take any input and not get any weird output; it should treat everything literally until I want it not to, because it was enclosed by the mystical super special character I just invented in my mind.
::noob rant
I don't see why this was never a thing. What's preventing the makers of this highly useful language from simply creating a flag and some never-used character to be the supremo escape character? Why do we need like 10 different ways of escaping characters? I should be able to programmatically escape inputs if necessary; it should NEVER be the user's job to escape their inputs. That's absolutely ridiculous and probably a violation of every good coding standard in existence.
::END noob rant
Anyways, I'd be happy to be enlightened as to why the above is or isn't a thing. I just want to be able to use quotes IN A STRING (kinda important), and I can't comprehend why it's not as simple as having one giant "treat these things as literals" flag that ALWAYS JUST WORKS(tm).
By the way, in case you're wondering why my function takes the name of the variable it's writing to: I couldn't figure out how to get variables set inside labels working without using delayed expansion. Using that means the variable I'm making is local, not global, so I pass in the name of the global variable and basically set it (by its name) to the local value on return, like this:
endlocal & set "%~2=%String%"
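In context, a minimal sketch of that return pattern (with the actual trimming logic omitted, and the variable names only placeholders) looks something like this:
@echo off
set "var=  some text  "
call :TrimFunc "%var%",var
echo +%var%+
goto :EOF
:TrimFunc
setlocal EnableDelayedExpansion
set "String=%~1"
::(trimming code would go here)
::%String% on the next line is expanded before endlocal runs, so the local
::value survives into the caller's scope under the name passed as %~2
endlocal & set "%~2=%String%"
goto :EOF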
Feel free to yell at me about various things, because I am 99% certain I'm doing something horribly wrong syntactically, have some bad misunderstandings, or am simply way too naive to truly understand the complexity of this problem, but to me it seems amazingly superfluous.
Why can't the last quote be used like the special character it is, while any preceding ones are taken literally (maybe depending upon a flag)?
for example
set "var="a ""
why don't the ending two quotes act specially and the ones in between act literally? Can the CLI not tell where the line ends? Can it not tell the difference between the first and last quotes and the ones in between? This seems simple to implement to me.
As long as I can echo things properly and save their literal value from parameter to variable I'm happy.

Firstly, I don't really understand why you would want to use set "var=something" instead of set var=something; it doesn't seem to make a difference.
Secondly, I have (recently? IDK) invented a method to deal with the annoying quotes. Hope this batch inspires or helps you to do something similar.
@echo off
title check for length of string
color de
mode con: cols=90 lines=25
goto t
:t
set str=error
set /p str=Please enter four characters:
set len=0
goto sl
:sl
call set this=%%str:~%len%%%
if not "%this%" == "" (set /a len+=1 & rem debug" == "" (set /a len+=1
goto sl)
if not "%len%" == "4" (set n= not) else (set n=)
echo This is%n% a four character string.
pause >nul
exit
Which in your case:
if not "%var%" == "" (call :TrimFunc "%var%",var & rem debug" == "" (call :TrimFunc "%var%,var)
)
Hope that helps. Add oil~ (My computer doesn't support delayed expansion for some unknown reason, so most of my code is a bit clumsy.)
P.S.: If you are simply removing text and not replacing it, why not use set var=%var:string=%? If the string to remove is itself a variable, then you can try this: call set var=%%var:%string%=%%
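A quick sketch of how those two forms behave (the variable names here are only examples, not from the question):
@echo off
set "var=hello world"
rem remove the literal text " world"
set "var=%var: world=%"
echo [%var%]
rem prints [hello]
set "var=hello world"
set "needle= world"
rem when the text to remove is itself a variable, CALL forces a second
rem expansion pass so %needle% is substituted before the removal happens
call set "var=%%var:%needle%=%%"
echo [%var%]
rem prints [hello]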

Related

Expect/Tcl: 'string map' removing content in string?

SETUP: Expect/Tcl Script in Linux
USE CASE:
Using expect to wait for the report of some status to be used in a $user_command.
expect -re "notify (.+)\n"
set status $expect_out(1,string)
send [string map [list STATUS "$status"] "$user_command"]
So when the application sends "notify running", then status is set to running.
For that a keyword STATUS in $user_command needs to be replaced with $status, such that, for example
"log STATUS to file"
becomes
"log running to file"
To see what is happening, I wrote
expect_tty -re "(.+)\n"
set status $expect_out(1,string)
send_user [string map [list STATUS "$status"] "$user_command"]
which works fine when run in isolation. The output is
log someUserInput to file
when typing someUserInput in response to expect_tty. However, as part of a larger script, the string map command removes everything before the string replacement, so that the output becomes
" to file"
(without a newline). I checked the uniqueness of the variable names in the script, so that is not the issue.
QUESTION:
What is going on here? How can I make the script robust?
The string map command is exactly deterministic. At each character of its input string, in order, it considers whether any of the from strings in the mapping list match, in order, and if so it performs the replacement (with the paired to string) and goes on to consider the character immediately after the replaced substring. (The empty string is a special case: it's never matched.) The code to implement it is really quite stupid, but happens to be very cache friendly on modern computers so it's still very fast; more sophisticated and supposedly “faster” implementations have been tried, but found to be slower in practice with the kinds of maps usually encountered in the wild.
If the replacement is failing to apply, it's usually because the input string is not quite what you expect. In most programs this is rare, but it's more common with expect programs because the output of the terminal emulation engine inside them can include metacharacters for things like moving the cursor around and changing the color. (Often the easiest fix for that is to tell the spawned program that its terminal type is one that doesn't support such complex features, perhaps by setting the TERM environment variable to dumb.)
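A tiny illustration of that in-order behaviour (a made-up example, not from the question):
# earlier pairs in the map win at each position, and matching resumes
# right after the inserted replacement text
puts [string map {ab X a Y} "abcab"]
# -> XcX
puts [string map {a Y ab X} "abcab"]
# -> YbcYb  ("ab" never gets a chance to match)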
Thanks to @glennjackman:
The problem is due to the application reporting line endings as \r\n, so that (.+)\n in
expect_tty -re "(.+)\n"
matches, in expression 1 (the part inside the parentheses), something that includes a \r at the end. Because the \r makes the terminal overwrite everything printed before it, string map only seems to cut everything before the replaced string. The solution is to expect something that excludes \r, i.e.
expect -re "(\[^\r\n\]+)"
which collects everything up to the end of the line, whatever the line-ending format may be.

Find space escape

Writing a small script in bash (MacOS in fact) and I want to use find, with multiple sources. Not normally a problem, but the list of source directories to search is held as a string in a variable. Again, not normally a problem, but some of them contain spaces in their name.
I can construct the full command string and if entered directly at the command prompt (copy and paste in fact) it works as required and expected. But when I try and run it within the script, it flunks out on the spaces in the name and I have been unable to get around this.
I cannot quote the entire source string as that is then just seen as one single item which of course does not exist. I escape each space with a backslash within the string held in the variable and it is simply lost. If I use double backslash, they both remain in place and again it fails. Any method of quoting I have tried is basically ignored, the quotes are seen as normal characters and splitting is done at each space.
I have so far only been able to use eval on the whole command string to get it to work but I felt there ought to be a better solution than this.
Ironically, if I use AppleScript I CAN create a suitable command string and run it perfectly with doShellScript (ok, that's using JXA, but it's the same with actual AppleScript). However, I have so far been unable to find the correct escape mechanism just in a bash script, without resorting to eval.
Anyone suggest a solution to this?
If possible, don't store all paths in one string. An array is safer and more convenient:
paths=("first path" "second path" "and so on")
find "${paths[#]}"
The find command will expand to
find "first path" "second path" "and so on"
If you have to use the string and don't want to use eval, split the string into an array:
string="first\ path second\ path and\ so\ on"
read -a paths <<< "$string"
find "${paths[#]}"
Paths inside the string should use \ to escape spaces; wrapping paths inside "" or '' will not work. eval might be the better option here.
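A small demonstration of why quoting inside the string doesn't help, and of the read -a split (the paths here are made up):
#!/usr/bin/env bash
# quotes stored inside a variable are ordinary characters after expansion,
# so word splitting still happens at every space (deliberately unquoted):
string='"first path" "second path"'
printf '<%s>\n' $string
# <"first> <path"> <"second> <path">
# backslash-escaped spaces do survive read -a (without -r):
string='first\ path second\ path'
read -a paths <<< "$string"
printf '<%s>\n' "${paths[@]}"
# <first path> <second path>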

Less "active" emacs string highlighting?

The problem is that when I type ' or " (in a number of modes, including Java, Python, Ruby and C), Emacs immediately highlights the rest of the file as a string. It would be much less annoying if it instead did nothing and waited for the closing quote character. That might be OK for triple quotes in Python, but for ordinary strings?
Googling didn't help much, since I'm having trouble formulating this concisely and distinctively.
You can use AutoPairs in order to automatically add the second " when you type the first " (same thing for parentheses, braces, etc.).
The syntax highlighting will not be disturbed.
http://www.emacswiki.org/emacs/AutoPairs
Actually, Emacs does wait already. So you may simply want to adjust the time it waits, which is controlled by jit-lock-context-time.
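For example, something like this in your init file (the two-second value is arbitrary):
;; wait longer before refontifying the rest of the buffer after an edit
(setq jit-lock-context-time 2.0)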

Decrypt obfuscated perl script

Had some spam issues on my server and, after finding and removing some Perl and PHP scripts, I'm down to checking what they really do. Although I'm a senior PHP programmer, I have little experience with Perl. Can anyone give me a hand with the script here:
http://pastebin.com/MKiN8ifp
(It was one long line of code, script was called list.pl)
The start of the script is:
$??s:;s:s;;$?::s;(.*); ]="&\%[=.*.,-))'-,-#-*.).<.'.+-<-~-#,~-.-,.+,~-{-,.<'`.{'`'<-<--):)++,+#,-.{).+,,~+{+,,<)..})<.{.)-,.+.,.)-#):)++,+#,-.{).+,,~+{+,,<)..})<*{.}'`'<-<--):)++,+#,-.{).+:,+,+,',~+*+~+~+{+<+,)..})<'`'<.{'`'<'<-}.<)'+'.:*}.*.'-|-<.+):)~*{)~)|)++,+#,-.{).+:,+,+,',~+*+~+~+{+<+,)..})
It continues with precious few non-punctuation characters until the very end:
0-9\;\\_rs}&a-h;;s;(.*);$_;see;
Replace the s;(.*);$_;see; with print to get this. Replace s;(.*);$_;see; again with print in the first half of the payload to get this, which is the decryption code. The second half of the payload is the code to decrypt, but I can't go any further with it, because as you see, the decryption code is looking for a key in an envvar or a cookie (so that only the script's creator can control it or decode it, presumably), and I don't have that key. This is actually reasonably cleverly done.
For those interested in the nitty gritty... the first part, when de-tangled, looks like this:
$? ? s/;s/s;;$?/ :
s/(.*)/...lots of punctuation.../;
The $? at the beginning of the line is the pre-defined variable containing the child error, which no doubt serves only as obfuscation. It will be undefined, as there can be no child error at this point.
The question mark following it is the start of a ternary operator
CONDITION ? IF_TRUE : IF_FALSE
which is also there simply to obfuscate. The expression returned for true is a substitution regex whose / slash delimiter has been replaced with a colon: s:pattern:replacement:. Above, I have restored the slashes. The other expression, which is the one that will actually be executed, is also a substitution regex, albeit an incredibly long one. Its delimiter is the semi-colon.
This substitution replaces .* in $_ - the default input and pattern-searching space - with a rather large amount of punctuation characters, which represents the bulk of the code. Since .* matches any string, even the empty string, it will simply get inserted into $_, and is for all intents and purposes identical to simply assigning the string to $_, which is what I did:
$_ = q;]="&\%[=.*.,-))'-,-# .......;;
The following lines are a transliteration and another substitution. (I inserted comments to point out the delimiters)
y; -"[%-.:<-#]-`{-}#~\$\\;{\$()*.0-9\;\\_rs}&a-h;;
#^ ^ ^ ^
#1 2 3
(1,2,3 are delimiters, the semi-colon between 2 and 3 is escaped)
The basic gist of it is that various characters and ranges, such as -" (space to double quote), and things that look like character classes with ranges ([%-.:<-#]) but aren't, get transliterated into more legible characters, e.g. curly braces, dollar signs, parentheses, 0-9, etc.
s;(.*);$_;see;
The next substitution is where the magic happens. It is also a substitution with obfuscated delimiters, but with three modifiers: see. The s does nothing in this case, as it only allows the wildcard character . to match newlines. The ee, however, means the replacement expression is evaluated twice.
In order to see what I was evaluating, I performed the transliteration and printed the result. I suspect that I somewhere along the line got some characters corrupted, because there were subtle errors, but here's the short (cleaned up) version:
s;(.*);73756220656e6372797074696f6e5f6 .....;; # very long line of alphanumerics
s;(..);chr(hex($1));eg;
s;(.*);$_;see;
s;(.*);704b652318371910023c761a3618265 .....;; # another long line
s;(..);chr(hex($1));eg;
&e_echr(\$_);
s;(.*);$_;see;
The long regexes are once again the data containers, and insert data into $_ to be evaluated as code.
The s/(..)/chr(hex($1))/eg; is starting to look rather legible. It basically reads two characters at a time from $_ and converts each pair from hex to the corresponding character.
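Standalone, that idiom looks like this (the hex string below is made up for illustration, not taken from the actual payload):
#!/usr/bin/perl
use strict;
use warnings;

my $payload = "68656c6c6f";                 # hex for "hello"
(my $decoded = $payload) =~ s/(..)/chr(hex($1))/eg;
print "$decoded\n";                         # prints "hello"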
The next to last line, &e_echr(\$_);, stumped me for a while, but it is a subroutine that is defined somewhere in this evaluated code, as hobbs so aptly was able to decode. The dollar sign is prefixed by a backslash, meaning a reference to $_ is passed, i.e. the subroutine can change the global variable.
After quite a few evaluations, $_ is run through this subroutine, after which whatever is contained in $_ is evaluated one last time, presumably this time executing the actual code. As hobbs said, a key is required, which is taken from the environment %ENV of the machine where the script runs, and which we do not have.
Ask the B::Deparse module to make it (a little more) readable.
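For example, from the command line (the output filename is arbitrary):
perl -MO=Deparse list.pl > deparsed.pl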

Why doesn't Vim's errorformat take regular expressions?

Vim's errorformat (for parsing compile/build errors) uses an arcane format from C.
Trying to set up an errorformat for NAnt seems almost impossible; I've tried for many hours and can't get it. I also see from my searches that a lot of people seem to be having the same problem. A regex to solve this would take minutes to write.
So why does vim still use this format? It's quite possible that the C parser is faster but that hardly seems relevant for something that happens once every few minutes at most. Is there a good reason or is it just an historical artifact?
It's not that Vim uses an arcane format from C. Rather it uses the ideas from scanf, which is a C function. This means that the string that matches the error message is made up of 3 parts:
whitespace
characters
conversion specifications
Whitespace is your tabs and spaces. Characters are the letters, numbers and other normal stuff. Conversion specifications are sequences that start with a '%' (percent) character. In scanf you would typically match an input string against %d or %f to convert to integers or floats. With Vim's error format, you are searching the input string (error message) for files, lines and other compiler specific information.
If you were using scanf to extract an integer from the string "99 bottles of beer", then you would use:
int i;
scanf("%d bottles of beer", &i); // i would be 99, string read from stdin
Now with Vim's error format it gets a bit trickier but it does try to match more complex patterns easily. Things like multiline error messages, file names, changing directory, etc, etc. One of the examples in the help for errorformat is useful:
1 Error 275
2 line 42
3 column 3
4 ' ' expected after '--'
The appropriate error format string has to look like this:
:set efm=%EError\ %n,%Cline\ %l,%Ccolumn\ %c,%Z%m
Here %E tells Vim that it is the start of a multi-line error message. %n is an error number. %C is the continuation of a multi-line message, with %l being the line number, and %c the column number. %Z marks the end of the multiline message and %m matches the error message that would be shown in the status line. You need to escape spaces with backslashes, which adds a bit of extra weirdness.
While it might initially seem easier with a regex, this mini-language is specifically designed to help with matching compiler errors. It has a lot of shortcuts in there. I mean you don't have to think about things like matching multiple lines, multiple digits, matching path names (just use %f).
Another thought: how would you map numbers to mean line numbers, or strings to mean files or error messages, if you were to use just a normal regexp? By group position? That might work, but it wouldn't be very flexible. Another way would be named capture groups, but this syntax looks a lot like shorthand for that anyway. You can actually use regexp wildcards such as .* - in this language it is written %.%#.
OK, so it is not perfect. But it's not impossible either and makes sense in its own way. Get stuck in, read the help and stop complaining! :-)
I would recommend writing a post-processing filter for your compiler that uses regular expressions or whatever, and outputs messages in a simple format that is easy to write an errorformat for. Why learn some new, baroque, single-purpose language unless you have to?
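For instance (a hypothetical setup, not part of the original answer): if the filter rewrites every error as path/to/file:LINE: message, the matching errorformat becomes trivial:
:set errorformat=%f:%l:\ %m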
According to :help quickfix,
it is also possible to specify (nearly) any Vim supported regular
expression in format strings.
However, the documentation is confusing and I didn't put much time into verifying how well it works and how useful it is. You would still need to use the scanf-like codes to pull out file names, etc.
They are a pain to work with, but to be clear: you can use regular expressions (mostly).
From the docs:
Pattern matching
The scanf()-like "%*[]" notation is supported for backward-compatibility
with previous versions of Vim. However, it is also possible to specify
(nearly) any Vim supported regular expression in format strings.
Since meta characters of the regular expression language can be part of
ordinary matching strings or file names (and therefore internally have to
be escaped), meta symbols have to be written with leading '%':
%\ The single '\' character. Note that this has to be
escaped ("%\\") in ":set errorformat=" definitions.
%. The single '.' character.
%# The single '*'(!) character.
%^ The single '^' character. Note that this is not
useful, the pattern already matches start of line.
%$ The single '$' character. Note that this is not
useful, the pattern already matches end of line.
%[ The single '[' character for a [] character range.
%~ The single '~' character.
When using character classes in expressions (see |/\i| for an overview),
terms containing the "\+" quantifier can be written in the scanf() "%*"
notation. Example: "%\\d%\\+" ("\d\+", "any number") is equivalent to "%*\\d".
Important note: The \(...\) grouping of sub-matches can not be used in format
specifications because it is reserved for internal conversions.
lol try looking at the actual vim source code sometime. It's a nest of C code so old and obscure you'll think you're on an archaeological dig.
As for why vim uses the C parser, there are plenty of good reasons starting with that it's pretty universal. But the real reason is that sometime in the past 20 years someone wrote it to use the C parser and it works. No one changes what works.
If it doesn't work for you the vim community will tell you to write your own. Stupid open source bastards.
