Tcl string command does not work with regexp backreference

See the code below. As an example, I am trying to use regsub with a backreference as a means of selectively applying string toupper, but I am not getting what I expected. See the simple example below (yes, I know that I can use string toupper $string 0 0; this is just to show the principle in a simple example).
> puts [ regsub {^(.)} "min" "s\\1" ]
smin
> puts [ regsub {^(.)} "min" [ string toupper "\\1" ] ]
min
As can be seen, string toupper applied to the backreference does not work, but the backreference can be used inside a double-quoted string.
I am using Tcl 8.6.

The string toupper is working, but not doing what you want; string toupper "\\1" is just the string \1 so the regsub isn't having much effect. The problem is that regsub doesn't have any way of doing “run this command at the substitution sites”; I've been wanting to fix that for years, but have never actually done so (too many projects for this one to make it to the fore).
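The evaluation order can be seen by computing the replacement argument separately. This is only a small sketch; the variable name replacement is introduced here for illustration.

```tcl
# Tcl evaluates the [ ... ] argument before regsub ever runs
set replacement [string toupper "\\1"]
puts $replacement                          ;# prints: \1
# regsub therefore receives a plain \1 and puts the match back unchanged
puts [regsub {^(.)} "min" $replacement]    ;# prints: min
```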
Instead, you need to use regsub to inject a command substitution into the string and then subst the result, but in order to do that, you first need to make the string otherwise safe to subst, using string map. Fortunately, that's actually pretty simple.
I've split this apart to make it easier for you to examine exactly what each stage is doing:
set str "here is an example with \$dollars, \[brackets\] and \\backslashes"
# Quote all of Tcl's metacharacters; there's not that many
set safer [string map {[ \\[ ] \\] \\ \\\\ $ \\$} $str]
# Inject the command we want to run where we want to run it
set withsubst [regsub {^(.)} $safer {[string toupper {\1}]}]
# Perform the substitution
set result [subst $withsubst]
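Wrapped into a procedure, the three stages might look like the sketch below. The name upcase_first is hypothetical, and the sketch assumes the first character of the input is an ordinary letter (a leading brace, backslash or dollar sign would need extra care, because the character ends up inside {\1}).

```tcl
# Hypothetical helper: upper-case the first character via regsub + subst
proc upcase_first {str} {
    # Quote Tcl's metacharacters so subst leaves the rest of the text alone
    set safer [string map {[ \\[ ] \\] \\ \\\\ $ \\$} $str]
    # Inject the command at the substitution site
    set withsubst [regsub {^(.)} $safer {[string toupper {\1}]}]
    # Run the injected command and undo the quoting
    return [subst $withsubst]
}

puts [upcase_first {min $x [y] \z}]    ;# prints: Min $x [y] \z
```

The dollar sign, brackets and backslash in the input survive untouched because they were quoted before subst saw them.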

Related

Is it possible to use various variables as a replacement entry in the string map of the tcl/tk?

I want to pass a variable holding the replacement text to 'string map'. We usually do this:
# Input variable
set str "Is First Name"
# Resulting new string
puts [string map -nocase { {First Name} {Last Name} } $str]
What I need is to supply the replacement through a second variable. For instance:
set replace "Last Name"
puts [string map -nocase { {First Name} $replace } $str]
I get the following output:
Is $replace
Before I asked, I did some research, but I didn't find anything of the sort. I even tried to adapt other examples, but to no avail.
Remember Rule 6: no substitution is done inside braces. So you have to build the list some other way that does allow for variable substitution, like using list:
string map -nocase [list {First Name} $replace] $str
You could also use regsub instead of string map:
regsub -all -nocase {***=First Name} $str $replace
(The leading ***= in the regular expression means it's treated as an exact string and RE metacharacters lose their normal meaning)
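The difference between the braced form and the list form can be checked side by side. The sketch below also shows one reason to prefer string map when the replacement text is arbitrary: regsub gives & and backslashes a special meaning in the replacement. The "Smith & Co" value is made up for the demonstration.

```tcl
set str "Is First Name"
set replace "Last Name"

# Braces suppress substitution, so the literal text $replace is inserted
puts [string map -nocase {{First Name} $replace} $str]        ;# prints: Is $replace
# list builds the mapping after $replace has been substituted
puts [string map -nocase [list {First Name} $replace] $str]   ;# prints: Is Last Name

# In a regsub replacement, & stands for the matched text
set replace "Smith & Co"
puts [regsub -all -nocase {***=First Name} $str $replace]     ;# prints: Is Smith First Name Co
puts [string map -nocase [list {First Name} $replace] $str]   ;# prints: Is Smith & Co
```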

TCL: How to remove all letters/numbers from a string?

I am using the Tcl programming language and trying to remove all the letters or numbers from a string. From this example, I know a general way to remove all the letters from a string (e.g. set s abcdefg0123456) is
set new_s [string trim $s "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"]
If I want to remove all numbers from a string in general, I can do
set new_s [string trim $s "0123456789"]
Is there a more straightforward way to remove all letters/numbers?
I also notice that if I want to remove only some of the numbers (e.g. 012) instead of all numbers, the following does NOT work.
set new_s [string trim $s "012"]
Can someone explain why?
Use regular expressions:
set s abcdefg0123456
regsub -all {\d+} $s {} new_s ;# Remove all digits
regsub -all {[[:alpha:]]+} $s {} new_s ;# Remove all letters
To answer your other question: string trim (and string trimleft and string trimright as “half” versions) removes a set of characters from the ends of a string (and returns the new string; it's a pure functional operation). It doesn't do anything to the interior of the string. It doesn't know anything about patterns. The default set of characters removed is “whitespace” (spaces, newlines, tabs, etc.)
When you do:
set new_s [string trim $s "012"]
You are setting the removal set to 0, 1 and 2, but it is still only the ends that get removed. Thus it will leave x012101210y entirely alone, but turn 012101210 into the empty string.
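The behaviour described above can be confirmed with a few calls. This is a small sketch; the last line uses a bracket expression with regsub as one possible way to delete the characters everywhere rather than only at the ends.

```tcl
# Only the ends are examined, so an x or y at either end stops the trimming
puts [string trim x012101210y 012]          ;# prints: x012101210y
puts "'[string trim 012101210 012]'"        ;# prints: ''
puts [string trim 012abc210 012]            ;# prints: abc
# The string from the question starts with a and ends with 6: nothing to trim
puts [string trim abcdefg0123456 012]       ;# prints: abcdefg0123456
# A bracket expression removes 0, 1 and 2 wherever they occur
puts [regsub -all {[012]} x012101210y {}]   ;# prints: xy
```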

Tcl - How to replace ? with -

(You'd think this would be easy, but I'm stumped.)
I'm converting an iOS note to a text file, and the note contains "0." and "?" whenever there is a list or bullet.
This was a bulleted list
? item 20
? Item 21
? Item 22
I'm having so much trouble replacing the "?"
I don't want to replace a legitimate question mark at the end of a sentence,
but I want to replace the "?" bullets with "-" (preferably anywhere in the line, not just at the beginning)
I tried these searches - no luck
set line "? item 20"
set index_bullet [string first "(\s|\r|\n)(\?)" $line]
set index_bullet [string first "(!\w)(\?)" $line]
set index_bullet [string first ^\? $line]
This works, but it would match any question mark
set index_bullet [string first \? $line]
Does anyone know what I'm doing wrong?
How do I find and replace only question mark bullets with a "-"?
Thank you very much in advance
If you're really wanting to replace a question mark where you've got a regular expression that describes the rule, the regsub command is the right way. (The string first command finds literal substrings only. The string match command uses globbing rules.) In this case, we'll use the -all option so that every instance is replaced:
set line "? item 20"
set replaced [regsub -all {(\s|^)\?(\s)} $line {\1-\2}]
puts "'$line' --> '$replaced'"
# Prints: '? item 20' --> '- item 20'
The main trick to using regular expressions in Tcl is, as much as possible, to keep REs and their replacements in braces so that you can use Tcl metacharacters (e.g., backslash or square brackets) without having to fiddle around a lot.
Also, \s by default will match a newline.
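Because \s matches a newline, the same regsub call can be applied to a whole multi-line note at once. The sketch below uses a made-up three-line note; the question mark that ends the first sentence is preceded by a letter, so it is left alone.

```tcl
# A made-up note: one real question followed by two "?" bullets
set note "Is this a list?\n? item 20\n? Item 21"
set replaced [regsub -all {(\s|^)\?(\s)} $note {\1-\2}]
puts $replaced
# Prints:
# Is this a list?
# - item 20
# - Item 21
```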
It seems likely that a character used to indicate a list item is the first character on the line or the first character after optional whitespace. To match a question mark at the beginning of a line:
string match {\?*} $line
or
string match \\?* $line
The braces or doubled backslash keeps the question mark from being treated as a string match metacharacter.
To find a question mark after optional whitespace:
string match {\?*} [string trimleft $line]
The command returns 1 if it finds a match, and 0 if it doesn't.
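A short sketch of why the escaping matters: without it, ? is a glob wildcard that matches any single character, so every non-empty line would appear to start with a bullet.

```tcl
# Unescaped, ? matches any single character
puts [string match ?* "Why?"]                               ;# prints: 1
# Escaped, it only matches a literal question mark
puts [string match {\?*} "Why?"]                            ;# prints: 0
puts [string match {\?*} "? item 20"]                       ;# prints: 1
# Leading whitespace has to be trimmed before the test
puts [string match {\?*} "   ? Item 21"]                    ;# prints: 0
puts [string match {\?*} [string trimleft "   ? Item 21"]]  ;# prints: 1
```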
To do this with string first, use
if {[string first ? [string trimleft $line]] eq 0} ...
but in that case, keep in mind that the index returned from string first isn't the true location of the question mark. (Use == instead of eq if you have an older Tcl.)
When you have determined that the line contains a question mark in the first non-whitespace position, a simple
set line [regsub {\?} $line -]
will perform a single substitution regardless of where it is.
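The test and the substitution can be combined in one small procedure. This is a sketch only; the name fix_bullet is hypothetical.

```tcl
# Hypothetical helper: turn a leading "?" bullet into "-", leave other lines alone
proc fix_bullet {line} {
    if {[string match {\?*} [string trimleft $line]]} {
        # Without -all, only the first question mark is replaced
        return [regsub {\?} $line -]
    }
    return $line
}

puts [fix_bullet "? item 20"]          ;# prints: - item 20
puts [fix_bullet "? really?"]          ;# prints: - really?
puts [fix_bullet "Is this a list?"]    ;# prints: Is this a list?
```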
Documentation: regsub, string, Syntax of Tcl regular expressions
I figured it out.
I did it in two steps:
1) First find the "?"
set index_bullet [string first "\?" $line]
2) Then filter out "?" that is not a bullet
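# string first only finds literal substrings, so \w needs a regular expression
set index_question_mark [lindex [regexp -inline -indices {\w\?} $line] 0 0]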
I have a solution, but please post if you have a better way of doing this.
Thanks!

regsub not working properly with string tolower

I'm trying to make the first letter of that pattern lowercase.
set filestr {"FooBar": "HelloWorld",}
regsub -all {([A-Z])([A-Za-z]+":)} $filestr "[string tolower "\\1"]\\2" newStr
However, string tolower is not doing anything.
This is a 2 step process in Tcl:
set tmp [regsub -all {([A-Z])([A-Za-z]+":)} $filestr {[string tolower "\1"]\2}]
"[string tolower "F"]ooBar": "HelloWorld",
Here we have added the syntax for lower casing the letter. Note how I have used non-interpolating braces instead of double quotes for the replacement part. Now we apply the subst command to actually apply the command:
set newStr [subst $tmp]
"fooBar": "HelloWorld",
In Tcl 8.7, you can do this in a single step with the new command substitution capability of regsub:
set filestr {"FooBar": "HelloWorld",}
# The backslash in the RE is just to make the highlighting here not suck
regsub -all -command {([A-Z])([A-Za-z]+\":)} $filestr {apply {{- a b} {
    string cat [string tolower $a] $b
}}} newStr
If you'd wanted to convert the entire word to lower case, you'd have been able to use this simpler version:
regsub -all -command {[A-Z][A-Za-z]+(?=\":)} $filestr {string tolower} newStr
But it doesn't work here because you need to match the whole word and pass it all through the transformation command; using lookahead constraints for the remains of the word allows those remains to be matched on the internal search for a match.
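On Tcl 8.6, where -command is not available, the subst step can also be avoided by walking over the match positions and rebuilding the string by hand. This is an alternative that does not come from the answers above; it is a sketch, and the name lower_first_of_keys is hypothetical.

```tcl
# Hypothetical helper: lower-case the first letter of each word followed by ":
proc lower_first_of_keys {text} {
    set out ""
    set pos 0
    # -indices yields {start end} pairs for the whole match and each group
    foreach {whole first rest} [regexp -all -inline -indices {([A-Z])([A-Za-z]+":)} $text] {
        lassign $first f0 f1
        # Copy the untouched text before the letter, then the lower-cased letter
        append out [string range $text $pos [expr {$f0 - 1}]]
        append out [string tolower [string range $text $f0 $f1]]
        set pos [expr {$f1 + 1}]
    }
    # Copy whatever follows the last match
    append out [string range $text $pos end]
    return $out
}

puts [lower_first_of_keys {"FooBar": "HelloWorld",}]    ;# prints: "fooBar": "HelloWorld",
```

Since nothing is passed through subst, brackets or dollar signs in the input need no quoting with this approach.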

Substring extraction in TCL

I'm trying to extract a sequence of characters from a string in TCL.
Say, I have "blahABC:blahDEF:yadamsg=abcd".
I want to extract the substring starting with "msg=" until I reach the end of the string.
Or rather I am interested in extracting "abcd" from the above example string.
Any help is greatly appreciated.
Thanks.
Regular expressions are the tool for this kind of task.
The general syntax in Tcl is:
regexp ?switches? exp string ?matchVar? ?subMatchVar subMatchVar ...?
A simple solution for your task would be:
set string blahblah&msg=abcd&yada
# Match an =, then 0-n characters that are not an &, then one &. The grouping with {} is necessary due to the special character clash between Tcl and re_syntax
set exp {=([^&]*)&}
# -> is an idiom. In principle it is the variable containing the whole match, which is thrown away so that only the submatch is used
regexp $exp $string -> subMatch
puts $subMatch
A nice tool to experiment and play with regexps is Visual Regexp (http://laurent.riesterer.free.fr/regexp/). I'd recommend downloading it and playing with it.
The relevant man pages are re_syntax, regexp and regsub
Joachim
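Applied to the colon-separated string from the question, where msg= runs to the end of the string, the same idea might look like this. It is a sketch; the variable name value is chosen for illustration.

```tcl
set string "blahABC:blahDEF:yadamsg=abcd"
# (.*)$ captures everything after msg= up to the end of the string
if {[regexp {msg=(.*)$} $string -> value]} {
    puts $value    ;# prints: abcd
}
```

Checking the return value of regexp avoids reading value when there was no match.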
Another approach: split the query parameter using & as the separator, find the element starting with "msg=" and then get the text after the =
% set string blahblah&msg=abcd&yada
blahblah&msg=abcd&yada
% lsearch -inline [split $string &] {msg=*}
msg=abcd
% string range [lsearch -inline [split $string &] {msg=*}] 4 end
abcd
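For the original colon-separated string, a version without any pattern matching is also possible using string first and string range. This is a sketch under the assumption that the first occurrence of msg= is the one wanted; marker, idx and value are illustrative names.

```tcl
set string "blahABC:blahDEF:yadamsg=abcd"
set marker "msg="
# string first returns -1 when the marker is absent
set idx [string first $marker $string]
if {$idx >= 0} {
    # Skip past the marker and take the rest of the string
    set value [string range $string [expr {$idx + [string length $marker]}] end]
    puts $value    ;# prints: abcd
}
```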
Code
proc value_of {key matches} {
    # matches is the flat list {whole key value ...} from regexp -all -inline
    set index [lsearch -exact $matches $key]
    if {$index != -1} {
        return [lindex $matches $index+1]
    }
    return ""
}
set x "blahABC:blahDEF:yadamsg=abcd:blahGHI"
set matches [regexp -all -inline {([a-zA-Z]+)=([^:]*)} $x]
puts [value_of "yadamsg" $matches]
Output:
abcd
update
upvar not needed. see comments.
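When several key=value pairs are present, the flat list returned by regexp -all -inline can be folded into a dict so that each key is looked up directly. This is a sketch with a made-up input string; pairs is an illustrative name.

```tcl
# Made-up input with three key=value fields separated by colons
set x "alpha=1:yadamsg=abcd:beta=two"
set pairs [dict create]
# Each match contributes three elements: the whole match, the key, the value
foreach {whole key value} [regexp -all -inline {([a-zA-Z]+)=([^:]*)} $x] {
    dict set pairs $key $value
}
puts [dict get $pairs yadamsg]       ;# prints: abcd
puts [dict exists $pairs missing]    ;# prints: 0
```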

Resources