how to perform substring extraction and substitution in tcl - string

I am trying to extract a substring from a string in Tcl. I wrote the code and able to do it, but I was wondering if there is any other efficient way to do it. So the exact problem is I have a string
name_ext_10a.string_10a.string.string.string
and I want to extract "name_ext", and then remove that "_" and replace it with "."; I finally want the output to be "name.ext". I wrote something like this:
set _File "[string replace $_File [string last "_" $_File] [string length $_File] "" ]"
set _File "[string replace $_File [string last "_" $_File] [string length $_File] "" ]"
set _File "[string replace $_File [string last "_" $_File] [string last "_" $_File] "." ]"
which gives me the exact output I want, but I was wondering if there is any other efficient way to do this in Tcl.

You could split that filename using underscore as a separator, and then join the first 2 elements with a dot:
% set f name_ext_10a.string_10a.string.string.string
name_ext_10a.string_10a.string.string.string
% set out [join [lrange [split $f _] 0 1] .]
name.ext
EDIT
So if "name" can have an arbitrary number of underscores:
set f "foo_bar_baz_ext_10a.string_10a.string.string.string"
set pieces [split $f _]
set name [join [lrange $pieces 0 end-3] _]
set out [join [list $name [lindex $pieces end-2]] .] ;#==> foo_bar_baz.ext
But this is getting complex. One regex should suffice -- I assume "string" can be any sequence of non-underscore chars.
set string {[^_]+}
set regex "^(.+)_($string)_10a.${string}_10a.$string.$string.$string\$"
regexp $regex $f -> name ext
set out "$name.$ext" ;#==> foo_bar_baz.ext

One way to do the extraction is with regsub:
regsub {^([^_]+)_([^_]+)_.*} $_File {\1.\2} _File
The regular expression contains ([^_]+) components, which match a sequence of non-underscore characters, plus an anchor and some underscores, and a trailing non-capturing .* which matches everything else (so we can discard it). The regsub replaces that (which is the whole string) with the concatenation of the two matched non-underscore sections with a . between, and writes it back to the _File variable where the string came from.
Note that I put the regular expression and replacement in braces. This is because they contain Tcl metacharacters (square brackets and backslashes) which I want Tcl to pass into regsub verbatim.

Related

TCL: How to remove all letters/numbers from a string?

I am using tcl programming language and trying to remove all the letters or numbers from a string. From this example, I know a general way to remove all the letters from a string (e.x. set s abcdefg0123456) is
set new_s [string trim $s "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXXYZ"]
If I want to remove all numbers from a string in general, I can do
set new_s [string trim $s "0123456789"]
Is there a more straightforward way to remove all letters/numbers?
I also notice if I want to remove a portion of numbers (e.x. 012) instead of all numbers, the following does NOT work.
set new_s [string trim $s "012"]
Can someone explain why?
Use regular expressions:
set s abcdefg0123456
regsub -all {\d+} $s {} new_s ;# Remove all digits
regsub -all {[[:alpha:]]+} $s {} new_s ;# Remove all letters
To answer your other question: string trim (and string trimleft and string trimright as “half” versions) removes a set of characters from the ends of a string (and returns the new string; it's a pure functional operation). It doesn't do anything to the interior of the string. It doesn't know anything about patterns. The default set of characters removed is “whitespace” (spaces, newlines, tabs, etc.)
When you do:
set new_s [string trim $s "012"]
You are setting the removal set to 0, 1 and 2, but it is still only the ends that get removed. Thus it will leave x012101210y entirely alone, but turn 012101210 into the empty string.

Tcl search /and remove

I have a hierarchy string
Aa/bb/cc/dd
Ff/gg/hh/ii
I can get length but don't know how to get the index of last "/"
How to get the output one hierarchy above?
Aa/bb/cc
Ff/gg/hh
To get rid of the last component like that, use one of:
regsub {/[^/]+$} $input "" — not for filenames!
join [lrange [split $input "/"] 0 end-1] "/" — not for filenames!
file dirname $input — for filenames
That has to be lifted to work over a list of values. The lmap command is convenient for that; for example:
set outputList [lmap value $inputList {file dirname $value}]

regsub not working properly with string tolower

Im trying to make the first letter of that pattern lowercase.
set filestr {"FooBar": "HelloWorld",}
regsub -all {([A-Z])([A-Za-z]+":)} $filestr "[string tolower "\\1"]\\2" newStr
However the string tolower is not doing anything
This is a 2 step process in Tcl:
set tmp [regsub -all {([A-Z])([A-Za-z]+":)} $filestr {[string tolower "\1"]\2}]
"[string tolower "F"]ooBar": "HelloWorld",
Here we have added the syntax for lower casing the letter. Note how I have used non-interpolating braces instead of double quotes for the replacement part. Now we apply the subst command to actually apply the command:
set newStr [subst $tmp]
"fooBar": "HelloWorld",
In Tcl 8.7, you can do this in a single step with the new command substitution capability of regsub:
set filestr {"FooBar": "HelloWorld",}
# The backslash in the RE is just to make the highlighting here not suck
regsub -all -command {([A-Z])([A-Za-z]+\":)} $filestr {apply {{- a b} {
string cat [string tolower $a] $b
}}} newStr
If you'd wanted to convert the entire word to lower case, you'd have been able to use this simpler version:
regsub -all -command {[A-Z][A-Za-z]+(?=\":)} $filestr {string tolower} newStr
But it doesn't work here because you need to match the whole word and pass it all through the transformation command; using lookahead constraints for the remains of the word allows those remains to be matched on the internal search for a match.

tcl : lsort -dictionary replace new lines by spaces

I use the following command to sort the content of a string
set local_object [lsort -dictionary $list_object]
this comand will replace new lines by spaces
how to avoid that ?
lsort assumes that its argument is a Tcl list. Any whitespace including newlines can separate elements of that list, but will not be preserved in the output. If you want to format the output list with one element per line you could do:
set local_object [join [lsort -dictionary $list_object] "\n"]
It all depends on how your list is built. Any string can be interpreted as list. All the white spaces are considered as a delimiter if you're treating string as list.
set str "d b a\n c"
set lst [lsort -dictionary [split $str " "]]
foreach word $lst {
puts $word
}
a
b
c
d
Split has preserved newline and used single space as a delimiter.

Substring extraction in TCL

I'm trying to extract a sequence of characters from a string in TCL.
Say, I have "blahABC:blahDEF:yadamsg=abcd".
I want to extract the substring starting with "msg=" until I reach the end of the string.
Or rather I am interested in extracting "abcd" from the above example string.
Any help is greatly appreciated.
Thanks.
Regular expressions are the tools for these kind of tasks.
The general syntax in Tcl is:
regexp ?switches? exp string ?matchVar? ?subMatchVar subMatchVar ...?
A simple solution for your task would be:
set string blahblah&msg=abcd&yada
# match pattern for a =, 0-n characters which are not an & and one &. The grouping with {} is necessary due to special charactaer clash between tcl and re_syntax
set exp {=([^&]*)&}
# -> is an idiom. In principle it is the variable containing the whole match, which is thrown away and only the submatch is used
b
regexp $exp $string -> subMatch
set $subMatch
A nice tool to experiment and play with regexps ist Visual Regexp (http://laurent.riesterer.free.fr/regexp/). I'd recommend to download it and start playing.
The relevant man pages are re_syntax, regexp and regsub
Joachim
Another approach: split the query parameter using & as the separator, find the element starting with "msg=" and then get the text after the =
% set string blahblah&msg=abcd&yada
blahblah&msg=abcd&yada
% lsearch -inline [split $string &] {msg=*}
msg=abcd
% string range [lsearch -inline [split $string &] {msg=*}] 4 end
abcd
Code
proc value_of {key matches} {
set index [lsearch $matches "yadamsg"]
if {$index != -1} {
return [lindex $matches $index+1]
}
return ""
}
set x "blahABC:blahDEF:yadamsg=abcd:blahGHI"
set matches [regexp -all -inline {([a-zA-Z]+)=([^:]*)} $x]
puts [value_of "yadamsg" $matches]
Output:
abcd
update
upvar not needed. see comments.

Resources