Parse all matches in a string in tcl

I have a string as follows
set temp "
temp : value
temp1 : value1
tempvalue
abc = 23:445:726
abcdef = 456::985
abcdef = 123:45:7
abcdef = 098:45:56:8
"
From this, I want the values after "=" on the abcdef lines collected into one variable. The output should be
"456::985 123:45:7 098:45:56:8".
I used
set result [regexp "abcdef\\s*=\\s*(\\S*)" $temp match v1]
but could not get all of the matches.
I did get the answer using regexp with -inline -all and -line, which stores the result in a list that I then traverse to get the values, but I need a one-liner:
set result [regexp -inline -all -line "abcdef\\s*=\\s*(\\S*)" $temp]
Output is
{abcdef = 456::985} 456::985 {abcdef = 123:45:7} 123:45:7 {abcdef = 098:45:56:8} 098:45:56:8
I then traverse this list to put all the values into one string, but I would like to know if there is an easier way to do this.
Thanks in advance.

Given this example, you don't need regexp. Split the string into lines and build a new list.
set r {}
foreach line [split $temp \n] {
    # skip blank lines and lines that have no equals sign
    if {[string first = $line] < 0} continue
    lappend r [string trim [lindex [split $line =] end]]
}
puts $r
That gives one list with just the bits after the equals sign. Lines without an equals sign are skipped; every line that has one contributes its value, whatever the key. If you treat the list as a string, it works as a string with the list elements separated by spaces.

Here is another approach: think of each line as 3 tokens: key, equal sign, and value:
set result {}
foreach {key eq_sign value} $temp { lappend result $value }
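This approach is simple to understand, but it treats $temp as a Tcl list, so it only works if every line consists of exactly those three whitespace-separated tokens. In particular, it will not work if the value contains spaces.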

Since you're using line-oriented matching, take advantage of the line anchor:
% regexp -inline -all -line {\S+$} $temp
456::985 123:45:7 098:45:56:8
So, to save the values as a string:
set values [join [regexp -inline -all -line {\S+$} $temp]]
If there may not be whitespace after the equals sign, use the pattern {[^=\s]+$} instead.
My whole answer assumes there is only going to be one equals sign per line.
Responding to the updated sample input:
foreach {m word} [regexp -inline -all -line {=\s*(\S+)$} $temp] {
lappend words $word
}
puts [join $words]
23:445:726 456::985 123:45:7 098:45:56:8
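For the one-liner the question asks for, the pair-wise loop above can be folded into a single expression. This is only a sketch, assuming Tcl 8.6 or later (lmap is not built into 8.5) and the asker's original abcdef pattern; the variable names full and value are illustrative.

```tcl
# Sample input from the question.
set temp "
temp : value
temp1 : value1
tempvalue
abc = 23:445:726
abcdef = 456::985
abcdef = 123:45:7
abcdef = 098:45:56:8
"

# regexp -inline -all returns {fullMatch group fullMatch group ...};
# lmap walks the list two items at a time and keeps only the captured group.
set values [join [lmap {full value} [regexp -inline -all -line {abcdef\s*=\s*(\S+)} $temp] {set value}]]
puts $values
# prints: 456::985 123:45:7 098:45:56:8
```

Dropping the abcdef prefix from the pattern would collect the value of every key = value line instead.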

Related

Powershell - matching a string which might contain whitespace

I'm using PowerShell version 3 to read the contents of a file. I then need to check whether the file contains any of several strings and replace them if it does. The issue in my case is that one of the strings I need to match may have a variable number of spaces in it (or none at all).
The string I'm matching has double quotes in it, followed by a colon (:), then whitespace (or none), and then any one of a number of statuses (which can be either alphabetic or numeric) followed by a comma. For simplicity, I'm just using a number in the code below.
$txt = (Get-Content $file)
$oldstr = "`"status`": 1,"
$newstr = "`"status`": 0,"
if (($txt.Contains($old1)) -or ($txt.Contains($oldstr)) -or ($txt.Contains($old2))) {
$txt.Replace($oldstr, $newstr).Replace($old1, $new1).replace($old2, $new2)| Set-Content -Path $file
}
The problem I'm having is matching the $oldstr which may have none, one or more spaces between the colon and the status code, which in this example is a number but it may also be several different numbers or strings. The $newstr has no need to replicate the whitespace from the $oldstr. Also, in the above example, it is using one of three conditions in the Contains. The actual data may contain none, one, two, or all three of those strings.
How can you do the match/contains and the replace of the strings which can have whitespace in them?

Use a regular expression with the -replace operator:
PS C:\> '"status": 0' -replace '"status":\s*0','"status": 1'
"status": 1
PS C:\> '"status":     0' -replace '"status":\s*0','"status": 1'
"status": 1
PS C:\> '"status":0' -replace '"status":\s*0','"status": 1'
"status": 1
In the pattern I use above:
"status": just matches the literal string "status":
\s* matches 0 or more whitespace characters
0 matches a literal 0

Here is an interesting solution for handling several match/replace pairs: a hashtable converted into one combined regex. I couldn't get a regex to work as a hash key, though, so in the foreach I apply both the table and a separate regex to $_.
# Build hashtable of search and replace values.
$file = ".\testfile.txt"
$replacements = @{
'something2' = 'somethingelse2'
'something3' = 'somethingelse3'
'morethings' = 'morethingelses'
'blabla' = 'blubbblubb'
}
# Join all keys from the hashtable into one regular expression.
[regex]$r = @($replacements.Keys | foreach { [regex]::Escape( $_ ) }) -join '|'
[scriptblock]$matchEval = { param( [Text.RegularExpressions.Match]$matchInfo )
# Return replacement value for each matched value.
$matchedValue = $matchInfo.Groups[0].Value
$replacements[$matchedValue]
}
$fileCont = Get-Content $file
# Perform both replacements on every line in the file.
$Newfile = $fileCont | ForEach {
$r.Replace( ( $_ -replace '"status":\s*0','"status": 1'), $matchEval )
}
$fileCont
"----"
$Newfile
This gives the following output from my testfile.txt:
> .\Replace-Array.ps1
"Status": 0, something2,morethings
"Status": 0, something3, blabla
----
"status": 1, somethingelse2,morethingelses
"status": 1, somethingelse3, blubbblubb

Finding a word in a string in tcl

I have this string:
1, RotD50, 88, 0.1582, 1.2264, -, 7.4, 23.6, 0.2, "San Fernando", 1971, "Santa Felita Dam (Outlet)", 6.61, Reverse, 24.69, 24.87, 389.0, 0.125, 1.2939, RSN88_SFERN_FSD172.AT2, RSN88_SFERN_FSD262.AT2 , RSN88_SFERN_FSD-UP.AT2
I want to find the indices of RSN88_SFERN_FSD172.AT2 and RSN88_SFERN_FSD262.AT2
I have tried a few scripts (like the following) but want to see if someone can help me with a rigorous script?
set currentdirc [pwd]
set fp [open _SearchResults.csv]
set count 1
foreach line [split [read $fp] \n] {
foreach word [split $line] {
set word [string trim $word ","]
set index [lsearch -exact $word "Horizontal-1 Acc.Filename"]
puts "$index"
}
}

You are going to need this:
package require csv
As before, break the data into lines and iterate over those lines. Trim the data first to avoid empty lines before or after.
foreach line [split [string trim [read $fp]] \n] {
Instead of trying to split the csv data using the split command, use the dedicated command ::csv::split from the csv package in Tcllib. You probably have it in your Tcl installation already.
set words [::csv::split $line]
When your line is split, there is unwanted whitespace around many data fields. Let's trim it off.
set words [lmap word $words {string trim $word}]
Finally, you can search for data in the list of words. Searching in each word as you did is pointless.
set index [lsearch $words RSN88_SFERN_FSD262.AT2]
Putting it together:
foreach line [split [string trim [read $fp]] \n] {
set words [::csv::split $line]
set words [lmap word $words {string trim $word}]
set index [lsearch $words RSN88_SFERN_FSD262.AT2]
puts $index
}
Documentation: csv (package), foreach, lmap (for Tcl 8.5), lmap, lsearch, package, puts, read, set, split, string

I would use the csv package to do that task, since you are dealing with a csv file. Splitting blindly will split 1, RotD50, 88, 0.1582, 1.2264, -, 7.4, 23.6, 0.2, "San Fernando" things into, for example (each element on their own line):
1,
RotD50,
88,
0.1582,
1.2264,
-,
7.4,
23.6,
0.2,
"San
Fernando"
So my suggestion is:
set currentdirc [pwd]
set fp [open [file join $currentdirc _SearchResults.csv] r]
package require csv
foreach line [split [read $fp] \n] {
set words [::csv::split $line]
set index [lsearch -exact $words "Horizontal-1 Acc.Filename"]
puts $index
}
Also, note that the list of words is the whole line. If you want to loop through the words yourself, you would test if {$word eq "Horizontal-1 Acc.Filename"} instead, and you would have to use count (which I removed in my suggestion) to keep track of the index.
If for some reason you cannot use the csv package, you can try using this instead of the line containing ::csv::split:
set all [regexp -all -inline -- {\"[^\"]+\"|[^,]+} $line]
set words [lmap w $all {set w [string trim $w {\" }]}]
(I'm using \" for the quotes only for the sake of proper syntax highlighting, you can safely use " alone)
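As a rough check of the regexp fallback, here it is applied to the sample record from the question to look up both file names. It is a sketch that assumes Tcl 8.6 (for lmap) and that quoted fields contain no escaped quotes; the variable names line and fields are illustrative.

```tcl
# Sample record from the question (a single CSV line).
set line {1, RotD50, 88, 0.1582, 1.2264, -, 7.4, 23.6, 0.2, "San Fernando", 1971, "Santa Felita Dam (Outlet)", 6.61, Reverse, 24.69, 24.87, 389.0, 0.125, 1.2939, RSN88_SFERN_FSD172.AT2, RSN88_SFERN_FSD262.AT2 , RSN88_SFERN_FSD-UP.AT2}

# Take each quoted string or run of non-comma characters as one field,
# then strip the surrounding spaces and double quotes.
set fields [lmap f [regexp -all -inline -- {"[^"]+"|[^,]+} $line] {
    string trim $f {" }
}]

# Report the zero-based position of each file name (-1 if absent).
foreach name {RSN88_SFERN_FSD172.AT2 RSN88_SFERN_FSD262.AT2} {
    puts "$name -> [lsearch -exact $fields $name]"
}
# prints:
# RSN88_SFERN_FSD172.AT2 -> 19
# RSN88_SFERN_FSD262.AT2 -> 20
```

Using lsearch -exact avoids glob matching, which would otherwise treat characters such as * or ? in a field name as wildcards.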

tcl : lsort -dictionary replace new lines by spaces

I use the following command to sort the content of a string
set local_object [lsort -dictionary $list_object]
This command replaces newlines with spaces.
How can I avoid that?

lsort assumes that its argument is a Tcl list. Any whitespace including newlines can separate elements of that list, but will not be preserved in the output. If you want to format the output list with one element per line you could do:
set local_object [join [lsort -dictionary $list_object] "\n"]
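If the input is really a newline-separated string rather than a proper Tcl list, one option is to split on newlines explicitly before sorting, so that spaces inside a line are not treated as separators. A small sketch with made-up sample data:

```tcl
# Hypothetical input: one item per line.
set list_object "pear10\npear2\napple"

# Split on newlines only, sort the lines, then put the newlines back.
set local_object [join [lsort -dictionary [split $list_object \n]] \n]
puts $local_object
# prints:
# apple
# pear2
# pear10
```

The -dictionary option compares embedded numbers numerically, which is why pear2 sorts before pear10.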

It all depends on how your list is built. Most strings can be interpreted as a list, and all whitespace is treated as a delimiter when you treat a string as a list.
set str "d b a\n c"
set lst [lsort -dictionary [split $str " "]]
foreach word $lst {
puts $word
}
a
b
c
d
split used a single space as the delimiter, so the newline was preserved as part of the element "a\n".

Substring extraction in TCL

I'm trying to extract a sequence of characters from a string in TCL.
Say, I have "blahABC:blahDEF:yadamsg=abcd".
I want to extract the substring starting with "msg=" until I reach the end of the string.
Or rather I am interested in extracting "abcd" from the above example string.
Any help is greatly appreciated.
Thanks.

Regular expressions are the tool for this kind of task.
The general syntax in Tcl is:
regexp ?switches? exp string ?matchVar? ?subMatchVar subMatchVar ...?
A simple solution for your task would be:
set string blahblah&msg=abcd&yada
# Match an =, then 0-n characters that are not an &, then one &. The grouping with {} is necessary because of the special-character clash between Tcl and re_syntax.
set exp {=([^&]*)&}
# -> is an idiom. In principle it is the variable that receives the whole match, which is thrown away; only the submatch is used.
regexp $exp $string -> subMatch
puts $subMatch
A nice tool to experiment and play with regexps is Visual Regexp (http://laurent.riesterer.free.fr/regexp/). I'd recommend downloading it and starting to play.
The relevant man pages are re_syntax, regexp and regsub.
Joachim

Another approach: split the query parameter using & as the separator, find the element starting with "msg=" and then get the text after the =
% set string blahblah&msg=abcd&yada
blahblah&msg=abcd&yada
% lsearch -inline [split $string &] {msg=*}
msg=abcd
% string range [lsearch -inline [split $string &] {msg=*}] 4 end
abcd
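Both answers above work on an &-separated string. For the exact string in the question, where the value runs from "msg=" to the end, a sketch along the same lines might look like this (the variable names s, value, idx and value2 are illustrative):

```tcl
set s "blahABC:blahDEF:yadamsg=abcd"

# Regular-expression version: capture everything after "msg=" up to the end.
if {[regexp {msg=(.*)$} $s -> value]} {
    puts $value
}

# Plain string version: locate "msg=" and take the rest of the string.
set idx [string first "msg=" $s]
if {$idx >= 0} {
    set value2 [string range $s [expr {$idx + 4}] end]
    puts $value2
}
# both puts calls print: abcd
```

The string first version avoids the regular expression engine entirely, which can be easier to read when the marker is a fixed piece of text.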

Code
proc value_of {key matches} {
    # matches is the flat list from regexp -all -inline: {full key value ...}
    set index [lsearch -exact $matches $key]
    if {$index != -1} {
        return [lindex $matches $index+1]
    }
    return ""
}
set x "blahABC:blahDEF:yadamsg=abcd:blahGHI"
set matches [regexp -all -inline {([a-zA-Z]+)=([^:]*)} $x]
puts [value_of "yadamsg" $matches]
Output:
abcd

how to perform substring extraction and substitution in tcl

I am trying to extract a substring from a string in Tcl. I wrote some code that does it, but I was wondering if there is a more efficient way. The exact problem is that I have a string
name_ext_10a.string_10a.string.string.string
and I want to extract "name_ext", and then remove that "_" and replace it with "."; I finally want the output to be "name.ext". I wrote something like this:
set _File "[string replace $_File [string last "_" $_File] [string length $_File] "" ]"
set _File "[string replace $_File [string last "_" $_File] [string length $_File] "" ]"
set _File "[string replace $_File [string last "_" $_File] [string last "_" $_File] "." ]"
which gives me the exact output I want, but I was wondering if there is any other efficient way to do this in Tcl.

You could split that filename using underscore as a separator, and then join the first 2 elements with a dot:
% set f name_ext_10a.string_10a.string.string.string
name_ext_10a.string_10a.string.string.string
% set out [join [lrange [split $f _] 0 1] .]
name.ext
EDIT
So if "name" can have an arbitrary number of underscores:
set f "foo_bar_baz_ext_10a.string_10a.string.string.string"
set pieces [split $f _]
set name [join [lrange $pieces 0 end-3] _]
set out [join [list $name [lindex $pieces end-2]] .] ;#==> foo_bar_baz.ext
But this is getting complex. One regex should suffice -- I assume "string" can be any sequence of non-underscore chars.
set string {[^_]+}
set regex "^(.+)_($string)_10a.${string}_10a.$string.$string.$string\$"
regexp $regex $f -> name ext
set out "$name.$ext" ;#==> foo_bar_baz.ext

One way to do the extraction is with regsub:
regsub {^([^_]+)_([^_]+)_.*} $_File {\1.\2} _File
The regular expression contains ([^_]+) components, which match a sequence of non-underscore characters, plus an anchor and some underscores, and a trailing non-capturing .* which matches everything else (so we can discard it). The regsub replaces that (which is the whole string) with the concatenation of the two matched non-underscore sections with a . between, and writes it back to the _File variable where the string came from.
Note that I put the regular expression and replacement in braces. This is because they contain Tcl metacharacters (square brackets and backslashes) which I want Tcl to pass into regsub verbatim.
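Since regsub returns the number of substitutions it made when given a result variable, the same pattern can also tell you whether the input had the expected shape. A minimal sketch, where short_name is a hypothetical helper name and not part of the answer above:

```tcl
# Returns "name.ext" for inputs shaped like name_ext_..., or the input
# unchanged when the pattern does not match.
proc short_name {file} {
    if {[regsub {^([^_]+)_([^_]+)_.*} $file {\1.\2} result]} {
        return $result
    }
    return $file
}

puts [short_name name_ext_10a.string_10a.string.string.string]  ;# prints name.ext
puts [short_name plainname]                                      ;# prints plainname
```

Writing the result to a separate variable rather than back into the input keeps the original string available if the match fails.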
