Substring extraction in TCL - string

I'm trying to extract a sequence of characters from a string in TCL.
Say, I have "blahABC:blahDEF:yadamsg=abcd".
I want to extract the substring starting with "msg=" until I reach the end of the string.
Or rather I am interested in extracting "abcd" from the above example string.
Any help is greatly appreciated.
Thanks.

Regular expressions are the right tool for this kind of task.
The general syntax in Tcl is:
regexp ?switches? exp string ?matchVar? ?subMatchVar subMatchVar ...?
A simple solution for your task would be:
set string blahblah&msg=abcd&yada
# Match a =, then zero or more characters that are not an &, then one &. The grouping with {} is necessary because Tcl and re_syntax share special characters
set exp {=([^&]*)&}
# -> is an idiom. It is simply the name of the variable that receives the whole match, which is thrown away; only the submatch is used
regexp $exp $string -> subMatch
set subMatch
A nice tool to experiment and play with regexps is Visual Regexp (http://laurent.riesterer.free.fr/regexp/). I'd recommend downloading it and experimenting.
The relevant man pages are re_syntax, regexp and regsub.
Joachim

Another approach: split the query parameter using & as the separator, find the element starting with "msg=" and then get the text after the =
% set string blahblah&msg=abcd&yada
blahblah&msg=abcd&yada
% lsearch -inline [split $string &] {msg=*}
msg=abcd
% string range [lsearch -inline [split $string &] {msg=*}] 4 end
abcd
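For comparison, here is a minimal Python sketch of the same two ideas (this is an illustration in a different language, not part of the Tcl answers above). The function names value_after and after_marker are made up for this example. The first mirrors the split-and-search approach; the second simply takes everything after the first "msg=", which is what the original question asks for.

```python
def value_after(text, key="msg=", sep="&"):
    """Split on sep and return what follows key in the first field that starts with it."""
    for field in text.split(sep):
        if field.startswith(key):
            return field[len(key):]
    return None  # no field starts with key


def after_marker(text, marker="msg="):
    """Return everything after the first occurrence of marker, or None if it is absent."""
    _before, found, rest = text.partition(marker)
    return rest if found else None


print(value_after("blahblah&msg=abcd&yada"))         # abcd
print(after_marker("blahABC:blahDEF:yadamsg=abcd"))  # abcd
```

The split version only matches fields that begin with the key, so it would not find "yadamsg=abcd"; the partition version matches the marker anywhere in the string.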

Code
proc value_of {key matches} {
set index [lsearch -exact $matches $key]
if {$index != -1} {
return [lindex $matches $index+1]
}
return ""
}
set x "blahABC:blahDEF:yadamsg=abcd:blahGHI"
set matches [regexp -all -inline {([a-zA-Z]+)=([^:]*)} $x]
puts [value_of "yadamsg" $matches]
Output:
abcd

Related

TCL: How to remove all letters/numbers from a string?

I am using the Tcl programming language and trying to remove all the letters or numbers from a string. From this example, I know a general way to remove all the letters from a string (e.g. set s abcdefg0123456) is
set new_s [string trim $s "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"]
If I want to remove all numbers from a string in general, I can do
set new_s [string trim $s "0123456789"]
Is there a more straightforward way to remove all letters/numbers?
I also noticed that if I want to remove only some of the numbers (e.g. 012) instead of all numbers, the following does NOT work.
set new_s [string trim $s "012"]
Can someone explain why?
Use regular expressions:
set s abcdefg0123456
regsub -all {\d+} $s {} new_s ;# Remove all digits
regsub -all {[[:alpha:]]+} $s {} new_s ;# Remove all letters
To answer your other question: string trim (and string trimleft and string trimright as “half” versions) removes a set of characters from the ends of a string (and returns the new string; it's a pure functional operation). It doesn't do anything to the interior of the string. It doesn't know anything about patterns. The default set of characters removed is “whitespace” (spaces, newlines, tabs, etc.)
When you do:
set new_s [string trim $s "012"]
You are setting the removal set to 0, 1 and 2, but it is still only the ends that get removed. Thus it will leave x012101210y entirely alone, but turn 012101210 into the empty string.
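The same distinction can be seen in Python, whose str.strip also works on the ends of a string only, while re.sub removes matches anywhere. This is a rough sketch for illustration, not Tcl code; the helper names are invented here, and [A-Za-z] is an ASCII-only simplification of [[:alpha:]].

```python
import re


def trim_ends(s, chars):
    # Like Tcl's string trim: strips the given characters from both ends only.
    return s.strip(chars)


def remove_digits(s):
    # Like regsub -all {\d+}: removes digits wherever they occur.
    return re.sub(r"\d+", "", s)


def remove_letters(s):
    # ASCII letters only (a simplification of [[:alpha:]]).
    return re.sub(r"[A-Za-z]+", "", s)


print(trim_ends("x012101210y", "012"))      # x012101210y
print(repr(trim_ends("012101210", "012")))  # ''
print(remove_digits("abcdefg0123456"))      # abcdefg
print(remove_letters("abcdefg0123456"))     # 0123456
```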

Drop (subtract) the last n characters from a string of variable length

I need to drop (or replace with nothing) the last n characters of a string in PowerShell. Subtracting one string from another would also work, but I couldn't find how to do that.
I have something like this (string):
something/something/../lastsomething/NAME
where NAME is variable text that I can extract beforehand and manipulate ($name or $name.length). The length of the whole string is also available as $string.length.
How can I subtract this NAME from a string ($string)? I've looked at many approaches, including trim, replace and substring, but all of them mostly work with static words or regexes, or with the beginning of a string.
I need to get this:
something/something/../lastsomething
I've tried even such constructions:
$string.split('($NAME)')[0]
and
$string.split('[string]($NAME)')[0]
and other with get-AD* functions with join to bypass the strings, but nothing did the trick.
A simple solution is to take the substring from the beginning (0) to the last occurrence of /.
$t = 'something/something/../lastsomething/NAME'
$t.Substring(0, $t.LastIndexOf('/'))
EDIT: from your comment, the real question is how to get
-replace '($_.Name)',' '
working. The single quotes don't expand variables - so use double quotes.
To force evaluation of $_.Name you have to enclose it with $()
-replace "/$($_.Name)"
With an unknown last element /Name
> $String = 'something/something/../lastsomething/NAME'
> $String.Split('/')[-1]
NAME
> $string = $string -replace "/$($String.Split('/')[-1])"
> $string
something/something/../lastsomething
A much simpler solution is :
> Split-Path $string
something\something\..\lastsomething
> Split-Path $string -Leaf
NAME
but it changes slashes to backslashes
You can replace it with '' (nothing ... empty string) and because -replace works with regular expressions you can make sure that you only get a "match" at the end of the string like this:
$var = '/NAME'
'something/Name/something/../lastsomething/NAME' -replace "$var$",''
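As a language-neutral illustration, the two ideas in these answers (cut at the last separator, or remove a known suffix only at the end) can be sketched in Python. This is not PowerShell, and the function names are hypothetical. Comparing the suffix literally also avoids having to escape regex metacharacters that might appear in the name.

```python
def drop_last_segment(path, sep="/"):
    """Return everything before the last sep; the path is returned unchanged if sep is absent."""
    head, found, _tail = path.rpartition(sep)
    return head if found else path


def drop_suffix(text, suffix):
    """Remove suffix only when text ends with it (literal comparison, no regex)."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(drop_last_segment("something/something/../lastsomething/NAME"))
# something/something/../lastsomething
print(drop_suffix("something/Name/something/../lastsomething/NAME", "/NAME"))
# something/Name/something/../lastsomething
```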

Tcl - How to replace ? with -

(You'd think this would be easy, but I'm stumped.)
I'm converting an iOS note to a text file, and the note contains "0." and "?" whenever there is a list or bullet.
This was a bulleted list
? item 20
? Item 21
? Item 22
I'm having so much trouble replacing the "?"
I don't want to replace a legitimate question mark at the end of a sentence,
but I want to replace the "?" bullets with "-" (preferably anywhere in the line, not just at the beginning)
I tried these searches - no luck
set line "? item 20"
set index_bullet [string first "(\s|\r|\n)(\?)" $line]
set index_bullet [string first "(!\w)(\?)" $line]
set index_bullet [string first ^\? $line]
This works, but it would match any question mark
set index_bullet [string first \? $line]
Does anyone know what I'm doing wrong?
How do I find and replace only question mark bullets with a "-"?
Thank you very much in advance
If you're really wanting to replace a question mark where you've got a regular expression that describes the rule, the regsub command is the right way. (The string first command finds literal substrings only. The string match command uses globbing rules.) In this case, we'll use the -all option so that every instance is replaced:
set line "? item 20"
set replaced [regsub -all {(\s|^)\?(\s)} $line {\1-\2}]
puts "'$line' --> '$replaced'"
# Prints: '? item 20' --> '- item 20'
The main trick to using regular expressions in Tcl is, as much as possible, to keep REs and their replacements in braces so that you can use Tcl metacharacters (e.g., backslash or square brackets) without having to fiddle around a lot.
Also, \s by default will match a newline.
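To see which question marks that rule touches, here is the same regular expression tried out in Python (an illustration only; the Tcl regsub call above is the actual answer, and replace_bullets is a name made up for this sketch). A "?" counts as a bullet only when it is preceded by whitespace or the start of the string and followed by whitespace.

```python
import re


def replace_bullets(line):
    # Group 1 is the leading whitespace (or empty at the start of the string),
    # group 2 is the whitespace after the question mark; both are kept.
    return re.sub(r"(\s|^)\?(\s)", r"\1-\2", line)


print(replace_bullets("? item 20"))           # - item 20
print(replace_bullets("  ? Item 21"))         #   - Item 21
print(replace_bullets("Is this a list? No"))  # Is this a list? No
```

A question mark directly after a word is left alone because it is not preceded by whitespace.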
It seems likely that a character used to indicate a list item is the first character on the line or the first character after optional whitespace. To match a question mark at the beginning of a line:
string match {\?*} $line
or
string match \\?* $line
The braces or doubled backslash keeps the question mark from being treated as a string match metacharacter.
To find a question mark after optional whitespace:
string match {\?*} [string trimleft $line]
The command returns 1 if it finds a match, and 0 if it doesn't.
To do this with string first, use
if {[string first ? [string trimleft $line]] eq 0} ...
but in that case, keep in mind that the index returned from string first isn't the true location of the question mark. (Use == instead of eq if you have an older Tcl.)
When you have determined that the line contains a question mark in the first non-whitespace position, a simple
set line [regsub {\?} $line -]
will perform a single substitution regardless of where it is.
Documentation: regsub, string, Syntax of Tcl regular expressions
I figured it out.
I did it in two steps:
1) First find the "?"
set index_bullet [string first "\?" $line]
2) Then filter out "?" that is not a bullet
set index_question_mark [string first "\w\?" $line]
I have a solution, but please post if you have a better way of doing this.
Thanks!

search a specific sub string pattern in a string using perl

I'm a newbie to Perl. I went through "Check whether a string contains a substring" to learn how to check whether a substring is present in a string, but my scenario is a little different.
I have a string like
/home/me/Desktop/MyWork/systemfile/directory/systemfile64.elf ,
In the end this might be systemfile32.elf or systemfile16.elf, so in my Perl script I need to check whether this string contains a substring in the format systemfile*.elf.
How can I achieve this in perl ?
I'm planning to do something like this:
if(index($mainstring, _serach_for_pattern_systemfile*.elf_ ) ~= -1) {
say" Found the string";
}
You can use pattern matching:
if ($string =~ /systemfile\d\d\.elf$/){
# DoSomething
}
\d stands for a digit (0-9)
$ stands for end of string
Well
if( $mainstring =~ m'/systemfile(16|32)\.elf$' ) {
say" Found the string";
}
does the job.
For your information:
$string =~ m' ... '
is the same as
$string =~ / ... /
which checks the string against the given regular expression. This is one of the most useful features of the Perl language.
More info at http://perldoc.perl.org/perlre.html
(I used the m'' syntax to improve readability, because of the presence of another '/' character in the regexp. I could also write /\/systemfile\d+\.elf$/)
if ($string =~ /systemfile.*\.elf/) {
# Do something with the string.
}
That should match only the strings you seek (given that every time, a given string is stored in $string). Inside the curly brackets you should write your logic.
The . stands for "any character" and the * stands for "zero or more repetitions of the preceding item". So, .* means "any sequence of characters, including none". If you know that the string will end in this pattern, then it is safer to add $ at the end of the pattern to require that the string ends with it:
$string =~ /systemfile.*\.elf$/
Just don't forget to chomp $string to avoid any line-breaks that might mess with your desired output.
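To compare how strict the patterns above are, here is a small Python sketch (Python's re syntax matches Perl's for these patterns; the function name and test paths are made up for illustration). It uses \d+ so that at least one digit is required, anchors the match at the end with $, and strips a trailing newline first, which plays the role of chomp.

```python
import re

# "systemfile", one or more digits, then ".elf" at the very end of the string.
SYSTEMFILE_RE = re.compile(r"systemfile\d+\.elf$")


def is_systemfile(path):
    # rstrip("\n") is the counterpart of Perl's chomp here.
    return SYSTEMFILE_RE.search(path.rstrip("\n")) is not None


print(is_systemfile("/home/me/Desktop/MyWork/systemfile/directory/systemfile64.elf"))  # True
print(is_systemfile("/tmp/systemfile16.elf\n"))    # True
print(is_systemfile("/tmp/systemfile.elf"))        # False
print(is_systemfile("/tmp/systemfile32.elf.bak"))  # False
```

With .* instead of \d+, the third path would also match, which may or may not be what you want.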
use strict;
use warnings;
my $string = 'systemfile16.elf';
if ($string =~ /^systemfile.*\.elf$/) {
print "Found string $string";
} else {
print "String not found";
}
This will match systemfile'anythinghere'.elf if you have a set directory.
if you want to search entire string, including directory then:
my $string = 'c:\\windows\\system\\systemfile16.elf';
if ($string =~ /systemfile.*\.elf$/) {
print "Found string $string";
} else {
print "String not found";
}
If you only want to match systemfile followed by two digits and then .elf, use the methods from the other answers above. If you want systemfile followed by anything and then .elf, use one of these.

Best way to extract string between underscores using Perl

I have something like:
$string = '/mfsi_rpt/files/mfsi/reports/bval/bval_parlcont_pck_m_20130430.pdf';
I would like to extract the parlcont from the string (the word between the 2nd and 3rd underscore).
What is the best way to achieve this using Perl?
You can match this with a regular expression, by combining greedy and non-greedy matches, and using capturing parentheses to extract the part you're interested in:
if( $string =~ m:.+/.*?_(.+?)_:) {
print "$1\n";
}
The ".+/" is a greedy match, which will gobble up everything up to the last / to get past the directory components.
Then the ".*?_" is non-greedy, so it will take everything up to the first _
Then "(.+?)_" is another non-greedy to match and capture everything up to the next _
It would be cleaner to first extract the filename from the file path using File::Basename; then you can use split to pull out the desired name.
use strict;
use File::Basename;
my $string = "/mfsi_rpt/files/mfsi/reports/bval/bval_parlcont_pck_m_20130430.pdf";
my $data = ( split( /_/, basename($string) ))[1];
print "$data\n";
Output:
parlcont
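The basename-then-split idea carries over directly to other languages. A minimal Python sketch, for illustration only (second_field is a hypothetical name, and posixpath is used so that "/" is always treated as the separator):

```python
import posixpath


def second_field(path):
    """Return the text between the 1st and 2nd underscore of the file name."""
    filename = posixpath.basename(path)  # bval_parlcont_pck_m_20130430.pdf
    return filename.split("_")[1]


print(second_field("/mfsi_rpt/files/mfsi/reports/bval/bval_parlcont_pck_m_20130430.pdf"))
# parlcont
```

Taking the basename first matters here because the directory part (/mfsi_rpt/...) also contains an underscore.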

Resources