Lua pattern for parsing strings with optional part

I have to parse a string of the form value, value, value, value, value. The last two values are optional. This is my code, but it only works for the required arguments:
Regex = "([^,])+, ([^,])+, ([^,])+"
I'm using string.match to get the value into variables.

Since you're splitting the string by a comma, use gmatch:
local tParts = {}
-- Capture each run of non-comma characters, skipping the space after each comma
for sMatch in str:gmatch("%s*([^,]+)") do
    table.insert(tParts, sMatch)
end
Once the parts are stored in the table, you can check whether it contains matches at indexes 4 and 5:
if tParts[4] and tParts[5] then
    -- both optional values are present
elseif tParts[3] then
    -- the required values are present, but at least one optional value is missing
end

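In Lua patterns you can't make a capturing group optional (quantifiers apply only to single character classes, not to groups), and there is no alternation (logical OR) operator. So the answer is: it isn't possible with a single pattern that uses optional groups.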

Related

How to match a part of string before a character into one variable and all after it into another

I have a problem splitting a string into two parts on a special character.
For example:
12345#data
or
1234567#data
The first part has 5-7 characters and is separated by "#" from the second part, which holds other data (letters, numbers, it doesn't matter what).
I need to store two parts on each side of # in two variables:
x = 12345
y = data
without "#" character.
I was looking for some Lua string function like splitOn("#") or substring until character, but I haven't found that.
Use string.match and captures.
Try this:
s = "12345#data"
a,b = s:match("(.+)#(.+)")
print(a,b)
See the documentation for string.match and captures.
First of all, although Lua does not have a split function in its standard library, it does have string.gmatch, which can be used instead of a split function in many cases. Unlike a split function, string.gmatch takes a pattern that matches the non-delimiter text rather than the delimiters themselves.
It is easily achievable with the help of a negated character class with string.gmatch:
local example = "12345#data"
for i in string.gmatch(example, "[^#]+") do
print(i)
end
The [^#]+ pattern matches one or more characters other than #, so it effectively "splits" the string on a single-character delimiter.

Matlab: How to delete prefix from strings

Problem: Using regexp on TrajCompact, I find every prefix together with the value that follows it, with this code:
[digits{1:2}] = ndgrid(0:4);
for k=1:25
    matches(:,k)=regexp(TrajCompact(:,1),sprintf('%d%d.*',digits{1}(k),digits{2}(k)),'match','once');
end
I want only the postfix of each match. How can I delete the prefix from matches?
Method using regular expressions
You can put the .* section in a group by enclosing it in parentheses (i.e. (.*)). Matlab has some peculiar 'token' nomenclature for this. In any case, here is an example of how it works:
[match, group] = regexp('25blah',sprintf('%d%d(.*)',2,5),'match','once','tokens');
Then:
match would be a char array containing '25blah'
group would be a 1x1 cell array containing the string 'blah'.
That is, the variable group would hold what you're looking for.
Hack method
Since your prefix is always two digits, you could also just take everything from the 3rd character of the match onwards:
my_string = match(3:end);
Other comments
You may want to require the prefix to occur at the beginning of the string by adding ^ to the beginning of your regular expression, e.g. change the line to:
[match, group] = regexp('25blah',sprintf('^%d%d(.*)',2,5),'match','once','tokens');
As it is, your current regular expression would match strings like zzzzzzzzz25stuff. I'm not sure if you want that (assuming it can occur in your data).

split string by char

Scala has a standard way of splitting a string in StringOps.split.
Its behaviour somewhat surprised me, though.
To demonstrate, using this quick convenience function
def sp(str: String) = str.split('.').toList
the following expressions all evaluate to true
(sp("") == List("")) //expected
(sp(".") == List()) //I would have expected List("", "")
(sp("a.b") == List("a", "b")) //expected
(sp(".b") == List("", "b")) //expected
(sp("a.") == List("a")) //I would have expected List("a", "")
(sp("..") == List()) // I would have expected List("", "", "")
(sp(".a.") == List("", "a")) // I would have expected List("", "a", "")
So I expected split to return an array with (the number of separator occurrences) + 1 elements, but that's clearly not the case.
It is almost that, except that all trailing empty strings are removed; but even that does not hold when splitting the empty string.
I'm failing to identify the pattern here. What rules does StringOps.split follow?
For bonus points, is there a good way (without too much copying/string appending) to get the split I'm expecting?
For the curious, the code is here: https://github.com/scala/scala/blob/v2.12.0-M1/src/library/scala/collection/immutable/StringLike.scala
See the split function that takes a character as its argument (line 206).
I think the general pattern here is that all trailing empty split results are ignored.
The exception is the first case, where the "if no separator char is found, just return the whole string" logic applies.
I am trying to find out whether there is any design documentation on this.
Also, if you use a String instead of a Char as the separator, it falls back to Java's regex split. As mentioned by @LRLucena, if you pass a limit parameter larger than the number of parts, you will get your trailing empty results. See http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String,%20int)
You can use split with a regular expression. I'm not sure, but I guess the second parameter is the maximum size of the resulting array.
def sp(str: String) = str.split("\\.", str.length+1).toList
Seems to be consistent with these three rules:
1) Trailing empty substrings are dropped.
2) An empty substring is considered trailing before it is considered leading, if applicable.
3) First case, with no separators is an exception.
split follows the behaviour of http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
That is split "around" the separator character, with the following exceptions:
Regardless of anything else, splitting the empty string will always give Array("")
Any trailing empty substrings are removed
Surrogate characters only match if the matched character is not part of a surrogate pair.
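Since the answers above point to java.lang.String.split as the underlying behaviour, the cases from the question can be checked directly in Java. This is only a sketch: the class name and the two helpers (spDefault, spKeepAll) are made up for illustration, and it exercises Java's split rather than Scala's StringOps.split itself.

```java
import java.util.Arrays;
import java.util.List;

class SplitTrailing {
    // Hypothetical helper: split on '.' with the default limit of 0,
    // which discards trailing empty strings.
    static List<String> spDefault(String s) {
        return Arrays.asList(s.split("\\."));
    }

    // Hypothetical helper: a negative limit keeps trailing empty strings.
    static List<String> spKeepAll(String s) {
        return Arrays.asList(s.split("\\.", -1));
    }

    public static void main(String[] args) {
        // Same inputs as in the question; prints both variants side by side.
        for (String s : new String[] {"", ".", "a.b", ".b", "a.", "..", ".a."}) {
            System.out.println("\"" + s + "\": " + spDefault(s) + " vs " + spKeepAll(s));
        }
    }
}
```

With the negative limit, the result has (number of separators) + 1 elements, which is the behaviour the question was expecting.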

Algorithms for "shortening" strings?

I am looking for elegant ways to "shorten" the (user-provided) names of objects. More precisely:
my users can enter free text (used as the "name" of some object) of up to 64 chars (including whitespace, punctuation marks, ...)
in addition to that "long" name, we also have a "reduced" name (exactly 8 characters), required for some legacy interface
Now I am looking for thoughts on how to generate these "reduced" names from the 64-char name.
By "elegant" I mean any useful idea that "might" allow the user to recognize something of value in the shortened string.
Like, if the name is "Production Test Item A5"; then maybe "PTIA5" might (or might not) tell the user something useful.
Apply a substring method to the long version, trim it, in case there are any whitespace characters at the end, optionally remove any special characters from the very end (such as dashes) and finally add a dot, in case you want to indicate your abbreviation that way.
Just a quick hack to get you started:
String longVersion = "Aswaghtde-5d";
// Get substring 0..8 characters
String shortVersion = longVersion.substring(0, (longVersion.length() < 8 ? longVersion.length() : 8));
// Remove whitespace characters from end of String
shortVersion = shortVersion.trim();
// Remove any characters other than letters, digits and whitespace from end of String
shortVersion = shortVersion.replaceAll("[^a-zA-Z0-9\\s]+$", "");
// Add dot to end
shortVersion = shortVersion.substring(0, (shortVersion.length() < 8 ? shortVersion.length() : shortVersion.length() - 1)) + ".";
System.out.println(shortVersion);
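The initials idea from the question ("Production Test Item A5" becoming "PTIA5") could be sketched roughly as follows. The rule used here (first character of each word, plus any digits in the word) is an assumption chosen to reproduce that one example, and the class and method names are hypothetical. The result is truncated to the maximum length but not padded to exactly 8 characters.

```java
class InitialsAbbreviation {
    // Builds an abbreviation from the first character of each word plus
    // any digits found later in the word, then truncates to maxLen.
    static String abbreviate(String name, int maxLen) {
        StringBuilder sb = new StringBuilder();
        for (String word : name.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the whole name is blank
            }
            sb.append(Character.toUpperCase(word.charAt(0)));
            for (int i = 1; i < word.length(); i++) {
                char c = word.charAt(i);
                if (Character.isDigit(c)) {
                    sb.append(c); // keep version-like digits such as the 5 in "A5"
                }
            }
        }
        return sb.length() > maxLen ? sb.substring(0, maxLen) : sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(abbreviate("Production Test Item A5", 8));
    }
}
```

Different long names can easily collide under this rule, so a legacy interface that needs unique reduced names would still need some disambiguation on top.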
I needed to shorten names to function as column names in a database. Ideally, the names should be recognizable for users. I set up a dictionary of patterns for commonly occurring words with corresponding "abbreviations". This was applied ONLY to those names which were over the limit of 30 characters.
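A minimal sketch of that dictionary approach might look like this. The table entries, class name and method name are all invented for illustration; a real table would come from the words that actually occur in your data, and the result may still exceed the limit if too few words match.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DictionaryShortener {
    // Hypothetical abbreviation table, applied in insertion order.
    static final Map<String, String> ABBREVIATIONS = new LinkedHashMap<>();
    static {
        ABBREVIATIONS.put("Production", "Prod");
        ABBREVIATIONS.put("Number", "No");
        ABBREVIATIONS.put("Temperature", "Temp");
    }

    // Leaves names within the limit untouched; otherwise replaces every
    // known word with its abbreviation.
    static String shorten(String name, int limit) {
        if (name.length() <= limit) {
            return name;
        }
        String result = name;
        for (Map.Entry<String, String> e : ABBREVIATIONS.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(shorten("Production Line Temperature Sensor Number", 30));
    }
}
```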

Splitting a Comma-Separated String in Scala: Missing Trailing Empty Strings?

I have a data file in csv format.
I am trying to split each line using the basic split command line.split(',')
But when I get a string like "2,,",
instead of returning an array like Array("2", "", ""),
I just get Array("2").
I am most definitely missing something basic. Could someone point out the correct way to split a comma-separated string here?
This is inherited from Java. You can achieve the behavior you want by using the split(String regex, int limit) overload:
"2,,".split(",", -1) // = Array("2", "", "")
Note the String instead of Char.
As explained by the Java Docs, the limit parameter is used as follows:
The limit parameter controls the number of times the pattern is
applied and therefore affects the length of the resulting array. If
the limit n is greater than zero then the pattern will be applied at
most n - 1 times, the array's length will be no greater than n, and
the array's last entry will contain all input beyond the last matched
delimiter. If n is non-positive then the pattern will be applied as
many times as possible and the array can have any length. If n is zero
then the pattern will be applied as many times as possible, the array
can have any length, and trailing empty strings will be discarded.
Source
Using split(separator: Char) will call the overload above, using a limit of zero.
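The three regimes of the limit parameter described in the quote can be seen in a small Java sketch. The class and the splitCsv helper are hypothetical names; only String.split(String, int) itself comes from the JDK.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Hypothetical helper: splits a CSV line and keeps trailing empty fields.
    static String[] splitCsv(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        // limit == 0: as many splits as possible, trailing empty strings discarded
        System.out.println(Arrays.toString("2,,".split(",", 0)));
        // limit < 0: as many splits as possible, trailing empty strings kept
        System.out.println(Arrays.toString(splitCsv("2,,")));
        // limit > 0: at most limit - 1 splits; the last entry holds the remainder
        System.out.println(Arrays.toString("a,b,c".split(",", 2)));
    }
}
```

Note that this simple split does not handle quoted fields containing commas; for real CSV data a dedicated parser is usually the safer choice.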
