split string by char - string

scala has a standard way of splitting a string in StringOps.split
it's behaviour somewhat surprised me though.
To demonstrate, using the quick convenience function
def sp(str: String) = str.split('.').toList
the following expressions all evaluate to true
(sp("") == List("")) //expected
(sp(".") == List()) //I would have expected List("", "")
(sp("a.b") == List("a", "b")) //expected
(sp(".b") == List("", "b")) //expected
(sp("a.") == List("a")) //I would have expected List("a", "")
(sp("..") == List()) // I would have expected List("", "", "")
(sp(".a.") == List("", "a")) // I would have expected List("", "a", "")
so I expected that split would return an array with (the number a separator occurrences) + 1 elements, but that's clearly not the case.
It is almost the above, but remove all trailing empty strings, but that's not true for splitting the empty string.
I'm failing to identify the pattern here. What rules does StringOps.split follow?
For bonus points, is there a good way (without too much copying/string appending) to get the split I'm expecting?

For curious you can find the code here.https://github.com/scala/scala/blob/v2.12.0-M1/src/library/scala/collection/immutable/StringLike.scala
See the split function with the character as an argument(line 206).
I think, the general pattern going on over here is, all the trailing empty splits results are getting ignored.
Except for the first one, for which "if no separator char is found then just send the whole string" logic is getting applied.
I am trying to find if there is any design documentation around these.
Also, if you use string instead of char for separator it will fall back to java regex split. As mentioned by #LRLucena, if you provide the limit parameter with a value more than size, you will get your trailing empty results. see http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String,%20int)

You can use split with a regular expression. I´m not sure, but I guess that the second parameter is the largest size of the resulting array.
def sp(str: String) = str.split("\\.", str.length+1).toList

Seems to be consistent with these three rules:
1) Trailing empty substrings are dropped.
2) An empty substring is considered trailing before it is considered leading, if applicable.
3) First case, with no separators is an exception.

split follows the behaviour of http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
That is split "around" the separator character, with the following exceptions:
Regardless of anything else, splitting the empty string will always give Array("")
Any trailing empty substrings are removed
Surrogate characters only match if the matched character is not part of a surrogate pair.

Related

Python - Replacing repeated consonants with other values in a string

I want to write a function that, given a string, returns a new string in which occurences of a sequence of the same consonant with 2 or more elements are replaced with the same sequence except the first consonant - which should be replaced with the character 'm'.
The explanation was probably very confusing, so here are some examples:
"hello world" should return "hemlo world"
"Hannibal" should return "Hamnibal"
"error" should return "emror"
"although" should return "although" (returns the same string because none of the characters are repeated in a sequence)
"bbb" should return "mbb"
I looked into using regex but wasn't able to achieve what I wanted. Any help is appreciated.
Thank you in advance!
Regex is probably the best tool for the job here. The 'correct' expression is
test = """
hello world
Hannibal
error
although
bbb
"""
output = re.sub(r'(.)\1+', lambda g:f'm{g.group(0)[1:]}', test)
# '''
# hemlo world
# Hamnibal
# emror
# although
# mbb
# '''
The only real complicated part of this is the lambda that we give as an argument. re.sub() can accept one as its 'replacement criteria' - it gets passed a regex object (which we call .group(0) on to get the full match, i.e. all of the repeated letters) and should output a string, with which to replace whatever was matched. Here, we use it to output the character 'm' followed by the second character onwards of the match, in an f-string.
The regex itself is pretty straightforward as well. Any character (.), then the same character (\1) again one or more times (+). If you wanted just alphanumerics (i.e. not to replace duplicate whitespace characters), you could use (\w) instead of (.)

how to move special char in a string from beginning to the end in crystal reports

in my program I (after many procedures) get tokenized words. Unfortunately due to reversing them they hold punctuation characters at the beginning of a word eg. "BARGE "UR 106
How to move that " from the beginning to the end -> BARGE "UR 106"
another example: (.REIS (HASAN M should be -> REIS (HASAN M.)
Up to now I've tried:
{DOCHS14.SHEM__ONIA} startswith ["\"","\(","\."\,"\(\."]
then
Local StringVar str:={DOCHS14.SHEM__ONIA}[0]
TrimLeft ({DOCHS14.SHEM__ONIA})
{DOCHS14.SHEM__ONIA}&str;
But that gives me errors:
A number, currency amount, boolean, date, time, date-time, or string is expected here.
How to fix that? or is there another way to solve this problem?
There are multiple issues in your formula.
startswith expects a string not an array of strings.
Only the double qoute must be escaped, but you did it wrong. (See here)
While a solution with startswith is also possible, I have used the Left-function instead. In your third example one have to check two characters, so this must be checked first and output another result.
if Left({DOCHS14.SHEM__ONIA}, 2) = "(." Then
Mid({DOCHS14.SHEM__ONIA}, 3) + ".)"
else if Left({DOCHS14.SHEM__ONIA}, 1) in ["""", "(", ".", ","] Then
Mid({DOCHS14.SHEM__ONIA}, 2) + Left({DOCHS14.SHEM__ONIA}, 1)
else
{DOCHS14.SHEM__ONIA}

How to match a part of string before a character into one variable and all after it into another

I have a problem with splitting string into two parts on special character.
For example:
12345#data
or
1234567#data
I have 5-7 characters in first part separated with "#" from second part, where are another data (characters,numbers, doesn't matter what)
I need to store two parts on each side of # in two variables:
x = 12345
y = data
without "#" character.
I was looking for some Lua string function like splitOn("#") or substring until character, but I haven't found that.
Use string.match and captures.
Try this:
s = "12345#data"
a,b = s:match("(.+)#(.+)")
print(a,b)
See this documentation:
First of all, although Lua does not have a split function is its standard library, it does have string.gmatch, which can be used instead of a split function in many cases. Unlike a split function, string.gmatch takes a pattern to match the non-delimiter text, instead of the delimiters themselves
It is easily achievable with the help of a negated character class with string.gmatch:
local example = "12345#data"
for i in string.gmatch(example, "[^#]+") do
print(i)
end
See IDEONE demo
The [^#]+ pattern matches one or more characters other than # (so, it "splits" a string with 1 character).

How to check if two strings can be made equal by using recursion?

I am trying to practice recursion, but at the moment I don't quite understand it well...
I want to write a recursive Boolean function which takes 2 strings as arguments, and returns true if the second string can be made equal to the first by replacing some letters with a certain special character.
I'll demonstrate what I mean:
Let s1 = "hello", s2 = "h%lo", where '%' is the special character.
The function will return true since '%' can replace "el", causing the two strings to be equal.
Another example:
Let s1 = "hello", s2 = "h%l".
The function will return false since an 'o' is lacking in the second string, and there is no special character that can replace the 'o' (h%l% would return true).
Now the problem isn't so much with writing the code, but with understanding how to solve the problem in general, I don't even know where to begin.
If someone could guide me in the right direction I would be very grateful, even by just using English words, I'll try to translate it to code (Java)...
Thank you.
So this is relatively easy to do in Python. The method I chose was to put the first string ("hello") into an array then iterate over the second string ("h%lo") comparing the elements to those in the array. If the element was in the array i.e. 'h', 'l', 'o' then I would pop it from the array. The resulting array is then ['e','l']. The special character can be found as it is the element which does not exist in the initial array.
One can then substitute the special character for the joined array "el" in the string and compare with the first string.
In the first case this will give "hello" == "hello" -> True
In the second case this will give "hello" == "helol" -> False
I hope this helps and makes sense.

Splitting a Comma-Separated String in Scala: Missing Trailing Empty Strings?

I have a data file in csv format.
I am trying to split each line using the basic split command line.split(',')
But when I get a string like this "2,,",
instead of returning an array as thus Array(2,"","")
I just get an Array: Array(2).
I am most definitely missing something basic, could someone help point out the correct way to split a comma separated string here?
This is inherited from Java. You can achieve behavior you want by using the split(String regex, int limit) overload:
"2,,".split(",", -1) // = Array(2, "", "")
Note the String instead of Char.
As explained by the Java Docs, the limit parameter is used as follows:
The limit parameter controls the number of times the pattern is
applied and therefore affects the length of the resulting array. If
the limit n is greater than zero then the pattern will be applied at
most n - 1 times, the array's length will be no greater than n, and
the array's last entry will contain all input beyond the last matched
delimiter. If n is non-positive then the pattern will be applied as
many times as possible and the array can have any length. If n is zero
then the pattern will be applied as many times as possible, the array
can have any length, and trailing empty strings will be discarded.
Source
Using split(separator: Char) will call the overload above, using a limit of zero.

Resources