How to split a String delimited by space in groovy - string

I am trying to split my String on spaces and extract the second token of the String value.
Here is my code -
String title = 'testing abcdefgh.abc.cde.fgh test testing issue'
String[] test = title.split(" ");
String[] newTest = test[1];
println newTest
Here is the output I am getting: [a, b, c, d, ., a, b, c, ., c, d, e, ., f, g, h]
The output I am looking for is abcd.abc.cde.fgh, but I am getting [a, b, c, d, ., a, b, c, ., c, d, e, ., f, g, h]
I have tried ("\s+"), ("\s"), just a space inside the brackets, and a space enclosed in single and in double quotes; nothing works.

That's because of this line:
String[] newTest = test[1]
It tells Groovy to put the string you want into a String array, so Groovy does what you say and splits the characters out into separate one-character strings.
What you want is just a String, not a String array. Try this instead:
String newTest = test[1]

I think the problem is that in Java you have to escape a backslash in a String literal. So while the pattern you want is \s+, you have to write it in Java as "\\s+".

Related

What does "/" signify in split(self, /, sep=None, maxsplit=-1)?

str.split = split(self, /, sep=None, maxsplit=-1)
    Return a list of the words in the string, using sep as the delimiter string.
    sep
      The delimiter according which to split the string.
      None (the default value) means split according to any whitespace,
      and discard empty strings from the result.
    maxsplit
      Maximum number of splits to do.
      -1 (the default value) means no limit.
The /, which would seem to be a second argument, is a new notation to me. What is it doing there?
From What's New in Python 3.8:
Positional-only parameters
There is a new function parameter syntax / to indicate that some function parameters must be specified positionally and cannot be used as keyword arguments.
In the following example, parameters a and b are positional-only, while c or d can be positional or keyword, and e or f are required to be keywords:
def f(a, b, /, c, d, *, e, f):
    print(a, b, c, d, e, f)
One use case for this notation is that it allows pure Python functions to fully emulate behaviors of existing C coded functions.
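As a small illustration of the rule quoted above, here is a sketch (assuming Python 3.8 or later) that calls the same f in allowed and disallowed ways, and shows that in str.split only self sits before the slash, so sep and maxsplit can still be passed by keyword. The variable names are made up for the example, and f returns its arguments instead of printing them so the results can be inspected.

```python
def f(a, b, /, c, d, *, e, f):
    return (a, b, c, d, e, f)

# c and d may be passed positionally or by keyword; e and f must be keywords.
ok_positional = f(1, 2, 3, 4, e=5, f=6)
ok_keyword = f(1, 2, c=3, d=4, e=5, f=6)

# Passing the positional-only parameters a and b by keyword raises TypeError.
try:
    f(a=1, b=2, c=3, d=4, e=5, f=6)
    rejected = False
except TypeError:
    rejected = True

# In str.split only self precedes the slash, so these keywords are accepted.
parts = "a,b,c".split(sep=",", maxsplit=1)
print(ok_positional, ok_keyword, rejected, parts)
```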

Copying string failure in Fortran

I would like to have a procedure which makes a local copy b of an input character string a (whose length is not known in advance) into an allocatable array of characters. I have the following code:
program test_copystr
  character(len=6) :: str
  str = 'abc'
  call copystr(str)
contains
  subroutine copystr(a)
    character(len=*), intent(in) :: a
    !> Local variables
    integer :: i
    character, allocatable :: b(:)
    allocate(b(len_trim(a)))
    do i=1, len_trim(a)
      b(i) = a(i:i)
    end do
    print *, b
    b(1:len_trim(a)) = a(1:len_trim(a))
    print *, b
  end subroutine copystr
end program test_copystr
where I'm trying to assign a to b in two different ways. The result is
abc
aaa
I thought that both assignments should yield the same output. Can anyone explain the difference to me? (To compile this code I'm using the gfortran 5.2.0 compiler.)
As you know b is an array of characters while a is a scalar; when the subroutine is called it is a 6-character string. These are different things. The statement
b(1:len_trim(a)) = a(1:len_trim(a))
specifies the array section b(1:3) on the lhs, that is, all 3 elements of b, and the substring a(1:3) on the rhs. Now, when assigning a substring of length 3 to a single character, such as any element of b, Fortran assigns only the first character of the string.
In this case every element of b is set to the first character of a. It is as if the compiler generates the 3 statements
b(1) = 'abc'
b(2) = 'abc'
b(3) = 'abc'
to implement the array assignment. This is what Fortran's array syntax does with an array on the lhs and a scalar (expression) on the rhs: it broadcasts the scalar to each element of the array.
The first method you use, looping across the elements of b and the characters of a, is the regular way to make an array of characters equivalent to a string. But you could try transfer -- see my answer to this question: Removing whitespace in string
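To make the two behaviours concrete outside Fortran, here is a rough Python emulation. It is only a sketch of the semantics described above, not Fortran code: the helper assign_scalar_to_char_array is a hypothetical name, and Python lists of one-character strings stand in for the character array b.

```python
def assign_scalar_to_char_array(n, value):
    # Each element holds a single character, so the scalar string is
    # truncated to its first character and broadcast to all n elements.
    return [value[:1]] * n

a = "abc   "           # stands in for character(len=6), blank padded
n = len(a.rstrip())    # plays the role of len_trim(a)

# b(i) = a(i:i) inside the loop: one character per element
looped = [a[i] for i in range(n)]

# b(1:n) = a(1:n): a scalar substring assigned to an array section
broadcast = assign_scalar_to_char_array(n, a[:n])

print("".join(looped))      # abc
print("".join(broadcast))   # aaa
```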

Lua: Substitute list of characters in string

Is it possible to substitute characters according to a list in Lua, like tr in Perl? For example, I would like to substitute A to B and B to A (e.g. AABBCC becomes BBAACC).
In Perl, the solution would be $str =~ tr/AB/BA/. Is there any native way of doing this in Lua? If not, I think the best solution would be iterating through the entire string, since separate substitutions need to use a special symbol to distinguish characters that were already substituted from characters that weren't.
Edit: my goal was to calculate the reverse complement of a DNA string, as described here.
string.gsub can take a table as the third argument. The table is queried for each match, using the first capture as the key, and the associated value is used as the replacement string. If the value is nil, the match is not changed.
So you can build a helper table like this:
local s = "AABBCC"
local t = {A = "B", B = "A"}
local result = string.gsub(s, "[AB]", t)
print(result)
or this same one-liner:
print((string.gsub("AABBCC", "[AB]", {A = "B", B = "A"})))
Output:
BBAACC
For a one-character pattern like "[AB]", "." works as well, because any character not found in the table is left unchanged (though I don't think that is more efficient). For more complicated cases, a good pattern is needed.
Here is an example from Programming in Lua: this function substitutes the value of the global variable varname for every occurrence of $varname in a string:
function expand (s)
  return (string.gsub(s, "$(%w+)", _G))
end
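For comparison only, the same table-lookup idea can be sketched in Python rather than Lua: re.sub accepts a function that looks each match up in a dictionary, and str.translate is the closer counterpart of Perl's tr. The helper name tr and the variable names are made up for this example, and the mapping is assumed to be non-empty with single-character keys.

```python
import re

def tr(s, mapping):
    # Build a character class from the keys, then look each match up in the mapping.
    pattern = "[" + re.escape("".join(mapping)) + "]"
    return re.sub(pattern, lambda m: mapping[m.group(0)], s)

swapped = tr("AABBCC", {"A": "B", "B": "A"})

# str.translate performs the whole substitution in a single pass as well.
translated = "AABBCC".translate(str.maketrans("AB", "BA"))

print(swapped, translated)
```

Both approaches replace every character in one pass, so no placeholder symbol is needed to tell already-substituted characters apart.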
The code below will replace each character with a desired mapping (or leave alone if no mapping exists). You could modify the second parameter to string.gsub in tr to be more specific if you know the exact range of characters.
s = "AABBCC"
mappings = {["A"]="B",["B"]="A"}
function tr(s,mappings)
  return string.gsub(s,
    "(.)",
    function(m)
      -- print("found",m,"replace with",mappings[m],mappings[m] or m)
      if mappings[m] == nil then return m else return mappings[m] end
    end
  )
end
print(tr(s,mappings))
Outputs (with the debugging print line uncommented)
henry@henry-pc:~/Desktop$ lua replace.lua
found A replace with B B
found A replace with B B
found B replace with A A
found B replace with A A
found C replace with nil C
found C replace with nil C
BBAACC 6

scala string.split does not work

Following is my REPL output. I am not sure why string.split does not work here.
scala> val s = "Pedro|groceries|apple|1.42"
s: java.lang.String = Pedro|groceries|apple|1.42
scala> s.split("|")
res27: Array[java.lang.String] = Array("", P, e, d, r, o, |, g, r, o, c, e, r, i, e, s, |, a, p, p, l, e, |, 1, ., 4, 2)
If you use quotes, you're asking for a regular expression split. | is the "or" character, so your regex matches nothing or nothing. So everything is split.
If you use split('|') or split("""\|""") you should get what you want.
| is a special regular expression character which is used as a logical operator for OR operations.
Since java.lang.String#split(String regex) takes a regular expression, you're splitting the string on "none OR none". That is a special case of regular expression splitting, where "none" essentially means "between every single character".
To get what you want, you need to escape your regex pattern properly. To escape the pattern, you prepend the character with \, and since \ is itself a special character in String literals (think \t and \r, for example), you need to double it, so that you end up with s.split("\\|").
For full Java regular expression syntax, see java.util.regex.Pattern javadoc.
Split takes a regex as first argument, so your call is interpreted as "empty string or empty string". To get the expected behavior you need to escape the pipe character "\\|".
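The same pitfall can be reproduced outside the JVM. The sketch below uses Python's re module (3.7 or later) instead of java.util.regex, so the exact shape of the unescaped result differs from the Scala output above, but the cause is the same: an unescaped | is alternation between two empty patterns.

```python
import re

s = "Pedro|groceries|apple|1.42"

# Unescaped, "|" matches the empty string at every position,
# so the input is cut between every pair of characters.
unescaped = re.split("|", s)

# Escaped by hand or via re.escape, it matches the literal pipe character.
escaped = re.split(r"\|", s)
quoted = re.split(re.escape("|"), s)

# A plain string split never involves a regular expression at all.
literal = s.split("|")

print(escaped)
```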

Sentence that uses every base64 character

I am trying to construct a sentence/letter combination that will return every base64 character, but failing to find a word for purposes of unit testing.
The unit tests I have so far are failing to hit the lines that handle the + and / characters. While I can sling them at the encoder/decoder directly, it would be nice to have a human-readable source (the base64 equivalent of 'the quick brown dog').
Here is a Base64 encoded test string that includes all 64 possible Base64 symbols:
char base64_encoded_test[] =
"U28/PHA+VGhpcyA0LCA1LCA2LCA3LCA4LCA5LCB6LCB7LCB8LCB9IHRlc3RzIEJhc2U2NCBlbmNv"
"ZGVyLiBTaG93IG1lOiBALCBBLCBCLCBDLCBELCBFLCBGLCBHLCBILCBJLCBKLCBLLCBMLCBNLCBO"
"LCBPLCBQLCBRLCBSLCBTLCBULCBVLCBWLCBXLCBYLCBZLCBaLCBbLCBcLCBdLCBeLCBfLCBgLCBh"
"LCBiLCBjLCBkLCBlLCBmLCBnLCBoLCBpLCBqLCBrLCBsLCBtLCBuLCBvLCBwLCBxLCByLCBzLg==";
char base64url_encoded_test[] =
"U28_PHA-VGhpcyA0LCA1LCA2LCA3LCA4LCA5LCB6LCB7LCB8LCB9IHRlc3RzIEJhc2U2NCBlbmNv"
"ZGVyLiBTaG93IG1lOiBALCBBLCBCLCBDLCBELCBFLCBGLCBHLCBILCBJLCBKLCBLLCBMLCBNLCBO"
"LCBPLCBQLCBRLCBSLCBTLCBULCBVLCBWLCBXLCBYLCBZLCBaLCBbLCBcLCBdLCBeLCBfLCBgLCBh"
"LCBiLCBjLCBkLCBlLCBmLCBnLCBoLCBpLCBqLCBrLCBsLCBtLCBuLCBvLCBwLCBxLCByLCBzLg==";
It decodes to a string composed entirely of relatively human-readable text:
char test_string[] = "So?<p>"
"This 4, 5, 6, 7, 8, 9, z, {, |, } tests Base64 encoder. "
"Show me: @, A, B, C, D, E, F, G, H, I, J, K, L, M, "
"N, O, P, Q, R, S, T, U, V, W, X, Y, Z, [, \\, ], ^, _, `, "
"a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s.";
This decoded string contains only letters in the limited range of isprint()'able 7-bit ASCII characters (space through '~').
Since I did it, I would argue that it is possible :-).
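The opening of that test string can be checked with a few lines of Python. This is only a sketch using the standard base64 module; ALPHABET and missing_symbols are hypothetical helper names introduced here to report which of the 64 symbols a given encoded string lacks.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def missing_symbols(encoded):
    """Return the standard Base64 symbols that never occur in `encoded`."""
    return sorted(set(ALPHABET) - set(encoded))

# "?" ends on a six-bit group of all ones, and "<p>" yields the group for "+".
prefix = base64.b64encode(b"So?<p>").decode("ascii")
url_prefix = base64.urlsafe_b64encode(b"So?<p>").decode("ascii")

print(prefix)       # U28/PHA+
print(url_prefix)   # U28_PHA-
print(len(missing_symbols(prefix)), "symbols not yet covered by the prefix")
```

Because six-bit groups straddle byte boundaries, printable 7-bit characters can still produce / and +, which is what the prefix "So?<p>" exploits.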
You probably can't do that.
/ in base64 encodes 111111 (6 '1' bits).
As all ASCII characters (which are the type-able and printable ones) are in the range 0-127 (i.e. 00000000 through 01111111), the only ASCII character that could be encoded using '/' is the ASCII character with the code 127, which is the non-printable DEL character.
If you allow values higher than 127, you could have a printable but non-typeable string.
When attempting to encode/decode, this is the one place where I break the rule of unit testing a single method at once. You can have methods for encoding or decoding separately, but the only way to tell if you're doing it correctly is to use both encoding and decoding in a single assert. I would use the following pseudocode:
1. Generate a random string using Path.GetRandomFileName(); this string is cryptographically strong
2. Pass the string to the encode method
3. Pass the output of the encode to the decode method
4. Assert.AreEqual(input from GetRandomFileName, output from Decode)
You can loop over this as many times as you want in order to say it's tested. You can also cover some specific cases; however, since encoding (sometimes) differs based on the positioning of the letters, you're better off going with a random string and just calling encode/decode about 50 or so times.
If you find that encoding/decoding fails in accepted scenarios, create unit tests for those and filter out the strings that contain those characters/character combinations. Also, document those failures in XMLDocs comments, code comments, and any documentation your app has.
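A minimal sketch of that round-trip approach in Python follows. It makes several assumptions: the standard base64 module stands in for the encoder and decoder under test, a seeded random.Random replaces Path.GetRandomFileName() so that failures are reproducible, and round_trip_ok is a hypothetical helper name.

```python
import base64
import random

def round_trip_ok(data):
    # Encode, decode again, and compare with the original input.
    return base64.b64decode(base64.b64encode(data)) == data

rng = random.Random(0)   # fixed seed so a failing input can be reproduced
failures = []
for _ in range(50):
    length = rng.randrange(0, 64)
    data = bytes(rng.randrange(256) for _ in range(length))
    if not round_trip_ok(data):
        failures.append(data)   # keep failing inputs for dedicated tests

print(len(failures), "round-trip failures")
```

Varying the length matters because it exercises all three padding cases (no padding, one =, and two =).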
What I came up with may prove useful. It needs to be entered exactly as is. I include a link to a screenshot showing all the usually invisible characters below, as well as the Base64 data string to which it converts, and a table of the relevant statistics for each of the 64 characters therein.
<HTML><HEAD></HEAD><BODY><PRE>
Did
THE
THE QUICK BROWN FOX
jump
over
the
lazy
dogs
or
was
he
pushed
?
</PRE><B>hmm.</B></BODY><HTML>
ÿß®Þ~c*¯/
This encodes to the Base64 string:
PEhUTUw+PEhFQUQ+PC9IRUFEPjxCT0RZPjxQUkU+DQpEaWQJDQoNCiBUSEUJDQoNCiAgVEhFIFFVSUNLIEJST1dOIEZPWAkNCg0KICAganVtcAkNCg0KICAgIG92ZXIJDQoNCiAgICAgdGhlCQ0KDQogICAgICBsYXp5CQ0KDQogICAgICAgZG9ncwkNCg0KICAgICAgICBvcgkNCg0KICAgICAgICAgd2FzCQ0KDQogICAgICAgICAgaGUJDQoNCiAgICAgICAgICAgcHVzaGVkCQ0KDQogICAgICAgICAgICA/CQ0KDQo8L1BSRT48Qj5obW0uPC9CPjwvQk9EWT48SFRNTD4NCg0KDQoNCg0KDQoNCg//367efmMqry/==
which contains
5--/'s
4--+'s
3--='s
14--0's
3--1's
3--2's
2--3's
4--4's
3--5's
2--6's
2--7's
4--8's
6--9's
5--a's
27--A's
2--b's
5--B's
5--c's
4--C's
4--d's
14--D's
2--e's
10--E's
2--f's
8--F's
36--g's
6--G's
5--h's
2--H's
5--i's
30--I's
5--j's
6--J's
8--k's
12--K's
2--l's
3--L's
2--m's
4--M's
3--n's
14--N's
13--o's
2--O's
3--p's
9--P's
2--q's
24--Q's
2--r's
5--R's
2--s's
6--S's
2--t's
7--T's
2--u's
1--U's
3--v's
6--V's
4--w's
5--W's
3--x's
6--X's
2--y's
4--Y's
3--z's
5--Z's

Resources