Kotlin String.split, ignore when delimiter is inside a quote

I have a string:
Hi there, "Bananas are, by nature, evil.", Hey there.
I want to split the string with commas as the delimiter. How do I get the .split method to ignore the commas inside the quotes, so that it returns 3 strings and not 5?

You can pass a regex to the split method.
According to this answer, the following regex only matches a , that is outside of " marks:
,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)
so try this code:
str.split(",(?=(?:[^\\\"]*\\\"[^\\\"]*\\\")*[^\\\"]*\$)".toRegex())

You can use the split overload that accepts a regular expression for that:
val text = """Hi there, "Bananas are, by nature, evil.", Hey there."""
val matchCommaNotInQuotes = Regex("""\,(?=([^"]*"[^"]*")*[^"]*$)""")
println(text.split(matchCommaNotInQuotes))
Would print:
[Hi there, "Bananas are, by nature, evil.", Hey there.]
Consider reading this answer on how the regular expression works in this case.
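To make the lookahead easier to follow, here is a minimal sketch that builds the same pattern piece by piece. The names commaOutsideQuotes and splitOutsideQuotes are made up for illustration, and the trim() call is an addition (plain split keeps the space after each comma). It assumes quotes are balanced and never escaped.

```kotlin
// A comma counts as a delimiter only when an even number of double quotes
// follows it, i.e. when the comma is not inside a quoted section.
val commaOutsideQuotes = Regex(
    "," +                       // the delimiter itself
    "(?=" +                     // lookahead: the rest of the line must be...
    "(?:[^\"]*\"[^\"]*\")*" +   // ...zero or more complete pairs of quotes
    "[^\"]*\$" +                // ...followed by quote-free text up to the end
    ")"
)

// Hypothetical helper: split and strip the whitespace around each field.
fun splitOutsideQuotes(line: String): List<String> =
    line.split(commaOutsideQuotes).map { it.trim() }
```

Calling splitOutsideQuotes on the example string should give three fields, with the quoted sentence kept whole.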

You have to use a regular expression capable of handling quoted values. See Java: splitting a comma-separated string but ignoring commas in quotes and C#, regular expressions : how to parse comma-separated values, where some values might be quoted strings themselves containing commas
The following code shows a very simple version of such a regular expression.
fun main(args: Array<String>) {
    "Hi there, \"Bananas are, by nature, evil.\", Hey there."
        .split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)".toRegex())
        .forEach { println("> $it") }
}
outputs
> Hi there
> "Bananas are, by nature, evil."
> Hey there.
Be aware of the regex backtracking problem: https://www.regular-expressions.info/catastrophic.html. You might be better off writing a parser.
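As a rough idea of what such a parser could look like, here is a minimal single-pass sketch. The function name splitRespectingQuotes and its parameters are hypothetical. It assumes quotes are balanced and not escaped, keeps the quote characters in the output, and trims whitespace around each field; real CSV needs more than this.

```kotlin
// Walk the line once, toggling a flag at every quote character.
// A delimiter only ends the current field while the flag is off.
fun splitRespectingQuotes(
    line: String,
    delimiter: Char = ',',
    quote: Char = '"'
): List<String> {
    val fields = mutableListOf<String>()
    val current = StringBuilder()
    var insideQuotes = false
    for (ch in line) {
        when {
            ch == quote -> {
                insideQuotes = !insideQuotes
                current.append(ch)          // keep the quote in the field
            }
            ch == delimiter && !insideQuotes -> {
                fields.add(current.toString().trim())
                current.setLength(0)        // start the next field
            }
            else -> current.append(ch)
        }
    }
    fields.add(current.toString().trim())   // last field has no trailing delimiter
    return fields
}
```

Because it reads each character exactly once, this avoids backtracking altogether.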

If you don't want regular expressions:
val s = "Hi there, \"Bananas are, by nature, evil.\", Hey there."
val hold = s.substringAfter("\"").substringBefore("\"")
val temp = s.split("\"")
val splitted: MutableList<String> = (temp[0] + "\"" + temp[2]).split(",").toMutableList()
splitted[1] = "\"" + hold + "\""
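splitted is the List you want. Note that the indices are hard-coded, so this only works when the string contains exactly one quoted section and it is the second field.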

Related

Escape triple quote within kotlin raw string

I'm trying to create a raw string that contains three quotes in itself.
The resulting string x should contain something like """abc""".
I've been able to create the string with the following code, but was wondering if there's a simpler solution for this.
val x = """${'"'.toString().repeat(3)}abc${'"'.toString().repeat(3)}"""
There's no easy way to use a triple quote directly in a string literal.
One workaround I've sometimes used is to make an interim variable to hold the triple-quote string.
val quotes = "\"\"\""
val result = "${quotes}abc${quotes}"
I think a simpler way would be to escape them manually, so like:
val x = "\"\"\"abc\"\"\""
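If the surrounding literal has to stay a raw string, the interim-variable idea still applies. The sketch below is only an illustration: TRIPLE_QUOTE and wrapInTripleQuotes are made-up names, not part of any library.

```kotlin
// The three quotes live in an ordinary escaped string...
const val TRIPLE_QUOTE = "\"\"\""

// ...and are pulled into the raw string through a template.
fun wrapInTripleQuotes(body: String): String =
    """$TRIPLE_QUOTE$body$TRIPLE_QUOTE"""
```

wrapInTripleQuotes("abc") should produce the nine characters """abc""".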

Templates escaping in Kotlin multiline strings

If I want to use $ sign in multiline strings, how do I escape it?
val condition = """ ... $eq ... """
$eq is parsed as a reference to a variable. How to escape $, so that it will not be recognized as reference to variable? (Kotlin M13)
From the documentation
A raw string is delimited by a triple quote ("""), contains no
escaping and can contain newlines and any other character
You would need to use a standard string with newlines
" ...\n \$eq \n ... "
or you could use the literal representation
""" ... ${'$'}eq ... """
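A small sketch of the literal representation in use, mixing an escaped dollar sign with a real template in the same raw string. The function name eqCondition and the "price" field are assumptions chosen for illustration.

```kotlin
// ${'$'} inserts a literal dollar sign, while $value is a normal template.
fun eqCondition(value: Int): String =
    """{ "price": { ${'$'}eq: $value } }"""
```

For example, eqCondition(10) should yield { "price": { $eq: 10 } }.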
Funny, but that works:
val eq = "\$eq"
print("""... $eq ...""") // just like you asked :D
Actually, if eq is a number (a price, or something similar), then you probably want to calculate it separately, and an additional external calculation as I suggested won't hurt.
In the case where you know ahead of time what $-variables you want (like when querying Mongo, as it looks like you might be doing), you can create a little helper object that defines those variables. You also get some protection against accidentally misspelling one of your operators, which is neat.
object MongoString {
    inline operator fun invoke(callback: MongoString.() -> String) = callback()
    val eq = "\$eq"
    val lt = "\$lt"
    // ... and all the other operators ...
}
fun test() {
    val query = MongoString { """{"foo": {$lt: 10}}""" }
}
I wrote simple versions for update and query strings for mongo here: https://gist.github.com/Yona-Appletree/29be816ca74a0d93cdf9e6f5e23dda15

Convert underscores to spaces in Matlab string?

So say I have a string with some underscores like hi_there.
Is there a way to auto-convert that string into "hi there"?
(the original string, by the way, is a variable name that I'm converting into a plot title).
Surprising that no-one has yet mentioned strrep:
>> strrep('string_with_underscores', '_', ' ')
ans =
string with underscores
which should be the official way to do simple string replacements. For such a simple case, regexprep is overkill: yes, regular expressions are Swiss Army knives that can do everything possible, but they come with a long manual. The string indexing shown by AndreasH only works for replacing single characters; it cannot do this:
>> s = 'string*-*with*-*funny*-*separators';
>> strrep(s, '*-*', ' ')
ans =
string with funny separators
>> s(s=='*-*') = ' '
Error using ==
Matrix dimensions must agree.
As a bonus, it also works for cell-arrays with strings:
>> strrep({'This_is_a','cell_array_with','strings_with','underscores'},'_',' ')
ans =
'This is a' 'cell array with' 'strings with' 'underscores'
Try this Matlab code for a string variable 's'
s(s=='_') = ' ';
If you ever have to do anything more complicated, say replacing multiple variable-length strings, s(s == '_') = ' ' will be a huge pain. If your replacement needs ever get more complicated, consider using regexprep:
>> regexprep({'hi_there', 'hey_there'}, '_', ' ')
ans =
'hi there' 'hey there'
That being said, in your case @AndreasH.'s solution is the most appropriate and regexprep is overkill.
A more interesting question is why you are passing variables around as strings?
regexprep() may be what you're looking for and is a handy function in general.
regexprep('hi_there','_',' ')
Will take the first argument string, and replace instances of the second argument with the third. In this case it replaces all underscores with a space.
In Matlab strings are vectors, so performing simple string manipulations can be achieved using standard operators e.g. replacing _ with whitespace.
text = 'variable_name';
text(text=='_') = ' '; % replace all occurrences of underscore with whitespace
=> text = variable name
I know this was already answered; however, in my case I was looking for a way to correct plot titles so that I could include a filename (which could have underscores). So, I wanted to print them with the underscores NOT displayed as subscripts. Using the great info above, rather than substituting a space, I escaped the underscore in the substitution.
For example:
% Have the user select a file:
[infile inpath]=uigetfile('*.txt','Get some text file');
figure
% this is a problem for filenames with underscores
title(infile)
% this correctly displays filenames with underscores
title(strrep(infile,'_','\_'))

Multiline string literal in Matlab?

Is there a multiline string literal syntax in Matlab or is it necessary to concatenate multiple lines?
I found the verbatim package, but it only works in an m-file or function and not interactively within editor cells.
EDIT: I am particularly after readability and ease of modifying the literal in the code (imagine it contains indented blocks of different levels) - it is easy to make multiline strings, but I am looking for the most convenient syntax for doing that.
So far I have
t = {...
'abc'...
'def'};
t = cellfun(@(x) [x sprintf('\n')],t,'Unif',false);
t = horzcat(t{:});
which gives size(t) = 1 8, but is obviously a bit of a mess.
EDIT 2: Basically verbatim does what I want except it doesn't work in Editor cells, but maybe my best bet is to update it so it does. I think it should be possible to get current open file and cursor position from the java interface to the Editor. The problem would be if there were multiple verbatim calls in the same cell how would you distinguish between them.
I'd go for:
multiline = sprintf([ ...
'Line 1\n'...
'Line 2\n'...
]);
Matlab is an oddball in that escape processing in strings is a function of the printf family of functions instead of the string literal syntax. And no multiline literals. Oh well.
I've ended up doing two things. First, make CR() and LF() functions that just return processed \r and \n respectively, so you can use them as pseudo-literals in your code. I prefer doing this way rather than sending entire strings through sprintf(), because there might be other backslashes in there you didn't want processed as escape sequences (e.g. if some of your strings came from function arguments or input read from elsewhere).
function out = CR()
out = char(13); % # sprintf('\r')
function out = LF()
out = char(10); % # sprintf('\n');
Second, make a join(glue, strs) function that works like Perl's join or the cellfun/horzcat code in your example, but without the final trailing separator.
function out = join(glue, strs)
strs = strs(:)';
strs(2,:) = {glue};
strs = strs(:)';
strs(end) = [];
out = cat(2, strs{:});
And then use it with cell literals like you do.
str = join(LF, {
'abc'
'defghi'
'jklm'
});
You don't need the "..." ellipses in cell literals like this; omitting them does a vertical vector construction, and it's fine if the rows have different lengths of char strings because they're each getting stuck inside a cell. That alone should save you some typing.
Bit of an old thread but I got this
multiline = join([
"Line 1"
"Line 2"
], newline)
I think it makes things pretty easy, but obviously it depends on what one is looking for :)

What characters can I use to quote this ruby string?

I'm embedding JRuby in Java, because I need to call some Ruby methods with Java strings as arguments. The thing is, I'm calling the methods like this:
String text = ""; // this can span over multiple lines, and will contain ruby code
Ruby ruby = Ruby.newInstance();
RubyRuntimeAdapter adapter = JavaEmbedUtils.newRuntimeAdapter();
String rubyCode = "require \"myscript\"\n" +
"str = build_string(%q~"+text+"~)\n"+
"str";
IRubyObject object = adapter.eval(ruby, rubyCode);
The thing is, I don't know what strings I can use as delimiters, because the text I'm sending to build_string will itself contain ruby code. Right now I'm using ~, but I think this could break my code. What characters can I use as delimiters to make sure my code will work no matter what the ruby code is?
use the heredoc format:
"require \"myscript\"\n" +
"str = build_string(<<'THISSHOULDNTBE'\n" + text + "\nTHISSHOULDNTBE\n)\n"+
"str";
this however assumes you won't have "THISSHOULDNTBE" on a separate line in the input.
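One way to loosen that assumption is to pick the heredoc delimiter after looking at the input. The following is a sketch only, written in Kotlin rather than Java; heredocDelimiter, wrapInHeredoc and the RUBY_TEXT base name are all hypothetical. It is stricter than needed, since it rejects a candidate that appears anywhere in the text, not just on a line of its own.

```kotlin
// Try RUBY_TEXT, RUBY_TEXT_1, RUBY_TEXT_2, ... until the text does not contain it.
fun heredocDelimiter(text: String, base: String = "RUBY_TEXT"): String {
    var candidate = base
    var suffix = 0
    while (text.contains(candidate)) {
        suffix++
        candidate = "${base}_$suffix"
    }
    return candidate
}

// Build a single-quoted heredoc, so Ruby does no interpolation or escape processing.
fun wrapInHeredoc(text: String): String {
    val delimiter = heredocDelimiter(text)
    return "<<'$delimiter'\n$text\n$delimiter\n"
}
```

Keep in mind that a Ruby heredoc ends its content with a newline, so build_string would receive the text plus a trailing "\n".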
Since the string text can contain any character, there is no character left to use as a safe delimiter like the ~ you're using now. You would still need to escape the tilde in the string text in Java and append the escaped version to the string you're building.
Something like (untested, not a Java guru):
String rubyCode = "require \"myscript\"\n" +
"str = build_string(%q~" + text.replace("\\", "\\\\").replace("~", "\\~") + "~)\n"+
"str";
