How to break a big Lua string into small ones

I have a big string (a base64-encoded image) that is 1050 characters long. How can I write it as several small strings that together form one big string, like this in C:
function GetIcon()
return "Bigggg string 1"\
"continuation of string"\
"continuation of string"\
"End of string"

According to Programming in Lua 2.4 Strings:
We can delimit literal strings also by matching double square brackets [[...]]. Literals in this bracketed form may run for several lines, may nest, and do not interpret escape sequences. Moreover, this form ignores the first character of the string when this character is a newline. This form is especially convenient for writing strings that contain program pieces; for instance,
page = [[
<HTML>
<HEAD>
<TITLE>An HTML Page</TITLE>
</HEAD>
<BODY>
Lua
[[a text between double brackets]]
</BODY>
</HTML>
]]
This is the closest thing to what you are asking for, but using the above method keeps the newlines embedded in the string, so this will not work directly.
You can also do this with string concatenation (using ..):
value = "long text that" ..
" I want to carry over" ..
"onto multiple lines"

Most answers here solve this issue at run time rather than at compile time.
Lua 5.2 introduces the escape sequence \z to solve this problem elegantly without incurring any run-time expense.
> print "This is a long \z
>> string with \z
>> breaks in between, \z
>> and is spanning multiple lines \z
>> but still is a single string only!"
This is a long string with breaks in between, and is spanning multiple lines but still is a single string only!
\z skips all subsequent whitespace characters in a string literal [1] until the first non-space character. This works for non-multiline literal text too.
> print "This is a simple \z string"
This is a simple string
From Lua 5.2 Reference Manual
The escape sequence '\z' skips the following span of white-space characters, including line breaks; it is particularly useful to break and indent a long literal string into multiple lines without adding the newlines and spaces into the string contents.
[1]: All escape sequences, including \z, work only on short literal strings ("…", '…') and, understandably, not on long literal strings ([[...]], etc.)

I'd put all chunks in a table and use table.concat on it. This avoids the creation of new strings at every concatenation. For example (not counting the per-string overhead in Lua):
-- bytes used
foo="1234".. -- 4 = 4
"4567".. -- 4 + 4 + 8 = 16
"89ab" -- 16 + 4 + 12 = 32
-- | | | \_ grand total after concatenation on last line
-- | | \_ second operand of concatenation
-- | \_ first operand of concatenation
-- \_ total size used until last concatenation
As you can see, this explodes pretty rapidly. It's better to:
foo=table.concat{
"1234",
"4567",
"89ab"}
Which will take about 3*4+12=24 bytes.

Have you tried the string.sub(s, i [, j]) function?
You may like to look here:
http://lua-users.org/wiki/StringLibraryTutorial

This:
return "Bigggg string 1"\
"continuation of string"\
"continuation of string"\
"End of string"
C/C++ syntax causes the compiler to see it all as one large string. It is generally used for readability.
The Lua equivalent would be:
return "Bigggg string 1" ..
"continuation of string" ..
"continuation of string" ..
"End of string"
Do note that the C/C++ syntax is compile-time, while the Lua equivalent likely does the concatenation at runtime (though the compiler could theoretically optimize it). It shouldn't be a big deal though.


What does "?\s" mean in Elixir?

In the Elixir-documentation covering comprehensions I ran across the following example:
iex> for <<c <- " hello world ">>, c != ?\s, into: "", do: <<c>>
"helloworld"
I sort of understand the whole expression now, but I can't figure out what the "?\s" means.
I know that it somehow matches and thus filters out the spaces, but that's where my understanding ends.
Edit: I have now figured out that it resolves to 32, which is the character code of a space, but I still don't know why.
Erlang has char literals denoted by a dollar sign.
Erlang/OTP 22 [erts-10.6.1] [...]
Eshell V10.6.1 (abort with ^G)
1> $\s == 32.
%%⇒ true
In the same way, Elixir has char literals that, according to the code documentation, behave exactly like Erlang char literals:
This is exactly what Erlang does with Erlang char literals ($a).
Basically, ?\s is exactly the same as ?  (question mark followed by a space.)
# ⇓ space here
iex|1 ▶ ?\s == ?
warning: found ? followed by code point 0x20 (space), please use ?\s instead
There is nothing special about ?\s, as you can see:
for <<c <- " hello world ">>, c != ?o, into: "", do: <<c>>
#⇒ " hell wrld "
Ruby also uses the ?c notation for char literals:
main> ?\s == ' '
#⇒ true
? is a literal that gives you the following character's code point (https://elixir-lang.org/getting-started/binaries-strings-and-char-lists.html#utf-8-and-unicode). For characters that cannot be written literally (space is just one of them; others include tab and carriage return), the escape sequence should be used instead. So ?\s gives you the code point for space:
iex> ?\s
32

Java String.format - how to put a space between two characters

I am searching for a way to use a formatter to put a space between two characters. I thought it would be easy with a string formatter.
Here is what I am trying to accomplish:
Given "AB", it should produce "A B".
Here is what I have tried so far:
"AB".format("%#s")
but this keeps returning "AB"; I want "A B". I thought the number sign could be used for a space.
I also tried this:
"26".format("%#d") but it still prints "26".
Is there any way to do this with String.format?
It is kind of possible with the string formatter although not directly with a pattern.
jshell> String.format("%1$c %2$c", "AB".chars().boxed().toArray())
$10 ==> "A B"
We need to turn the string into an object array so it can be passed in as varargs and the formatter pattern can extract characters based on index (1$ and 2$) and format them as characters (c).
A much simpler regex solution is the following which scales to any number of characters:
jshell> "ABC^&*123".replaceAll(".", "$0 ").trim()
$3 ==> "A B C ^ & * 1 2 3"
All single characters are replaced with themselves ($0) followed by a space. Then the extra trailing space is removed with the trim() call.
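For anyone who wants to run both approaches outside jshell, here is a minimal self-contained sketch. The class name SpaceOut and the method names viaFormat and viaRegex are made up for illustration; the calls themselves (String.format, chars(), boxed(), toArray(), replaceAll(), trim()) are the ones used above.

```java
class SpaceOut {
    // Formatter approach: only handles exactly two characters, because the
    // pattern refers to arguments 1$ and 2$ explicitly.
    static String viaFormat(String twoChars) {
        // Each char becomes an Integer code point, which %c can format.
        Object[] codePoints = twoChars.chars().boxed().toArray();
        return String.format("%1$c %2$c", codePoints);
    }

    // Regex approach: every character is replaced by itself ($0) plus a
    // space, then trim() removes the trailing space.
    static String viaRegex(String input) {
        return input.replaceAll(".", "$0 ").trim();
    }

    public static void main(String[] args) {
        System.out.println(viaFormat("AB"));        // A B
        System.out.println(viaRegex("ABC^&*123"));  // A B C ^ & * 1 2 3
    }
}
```

Note that trim() strips whitespace from both ends, so input that itself starts with a space would lose it, and that "." does not match line terminators by default.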
I could not find a way to do this using String#format. But here is a way to accomplish this using regex replacement:
String input = "AB";
String output = input.replaceAll("(?<=[A-Z])(?=[A-Z])", " ");
System.out.println(output);
The regex pattern (?<=[A-Z])(?=[A-Z]) matches every position between two capital letters, and a space is inserted at each such position. The above snippet prints:
A B
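As a sketch of how far the lookaround idea stretches: the pattern from this answer only fires between two capital letters, while swapping [A-Z] for "." (an assumption on my part, not something the answer proposes) inserts a space between any two adjacent characters and needs no trim() afterwards. The class and method names below are hypothetical.

```java
class BetweenChars {
    // Pattern from the answer: a zero-width match between two capital letters.
    static String betweenCapitals(String input) {
        return input.replaceAll("(?<=[A-Z])(?=[A-Z])", " ");
    }

    // Assumed generalization: a zero-width match between any two characters
    // (except line terminators, which "." does not match by default).
    static String betweenAny(String input) {
        return input.replaceAll("(?<=.)(?=.)", " ");
    }

    public static void main(String[] args) {
        System.out.println(betweenCapitals("AB"));  // A B
        System.out.println(betweenCapitals("ABc")); // A Bc
        System.out.println(betweenAny("A1b"));      // A 1 b
    }
}
```

Because both patterns match positions rather than characters, nothing is appended after the last character, so there is no trailing space to clean up.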

Standard ML string to a list

Is there a way in ML to take in a string and output a list of its substrings, where the separator is a space, newline or EOF, while keeping quoted strings inside the string intact?
EX) hello world "my id" is 5555
-> [hello, world, my id, is, 5555]
I am then working on tokenizing these into:
->[word, word, string, word, int]
Sure you can! Here's the idea:
If we take a string like "Hello World, \"my id\" is 5555", we can split it at the quote marks, ignoring the spaces for now. This gives us ["Hello World, ", "my id", " is 5555"]. The important thing to notice here is that the list contains three elements - an odd number. As long as the string only contains pairs of quotes (as it will if it's properly formatted), we'll always get an odd number of elements when we split at the quote marks.
A second important thing is that all the even-numbered elements of the list will be strings that were unquoted (if we start counting from 0), and the odd-numbered ones were quoted. That means that all we need to do is tokenize the ones that were unquoted, and then we're done!
I put some code together - you can continue from there:
fun foo s =
  let
    (* String.fields, unlike String.tokens, keeps empty fields, so the
       even/odd positions stay aligned when s starts or ends with a quote *)
    val quoteSep = String.fields (fn c => c = #"\"") s
    (* Char.isSpace covers spaces, tabs and newlines *)
    val spaceSep = String.tokens Char.isSpace
    fun sepEven [] = []
      | sepEven [x] = spaceSep x (* no (more) quotes in the string *)
      | sepEven (x::y::xs) = spaceSep x @ (y :: sepEven xs) (* x was unquoted, y was quoted *)
  in
    if length quoteSep mod 2 = 0
    then raise Fail "unbalanced quote marks" (* odd number of quote marks *)
    else sepEven quoteSep
  end
String.tokens brings you halfway there. But if you really want to handle quotes like you are sketching then there is no way around writing an actual lexer. MLlex, which comes with SML/NJ and MLton (but is usable with any SML) could help. Or you just write it by hand, which should be easy enough in this case as well.
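To give an idea of what "writing it by hand" involves, here is a minimal sketch of such a scanner. It is in Java rather than SML, and it only splits the input into words and quoted strings; classifying the results as word, string or int is left out. The class name QuoteAwareSplitter and its split method are hypothetical, and treating an unclosed quote as an error is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplitter {
    // Splits on whitespace, but keeps text between double quotes together.
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '"') {
                // A closing quote always emits a token (even an empty one);
                // an opening quote ends any word that directly precedes it.
                if (inQuotes || current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                inQuotes = !inQuotes;
            } else if (!inQuotes && Character.isWhitespace(c)) {
                // Whitespace outside quotes ends the current word, if any.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unbalanced quote marks");
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [hello, world, my id, is, 5555]
        System.out.println(split("hello world \"my id\" is 5555"));
    }
}
```

A single pass with one boolean of state is enough here because the input has no escape sequences inside quotes; once those are needed, a generated lexer such as MLlex becomes more attractive.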

Why does newLISP limit string literals to 2048 characters?

I'm trying to write usage instructions for this newLISP program I've made but it keeps complaining about the string being too long.
ERR: string token too long : "$$$$$$$$&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&"
I spent some ten minutes cursing newLISP and coming up with paranoid theories (maybe you shouldn't have quotes in the string, or maybe it will work when I use raw strings ({})) until I started chopping the string. It reached a point where the message disappeared, leaving the help message very unhelpful. It turns out newLISP doesn't like strings that have more than 2048 (2^11) characters. So:
Why put a limit on the number of characters in a string literal?
Why 2048 characters?
Increasing cell memory to 128 MB (I saw it in the manual) doesn't change anything. The only solution that works for now (a hackish one) is splitting the help string into two strings, each under 2048 characters, then concatenating them with string.
The other strange thing is that any string that has 2048+ characters is printed differently in the repl:
> (dup "&" 2048)
[text]&&&&&&&&&&&&&& .....
......
&&&&&&&&&&&&&&&[/text]
> (dup "&" 2040)
"&&&&&&&&&&&&&&&&&&& .....
.....
&&&&&&&&&&&&&&&"
There are three ways to do strings:
in quotes - escape characters are processed - limited to 2048 chars
in braces - no escape characters are processed - limited to 2048 chars
in tags - no escapes are processed - unlimited length
From the manual:
Quoted strings cannot exceed 2,048 characters. Longer strings should use the [text] and [/text] tag delimiters. newLISP automatically uses these tags for string output longer than 2,048 characters.
Strings can be quite long:
> (quiet (set 's (dup "&" 10E8))) ; don't bother to show the string :)
> (length s)
1000000000
> (10000 20 s)
"&&&&&&&&&&&&&&&&&&&&"
>
The only problem you'll have is when you want to process source code in strings that might contain a [/text] tag before you want the string to really end. It doesn't look like you're at that point yet... :)

Multiline string literal in Matlab?

Is there a multiline string literal syntax in Matlab or is it necessary to concatenate multiple lines?
I found the verbatim package, but it only works in an m-file or function and not interactively within editor cells.
EDIT: I am particularly after readability and ease of modifying the literal in the code (imagine it contains indented blocks of different levels). It is easy to make multiline strings, but I am looking for the most convenient syntax for doing that.
So far I have
t = {...
'abc'...
'def'};
t = cellfun(@(x) [x sprintf('\n')],t,'Unif',false);
t = horzcat(t{:});
which gives size(t) = 1 8, but is obviously a bit of a mess.
EDIT 2: Basically verbatim does what I want except it doesn't work in Editor cells, but maybe my best bet is to update it so it does. I think it should be possible to get current open file and cursor position from the java interface to the Editor. The problem would be if there were multiple verbatim calls in the same cell how would you distinguish between them.
I'd go for:
multiline = sprintf([ ...
'Line 1\n'...
'Line 2\n'...
]);
Matlab is an oddball in that escape processing in strings is a function of the printf family of functions instead of the string literal syntax. And no multiline literals. Oh well.
I've ended up doing two things. First, make CR() and LF() functions that just return processed \r and \n respectively, so you can use them as pseudo-literals in your code. I prefer doing this way rather than sending entire strings through sprintf(), because there might be other backslashes in there you didn't want processed as escape sequences (e.g. if some of your strings came from function arguments or input read from elsewhere).
function out = CR()
out = char(13); % # sprintf('\r')
function out = LF()
out = char(10); % # sprintf('\n');
Second, make a join(glue, strs) function that works like Perl's join or the cellfun/horzcat code in your example, but without the final trailing separator.
function out = join(glue, strs)
strs = strs(:)';
strs(2,:) = {glue};
strs = strs(:)';
strs(end) = [];
out = cat(2, strs{:});
And then use it with cell literals like you do.
str = join(LF, {
'abc'
'defghi'
'jklm'
});
You don't need the "..." ellipses in cell literals like this; omitting them does a vertical vector construction, and it's fine if the rows have different lengths of char strings because they're each getting stuck inside a cell. That alone should save you some typing.
Bit of an old thread but I got this
multiline = join([
"Line 1"
"Line 2"
], newline)
I think it makes things pretty easy but obviously it depends on what one is looking for :)
