clojure - avoid extra whitespace when combining string and variable - string

I'm writing a program that uses println-str to return commands in an assembly language. I need to use variables in my code, and the function returns extra whitespace where I don't want it:
(defn pushConstant [constant]
(println-str "#" constant "\r\nD=A\r\n#SP\r\nA=M\r\nM=D\r\n#SP\r\nM=M+1"))
Assuming that constant = 17, instead of:
#17
D=A
#SP
A=M
M=D
#SP
M=M+1
I'm having:
# 17
D=A
#SP
A=M
M=D
#SP
M=M+1
Which is problematic for my assembly code. I have this issue in so many cases like this. I'll be glad to hear advice on how to avoid this extra whitespace between the String and the variable.

Frankly, I'd implement that to look more like the following:
(defn pushConstant [constant]
(->> [(str "#" constant)
"D=A"
"#SP"
"A=M"
"M=D"
"#SP"
"M=M+1"]
(interpose "\r\n")
(apply str)))
That way you don't have one big ugly format string, but break down your operations into small, readable pieces.
That said, the piece that makes a difference for you here is (str "#" constant), combining your # with the argument with no added whitespace.

Create the string using str which only concatenates (println interleaves spaces):
(defn pushConstant [constant]
(println-str (str "#" constant "\r\nD=A\r\n#SP\r\nA=M\r\nM=D\r\n#SP\r\nM=M+1")))
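The difference between the two approaches is easy to check at the REPL. The following is a minimal sketch, not the asker's actual code: push-constant is a hypothetical name, and print-str is used in place of println-str only because it separates its arguments the same way without appending a platform-dependent newline.

```clojure
(require '[clojure.string :as string])

;; print-str (like println-str) puts a space between its arguments.
(def spaced (print-str "#" 17))

;; str concatenates its arguments with nothing in between.
(def joined (str "#" 17))

;; Hypothetical variant of pushConstant: build each line on its own,
;; then join the lines with CR-LF and no trailing separator.
(defn push-constant [constant]
  (string/join "\r\n"
               [(str "#" constant) "D=A" "#SP" "A=M" "M=D" "#SP" "M=M+1"]))
```

Here spaced contains the unwanted space while joined does not, and (push-constant 17) starts with #17.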

Related

How to distinguish escaped characters from non-escaped e.g. "\x27" from "x27" in a string in Common Lisp?

While solving Advent of Code 2015, task 8, part 2, I ran into the problem of having to distinguish an occurrence of "\x27" in a string from plain "x27".
But I don't see how to do it, because
(length "\x27") ;; is 3
(length "x27") ;; is also 3
(subseq "\x27" 0 1) ;; is "x"
(subseq "x27" 0 1) ;; is "x"
Neither print, prin1, princ made a difference.
;; nor does `coerce`
(coerce "\x27" 'list)
;; (#\x #\2 #\7)
So how then to distinguish in a string when "\x27" or any of such
hexadecimal representation occurs?
It turned out, one doesn't need to solve this to solve the task. However, now I still would like to know whether there is a way to distinguish "\x" from "x" in common lisp.
The string literal "\x27" is read as the same as "x27", because \ is an escape character in string literals. If you want a string with the contents \x27, you need to write the literal as "\\x27" (i.e. escape the escape character). This has nothing to do with the strings themselves. If you read a string from a file containing \x27 (e.g. with read-line), then the four-character string \x27 results.
By the time that the Lisp reader gets to work, \x is the same as x. There may be some way to turn this off - I wouldn't be surprised - but the original text talks about Santa's file.
So, I created my own file, like this:
x27
\x27
And I read the data into special variables like this:
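;; the two special variables that receive the lines
(defvar x)
(defvar x-esc)
(defun read-line-crlf (stream)
  ;; strip the trailing CR left over from a CR-LF line ending;
  ;; return "" at end of file instead of signalling an error
  (string-right-trim '(#\Return) (read-line stream nil "")))
(defun read-lines (filename)
  (with-open-file (stream filename)
    (setf x (read-line-crlf stream))
    (setf x-esc (read-line-crlf stream))))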
The length of x is then 3, and the length of x-esc is 4. The returned string must be trimmed on Windows, or an external format declared, because otherwise SBCL will leave half of the CR-LF on the end of the read strings.

General method to trim non-printable characters in Clojure

I encountered a bug where I couldn't match two seemingly 'identical' strings together. For example, the following two strings fail to match:
"sample" and "​sample".
To replicate the issue, one can run the following in Clojure.
(= "sample" "​sample") ; returns false
After an hour of frustrated debugging, I discovered that there was a zero-width space at the front of the second string! Removing it from this particular example via a backspace is trivial. However I have a database of strings that I'm matching, and it seems like there are multiple strings facing this issue. My question is: is there a general method to trim zero-width spaces in Clojure?
Some methods I've tried:
(count (clojure.string/trim "​abc")) ; returns 4
(count (clojure.string/replace "​abc" #"\s" "")) ; returns 4
This thread Remove zero-width space characters from a JavaScript string does provide a solution with regular expressions that works in this example, i.e.
(count (clojure.string/replace "​abc" #"[\u200B-\u200D\uFEFF]" "")) ; returns 3
However, as stated in the post itself, there are many other Unicode characters that may be invisible. So I'm still interested in whether there's a more general method that doesn't rely on listing all possible invisible Unicode symbols.
I believe what you are referring to are so-called non-printable characters. Based on this answer for Java, you could pass the #"\p{C}" regular expression as the pattern to replace:
(defn remove-non-printable-characters [x]
(clojure.string/replace x #"\p{C}" ""))
However, this will remove line breaks, e.g. \n. So in order to keep those characters, we need a more complex regular expression:
(defn remove-non-printable-characters [x]
  ;; category C characters, intersected with "not whitespace"
  (clojure.string/replace x #"[\p{C}&&[^\s]]" ""))
This function will remove non-printable characters. Let's test it:
(= "sample" "​sample")
;; => false
(= (remove-non-printable-characters "sample")
(remove-non-printable-characters "​sample"))
;; => true
(remove-non-printable-characters "sam\nple")
;; => "sam\nple"
The \p{C} pattern is discussed here.
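Before stripping anything from a database of strings, it can help to see which characters the \p{C} class actually matches. The sketch below is only an illustration: invisible-code-points is a hypothetical helper name, and it assumes the offending characters lie in the Basic Multilingual Plane, so that each match is a single char.

```clojure
;; Hypothetical diagnostic helper: list every \p{C} match in s
;; as a "U+XXXX" code point label.
(defn invisible-code-points [s]
  (map #(format "U+%04X" (int (first %)))
       (re-seq #"\p{C}" s)))

;; "\u200B" is the zero-width space from the question.
(def found (invisible-code-points "\u200Bsample"))
```

With the string from the question, found contains the single entry U+200B; for a clean string the result is empty.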
The regex solution from @Rulle is very nice. The tupelo.chars namespace also has a collection of character classes and predicate functions that could be useful. They work in Clojure and ClojureScript, and also include the &nbsp; for browsers. In particular, check out the visible? predicate.
The tupelo.string namespace also has a number of helper & convenience functions for string processing.
(ns tst.demo.core
(:use tupelo.core tupelo.test)
(:require
[tupelo.chars :as chars]
[tupelo.string :as str] ))
(def sss
"Some multi-line
string." )
(dotest
(println "result:")
(println
(str/join
(filterv
#(or (chars/visible? %)
(chars/whitespace? %))
sss))))
with result
result:
Some multi-line
string.
To use, make your project.clj look like:
:dependencies [
[org.clojure/clojure "1.10.2-alpha1"]
[prismatic/schema "1.1.12"]
[tupelo "20.07.01"]
]

Multiline string literal in Matlab?

Is there a multiline string literal syntax in Matlab or is it necessary to concatenate multiple lines?
I found the verbatim package, but it only works in an m-file or function and not interactively within editor cells.
EDIT: I am particularly after readability and ease of modifying the literal in the code (imagine it contains indented blocks of different levels). It is easy to make multiline strings, but I am looking for the most convenient syntax for doing that.
So far I have
t = {...
'abc'...
'def'};
t = cellfun(@(x) [x sprintf('\n')],t,'Unif',false);
t = horzcat(t{:});
which gives size(t) = 1 8, but is obviously a bit of a mess.
EDIT 2: Basically verbatim does what I want except it doesn't work in Editor cells, but maybe my best bet is to update it so it does. I think it should be possible to get current open file and cursor position from the java interface to the Editor. The problem would be if there were multiple verbatim calls in the same cell how would you distinguish between them.
I'd go for:
multiline = sprintf([ ...
'Line 1\n'...
'Line 2\n'...
]);
Matlab is an oddball in that escape processing in strings is a function of the printf family of functions instead of the string literal syntax. And no multiline literals. Oh well.
I've ended up doing two things. First, make CR() and LF() functions that just return processed \r and \n respectively, so you can use them as pseudo-literals in your code. I prefer doing it this way rather than sending entire strings through sprintf(), because there might be other backslashes in there that you didn't want processed as escape sequences (e.g. if some of your strings came from function arguments or input read from elsewhere).
function out = CR()
out = char(13); % # sprintf('\r')
function out = LF()
out = char(10); % # sprintf('\n');
Second, make a join(glue, strs) function that works like Perl's join or the cellfun/horzcat code in your example, but without the final trailing separator.
function out = join(glue, strs)
strs = strs(:)';
strs(2,:) = {glue};
strs = strs(:)';
strs(end) = [];
out = cat(2, strs{:});
And then use it with cell literals like you do.
str = join(LF, {
'abc'
'defghi'
'jklm'
});
You don't need the "..." ellipses in cell literals like this; omitting them does a vertical vector construction, and it's fine if the rows have different lengths of char strings because they're each getting stuck inside a cell. That alone should save you some typing.
Bit of an old thread but I got this
multiline = join([
"Line 1"
"Line 2"
], newline)
I think it makes things pretty easy, but obviously it depends on what one is looking for :)

Modifying a character in a string in Lua

Is there any way to replace a character at position N in a string in Lua?
This is what I've come up with so far:
function replace_char(pos, str, r)
return str:sub(pos, pos - 1) .. r .. str:sub(pos + 1, str:len())
end
str = replace_char(2, "aaaaaa", "X")
print(str)
I can't use gsub either as that would replace every capture, not just the capture at position N.
Strings in Lua are immutable. That means, that any solution that replaces text in a string must end up constructing a new string with the desired content. For the specific case of replacing a single character with some other content, you will need to split the original string into a prefix part and a postfix part, and concatenate them back together around the new content.
This variation on your code:
function replace_char(pos, str, r)
return str:sub(1, pos-1) .. r .. str:sub(pos+1)
end
is the most direct translation to straightforward Lua. It is probably fast enough for most purposes. I've fixed the bug that the prefix should be the first pos-1 chars, and taken advantage of the fact that if the last argument to string.sub is missing it is assumed to be -1 which is equivalent to the end of the string.
But do note that it creates a number of temporary strings that will hang around in the string store until garbage collection eats them. The temporaries for the prefix and postfix can't be avoided in any solution. But this also has to create a temporary for the first .. operator to be consumed by the second.
It is possible that one of two alternate approaches could be faster. The first is the solution offered by Paŭlo Ebermann, but with one small tweak:
function replace_char2(pos, str, r)
return ("%s%s%s"):format(str:sub(1,pos-1), r, str:sub(pos+1))
end
This uses string.format to do the assembly of the result in the hopes that it can guess the final buffer size without needing extra temporary objects.
But do beware that string.format is likely to have issues with any \0 characters in any string that it passes through its %s format. Specifically, since it is implemented in terms of standard C's sprintf() function, it would be reasonable to expect it to terminate the substituted string at the first occurrence of \0. (Noted by user Delusional Logic in a comment.)
A third alternative that comes to mind is this:
function replace_char3(pos, str, r)
return table.concat{str:sub(1,pos-1), r, str:sub(pos+1)}
end
table.concat efficiently concatenates a list of strings into a final result. It has an optional second argument which is text to insert between the strings, which defaults to "" which suits our purpose here.
My guess is that unless your strings are huge and you do this substitution frequently, you won't see any practical performance differences between these methods. However, I've been surprised before, so profile your application to verify there is a bottleneck, and benchmark potential solutions carefully.
You should use pos inside your function instead of literal 1 and 3, but apart from this it looks good. Since Lua strings are immutable you can't really do much better than this.
Maybe
("%s%s%s"):format(str:sub(1,pos-1), r, str:sub(pos+1, str:len()))
is more efficient than the .. operator, but I doubt it - if it turns out to be a bottleneck, measure it (and then decide to implement this replacement function in C).
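With LuaJIT, you can use the FFI library to cast the string to a pointer to unsigned chars and write to it directly. This bypasses Lua's string immutability (the string's memory is shared by every reference to it), so treat it as unsafe: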
local ffi = require 'ffi'
txt = 'test'
ptr = ffi.cast('uint8_t*', txt)
ptr[1] = string.byte('o')

Displaying a string while using cond in Lisp

I'm just starting off with Lisp and need some help. This is technically homework, but I gave it a try and am getting somewhat what I wanted:
(defun speed (kmp)
(cond ((> kmp 100) "Fast")
((< kmp 40) "Slow")
(t "Average")))
However, if I run the program it displays "Average" instead of just Average (without the quotes).
How can I get it to display the string without quotes?
You can use symbols instead of strings. But keep in mind that symbols will be converted to uppercase:
> 'Average
AVERAGE
If you care about case or want to embed spaces, use format:
> (format t "Average")
Average
The read-eval-print loop displays the return value of your function, which is one of the strings in a cond branch. Strings are printed readably by surrounding them with double-quotes.
You could use (write-string (speed 42)). Don't worry that it also shows the string in double-quotes - that's the return value of write-string, displayed after the quoteless output.
You can also use symbols instead of strings:
(defun speed (kmp)
(cond ((> kmp 100) 'fast)
((< kmp 40) 'slow)
(t 'average)))
Symbols are uppercased by default, so internally fast is then FAST.
You can write any symbol in any case and with any characters using escaping with vertical bars:
|The speed is very fast!|
Above is a valid symbol in Common Lisp and is stored internally just as you write it with case preserved.

Resources