Sorting a string alphabetically by line in Clojure(Script)

Suppose S is a string defined as follows:
;; S
B
C
A
Is there some Clojure operation sort-alphabetically (that also works in ClojureScript) such that (sort-alphabetically S) generates the following string?
;; (sort-alphabetically S) =>
A
B
C

The following code snippet will do what you want:
(require '[clojure.string :as str])
(def s "C\nB\nA")
(->> s
     (str/split-lines)  ; split the string into a sequence of lines
     (sort)             ; sort the sequence using natural order (for strings, alphabetical order)
     (str/join "\n"))   ; join the sorted lines with \n, producing a multiline string
;; => "A\nB\nC"
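The pipeline above can be wrapped into the sort-alphabetically function the question asks for. This is a minimal sketch rather than a library function: the name comes from the question, and it assumes the default string ordering is acceptable (on the JVM that ordering is by character code, so uppercase letters sort before lowercase ones).

```clojure
(require '[clojure.string :as str])

(defn sort-alphabetically
  "Sorts the lines of s and joins them back together with newlines."
  [s]
  (->> (str/split-lines s)   ; "B\nC\nA" -> ["B" "C" "A"]
       sort                  ; natural (default) ordering of strings
       (str/join "\n")))     ; back to a single multiline string

(sort-alphabetically "B\nC\nA")
;; => "A\nB\nC"
```

Only clojure.string functions and sort are used, which exist in both Clojure and ClojureScript; in a ClojureScript source file the require would normally go in the ns form.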

I think you want:
(require 'clojure.string) ; (:use ...) is only valid inside an ns form
(def foo "a\ne\nc")
(sort (clojure.string/split foo #"\n"))
;; => ("a" "c" "e") ; a sequence of lines, not a single string
In general, you should try to provide a definition of the data that is executable, or at least enclosed in parens, to make it clear exactly what the data looks like, per the advice of @jmargolisvt.

Related

How to distinguish escaped characters from non-escaped e.g. "\x27" from "x27" in a string in Common Lisp?

While solving Advent of Code 2015, task 8, part 2, I ran into the problem of having to distinguish an occurrence of "\x27" in a string from plain "x27".
But I don't see how I can do it, because
(length "\x27") ;; is 3
(length "x27") ;; is also 3
(subseq "\x27" 0 1) ;; is "x"
(subseq "x27" 0 1) ;; is "x"
Neither print, prin1, princ made a difference.
;; nor does coerce
(coerce "\x27" 'list)
;; (#\x #\2 #\7)
So how can one detect when "\x27", or any such hexadecimal representation, occurs in a string?
It turned out that one doesn't need to solve this to solve the task. However, I would still like to know whether there is a way to distinguish "\x" from "x" in Common Lisp.
The string literal "\x27" is read the same as "x27", because \ is an escape character in string literals. If you want a string with the contents \x27, you need to write the literal as "\\x27" (i.e. escape the escape character). This has nothing to do with the strings themselves. If you read a string from a file containing \x27 (e.g. with read-line), then the four-character string \x27 results.
By the time that the Lisp reader gets to work, \x is the same as x. There may be some way to turn this off - I wouldn't be surprised - but the original text talks about Santa's file.
So, I created my own file, like this:
x27
\x27
And I read the data into special variables like this:
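(defvar x nil)     ; declare both variables special so the SETFs below are well-defined
(defvar x-esc nil)
(defun read-line-crlf (stream)
  ;; Strip the trailing CR that CR-LF line endings leave behind.
  (string-right-trim '(#\Return) (read-line stream nil)))
(defun read-lines (filename)
  (with-open-file (stream filename)
    (setf x (read-line-crlf stream))
    (setf x-esc (read-line-crlf stream))))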
The length of x is then 3, and the length of x-esc is 4. The returned string must be trimmed on Windows, or an external format declared, because otherwise SBCL will leave half of the CR-LF on the end of the read strings.

General method to trim non-printable characters in Clojure

I encountered a bug where I couldn't match two seemingly 'identical' strings together. For example, the following two strings fail to match:
"sample" and "​sample".
To replicate the issue, one can run the following in Clojure.
(= "sample" "​sample") ; returns false
After an hour of frustrated debugging, I discovered that there was a zero-width space at the front of the second string! Removing it from this particular example via a backspace is trivial. However I have a database of strings that I'm matching, and it seems like there are multiple strings facing this issue. My question is: is there a general method to trim zero-width spaces in Clojure?
Some methods I've tried:
(count (clojure.string/trim "​abc")) ; returns 4
(count (clojure.string/replace "​abc" #"\s" "")) ; returns 4
This thread Remove zero-width space characters from a JavaScript string does provide a solution with regular expressions that works in this example, i.e.
(count (clojure.string/replace "​abc" #"[\u200B-\u200D\uFEFF]" "")) ; returns 3
However, as stated in the post itself, there are many other Unicode characters that may be invisible. So I'm still interested in whether there's a more general method that doesn't rely on listing all possible invisible Unicode symbols.
I believe what you are referring to are so-called non-printable characters. Based on this answer in Java, you could pass the #"\p{C}" regular expression as the pattern to replace:
(defn remove-non-printable-characters [x]
(clojure.string/replace x #"\p{C}" ""))
However, this will remove line breaks, e.g. \n. So in order to keep those characters, we need a more complex regular expression:
(defn remove-non-printable-characters [x]
(clojure.string/replace x #"[\p{C}&&^(\S)]" ""))
This function will remove non-printable characters. Let's test it:
(= "sample" "​sample")
;; => false
(= (remove-non-printable-characters "sample")
(remove-non-printable-characters "​sample"))
;; => true
(remove-non-printable-characters "sam\nple")
;; => "sam\nple"
The \p{C} pattern is discussed here.
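For comparison, the same intent (everything in \p{C} except whitespace) can also be spelled with the character-class intersection form shown in the java.util.regex.Pattern documentation, [a&&[^b]]. The sketch below is an alternative spelling, not the answer's original code; the function name remove-invisible is made up for illustration, and it assumes Clojure on the JVM, since JavaScript regular expressions do not support the && syntax by default.

```clojure
(require '[clojure.string :as str])

(defn remove-invisible
  "Hypothetical helper: removes characters in Unicode category C
  (control, format, unassigned, ...) unless they are whitespace."
  [s]
  ;; \p{C} intersected with 'not whitespace', so \n and \t survive
  (str/replace s #"[\p{C}&&[^\s]]" ""))

(remove-invisible "\u200Bsample") ;; => "sample"  (zero-width space removed)
(remove-invisible "sam\nple")     ;; => "sam\nple" (line break kept)
```

Writing the zero-width space as the escape \u200B keeps the invisible character visible in the source, which makes tests like these easier to read.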
The regex solution from @Rulle is very nice. The tupelo.chars namespace also has a collection of character classes and predicate functions that could be useful. They work in Clojure and ClojureScript, and also include the &nbsp; for browsers. In particular, check out the visible? predicate.
The tupelo.string namespace also has a number of helper & convenience functions for string processing.
(ns tst.demo.core
(:use tupelo.core tupelo.test)
(:require
[tupelo.chars :as chars]
[tupelo.string :as str] ))
(def sss
"Some multi-line
string." )
(dotest
(println "result:")
(println
(str/join
(filterv
#(or (chars/visible? %)
(chars/whitespace? %))
sss))))
with result
result:
Some multi-line
string.
To use, make your project.clj look like:
:dependencies [
[org.clojure/clojure "1.10.2-alpha1"]
[prismatic/schema "1.1.12"]
[tupelo "20.07.01"]
]

what is the interactive REPL IO function?

I have been learning Common Lisp for a while, and I have run into this question:
how can I implement a function that lets the user input words until the user inputs exit?
(Actually, I want to know which command-line interactive I/O functions fit such a requirement.)
For example:
prompt "please input a word: " in the REPL, then store the user's inputs in a global my-words, and exit when the user inputs "exit".
Your specification is a little bit incomplete (e.g. what constitutes a word in your problem? What if the user adds multiple words? What if the input is empty?). Below I am using CL-PPCRE to split the input into different words and add them all at once, because it seems useful in general. In your case you might want to add more error checking.
If you want to interact with the user, you should read and write from and to the *QUERY-IO* stream. Here I'll present a version with a global variable, as you requested, as well as another one without side-effects (apart from input/output).
With a global variable
Define the global variable and initialize it with an empty adjustable array.
I am using an array so that it is easy to add words at the end, but you could also use a queue.
(defvar *my-words* (make-array 10 :fill-pointer 0 :adjustable t))
The following function mutates the global variable:
(defun side-effect-word-repl ()
(loop
(format *query-io* "~&Please input a word: ")
(finish-output *query-io*)
(let ((words (ppcre:split
'(:greedy-repetition 1 nil :whitespace-char-class)
(read-line *query-io*))))
(dolist (w words)
(when (string-equal w "exit") ; ignore case
(return-from side-effect-word-repl))
(vector-push-extend w *my-words*)))))
The LOOP uses the simple syntax where there are only expressions and no loop-specific keywords. I first write the prompt to *QUERY-IO*. The ~& FORMAT directive performs the same operation as FRESH-LINE. As Rainer pointed out in comments, we have to call FINISH-OUTPUT to ensure the message is effectively printed before the user is expected to reply.
Then, I read a whole line from the same bidirectional stream, and split it into a list of words, where a word is a string of non-whitespace characters.
With DOLIST, I iterate over the list and add words into the global array with VECTOR-PUSH-EXTEND. But as soon as I encounter "exit", I terminate the loop; since I rely on STRING-EQUAL, the test is done case-insensitively.
Side-effect free approach
Having a global variable as done above is discouraged. If you only need a prompt which returns a list of words, then the following will be enough. Here, I use the PUSH/NREVERSE idiom to build the resulting list of words.
(defun pure-word-repl ()
(let ((result '()))
(loop
(format *query-io* "~&Please input a word: ")
(finish-output *query-io*)
(let ((words (ppcre:split
'(:greedy-repetition 1 nil :whitespace-char-class)
(read-line *query-io*))))
(dolist (w words)
(when (string-equal w "exit")
(return-from pure-word-repl (nreverse result)))
(push w result))))))
Note about words
As jkiiski commented, it might be better to split words at :word-boundary. I tried different combinations and the following result seems satisfying with weird example strings:
(mapcan (lambda (string)
(ppcre:split :word-boundary string))
(ppcre:split
'(:greedy-repetition 1 nil :whitespace-char-class)
"amzldk 'amlzkd d;:azdl azdlk"))
=> ("amzldk" "'" "amlzkd" "d" ";:" "azdl" "azdlk")
I first split the string on whitespace into a list of strings, which can contain punctuation marks. Then, each string is itself split at :word-boundary, and the results are concatenated with MAPCAN to form a list of separate words. However, I can't really guess what your actual needs are, so you should probably define your own SPLIT-INTO-WORDS function to validate and split an input string.
CL-USER 23 > (progn
(format t "~%enter a list of words:~%")
(finish-output)
(setf my-words (read))
(terpri))
enter a list of words:
(foo bar baz)
or
CL-USER 28 > (loop with word = nil
do
(format t "~%enter a word or exit:~%")
(finish-output)
(setf word (read))
(terpri)
until (eql word 'exit)
collect word)
enter a word or exit:
foo
enter a word or exit:
bar
enter a word or exit:
baz
enter a word or exit:
exit
(FOO BAR BAZ)

Standard ML string to a list

Is there a way in ML to take in a string and output a list of its substrings, where the separator is a space, newline or eof, while also keeping quoted strings inside the string intact?
EX) hello world "my id" is 5555
-> [hello, world, my id, is, 5555]
I am then working on tokenizing these into:
->[word, word, string, word, int]
Sure you can! Here's the idea:
If we take a string like "Hello World, \"my id\" is 5555", we can split it at the quote marks, ignoring the spaces for now. This gives us ["Hello World, ", "my id", " is 5555"]. The important thing to notice here is that the list contains three elements - an odd number. As long as the string only contains pairs of quotes (as it will if it's properly formatted), we'll always get an odd number of elements when we split at the quote marks.
A second important thing is that all the even-numbered elements of the list will be strings that were unquoted (if we start counting from 0), and the odd-numbered ones were quoted. That means that all we need to do is tokenize the ones that were unquoted, and then we're done!
I put some code together - you can continue from there:
fun foo s =
  let
    (* String.fields keeps empty fields, so the odd/even argument still holds
       when s starts or ends with a quote mark (String.tokens would drop them) *)
    val quoteSep = String.fields (fn c => c = #"\"") s
    val spaceSep = String.tokens Char.isSpace (* splits on spaces, newlines, tabs *)
    fun sepEven [] = []
      | sepEven [x] = spaceSep x (* last piece, or no quotes in the string at all *)
      | sepEven (x::y::xs) = spaceSep x @ (y :: sepEven xs) (* x was unquoted, y was quoted *)
  in
    if length quoteSep mod 2 = 0
    then raise Fail "uneven number of quote marks" (* something is wrong! *)
    else sepEven quoteSep
  end
String.tokens brings you halfway there. But if you really want to handle quotes like you are sketching then there is no way around writing an actual lexer. MLlex, which comes with SML/NJ and MLton (but is usable with any SML) could help. Or you just write it by hand, which should be easy enough in this case as well.

Clojure: Idiomatic Way to Insert a Char in a String

I have a string in Clojure and a character I want to put in between the nth and (n+1)st character. For example: let's say the string is "aple" and I want to insert another "p" between the "p" and the "l".
(prn
(some-function "aple" "p" 1 2))
;; prints "apple"
;; i.e. "aple" -> "ap" "p" "le", then concatenated back together.
I'm finding this somewhat challenging, so I figure I am missing information about some useful function(s). Can someone please help me write the "some-function" above that takes a string, another string, a start position and an end position and inserts the second string into the first between the start position and the end position? Thanks in advance!
More efficient than using seq functions:
(defn str-insert
"Insert c in string s at index i."
[s c i]
(str (subs s 0 i) c (subs s i)))
From the REPL:
user=> (str-insert "aple" "p" 1)
"apple"
NB. This function doesn't actually care about the type of c, or its length in the case of a string; (str-insert "aple" \p 1) and (str-insert "ale" "pp" 1) work also (in general, (str c) will be used, which is the empty string if c is nil and (.toString c) otherwise).
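The cases mentioned in the note can be checked directly. The sketch below repeats the str-insert definition from above so that it is self-contained, then exercises it with a character, a longer string, and nil.

```clojure
(defn str-insert
  "Insert c in string s at index i."
  [s c i]
  (str (subs s 0 i) c (subs s i)))

(str-insert "aple" \p 1)   ;; => "apple"  (a character works)
(str-insert "ale" "pp" 1)  ;; => "apple"  (so does a longer string)
(str-insert "aple" nil 1)  ;; => "aple"   (nil contributes the empty string)
(str-insert "aple" "p" 2)  ;; => "apple"  (index 2 is the position the question describes)
```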
Since the question asks for an idiomatic way to perform the task at hand, I will also note that I find it preferable (in terms of "semantic fit" in addition to the performance advantage) to use string-oriented functions when dealing with strings specifically; this includes subs and functions from clojure.string. See the design notes at the top of the source of clojure.string for a discussion of idiomatic string handling.

Resources