Make String from Sequence of Characters - string

This code does not work as I expected. Could you please explain why?
(defn make-str [s c]
(let [my-str (ref s)]
(dosync (alter my-str str c))))
(defn make-str-from-chars
"make a string from a sequence of characters"
([chars] make-str-from-chars chars "")
([chars result]
(if (== (count chars) 0) result
(recur (drop 1 chars) (make-str result (take 1 chars))))))
Thank you!

This is very slow & incorrect way to create string from seq of characters. The main problem, that changes aren't propagated - ref creates new reference to existing string, but after it exits from function, reference is destroyed.
The correct way to do this is:
(apply str seq)
for example,
user=> (apply str [\1 \2 \3 \4])
"1234"
If you want to make it more effective, then you can use Java's StringBuilder to collect all data in string. (Strings in Java are also immutable)

You pass a sequence with one character in it to your make-str function, not the character itself. Using first instead of take should give you the desired effect.
Also there is no need to use references. In effect your use of them is a gross misuse of them. You already use an accumulator in your function, so you can use str directly.
(defn make-str-from-chars
"make a string from a sequence of characters"
([chars] (make-str-from-chars chars ""))
([chars result]
(if (zero? (count chars))
result
(recur (drop 1 chars) (str result (first chars))))))
Of course count is not very nice in this case, because it always has to walk the whole sequence to figure out its length. So you traverse the input sequence several times unnecessarily. One normally uses seq to identify when a sequence is exhausted. We can also use next instead of drop to save some overhead of creating unnecessary sequence objects. Be sure to capture the return value of seq to avoid overhead of object creations later on. We do this in the if-let.
(defn make-str-from-chars
"make a string from a sequence of characters"
([chars] (make-str-from-chars chars ""))
([chars result]
(if-let [chars (seq chars)]
(recur (next chars) (str result (first chars)))
result)))
Functions like this, which just return the accumulator upon fully consuming its input, cry for reduce.
(defn make-str-from-chars
"make a string from a sequence of characters"
[chars]
(reduce str "" chars))
This is already nice and short, but in this particular case we can do even a little better by using apply. Then str can use the underlying StringBuilder to its full power.
(defn make-str-from-chars
"make a string from a sequence of characters"
[chars]
(apply str chars))
Hope this helps.

You can also use clojure.string/join, as follows:
(require '[clojure.string :as str] )
(assert (= (vec "abcd") [\a \b \c \d] ))
(assert (= (str/join (vec "abcd")) "abcd" ))
There is an alternate form of clojure.string/join which accepts a separator. See:
http://clojuredocs.org/clojure_core/clojure.string/join

Related

General method to trim non-printable characters in Clojure

I encountered a bug where I couldn't match two seemingly 'identical' strings together. For example, the following two strings fail to match:
"sample" and "​sample".
To replicate the issue, one can run the following in Clojure.
(= "sample" "​sample") ; returns false
After an hour of frustrated debugging, I discovered that there was a zero-width space at the front of the second string! Removing it from this particular example via a backspace is trivial. However I have a database of strings that I'm matching, and it seems like there are multiple strings facing this issue. My question is: is there a general method to trim zero-width spaces in Clojure?
Some method's I've tried:
(count (clojure.string/trim "​abc")) ; returns 4
(count (clojure.string/replace "​abc" #"\s" "")) ; returns 4
This thread Remove zero-width space characters from a JavaScript string does provide a solution with regular expressions that works in this example, i.e.
(count (clojure.string/replace "​abc" #"[\u200B-\u200D\uFEFF]" "")) ; returns 3
However, as stated in the post itself, there are many other potential ascii characters that may be invisible. So I'm still interested if there's a more general method that doesn't rely on listing all possible invisible unicode symbols.
I believe, what you are referring to are so-called non-printable characters. Based on this answer in Java, you could pass the #"\p{C}" regular expression as pattern to replace:
(defn remove-non-printable-characters [x]
(clojure.string/replace x #"\p{C}" ""))
However, this will remove line breaks, e.g. \n. So in order to keep those characters, we need a more complex regular expression:
(defn remove-non-printable-characters [x]
(clojure.string/replace x #"[\p{C}&&^(\S)]" ""))
This function will remove non-printable characters. Let's test it:
(= "sample" "​sample")
;; => false
(= (remove-non-printable-characters "sample")
(remove-non-printable-characters "​sample"))
;; => true
(remove-non-printable-characters "sam\nple")
;; => "sam\nple"
The \p{C} pattern is discussed here.
The regex solution from #Rulle is very nice. The tupelo.chars namespace also has a collection of character classes and predicate functions that could be useful. They work in Clojure and ClojureScript, and also include the ^nbsp; for browsers. In particular, check out the visible? predicate.
The tupelo.string namespace also has a number of helper & convenience functions for string processing.
(ns tst.demo.core
(:use tupelo.core tupelo.test)
(:require
[tupelo.chars :as chars]
[tupelo.string :as str] ))
(def sss
"Some multi-line
string." )
(dotest
(println "result:")
(println
(str/join
(filterv
#(or (chars/visible? %)
(chars/whitespace? %))
sss))))
with result
result:
Some multi-line
string.
To use, make your project.clj look like:
:dependencies [
[org.clojure/clojure "1.10.2-alpha1"]
[prismatic/schema "1.1.12"]
[tupelo "20.07.01"]
]

what is the interactive REPL IO function?

I have been learning Common Lisp for a while, there was a question I have met that
how I can implement such a function which allows user to input some words until user input exit.
(actually I want to know what kind of command line interactive function APIs fit such requirement)
e.g.
prompt "please input a word: " in the REPL, then store user inputs into a global my-words , exit when user input "exit".
You specification is a little bit incomplete (e.g. what constitutes a word in your problem? What if the user add multiple words? What if the input is empty?). Here below I am using CL-PPCRE to split the input into different words and add them all at once, because it seems useful in general. In your case you might want to add more error checking.
If you want to interact with the user, you should read and write from and to the *QUERY-IO* stream. Here I'll present a version with a global variables, as you requested, as well as another one without side-effects (apart from input/output).
With a global variable
Define the global variable and initialize it with an empty adjustable array.
I am using an array so that it is easy to add words at the end, but you could also use a queue.
(defvar *my-words* (make-array 10 :fill-pointer 0 :adjustable t))
The following function mutates the global variable:
(defun side-effect-word-repl ()
(loop
(format *query-io* "~&Please input a word: ")
(finish-output *query-io*)
(let ((words (ppcre:split
'(:greedy-repetition 1 nil :whitespace-char-class)
(read-line *query-io*))))
(dolist (w words)
(when (string-equal w "exit") ; ignore case
(return-from side-effect-word-repl))
(vector-push-extend w *my-words*)))))
The LOOP uses the simple syntax where there are only expressions and no loop-specific keywords. I first write the prompt to *QUERY-IO*. The ~& FORMAT directive performs the same operation as FRESH-LINE. As Rainer pointed out in comments, we have to call FINISH-OUTPUT to ensure the message is effectively printed before the user is expected to reply.
Then, I read a whole line from the same bidirectional stream, and split it into a list of words, where a word is a string of non-whitespace characters.
With DOLIST, I iterate over the list and add words into the global array with VECTOR-PUSH-EXTEND. But as soon as I encouter "exit", I terminate the loop; since I rely on STRING-EQUAL, the test is done case-insensitively.
Side-effect free approach
Having a global variable as done above is discouraged. If you only need to have a prompt which returns a list of words, then the following will be enough. Here, I use the PUSH/NREVERSE idiom to built the resulting list of words.
(defun pure-word-repl ()
(let ((result '()))
(loop
(format *query-io* "~&Please input a word: ")
(finish-output *query-io*)
(let ((words (ppcre:split
'(:greedy-repetition 1 nil :whitespace-char-class)
(read-line *query-io*))))
(dolist (w words)
(when (string-equal w "exit")
(return-from pure-word-repl (nreverse result)))
(push w result))))))
Note about words
As jkiiski commented, it might be better to split words at :word-boundary. I tried different combinations and the following result seems satisfying with weird example strings:
(mapcan (lambda (string)
(ppcre:split :word-boundary string))
(ppcre:split
'(:greedy-repetition 1 nil :whitespace-char-class)
"amzldk 'amlzkd d;:azdl azdlk"))
=> ("amzldk" "'" "amlzkd" "d" ";:" "azdl" "azdlk")
I first remove all whitespaces and split the string into a list of strings, which can contain punctuation marks. Then, each string is itself splitted at :word-boundary, and concatenated with MAPCAN to form a list of separate words. However, I can't really guess what your actual needs are, so you should probably define your own SPLIT-INTO-WORDS function to validate and split an input string.
CL-USER 23 > (progn
(format t "~%enter a list of words:~%")
(finish-output)
(setf my-words (read))
(terpri))
enter a list of words:
(foo bar baz)
or
CL-USER 28 > (loop with word = nil
do
(format t "~%enter a word or exit:~%")
(finish-output)
(setf word (read))
(terpri)
until (eql word 'exit)
collect word)
enter a word or exit:
foo
enter a word or exit:
bar
enter a word or exit:
baz
enter a word or exit:
exit
(FOO BAR BAZ)

Scheme string-ref and char-whitespace? using

(define len (string-length "James ApR23Trb&%25G)(=?vqa"))
(define (divide-string str)
(let (x)
(if (char-whitespace? (string-ref str x))
(substring str (+ 1 x) (- len 1))
(printf "an invalid input!"))
))
(divide-string "James ApR23Trb&%25G)(=?vqa")
I have a string with divided into a blank space. I need to handle two
substring. One is till blank space and the other one is from blank
space. But i could not handle the index of blank space with x.
Any help will be appraciated. Thank you even now.
Try regexp-split:
> (regexp-split #rx"\\s" "James ApR23Trb&%25G)(=?vqa")
'("Jame" " ApR23Trb&%25G)(=?vqa")
Here \\s matches whitespace.
Oops. I mistook the question for a Racket question.
In a Scheme implementation: search for split in the documentation
and see what your implementation of choice has available.

Appending character to string in Common Lisp

I have a character ch that I want to append to a string str. I realize you can concatenate strings like such:
(setf str (concatenate 'string str (list ch)))
But that seems rather inefficient. Is there a faster way to just append a single character?
If the string has a fill-pointer and maybe is also adjustable.
Adjustable = can change its size.
fill-pointer = the content size, the length, can be less than the string size.
VECTOR-PUSH = add an element at the end and increment the fill-pointer.
VECTOR-PUSH-EXTEND = as VECTOR-PUSH, additionally resizes the array, if it is too small.
We can make an adjustable string from a normal one:
CL-USER 32 > (defun make-adjustable-string (s)
(make-array (length s)
:fill-pointer (length s)
:adjustable t
:initial-contents s
:element-type (array-element-type s)))
MAKE-ADJUSTABLE-STRING
CL-USER 33 > (let ((s (make-adjustable-string "Lisp")))
(vector-push-extend #\! s)
s)
"Lisp!"
If you want to extend a single string multiple times, it is often
quite performant to use with-output-to-string, writing to the stream
it provides. Be sure to use write or princ etc. (instead of format)
for performance.

Clojure: Idiomatic Way to Insert a Char in a String

I have a string in Clojure and a character I want to put in between the nth and (n+1)st character. For example: Lets say the string is "aple" and I want to insert another "p" between the "p" and the "l".
(prn
(some-function "aple" "p" 1 2))
;; prints "apple"
;; ie "aple" -> "ap" "p" "le" and the concatenated back together.
I'm finding this somewhat challenging, so I figure I am missing information about some useful function(s) Can someone please help me write the "some-function" above that takes a string, another string, a start position and an end position and inserts the second string into the first between the start position and the end position? Thanks in advance!
More efficient than using seq functions:
(defn str-insert
"Insert c in string s at index i."
[s c i]
(str (subs s 0 i) c (subs s i)))
From the REPL:
user=> (str-insert "aple" "p" 1)
"apple"
NB. This function doesn't actually care about the type of c, or its length in the case of a string; (str-insert "aple" \p 1) and (str-insert "ale" "pp" 1) work also (in general, (str c) will be used, which is the empty string if c is nil and (.toString c) otherwise).
Since the question asks for an idiomatic way to perform the task at hand, I will also note that I find it preferable (in terms of "semantic fit" in addition to the performance advantage) to use string-oriented functions when dealing with strings specifically; this includes subs and functions from clojure.string. See the design notes at the top of the source of clojure.string for a discussion of idiomatic string handling.

Resources