Which is the most clojuresque way to compare a character and a string? (single-char string)

I was wondering which is the best (most clojuresque) way to compare a character and a string in Clojure.
Obviously, something like this returns false:
(= (first "clojure") "c")
because first returns a java.lang.Character and "c" is a single-character string. Is there a construct to compare a char and a string directly, without converting one into the other? I haven't found a way other than this:
(= (str (first "clojure")) "c")
but I'm not satisfied with it.
Any ideas?
Bye,
Alfredo

How about straightforward String interop?
(= (.charAt "clojure" 0) \c)
or
(.startsWith "clojure" "c")
It should be as fast as it gets, and it doesn't allocate a seq object (nor, in your second example, an additional string) that is thrown away immediately after the comparison.

Character literals are written \a, \b, \c, ... in Clojure, so you can simply write
(= (first "clojure") \c)

Strings can be indexed directly, without building a sequence from them and taking the first element of that sequence.
(= (nth "clojure" 0) \c)
=> true
nth calls through to this Java code in clojure.lang.RT:
static public Object nth(Object coll, int n){
if(coll instanceof Indexed)
return ((Indexed) coll).nth(n);
return nthFrom(Util.ret1(coll, coll = null), n); <------- a String is not Indexed, so it ends up here
}
For a String, nthFrom takes its CharSequence branch, which reads the character directly with charAt.
first calls through to this Java code:
static public Object first(Object x){
if(x instanceof ISeq)
return ((ISeq) x).first();
ISeq seq = seq(x); <----- (1)
if(seq == null)
return null;
return seq.first(); <------ (2)
}
which builds a seq for the string (1) (building a seq is really fast) and then takes the first item from that seq (2). After the return, the seq is garbage.
Seqs are clearly the most idiomatic way of accessing anything sequential in Clojure, and I'm not knocking them at all. It is just interesting to be aware of what you are creating, and when. Switching all your calls from first to nth is likely to be premature optimization. If you want the 100th char in the string, though, I would suggest an indexed access function like nth.
In short: don't sweat the small stuff :)

Fundamentally (at least on the Clojure level — though see Kotarak's answer and others for alternatives to this), you're comparing two sequences: "clojure" and "c". The condition of equality is that the first element of each sequence is equal. So if you want to express this directly you can do
(apply = (map first ["clojure" "c"]))
or the other way around, where you create a lazy sequence over the equality comparison between each pair of characters, and just take the first element of it:
(first (map = "clojure" "c"))

You could use the take function from clojure.contrib.string. Or write your own function that returns the first char if that's something you need frequently.

You can just use str, as you did in your second example. There isn't really anything wrong with that. You could call first on "c" as well to make it a character, but it won't really make a difference. Is there any reason why you don't like it? Calling str on the character doesn't add much to your code.

user=> (= (subs "clojure" 0 1) "c")
true
user=> (= (str (first "clojure")) "c")
true

These days you don't necessarily have to use Java interop:
(clojure.string/starts-with? "clojure" "c")
starts-with? is just a thin wrapper (around .startsWith).
So now if you use both Clojure and ClojureScript you won't have to remember both the Java and the JavaScript interop.

Related

Recursively reading a file in Racket

I am struggling to understand how to read a file line by line with Racket, while passing each line to a recursive function.
According to the manual, the idiomatic way of doing this is something like the following example:
(with-input-from-file "manylines.txt"
(lambda ()
(for ([l (in-lines)])
(op l))))
What if my function op is a recursive function that needs to do some complicated operations depending on the line just read from file and also on the history of the recursion?
For example, I could have a function like this:
(define (op l s)
;; l is a string, s is a list
(cond ((predicate? l)
(op (next-line-from-file) (cons (function-yes l) s)))
(else
(op (next-line-from-file) (append (function-no l) s)))))
I am not sure how to use this function within the framework described by the manual.
Here next-line-from-file is a construct I made up to make it clear that I would like to keep reading the file.
I think I could do what I want by introducing side effects, for example:
(with-input-from-file "manylines.txt"
(lambda ()
(let ((s '()))
(for ([l (in-lines)])
(if (predicate? l)
(let ((prefix (function-yes l)))
(set-cdr! s s)
(set-car! s prefix))
(let ((prefix (function-no l)))
(set-cdr! prefix s)
(set-car! s prefix)))))))
I actually did not try to run this code, so I'm not sure it would work.
Anyway I would bet that this common task can be solved without introducing side effects, but how?
Two approaches that Racket supports rather well are to turn the port into something which is essentially a generator of lines, or into a stream. You can then pass these things around as arguments to whatever function you are using in order to successively read lines from the file.
The underlying thing in both of these is that ports are sequences, (in-lines p) returns another sequence which consists of the lines from p, and then you can turn these into generators or streams.
Here's a function which will cat a file (in other words, read its lines and print them) using a generator:
(define (cat/generator f)
(call-with-input-file f
(λ (p)
(let-values ([(more? next) (sequence-generate (in-lines p))])
(let loop ([carry-on? (more?)])
(when carry-on?
(displayln (next))
(loop (more?))))))))
Here call-with-input-file deals with opening the file and calling its second argument with a suitable port. in-lines makes a sequence of lines from the port, and sequence-generate then takes any sequence and returns two thunks: one tells you if the sequence is exhausted, and one returns the next thing in it if it isn't. The remainder of the function just uses these functions to print the lines of the file.
Here's an equivalent function which does it using a stream:
(define (cat/stream f)
(call-with-input-file f
(λ (p)
(let loop ([s (sequence->stream (in-lines p))])
(unless (stream-empty? s)
(displayln (stream-first s))
(loop (stream-rest s)))))))
Here the trick is that sequence->stream returns a stream corresponding to a sequence, and then stream-empty? will tell you if you're at the end of the stream, and if it's not empty, then stream-first returns the first element (conceptually the car) while stream-rest returns a stream of all the other elements.
The second one of these is nicer I think.
One nice thing is that lists are streams so you can write functions which use the stream-* functions, test them on lists, and then use them on any other kind of stream, which means any other kind of sequence, and the functions will never know.
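For comparison, here is a rough Python sketch of the same idea applied to the question's op: the line source is passed along as an argument (a Python iterator plays the role of the stream) together with the history accumulated so far, so nothing has to be mutated. predicate, function_yes and function_no are hypothetical stand-ins for the question's predicate?, function-yes and function-no, and io.StringIO stands in for the real file. Python does not eliminate tail calls, so unlike the Racket version this recursion is only suitable for short inputs.

```python
import io
from typing import Iterator

# Hypothetical stand-ins for the question's predicate?, function-yes, function-no.
def predicate(line: str) -> bool:
    return line.startswith("#")

def function_yes(line: str) -> str:
    return line.upper()

def function_no(line: str) -> list[str]:
    return line.split()

def op(lines: Iterator[str], s: list) -> list:
    """Consume lines one at a time; s is the history built so far."""
    line = next(lines, None)          # plays the role of next-line-from-file
    if line is None:                  # end of input: return the history
        return s
    line = line.rstrip("\n")
    if predicate(line):
        return op(lines, [function_yes(line)] + s)   # like (cons (function-yes l) s)
    return op(lines, function_no(line) + s)          # like (append (function-no l) s)

text = "# title\nalpha beta\ngamma\n"
with io.StringIO(text) as f:          # stands in for the opened manylines.txt
    result = op(iter(f), [])
print(result)  # ['gamma', 'alpha', 'beta', '# TITLE']
```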
I recently implemented something similar, except that in my case the predicate depended on the following line, not the preceding one. In any case, I found it simplest to skip in-lines and use read-line recursively. Since the predicate depended on unread input, I used peek-string to look ahead in the input stream.
If you really want to use in-lines, you might like to experiment with sequence-fold:
(sequence-fold your-procedure '() (in-lines))
Notice this uses an accumulator, which you could use to check the previous results from your procedure. However, if you're building a list, you generally want to build it backwards using cons, so the most recent element is at the head of the list and can be accessed in constant time. Once you're done, reverse the list.
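A minimal Python sketch of the same fold, using functools.reduce in place of sequence-fold. To mirror the advice about building the list backwards, the accumulator is a chain of (head, rest) pairs, so each step is a constant-time "cons" and the most recent result can be inspected in constant time; the chain is reversed once at the end. The step function (which drops consecutive duplicate lines) and the sample input are illustrative assumptions.

```python
import io
from functools import reduce

def step(acc, line):
    """Fold step: look at the previous result, then 'cons' onto the front."""
    line = line.rstrip("\n")
    # The most recent result sits at the head, so checking it is O(1).
    if acc is not None and acc[0] == line:
        return acc              # drop consecutive duplicates
    return (line, acc)          # like (cons line acc)

def chain_to_list(cells):
    """Walk the (head, rest) chain (newest first), then reverse once."""
    out = []
    while cells is not None:
        head, cells = cells
        out.append(head)
    out.reverse()
    return out

with io.StringIO("a\na\nb\nb\nc\n") as f:
    chain = reduce(step, f, None)   # None plays the role of '()
lines = chain_to_list(chain)
print(lines)  # ['a', 'b', 'c']
```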

Switch statement in Lisp

Switch statement with Strings in Lisp.
(defun switch(value)
(case value
(("XY") (print "XY"))
(("AB") (print "AB"))
)
)
I want to compare if value is "XY" then print "XY" or same for "AB".
I have tried this code, but it gives me nil. Can someone please tell me what I am doing wrong?
You can use the Alexandria library, which has a switch macro with a configurable test:
(alexandria:switch (value :test 'equal)
("XY" "an X and a Y")
("AB" "an A and a B"))
print("XY") looks more like Algol (and its descendants) than Lisp. To apply print, you put the operator and its arguments inside the parentheses: (print "XY")
case is a macro, so you can check what it expands into by passing the quoted form to macroexpand. In my implementation I get:
(let ((value value))
(cond ((eql value '"XY") (print "XY"))
((eql value '"AB") (print "AB"))))
You should know that eql compares only numbers and characters by value; everything else, strings included, is compared by identity. Two strings with the same characters are generally different objects, thus (eql "XY" "XY") ;==> nil
So use something other than case, e.g. cond or if with equal.
The Hyperspec on CASE says:
These macros allow the conditional execution of a body of forms in a clause that is selected by matching the test-key on the basis of its identity.
And strings are not identical in CL, i.e. (EQ "AB" "AB") => NIL.
That is why CASE doesn't work for strings. You either need to use symbols (they are interned only once, which guarantees identity) or use COND with EQUAL, or EQUALP if letter case should be ignored.
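To make the identity-versus-equality point concrete outside Lisp, here is a small Python sketch of a case-like dispatch with a configurable test, loosely modelled on the :test argument of Alexandria's switch. The switch function below is hypothetical, not a Python library API. With an identity test (operator.is_, roughly Lisp's eq) a string built at run time does not match a literal with the same characters; with an equality test (operator.eq, roughly equal) it does. The identity result relies on CPython not interning strings produced by str.join, so treat it as an illustration rather than a language guarantee.

```python
import operator

def switch(value, clauses, test=operator.eq, default=None):
    """Return the result of the first clause whose key satisfies test(value, key)."""
    for key, result in clauses:
        if test(value, key):
            return result
    return default

CLAUSES = [("XY", "an X and a Y"), ("AB", "an A and a B")]

value = "".join(["X", "Y"])   # a fresh string object, like one read at run time
print(switch(value, CLAUSES, test=operator.is_))  # identity test: None in CPython
print(switch(value, CLAUSES))                      # equality test: an X and a Y
```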

How to find maximum overlap between two strings in Scala?

Suppose I have two strings, s and t. I need to write a function f that finds the longest prefix of t that is also a suffix of s. For example:
s = "abcxyz", t = "xyz123", f(s, t) = "xyz"
s = "abcxxx", t = "xx1234", f(s, t) = "xx"
How would you write it in Scala ?
This first solution is easily the most concise. It's also more efficient than a recursive version, as it uses a lazily evaluated iteration:
s.tails.find(t.startsWith).get
There has been some discussion about whether tails ends up copying the whole string over and over. If that's a concern, you could call toList on s and then mkString the result:
s.toList.tails.find(t.startsWith(_: List[Char])).get.mkString
For some reason the type annotation is required to get it to compile. I haven't actually tested which one is faster.
UPDATE - OPTIMIZATION
As som-snytt pointed out, t cannot start with any string that is longer than it, and therefore we could make the following optimization:
s.drop(s.length - t.length).tails.find(t.startsWith).get
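The same search, sketched in Python for comparison: walk the suffixes of s from longest to shortest, starting no earlier than len(s) - len(t) (the optimization above), and return the first one that t starts with. The empty suffix always matches, so the loop always returns. The name max_overlap is illustrative; note that each s[i:] slice copies, much like the concern raised about tails.

```python
def max_overlap(s: str, t: str) -> str:
    """Longest suffix of s that is also a prefix of t."""
    # A suffix longer than t can never be a prefix of t, so skip those.
    start = max(0, len(s) - len(t))
    for i in range(start, len(s) + 1):
        suffix = s[i:]              # like the elements of s.tails
        if t.startswith(suffix):    # like find(t.startsWith)
            return suffix
    return ""  # not reached: s[len(s):] == "" and every string starts with ""

print(max_overlap("abcxyz", "xyz123"))   # xyz
print(max_overlap("abcxxx", "xx1234"))   # xx
print(repr(max_overlap("one", "two")))   # ''
```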
Efficient, this is not, but it is a neat (IMO) one-liner.
val s = "abcxyz"
val t ="xyz123"
(s.tails.toSet intersect t.inits.toSet).maxBy(_.size)
//res8: String = xyz
(take all the suffixes of s that are also prefixes of t, and pick the longest)
If we only need to find the overlapping part, we can recursively take the tail of the first string (whose end should overlap with the beginning of the second string) until the remaining part is something the second string begins with. This also covers the case where the strings don't overlap at all, because then the empty string is returned.
scala> def findOverlap(s:String, t:String):String = {
if (s == t.take(s.size)) s else findOverlap (s.tail, t)
}
findOverlap: (s: String, t: String)String
scala> findOverlap("abcxyz", "xyz123")
res3: String = xyz
scala> findOverlap("one","two")
res1: String = ""
UPDATE: It was pointed out that tail might not be implemented in the most efficient way (it creates a new string each time it is called). If that becomes an issue, using substring(1) instead of tail (or converting both strings to Lists, where tail and head are O(1)) might give better performance. By the same token, we can replace t.take(s.size) with t.substring(0, s.size), as long as s is never longer than t; otherwise substring throws an exception where take would not.

Check if a string contains only letters (no symbols or numbers) in Lisp

I need to check whether a word consists only of letters, for example:
"hfjel" returns true
"df(f0" returns false
Also, can anyone explain the function symbol-name to me?
Thank you for your help.
There is a handy standard function called alpha-char-p that does what you're asking for.
CL-USER(1): (alpha-char-p #\a)
T
CL-USER(2): (alpha-char-p #\Γ)
T
CL-USER(3): (alpha-char-p #\α)
T
CL-USER(4): (alpha-char-p #\0)
NIL
CL-USER(5): (alpha-char-p #\.)
NIL
You can use it in conjunction with every:
CL-USER(7): (every #'alpha-char-p "word")
T
CL-USER(8): (every #'alpha-char-p "nonword7")
NIL
CL-USER(9): (every #'alpha-char-p "non-alpha-word")
NIL
CL-USER(10): (every #'alpha-char-p "今日は")
T
OK, I commented above on diacritics because this particular case often goes unnoticed. Here is an example:
* (defparameter *weird-letter*
(coerce (list (code-char #x0438)
(code-char #x0306)) 'string))
*WEIRD-LETTER*
* *weird-letter*
"и"
* (length *weird-letter*)
2
* (every #'alpha-char-p *weird-letter*)
NIL
I'm actually not sure what different Lisp implementations will do here, because Unicode support differs between some of them (as far as I can tell).
For the code above, the expected result should arguably have been T because, in fact, the two code points U+0438 U+0306 constitute a single letter with a diacritic. Unicode has two ways to spell it: one is a single precomposed character (U+0439), the other is the same letter without the diacritic followed by the combining diacritic.
So, if you wanted to be super-correct, you would have to check whether a letter is followed by a diacritic, but (lo and behold!) only some of these combinations are actually valid letters! Unicode is serious business...
EDIT:
In order to better illustrate my case:
#!/opt/ActivePerl-5.14/bin/perl
binmode STDOUT, ":utf8";
my $weird_letter = "\x{0438}\x{0306}";
print "$weird_letter\n";
if ($weird_letter =~ m/^(\pL|(\pL\pM))+$/)
{ print "it is a letter!\n"; }
else { print "it is not a letter!\n"; }
The above handles this kind of character fairly well, though not perfectly.
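A rough Python counterpart of both checks, for comparison. str.isalpha plays the role of alpha-char-p, all plays the role of every, and unicodedata.category exposes the general categories that \pL and \pM match on. Normalizing to NFC first composes U+0438 U+0306 into the single letter U+0439, after which the naive check passes. The second function instead accepts combining marks directly after a letter; it is slightly more permissive than the Perl pattern because it allows several marks in a row. Both function names are hypothetical.

```python
import unicodedata

def only_letters(s: str) -> bool:
    """Like (every #'alpha-char-p s): every code point must be a letter."""
    return all(ch.isalpha() for ch in s)

def letters_with_marks(s: str) -> bool:
    """Letters (categories L*), each optionally followed by combining marks (M*)."""
    after_letter = False
    for ch in s:
        category = unicodedata.category(ch)   # e.g. 'Ll', 'Mn', 'Nd', 'Ps'
        if category.startswith("L"):
            after_letter = True
        elif category.startswith("M") and after_letter:
            continue                          # mark attached to the preceding letter
        else:
            return False
    return bool(s)                            # like the Perl +: at least one letter

weird = "\u0438\u0306"                        # CYRILLIC SMALL LETTER I + COMBINING BREVE
print(only_letters("hfjel"), only_letters("df(f0"))        # True False
print(only_letters(weird))                                  # False
print(only_letters(unicodedata.normalize("NFC", weird)))    # True
print(letters_with_marks(weird))                            # True
```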

Correct way to define a function in Haskell

I'm new to Haskell and I'm trying out a few tutorials.
I wrote this script:
lucky::(Integral a)=> a-> String
lucky 7 = "LUCKY NUMBER 7"
lucky x = "Bad luck"
I saved this as lucky.hs and ran it in the interpreter and it works fine.
But I am unsure about function definitions. It seems from the little I have read that I could equally define the function lucky as follows (function name is lucky2):
lucky2::(Integral a)=> a-> String
lucky2 x=(if x== 7 then "LUCKY NUMBER 7" else "Bad luck")
Both seem to work equally well. Clearly function lucky is clearer to read but is the lucky2 a correct way to write a function?
They are both correct. Arguably, the first one is more idiomatic Haskell because it uses its very important feature called pattern matching. In this form, it would usually be written as:
lucky::(Integral a)=> a-> String
lucky 7 = "LUCKY NUMBER 7"
lucky _ = "Bad luck"
The underscore signifies that you are ignoring the exact form (value) of your parameter. You only care that it is different from 7, which is the pattern matched by the previous equation.
The importance of pattern matching is best illustrated by functions that operate on more complicated data, such as lists. If you were to write a function that computes the length of a list, for example, you would likely start by providing a variant for empty lists:
len [] = 0
The [] clause is a pattern, which is set to match empty lists. Empty lists obviously have length of 0, so that's what we are having our function return.
The other part of len would be the following:
len (x:xs) = 1 + len xs
Here, you are matching on the pattern (x:xs). The colon : is the so-called cons operator: it prepends a value to a list. The expression x:xs is therefore a pattern which matches some element (x) prepended to some list (xs). As a whole, it matches a list which has at least one element, since xs can also be an empty list ([]).
This second definition of len is also pretty straightforward. You compute the length of the remaining list (len xs) and add 1 to it, which accounts for the first element (x).
(The usual way to write the above definition would be:
len (_:xs) = 1 + len xs
which again signifies that you do not care what the first element is, only that it exists).
A 3rd way to write this would be using guards:
lucky n
| n == 7 = "lucky"
| otherwise = "unlucky"
There is no reason to be confused about that. There is always more than 1 way to do it. Note that this would be true even if there were no pattern matching or guards and you had to use the if.
All of the forms we've covered so far use so-called syntactic sugar provided by Haskell. Guards, multiple function clauses and if expressions are all transformed into ordinary case expressions. Hence the most low-level, unsugared way to write this would perhaps be:
lucky n = case n of
7 -> "lucky"
_ -> "unlucky"
While it is good that you look for idiomatic ways, I'd recommend that a beginner use whatever works best for them and whatever they understand best. For example, if you don't (yet) understand point-free style, there is no reason to force it. It will come to you sooner or later.
