Check if a string is just letters (no symbols or numbers) in Lisp

I need to check whether a word consists only of letters.
For example, "hfjel" should return true
and "df(f0" should return false.
Also, can anyone explain the function symbol-name to me?
Thank you for the help.

There is a handy standard function called alpha-char-p that does what you're asking for.
CL-USER(1): (alpha-char-p #\a)
T
CL-USER(2): (alpha-char-p #\Γ)
T
CL-USER(3): (alpha-char-p #\α)
T
CL-USER(4): (alpha-char-p #\0)
NIL
CL-USER(5): (alpha-char-p #\.)
NIL
You can use it in conjunction with every:
CL-USER(7): (every #'alpha-char-p "word")
T
CL-USER(8): (every #'alpha-char-p "nonword7")
NIL
CL-USER(9): (every #'alpha-char-p "non-alpha-word")
NIL
CL-USER(10): (every #'alpha-char-p "今日は")
T

OK, I commented above on diacritics because this particular case often goes unnoticed; below is an example:
* (defparameter *weird-letter*
    (coerce (list (code-char #x0438)
                  (code-char #x0306)) 'string))
*WEIRD-LETTER*
* *weird-letter*
"и"
* (length *weird-letter*)
2
* (every #'alpha-char-p *weird-letter*)
NIL
I'm actually not sure what different Lisp implementations will do here, because Unicode support differs between them (as far as I can tell).
For the code above, one might expect the result to be T, because the two codepoints U+0438 U+0306 together constitute a single letter with a diacritic. Unicode offers two ways to spell it: one as a single precomposed character, and the other as a combination of the same letter without the diacritic followed by the combining diacritic.
So, if you wanted to be super-correct, you would have to check whether a letter happens to be followed by a combining diacritic, but (lo and behold!) only some of these combinations are actually valid letters! Unicode is serious business...
EDIT:
In order to better illustrate my case:
#!/opt/ActivePerl-5.14/bin/perl
binmode STDOUT, ":utf8";
my $weird_letter = "\x{0438}\x{0306}";
print "$weird_letter\n";
if ($weird_letter =~ m/^(\pL|(\pL\pM))+$/)
{ print "it is a letter!\n"; }
else { print "it is not a letter!\n"; }
The above treats this kind of character almost fairly.
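The same precomposed-versus-combining distinction can be observed with Python's standard unicodedata module (a cross-language sketch of the point above): NFC normalization folds the base letter plus the combining breve into the single precomposed codepoint U+0439, after which a letters-only check passes.

```python
import unicodedata

# U+0438 CYRILLIC SMALL LETTER I followed by U+0306 COMBINING BREVE
weird_letter = "\u0438\u0306"

# As two codepoints, the combining mark is not itself a letter,
# so a per-character letters-only check fails.
print(len(weird_letter), weird_letter.isalpha())  # 2 False

# NFC normalization composes the pair into U+0439, a single letter.
composed = unicodedata.normalize("NFC", weird_letter)
print(len(composed), composed.isalpha())          # 1 True
```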

Related

Accept number or character

I know I can use (every #'digit-char-p str) to check whether every character is a digit, or (every #'alpha-char-p str) to check whether every character is a letter. The problem is that I want to accept a word that contains letters or digits, but not symbols like +. Hello1 should return true, but not Hello1-. Does anyone know how to do that?
every's first argument is a function (in fact a function designator): you can put any function you want there. In this case, as mentioned in a comment, there's a function defined by the language which does what you want (alphanumericp), but if there wasn't, you could write one:
(every (lambda (c) (or (digit-char-p c) (alpha-char-p c))) ...)
or (every (lambda (c) (and (explodablep c) (green c))) ...). Or anything you want.
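For comparison, the same every-plus-predicate pattern looks like this in Python (a cross-language sketch; the helper name is mine):

```python
def letters_or_digits_only(s):
    # Mirror (every (lambda (c) (or (digit-char-p c) (alpha-char-p c))) s):
    # accept only if every character is a letter or a digit.
    return all(c.isalpha() or c.isdigit() for c in s)

print(letters_or_digits_only("Hello1"))   # True
print(letters_or_digits_only("Hello1-"))  # False
```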

Switch statement in Lisp

Switch statement with Strings in Lisp.
(defun switch (value)
  (case value
    (("XY") (print "XY"))
    (("AB") (print "AB"))))
I want to compare: if value is "XY" then print "XY", and the same for "AB".
I have tried this code but it gives me NIL. Can someone please tell me what I am doing wrong?
You can use the library alexandria, which has a configurable switch macro:
(switch ("XY" :test 'equal)
  ("XY" "an X and a Y")
  ("AB" "an A and a B"))
print("XY") looks more like Algol (and all of its descendants) rather than Lisp. To apply print, one surrounds the operator and its arguments in parentheses, like (print "XY").
case happens to be a macro and you can test the result yourself with passing the quoted code to macroexpand and in my implementation I get:
(let ((value value))
  (cond ((eql value '"XY") (print "XY"))
        ((eql value '"AB") (print "AB"))))
You should know that eql is only good for primitive data types like symbols, characters, and numbers. Strings are sequences, and thus (eql "XY" "XY") ;==> NIL
Perhaps you should use something other than case, e.g. cond or if with equal.
The Hyperspec on CASE says:
These macros allow the conditional execution of a body of forms in a clause that is selected by matching the test-key on the basis of its identity.
And equal strings are not necessarily identical in CL, i.e. (EQ "AB" "AB") => NIL.
That is why CASE doesn't work for strings. You either need to use symbols (they are interned only once, thus guaranteeing identity) or use COND with EQUAL, or even EQUALP if letter case is to be ignored.

Correct way to define a function in Haskell

I'm new to Haskell and I'm trying out a few tutorials.
I wrote this script:
lucky :: (Integral a) => a -> String
lucky 7 = "LUCKY NUMBER 7"
lucky x = "Bad luck"
I saved this as lucky.hs and ran it in the interpreter and it works fine.
But I am unsure about function definitions. It seems from the little I have read that I could equally define the function lucky as follows (function name is lucky2):
lucky2 :: (Integral a) => a -> String
lucky2 x = if x == 7 then "LUCKY NUMBER 7" else "Bad luck"
Both seem to work equally well. Clearly lucky is easier to read, but is lucky2 also a correct way to write a function?
They are both correct. Arguably, the first one is more idiomatic Haskell, because it uses a very important language feature called pattern matching. In this form, it would usually be written as:
lucky :: (Integral a) => a -> String
lucky 7 = "LUCKY NUMBER 7"
lucky _ = "Bad luck"
The underscore signifies that you are ignoring the exact value of the parameter. You only care that it is different from 7, the pattern matched by the previous clause.
The importance of pattern matching is best illustrated by functions that operate on more complicated data, such as lists. If you were to write a function that computes the length of a list, for example, you would likely start by providing a variant for empty lists:
len [] = 0
The [] clause is a pattern that matches empty lists. Empty lists obviously have a length of 0, so that's what we have our function return.
The other part of len would be the following:
len (x:xs) = 1 + len xs
Here, you are matching on the pattern (x:xs). The colon : is the so-called cons operator: it prepends a value to a list. The expression x:xs is therefore a pattern that matches some element (x) prepended to some list (xs). As a whole, it matches a list with at least one element, since xs can also be the empty list ([]).
This second definition of len is also pretty straightforward. You compute the length of the remaining list (len xs) and add 1 to it, which accounts for the first element (x).
(The usual way to write the above definition would be:
len (_:xs) = 1 + len xs
which again signifies that you do not care what the first element is, only that it exists).
A third way to write this would be using guards:
lucky n
  | n == 7    = "lucky"
  | otherwise = "unlucky"
There is no reason to be confused about that. There is always more than one way to do it. Note that this would be true even if there were no pattern matching or guards and you had to use if.
All of the forms we've covered so far use so-called syntactic sugar provided by Haskell. Guards, multiple function clauses, and if expressions are all transformed into ordinary case expressions. Hence the most low-level, unsugared way to write this would perhaps be:
lucky n = case n of
  7 -> "lucky"
  _ -> "unlucky"
While it is good that you check for idiomatic ways, I'd recommend that a beginner use whatever works best for them, whatever they understand best. For example, if one does not (yet) understand point-free style, there is no reason to force it. It will come sooner or later.

Which is the most clojuresque way to compare characters and string? (single char string)

I was wondering about which is the best (clojuresque) way to compare a character and a string in Clojure.
Obviously something like that returns false:
(= (first "clojure") "c")
because first returns a java.lang.Character while "c" is a single-character string. Does there exist a construct to compare a char and a string directly, without invoking a cast? I haven't found a way different from this:
(= (str (first "clojure")) "c")
but I'm not satisfied.
Any ideas?
Bye,
Alfredo
How about the straightforward String interop?
(= (.charAt "clojure" 0) \c)
or
(.startsWith "clojure" "c")
It should be as fast as it can get, and it doesn't allocate a seq object (and, in your second example, an additional string) that is immediately thrown away again just to do a comparison.
Character literals are written \a \b \c ... in Clojure so you can simply write
(= (first "clojure") \c)
Strings can be directly indexed, without building a sequence from them and taking the first of that sequence.
(= (nth "clojure" 0) \c)
=> true
nth calls through to this Java code:
static public Object nth(Object coll, int n){
    if(coll instanceof Indexed)
        return ((Indexed) coll).nth(n);   <-------
    return nthFrom(Util.ret1(coll, coll = null), n);
}
which efficiently reads the character directly.
first calls through to this Java code:
static public Object first(Object x){
    if(x instanceof ISeq)
        return ((ISeq) x).first();
    ISeq seq = seq(x);    <----- (1)
    if(seq == null)
        return null;
    return seq.first();   <------ (2)
}
which builds a seq for the string (1) (building a seq is really fast) and then takes the first item from that seq (2). After the return, the seq is garbage.
Seqs are clearly the most idiomatic way of accessing anything sequential in Clojure, and I'm not knocking them at all; it is just interesting to be aware of what you are creating and when. Switching out all your calls to first with calls to nth is likely to be a case of premature optimization. If you want the 100th char in the string, I would suggest using an indexed access function like nth.
In short: don't sweat the small stuff :)
Fundamentally (at least on the Clojure level — though see Kotarak's answer and others for alternatives to this), you're comparing two sequences: "clojure" and "c". The condition of equality is that the first element of each sequence is equal. So if you want to express this directly you can do
(apply = (map first ["clojure" "c"]))
or the other way around, where you create a lazy sequence over the equality comparison between each pair of characters, and just take the first element of it:
(first (map = "clojure" "c"))
You could use the take function from clojure.contrib.string. Or write your own function that returns the first char if that's something you need frequently.
You can just use str, as you did in your second example. There isn't really anything wrong with that. I mean, you could call first on "c" as well to make it a character, but it won't really make a difference. Is there any reason why you don't like this? Calling str on the character isn't really adding much to your code.
user=> (= (subs "clojure" 0 1) "c")
true
user=> (= (str (first "clojure")) "c")
true
These days you don't necessarily have to use Java interop:
(clojure.string/starts-with? "clojure" "c")
starts-with? is just a thin wrapper (around .startsWith).
So now if you use both Clojure and ClojureScript you won't have to remember both the Java and the JavaScript interop.

Efficient mass string search problem

The Problem: A large static list of strings is provided, along with a pattern string comprised of literal data and wildcard elements (* and ?). The idea is to return all the strings that match the pattern - simple enough.
Current Solution: I'm currently using a linear approach of scanning the large list and globbing each entry against the pattern.
My Question: Are there any suitable data structures that I can store the large list into such that the search's complexity is less than O(n)?
Perhaps something akin to a suffix trie? I've also considered using bi- and tri-grams in a hashtable, but the logic required to evaluate a match based on a merge of the returned word lists and the pattern is a nightmare; furthermore, I'm not convinced it's the correct approach.
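For concreteness, the linear baseline is trivial with Python's standard fnmatch module (a sketch of the scan-and-glob approach with made-up sample data, not a solution to the sub-O(n) question):

```python
import fnmatch

words = ["hello", "world", "word", "sword"]

# Glob every entry against the pattern: one pass, O(n) in the list size.
print(fnmatch.filter(words, "*or*"))  # ['world', 'word', 'sword']
print(fnmatch.filter(words, "w?rd"))  # ['word']
```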
I agree that a suffix trie is a good idea to try, except that the sheer size of your dataset might make its construction use up just as much time as its usage would save. They're best if you've got to query them multiple times to amortize the construction cost - perhaps a few hundred queries.
Also note that this is a good excuse for parallelism. Cut the list in two and give it to two different processors and have your job done twice as fast.
You could build a regular trie and add wildcard edges. Then your complexity would be O(n), where n is the length of the pattern. You would have to replace runs of ** with * in the pattern first (also an O(n) operation).
If the list of words were I am an ox then the trie would look a bit like this:
(I ($ [I])
 a (m ($ [am])
    n ($ [an])
    ? ($ [am an])
    * ($ [am an]))
 o (x ($ [ox])
    ? ($ [ox])
    * ($ [ox]))
 ? ($ [I]
    m ($ [am])
    n ($ [an])
    x ($ [ox])
    ? ($ [am an ox])
    * ($ [I am an ox]
       m ($ [am]) ...)
 * ($ [I am an ox]
    I ...
 ...
And here is a sample python program:
import sys

def addWord(root, word):
    add(root, word, word, '')

def add(root, word, tail, prev):
    if tail == '':
        addLeaf(root, word)
    else:
        head = tail[0]
        tail2 = tail[1:]
        add(addEdge(root, head), word, tail2, head)
        add(addEdge(root, '?'), word, tail2, head)
        if prev != '*':
            for l in range(len(tail) + 1):
                add(addEdge(root, '*'), word, tail[l:], '*')

def addEdge(root, char):
    if char not in root:
        root[char] = {}
    return root[char]

def addLeaf(root, word):
    if '$' not in root:
        root['$'] = []
    leaf = root['$']
    if word not in leaf:
        leaf.append(word)

def findWord(root, pattern):
    prev = ''
    for p in pattern:
        if p == '*' and prev == '*':
            continue
        prev = p
        if p not in root:
            return []
        root = root[p]
    if '$' not in root:
        return []
    return root['$']

def run():
    print("Enter words, one per line; terminate with a . on a line")
    root = {}
    while True:
        line = sys.stdin.readline()[:-1]
        if line == '.':
            break
        addWord(root, line)
    print(repr(root))
    print("Now enter search patterns. Do not use multiple sequential '*'s")
    while True:
        line = sys.stdin.readline()[:-1]
        if line == '.':
            break
        print(findWord(root, line))

run()
If you don't care about memory and you can afford to pre-process the list, create a sorted array of every suffix, pointing to the original word, e.g., for ['hello', 'world'], store this:
[('d' , 'world'),
('ello' , 'hello'),
('hello', 'hello'),
('ld' , 'world'),
('llo' , 'hello'),
('lo' , 'hello'),
('o' , 'hello'),
('orld' , 'world'),
('rld' , 'world'),
('world', 'world')]
Use this array to build sets of candidate matches using pieces of the pattern.
For instance, if the pattern is *or*, find the candidate match ('orld' , 'world') using a binary chop on the substring or, then confirm the match using a normal globbing approach.
If the wildcard is more complex, e.g., h*o, build sets of candidates for h and o and find their intersection before the final linear glob.
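A sketch of this suffix-array idea in Python (the helper names and the longest-literal-piece heuristic are my own): binary chop locates the block of suffixes starting with a literal piece of the pattern, and a normal glob confirms each candidate.

```python
import bisect
import fnmatch

def build_suffix_array(words):
    # Every suffix of every word, paired with its originating word, sorted.
    return sorted((w[i:], w) for w in words for i in range(len(w)))

def candidates(suffixes, piece):
    # Binary chop for the contiguous block of suffixes starting with `piece`.
    lo = bisect.bisect_left(suffixes, (piece, ""))
    hi = bisect.bisect_left(suffixes, (piece + "\uffff", ""))
    return {word for _, word in suffixes[lo:hi]}

def search(words, suffixes, pattern):
    # Narrow with the longest literal piece of the pattern, then confirm
    # each candidate with an ordinary glob.
    pieces = [p for p in pattern.replace("?", "*").split("*") if p]
    cands = candidates(suffixes, max(pieces, key=len)) if pieces else set(words)
    return {w for w in cands if fnmatch.fnmatchcase(w, pattern)}

words = ["hello", "world"]
suffixes = build_suffix_array(words)
print(search(words, suffixes, "*or*"))  # {'world'}
print(search(words, suffixes, "h*o"))   # {'hello'}
```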
You say you're currently doing a linear search. Does this give you any data on the most frequently performed query patterns? E.g., is blah* much more common than bl?h (which I'd assume it is) among your current users?
With that kind of prior knowledge you can focus your indexing efforts on the commonly used cases and get them down to O(1), rather than trying to solve the much more difficult, and yet much less worthwhile, problem of making every possible query equally fast.
You can achieve a simple speedup by keeping counts of the characters in your strings. A string with no bs or a single b can never match the query abba*, so there is no point in testing it. This works much better on whole words, if your strings are made of those, since there are many more words than characters; plus, there are plenty of libraries that can build the indexes for you. On the other hand, it is very similar to the n-gram approach you mentioned.
If you do not use a library that does it for you, you can optimize queries by looking up the most globally infrequent characters (or words, or n-grams) first in your indexes. This allows you to discard more non-matching strings up front.
In general, all speedups will be based on the idea of discarding things that cannot possibly match. What and how much to index depends on your data. For example, if the typical pattern length is near to the string length, you can simply check to see if the string is long enough to hold the pattern.
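A minimal sketch of the character-count pre-filter in Python (names are mine; it only discards impossible strings, and a real glob still has to confirm the survivors):

```python
from collections import Counter

def could_match(counts, pattern):
    # A string can only match if it contains at least as many of each
    # literal (non-wildcard) character as the pattern requires.
    need = Counter(c for c in pattern if c not in "*?")
    return all(counts[c] >= n for c, n in need.items())

strings = ["banana", "cabbage", "carrot"]
counts = {s: Counter(s) for s in strings}

# "banana" has only one b and "carrot" only one a, so neither can
# possibly match abba*; only "cabbage" survives the pre-filter.
survivors = [s for s in strings if could_match(counts[s], "abba*")]
print(survivors)  # ['cabbage']
```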
There are plenty of good algorithms for multi-string search. Google "Navarro string search" and you'll see a good analysis of multi-string options. A number of algorithms are extremely good for "normal" cases (search strings that are fairly long: Wu-Manber; search strings with characters that are modestly rare in the text to be searched: parallel Horspool). Aho-Corasick is an algorithm that guarantees a (tiny) bounded amount of work per input character, no matter how the input text is tuned to create worst-case behaviour in the search. For programs like Snort, that's really important in the face of denial-of-service attacks. If you are interested in how a really efficient Aho-Corasick search can be implemented, take a look at ACISM - an Aho-Corasick Interleaved State Matrix.
