Prolog getting head and tail of string

I'm trying to wrap my brain around Prolog for the first time (SWI-Prolog) and I'm struggling with what I'm sure are the basics. I'm trying to take a string such as "pie" and print out the military NATO spelling of it to look something like this:
spellWord("Pie").
Papa
India
Echo
Currently I'm just trying to verify that I'm using the [H|T] syntax and Write function correctly. My function is:
spellWord(String) :- String = [H|T], writeChar(H), spellWord(T).
writeChar(String) :- H == "P", print4("Papa").
When making a call to spellWord("Pie"). this currently just returns false.

SWI-Prolog has several different representations of what you might call "strings":
List of character codes (Unicode);
List of chars (one-letter atoms);
Strings, which are "atomic" objects, and can be manipulated only with the built-in predicates for strings;
And finally, of course, atoms.
You should read the documentation, but for now, you have at least two choices.
Choice 1: Use a flag to make double-quoted strings code lists
$ swipl --traditional
Welcome to SWI-Prolog (Multi-threaded, 64 bits, Version 7.3.19-57-g9d8aa27)
Copyright (c) 1990-2015 University of Amsterdam, VU Amsterdam
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to redistribute it under certain conditions.
Please visit http://www.swi-prolog.org for details.
For help, use ?- help(Topic). or ?- apropos(Word).
?- X = "abc".
X = [97, 98, 99].
At this point, your approach should work, as you now have a list.
Choice 2: Use the new code list syntax with back-ticks
?- X = `abc`.
X = [97, 98, 99].
And, of course, there are predicates that convert between atoms, code lists, char lists, and strings. So, to make a list of chars (one-character atoms), you have:
atom_chars/2
char_code/2
string_chars/2
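As a rough illustration of those conversions, here is a small sketch. The helper names text_char_list/2 and chars_codes/2 are made up for this example; atom_chars/2, string_chars/2 and char_code/2 are the built-ins listed above, and the behaviour assumes SWI-Prolog 7 or later.

```prolog
% text_char_list(+Text, -Chars): hypothetical helper, not a built-in.
% Converts an atom or a string into a list of one-character atoms.
text_char_list(Text, Chars) :-
    atom(Text), !,
    atom_chars(Text, Chars).
text_char_list(Text, Chars) :-
    string(Text),
    string_chars(Text, Chars).

% chars_codes(+Chars, -Codes): map each char to its character code.
chars_codes([], []).
chars_codes([Char|Chars], [Code|Codes]) :-
    char_code(Char, Code),
    chars_codes(Chars, Codes).
```

For instance, the query text_char_list(pie, Cs), chars_codes(Cs, Codes) should bind Cs to [p, i, e] and Codes to [112, 105, 101].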
As for your predicate definition, consider using unification in the head. Also, don't mix side effects (printing) with what the predicate does. Let the top level (the Prolog interpreter) do the printing for you.
nato(p, 'Papa').
nato(i, 'India').
nato(e, 'Echo').
% and so on
word_nato([], []).
word_nato([C|Cs], [N|Ns]) :-
char_code(Char, C),
char_type(U, to_lower(Char)),
nato(U, N),
word_nato(Cs, Ns).
And with this:
?- word_nato(`Pie`, Nato).
Nato = ['Papa', 'India', 'Echo'].
I used chars (one-letter atoms) instead of character codes because those are easier to write.
And finally, you can use the double_quotes flag with set_prolog_flag/2 at run time to change how Prolog treats text enclosed in double quotes.
For example:
$ swipl
Welcome to SWI-Prolog (Multi-threaded, 64 bits, Version 7.3.19-40-g2bcbced)
Copyright (c) 1990-2015 University of Amsterdam, VU Amsterdam
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to redistribute it under certain conditions.
Please visit http://www.swi-prolog.org for details.
For help, use ?- help(Topic). or ?- apropos(Word).
?- current_prolog_flag(double_quotes, DQs).
DQs = string.
?- string("foo").
true.
?- set_prolog_flag(double_quotes, codes).
true.
?- X = "foo".
X = [102, 111, 111].
?- set_prolog_flag(double_quotes, chars).
true.
?- X = "foo".
X = [f, o, o].
?- set_prolog_flag(double_quotes, atom).
true.
?- X = "foo".
X = foo.

Regardless of the Prolog system you are using, and unless you have to maintain existing code, stick to set_prolog_flag(double_quotes, chars). This works in many systems, such as B, GNU, IF, IV, Minerva, Scryer, SICStus, SWI, Tau, Trealla, and YAP, so it is a safe bet. The other options mentioned by @Boris are hard to debug. One is even specific to SWI only.
?- set_prolog_flag(double_quotes, chars).
true.
?- L = "abc".
L = [a,b,c].
With library(double_quotes), these strings can be printed more compactly.
In SWI, the best you can do is to put in your .swiplrc the lines:
:- set_prolog_flag(back_quotes, string).
:- set_prolog_flag(double_quotes, chars).
:- use_module(library(double_quotes)).
For your concrete example, it is a good idea to avoid producing
side-effects immediately. Instead consider defining a relation
between a word and the spelling:
word_spelling(Ws, Ys) :-
phrase(natospelling(Ws), Ys).
natospelling([]) --> [].
natospelling([C|Cs]) -->
{char_lower(C, L)},
nato(L),
"\n",
natospelling(Cs).
nato(p) --> "Papa".
nato(i) --> "India".
nato(e) --> "Echo".
char_lower(C, L) :-
char_type(L, to_lower(C)).
?- word_spelling("Pie",Xs).
Xs = "Papa\nIndia\nEcho\n".
?- word_spelling("Pie",Xs), format("~s",[Xs]).
Papa
India
Echo
Xs = "Papa\nIndia\nEcho\n".
And here is your original definition. Most of the time, however, rather stick with the pure core of it.
spellWord(Ws) :-
word_spelling(Ws, Xs),
format("~s", [Xs]).
Also note that SWI's built-in library(pio) only works for codes and leaves unnecessary choice-points open. Instead, use this replacement, which works for chars and codes depending on the Prolog flag.
Historically, characters were first represented as atoms of length one, in Prolog 0 in 1972. There, however, strings were represented in a left-associative manner, which facilitated suffix matching.
plur(nil-c-i-e-l, nil-c-i-e-u-x).
Starting with Prolog I, 1973, double quotes meant a list of characters, like today.
In 1977, DECsystem 10 Prolog changed the meaning of double quotes to lists of character codes and used codes in place of chars. This made some I/O operations a little more efficient, but made debugging such programs much more difficult: [76,105,107,101,32,116,104,105,115] - can you read it?
ISO Prolog supports both. There is a flag double_quotes that indicates how double quotes are interpreted. Also, character-related built-ins are present for both:
char_code/2
atom_chars/2, number_chars/2, get_char/1/2, peek_char/1/2, put_char/1/2
atom_codes/2, number_codes/2, get_code/1/2, peek_code/1/2, put_code/1/2

The problems with your code are:
spellWord(String) :- String = [H|T], writeChar(H), spellWord(T).
When you give this predicate a long string, it will invoke itself with the tail of that string. But when String is empty, it cannot be split into [H|T], therefore the predicate fails, returning false.
To fix this, you have to define additionally:
spellWord([]).
This is the short form of:
spellWord(String) :- String = [].
Your other predicate also has a problem:
writeChar(String) :- H == "P", print4("Papa").
You have two variables here, String and H. These variables are in no way related. So no matter what you pass as a parameter, it will not influence the H that you use for comparison. And since the == operator only does a comparison, without unification, writeChar fails at this point, returning false. This is the reason why there is no output at all.
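Putting both fixes together, a corrected version might look roughly like the sketch below. The names spell_word/1, spell_chars/1 and nato_word/2 are made up for illustration, the table covers only the three letters of the example, and format/2 stands in for print4, whose definition is not shown in the question. It assumes SWI-Prolog 7 or later, where string_chars/2 accepts an atom or a string.

```prolog
% nato_word(?Letter, ?Word): lower-case letter and its code word.
% Only the letters needed for "Pie" are listed; extend as required.
nato_word(p, 'Papa').
nato_word(i, 'India').
nato_word(e, 'Echo').

% spell_word(+Text): print one code word per letter of Text.
spell_word(Text) :-
    string_chars(Text, Chars),      % turn the text into a list of chars
    spell_chars(Chars).

spell_chars([]).                    % base case: nothing left to print
spell_chars([Char|Chars]) :-
    write_char_word(Char),
    spell_chars(Chars).

% The argument is now actually used: it is lower-cased and looked up.
write_char_word(Char) :-
    downcase_atom(Char, Lower),
    nato_word(Lower, Word),
    format('~w~n', [Word]).
```

With these definitions, spell_word("Pie") should print Papa, India and Echo on separate lines. A letter missing from the table makes the call fail part-way through, which is one more reason to prefer a pure relation and leave printing to the caller.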

Related

Prolog: Checking if the first and last character of a string are left and right squiggly brackets('{' & '}')

I am very new to prolog, so assume that I know very little terminology.
I am using swipl in SWI-prolog.
I want to check if a string starts with a left squiggly bracket ('{') and ends with a right squiggly bracket ('}').
Some answers that I have read online have led me to program the following into my knowledge base to check if the string starts with a left squiggly bracket.
start_left_squiggle([Letter|_]):-
Letter = '{'.
But when I run this function, I get false, when I expect it to return true.
?- start_left_squiggle('{hello').
false.
As well, answers that seem correct for checking if the last character is a squiggly bracket have led me to code the following.
last_char(str, X):-
name(S, N),
reverse(N, [F|_]),
name(X, [F]).
end_right_squiggle(Werd):-
last_char(Werd, Last),
Last = '}'.
And I again get false when running the function, when I expect it to return true.
?- end_right_squiggle('hello}').
false.
Use sub_atom(Atom, Before, Length, After, Subatom) like so:
?- sub_atom('{abc}',0,1,_,C).
C = '{'.
?- sub_atom('{abc}',_,1,0,C).
C = '}'.
Or just test:
?- sub_atom('{abc}',0,1,_,'{').
true.
?- sub_atom('{abc}',_,1,0,'}').
true.
The first thing you need to do is break the atom into a list of characters, like this:
start_with_left(H):-
    atom_chars(H,X), % here X is a list
    X = [L|_], % get the head of the list, which is the first element, and compare
    L == '{'.
You can use a recursive definition to check the right side of the atom: convert the atom into a list of characters, and when the remaining length of the list is 1, compare that element with the bracket. If the last element matches you get true, otherwise false.
The right side works the same way, but since we need the last element we have to use recursion:
start_with_right(H):-
    atom_chars(H,X), % here X is a list
    length(X,Y),
    check_at_end(X,Y).
check_at_end([H|_],1):-
    H == '}'.
check_at_end([_|T],Y):-
    NewY is Y -1,
    check_at_end(T,NewY).
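For comparison, the same two checks can be written more compactly. This is only a sketch: the predicate names are invented for the example, and it assumes the input is an atom and that last/2 from library(lists) is available, as it is in SWI-Prolog.

```prolog
:- use_module(library(lists)).      % provides last/2

% True if Atom begins with a left curly bracket.
starts_with_left_brace(Atom) :-
    atom_chars(Atom, ['{'|_]).

% True if Atom ends with a right curly bracket.
ends_with_right_brace(Atom) :-
    atom_chars(Atom, Chars),
    last(Chars, '}').

% True if Atom is wrapped in curly brackets.
braced(Atom) :-
    starts_with_left_brace(Atom),
    ends_with_right_brace(Atom).
```

With these definitions, braced('{abc}') should succeed, while braced('{hello') and braced('hello}') should fail.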

Find the minimal lexicographical string formed by merging two strings

Suppose we are given two strings s1 and s2 (both lowercase). We have to find the lexicographically minimal string that can be formed by merging the two strings.
At first it looks pretty simple, like the merge step of the mergesort algorithm. But let us see what can go wrong.
s1: zyy
s2: zy
Now if we perform merge on these two we must decide which z to pick as they are equal, clearly if we pick z of s2 first then the string formed will be:
zyzyy
If we pick z of s1 first, the string formed will be:
zyyzy which is correct.
As we can see the merge of mergesort can lead to wrong answer.
Here's another example:
s1:zyy
s2:zyb
Now the correct answer is zybzyy, which is obtained only if we pick the z of s2 first.
There are plenty of other cases in which the simple merge will fail. My question is: is there any standard algorithm for performing a merge that produces such output?
You could use dynamic programming. In f[x][y] store the lexicographically minimal string formed by taking the first x characters of the first string s1 and the first y characters of the second string s2. You can calculate f bottom-up using the update:
f[x][y] = min(f[x-1][y] + s1[x], f[x][y-1] + s2[y]) // the '+' here represents the concatenation of a string and a character
You start with f[0][0] = "" (empty string).
For efficiency you can store the strings in f as references. That is, you can store in f the objects
class StringRef {
StringRef prev;
char c;
}
To extract the string stored at a certain f[x][y] you just follow the references. To update, you point back to either f[x-1][y] or f[x][y-1], depending on what your update step says.
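One way to try the dynamic-programming idea in Prolog is sketched below. It is a variant of the recurrence above rather than a transcription: it recurses over suffixes instead of filling a prefix table, and it relies on SWI-Prolog's tabling (the table directive) to memoise subproblems. The predicate names min_merge/3 and min_merge_atoms/3 are invented for illustration.

```prolog
:- table min_merge/3.               % memoise results per pair of suffixes

% min_merge(+Xs, +Ys, -Zs): Zs is the smallest interleaving of the
% code lists Xs and Ys that keeps the order within each list.
min_merge([], Ys, Ys).
min_merge([X|Xs], [], [X|Xs]).
min_merge([X|Xs], [Y|Ys], Zs) :-
    min_merge(Xs, [Y|Ys], Zs1),     % option 1: take the next code from Xs
    min_merge([X|Xs], Ys, Zs2),     % option 2: take the next code from Ys
    TakeX = [X|Zs1],
    TakeY = [Y|Zs2],
    % Both candidates have the same length, so the standard order of
    % terms compares them lexicographically, code by code.
    (   TakeX @=< TakeY
    ->  Zs = TakeX
    ;   Zs = TakeY
    ).

% Convenience wrapper working on atoms.
min_merge_atoms(A1, A2, Merged) :-
    atom_codes(A1, Xs),
    atom_codes(A2, Ys),
    min_merge(Xs, Ys, Zs),
    atom_codes(Merged, Zs).
```

On the two examples from the question, min_merge_atoms(zyy, zy, M) should give M = zyyzy and min_merge_atoms(zyy, zyb, M) should give M = zybzyy. Each table entry holds a whole list, so this trades memory for simplicity compared with the reference-based storage described above.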
It seems that the solution can be almost the same as the one you described (the "mergesort"-like approach), except with special handling of equality. So long as the first characters of both strings are equal, you look ahead at the second character, the third, and so on. If the end of one string is reached, treat the first character of the other string as the next character of the string that ran out, then its second character, and so on. If the ends of both strings are reached, then it doesn't matter from which string you take the first character. Note that this algorithm is O(N) because after a look-ahead on equal prefixes you know the whole look-ahead sequence (i.e. string prefix) to include, not just one first character.
EDIT: you look ahead so long as the current i-th characters from both strings are equal and alphabetically not larger than the first character in the current prefix.

Find out if first letter is a vowel prolog

I'm used to procedural programming languages, and I'm kind of struggling with prolog - the lack of resources online is also a bummer.
What would be the most 'prolog'-y way to get the first character of a given variable and check if it is a vowel?
Something like this is what I'm after, I think? This is all pseudocode - but is that how you'd solve it?
isVowel(Word) :-
vowels = [a, e, i, o, u],
firstLetter(Word[0]),
(
firstLetter in vowels ->
Vowel!
; Not a vowel!
).
Thanks so much,
Ollie
In Prolog you write definite clauses (rules) for predicates. Predicates describe logical relations. For example, you might have a predicate is_vowel/1 which is true if the given argument is a vowel.
is_vowel(Letter):-
member(Letter, "aeiouAEIOU").
In order to see if a word starts with a vowel you have to take the first letter:
starts_with_vowel(Word):-
Word = [First|_],
is_vowel(First).
Now, you can do unification and pattern matching simultaneously like this:
starts_with_vowel([FirstLetter|_]):-
is_vowel(FirstLetter).
A few example queries:
?- starts_with_vowel("Italy").
true ;
false.
?- starts_with_vowel("Vietnam").
false.
?- Letters = [_|"pple"], starts_with_vowel(Letters), string_to_atom(Letters, Word).
Letters = [97, 112, 112, 108, 101],
Word = apple ;
Letters = [101, 112, 112, 108, 101],
Word = epple ;
Letters = [105, 112, 112, 108, 101],
Word = ipple ...
You've got answers, but:
Don't do member, or memberchk. Instead, just use a table:
vowel(a).
vowel(e).
vowel(i).
vowel(o).
vowel(u).
Then, you don't say what sort of variable you have. If you have an atom:
?- sub_atom(Word, 0, 1, _, First), vowel(First).
You can convert almost anything to an atom easily. See for example here.
This query will succeed if the first character of the atom is a vowel, and fail otherwise. To make it a predicate:
first_letter_vowel(Word) :-
sub_atom(Word, 0, 1, _, First),
vowel(First).
Or, for example:
nth_letter_vowel(N, Word) :-
sub_atom(Word, N, 1, _, Letter),
vowel(Letter).
If you are using SWI-Prolog, you can also use downcase_atom/2:
nth_letter_vowel(N, Word) :-
sub_atom(Word, N, 1, _, Letter),
downcase_atom(Letter, Lower_Case),
vowel(Lower_Case).
EDIT: Why a table of facts and not member/2 or memberchk/2?
It is cleaner; it is more memory efficient, and speedy; it makes the intent of the program obvious; it is (and has been) the preferred way to do it: see the very bottom of this page (which, by the way, discusses many interesting things).
Here is the exhaustive list of possible queries with vowel/1, when it is defined as a table of facts:
?- vowel(r).
false.
?- vowel(i).
true.
?- vowel(V).
V = a ;
V = e ;
V = i ;
V = o ;
V = u.
?- vowel(foobar(baz)). % or anything, really
false.
Now we know that member/2 is going to leave behind choice points, so memberchk/2 is certainly to be preferred (unless we mean to use the choice points!). But even then:
?- memberchk(a, [a,b,c]).
true. % that's fine
?- memberchk(x, [a,b,c]).
false. % ok
?- memberchk(a, L).
L = [a|_G1190]. % what?
?- memberchk(X, [a,b,c]).
X = a. % what?
So yes, in the context of the original question, assuming we make sure to carefully check the arguments to member/2 or memberchk/2, the reasons to prefer a table of facts are only stylistic and "practical" (memory efficiency, speed).
This can be done in several ways. In this particular solution I use Definite Clause Grammar (DCG).
Also, the answer depends a bit on what a "word" is. If it is a list of character codes, then the following suffices:
starts_with_vowel --> vowel, dcg_end.
vowel --> [X], {memberchk(X, [0'a,0'A,0'e,0'E,0'i,0'I,0'o,0'O,0'u,0'U])}.
dcg_end(_, []).
Example of use:
?- phrase(starts_with_vowel, `answer`).
true.
?- phrase(starts_with_vowel, `question`).
false.
PS: Notice that the use of backquotes here is SWI7-specific. In other Prologs a list of codes would appear within double quotes.
If a word is something else, then you first need to convert to codes. E.g., atom_codes(answer, Codes) if a word is represented by an atom.
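A minimal sketch of that conversion step for atoms follows. The names leading_vowel//0, vowel_code/1 and atom_starts_with_vowel/1 are hypothetical, and phrase/3 is used here in place of a dcg_end helper so that the rest of the word is simply ignored.

```prolog
% leading_vowel//0: consume one code, which must be a vowel.
leading_vowel --> [Code], { vowel_code(Code) }.

% vowel_code(+Code): Code is the character code of a vowel.
vowel_code(Code) :-
    atom_codes(aeiouAEIOU, Vowels),
    memberchk(Code, Vowels).

% atom_starts_with_vowel(+Atom): convert to codes, then parse the prefix.
atom_starts_with_vowel(Atom) :-
    atom_codes(Atom, Codes),
    phrase(leading_vowel, Codes, _Rest).
```

With these definitions, atom_starts_with_vowel(answer) should succeed and atom_starts_with_vowel(question) should fail.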

SWI Prolog escape quotes

I need to put " " around a String in prolog.
I get the input from another program and, as it looks, I can't escape the " in this program, so I have to add the " in prolog, otherwise the prolog statement doesn't work.
Thanks for your help!
For a discussion of strings see here, they are SWI-Prolog specific but use the same escape rules as atoms. There are many ways to enter quotes into an atom in a Prolog text:
1) Doubling them. So for example 'can''t be' is an atom,
with a single quote as the fourth character, and no other
single quotes in it.
2) Escaping them, with the backslash. So for example
'can\'t be' is the same atom as 'can''t be'.
3) Character coding them, using octal code and a closing back slash.
So for example 'can\47\t be' is the same atom as 'can''t be'.
4) Character coding them, using hex code and a closing back slash.
So for example 'can\x27\t be' is the same atom as 'can''t be'.
The above possibilities are all defined in the ISO standard. A
Prolog implementation might define further possibilities.
Bye
P.S.: Here is an example run in SWI-Prolog, using a different
example character. In the first example query below, you don't
need doubling, doubling can only be done for the surrounding quote.
The last example query below shows a SWI-Prolog specific syntax
which is not found in the ISO standard, namely using a backslash
u with a fixed width hex code:
Welcome to SWI-Prolog (Multi-threaded, 64 bits, Version 7.1.33)
Copyright (c) 1990-2015 University of Amsterdam, VU Amsterdam
?- X = 'she said "bye"'.
X = 'she said "bye"'.
?- X = 'she said \"bye\"'.
X = 'she said "bye"'.
?- X = 'she said \42\bye\42\'.
X = 'she said "bye"'.
?- X = 'she said \x22\bye\x22\'.
X = 'she said "bye"'.
?- X = 'she said \u0022bye\u0022'.
X = 'she said "bye"'.
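If the goal is to add the surrounding quotes on the Prolog side, one possible approach is sketched below. The helper name double_quoted/2 is made up for this example, and it assumes the incoming text is an atom, string or number and that atomic_list_concat/2 is available, as it is in SWI-Prolog. Inside a single-quoted atom the double quote needs no escaping, as noted above.

```prolog
% double_quoted(+Text, -Quoted): hypothetical helper that wraps Text in
% double-quote characters and returns the result as an atom.
double_quoted(Text, Quoted) :-
    atomic_list_concat(['"', Text, '"'], Quoted).
```

For example, double_quoted(bye, Q) should bind Q to the five-character atom consisting of a double quote, the letters bye, and another double quote.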

Representing the strings we use in programming in math notation

Now I'm a programmer who's recently discovered how bad he is when it comes to mathematics and decided to focus a bit on it from that point forward, so I apologize if my question insults your intelligence.
In mathematics, is there the concept of strings that is used in programming? i.e. a permutation of characters.
As an example, say I wanted to translate the following into mathematical notation:
let s be a string of n number of characters.
The reason is that I would want to use that representation to find other things about string s, such as its length: len(s).
How do you formally represent such a thing in mathematics?
Talking more practically, so to speak, let's say I wanted to mathematically explain such a function:
fitness(s,n) = 1 / |n - len(s)|
Or written in more "programming-friendly" sort of way:
fitness(s,n) = 1 / abs(n - len(s))
I used this function to explain how a fitness function for a given GA works; the question was about finding strings with 5 characters, and I needed the solutions to be sorted in ascending order according to their fitness score, given by the above function.
So my question is, how do you represent the above pseudo-code in mathematical notation?
You can use the notation of language theory, which is used to discuss things like regular languages, context free grammars, compiler theory, etc. A quick overview:
A set of characters is known as an alphabet. You could write: "Let A be the ASCII alphabet, a set containing the 128 ASCII characters."
A string is a sequence of characters. ε is the empty string.
A set of strings is formally known as a language. A common statement is, "Let s ∈ L be a string in language L."
Concatenating alphabets produces sets of strings (languages). A represents all 1-character strings; AA, also written A^2, is the set of all two-character strings. A^0 is the set of all zero-length strings and is precisely A^0 = {ε}. (It contains exactly one string, the empty string.)
A* is special notation and represents the set of all strings over the alphabet A, of any length. That is, A* = A^0 ∪ A^1 ∪ A^2 ∪ A^3 ∪ ... . You may recognize this notation from regular expressions.
For length use absolute value bars. The length of a string s is |s|.
So for your statement:
let s be a string of n number of characters.
You could write:
Let A be a set of characters and s ∈ A^n be a string of n characters. The length of s is |s| = n.
Mathematically, you have explained fitness(s, n) just fine as long as len(s) is well-defined.
In CS texts, a string s over a set S is defined as a finite ordered list of elements of S and its length is often written as |s| - but this is only notation, and doesn't change the (mathematical) meaning behind your definition of fitness, which is pretty clear just how you've written it.
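Putting that notation together, the fitness function from the question could be typeset roughly as follows. This is only one possible formalisation: the name A for the alphabet and the restriction to |s| ≠ n are assumptions, since the question does not say what should happen when the length equals n and the denominator becomes zero. The outer bars denote absolute value and the inner bars denote string length.

```latex
\[
  \mathrm{fitness}(s, n) = \frac{1}{\bigl|\, n - |s| \,\bigr|},
  \qquad s \in A^{*},\quad n \in \{0, 1, 2, \dots\},\quad |s| \neq n .
\]
```

For instance, with n = 5 and s = "abc" we have |s| = 3, so fitness(s, 5) = 1/2.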
