Translating expressions into English - programming-languages

I am curious as to this problem:
𝑎 (𝑎 | 𝑏)* 𝑎
with alphabet Σ = {𝑎, 𝑏}
I am supposed to translate this expression into English. What I'm not too sure about is the | and the *.
Is the answer to this: "a followed by some combination of a or b, followed by a"?
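One way to test a proposed reading is to run the expression as a regular expression and try a few strings. The following is a minimal sketch in Python, assuming the usual conventions that | means "or" and * means "zero or more repetitions" (so the middle part may be empty). The helper name matches_expression is made up for illustration.

```python
import re

# a, then any number (including zero) of a's or b's, then a
PATTERN = re.compile(r"a(a|b)*a")

def matches_expression(word):
    """Return True if the whole word is described by a(a|b)*a."""
    return PATTERN.fullmatch(word) is not None

for word in ["aa", "aba", "abba", "a", "ab", "ba"]:
    print(word, matches_expression(word))
```

Under these assumptions the expression describes the strings over {a, b} of length at least two that begin and end with a, which agrees with the reading in the question as long as "some combination" is allowed to be empty.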

Related

How to search for A (any lengths any character) * in Python 3

I have a string with letters and occasionally *. I am trying to write regular expressions in Python 3 to get anything in the string that starts with A and ends with *, things in between can be any letter but not *. Help is much appreciated.
Try using the regex pattern \bA\w*\*(?!\S):
import re
inp = "An example is an A* and also an Abc*"
matches = re.findall(r'\bA\w*\*(?!\S)', inp)
print(matches)  # ['A*', 'Abc*']
There is a simpler pattern you can use with Python's re module:
r"(A.+?\*)"
For more on "Greedy vs Non-Greedy in Python": https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy
An example:
import re
string_to_test = "This is Atest1* but not this* but Atest2* done *a asdf"
for x in re.findall(r"(A.+?\*)", string_to_test):
    print(x)
This will produce:
Atest1*
Atest2*
Note that if the bare string "A*" should also count as a match, you need the pattern r"(A.*?\*)" instead, because .+? requires at least one character between A and *.
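The lazy dot in the patterns above also matches spaces and punctuation, so a match can start at any capital A and run across several words until the next *. If the text between A and * should stay within a single word, a negated character class is one option. This is a minimal sketch; the function names and the choice to exclude whitespace are assumptions, not part of the original answers.

```python
import re

def find_starred(text):
    """Find runs that start with A and end with *, with no * or whitespace in between."""
    return re.findall(r"A[^*\s]*\*", text)

def find_starred_lazy(text):
    """Same idea with a lazy dot, which may span several words."""
    return re.findall(r"A.*?\*", text)

sample = "An example is an A* and also an Abc*"
print(find_starred(sample))       # ['A*', 'Abc*']
print(find_starred_lazy(sample))  # ['An example is an A*', 'Abc*']
```

On the second sample string from the answer above, both variants return Atest1* and Atest2*, so the difference only shows up when a capital A appears earlier in the text than the intended match.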

What does the pound sign represent in a language?

L = { w#w#w | w ϵ {0,1}* }
What are some string outputs for this language? The pound sign is throwing me off. Is it the same as www? For example w = 01 so 010101 is in the language of L?
So, the pound sign (or sometimes the $ sign) is used in formal language theory as a separator. It's just some symbol that is not already in your original alphabet. So, formally what that means is that you are considering words on the extended alphabet containing three symbols {0, 1, #}.
The language you described is basically a word replicated three times, separated by #. Here are some examples of words in the language: 0#0#0, 111000#111000#111000, ## (the case where w is the empty string). Here are some examples of words not in the language: 0, 1###, 01#11#11. In particular, 010101 is not in L, because it contains no # at all.
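Membership in this language can be checked mechanically, which may help when testing candidate strings. The following is a minimal sketch in Python; the function name in_language is made up for illustration.

```python
def in_language(s):
    """Return True if s has the form w#w#w with w a string over {0, 1}."""
    parts = s.split("#")
    # Exactly two separators means exactly three parts.
    if len(parts) != 3:
        return False
    w = parts[0]
    # w may only use the original alphabet {0, 1}; it may also be empty.
    if any(c not in "01" for c in w):
        return False
    return parts[1] == w and parts[2] == w

for candidate in ["0#0#0", "##", "010101", "01#11#11", "1###"]:
    print(repr(candidate), in_language(candidate))
```

A string is accepted only if it contains exactly two # symbols and the three pieces between them are identical.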

Prolog DCG Building/Recognizing Word Strings from Alphanumeric Characters

So I'm writing simple parsers for some programming languages in SWI-Prolog using Definite Clause Grammars. The goal is to return true if the input string or file is valid for the language in question, or false if the input string or file is not valid.
In almost all of the languages there is an "identifier" predicate. In most of the languages the identifier is defined as one of the following in EBNF: letter { letter | digit } or ( letter | digit ) { letter | digit }, that is to say in the first case a letter followed by zero or more alphanumeric characters, or i
My input file is split into a list of word strings (i.e. someIdentifier1 = 3 becomes the list [someIdentifier1,=,3]). The reason for the string to be split into lists of words rather than lists of letters is for recognizing keywords defined as terminals.
How do I implement "identifier" so that it recognizes any alphanumeric string, or a string consisting of a letter followed by alphanumeric characters?
Is it possible or necessary to further split the word into letters for this particular predicate only, and if so how would I go about doing this? Or is there another solution, perhaps using SWI-Prolog libraries' built-in predicates?
I apologize for the poorly worded title of this question; however, I am unable to clarify it any further.
First, when you need to reason about individual letters, it is typically most convenient to reason about lists of characters.
In Prolog, you can easily convert atoms to characters with atom_chars/2.
For example:
?- atom_chars(identifier10, Cs).
Cs = [i, d, e, n, t, i, f, i, e, r, '1', '0'].
Once you have such characters, you can use predicates like char_type/2 to reason about properties of each character.
For example:
?- char_type(i, T).
T = alnum ;
T = alpha ;
T = csym ;
etc.
The general pattern to express identifiers such as yours with DCGs can look as follows:
identifier -->
    [L],
    { letter(L) },
    identifier_rest.

identifier_rest --> [].
identifier_rest -->
    [I],
    { letter_or_digit(I) },
    identifier_rest.
You can use this as a building block, and only need to define letter/1 and letter_or_digit/1. This is very easy with char_type/2.
Further, you can of course introduce an argument to relate such lists to atoms.
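For readers more familiar with imperative code, the same "first character, then the rest" structure can be sketched in Python. This is only an analogue of the grammar above, not a replacement for the DCG: the function name is hypothetical, and str.isalpha and str.isalnum stand in for letter/1 and letter_or_digit/1, which may classify some non-ASCII characters differently from char_type/2.

```python
def is_identifier(word):
    """Return True if word is a letter followed by zero or more letters or digits."""
    chars = list(word)  # split the word into characters, like atom_chars/2
    if not chars:
        return False
    first, rest = chars[0], chars[1:]
    # First character must be a letter; the rest may be letters or digits.
    return first.isalpha() and all(c.isalnum() for c in rest)

for token in ["someIdentifier1", "x", "1abc", "=", ""]:
    print(repr(token), is_identifier(token))
```

To accept the second EBNF form as well, where the first character may also be a digit, the check on the first character would use isalnum instead of isalpha.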

SWI Prolog escape quotes

I need to put " " around a String in prolog.
I get the input from another program and as it looks I can't escape the " in this program, so i have to add the " in prolog otherwise the prolog statement doesn't work.
Thanks for your help!
For a discussion of strings see here, they are SWI-Prolog specific but use the same escape rules as atoms. There are many ways to enter quotes into an atom in a Prolog text:
1) Doubling them. So for example 'can''t be' is an atom,
with a single quote as the fourth character, and no other
single quotes in it.
2) Escaping them, with the backslash. So for example
'can\'t be' is the same atom as 'can''t be'.
3) Character coding them, using octal code and a closing back slash.
So for example 'can\47\t be' is the same atom as 'can''t be'.
4) Character coding them, using hex code and a closing back slash.
So for example 'can\x27\t be' is the same atom as 'can''t be'.
The above possibilities are all defined in the ISO standard. A
Prolog implementation might define further possibilities.
Bye
P.S.: Here is an example run in SWI-Prolog, using a different
example character. In the first example query below, you don't
need doubling; doubling applies only to the surrounding quote character.
The last example query below shows a SWI-Prolog specific syntax
which is not found in the ISO standard, namely using a backslash
u with a fixed width hex code:
Welcome to SWI-Prolog (Multi-threaded, 64 bits, Version 7.1.33)
Copyright (c) 1990-2015 University of Amsterdam, VU Amsterdam
?- X = 'she said "bye"'.
X = 'she said "bye"'.
?- X = 'she said \"bye\"'.
X = 'she said "bye"'.
?- X = 'she said \42\bye\42\'.
X = 'she said "bye"'.
?- X = 'she said \x22\bye\x22\'.
X = 'she said "bye"'.
?- X = 'she said \u0022bye\u0022'.
X = 'she said "bye"'.

Representing the strings we use in programming in math notation

Now I'm a programmer who's recently discovered how bad he is when it comes to mathematics and decided to focus a bit on it from that point forward, so I apologize if my question insults your intelligence.
In mathematics, is there the concept of strings that is used in programming? i.e. a permutation of characters.
As an example, say I wanted to translate the following into mathematical notation:
let s be a string of n number of characters.
The reason is that I want to use that representation to find other things about the string s, such as its length: len(s).
How do you formally represent such a thing in mathematics?
Talking more practically, so to speak, let's say I wanted to mathematically explain such a function:
fitness(s,n) = 1 / |n - len(s)|
Or written in more "programming-friendly" sort of way:
fitness(s,n) = 1 / abs(n - len(s))
I used this function to explain how a fitness function for a given GA works; the question was about finding strings with 5 characters, and I needed the solutions to be sorted in ascending order according to their fitness score, given by the above function.
So my question is, how do you represent the above pseudo-code in mathematical notation?
You can use the notation of language theory, which is used to discuss things like regular languages, context free grammars, compiler theory, etc. A quick overview:
A set of characters is known as an alphabet. You could write: "Let A be the ASCII alphabet, a set containing the 128 ASCII characters."
A string is a sequence of characters. ε is the empty string.
A set of strings is formally known as a language. A common statement is, "Let s ∈ L be a string in language L."
Concatenating alphabets produces sets of strings (languages). A represents all 1-character strings; AA, also written A², is the set of all two-character strings. A⁰ is the set of all zero-length strings and is precisely A⁰ = {ε}. (It contains exactly one string, the empty string.)
A* is special notation and represents the set of all strings over the alphabet A, of any length. That is, A* = A⁰ ∪ A¹ ∪ A² ∪ A³ ∪ ... . You may recognize this notation from regular expressions.
For length use absolute value bars. The length of a string s is |s|.
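To make the notation concrete, the n-fold concatenation of an alphabet with itself (A with exponent n) can be enumerated for small cases. The following is a minimal sketch in Python; the function name power and the two-letter alphabet are illustrative assumptions.

```python
from itertools import product

def power(alphabet, n):
    """Return the set of all strings of length n over the given alphabet."""
    return {"".join(chars) for chars in product(sorted(alphabet), repeat=n)}

A = {"a", "b"}
print(power(A, 0))  # {''}: only the empty string
print(sorted(power(A, 2)))  # ['aa', 'ab', 'ba', 'bb']
```

A* is the union of these sets over all n, so it is infinite whenever the alphabet is non-empty and cannot be enumerated completely this way.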
So for your statement:
let s be a string of n number of characters.
You could write:
Let A be a set of characters and s ∈ Aⁿ be a string of n characters. The length of s is |s| = n.
Mathematically, you have explained fitness(s, n) just fine as long as len(s) is well-defined.
In CS texts, a string s over a set S is defined as a finite ordered list of elements of S and its length is often written as |s| - but this is only notation, and doesn't change the (mathematical) meaning behind your definition of fitness, which is pretty clear just how you've written it.
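With |s| for the length of s, the function from the question reads fitness(s, n) = 1 / |n - |s||. A minimal Python sketch of that formula follows. Note that the formula is undefined when |s| = n; returning infinity in that case is an assumption made here for illustration, not something stated in the question.

```python
import math

def fitness(s, n):
    """Compute 1 / |n - |s|| for a string s and a target length n."""
    distance = abs(n - len(s))
    if distance == 0:
        # Assumption: treat an exact length match as infinitely fit,
        # since the original formula would divide by zero here.
        return math.inf
    return 1 / distance

candidates = ["abc", "abcdefg", "abcde", "a"]
# Sort in ascending order of fitness for a target length of 5.
for s in sorted(candidates, key=lambda s: fitness(s, 5)):
    print(s, fitness(s, 5))
```

Strings whose length is closer to n get a higher score, so an ascending sort places the best candidates last.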
