How to know if an atom starts with a pattern?

For example, if I have the following facts:
father('jim', 'Boby').
father('rob', 'bob').
and I would like to know who has a father whose name starts with 'bo'?

Simply use atom_concat/3, an ISO Prolog standard built-in predicate: the goal atom_concat(bo, _, Name) succeeds exactly when Name starts with bo.

Another ISO option is sub_atom/5:
sub_atom(Atom, Before, Length, After, Sub_atom)
?- sub_atom(bob, 0, _, _, bo).
true.
Compared to atom_concat/3, this avoids the generation of the unneeded atom to represent the suffix.

Related

Lexer definition to match keyword or an abbreviation

For writing a parser I would like to not only match full keywords but also abbreviations thereof, for example
MY-KEYWORD
must at least match
MY-KEY
but also any exact match longer than that, namely
MY-KEYW or MY-KEYWO or MY-KEYWOR or the full MY-KEYWORD
Is this possible with a reasonable lexer fragment or will I have to define specific alternative matches ?
TIA
Alex
Easiest would be to do something like this:
MY_KEYWORD
: 'MY-KEY' ('W' ('O' ('R' 'D'?)?)?)?
;
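The nested-optional pattern above gets tedious to write by hand for long keywords. As a rough sketch (in Python, not part of the original answer), the rule body can be generated from the keyword and the minimum abbreviation length. The helper names `abbreviation_rule` and `matches_abbreviation` are made up for illustration, and the generator assumes the keyword contains no quote or backslash characters that would need escaping in the grammar.

```python
def abbreviation_rule(keyword, min_len):
    """Build a lexer rule body: a required prefix plus nested optional letters."""
    required, optional = keyword[:min_len], keyword[min_len:]
    nested = ""
    # Wrap from the last letter outwards so each letter is only
    # allowed when every letter before it is present.
    for ch in reversed(optional):
        nested = f"('{ch}' {nested})?" if nested else f"'{ch}'?"
    return f"'{required}' {nested}".strip()


def matches_abbreviation(token, keyword, min_len):
    """Check the same condition directly: a prefix of keyword, at least min_len long."""
    return len(token) >= min_len and keyword.startswith(token)


print(abbreviation_rule("MY-KEYWORD", 6))
```

For MY-KEYWORD with a minimum of six characters this prints the same rule body as the hand-written one. The second helper is the equivalent check if you would rather match a generic identifier token and validate abbreviations after lexing.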

Way to find a number at the end of a string in Smalltalk

I have different commands my program is reading in (i.e., print, count, min, max, etc.). These words can also include a number at the end of them (i.e., print3, count1, min2, max6, etc.). I'm trying to figure out a way to extract the command and the number so that I can use both in my code.
I'm struggling to figure out a way to find the last element in the string in order to extract it, in Smalltalk.
You didn't say which flavor of Smalltalk you use, so I will explain what I would do in Pharo, which is the one I'm familiar with.
As someone who has been playing with Pharo for a few months at most, I can tell you the sheer number of classes and methods available can feel overwhelming at first, but the environment actually makes it easy to find things. For example, when you know the exact input and output you want, but don't know whether a method already exists somewhere, or what it is called, the Finder allows you to search by giving an example. You can open it from the World menu.
By default it seeks selectors (method names) matching your input terms.
This default is not what we need right now, so change the option in the upper right box to "Examples", and type in the search field an example of the input followed by the output you want, separated by a ".". The input example I used was the string 'max6', followed by the desired result, the number 6. Pharo then lists the methods that match.
To find what would return the text part, make a new search, changing the example output from the number 6 to the string 'max'.
Fortunately there are several built-in methods matching the description of your problem.
There are more elegant ways, I suppose, but you can make use of the fact that String>>#asNumber only parses the part it can recognize. So you can do
'print31' reversed asNumber asString reversed asNumber
to give you 31. That only works if there actually is a number at the end.
This is one of those cases where we can presume the input data has a specific form, i.e., digits appear only at the end of the string, and you want all of those digits. In that case it's not too hard to do, really, just:
numText := 'Kalahari78' select: [ :each | each isDigit ].
num := numText asInteger. "78"
To get the rest of the string without the digits, you can just use this:
'Kalahari78' withoutTrailingDigits. "Kalahari"
As some of the Pharo "OGs" pointed out, you can take a look at the String class (just type CMD-Return, type in String, hit Return) and you will find an amazing number of methods for all kinds of things. Usually you can get some ideas from those. But then there are times when you really just need an answer!
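For readers who want to see the split spelled out independently of any Smalltalk dialect, here is a minimal sketch of the same idea in Python (not Smalltalk, and not from the answers above). The function name `split_command` is hypothetical. It assumes, like the answers, that the number is a run of decimal digits at the very end of the string.

```python
import re


def split_command(text):
    """Split 'print31' into ('print', 31); the number is None if absent."""
    # Lazy prefix, then as many trailing digits as possible up to the end.
    match = re.fullmatch(r"(.*?)(\d*)", text, re.DOTALL)
    command, digits = match.group(1), match.group(2)
    return command, (int(digits) if digits else None)


for word in ["print31", "max6", "count"]:
    print(split_command(word))
```

Unlike the select: [ :each | each isDigit ] approach, this only takes digits from the end, so a digit in the middle of the command name stays part of the name.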

Remove single quotes/quotation marks in Prolog

I have an external API sending info to my Prolog application and I ran into a problem creating my facts.
When the information received is extensive, Prolog automatically adds ' (single quotes) to that info.
Example: with the data received, the fact I create is:
object(ObjectID,ObjectName,'[(1,09:00,12:00),(2,10:00,12:00)]',anotherID)
The fact I would like to create is
object(ObjectID,ObjectName,[(1,09:00,12:00),(2,10:00,12:00)] ,anotherID)
without the ' before the list.
Does anyone know how to solve this problem? With a predicate that receives '[(1,09:00,12:00),(2,10:00,12:00)]' and returns [(1,09:00,12:00),(2,10:00,12:00)]?
What you see is an atom, and you want to convert it to a term I think.
If you use SWI-Prolog, you can use the built-in term_to_atom/2:
True if Atom describes a term that unifies with Term. When Atom is instantiated, Atom is parsed and the result unified with Term.
Example:
?- term_to_atom(X,'[(1,09:00,12:00),(2,10:00,12:00)]').
X = [ (1, 9:0, 12:0), (2, 10:0, 12:0)].
So on the right-hand side you enter the atom, and on the left-hand side you get the "equivalent" term. Mind, however, that 00, for instance, is read as a number and is therefore equal to 0, so 09:00 becomes 9:0. This may be unintended behavior.
You can thus translate the predicate as:
translate(object(A,B,C,D),object(A,B,CT,D)) :-
    term_to_atom(CT,C).
Since you do not fully specify how you get this data, it is unknown to me how you will convert it. But the above way will probably be of some help.

In R, how do I replace a string that contains a certain pattern with another string?

I'm working on a project involving cleaning a list of data on college majors. I find that a lot are misspelled, so I was looking to use the function gsub() to replace the misspelled ones with its correct spelling. For example, say 'biolgy' is misspelled in a list of majors called Major. How can I get R to detect the misspelling and replace it with its correct spelling? I've tried gsub('biol', 'Biology', Major) but that only replaces the first four letters in 'biolgy'. If I do gsub('biolgy', 'Biology', Major), it works for that case alone, but that doesn't detect other forms of misspellings of 'biology'.
Thank you!
You should either define some nifty regular expression, or use agrep from the base package. The stringr package is another option; I know that people use it, but I'm a very huge fan of regular expressions, so it's a no-no for me.
Anyway, agrep should do the trick:
agrep("biol", "biology")
[1] 1
agrep("biolgy", "biology")
[1] 1
EDIT:
You should also use ignore.case = TRUE, but be prepared to do some bookkeeping "by hand"...
You can set up a vector of all the possible misspellings and then do a loop over a gsub call. Something like:
biologySp = c("biolgy","biologee","bologee","bugs")
for(sp in biologySp){
  Major = gsub(sp,"Biology",Major)
}
If you want to do something smarter, see if there are any fuzzy matching packages on CRAN, or something that uses 'soundex' matching.
The wikipedia page on approx. string matching might be useful, and try searching R-help for some of the key terms.
http://en.wikipedia.org/wiki/Approximate_string_matching
You could first match the majors against a list of available majors; any that do not match are the likely misspellings. Then use the agrep function to match these against the known majors again (agrep does approximate matching, so if an entry is similar to a correct value you will get a match).
The vwr package has methods for string matching:
http://ftp.heanet.ie/mirrors/cran.r-project.org/web/packages/vwr/index.html
so your best bet might be to use the string with the minimum Levenshtein distance from the possible subject strings:
> levenshtein.distance("physcs",c("biology","physics","geography"))
  biology   physics geography
        7         1         9
If you get identical minima then flip a coin:
> levenshtein.distance("biolsics",c("biology","physics","geography"))
  biology   physics geography
        4         4         8
example 1a) perl/linux regex: 's/oldstring/newstring/'
example 1b) R equivalent of 1a: srcstring=sub(oldstring, newstring, srcstring)
example 2a) perl/linux regex: 's/oldstring//'
example 2b) R equivalent of 2a: srcstring=sub(oldstring, "", srcstring)

Identify common chars in the correct order (a kind of regular expression) from an array of strings

I am looking for a way to identify the common characters in a set of strings of different lengths. The same problem was posted here before, and the author was somehow able to find the answer, but I could not follow his solution. I tried to post my query there, but I am not sure whether I will get any reply, so I am posting it as a new question. (The old question is "Find common chars in array of strings, in the right order".)
I am taking the same example from him.
Let's assume "+" is the "wildcard char":
Array(
0 => '48ca135e0$5',
1 => 'b8ca136a0$5',
2 => 'c48ca13730$5',
3 => '48ca137a0$5');
Should return :
$wildcard='+8ca13+0$5';
This looks like a standard problem to me, so I suspect there is already a library for it. If not, please shed some light on how to solve it.
I don't think comparing char by char will work (as suggested in the reply), because the matching chars can start anywhere (e.g. arr1[1] and arr2[3] can be the starting indexes of a matching substring, and the other way around).
regards,
Looks like you're looking for the "longest common substring". The first longest common substring is 8ca13, the second longest is 0$5. Once we have these two strings, you can take any of the strings in the set and replace extra characters with a single +.
http://en.wikipedia.org/wiki/Longest_common_substring_problem
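One way to read that suggestion is as a recursion: find the longest substring common to all strings, split every string around it, and repeat on the left and right remainders, emitting a single + wherever nothing common is left. The following is a brute-force sketch of that reading in Python (not PHP). The function names are made up for illustration. Splitting at the first occurrence is a greedy simplification and is not guaranteed to give the best pattern for every input.

```python
def longest_common_substring(strings):
    """Return the longest substring present in every string ('' if none)."""
    if not strings:
        return ""
    shortest = min(strings, key=len)
    # Try candidate lengths from longest to shortest; the first hit wins.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in strings):
                return candidate
    return ""


def wildcard_pattern(strings, wildcard="+"):
    """Keep common substrings in order; collapse everything else to one wildcard."""
    common = longest_common_substring(strings)
    if not common:
        # Nothing shared: one wildcard if any string still has characters left.
        return wildcard if any(strings) else ""
    lefts, rights = [], []
    for s in strings:
        i = s.find(common)  # greedy: split at the first occurrence
        lefts.append(s[:i])
        rights.append(s[i + len(common):])
    return wildcard_pattern(lefts, wildcard) + common + wildcard_pattern(rights, wildcard)


data = ["48ca135e0$5", "b8ca136a0$5", "c48ca13730$5", "48ca137a0$5"]
print(wildcard_pattern(data))  # +8ca13+0$5
```

In this sketch the + stands for "zero or more characters", since some of the strings may have nothing at a position where others do. The substring search is a brute-force scan, fine for short strings like these; for large inputs the suffix-tree or dynamic-programming methods described on the linked page would be the usual choice.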

Resources