Any method for finding the combination of a word? - groovy

I want to find all possible combinations of a given word. For example, if the given word is "the", then I need "t, h, e, teh, ...". I have to do this in Groovy; is there any method for it? Otherwise, please give me an outline of the algorithm.

If you need subsets as well, you could do something like this:
("word" as List).subsequences()*.permutations().inject( [] ) { list, set ->
    list.addAll( set )
    list
}*.join().sort { it.length() }
which gives you the following output:
[o, d, r, w, dw, wd, do, od, dr, rd,
wr, rw, ow, wo, ro, or, owd, wod, wdo,
odw, dwo, dow, orw, owr, wor, wro,
rwo, row, dor, ord, odr, rdo, rod,
dro, wdr, rwd, drw, rdw, wrd, dwr,
wrdo, orwd, wrod, wodr, ordw, wdor,
rwod, wdro, word, owdr, rdow, drow,
drwo, rdwo, odwr, dorw, odrw, dowr,
dwro, rodw, dwor, owrd, rowd, rwdo]
edit: changed the set.each to a list.addAll as it should be faster (and reads a lot easier)

("word" as List).permutations()*.join() will generate all permutations, not including subsets. Generating the permutations of every possible subset could build on this.
Update: After reading Tim's answer, I could come up with this:
("word" as List).subsequences()*.permutations().collect{ it*.join() }.flatten().sort{ it.length() } (could go without .sort{...})
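For readers who want the outline of the algorithm rather than the Groovy one-liner, here is a minimal sketch of the same idea in Python (not Groovy). The function name all_arrangements is made up for illustration. It asks itertools.permutations for each length directly instead of enumerating subsets first, and uses a set so that repeated letters do not produce duplicates.

```python
from itertools import permutations


def all_arrangements(word):
    """Return every arrangement of every non-empty selection of letters, shortest first."""
    found = set()
    for size in range(1, len(word) + 1):
        # permutations(word, size) yields ordered selections of `size` letters
        for perm in permutations(word, size):
            found.add("".join(perm))
    # sort by length first, then alphabetically within each length
    return sorted(found, key=lambda s: (len(s), s))


print(all_arrangements("the"))
```

For a four-letter word with distinct letters this yields 4 + 12 + 24 + 24 = 64 strings, the same count as the Groovy output above.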

Related

FuzzyWuzzy for very similar records in Python

I have a dataset in which I want to find the closest string match. For that purpose I'm using FuzzyWuzzy like this:
sol=process.extract(t,dev2,scorer=fuzz.token_sort_ratio)
Here t is the string and dev2 is the list to compare against. My problem is that the data sometimes contains very similar records, and the options FuzzyWuzzy provides seem to be lacking. I've tested token_sort_ratio, token_set_ratio, partial_token_sort_ratio, partial_token_set_ratio, ratio, partial_ratio, and WRatio.
For example, the string Italy - Serie A gives me the following 2 closest matches.
Token_sort_ratio: (92, 'Italy - Serie D');(86, 'Italian - Serie A')
The one I want is obviously the second, but character by character the first one is closer, and it is a different league.
This happens with teams as well. If, say, I have the string Buchtholz, I get Buchtholz II before I get TSV Buchtholz.
My main idea for now is to weight the presence or absence of certain characters more heavily, such as single capital letters at the end of the string, so that a different or missing letter makes the match count as less close. The same could apply to () and other special characters.
I don't know whether there is a way to take this into account, or whether you have a better approach for getting the string that really matches.
Similarity matches often require knowledge of the data being analysed, i.e. it is not just a blind single round of matching. I recommend that you pass your results through more steps of matching, starting with inclusive/optimistic approaches (like token_set_ratio) with low cut-off scores and working toward more exclusive/pessimistic approaches with higher cut-off scores until you have a clear winner. If you know more about the text you're analyzing, you can even modify the strings as you progress.
In a case I worked on, I did similarity matches of goods movement descriptions. In the descriptions, the number sequences were more important than the text. E.g. when looking for a match for "SLURRY VALVE 250MM RAGMAX 2000", the 250 and 2000 parts of the string are important; otherwise I get "SLURRY VALVE 50MM RAGMAX 2000" as the best match instead of "VALVE B/F 250MM,RAGMAX 250RAG2000 RAGON", which is a better result.
I put the similarity match process through two steps: 1. Get a bunch of similar matches using an optimistic scorer (token_set_ratio). 2. Get the number sequences of these results and pass them through another round of matching with a stricter scorer (token_sort_ratio). Doing this gave me the better result in the example above.
Below are some blocks of code that could be of assistance.
Here's a function to get the numbers from a string. (In your case you might use this to exclude numbers from your string instead?)
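def get_numbers_from_string(description):
    # replace every character that is not a digit, '.' or '-' with a space
    numbers = ''.join((ch if ch in '0123456789.-' else ' ') for ch in description)
    # collapse runs of spaces so the numbers are separated by single spaces
    numbers = ' '.join([nr for nr in numbers.split()])
    return numbers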
and here is a portion of the code I used to put the description match through two rounds:
try:
    # get close match from goods move that has material numbers
    df_material = pd.DataFrame(process.extract(description,
                                               corpus_material,
                                               scorer=fuzz.token_set_ratio),
                               columns=['Similar Text', 'Score']
                               )
    if df_material['Score'][df_material['Score'] >= cut_off_accuracy_materials].count() >= 1:
        similar_text = df_material['Similar Text'].iloc[0]
        score = df_material['Score'].iloc[0]
        if nr_description_numbers > 4:
            # if there are multiple matches found, then get best number combination match
            df_material = df_material[df_material['Score'] >= cut_off_accuracy_materials]
            new_corpus = list(df_material['Similar Text'])
            new_corpus = np.vectorize(get_numbers_from_string)(new_corpus)
            df_material['numbers'] = new_corpus
            df_numbers = pd.DataFrame(process.extract(description_numbers,
                                                      new_corpus,
                                                      scorer=fuzz.token_sort_ratio),
                                      columns=['numbers', 'Score']
                                      )
            similar_text = df_material['Similar Text'][df_material['numbers'] == df_numbers['numbers'].iloc[0]].iloc[0]
            nr_score = df_numbers['Score'].iloc[0]
# (excerpt ends here; the matching except clause is not shown)
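The two-round idea can also be tried without FuzzyWuzzy or pandas. The following is only a rough sketch under stated assumptions: difflib.SequenceMatcher from the standard library stands in for the optimistic scorer, and the second round uses exact overlap of the number tokens rather than token_sort_ratio. All function names here are made up for illustration and are not part of any library.

```python
from difflib import SequenceMatcher


def get_numbers(text):
    """Return the set of digit runs in text (simplified get_numbers_from_string)."""
    cleaned = "".join(ch if ch.isdigit() else " " for ch in text)
    return set(cleaned.split())


def loose_score(a, b):
    """Stand-in for an optimistic scorer: character similarity on a 0..100 scale."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def number_overlap(a, b):
    """Jaccard overlap of the number tokens of a and b, in 0..1."""
    na, nb = get_numbers(a), get_numbers(b)
    if not na and not nb:
        return 1.0
    return len(na & nb) / len(na | nb)


def two_stage_match(query, corpus, shortlist_size=5):
    # Round 1: shortlist the candidates that look most similar overall.
    shortlist = sorted(corpus, key=lambda c: loose_score(query, c), reverse=True)
    shortlist = shortlist[:shortlist_size]
    # Round 2: re-rank the shortlist by number overlap; loose score breaks ties.
    return max(shortlist, key=lambda c: (number_overlap(query, c), loose_score(query, c)))


corpus = [
    "SLURRY VALVE 50MM RAGMAX 2000",
    "VALVE B/F 250MM,RAGMAX 250RAG2000 RAGON",
]
print(two_stage_match("SLURRY VALVE 250MM RAGMAX 2000", corpus))
```

With the two candidates from the example, the second round prefers the entry whose numbers are 250 and 2000, because its number tokens match the query exactly while the 50MM entry shares only 2000.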
Hope it helps, and good luck.

Loop with matrices created with assign function in R project

I created several matrices with the assign function as follows:
for (i in 2:105) { # Loop for creating and filling matrices
  (assign(paste("m",i,sep=""),Datos[(x[i-1]+1):x[i],1:14]))
}
This gives me several matrices, from m2 to m105, which is exactly what I wanted, because I can refer to these matrices by name and index them like m2[i,j] or m65[i,j], etc.
My problem is that I want to write a loop that covers all my "m" matrices, but I don't know the right code for it. I need something like:
paste("m",i,"[i,j]",sep="") to return m2[i,j], m3[i,j], ..., m105[i,j] and loop over these, but the paste function returns a string and does not recognize m2 ... m105 as matrices; it returns m2[i,j] as text.
What should I do ?
Thank you very much !
regards
You have to use get:
get(paste("m", i, sep=""))[i,j]

How can I assign an indented block in Vim a special syntax highlighter?

For convenience in grouping CouchDB functions, I created a file format that groups separate things together using YAML.
It basically contains entries of the form name.ext: |
followed by an indented block of code in the language matching .ext.
For more pleasant editing, I'd like Vim to use the correct syntax highlighter for each block.
edit
some code examples as requested
simple:
map.coffee: |
  (doc) ->
    for item in doc.items:
      emit [doc.category, item], null
    return
reduce: _count
more complex:
map.coffee: |
  (doc) ->
    emit doc.category, {items: 1, val: doc.value}
    return
reduce.coffee: |
  (keys, values, rereduce) ->
    ret = {items: 0, val: 0}
    for v in values
      ret.items += doc.items
      ret.val += doc.val
    return ret
I believe that what you want is to make use of Vim's syntax regions (:help syn-region). But regions take delimiters as parameters.
You have a well-defined start but no defined end; maybe you could work around that by establishing a convention such as "2 empty lines at the end".
There are similar answers that might give you a hint (including the docs) on how to implement a solution, like: Embedded syntax highligting in Vim
Another interesting and similar approach is this Vimtip: http://vim.wikia.com/wiki/Different_syntax_highlighting_within_regions_of_a_file
You have to write your own syntax file, and define a syntax region for each of your entries. Inside that region, you can then syntax-include the corresponding language as defined by your ext. Read all the details at :help :syn-include.
If that sounds too complicated, check out my SyntaxRange plugin. It is based on the Vimtip mentioned by alfredodeza. With it, you can quickly assign a syntax to a range of lines, e.g. :11,42SyntaxInclude perl

What's the best way to extract a single value from a Set in groovy?

If I have a Set that I know contains a single element, what's the best way to extract it? The best I can come up with is this, but it doesn't feel very groovy:
set = [1] as Set
e = set.toList()[0]
assert e == 1
If I'm dealing with a list, I've got lots of nice ways to get the element, none of which seem to work with Sets:
def list = [1]
e = list[0]
(e) = list
e = list.head()
One other possibility (which will work in Java or Groovy):
set.iterator().next()
A few alternatives, none of them very pretty:
set.iterator()[0]
set.find { true }
set.collect { it }[0]
Finally, if it's guaranteed that that set has only one item:
def e
set.each { e = it }
The underlying issue, of course, is that Java Sets provide no defined order (as mentioned in the Javadoc), and hence no ability to get the nth element (discussed in this question and this one). Hence, any solution is always to somehow convert the set to a list.
My guess is that either of the first two options involve the least data-copying, as they needn't construct a complete list of the set, but for a one-element set, that should hardly be a concern.
Since Java 8, here is another solution that will work for both Java and Groovy:
set.stream().findFirst().get()
Even though this question is quite old, I am sharing my slightly prettier solution.
(set as List).first()
(set as List)[0]
If you need to take null into account (not the case in this question):
(set as List)?.first()
(set as List)?.get(index)
Hope it helps! :)

weighted RDF predicate (owl:ObjectProperty)

In RDF, a statement is represented by S, P and O; in OWL, an owl:ObjectProperty represents the predicate.
(S) (P) (O)
I like dog
<owl:Class rdf:about="Person" />
<owl:NamedIndividual rdf:about="I">
  <rdf:type rdf:resource="Person"/>
  <like rdf:resource="Dog"/>
</owl:NamedIndividual>
<owl:Class rdf:about="Pet" />
<owl:NamedIndividual rdf:about="Dog">
  <rdf:type rdf:resource="Pet"/>
</owl:NamedIndividual>
<owl:ObjectProperty rdf:about="like">
  <rdfs:domain>
    <owl:Restriction>
      <owl:onProperty rdf:resource="like"/>
      <owl:someValuesFrom rdf:resource="Person"/>
    </owl:Restriction>
  </rdfs:domain>
  <rdfs:range>
    <owl:Restriction>
      <owl:onProperty rdf:resource="like"/>
      <owl:someValuesFrom rdf:resource="Pet"/>
    </owl:Restriction>
  </rdfs:range>
</owl:ObjectProperty>
But how about to describe "the degree" I like dogs?
How can I give a property or value to a predicate?
One solution I got is to extend one (S,P,O) statement to 3 statements.
For example,
(S) (P) (O)
Person isSrcOf LikeRelation
Pet isTargetOf LikeRelation
LikeRelation hasValue [0~100]
It should work, but obviously it will make the ontology 3 times bigger :(
I appreciate any suggestion!
I wouldn't use RDF reification, not in this case and in almost no other case. RDF reification always makes things more complicated. As you noted, it will inflate your ontology, but not just that: it will also make it very difficult to apply OWL reasoning to your ontology.
I've dealt with the same scenario that you've presented, and most of the time I've ended up with the following design.
(S) (P) [ (P) (O) (P) (O)]
I like [ 'what I like' Dog , 'how much I like it' 'a lot']
Class: LikeLevel //it represents class of things a person likes with a degree factor.
ObjectProperty: likeObject
    Domain: LikeLevel
    Range: Pet //(or Thing)
ObjectProperty: likeScale
    Domain: LikeLevel
    Range: xsd:int //(or an enumeration class i.e: 'nothing', 'a bit', 'very much',...)
ObjectProperty: like
    Domain: Person
    Range: LikeLevel
If you want to represent some instance data with this model (in RDF/Turtle syntax):
:I :like [ a :LikeLevel;
           :likeObject :dogs;
           :likeScale 5.7 ] .
In this case I'm creating a blank node for the LikeLevel object, but you could create a ground (named) object as well; sometimes you might want or need to avoid bNodes. In that case:
:I :like :a0001 .
:a0001 a :LikeLevel;
       :likeObject :dogs;
       :likeScale 5.7 .
This design can be considered a light case of reification; the main difference from RDF reification is that it keeps the ontology design in the user's model.
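To see how the intermediate-node design supports queries on the degree, here is a small sketch in Python that models the triples as plain tuples rather than using an RDF library. The second node (a0002, cats, 2.0) and the helper names are made up for illustration; a real store would answer the same question with a SPARQL filter.

```python
# Each triple is a (subject, predicate, object) tuple; all names are illustrative.
triples = [
    ("I", "like", "a0001"),
    ("a0001", "type", "LikeLevel"),
    ("a0001", "likeObject", "dogs"),
    ("a0001", "likeScale", 5.7),
    ("I", "like", "a0002"),
    ("a0002", "type", "LikeLevel"),
    ("a0002", "likeObject", "cats"),
    ("a0002", "likeScale", 2.0),
]


def objects(subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def liked_at_least(person, threshold):
    """Return the things a person likes with a likeScale of at least threshold."""
    result = []
    for node in objects(person, "like"):
        scales = objects(node, "likeScale")
        if scales and scales[0] >= threshold:
            result.extend(objects(node, "likeObject"))
    return result


print(liked_at_least("I", 5))
```

The degree lives on the LikeLevel node, so filtering by a range of values is a matter of following one extra hop from the person.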
Your suggestion is a valid one; it is called reification and is the standard way of representing properties inherent to a relationship between two items in an ontology or RDF graph, where statements are made in a pairwise manner between items - it is a limitation of the data model itself that makes reification necessary sometimes.
If you're worried that reification will inflate your ontology, you could try one of the following instead, but they are generally less desirable and come with their own problems:
Create specific properties, such as somewhatLikes, doesntLike, loves; this may be suitable if you have a limited set of alternatives and don't mind creating the extra properties. It becomes tedious and cumbersome (and I'd go so far as to suggest incorrect) if you intend to encode the 'degree of likeness' with an integer (or any wide range of alternatives): following this approach, you'd have properties like likes0, likes1, ..., likes99, likes100. This method would also preclude querying, for example, all dogs that a person likes within a range of degree values, which is possible in SPARQL with the reification approach you've specified, but not with this approach.
Attach the likesDogs property to the Person instance, if the assertion can be made about the person with respect to all types/instances of Dog rather than individual instances. This will, of course, depend on what you're trying to capture here; if you need to describe individual instances, then this won't be appropriate either.
Good luck!
I think @msalvadores gets it wrong.
Let's forget about the dogs and likes. What we are really doing here is:
a x b
axb y c
axb z d
where axb is the identifier of the a x b statement, a, b, c, d are subjects or objects and x, y, z are predicates. What we need is binding the a, x, b resources to the axb statement somehow.
This is how reification does it:
axb subject a
axb predicate x
axb object b
which I think is very easy to understand.
Let's check what msalvadores does:
:I :like [ a :LikeLevel;
:likeObject :dogs;
:likeScale 5.7] .
we can easily translate this to axb terms
a x w
w type AxbSpecificObjectWrapper
w object b
w y c
which is just mimicking reification with lower-quality tools and more effort (you need a wrapper class and have to define an object property). The a x w statement does not make sense to me: I like a like level whose object is dogs???
But how about to describe "the degree" I like dogs?
There are 2 ways to do this as far as I can tell with my very limited RDF knowledge.
1.) use reification
stmt_1
    a LikeStatement
    subject I
    predicate like
    object dogs
    how_much "very much"
2.) instantiate a predicate class
I like_1 dogs
like_1
    a Like
    how_much "very much"
It depends on your taste and your actual vocab which one you choose.
How can I give a property or value to a predicate?
I don't think you understand the difference between a predicate and a statement. A great example about it is available here: Simple example of reification in RDF
Tolkien wrote Lord of the rings
Wikipedia said that
The statement here:
that: [Tolkien, wrote, LotR]
If we are making statements about the statement, we write something like this:
[Wikipedia, said, that]
If we are making statements about the predicate then we write something like this:
[Wikipedia, said, wrote]
I think there is a big difference. Reification is about making statements about statements not about predicates...
A sentence from Jena's documentation just caught my eye.
...OWL Full allows ... state the following .... construction:
<owl:Class rdf:ID="DigitalCamera">
  <rdf:type owl:ObjectProperty />
</owl:Class>
..
Does OWL Full really allow an ObjectProperty to be a Class as well?
If an ObjectProperty could be a Class and could have individuals, then I could describe a statement with
S_individual P_individual O_individual
and I could have further properties on P_individual. Is it right?
or am I missing some points?
since the following RDF is valid, a corresponding OWL should be achievable.
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://somewhere/" >
  <rdf:Description rdf:about="http://somewhere/Dog_my_dog">
    <j.0:name>Lucky</j.0:name>
  </rdf:Description>
  <rdf:Description rdf:about="http://somewhere/like_dog">
    <j.0:degree>80</j.0:degree>
  </rdf:Description>
  <rdf:Description rdf:about="http://somewhere/Cat_my_cat">
    <j.0:name>Catty</j.0:name>
  </rdf:Description>
  <rdf:Description rdf:about="http://somewhere/like_cat">
    <j.0:degree>86</j.0:degree>
  </rdf:Description>
  <rdf:Description rdf:about="http://somewhere/Person_I">
    <j.0:name>Bob</j.0:name>
    <j.0:like_dog rdf:resource="http://somewhere/Dog_my_dog"/>
    <j.0:like_cat rdf:resource="http://somewhere/Cat_my_cat"/>
  </rdf:Description>
</rdf:RDF>
