Simplifying and debugging a nested SPARQL query

I'm working on a recommendation system for video games. I have an RDF graph containing information about various games, including their genres and developers. In addition, I have a database of players who have given star ratings to various games.
I'm trying to build a single SPARQL query that does the following:
Assign a score to each game genre and developer, based on the average score the user gave to games of that genre/developer. To increase the effect of high and low ratings, I'm computing 2.5*(rating - 3)^2 for each rating, keeping the sign of (rating - 3), before taking the average.
Based on that, give every single game in the database a score which is the weighted average of the scores given by the player to all its genres and developers (the real database has a few more features; this is a simplified version). I also have a similarity relation between genres, where a game should also get points if one of its genres is similar to a well-liked genre.
Add/remove points to each game based on the price and language of the game.
Finally, sort all the games by decreasing points. This gives a list of recommendations, where the best recommendations are the games with the highest scores.
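To make the intended arithmetic of the first two steps concrete, here is a minimal Python sketch. It is not the SPARQL solution, only a reference for what the query is supposed to compute. The function names, the sample games and the 0.3 genre weight are illustrative assumptions (the weight is taken from the simplified query), and features the player never rated default to 0 so that every game still receives a score.

```python
from collections import defaultdict


def signed_square(rating, neutral=3, scale=2.5):
    """Map a star rating to scale * (rating - neutral)^2, keeping the sign."""
    delta = rating - neutral
    return scale * delta * delta * (-1 if delta < 0 else 1)


def feature_scores(ratings, game_features):
    """Average the transformed ratings per feature value (genre or developer)."""
    buckets = defaultdict(list)
    for game, rating in ratings.items():
        for feature in game_features.get(game, []):
            buckets[feature].append(signed_square(rating))
    return {feature: sum(vals) / len(vals) for feature, vals in buckets.items()}


def game_score(game, dev_scores, genre_scores, devs, genres, genre_weight=0.3):
    """Combine developer and genre averages; unrated features count as 0."""
    def avg(values):
        return sum(values) / len(values) if values else 0.0

    dev = avg([dev_scores.get(d, 0.0) for d in devs.get(game, [])])
    genre = avg([genre_scores.get(g, 0.0) for g in genres.get(game, [])])
    return dev + genre_weight * genre


# Hypothetical data: the player rated game_a and game_b, but not game_c.
ratings = {"game_a": 5, "game_b": 1}
genres = {"game_a": ["rpg"], "game_b": ["rpg", "fps"], "game_c": ["puzzle"]}
devs = {"game_a": ["studio_x"], "game_b": ["studio_y"], "game_c": ["studio_x"]}

genre_scores = feature_scores(ratings, genres)
dev_scores = feature_scores(ratings, devs)
score_c = game_score("game_c", dev_scores, genre_scores, devs, genres)
```

The `.get(..., 0.0)` defaults play the role that OPTIONAL plus COALESCE would play in a query: a game with an unrated genre or developer is kept with a neutral contribution instead of being dropped.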
My current attempt is the query below. It seems to be globally working, but there are a few caveats:
It often returns only a very small subset of games, or even none at all. I would expect to get scores for every game (with possibly a lot of 0s), but that is not what is happening. I suspect this is due to not having default values in some places, but I'm not sure how to fix that.
Overall, the query is very clunky and has multiple levels of nesting. It seems to me that the inner two SELECT queries are necessary, but I believe one of the outermost ones could be removed. However, my attempts so far have only led to stack overflow errors in Blazegraph. I also have to manually substitute the name of the player in 4 different places, which seems counterproductive. I would like to avoid that repetition if possible.
The query is quite slow (~2 s) even though the database is rather small (~8 MB). When all the features are added, it can take up to 30 seconds to execute in Blazegraph. It doesn't seem to run at all when using RDFLib (empty results).
I would be very interested in optimizing/fixing this query, as I believe there are probably certain features of SPARQL that I'm not using but that could be useful in my case.
Any comments or ideas would be greatly appreciated.
(Simplified) query so far:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gamerec: <http://example.com/gamerec/>

SELECT (SAMPLE(?title2) AS ?title3) (SAMPLE(?finalScore) AS ?finalScore2) WHERE {
  {
    SELECT ?game (SAMPLE(?title) AS ?title2) (SAMPLE(?aggregateScore) AS ?aggregateScore2) WHERE {
      {
        # Games available for the player's OS and language
        ?player rdfs:label "USERNAME" ;
                gamerec:uses ?os ;
                gamerec:speaks ?lang .
        ?game rdfs:label ?title ;
              gamerec:os ?os ;
              gamerec:hasTextIn ?lang .
        {
          SELECT ?game (AVG(?devScore2) AS ?devScore3) (AVG(?pubScore2) AS ?pubScore3) (AVG(?tagScore2) AS ?tagScore3) (AVG(?genreScore2) AS ?genreScore3) (AVG(?featureScore2) AS ?featureScore3) WHERE {
            ?game gamerec:developedBy ?dev ;
                  gamerec:hasGenre ?genre .
            {
              # Average transformed rating per developer
              SELECT ?dev (AVG(?devScore) AS ?devScore2) WHERE {
                ?player rdfs:label "USERNAME" ;
                        gamerec:hasPlayerRating ?rating .
                ?rating gamerec:isAbout ?game ;
                        rdf:value ?value .
                BIND ((?value-3)*(?value-3)*2.5*IF(?value < 3, -1, 1) AS ?devScore)
                ?game gamerec:developedBy ?dev .
              } GROUP BY ?dev
            }
            {
              # Average transformed rating per genre, including similar genres
              SELECT ?genre (AVG(?genreScore) AS ?genreScore2) WHERE {
                ?player rdfs:label "USERNAME" ;
                        gamerec:hasPlayerRating ?rating .
                ?rating gamerec:isAbout ?game ;
                        rdf:value ?value .
                BIND ((?value-3)*(?value-3)*2.5*IF(?value < 3, -1, 1) AS ?genreScore)
                {
                  ?game gamerec:hasGenre ?genre .
                } UNION {
                  ?game gamerec:hasGenre ?otherGenre .
                  ?otherGenre gamerec:isSimilarGenre ?genre .
                }
              } GROUP BY ?genre
            }
          } GROUP BY ?game
        }
      }
      # Exclude games the player has already played
      MINUS {
        ?player a gamerec:Player ;
                rdfs:label "USERNAME" ;
                gamerec:hasPlayed ?game .
      }
      BIND (?devScore3 + 0.3*?genreScore3 AS ?aggregateScore)
    } GROUP BY ?game
  }
  # Price and audio-language adjustments
  ?player a gamerec:Player ;
          rdfs:label "USERNAME" ;
          gamerec:hasBudget ?budget ;
          gamerec:speaks ?lang .
  ?game gamerec:Price ?price .
  ?price rdf:value ?priceVal .
  ?budget rdf:value ?budgetVal .
  BIND (IF(EXISTS { ?game gamerec:hasAudioIn ?lang }, 1, 0) AS ?audioBonus)
  BIND (IF(?budgetVal < ?priceVal, ?priceVal/?budgetVal, 0) AS ?budgetMalus)
  BIND (?aggregateScore2 - 10*?budgetMalus + 10*?audioBonus AS ?finalScore)
} GROUP BY ?title2 ORDER BY DESC(?finalScore2) ?title3

Related

SPARQL: Get a page of results and total results without repeating the query

This seems like it should be a simple/common thing, but I haven't found any useful answers that work.
I want to get a page of results (using OFFSET and LIMIT) as well as a COUNT of total results in the same query, without repeating or re-running the query. In other words, I want to run the query once, count the results, get the first n results after some offset, and return just those n results along with the total result count. The exact format for how this is returned doesn't matter; I just need the data.
The closest answer I've found was in "How to get total number of results when using LIMIT in SPARQL?", but the solution boils down to "duplicate the WHERE clause in two subqueries", which seems unnecessary (and runs the query twice(?)).
I suspect this can be done with some combination of subqueries and possibly a UNION, but I'm new to SPARQL so my grasp on its semantics isn't very firm yet.
A blatantly invalid example that illustrates what I want to accomplish (but not how I intend to do it):
SELECT (?id OFFSET 5 LIMIT 10 AS ?pageOfResults) (COUNT(?id) AS ?totalResults)
WHERE {
?id some:predicate some:object
ORDER BY ?id
}
The closest I've gotten is embodied by the next two examples. First, one which gives the desired result set (in this case, an extra result that contains the count). This is based on the link above. As noted above, it does so by duplicating the WHERE clause (effectively running the same query twice unless I misunderstand how SPARQL works), which I want to avoid:
SELECT ?id ?count
WHERE {
  {
    SELECT (COUNT(*) AS ?count)
    WHERE {
      ?id some:predicate some:object .
    }
  }
  UNION
  {
    SELECT ?id
    WHERE {
      ?id some:predicate some:object .
    }
    ORDER BY ?id
    OFFSET 5
    LIMIT 10
  }
}
Next, one which comes close to what I want, but which always returns a ?count of 1 (presumably because it's counting the ?ids being grouped instead of counting all of the matches). I was trying to get (and COUNT) all of the matches first before passing the ?id up to the outer layer to get OFFSET and LIMITed (and that part seems to work).
SELECT ?id ?count
{
  {
    SELECT ?id (COUNT(*) AS ?count)
    WHERE {
      ?id some:predicate some:object .
    }
    GROUP BY ?id
    ORDER BY ?id
  }
}
OFFSET 5
LIMIT 10
It would be nice (for this and other purposes) to be able to store the result of the WHERE clause in a variable and then do two separate SELECTs on it (one for the page of results, one for the count), but if that's possible, I haven't seen a way to do it.
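For comparison, the "store the result once, then read it twice" idea is easy to express on the client side. The following Python sketch assumes the full, ordered result of a single query execution is small enough to hold in memory; the helper name and the sample ids are hypothetical, and this is not a SPARQL-level solution.

```python
def page_with_total(rows, offset, limit):
    """Return (total, page) from one already-fetched, ordered result list."""
    total = len(rows)                     # count over all matches
    page = rows[offset:offset + limit]    # same effect as OFFSET/LIMIT
    return total, page


# Hypothetical ordered ids, standing in for the rows of one query execution.
all_ids = ["id%02d" % i for i in range(25)]
total, page = page_with_total(all_ids, offset=5, limit=10)
```

This trades network transfer for a single query execution, so it only makes sense when the unpaged result is reasonably small.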

Using COLLECT with INSERT in ArangoDB to create new documents

In ArangoDB, I have a collection called prodSampleNew whose documents have hierarchy levels as fields:
{
  prodId: 1,
  LevelOne: "clothes",
  LevelTwo: "pants",
  LevelThree: "jeans",
  ... etc....
}
I want to take the hierarchy levels and convert them into their own documents, so I can eventually build a proper graph with the hierarchy.
I was able to extract the first level of the hierarchy and put it in a new collection using the following:
FOR i IN [1]
  LET HierarchyList = (
    FOR prod IN prodSampleNew
      COLLECT LevelOneUnique = prod.LevelOne
      RETURN LevelOneUnique
  )
  FOR hierarchyLevel IN HierarchyList
    INSERT {"name": hierarchyLevel}
    IN tmp
However, having to put FOR i IN [1] at the top seems wrong, and I feel there should be a better way. (Yes, I am fairly new to AQL.)
Any pointers on a better way to do this would be appreciated.
Not sure what you are trying to achieve exactly.
The FOR i IN [1] seems unnecessary however, so you could start your AQL query directly with the subquery to compute the distinct values from hierarchy level 1:
LET HierarchyList = (
  FOR prod IN prodSampleNew
    COLLECT LevelOneUnique = prod.LevelOne
    RETURN LevelOneUnique
)
FOR hierarchyLevel IN HierarchyList
  INSERT {"name": hierarchyLevel} IN tmp
The result should be the same.
If the question is more like "how can I get all distinct names of levels from all hierarchies", then you could use something like
LET HierarchyList = UNIQUE(FLATTEN(
  FOR prod IN prodSampleNew
    RETURN [ prod.LevelOne, prod.LevelTwo, prod.LevelThree ]
))
...
to produce an array with the unique names of the hierarchy levels for level 1-3.
If this doesn't answer your question, please describe the result the query should produce.
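To illustrate what the UNIQUE(FLATTEN(...)) variant computes, here is a rough Python equivalent over plain dictionaries. The function name and the second sample product are made up for the illustration. Two details are assumptions of this sketch rather than guarantees of AQL: it keeps names in first-seen order, and it skips documents that lack a level.

```python
def unique_levels(products, fields=("LevelOne", "LevelTwo", "LevelThree")):
    """Collect the distinct hierarchy level names across all products."""
    seen = set()
    names = []
    for prod in products:
        for field in fields:
            name = prod.get(field)
            # Skip missing levels and names that were already collected.
            if name is not None and name not in seen:
                seen.add(name)
                names.append(name)
    return names


# Hypothetical documents shaped like those in prodSampleNew.
products = [
    {"prodId": 1, "LevelOne": "clothes", "LevelTwo": "pants", "LevelThree": "jeans"},
    {"prodId": 2, "LevelOne": "clothes", "LevelTwo": "shirts", "LevelThree": "polo"},
]
levels = unique_levels(products)
```

Each resulting name would then become one document in the target collection, just as the INSERT loop does for level 1.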

ArangoDb AQL Graph queries traversal example

I am having some trouble wrapping my head around how to traverse a certain graph to extract some data.
I have a collection of "users" and a collection of "places".
A "likes" edge collection denotes that a user likes a certain place. It also has a "review" property that stores the user's review of the place.
A "follows" edge collection denotes that a user follows another user.
How can I traverse the graph to fetch all the places that I like, with my review of each place and the reviews of the users I follow who also like the same place?
For example, in the graph above, I am user 6327 and I reviewed both places (7968 and 16213).
I also follow user 6344, who happens to have reviewed place 7968 as well.
How can I get all the places that I like, along with the reviews of the people I follow who also reviewed those places?
The expected output would be something like the following:
[
  {
    name: "my name",
    place: "place 1",
    id: 1,
    review: "my review about place 1"
  },
  {
    name: "my name",
    place: "place 2",
    id: 2,
    review: "my review about place 2"
  },
  {
    name: "name of the user I follow",
    place: "place 2",
    id: 2,
    review: "review about place 2 from the user I follow"
  }
]
There are a number of ways to do this query, and it also depends on where you want to add parameters, but for the sake of simplicity I've built this quite verbose query below to help you understand one way of approaching the problem.
One way is to determine the _id of your user record, then find all the _id's of the friends you follow, and then to work out all related reviews in one query.
I take a different approach below, and that is to:
Determine the reviews you have written
Determine who you follow
Determine the reviews the people you follow have written
Merge together your reviews with those of the people you follow
It is possible to merge these queries together more optimally, but I thought it worth breaking them out like this (and showing the output of each stage as well as the final answer) to help you see what data is available.
A key thing to understand about AQL graph queries is how you have access to vertices, edges, and paths when you perform a query.
A path is an object in its own right, and it's worth investigating the contents of that object to better understand how to exploit it for path information.
This query assumes:
users document collection contains users
places document collection contains places
follows edge collection tracks users following other users
reviews edge collection tracks reviews people wrote
Note: When providing an id on each record I used the id of the review, because if you know that id you can fetch the edge document and get the id of both the user and the place as well as read all the data about the review.
LET my_reviews = (
  FOR vertices, edges, paths IN 1..1 OUTBOUND "users/6327" reviews
    RETURN {
      name: FIRST(paths.vertices).name,
      review_id: FIRST(paths.edges)._id,
      review: FIRST(paths.edges).review,
      place: LAST(paths.vertices).place
    }
)
LET who_i_follow = (
  FOR v IN 1..1 OUTBOUND "users/6327" follows
    RETURN v
)
LET reviews_of_who_i_follow = (
  FOR users IN who_i_follow
    FOR vertices, edges, paths IN 1..1 OUTBOUND users._id reviews
      RETURN {
        name: FIRST(paths.vertices).name,
        review_id: FIRST(paths.edges)._id,
        review: FIRST(paths.edges).review,
        place: LAST(paths.vertices).place
      }
)
RETURN {
  my_reviews: my_reviews,
  who_i_follow: who_i_follow,
  reviews_of_who_i_follow: reviews_of_who_i_follow,
  merged_reviews: UNION(my_reviews, reviews_of_who_i_follow)
}
The first vertex in paths.vertices is the starting vertex (users/6327)
The last vertex in paths.vertices is the end of the path, e.g. who you follow
The first edge in paths.edges is the review that the user made of the place
Here is another, more compact version of the query that takes a bind parameter, @user, which is the _id of the user that is 'you'.
LET target_users = APPEND(TO_ARRAY(@user), (
  FOR v IN 1..1 OUTBOUND @user follows RETURN v._id
))
LET selected_reviews = (
  FOR u IN target_users
    FOR vertices, edges, paths IN 1..1 OUTBOUND u reviews
      LET user = FIRST(paths.vertices)
      LET place = LAST(paths.vertices)
      LET review = FIRST(paths.edges)
      RETURN {
        name: user.name,
        review_id: review._id,
        review: review.review,
        place: place.place
      }
)
RETURN selected_reviews
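The logic of the compact query can be checked against a small in-memory model. The Python sketch below uses plain dictionaries and lists in place of the collections. The sample documents, review ids and review texts are hypothetical, and only the user and place keys come from the question.

```python
def selected_reviews(user_id, users, places, follows, reviews):
    """Collect reviews written by user_id and by the users that user_id follows."""
    # Same idea as APPEND(TO_ARRAY(@user), <ids reached via the follows edges>).
    targets = [user_id] + [e["_to"] for e in follows if e["_from"] == user_id]
    result = []
    for uid in targets:
        # Same idea as a 1..1 OUTBOUND traversal over the reviews edges.
        for edge in reviews:
            if edge["_from"] == uid:
                result.append({
                    "name": users[uid]["name"],
                    "review_id": edge["_id"],
                    "review": edge["review"],
                    "place": places[edge["_to"]]["place"],
                })
    return result


# Hypothetical sample data shaped like the collections described above.
users = {"users/6327": {"name": "me"}, "users/6344": {"name": "friend"}}
places = {"places/7968": {"place": "place 1"}, "places/16213": {"place": "place 2"}}
follows = [{"_from": "users/6327", "_to": "users/6344"}]
reviews = [
    {"_id": "reviews/1", "_from": "users/6327", "_to": "places/7968", "review": "my review of place 1"},
    {"_id": "reviews/2", "_from": "users/6327", "_to": "places/16213", "review": "my review of place 2"},
    {"_id": "reviews/3", "_from": "users/6344", "_to": "places/7968", "review": "friend's review of place 1"},
]

merged = selected_reviews("users/6327", users, places, follows, reviews)
```

Like the AQL it mirrors, this returns every review written by followed users. Restricting the result to places the starting user has also reviewed would need an extra filter on the place.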

Wordpress Woocommerce Variable Products

When setting up a variable product in WooCommerce based on product attributes, the front-end store provides a dropdown where the user can pick the product attribute (e.g. shirt size S, M or L).
At the moment that dropdown is populated with the attribute names. Can someone please tell me where the function is located so that I can populate the dropdown with the names plus the corresponding attribute descriptions?
So for instance, if the size attribute has name 'S' and description 'Small', I want the dropdown to say 'S [Small]', rather than just 'S', which is how it is presented at the moment.
There's a filter available for this, but currently (version 2.5.2) it only receives the option name, which I don't find very flexible. But at least we have this.
Try pasting this code into your functions.php:
add_filter( 'woocommerce_variation_option_name', 'rei_variation_option_name', 10, 1 );
function rei_variation_option_name( $name ) {
    if ( $name == 'S' ) {
        $name = 'Small';
    }
    if ( $name == 'M' ) {
        $name = 'Medium';
    }
    if ( $name == 'L' ) {
        $name = 'Large';
    }
    return $name;
}
Please note that this only changes what is displayed. The database is not changed.
The solution I used is this:
In wc-template-functions.php, in the function wc_dropdown_variation_attribute_options, modify:
$html .= '<option value="' . esc_attr( $term->slug ) . '" ' . selected( sanitize_title( $args['selected'] ), $term->slug, false ) . '>' . esc_html( apply_filters( 'woocommerce_variation_option_name', $term->name ) ) . '</option>';
to
$html .= '<option value="' . esc_attr( $term->slug ) . '" ' . selected( sanitize_title( $args['selected'] ), $term->slug, false ) . '>' . esc_html( apply_filters( 'woocommerce_variation_option_name', $term) ) . '</option>';
Then in functions.php, implement the following:
add_filter( 'woocommerce_variation_option_name', 'variation_option_name_description', 10, 1 );
function variation_option_name_description( $term ) {
    return $term->name . ' [' . $term->description . ']';
}
Thanks to Reigel for steering me in the right direction.

Elasticsearch input analysis

Can Elasticsearch split an input string into categorized words? For example, if the input is
4star wi-fi 99$
and we are searching hotels with ES, is it possible to analyze/tokenize this string as
4star - hotel level, wi-fi - hotel amenities, 99$ - price?
yep, it's a noob question :)
Yes and no.
By default, query_string searches will work against the automatically created _all field. The contents of the _all field come from literally and naively combining all fields into a single analyzed string.
As such, if you have a "4star" rating, a "wi-fi" amenity, and a "99$" price, then all of those values would be inside of the _all field and you should get relevant hits against it. For example:
{
  "level" : "4star",
  "amenity" : ["pool", "wi-fi"],
  "price" : 99.99
}
The problem is that you will not--without client-side effort--know what field(s) matched when searching against _all. It won't tell you the breakdown of where each value came from, rather it will simply report a score that determines the overall relevance.
If you have some way of knowing which field each term (or terms) is meant to search against, then you can easily do this yourself (quotes aren't required, but they're good to have to avoid mistakes with spaces). This would be the input that you might provide to the query_string query linked above:
level:"4star" amenity:"wi-fi" price:[* TO 100}
You could further complicate this by using a spelled out query:
{
  "query" : {
    "bool" : {
      "must" : [
        { "match" : { "level" : "4star" } },
        { "match" : { "amenity" : "wi-fi" } },
        {
          "range" : {
            "price" : {
              "lt" : 100
            }
          }
        }
      ]
    }
  }
}
Naturally, the last two requests require advance knowledge of what each search term refers to. You could certainly use the $ in "99$" as a tip-off for price, but there is no such hint for the others. Hopefully you wouldn't have users typing in "4 stars" anyway, but would offer checkboxes or other form-based selections, so this should be quite realistic.
Technically, you could create a custom analyzer that recognized each term based on their position, but that's not really a good or useful idea.
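As a rough illustration of the client-side effort described above, the following Python sketch assembles the spelled-out bool query from form-style inputs and recognises a price token by its trailing $. The function names are hypothetical, the field names come from the sample document, and nothing here calls Elasticsearch itself. It only builds the request body as a dictionary.

```python
import re


def parse_price(token):
    """Return the number in a token such as '99$', or None if it is not a price."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\$", token)
    return float(match.group(1)) if match else None


def build_hotel_query(level=None, amenities=(), max_price=None):
    """Build a bool/must query body from already-categorised search inputs."""
    must = []
    if level is not None:
        must.append({"match": {"level": level}})
    for amenity in amenities:
        must.append({"match": {"amenity": amenity}})
    if max_price is not None:
        # Exclusive upper bound, as in the spelled-out query.
        must.append({"range": {"price": {"lt": max_price}}})
    return {"query": {"bool": {"must": must}}}


body = build_hotel_query(level="4star", amenities=["wi-fi"], max_price=100)
```

The resulting dictionary can be serialised with json.dumps and sent as the search request body by whatever client is in use.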
