Issue with GEO_CONTAINS - arangodb

I am looking at 3.4 RC with a view to outputting full GeoJSON files. Using the restaurants and neighbourhoods example (https://www.arangodb.com/arangodb-training-center/geojson-tutorial/), I can append the neighbourhood if I filter to a single neighbourhood, but I get "invalid loop in polygon" if I remove the name filter.
FOR n IN neighborhoods
  FILTER n.name == "Chinatown"
  FOR restaurant IN restaurants
    FILTER GEO_CONTAINS(n.geometry, restaurant.location)
    RETURN {
      type: "Feature",
      geometry: restaurant.location,
      properties: { neighbourhood: n.name, name: restaurant.name }
    }
Is this my inexperience with ArangoDB or an issue in 3.4?

Related

ArangoDB: Collect syntax statement error when adding the sum function for a collection attribute

I am trying to sum all the Amount values in the edge collection named Transaction and also extract the days from its date attribute.
However, I am getting an error in the COLLECT statement.
for d in Transaction
filter d._to == "Account/123"
COLLECT aggregate ct =count(d._id),
aggregate totamnt=sum(d.Amount),
aggregate daysactive= count(distinct date_trunc(d.Time))
return distinct {"Incoming Accounts":length, "Days Active": daysactive}
If I understand what you want to achieve correctly, this is a query to achieve it:
FOR d IN Transaction
  FILTER d._to == "Account/123"
  COLLECT AGGREGATE length = COUNT_UNIQUE(d._id),
                    totamnt = SUM(d.Amount),
                    daysactive = COUNT_UNIQUE(DATE_TRUNC(d.Time, "day"))
  RETURN {
    "Incoming Accounts": length,
    // COUNT_UNIQUE already yields a number, so no LENGTH() is needed here
    "Days Active": daysactive,
    "Total Amount": totamnt
  }
Note: the DISTINCT is not necessary. I also included the total amount in the return value and specified "day" as the unit to truncate the date to.
I tested this slightly adapted on a collection of mine and got sensible results.

Filtering on product frequency and category

I think there is a fairly simple solution to this but I just can't wrap my head around it right now.
I have a data frame with several hundred thousand orders. I am trying to find the nsmallest margin products by sub-category, but I want to filter out products that have been sold fewer than 2 separate times, e.g. if a product has only been involved in 1 transaction I don't want to include it in my output.
I would like my final output to be a frame of nsmallest margin products where 'Sub-Category' == 'Appliances' and 'Product Name' value count > 2.
The portion I'm getting stuck on is the "Product" value count portion. I just can't get it to work.
I've tried value_counts() as below:
df[(df['Sub-Category'] == 'Appliances') & (df.value_counts('Product Name') > 2)]\
.groupby('Product Name', as_index = False)['margin'].mean().nsmallest(20,'margin')
I've tried groupby().count()
df[(df['Sub-Category'] == 'Appliances') & (df.groupby('Product Name').count() > 2)]\
.groupby('Product Name', as_index = False)['margin'].mean().nsmallest(20,'margin')
I think I know the problem: I think the problem is that I'm filtering an aggregated frame and a non aggregated frame but I just cannot think of how to check and filter the frequency of a 'Product Name'. I'm kind of brain dead at this point.
Any suggestions?
Thanks in Advance
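"I think the problem is that I'm filtering an aggregated frame and a non aggregated frame."
You are right, so use map to replace each product name with its value count. The result has one entry per row of df, which lets it be combined with the other row-level condition.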
Replace:
(df.value_counts('Product Name') > 2)
By:
(df['Product Name'].map(df.value_counts('Product Name')) > 2)
# OR
(df.groupby('Product Name')['Product Name'].transform('size') > 2)
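As a minimal sketch of how the second form fits into the full pipeline: the frame below is made up for illustration, and only the column names ('Sub-Category', 'Product Name', 'margin') and the threshold of 2 come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "Sub-Category": ["Appliances"] * 6 + ["Chairs"],
    "Product Name": ["Toaster", "Toaster", "Toaster",
                     "Kettle", "Blender", "Blender", "Stool"],
    "margin": [0.10, 0.20, 0.30, -0.50, 0.05, 0.15, 0.40],
})

# transform('size') returns one value per original row, so the boolean mask
# is aligned with df and can be combined with the sub-category condition.
sold_often = df.groupby("Product Name")["Product Name"].transform("size") > 2
is_appliance = df["Sub-Category"] == "Appliances"

result = (
    df[is_appliance & sold_often]
    .groupby("Product Name", as_index=False)["margin"]
    .mean()
    .nsmallest(20, "margin")
)
print(result)
```

Note that the count here is taken over the whole frame, not only over the 'Appliances' rows. If the frequency should be counted within the sub-category only, filter by sub-category first and compute the mask on the filtered frame.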

How to check if all elements of a list are inside a list of strings

I'm parsing a website to catch available products and their sizes. There are 3 products loaded. A list named 'find_id_1' holds 3 elements; each element has the product name and its variant ids. I made 2 other lists, one named keywords and one named negative. The keywords list holds the keywords that my desired product title should have. If any element from the negative list is in the product title, then I don't want that product.
found_product = []
keywords = ['YEEZY','BOOST','700']
negative = ['INFANTS','KIDS']
find_id_1 = ['{"id":2069103968384,"title":
"\nYEEZY BOOST 700 V2","handle":**"yeezy-boost-700-v2-vanta-june-6"**,
[{"id":19434310238336,"parent_id":2069103968384,"available":true,
"sku":"193093889925","featured_image":null,"public_title":null,
"requires_shipping":true,"price":30000,"options"',
'{"id":2069103935616,"title":"\nYEEZY BOOST 700 V2 KIDS","handle":
"yeezy-boost-700-v2-vanta-kids-june-6",`
["10.5k"],"option1":"10.5k","option2":"",
`"option3":"","option4":""},{"id":19434309845120,"parent_id":2069103935616,
"available":false,"sku":"193093893625","featured_image":null,
"public_title":null,"requires_shipping":true,"price":18000,"options"',
'{"id":2069104001152,"title":"\nYEEZY BOOST 700 V2 INFANTS",
"handle":**"yeezy-boost-700-v2-vanta-infants-june-6"***,`
["4K"],"option1":"4k","option2":"",`
"option3":"","option4":""},{"id":161803398876,"parent_id":2069104001152,
"available":false,"sku":"193093893724",
"featured_image":null,"public_title":null,
"requires_shipping":true,"price":15000,"options"']
I've tried using a for loop to iterate through every element in find_id_1, then creating another for loop that iterates through every element in keywords and negative, but I get the wrong product. Here's my code:
for product in find_id_1:
    for key in keywords:
        for neg in negative:
            if key in product:
                if neg not in product:
                    found_product = product
It prints the following:
'{"id":2069104001152,"title":"\nYEEZY BOOST 700 V2 INFANTS",
"handle":"yeezy-boost-700-v2-vanta-infants-june-6,`
["4K"],"option1":"4k","option2":"",`
"option3":"","option4":""},
{"id":161803398876,"parent_id":2069104001152,
"available":false,"sku":"193093893724",
"featured_image":null,"public_title":null,
"requires_shipping":true,"price":15000,"options"']
I'm trying to get it to return element 0 from find_id_1 because that's the only one that doesn't contain any of the elements from the list negative. Would using a for loop be the best and fastest way to iterate through my list? Thank you! Any help is welcome!
First of all, you shouldn't treat JSON data as a string. Parse the JSON using the json library so you can check just the title of the product. As the product list and the specification of each product get bigger, the time taken to iterate over the raw strings increases.
To answer your question, you can simply do
for product in find_id_1:
    if any(key in product for key in keywords):
        if not any(neg in product for neg in negative):
            found_product.append(product)
This will get you the element as per your specification. However, I made some changes to your data, just to make it valid Python code.
found_product = []
keywords = ['YEEZY','BOOST','700']
negative = ['INFANTS','KIDS']
find_id_1 = [""""'{"id":2069103968384,"title":
"\nYEEZY BOOST 700 V2","handle":**"yeezy-boost-700-v2-vanta-june-6"**,
[{"id":19434310238336,"parent_id":2069103968384,"available":true,
"sku":"193093889925","featured_image":null,"public_title":null,
"requires_shipping":true,"price":30000,"options"'""",
""""'{"id":2069103935616,"title":"\nYEEZY BOOST 700 V2 KIDS","handle":
"yeezy-boost-700-v2-vanta-kids-june-6",`
["10.5k"],"option1":"10.5k","option2":"",
`"option3":"","option4":""},{"id":19434309845120,"parent_id":2069103935616,
"available":false,"sku":"193093893625","featured_image":null,
"public_title":null,"requires_shipping":true,"price":18000,"options"'""",
""""'{"id":2069104001152,"title":"\nYEEZY BOOST 700 V2 INFANTS",
"handle":**"yeezy-boost-700-v2-vanta-infants-june-6"***,`
["4K"],"option1":"4k","option2":"",`
"option3":"","option4":""},{"id":161803398876,"parent_id":2069104001152,
"available":false,"sku":"193093893724",
"featured_image":null,"public_title":null,
"requires_shipping":true,"price":15000,"options"'"""]
for product in find_id_1:
    if any(key in product for key in keywords):
        if not any(neg in product for neg in negative):
            found_product.append(product)
print(found_product)
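To illustrate the suggestion to parse the JSON rather than search the raw string, here is a minimal sketch. It assumes each product is available as a complete, valid JSON object with a "title" field; the shortened records below are made-up stand-ins (the strings in the question are truncated and would not parse as they are). It also uses all() for the keywords, on the assumption that every keyword must appear in the title.

```python
import json

keywords = ['YEEZY', 'BOOST', '700']
negative = ['INFANTS', 'KIDS']

# Hypothetical, shortened stand-ins for the records in the question.
# Raw strings keep the backslash so that json.loads sees the "\n" escape.
raw_products = [
    r'{"id": 2069103968384, "title": "\nYEEZY BOOST 700 V2"}',
    r'{"id": 2069103935616, "title": "\nYEEZY BOOST 700 V2 KIDS"}',
    r'{"id": 2069104001152, "title": "\nYEEZY BOOST 700 V2 INFANTS"}',
]


def title_matches(title):
    """True if the title has every keyword and none of the negative words."""
    title = title.upper()
    has_all_keywords = all(key in title for key in keywords)
    has_negative = any(neg in title for neg in negative)
    return has_all_keywords and not has_negative


# Parse each record once, then test only its title.
products = [json.loads(raw) for raw in raw_products]
found_product = [p for p in products if title_matches(p["title"])]

print([p["id"] for p in found_product])
```

Checking only the parsed title avoids accidental matches in other fields such as the handle or SKU.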

How to efficiently use Django query and Q to filter each object in a queryset and return 1 field value for each unique field in the queryset

I have a query that returns the following queryset:
results = <QuerySet [<Product: ItemA>, <Product: ItemA>, <Product: ItemB>, <Product: ItemB>, <Product: ItemB>, <Product: ItemC>, <Product: ItemC>]>
The __str__ representation of the model is name and each Product variation likely has a different value for the price field. After this query, I need to search my database for each Product in the queryset and return the lowest price for each unique name so like:
Lowest price for all in database where name is == to ItemA
Lowest price for all in database where name is == to ItemB
Lowest price for all in database where name is == to ItemC
I use the following block of code to accomplish this goal:
query_list = []
for each in results:
    if each.name not in query_list:  # Checks if the name of the object is not in the query list
        query_list.append(each.name)  # Adds just the name of the objects so there is just one of each name in query_list
for each in query_list:
    priced = results.filter(name=each).order_by('price').first()  # Lowest price for each name in query_list
This feels very inefficient. Is there a way to make a similar computation without having to append the unique name of each Product to a separate list, iterate over that list, and then make a query for each one? I feel like there is a way to use a type of complex lookup to accomplish my goals, maybe even use less Python and make the db do more of the work, but the above is the best I've been able to figure out so far. There can be a lot of different hits in results, so I need this block to be as efficient as possible.
It is easy after reading the docs sections "Generating aggregates for each item in a QuerySet" and "Interaction with default ordering or order_by()".
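from django.db.models import Min

# Query 1: lowest price per name. The empty order_by() clears any default
# ordering so that it does not interfere with the grouping.
prices = {
    x['name']: x['lowest_price']
    for x in results.values('name').annotate(lowest_price=Min('price')).order_by()
}
# Query 2: walk the products and keep the first one at the lowest price per name.
for product in results:
    if product.name in prices and product.price == prices[product.name]:
        priced = product  # output the row
        del prices[product.name]  # report each name only once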
That runs with two database queries.
An even more efficient solution with a single query is probably possible with a Window function, but that requires an advanced database backend and it can't work, e.g., in tests with sqlite3.
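The selection logic of the two-query approach can be sketched without the ORM. The plain-Python illustration below uses hypothetical Product tuples in place of model instances and a dictionary built in a loop in place of the aggregate query; it is only meant to show how the prices lookup and the del keep exactly one lowest-priced row per name.

```python
from collections import namedtuple

# Hypothetical stand-in for the model instances in the queryset.
Product = namedtuple("Product", ["name", "price"])

results = [
    Product("ItemA", 5), Product("ItemA", 3),
    Product("ItemB", 7), Product("ItemB", 7),
    Product("ItemC", 2),
]

# Stand-in for values('name').annotate(lowest_price=Min('price')):
# map each name to its lowest price.
prices = {}
for product in results:
    prices[product.name] = min(product.price, prices.get(product.name, product.price))

# Keep the first product that matches the lowest price for its name,
# then drop the name so later ties are not reported again.
cheapest = []
for product in results:
    if product.name in prices and product.price == prices[product.name]:
        cheapest.append(product)
        del prices[product.name]

print(cheapest)
```

Because the name is removed from the lookup after the first match, two products tied at the lowest price (as with ItemB here) yield a single row.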

Get objects with max value from grouped by linq query

I got the following linq query:
var invadersOrderedInColumns = from i in invaders
                               group i by i.GetPosition().X;
This will group the invaders that share the same X position. The next thing I want to do is retrieve the invader with the highest Y value from each of those columns.
Imagine each invader as a black block in the following image. It represents the invaders after the above LINQ query. Each X = Value is the key.
Now, from each of these groups (columns), I want to get the invaders with the highest Y position (so the bottom invader of each column when you look at the picture):
How can I get this done with a Linq query?
I don't much care for the query syntax, but in extension method syntax it would look something like this.
var invadersOrderedInColumns = invaders
    .GroupBy(d => d.GetPosition().X)
    .Select(d => d.OrderByDescending(y => y.GetPosition().Y).First());
