Query multiple collections Arangodb


FOR col_name IN ['col_1', 'col_2']
  FOR d IN FULLTEXT(col_name, 'label', @value)
    RETURN d
does not work, but
FOR d IN FULLTEXT('col_1', 'label', @value)
  RETURN d
works fine.
I am using ArangoDB 3.4.2-1.

In general, you can query two collections like this:
FOR col1doc IN col_1
  FILTER col1doc.foo == 'bar'
  FOR col2doc IN col_2
    FILTER col1doc.joinfield == col2doc.joinfield
    RETURN {col1doc: col1doc, col2doc: col2doc}
as documented in the AQL manual for joins.
Please note that simple string equality checks can be done using FILTERs and don't need fulltext indices.
To use the old fulltext index for two collections, you can use subqueries like this:
LET col1Documents = (FULLTEXT(col_1, 'label', @value))
LET col2Documents = (FULLTEXT(col_2, 'label', @value))
RETURN APPEND(col1Documents, col2Documents)
(APPEND merges the two result arrays; CONCAT is a string function in AQL.)
The more modern way to achieve this would be to use ArangoSearch views, which can handle numerous collections.
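When the list of collections is only known in application code, one possible workaround is to generate the query text with one FULLTEXT call per collection, so that each collection name appears as a constant in the query. The following is a minimal Python sketch under that assumption. The helper name `build_fulltext_query` and the `docs0`, `docs1` variable names are hypothetical, and the search term is still passed as the `@value` bind parameter when the query is executed.

```python
import re

# Conservative pattern for names we are willing to splice into the query text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_fulltext_query(collections, attribute="label"):
    """Build one AQL query with a FULLTEXT() call per collection (hypothetical helper)."""
    if not collections:
        raise ValueError("at least one collection is required")
    for name in [*collections, attribute]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # One LET per collection, each with a constant collection name.
    lines = [
        f"LET docs{i} = FULLTEXT({col}, '{attribute}', @value)"
        for i, col in enumerate(collections)
    ]
    # Merge the per-collection arrays into one result stream.
    merged = ", ".join(f"docs{i}" for i in range(len(collections)))
    lines.append(f"FOR d IN FLATTEN([{merged}])")
    lines.append("  RETURN d")
    return "\n".join(lines)


query = build_fulltext_query(["col_1", "col_2"])
print(query)
```

The names are validated before being spliced into the text because they cannot be passed as ordinary value bind parameters in this sketch; the search term itself never enters the string.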

Related

Loop Through a list in python to query DynamoDB for each item

I have a list of items and would like to use each item as the pk (primary key) to query DynamoDB using Python.
I have tried using a for loop but I don't get any results. If I try the same query with an actual value from the group_id list, it works, which means my query statement is correct.
group_name_query = []
for i in group_id:
    group_name_query = config_table.query(
        KeyConditionExpression=Key('pk').eq(i) & Key('sk').eq('GROUP')
    )
Here is a sample: group_id = ['GROUP#6501e5ac-59b2-4d05-810a-ee63d2f4f826', 'GROUP#6501e5ac-59b2-4d05-810a-ee63d2sfdgd']

This does not answer your question directly, but here is a suggestion: if you are querying the base table by pk and sk rather than querying a GSI, I would suggest the BatchGetItem API, which fetches multiple items in one request.
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/example_dynamodb_BatchGetItem_section.html
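One thing visible in the loop from the question is that `group_name_query` is reassigned on every pass, so only the last response survives. Whether that explains the empty result is not certain, but accumulating the items is a reasonable first step. The sketch below assumes a table object with a `query` method shaped like the boto3 Table resource; the string form of `KeyConditionExpression` with `ExpressionAttributeValues` is an assumption of this sketch (the `Key('pk').eq(...)` form from the question could be used instead), and `FakeTable` is a stand-in so the example runs without AWS access. Pagination via `LastEvaluatedKey` is ignored.

```python
def collect_group_items(table, group_ids):
    """Query once per partition key and accumulate every returned item."""
    items = []
    for pk in group_ids:
        response = table.query(
            KeyConditionExpression="pk = :pk AND sk = :sk",
            ExpressionAttributeValues={":pk": pk, ":sk": "GROUP"},
        )
        # Extend the list instead of reassigning it, so earlier results are kept.
        items.extend(response.get("Items", []))
    return items


class FakeTable:
    """Stand-in for a DynamoDB table, used only to make this sketch runnable."""

    def __init__(self, rows):
        self._rows = rows

    def query(self, KeyConditionExpression, ExpressionAttributeValues):
        pk = ExpressionAttributeValues[":pk"]
        sk = ExpressionAttributeValues[":sk"]
        matches = [r for r in self._rows if r["pk"] == pk and r["sk"] == sk]
        return {"Items": matches}


table = FakeTable([
    {"pk": "GROUP#a", "sk": "GROUP", "name": "admins"},
    {"pk": "GROUP#b", "sk": "GROUP", "name": "users"},
])
found = collect_group_items(table, ["GROUP#a", "GROUP#b", "GROUP#missing"])
print([item["name"] for item in found])  # ['admins', 'users']
```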

Count distinct doesn't work when using OrderBy & join

I have the following query, which tries to get the count for a query:
var testQuery = Db
    .From<Blog>()
    .LeftJoin<BlogToBlogCategory>()
    .Where(x => x.IsDeleted == false)
    .OrderBy(x => x.ConvertedPrice);
var testCount = Db.Scalar<int>(testQuery.Select<Blog>(x => Sql.CountDistinct(x.Id)));
var results = Db.LoadSelect(testQuery.SelectDistinct());
It gives the error:
42803: column "blog.converted_price" must appear in the GROUP BY clause or be used in an aggregate function
The issue seems to be the OrderBy statement. If I remove it, the error goes away. Why does this stop count distinct from working?
I am having to clear the OrderBy on every query I run like this. Is it supposed to work this way?
Also, I just realised the count is wrong. Results contains 501 unique records and testCount is 538.
What am I doing wrong?

Whenever in doubt about what an OrmLite query is generating, you can use the BeforeExecFilter to inspect the DB command before it's executed, or, to just output the query to the Console, you can use:
OrmLiteUtils.PrintSql();
You shouldn't use OrderBy with aggregate scalar functions like COUNT: the ordering is meaningless there, and it fails in your case because the ordered column would need to be included in a GROUP BY clause for joined table queries.
You're specifically querying for COUNT(DISTINCT Id). If you want the row count for the query, you can instead use:
var testCount = Db.RowCount(testQuery);
If you want to use COUNT(*) instead, you can use:
var testCount = Db.Count(testQuery);
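To see why a count over a joined query can differ from a distinct count of ids, here is a small self-contained sketch. It uses Python's sqlite3 module rather than OrmLite or PostgreSQL, and the table and column names are hypothetical, loosely modelled on the query in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, is_deleted INTEGER);
    CREATE TABLE blog_to_blog_category (blog_id INTEGER, category_id INTEGER);
    INSERT INTO blog VALUES (1, 0), (2, 0);
    -- Blog 1 is in two categories, blog 2 is in none.
    INSERT INTO blog_to_blog_category VALUES (1, 10), (1, 11);
""")

joined = """
    FROM blog b
    LEFT JOIN blog_to_blog_category c ON c.blog_id = b.id
    WHERE b.is_deleted = 0
"""

# COUNT(*) counts joined rows, so blog 1 is counted once per category.
row_count = conn.execute("SELECT COUNT(*)" + joined).fetchone()[0]
# COUNT(DISTINCT b.id) counts each blog once, however many rows it joins to.
distinct_count = conn.execute("SELECT COUNT(DISTINCT b.id)" + joined).fetchone()[0]

print(row_count, distinct_count)  # 3 2
```

Neither count query carries an ORDER BY, which matches the advice above to drop the ordering before counting.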

How can you update values in a dataset?

As far as I know, Apache Spark doesn't have functionality that imitates the SQL UPDATE command, i.e. changing a single value in a column given a certain condition. The only way around that is the following command, which I was instructed to use (here on Stack Overflow): withColumn(columnName, where('condition', value));
However, the condition has to be of Column type, meaning I have to use the built-in column filtering functions Spark provides (equalTo, isin, lt, gt, etc.). Is there a way I can use an SQL statement instead of those built-in functions?
The problem is that I'm given a text file with SQL statements, like WHERE ID > 5 or WHERE AGE != 50, etc. I then have to label values based on those conditions. I thought of following the withColumn() approach, but I can't plug an SQL statement into that function. Any idea how I can work around this?

I found a way around this:
Split your dataset into two sets: the rows you want to update and the rows you don't want to update.
Dataset<Row> valuesToUpdate = dataset.filter("conditionToFilterValues");
Dataset<Row> valuesNotToUpdate = dataset.except(valuesToUpdate);
valuesToUpdate = valuesToUpdate.withColumn("updatedColumn", lit("updateValue"));
Dataset<Row> updatedDataset = valuesNotToUpdate.union(valuesToUpdate);
This, however, doesn't keep the records in the same order as the original dataset, so if order matters to you, this won't meet your needs.
In PySpark you have to use .subtract instead of .except.

If you are using a DataFrame, you can register it as a temp table
using df.registerTempTable("events")
Then you can query it like this:
sqlContext.sql("SELECT * FROM events "+)

The when clause translates into a CASE clause, which you can relate to the SQL CASE clause.
Example:
scala> val condition_1 = when(col("col_1").isNull,"NA").otherwise("AVAILABLE")
condition_1: org.apache.spark.sql.Column = CASE WHEN (col_1 IS NULL) THEN NA ELSE AVAILABLE END
You can also chain when clauses:
scala> val condition_2 = when(col("col_1") === col("col_2"),"EQUAL").when(col("col_1") > col("col_2"),"GREATER").
| otherwise("LESS")
condition_2: org.apache.spark.sql.Column = CASE WHEN (col_1 = col_2) THEN EQUAL WHEN (col_1 > col_2) THEN GREATER ELSE LESS END
scala> val new_df = df.withColumn("condition_1",condition_1).withColumn("condition_2",condition_2)
If you still want to use a table, you can register your DataFrame / Dataset as a temporary table and run SQL queries against it:
df.createOrReplaceTempView("tempTable") // Spark 2.1+
df.registerTempTable("tempTable") // Spark 1.6
Now you can run SQL queries:
spark.sql("your queries goes here with case clause and where condition!!!") // Spark 2.1
sqlContext.sql("your queries goes here with case clause and where condition!!!") // Spark 1.6
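Since the conditions in the question arrive as text such as WHERE AGE != 50, the temporary-table route lends itself to building the CASE query as a string. The following is a minimal Python sketch of that idea, not Spark code: the helper `build_update_query` and the column names are hypothetical, and the resulting string would be what gets passed to spark.sql(...) once the view is registered.

```python
def build_update_query(view, columns, target, new_value_sql, where_line):
    """Build a SELECT that rewrites `target` via CASE WHEN (hypothetical helper).

    `where_line` is a condition as it appears in the text file, with or
    without a leading WHERE keyword. It is spliced in verbatim, so it must
    come from a trusted source.
    """
    condition = where_line.strip()
    if condition.upper().startswith("WHERE "):
        condition = condition[len("WHERE "):].strip()
    select_parts = []
    for col in columns:
        if col == target:
            # Rows matching the condition get the new value, others keep theirs.
            select_parts.append(
                f"CASE WHEN {condition} THEN {new_value_sql} ELSE {col} END AS {col}"
            )
        else:
            select_parts.append(col)
    return f"SELECT {', '.join(select_parts)} FROM {view}"


sql = build_update_query(
    view="tempTable",
    columns=["ID", "AGE", "LABEL"],
    target="LABEL",
    new_value_sql="'flagged'",
    where_line="WHERE AGE != 50",
)
print(sql)
```

Listing the columns explicitly, rather than using SELECT *, keeps the column order and avoids producing two columns with the same name.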

If you are using a Java Dataset, you can update it as shown below.
Here is the code:
Dataset ratesFinal1 = ratesFinal.filter(" on_behalf_of_comp_id != 'COMM_DERIVS' ");
ratesFinal1 = ratesFinal1.filter(" status != 'Hit/Lift' ");
Dataset ratesFinalSwap = ratesFinal1.filter(" on_behalf_of_comp_id in ('SAPPHIRE','BOND') and cash_derivative != 'cash'");
ratesFinalSwap = ratesFinalSwap.withColumn("ins_type_str", functions.lit("SWAP"));
To add a new column with the value from an existing column:
ratesFinalSTW = ratesFinalSTW.withColumn("action", ratesFinalSTW.col("status"));

SQLAlchemy: Referencing labels in SELECT subqueries

I'm trying to figure out how to replicate the below query in SQLAlchemy
SELECT c.company_id AS company_id,
       (SELECT policy_id FROM associative_table at WHERE at.company_id = c.company_id) AS policy_id_ref,
       (SELECT `default` FROM policy p WHERE p.policy_id = policy_id_ref) AS `default`
FROM company c;
Note that this is a stripped down, basic example of what I'm really dealing with. The actual schema supports data and relationship versioning that requires the subqueries to include additional conditions, sorting, and limiting, making it impractical (if not impossible) for them to be joins.
The crux of the problem is in how the second subquery relies on policy_id_ref -- the value obtained from the first subquery. In SQLAlchemy, this is effectively what I have now:
ct = aliased(classes.company)
at = aliased(classes.associative_table)
pt = aliased(classes.policy)
policy_id_ref = session.query(at.policy_id).\
    filter(at.company_id == ct.company_id).\
    label('policy_id_ref')
policy_default = session.query(pt.default).\
    filter(pt.id == 'policy_id_ref').\
    label('default')
query = session.query(ct.company_id, policy_id_ref, policy_default)
The pull from the "company" table works fine as does the first subquery that retrieves the "policy_id_ref" column. The problem is the second subquery that has to reference that "policy_id_ref" column. I don't know how to write its filter in such a way that it literally renders "policy_id_ref" in the resulting query, to match the label of the first subquery.
Suggestions?
Thanks in advance

You can write your query as
select(
    Companies.company_id,
    AssociativeTable.policy_id.label('policy_id_ref'),
    Policy.default.label('policy_default'),
).select_from(
    Companies,
).join(
    AssociativeTable,
    AssociativeTable.company_id == Companies.company_id,
).join(
    Policy,
    AssociativeTable.policy_id == Policy.id
)
but if you need to reference a label from a subquery, use literal_column:
from sqlalchemy import func, select, literal_column
from sqlalchemy.dialects.postgresql import JSONB
session.query(
    func.array_agg(
        # refers to the 'batch_info' label defined in the inner select
        literal_column('batch_info'),
        JSONB
    ).label('history')
).select_from(
    select(
        func.jsonb_build_object(
            'batch_id', AccountingQueueBatch.id,
            'batch_label', AccountingQueueBatch.label,
        ).label('batch_info')
    ).select_from(
        AccountingQueueBatch,
    )
)

Getting metadata in plain SQL statement in Slick 3.1.x

In the following plain SQL statement in Slick I know beforehand that it will return a list of (String, String)
sql"""select c.name, s.name
from coffees c, suppliers s
where c.price < $price and s.id = c.sup_id""".as[(String, String)]
But what if I don't know the column types? Can I analyze the metadata and retrieve the values? In JDBC I could use getInt(n) and getString(n), is there anything similar in Slick?

You can use tsql (Type-Checked SQL Statements):
tsql"""select c.name, s.name
from coffees c, suppliers s
where c.price < $price and s.id = c.sup_id"""
This produces a DBIOAction of the correct type, here a DBIO[Seq[(String, String)]] (depending on the column types), without requiring a call to .as.
Note: I've found it a little flaky (to the point of being unusable) with option types, so beware if your columns can be null (since null: String).
This requires a little bit of wiring up: you need @StaticDatabaseConfig (e.g. on your DAO), as these types are checked against the database at compile time:
// annotate the object
@StaticDatabaseConfig("file:src/main/resources/application.conf#tsql")
...
val dc = DatabaseConfig.forAnnotation[JdbcProfile]
import dc.driver.api._
val db = dc.db
// to pull out a Future[Seq[(String, String)]]
// use db.run(tsql"...")
// to pull out a Future[Option[(String, String)]]
// use db.run(tsql"...".headOption)
// etc.
