How to join two Solr indexes and get results from both?

I have two indices, on two different cores:
firstCore {id, fid, resid, status}
secondCore {id, resid, title, name, cat, role, exp}
I want to execute a join query that returns the fields of both indexes for matching documents, i.e. displays id, fid, resid, status, title, name, cat, role, exp. The id field, which is the id of secondCore, can be omitted if needed.
What I tried:
1. The following query returned id, fid, resid, status, i.e. the fields of firstCore:
http://localhost:8888/solr/firstCore/select?q=*:*&fq={!join from=resid to=resid fromIndex=secondCore}resid:546384
2. The following query returned id, resid, title, name, cat, role, exp, i.e. the fields of secondCore:
http://localhost:8888/solr/secondCore/select?q=*:*&fq={!join%20from=resid%20to=resid%20fromIndex=firstCore}resid:546384
How can I get id, fid, resid, status, title, name, cat, role, exp in a single result?

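This could be achieved with sharding, but as of now there is no solution for it with a join: the join query parser only acts as a filter, so the returned documents carry only the fields of the core being queried.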
This might help: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201510.mbox/%3CCAEYSxhVie4oei+7sMFuAEZgOUxbJ-YM_hzHh54kgWiPqJuoFhQ@mail.gmail.com%3E
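If merging on the client side is acceptable, one workaround (not a Solr feature, and a different technique from the sharding suggestion) is to query each core separately and join the documents on resid in application code. A minimal sketch, assuming both responses have already been fetched and their response.docs lists extracted; merge_by_resid and the sample documents are hypothetical:

```python
from collections import defaultdict


def merge_by_resid(first_docs, second_docs):
    """Inner-join firstCore docs with secondCore docs on their shared resid.

    secondCore's id is dropped (the question allows omitting it), so it
    cannot overwrite firstCore's id in the merged document.
    """
    # Index secondCore docs by resid for constant-time lookups.
    by_resid = defaultdict(list)
    for doc in second_docs:
        by_resid[doc["resid"]].append(doc)

    merged = []
    for first in first_docs:
        # .get() does not insert missing keys into the defaultdict.
        for second in by_resid.get(first["resid"], []):
            extra = {k: v for k, v in second.items() if k != "id"}
            merged.append({**first, **extra})
    return merged


# Hypothetical documents shaped like each core's response.docs list.
first_docs = [{"id": "f1", "fid": 7, "resid": 546384, "status": "active"}]
second_docs = [{"id": "s1", "resid": 546384, "title": "Engineer",
                "name": "Alice", "cat": "IT", "role": "dev", "exp": 5}]
print(merge_by_resid(first_docs, second_docs))
```

In practice each list would come from a request such as http://localhost:8888/solr/firstCore/select?q=resid:546384&wt=json (and the same against secondCore). This keeps Solr queries simple at the cost of two round trips and client-side paging logic.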

Related

BigQuery Struct Aggregation

I am running an ETL job on BigQuery in which I am trying to reconcile data from possibly conflicting sources. I first used array_agg(distinct my_column ignore nulls) to find out where reconciliation was needed; next, I need to prioritize data per column based on the source.
I thought I could use array_agg(struct(data_source, my_column)) and then easily extract the preferred source's data for a given column. However, this method does not aggregate the data into a single struct; it produces an array of structs instead.
Consider the simplified example below, where I would prefer to get job_title from HR and dietary_pref from Canteen:
with data_set as (
select 'John' as employee, 'Senior Manager' as job_title, 'vegan' as dietary_pref, 'HR' as source
union all
select 'John' as employee, 'Manager' as job_title, 'vegetarian' as dietary_pref, 'Canteen' as source
union all
select 'Mary' as employee, 'Marketing Director' as job_title, 'pescatarian' as dietary_pref, 'HR' as source
union all
select 'Mary' as employee, 'Marketing Manager' as job_title, 'gluten-free' as dietary_pref, 'Canteen' as source
)
select employee,
array_agg(struct(source, job_title)) as job_title,
array_agg(struct(source, dietary_pref)) as dietary_pref,
from data_set
group by employee
The data I get for John with regard to the job title is:
[{'source':'HR', 'job_title':'Senior Manager'}, {'source': 'Canteen', 'job_title':'Manager'}]
Whereas I am trying to achieve:
[{'HR' : 'Senior Manager', 'Canteen' : 'Manager'}]
With a struct output, I was hoping to then easily access the preferred source using my_struct.my_preferred_source. In this particular case, I would like to use job_title.HR and dietary_pref.Canteen.
Hence, in pseudo-SQL, I imagine something like:
select employee,
AGGREGATE_JOB_TITLE_AS_STRUCT(source, job_title).HR as job_title,
AGGREGATE_DIETARY_PREF_AS_STRUCT(source, dietary_pref).Canteen as dietary_pref,
from data_set group by employee
The output would then be:
I'd like help solving this. Perhaps this is the wrong approach altogether, but given the more complex data set I am dealing with, I thought it would be the preferred approach (albeit a failed one).
I am open to alternatives. Please advise. Thanks
Note: I edited this post after Mikhail's answer, which solved my problem using a slightly different method than I expected, and added more details on my intent to use a single struct per employee.
Consider the query below:
select employee,
array_agg(struct(source as job_source, job_title) order by if(source = 'HR', 1, 2) limit 1)[offset(0)].*,
array_agg(struct(source as dietary_source, dietary_pref) order by if(source = 'HR', 2, 1) limit 1)[offset(0)].*
from data_set
group by employee
If applied to the sample data in your question, the output is:
Update:
Use the query below for the clarified output:
select employee,
array_agg(job_title order by if(source = 'HR', 1, 2) limit 1)[offset(0)] as job_title,
array_agg(dietary_pref order by if(source = 'HR', 2, 1) limit 1)[offset(0)] as dietary_pref
from data_set
group by employee
with output
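The ranking idea in the answer (order each employee's values by a per-column source priority and keep the first) can be checked outside BigQuery. This Python sketch is only an illustration on the question's sample rows, not BigQuery code; PREFERENCE and reconcile are hypothetical names. It also builds the {'HR': ..., 'Canteen': ...} pivot the question asked for as an intermediate step:

```python
from collections import defaultdict

# The question's sample rows (the data_set CTE).
data_set = [
    {"employee": "John", "job_title": "Senior Manager", "dietary_pref": "vegan", "source": "HR"},
    {"employee": "John", "job_title": "Manager", "dietary_pref": "vegetarian", "source": "Canteen"},
    {"employee": "Mary", "job_title": "Marketing Director", "dietary_pref": "pescatarian", "source": "HR"},
    {"employee": "Mary", "job_title": "Marketing Manager", "dietary_pref": "gluten-free", "source": "Canteen"},
]

# Source priority per column: HR first for job_title, Canteen first for
# dietary_pref, mirroring the if(source = 'HR', ...) sort keys in the answer.
PREFERENCE = {"job_title": ["HR", "Canteen"], "dietary_pref": ["Canteen", "HR"]}


def reconcile(rows, preference):
    # Step 1: pivot each column into {source: value} per employee.
    pivoted = defaultdict(lambda: defaultdict(dict))
    for row in rows:
        for column in preference:
            # Skip missing values so a lower-ranked source can fill the gap
            # (an assumption; the BigQuery answer does not filter NULLs).
            if row.get(column) is not None:
                pivoted[row["employee"]][column][row["source"]] = row[column]

    # Step 2: for each column, keep the value from the highest-ranked source.
    result = {}
    for employee, columns in pivoted.items():
        result[employee] = {
            column: next((columns[column][s] for s in order if s in columns[column]), None)
            for column, order in preference.items()
        }
    return result


print(reconcile(data_set, PREFERENCE))
```

Unlike if(source = 'HR', 1, 2), which ties all non-HR sources, an explicit priority list makes the ranking deterministic when there are more than two sources.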

How to Dynamically assemble SQL queries with haskell based on route parameters

I've got a scotty web app over which I am trying to implement a dynamic search interface, and I keep hitting a wall on how to implement it. The basic premise is the following:
Given a list of URL parameters: params :: [Param]
where: type Param = (Text, Text)
I would like to be able to
Let's say I have a table in my database:
Users
| user_id | username | email | first_name | last_name | created_at |
|---------|----------|-------|------------|-----------|------------|
| Int | Text | Text | Text | Text | timestamp |
My base SQL query might look like this:
baseQuery :: Query
baseQuery = [sql| SELECT user_id, username, email, first_name, last_name, created_at FROM users |]
When I receive URL parameters, I want to apply them to the WHERE clause, the ORDER BY clause, etc.
What would be the best strategy to transform the following URL:
arbitraryhost:/users?order_by_max=user_id&first_name=emg184&last_name=stackoverflow&limit=20
which would result in the following parameters list:
let params = [("order_by_max","user_id"), ("first_name","emg184"), ("last_name","stackoverflow"), ("limit", "20")]
How would I generate the following query:
"SELECT user_id, username, email, first_name, last_name, created_at
FROM users
WHERE first_name = ? AND last_name = ?
ORDER BY user_id
LIMIT ?"
("emg184", "stackoverflow", 20)
(I really like the way the knexjs query builder works: it allows ad hoc query fragments to be generated and applied as higher-order functions to some base query, http://knexjs.org/)
I'm wondering what a good strategy or implementation for this might be, as I have not managed to build anything I find fitting.
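One common strategy, sketched here in Python rather than Haskell to keep it short, is to fold the parameter list into separate pieces (filters, ordering, limit) and render the SQL once at the end, keeping every user-supplied value as a ? placeholder. Column names are checked against a whitelist because identifiers cannot be bound as parameters. Mapping order_by_max to a plain ORDER BY follows the expected query in the question; build_query, COLUMNS and the ignore-unknown-keys policy are assumptions:

```python
BASE_QUERY = ("SELECT user_id, username, email, first_name, last_name, created_at "
              "FROM users")

# Hypothetical whitelist of columns allowed in WHERE / ORDER BY.
COLUMNS = {"user_id", "username", "email", "first_name", "last_name", "created_at"}


def build_query(params):
    """Turn (key, value) URL parameters into (sql, args) with ? placeholders."""
    where, args = [], []
    order_by = None
    limit = None
    for key, value in params:
        if key in COLUMNS:
            # The column name comes from the whitelist; the value is bound.
            where.append(f"{key} = ?")
            args.append(value)
        elif key == "order_by_max" and value in COLUMNS:
            order_by = value
        elif key == "limit":
            limit = int(value)
        # Any other key is ignored rather than interpolated into the SQL.

    sql = BASE_QUERY
    if where:
        sql += " WHERE " + " AND ".join(where)
    if order_by:
        sql += f" ORDER BY {order_by}"
    if limit is not None:
        sql += " LIMIT ?"
        args.append(limit)  # LIMIT comes last, so its arg goes last too
    return sql, tuple(args)


params = [("order_by_max", "user_id"), ("first_name", "emg184"),
          ("last_name", "stackoverflow"), ("limit", "20")]
print(build_query(params))
```

In Haskell, the same fold could presumably produce a Query plus a list of values for postgresql-simple's ? substitution; the key design choice is that fragments are collected first and rendered in a fixed clause order, so parameter order in the URL does not matter.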

Invalid type error when using Datastax Cassandra Driver

I have a case class which represents partition key values.
case class UserKeys (bucket:Int,
email: String)
I create query Clauses as follows:
def conditions(id: UserKeys): List[Clause] = List(
QueryBuilder.eq("bucket", id.bucket), //TODOM - pick table description from config/env file.
QueryBuilder.eq("email", id.email)
)
And I use them in a query as follows:
val selectStmt =
select()
.from(tablename)
.where(QueryBuilder.eq(partitionKeyColumns(0), whereClauseList(0))).and(QueryBuilder.eq(partitionKeyColumns(1), whereClauseList(1)))
.limit(1)
I am getting the following error:
com.datastax.driver.core.exceptions.InvalidTypeException: Value 0 of type class com.datastax.driver.core.querybuilder.Clause$SimpleClause does not correspond to any CQL3 type
Question 1: What am I doing wrong?
The query works in cqlsh.
The table I am querying is:
CREATE TABLE users (
bucket int,
email text,
firstname text,
lastname text,
authprovider text,
password text,
PRIMARY KEY ((bucket, email), firstname, lastname)
);
Question 2: Is there a way to print the List that contains the query clauses? When I print it, I get this incomprehensible text:
List(com.datastax.driver.core.querybuilder.Clause$SimpleClause@2389b3ee, com.datastax.driver.core.querybuilder.Clause$SimpleClause@927f81)
My bad, I was using the query clauses incorrectly. Rather than
.where(QueryBuilder.eq(partitionKeyColumns(0), whereClauseList(0))).and(QueryBuilder.eq(partitionKeyColumns(1), whereClauseList(1)))
I needed to do
.where(whereClauseList(0)).and(whereClauseList(1))
because the List already contains complete clauses such as QueryBuilder.eq("bucket", id.bucket).
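To see why the fix works, here is a toy Python model of the same idea. It is not the DataStax driver API: Clause, eq and select_where are hypothetical names. A clause already bundles a column with its value, so it is passed to the WHERE clause as-is; wrapping it in another eq() makes the clause itself the value, which is the kind of mistake the driver rejects. In the toy model a readable __str__ also makes the printed list legible; judging by the output in the question, the real Clause's toString() does not do this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Hypothetical stand-in for a query-builder equality clause (column = ?)."""
    column: str
    value: object

    def __str__(self):
        # Human-readable form, unlike a default ClassName@hash representation.
        return f"{self.column} = ?"


def eq(column, value):
    # A clause object is not a bindable value; reject it, loosely mirroring
    # the driver's InvalidTypeException.
    if isinstance(value, Clause):
        raise TypeError(f"value of type {type(value).__name__} is not a CQL value")
    return Clause(column, value)


def select_where(table, clauses, limit=None):
    """Build SELECT ... WHERE from ready-made clauses; return CQL and bind values."""
    cql = f"SELECT * FROM {table} WHERE " + " AND ".join(str(c) for c in clauses)
    if limit is not None:
        cql += f" LIMIT {limit}"
    return cql, [c.value for c in clauses]


conditions = [eq("bucket", 3), eq("email", "jane@example.com")]
print(select_where("users", conditions, limit=1))
print([str(c) for c in conditions])
```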

Search for multi-value all

In a CMIS search, is there a way to match only documents that have all of several values?
In SQL it would be:
select ID
from mvTo
where name in ('john', 'sally', 'same')
group by ID
having count(*) = 3
assume unique index on name
If you have a single-value property, such as cmis:createdBy, you can write a query that matches on any value in a list, like this:
SELECT * FROM cmis:document where cmis:createdBy in ('jpotts', 'admin', 'tuser1')
If you have a multi-value property, and you want to match if any of the values match, you can use the ANY keyword, like:
SELECT * FROM cmis:document where ANY sc:someMultiValuedProp in ('val1', 'val2', 'val3')
Group by is not supported.
For more information on what you can do with CMIS queries, read the Query Language Definition section of the CMIS specification.
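The SQL pattern in the question works because, with at most one row per (ID, name) pair, an ID that matches all three names produces exactly three rows after the IN filter. This small sqlite3 sketch uses made-up data to show that; it assumes uniqueness on (ID, name) rather than on name alone, since that is what the count relies on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# mvTo models a multi-value property: one row per (ID, name) pair.
conn.execute("CREATE TABLE mvTo (ID INTEGER, name TEXT, UNIQUE (ID, name))")
conn.executemany("INSERT INTO mvTo VALUES (?, ?)", [
    (1, "john"), (1, "sally"), (1, "same"),               # has all three
    (2, "john"), (2, "sally"),                            # missing 'same'
    (3, "john"), (3, "sally"), (3, "same"), (3, "bob"),   # all three plus an extra
])

wanted = ("john", "sally", "same")
rows = conn.execute(
    """
    SELECT ID FROM mvTo
    WHERE name IN (?, ?, ?)
    GROUP BY ID
    HAVING COUNT(*) = ?
    ORDER BY ID
    """,
    (*wanted, len(wanted)),
).fetchall()
matching_ids = [r[0] for r in rows]  # IDs holding every wanted value
print(matching_ids)
```

Since CMIS has no GROUP BY, the closest CMIS-side equivalent is probably to AND together one quantified comparison per value, e.g. SELECT * FROM cmis:document WHERE 'john' = ANY sc:someMultiValuedProp AND 'sally' = ANY sc:someMultiValuedProp AND 'same' = ANY sc:someMultiValuedProp. The "<literal> = ANY <multi-valued property>" form is part of the CMIS query grammar, but whether a particular repository supports it is worth checking.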

Ecto query for selecting model without associated record

I have a User model that has many Memberships. Membership has a field called group_id.
I want to get a list of Users who do not have a membership with group_id equal to 1.
I tried this:
from u in User, join: m in assoc(u, :memberships), where: m.group_id != 1
I have 3 users in my db, and one of them has a membership with group_id = 1. So I expect my query to return the 2 users who don't have that membership, but it returns an empty array.
If you want to explicitly fetch the users without a membership, you need to use a left join and find where the group_id is nil:
from u in User,
left_join: m in assoc(u, :memberships),
where: is_nil(m.group_id)
What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN? explains the differences between joins (left vs inner in this case.)
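The difference between the inner join in the question and the left join in the answer can be reproduced in plain SQL. This sqlite3 sketch uses the question's three users (user 1 in group 1, users 2 and 3 with no memberships) plus a hypothetical fourth user who belongs only to group 2; the table and column names are assumptions modeled on the Ecto schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE memberships (id INTEGER PRIMARY KEY, user_id INTEGER, group_id INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cat'), (4, 'dan');
    INSERT INTO memberships (user_id, group_id) VALUES (1, 1), (4, 2);
""")


def ids(sql):
    return [row[0] for row in conn.execute(sql)]


# Inner join: users 2 and 3 have no membership rows, so the join drops them;
# user 1's only row then fails group_id != 1. Only user 4 survives.
inner = ids("""SELECT u.id FROM users u
               JOIN memberships m ON m.user_id = u.id
               WHERE m.group_id != 1""")

# Left join + IS NULL (the answer's approach): users with no memberships at all.
no_membership = ids("""SELECT u.id FROM users u
                       LEFT JOIN memberships m ON m.user_id = u.id
                       WHERE m.group_id IS NULL ORDER BY u.id""")

# Anti-join on the condition itself: users without a group-1 membership,
# including users who belong to other groups.
not_in_group_1 = ids("""SELECT u.id FROM users u
                        LEFT JOIN memberships m
                          ON m.user_id = u.id AND m.group_id = 1
                        WHERE m.id IS NULL ORDER BY u.id""")
print(inner, no_membership, not_in_group_1)
```

If the goal is "no membership in group 1" even for users in other groups, the third pattern (put group_id = 1 in the join condition, then test for a missing row) is the one to translate. In Ecto that would presumably be a left_join with an on: condition that also checks m.group_id == 1, followed by where: is_nil(m.id), but treat that as untested.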
