ClickHouse: Search within nested fields

I have a nested field named items.productName, and I want to check whether any product name contains a particular string.
SELECT * FROM test WHERE hasAny(items.productName,['Samsung'])
This matches only when a product name is exactly 'Samsung'.
I have also tried ARRAY JOIN:
SELECT
*
FROM test
ARRAY JOIN items
WHERE items.productName LIKE '%Samsung%'
This works, but it is very slow (about 1 second for 5 million records).
Is there a way to perform a LIKE match within hasAny?

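You can achieve this with the arrayFilter function, which returns the array elements for which the lambda is true (see the ClickHouse docs on array functions).
Query:
SELECT * FROM test WHERE arrayFilter(x -> x LIKE '%Samsung%', items.productName) != []
The comparison with [] is required because a WHERE condition must evaluate to UInt8. If you omit != [], you will get the error "DB::Exception: Illegal type Array(String) of column for filter. Must be UInt8 or Nullable(UInt8) or Const variants of them."
The arrayExists function expresses the same condition directly and already returns UInt8: arrayExists(x -> x LIKE '%Samsung%', items.productName).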
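To make the difference between the two predicates concrete, here is a minimal pure-Python model of their semantics. It is not ClickHouse code: the helper names has_any and array_filter_like and the sample rows are made up for illustration, and LIKE '%...%' is modelled as a plain case-sensitive substring test.

```python
def has_any(arr, candidates):
    # Models hasAny(arr, candidates): true when the two arrays
    # share at least one element that is exactly equal.
    return any(x in candidates for x in arr)


def array_filter_like(arr, needle):
    # Models arrayFilter(x -> x LIKE '%needle%', arr):
    # keeps the elements that contain needle as a substring.
    return [x for x in arr if needle in x]


# Hypothetical rows; each list stands for one row's items.productName array.
rows = [
    {"id": 1, "productName": ["Samsung"]},
    {"id": 2, "productName": ["Samsung Galaxy S10", "Nokia 3310"]},
    {"id": 3, "productName": ["Apple iPhone"]},
]

exact = [r["id"] for r in rows if has_any(r["productName"], ["Samsung"])]
substring = [r["id"] for r in rows if array_filter_like(r["productName"], "Samsung") != []]

print(exact)      # [1]
print(substring)  # [1, 2]
```

Row 2 is found only by the substring version, which is the behaviour the question asks for.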

Related

How to query fields with multiple values in Azure Cognitive Search

I am working on Azure Cognitive Search with an MS SQL table as the backend, and I need help defining a query for the following scenarios.
Sample table structure and data:
Scenario 1: I need a query that returns data based on category.
I have tried a query using search.ismatch, but it uses prefix search and also matches other categories with similar values, e.g. "Embedded" and "Embedded Vision":
$filter=Region eq 'AA' and search.ismatch('Embedded*','Category')
https://{AZ_RESOURCE_NAME}.search.windows.net/indexes/{INDEX_NAME}/docs?api-version=2020-06-30-Preview&$count=true&$filter=Region eq 'AA' and search.ismatch('Embedded*','Category')
It responds with the result below, which includes both the "Embedded" and "Embedded Vision" categories.
My expectation is to fetch data only if it matches the "Embedded" category, as highlighted below.
Scenario 2: As an enhancement to Scenario 1, I need to find records that match any of multiple categories.
For example, if I pass multiple categories (i.e. "Embedded", "Automation"), I need the output highlighted below.

You'll need to use a different analyzer, just for the Category field, that breaks tokens on every ';' rather than on whitespace.

You should first ensure your Category data is populated as a Collection(Edm.String) in the index. See Supported Data Types in the official documentation. Each of your semicolon-separated values should be separate values in the collection, in a property called Category (or similar).
You can then filter by string values in the collection. See rules for filtering string collections. Assuming that your index contains a string collection field called Category, you can filter by categories containing Embedded like this:
Category/any(c: c eq 'Embedded')
You can filter by multiple values like this:
Category/any(c: search.in(c, 'Embedded, Automation'))
Start with clean data in your index using proper types for the data you have. This allows you to implement proper facets and you can utilize the syntax made specifically for this. Trying to work around this with wildcards is a hack that should be avoided.
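If the filter is built in application code, a small helper keeps the quoting consistent. The following is only a sketch in Python: the helper names odata_string and category_filter are hypothetical, the field name Category and the api-version value are taken from the question, and the explicit ',' delimiter argument of search.in is used so that values containing spaces (such as "Embedded Vision") are not split.

```python
from urllib.parse import urlencode


def odata_string(value):
    """Quote a value as an OData string literal (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def category_filter(region, categories, field="Category"):
    # Join with ',' only and pass ',' as the delimiter, so a value such as
    # "Embedded Vision" is treated as one category rather than two.
    # Assumption: category values themselves never contain a comma.
    value_list = ",".join(categories)
    return (
        f"Region eq {odata_string(region)} and "
        f"{field}/any(c: search.in(c, {odata_string(value_list)}, ','))"
    )


flt = category_filter("AA", ["Embedded", "Automation"])
print(flt)
# Region eq 'AA' and Category/any(c: search.in(c, 'Embedded,Automation', ','))

# URL-encode the parameters before appending them to the .../docs? endpoint.
query_string = urlencode(
    {"api-version": "2020-06-30-Preview", "$count": "true", "$filter": flt}
)
print(query_string)
```

Passing field="CategoryArr" would target a differently named collection field.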

To solve the problem described above, I used the SQL function below, which converts the category into a JSON string array supported by the Collection(Edm.String) data type in Azure Search.
SQL function:
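CREATE FUNCTION dbo.GetCategoryAsArray
(
    @ID VARCHAR(20)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @result NVARCHAR(MAX) = ''
    -- Split Category on ';', quote each trimmed value and join with commas.
    -- STUFF removes the leading comma; REPLACE undoes the '&amp;' encoding
    -- that FOR XML PATH applies to '&'.
    SET @result = REPLACE(
        STUFF(
            (SELECT
                ',''' + TRIM(Value) + ''''
            FROM dbo.TABLEA p
            CROSS APPLY STRING_SPLIT(Category, ';')
            WHERE p.ID = @ID
            FOR XML PATH('')
            ), 1, 1, ''), '&amp;', '&')
    RETURN '[' + @result + ']'
END
GO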
View that uses the function and returns the desired data:
CREATE VIEW dbo.TABLEA_VIEW AS
SELECT
    id
    ,dbo.GetCategoryAsArray(id) AS CategoryArr
    ,type
    ,region
    ,Category
FROM dbo.TABLEA
I then defined a new Azure Search index using the SQL view above as the data source and, during index column mapping, defined the CategoryArr column as the Collection(Edm.String) data type.
Query that produces the expected output from Azure Search:
$filter=Region eq 'AA' and CategoryArr/any(c: search.in(c, 'Embedded, Automation'))
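If the split is done in application code rather than in SQL, the same transformation can be sketched in a few lines of Python. This is an illustration only: the function name category_to_json_array is hypothetical, and unlike the SQL function above it emits double-quoted strings, which is what strict JSON requires.

```python
import json


def category_to_json_array(category):
    # Split on ';', trim whitespace around each value, drop empty entries,
    # and serialise the result as a JSON array of strings.
    parts = [part.strip() for part in category.split(";")]
    return json.dumps([part for part in parts if part])


print(category_to_json_array("Embedded; Embedded Vision;Automation"))
# ["Embedded", "Embedded Vision", "Automation"]
```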

Cosmos DB paginated query with custom order by clause

I want to do a select query in Cosmos DB that returns a maximum number of results (say 50) and then gives me the continuation token so I can continue the search where I left off.
Now let's say my query has 3 equality conditions in my where clause, e.g.
where prop1 = "a" and prop2 = "w" and prop3 = "g"
In the results that are returned, I want the records that satisfy prop1 = "a" to appear first, followed by the results that have prop2 = "w" followed by the ones with prop3 = "g".
Why do I need this? I could fetch all the data into my application and sort it there, but that would mean pulling far too much data. If I can't order the results this way in Cosmos DB itself, the first page I get might contain no records with prop1 = "a" at all. I could keep fetching pages until I reach the records with prop1 = "a" (I need them because I want to show them to the user as the first set of results), but with a huge dataset in Cosmos DB I might have to fetch perhaps 100 pages to get the first such record.
How can I handle this scenario in Cosmos? Thanks!

So if I am understanding your question correctly, you want to accomplish this:
SELECT * FROM c
WHERE
c.prop1 = 'a'
AND
c.prop2 = 'b'
AND
c.prop3 = 'c'
ORDER BY
c.prop1, c.prop2, c.prop3
OFFSET 0 LIMIT 25
Luckily, you can now do this in Cosmos DB SQL, but there is a caveat: you have to set up a composite index on your collection to allow it.
So, for this collection, my composite index would look like this:
Now, if I wanted to change it to this:
SELECT * FROM c
WHERE
c.prop1 = 'a'
AND
c.prop2 = 'b'
AND
c.prop3 = 'c'
ORDER BY
c.prop1 DESC, c.prop2, c.prop3
OFFSET 0 LIMIT 25
I could add another composite index to cover that use case. In your settings, the composite indexes are an array of arrays, so you can add as many combinations as you'd like.
This should get you to where you need to be if I understood your question correctly.
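For reference, a composite index section covering the two ORDER BY clauses above might look like the policy built below. This is a sketch, not the exact policy from the answer: the paths /prop1, /prop2 and /prop3 follow the property names in the queries, the included and excluded paths are assumptions, and the Python helper composite is hypothetical and only there to print the JSON.

```python
import json


def composite(*paths_and_orders):
    # One composite index is a list of {"path", "order"} entries, in the
    # same sequence and direction as the ORDER BY clause it supports.
    return [{"path": path, "order": order} for path, order in paths_and_orders]


indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/*"}],  # assumption: index every path
    "excludedPaths": [],
    "compositeIndexes": [
        # ORDER BY c.prop1, c.prop2, c.prop3
        composite(("/prop1", "ascending"), ("/prop2", "ascending"), ("/prop3", "ascending")),
        # ORDER BY c.prop1 DESC, c.prop2, c.prop3
        composite(("/prop1", "descending"), ("/prop2", "ascending"), ("/prop3", "ascending")),
    ],
}

print(json.dumps(indexing_policy, indent=2))
```

The printed JSON shows the "array of arrays" shape: each inner array is one composite index.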

Writing a subquery to display records in a grid

I have two DACs, POReceipt and POReceiptLine. POReceiptLine contains a field called MfrPartNbr.
I want the user to be able to look up all the POReceipts where POReceiptLine.MfrPartNbr is equal to an entered value.
The SQL would be:
SELECT *
FROM dbo.POReceipt
WHERE POReceipt.ReceiptNbr IN
(
SELECT ReceiptNbr
FROM dbo.POReceiptLine
WHERE MfrPartNbr = 'MY_ENTERED_PART_NBR'
)
Any idea how to write the BQL Statement for this?

As stated, an inner join won't work in this case because you would receive the same POReceipt multiple times (once for each matching POReceiptLine). The following BQL query shows how to get the desired results using a subquery. If mfrPartNbr is an extension field, replace POReceiptLine.mfrPartNbr with the correct extension name (e.g. POReceiptLineExtension.mfrPartNbr).
PXSelect<POReceipt, Where<Exists<
Select<POReceiptLine,
Where<POReceiptLine.receiptNbr, Equal<POReceipt.receiptNbr>,
And<POReceiptLine.mfrPartNbr, Equal<Required<POReceiptLine.mfrPartNbr>>>>>>>>.Select(this, "MY_ENTERED_PART_NBR");
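The duplicate-row effect of the join, and how EXISTS avoids it, can be reproduced outside Acumatica with plain SQL. The sketch below uses Python's built-in sqlite3 module with simplified stand-in tables; the column set and the sample rows are assumptions for illustration, not the real Acumatica schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE POReceipt (ReceiptNbr TEXT PRIMARY KEY);
    CREATE TABLE POReceiptLine (ReceiptNbr TEXT, LineNbr INTEGER, MfrPartNbr TEXT);
    INSERT INTO POReceipt VALUES ('R001'), ('R002');
    -- R001 has two lines with the same manufacturer part number.
    INSERT INTO POReceiptLine VALUES ('R001', 1, 'ABC'), ('R001', 2, 'ABC'), ('R002', 1, 'XYZ');
    """
)

part = "ABC"

# Inner join: one result row per matching line, so R001 appears twice.
joined = conn.execute(
    "SELECT r.ReceiptNbr FROM POReceipt r "
    "JOIN POReceiptLine l ON l.ReceiptNbr = r.ReceiptNbr "
    "WHERE l.MfrPartNbr = ?",
    (part,),
).fetchall()

# EXISTS subquery: each receipt is returned at most once.
exists = conn.execute(
    "SELECT r.ReceiptNbr FROM POReceipt r WHERE EXISTS ("
    "SELECT 1 FROM POReceiptLine l "
    "WHERE l.ReceiptNbr = r.ReceiptNbr AND l.MfrPartNbr = ?)",
    (part,),
).fetchall()

print(joined)  # [('R001',), ('R001',)]
print(exists)  # [('R001',)]
```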

Can I filter multiple collections?

I want to filter multiple collections and return only the documents that meet those requirements. The problem is that when more than one value matches in one collection, the returned elements are repeated.
FOR TurmaA IN TurmaA
FOR TurmaB IN TurmaB
FILTER TurmaA.Disciplinas.Mat >10
FILTER TurmaB.Disciplinas.Mat >10
RETURN {TurmaA,TurmaB}
Screenshot of the problem

What your query does is to iterate over all documents of the first collection, and for each record it iterates over the second collection. The applied filters reduce the number of results, but this is not how you should go about it as it is highly inefficient.
Do you actually want to return the union of the matches from both collections (SELECT ... UNION SELECT ... in SQL)?
What you get with your current approach are all possible combinations of the documents from both collections. I believe what you want is:
LET a = (FOR t IN TurmaA FILTER t.Disciplinas.Mat > 10 RETURN t)
LET b = (FOR t IN TurmaB FILTER t.Disciplinas.Mat > 10 RETURN t)
FOR doc IN UNION(a, b)
RETURN doc
Both collections are filtered individually in sub-queries, then the results are combined and returned.
Another solution would be to store all documents in one collection Turma and have another attribute e.g. Type with a value of "A" or "B". Then the query would be as simple as:
FOR t IN Turma
FILTER t.Disciplinas.Mat > 10
RETURN t
If you want to return TurmaA documents only, you would do:
FOR t IN Turma
FILTER t.Disciplinas.Mat > 10 AND t.Type == "A"
RETURN t
By the way, I recommend naming variables differently from collection names, e.g. t instead of Turma if there is a collection called Turma.
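The difference between the nested FOR loops and the UNION approach can be modelled with plain Python lists. This is a sketch of the semantics only, not ArangoDB code, and the sample documents (names and grades) are made up.

```python
# Hypothetical documents for the two collections.
turma_a = [
    {"nome": "Ana", "Disciplinas": {"Mat": 12}},
    {"nome": "Rui", "Disciplinas": {"Mat": 15}},
    {"nome": "Eva", "Disciplinas": {"Mat": 8}},
]
turma_b = [
    {"nome": "Joao", "Disciplinas": {"Mat": 11}},
    {"nome": "Lia", "Disciplinas": {"Mat": 14}},
    {"nome": "Tiago", "Disciplinas": {"Mat": 18}},
    {"nome": "Ines", "Disciplinas": {"Mat": 9}},
]


def passes(doc):
    return doc["Disciplinas"]["Mat"] > 10


# Nested FOR loops: every matching A document is paired with every
# matching B document, so each document is repeated across pairs.
nested = [(a, b) for a in turma_a for b in turma_b if passes(a) and passes(b)]

# Two filtered sub-queries combined, as UNION(a, b) does:
# each matching document appears exactly once.
union = [t for t in turma_a if passes(t)] + [t for t in turma_b if passes(t)]

print(len(nested))                # 6 (2 matches in A x 3 matches in B)
print([t["nome"] for t in union])  # ['Ana', 'Rui', 'Joao', 'Lia', 'Tiago']
```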

ColdFusion: Object with duplicate values (removing duplicates)

I have a query object (SQL) with some records; the problem is that some of the records contain duplicate values. :( (I can't use DISTINCT in my SQL query, so how can I remove the duplicates in my object?)
categories[1].id = 1
categories[2].id = 1
categories[3].id = 2
categories[4].id = 3
categories[5].id = 2
Now I want to get a list with 1, 2, 3
Is that possible?

I'm not quite sure why you say you can't use DISTINCT, even given the qualification you offered. It doesn't matter where a query came from (<cfquery>, <cfldap>, <cfdirectory>, built by hand): by the time it's exposed to your CFML code, it's just "a query", so you can definitely use DISTINCT on it with a query of queries:
<cfquery name="distinctCategories" dbtype="query">
SELECT DISTINCT id
FROM categories
</cfquery>
