Using Match in a sqlite fts5 query but need more control over ranking? - python-3.x

I have a virtual table created using fts5:
import sqlite3
# create a db in memory
con = sqlite3.connect(':memory:')
con.execute('create virtual table operators using fts5(family, operator, label, summary, tokenize=porter)')
# some sample data
samples = {'insideTOP':
{'label':'Inside',
'family':'TOP',
'summary':'The Inside TOP places Input1 inside Input2.'
},
'inTOP':
{'label':'In',
'family':'TOP',
'summary':'The In TOP is used to create a TOP input.'
},
'fileinSOP':
{'label':'File In',
'family':'SOP',
'summary':'The File In SOP allows you to read a file'
}
}
# fill db with those values
for operator in samples.keys():
opDescr = samples[operator]
con.executescript("insert into operators (family, operator, label, summary) values ('{0}','{1}','{2}','{3}');".format(opDescr['family'],operator,opDescr['label'],opDescr['summary']))
with following columns
+--------+-----------+------------+----------------------------------------------+
| family | operator | label | summary |
+--------+-----------+------------+----------------------------------------------+
| TOP | insideTOP | Inside | The Inside TOP places Input1 inside Input2.|
| TOP | inTOP | In | The In TOP is used to create a TOP input. |
| SOP | fileinSOP | File In | The File In SOP allows you to read a file |
+--------+-----------+------------+----------------------------------------------+
an example query is:
# query the db
query = "select operator from operators where operators match 'operator:In*' or operators match 'label:In*' order by family, bm25(operators)"
result = con.execute(query)
for row in result:
print(row)
And as a result I get
fileinSOP
insideTOP
inTOP
For this particular case though, I'd actually like the 'inTOP' to appear before the 'insideTOP' as the label is a perfect match.
What would be a good technique to be able to massage these results the way I'd like them?
Thank you very much
Markus

maybe you can put your order rule in the question.
If you use bm25 to order your results, you can't achieve the result you want
I suggest you that you can use your custom rank function, like below sql:
query = "select operator from operators where operators match 'operator:In*' or operators match 'label:In*' order by myrank(family, operators)"
define a custom rank function is very easy in fts5, you can follow the guide in the fts5 website.
if you also want bm25 result as a rank score, you can get the score in the rank method can calculate your final score.

Related

KQL query, how to extend information from rendereddescription

I want to extend the query result with specific values, but I do not know how to get only a fragment of information, the one that is in the screen, that is, for example, from the "rendereddescription" section, I only need information about "server_principal_name" and assign it to some value, e.g. "user" and this I know this needs to be resolved | extend "variable name" = i here i do not know what the syntax is.enter image description here
you can use the parse operator: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/parseoperator
for example:
print RenderDescription = #"... 0000000000 session_server_principal_name:ABB\HPAM-TCS-DB10 server_principal_sid:01050000000 ...."
| parse RenderDescription with * "session_server_principal_name:" session_server_principal_name " " *
RenderDescription
session_server_principal_name
... 0000000000 session_server_principal_name:ABB\HPAM-TCS-DB10 server_principal_sid:01050000000 ....
ABB\HPAM-TCS-DB10

Kusto Query Language: set column name of summarize by evaluated expression

Me again asking another Kusto related question (I really wish there would be a thorough video tutorial on this somewhere).
I have a summarize statement, that produces two columns for y axis and one for x axis.
Now i want to relabel the columns for x axis to show a string, that i also got from the database and already put into a variable with let.
This basically looks like this:
let android_col = strcat("Android: ", toscalar(customEvents
| where application_Version contains secondLatestVersionAndroid));
let iOS_col = strcat("iOS: ", toscalar(customEvents
| where application_Version contains secondLatestVersionIOS));
... some Kusto magic ...
| summarize
Android = 100 - (round((countif(hasUnhandledErrorAndroid == 1 ) * 100.0 ) / countif(isAndroid == 1), 2)),
iOS = 100 - (round((countif(hasUnhandledErroriOS == 1) * 100.0 ) / countif(isIOS == 1), 2))
by Time
|render timechart with (ytitle="crashfree users in %", xtitle="date", legend=visible )
Now i want to have the summarize display not Android and iOS, but the value of android_col and iOS_col.
Is that possible?
Best regards
Maverick
Generally, it's suggested to have predefined column names, otherwise various features don't work. For example, IntelliSense won't know the names of the columns, as they would be determined at run time only. Also, if you create a function that returns a dynamic schema, you won't be able to run this function from other clusters.
However, if you do want to change column names, you definitely have a way to do it by using various plugins. For example, bag_unpack, pivot and others.
As for courses on Kusto, there are actually several excellent courses on Pluralsight (all are free):
How to start with Microsoft Azure Data Explorer
Basic KQL
Azure Data Explorer – Advanced KQL
The usage of the "toscalar" in this query looks wrong, it seems to me that you should use the "extend" operator with the same logic to create the additional columns.

How to extract specific sub directory names from URL

Given the following request URLs:
https://example.com/api/foos/123/bars/456
https://example.com/api/foos/123/bars/456
https://example.com/api/foos/123/bars/456/details
Common structure: https://example.com/api/foos/{foo-id}/bars/{bar-id}
I wish to get separate columns for the values of {foo-id} and {bar-id}
What I tried
requests
| where timestamp > ago(1d)
| extend parsed_url=parse_url(url)
| extend path = tostring(parsed_url["Path"])
| extend: foo = "value of foo-id"
| extend: bar = "value of bar-id"
This gives me /api/foos/{foo-id}/bars/{bar-id} as a new path column.
Can I solve this question without using regular expressions?
Related, but not the same question:
Application Insights: Analytics - how to extract string at specific position
Splitting on the '/' character will give you an array and then you can extract the elements you are looking for as long as the path stays consistent. Using parse_url() is optional- you could use substring() or just adjust the indexes you retrieve.
requests
| extend path = parse_url(url)
| extend elements = split(substring(path.Path, 1), "/") //gets rid of the leading slash
| extend foo=tostring(elements[2]), bar=tostring(elements[4])
| summarize count() by foo, bar

How to pass List of strings from Cucumber Scenario

I need to pass the List of strings from cucumber scenario which works fine as below
Scenario Outline: Verify some scenario
Given something
When user do something
Then user should have some "<data>"
Examples: Some example
|data|
|Test1, Test2, Test3, Test4|
In the step definition I use List to retrieve the values of something variable.
But when one of the value of data variable contains comma(,) e.g. Tes,t4 it becomes complex,since it considers "Tes" and "t4" as two different values
Examples: Some example
|something|
|Test1, Test2, Test3, Tes,t4|
So is there any escape character that i can use or is there is there any other way to handle this situation
Found an easy way. Please see the below steps.
Here is my feature file.
Here is the corresponding code to map feature step with code.
Oh yes. Result is important. You can see the debug view.
This should work for you:
Scenario: Verify some scenario
Given something
When user do something
Then user should have following
| Test1 |
| Test2 |
| Test3 |
| Tes,t4|
In Step definitions
Then("^user should have following$")
public void user_should_have_following(List<String> testData) throws Throwable {
#TODO user your test data as desired
}
In Transformer of TypeRegistryConfigurer, you can do this
#Override
public Object transform(String s, Type type) {
if(StringUtils.isNotEmpty(s) && s.startsWith("[")){
s = s.subSequence(1, s.length() - 1).toString();
return Arrays.array(s.split(","));
}
return objectMapper.convertValue(s, objectMapper.constructType(type));
}
Try setting the Examples in a column, like this:
| data |
| Test1 |
| Test2 |
| Test3 |
| Tes,t4 |
This will run the scenario 4 times, expecting 'something' to change to the next value. First 'Test1', then 'Test2', etc.
In the step definition you can use that data like so:
Then(/^user should have some "([^"]*)"$/) do |data|
puts data
end
If you want to use |Test1, Test2, Test3, Tes,t4|, change the ',' to ';' ex: |Test1; Test2; Test3; Tes,t4| and in the step definition split the data:
data.split("; ") which results in ["test1", "test2", "test3", "te,st"]
Converting the data to a List (in Java):
String test = "test1; test2; test3; tes,t4";
String[] myArray = test.split("; ");
List<String> myList = new ArrayList<>();
for (String str : myArray) {
myList.add(str);
}
System.out.print(myList);
More on this here
Examples:
Colors
color-count
Red, Green
5
Yellow
8
def function("{colors}"):
context.object.colors = list(colors.split(","))
for color in context.object.colors:
print(color)
Don't put the data in your scenario. You gain very little from it, and it creates loads of problems. Instead give your data a name and use the name in the Then of your scenario
e.g.
Then the user should see something
Putting data and examples in scenarios is mostly pointless. The following apply
The data will be a duplication of what should be produced
The date is prone to typos
When the scenario fails it will be difficult to know if the code is wrong (its producing the wrong data) or the scenario is wrong (you've typed in the wrong data)
Its really hard to express complex data accurately
Nobody is really going to read your scenario carefully enough to ensure the data is accurate

Pycassa: how to query parts of a Composite Type

Basically I'm asking the same thing as in this question but for the Python Cassandra library, PyCassa.
Lets say you have a composite type storing data like this:
[20120228:finalscore] = '31-17'
[20120228:halftimescore]= '17-17'
[20120221:finalscore] = '3-14'
[20120221:halftimescore]= '3-0'
[20120216:finalscore] = '54-0'
[20120216:halftimescore]= '42-0'
So, I know I can easily slice based off of the first part of the composite type by doing:
>>> cf.get('1234', column_start('20120216',), column_finish('20120221',))
OrderedDict([((u'20120216', u'finalscore'), u'54-0'),
((u'20120216', u'halftimescore'), u'42-0')])
But if I only want the finalscore, I would assume I could do:
>>> cf.get('1234', column_start('20120216', 'finalscore'),
column_finish('20120221', 'finalscore'))
To get:
OrderedDict([((u'20120216', u'finalscore'), u'54-0')])
But instead, I get:
OrderedDict([((u'20120216', u'finalscore'), u'54-0'),
((u'20120216', u'halftimescore'), u'42-0')])
Same as the 1st call.
Am I doing something wrong? Should this work? Or is there some syntax using the cf.get(... columns=[('20120216', 'finalscore')]) ? I tried that too and got an exception.
According to http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1, I should be able to do something like this...
Thanks
If know all the components of the composite column then you should the 'columns' option:
cf.get('1234', columns=[('20120216', 'finalscore')])
You said you got an error trying to do this, but I would suggest trying again. It works fine for me.
When you are slicing composite columns you need to think about how they are sorted. Composite columns sort starting first with the left most component, and then sorting each component toward the right. So In your example the columns would look like this:
+------------+---------------+------------+---------------+------------+----------------+
| 20120216 | 20120216 | 20120221 | 20120221 | 20120228 | 20120228 |
| finalscore | halftimescore | finalscore | halftimescore | finalscore | halftimescore |
+------------+---------------+------------+---------------+------------+----------------+
Thus when you slice from ('20120216', 'finalscore') to ('20120221', 'finalscore') you get both values for '20120216'. To make your query work as you want it to you could change the column_finish to ('20120216', 'halftimescore').

Resources