Search/replace in Kusto

Use case: Remove a string from Azure Application Insights results
This is a simple question, but there are few examples online, and as a new user with limited (though growing) experience in regex, I am struggling.
How do I remove all instances of | Articles in the following table, which is an example of what I am exporting from Azure Application Insights?
This did not work:
| extend name=replace(@' | Articles', @'', name)
I have fiddled quite a bit unsuccessfully with an example in Microsoft's documentation (I know this interpretation is incorrect):
| extend str=strcat(' | Articles', tostring(name))
| extend replaced=replace(@' | Articles', @'', str)
Thank you for any insights.

The reason your initial attempt doesn't work is that the first argument to replace() is a regular expression, and if you have the pipe (|) in it, you'll need to escape it properly with a backslash (\).
for example:
datatable(s:string)
[
"Article 1 | Articles",
"Article 2",
"Article 3 | Articles"
]
| extend replaced=replace(@' \| Articles', @'', s)
ideally, you'll choose a solution that doesn't require using a regular expression, if possible.
for example:
datatable(s:string)
[
"Article 1 | Articles",
"Article 2",
"Article 3 | Articles"
]
| extend i = indexof(s, " | Articles")
| project s = case(i == -1, s, substring(s, 0, i))
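The same escaping issue can be reproduced outside Kusto. The following is a minimal Python sketch (not Kusto code) using the sample rows from the answer above; it assumes Python's `re` module treats an unescaped `|` as alternation in the same way, which is standard regex behaviour.

```python
import re

rows = ["Article 1 | Articles", "Article 2", "Article 3 | Articles"]

# Unescaped: "|" is alternation, so the pattern means "a space" OR " Articles".
unescaped = [re.sub(r" | Articles", "", s) for s in rows]

# Escaped: "\|" matches a literal pipe character.
escaped = [re.sub(r" \| Articles", "", s) for s in rows]

# No regex at all: plain substring replacement.
literal = [s.replace(" | Articles", "") for s in rows]

print(unescaped)
print(escaped)
print(literal)
```

The unescaped pattern strips every space instead of the suffix, which is why the first attempt appears not to work.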

Related

How to extract specific sub directory names from URL

Given the following request URLs:
https://example.com/api/foos/123/bars/456
https://example.com/api/foos/123/bars/456
https://example.com/api/foos/123/bars/456/details
Common structure: https://example.com/api/foos/{foo-id}/bars/{bar-id}
I wish to get separate columns for the values of {foo-id} and {bar-id}
What I tried
requests
| where timestamp > ago(1d)
| extend parsed_url=parse_url(url)
| extend path = tostring(parsed_url["Path"])
| extend: foo = "value of foo-id"
| extend: bar = "value of bar-id"
This gives me /api/foos/{foo-id}/bars/{bar-id} as a new path column.
Can I solve this question without using regular expressions?
Related, but not the same question:
Application Insights: Analytics - how to extract string at specific position

Splitting on the '/' character gives you an array, from which you can extract the elements you are looking for as long as the path structure stays consistent. Using parse_url() is optional: you could use substring() instead, or just adjust the indexes you retrieve.
requests
| extend path = parse_url(url)
| extend elements = split(substring(tostring(path.Path), 1), "/") // substring(..., 1) gets rid of the leading slash
| extend foo=tostring(elements[2]), bar=tostring(elements[4])
| summarize count() by foo, bar
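For comparison, the split-and-index idea can be sketched in Python. This is only an illustration of the technique, not part of the Kusto answer; the helper name `extract_ids` is hypothetical, and the layout check assumes the path always follows /api/foos/{foo-id}/bars/{bar-id}.

```python
from urllib.parse import urlparse


def extract_ids(url):
    """Return (foo_id, bar_id) for /api/foos/{foo-id}/bars/{bar-id}[/...] paths."""
    # Drop the leading slash, then split the path into its segments.
    segments = urlparse(url).path.lstrip("/").split("/")
    # Expected layout: api / foos / {foo-id} / bars / {bar-id} [/ ...]
    if len(segments) < 5 or segments[1] != "foos" or segments[3] != "bars":
        return None
    return segments[2], segments[4]


print(extract_ids("https://example.com/api/foos/123/bars/456/details"))
```

As in the Kusto version, the indexes 2 and 4 only work while the path keeps the same shape, so the sketch returns None for anything else.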

How to replace repeated words with single word

I have a string variable response:
where where where is it
I'm going there
where where did you say
sometimes it is where you think
i think its where where you go
its everywhere where you are
i am planning on going where where where i want to
As you can see, the word "where" is repeated quite often. I want to replace strings "where where" and "where where where" (or even "where where where where") with "where".
However, I don't want to replace "everywhere where" with "where".
I know I can do this manually, but I was hoping to condense the code into as few lines as possible.
This is what I have been trying so far:
gen temp = regexr(response, " (where)+ where ", " where ")
replace temp = regexr(response, "^(where)+ where ", "where ")
These are my results after running the code above:
where where is it
I'm going there
where did you say
sometimes it is where you think
i think its where where you go
its everywhere where you are
i am planning on going where where where i want to
Instead, I want the final data to look like this:
where is it
I'm going there
where did you say
sometimes it is where you think
i think its where you go
its everywhere where you are
i am planning on going where i want to
I have been using "(where)+" to capture both "where where" and "where where where" but it doesn't seem to work. I also split the code into two commands, one begins with "^(where)" and the other with " (where)" in order to avoid capturing the 'where' in "everywhere" but it seems as if the code does not capture "where where" when it occurs in the middle of the sentence.

A quick fix using Stata's string functions is the following:
clear
input str50 string1
"where where where is it"
"I'm going there"
"where where did you say"
"sometimes it is where you think"
"i think its where where you go"
"its everywhere where you are"
"i am planning on going where where where i want to"
end
generate tag1 = !strmatch(string1, "*everywhere where*")
generate tag2 = ( length(string1) - length(subinstr(string1, "where", "", .)) ) / 5
generate string2 = cond(tag1 == 1, stritrim(subinstr(string1, "where", "", tag2-1)), string1)
list string2, separator(0)
+----------------------------------------+
| string2 |
|----------------------------------------|
1. | where is it |
2. | I'm going there |
3. | where did you say |
4. | sometimes it is where you think |
5. | i think its where you go |
6. | its everywhere where you are |
7. | i am planning on going where i want to |
+----------------------------------------+
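The underlying idea (collapse a run of the same whole word, but leave "everywhere where" alone) can also be expressed with a word-boundary regex. The sketch below is in Python rather than Stata, and the function name `collapse_repeats` is hypothetical; it assumes the repeats are separated by single spaces, as in the sample data.

```python
import re


def collapse_repeats(text, word="where"):
    # \b keeps the match from starting inside a longer word such as "everywhere";
    # (?: \1)+ consumes one or more further space-separated copies of the word.
    pattern = rf"\b({re.escape(word)})(?: \1)+\b"
    return re.sub(pattern, r"\1", text)


samples = [
    "where where where is it",
    "i think its where where you go",
    "its everywhere where you are",
]
for s in samples:
    print(collapse_repeats(s))
```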

Using Match in a sqlite fts5 query but need more control over ranking?

I have a virtual table created using fts5:
import sqlite3
# create a db in memory
con = sqlite3.connect(':memory:')
con.execute('create virtual table operators using fts5(family, operator, label, summary, tokenize=porter)')
# some sample data
samples = {
    'insideTOP': {
        'label': 'Inside',
        'family': 'TOP',
        'summary': 'The Inside TOP places Input1 inside Input2.'
    },
    'inTOP': {
        'label': 'In',
        'family': 'TOP',
        'summary': 'The In TOP is used to create a TOP input.'
    },
    'fileinSOP': {
        'label': 'File In',
        'family': 'SOP',
        'summary': 'The File In SOP allows you to read a file'
    }
}
# fill db with those values
for operator, opDescr in samples.items():
    # bind the values as parameters instead of formatting them into the SQL string
    con.execute(
        "insert into operators (family, operator, label, summary) values (?, ?, ?, ?)",
        (opDescr['family'], operator, opDescr['label'], opDescr['summary']))
with following columns
+--------+-----------+------------+----------------------------------------------+
| family | operator | label | summary |
+--------+-----------+------------+----------------------------------------------+
| TOP | insideTOP | Inside | The Inside TOP places Input1 inside Input2.|
| TOP | inTOP | In | The In TOP is used to create a TOP input. |
| SOP | fileinSOP | File In | The File In SOP allows you to read a file |
+--------+-----------+------------+----------------------------------------------+
an example query is:
# query the db
query = "select operator from operators where operators match 'operator:In*' or operators match 'label:In*' order by family, bm25(operators)"
result = con.execute(query)
for row in result:
    print(row)
And as a result I get
fileinSOP
insideTOP
inTOP
For this particular case though, I'd actually like the 'inTOP' to appear before the 'insideTOP' as the label is a perfect match.
What would be a good technique to be able to massage these results the way I'd like them?
Thank you very much
Markus

It would help to state your ordering rule in the question.
If you order your results by bm25 alone, you can't achieve the result you want.
I suggest using your own custom rank function instead, as in the SQL below:
query = "select operator from operators where operators match 'operator:In*' or operators match 'label:In*' order by myrank(family, operators)"
Defining a custom rank function is straightforward in FTS5; you can follow the guide on the FTS5 website.
If you also want the bm25 result to count toward the ranking, you can get that score inside the rank function and use it to calculate your final score.
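If registering a custom FTS5 rank function is not practical (as far as I know, Python's sqlite3 module does not expose the FTS5 auxiliary-function API), one alternative is to re-rank the fetched rows in Python. The sketch below is only an illustration under that assumption: the helper `rerank` is hypothetical, and the input rows are typed in by hand to stand in for a query that also selects family and label, already ordered by bm25.

```python
def rerank(rows, term):
    """Sort (family, operator, label) rows by family, exact label matches first.

    sorted() is stable, so rows that tie keep their incoming (bm25) order.
    """
    def key(row):
        family, _operator, label = row
        exact = label.lower() == term.lower()
        return (family, not exact)  # False sorts before True, so exact matches lead
    return sorted(rows, key=key)


# Stand-in for the rows returned by the query in the question.
rows = [
    ("SOP", "fileinSOP", "File In"),
    ("TOP", "insideTOP", "Inside"),
    ("TOP", "inTOP", "In"),
]
ranked = rerank(rows, "in")
for family, operator, label in ranked:
    print(operator)
```

With this key, 'inTOP' moves ahead of 'insideTOP' because its label matches the search term exactly, while the ordering by family is kept.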

How to pass List of strings from Cucumber Scenario

I need to pass a list of strings from a Cucumber scenario, which works fine as below:
Scenario Outline: Verify some scenario
Given something
When user do something
Then user should have some "<data>"
Examples: Some example
|data|
|Test1, Test2, Test3, Test4|
In the step definition I use a List to retrieve the values of the data variable.
But when one of the values contains a comma (,), e.g. Tes,t4, it gets complicated, since "Tes" and "t4" are treated as two different values:
Examples: Some example
|something|
|Test1, Test2, Test3, Tes,t4|
So is there an escape character that I can use, or is there any other way to handle this situation?

This should work for you:
Scenario: Verify some scenario
Given something
When user do something
Then user should have following
| Test1 |
| Test2 |
| Test3 |
| Tes,t4|
In Step definitions
@Then("^user should have following$")
public void user_should_have_following(List<String> testData) throws Throwable {
    // TODO: use your test data as desired
}

In the Transformer of a TypeRegistryConfigurer, you can do this:
@Override
public Object transform(String s, Type type) {
    if (StringUtils.isNotEmpty(s) && s.startsWith("[")) {
        // strip the surrounding brackets, then split on commas
        s = s.subSequence(1, s.length() - 1).toString();
        return Arrays.asList(s.split(","));
    }
    return objectMapper.convertValue(s, objectMapper.constructType(type));
}

Try setting the Examples in a column, like this:
| data |
| Test1 |
| Test2 |
| Test3 |
| Tes,t4 |
This will run the scenario 4 times, expecting 'something' to change to the next value. First 'Test1', then 'Test2', etc.
In the step definition you can use that data like so:
Then(/^user should have some "([^"]*)"$/) do |data|
  puts data
end
If you want to keep |Test1, Test2, Test3, Tes,t4| on a single row, change the separator from ',' to ';', e.g. |Test1; Test2; Test3; Tes,t4|, and split the data in the step definition:
data.split("; ") which results in ["Test1", "Test2", "Test3", "Tes,t4"]
Converting the data to a List (in Java):
String test = "test1; test2; test3; tes,t4";
String[] myArray = test.split("; ");
List<String> myList = new ArrayList<>();
for (String str : myArray) {
    myList.add(str);
}
System.out.print(myList);
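The separator swap described above can be sketched in Python as well. This is a minimal illustration, not Cucumber code; the helper name `parse_list` is hypothetical, and it assumes ';' never occurs inside a value.

```python
def parse_list(cell, sep=";"):
    # Split on the separator and trim the whitespace around each item;
    # commas inside an item are left untouched.
    return [item.strip() for item in cell.split(sep) if item.strip()]


values = parse_list("Test1; Test2; Test3; Tes,t4")
print(values)
```

Trimming each item after the split means the cell can be written with or without a space after the separator.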

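Examples:
| Colors | color-count |
| Red, Green | 5 |
| Yellow | 8 |
from behave import given  # assumption: the steps are written with behave

@given('the colors "{colors}"')  # hypothetical step text; adapt it to your feature file
def step_impl(context, colors):
    # split the comma-separated cell into a list of color names
    context.colors = [c.strip() for c in colors.split(",")]
    for color in context.colors:
        print(color)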

Don't put the data in your scenario. You gain very little from it, and it creates loads of problems. Instead give your data a name and use the name in the Then of your scenario
e.g.
Then the user should see something
Putting data and examples in scenarios is mostly pointless. The following apply:
The data will be a duplication of what should be produced
The data is prone to typos
When the scenario fails it will be difficult to know if the code is wrong (it's producing the wrong data) or the scenario is wrong (you've typed in the wrong data)
It's really hard to express complex data accurately
Nobody is really going to read your scenario carefully enough to ensure the data is accurate

Pycassa: how to query parts of a Composite Type

Basically I'm asking the same thing as in this question but for the Python Cassandra library, PyCassa.
Lets say you have a composite type storing data like this:
[20120228:finalscore] = '31-17'
[20120228:halftimescore]= '17-17'
[20120221:finalscore] = '3-14'
[20120221:halftimescore]= '3-0'
[20120216:finalscore] = '54-0'
[20120216:halftimescore]= '42-0'
So, I know I can easily slice based off of the first part of the composite type by doing:
>>> cf.get('1234', column_start=('20120216',), column_finish=('20120221',))
OrderedDict([((u'20120216', u'finalscore'), u'54-0'),
((u'20120216', u'halftimescore'), u'42-0')])
But if I only want the finalscore, I would assume I could do:
>>> cf.get('1234', column_start=('20120216', 'finalscore'),
...        column_finish=('20120221', 'finalscore'))
To get:
OrderedDict([((u'20120216', u'finalscore'), u'54-0')])
But instead, I get:
OrderedDict([((u'20120216', u'finalscore'), u'54-0'),
((u'20120216', u'halftimescore'), u'42-0')])
Same as the 1st call.
Am I doing something wrong? Should this work? Or is there some syntax using the cf.get(... columns=[('20120216', 'finalscore')]) ? I tried that too and got an exception.
According to http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1, I should be able to do something like this...
Thanks

If you know all the components of the composite column, then you should use the 'columns' option:
cf.get('1234', columns=[('20120216', 'finalscore')])
You said you got an error trying to do this, but I would suggest trying again. It works fine for me.
When you are slicing composite columns, you need to think about how they are sorted. Composite columns are sorted by the leftmost component first, and then by each following component toward the right. So in your example the columns would look like this:
+------------+---------------+------------+---------------+------------+----------------+
| 20120216 | 20120216 | 20120221 | 20120221 | 20120228 | 20120228 |
| finalscore | halftimescore | finalscore | halftimescore | finalscore | halftimescore |
+------------+---------------+------------+---------------+------------+----------------+
Thus when you slice from ('20120216', 'finalscore') to ('20120221', 'finalscore') you get both values for '20120216'. To make your query work as you want it to you could change the column_finish to ('20120216', 'halftimescore').
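The sort order described above can be modelled with plain Python tuples, which also compare component by component from left to right. This sketch does not call PyCassa or Cassandra at all: the helper `slice_columns` is hypothetical, and it assumes both slice bounds are inclusive.

```python
# Column names from the question, modelled as (date, kind) tuples.
columns = {
    ("20120228", "finalscore"): "31-17",
    ("20120228", "halftimescore"): "17-17",
    ("20120221", "finalscore"): "3-14",
    ("20120221", "halftimescore"): "3-0",
    ("20120216", "finalscore"): "54-0",
    ("20120216", "halftimescore"): "42-0",
}


def slice_columns(columns, start, finish):
    # Walk the names in sorted order and keep those within [start, finish].
    return [(name, columns[name]) for name in sorted(columns) if start <= name <= finish]


wide = slice_columns(columns, ("20120216", "finalscore"), ("20120221", "finalscore"))
exact = slice_columns(columns, ("20120216", "finalscore"), ("20120216", "finalscore"))
print(wide)
print(exact)
```

In this model ('20120216', 'halftimescore') sorts between the two bounds of the wide slice, so it is returned even though only 'finalscore' was named in the bounds.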
