Parse number out of field and put it in its own column - string

I have data that looks like the following
Public Name          | Internal Name |
_____________________|_______________|
Name of object1 (#1) | 1345312       |
Name of object2 (#2) | 1387924       |
..
object2000 (#2000)   | 6875238       |
And I'm hoping to parse the (#*) out into its own column, to look like below:
Public Number | Public Name     | Internal Name |
______________|_________________|_______________|
(#1)          | Name of object1 | 1345312       |
(#2)          | Name of object2 | 1387924       |
..
(#2000)       | object2000      | 6875238       |
I have absolutely no idea how I would begin to do this. Thoughts?

SELECT
    -- everything from the '(' onward, e.g. '(#1)'
    [Public Number] = CASE WHEN [Public Name] LIKE '%(#%'
        THEN SUBSTRING([Public Name], CHARINDEX('(', [Public Name]), 255)
        ELSE '' END,
    -- everything before the '(', with trailing spaces trimmed off
    [New Public Name] = CASE WHEN [Public Name] LIKE '%(#%'
        THEN RTRIM(LEFT([Public Name], CHARINDEX('(', [Public Name]) - 1))
        ELSE [Public Name] END,
    [Internal Name]
FROM dbo.table;
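Run against the sample rows above, this produces exactly the layout asked for (the middle column comes back under the alias [New Public Name]):

Public Number | New Public Name | Internal Name |
______________|_________________|_______________|
(#1)          | Name of object1 | 1345312       |
(#2)          | Name of object2 | 1387924       |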

Parse `key1=value1 key2=value2` in Kusto

I'm running Cilium inside an Azure Kubernetes cluster and want to parse the Cilium log messages in Azure Log Analytics. The log messages have a format like
key1=value1 key2=value2 key3="if the value contains spaces, it's wrapped in quotation marks"
For example:
level=info msg="Identity of endpoint changed" containerID=a4566a3e5f datapathPolicyRevision=0
I couldn't find a matching parse_xxx function in the docs (e.g. https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/parsecsvfunction). Is there a way to write a custom function to parse this kind of log message?
Not a fun format to parse... But this should work:
let LogLine = "level=info msg=\"Identity of endpoint changed\" containerID=a4566a3e5f datapathPolicyRevision=0";
print LogLine
| extend KeyValuePairs = array_concat(
      extract_all("([a-zA-Z_]+)=([a-zA-Z0-9_]+)", LogLine),
      extract_all("([a-zA-Z_]+)=\"([a-zA-Z0-9_ ]+)\"", LogLine))
| mv-apply KeyValuePairs on
(
    extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))
    | summarize dict=make_bag(p)
)
The output will be:
| print_0 | dict |
|--------------------|-----------------------------------------|
| level=info msg=... | { |
| | "level": "info", |
| | "containerID": "a4566a3e5f", |
| | "datapathPolicyRevision": "0", |
| | "msg": "Identity of endpoint changed" |
| | } |
|--------------------|-----------------------------------------|
With the help of Slavik N, I came up with a query that works for me:
let containerIds = KubePodInventory
| where Namespace startswith "cilium"
| distinct ContainerID
| summarize make_set(ContainerID);
ContainerLog
| where ContainerID in (containerIds)
| extend KeyValuePairs = array_concat(
      extract_all("([a-zA-Z0-9_-]+)=([^ \"]+)", LogEntry),
      extract_all("([a-zA-Z0-9_]+)=\"([^\"]+)\"", LogEntry))
| mv-apply KeyValuePairs on
(
    extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))
    | summarize JSONKeyValuePairs=parse_json(make_bag(p))
)
| project TimeGenerated, Level=JSONKeyValuePairs.level, Message=JSONKeyValuePairs.msg,
    PodName=JSONKeyValuePairs.k8sPodName, Reason=JSONKeyValuePairs.reason,
    Controller=JSONKeyValuePairs.controller, ContainerID=JSONKeyValuePairs.containerID,
    Labels=JSONKeyValuePairs.labels, Raw=LogEntry
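As an aside, newer Kusto engines also ship a built-in parse-kv operator for exactly this logfmt-style input, and it handles quoted values natively. A minimal sketch, assuming the operator is available in your Log Analytics workspace and that you only need these three keys:

ContainerLog
| parse-kv LogEntry as (level: string, msg: string, containerID: string)
    with (pair_delimiter=' ', kv_delimiter='=', quote='"')
| project TimeGenerated, level, msg, containerID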

Azure Log Analytics parse json

I have a query that results in a few columns. In one of them I am parsing JSON to retrieve an object value, but it has multiple entries, and I want to retrieve each entry of the JSON in a loop and display it.
Below is the query:
let forEach_table = AzureDiagnostics
| where Parameters_LOAD_GROUP_s contains 'LOAD(AUTO)';
let ParentPlId = '';
let ParentPlName = '';
let commonKey = '';
forEach_table
| where Category == 'PipelineRuns'
| extend pplId = parse_json(Predecessors_s)[0].PipelineRunId, pplName = parse_json(Predecessors_s)[0].PipelineName
| extend dbMapName = tostring(parse_json(Parameters_getMetadataList_s)[0].dbMapName)
| summarize count(runId_g) by Resource, Status = status_s, Name=pipelineName_s, Loadgroup = Parameters_LOAD_GROUP_s, dbMapName, Parameters_LOAD_GROUP_s, Parameters_getMetadataList_s, pipelineName_s, Category, CorrelationId, start_t, end_t, TimeGenerated
| project ParentPL_ID = ParentPlId, ParentPL_Name = ParentPlName, LoadGroup_Name = Loadgroup, Map_Name = dbMapName, Status,Metadata = Parameters_getMetadataList_s, Category, CorrelationId, start_t, end_t
| project-away ParentPL_ID, ParentPL_Name, Category, CorrelationId
Here, in the above code,
extend dbMapName = tostring(parse_json(Parameters_getMetadataList_s)[0].dbMapName)
I am retrieving the 0th element by default, but I would like to retrieve all elements in sequence. Can somebody suggest how to achieve this?
bag_keys() is just what you need.
For example, take a look at this query:
datatable(myjson: dynamic) [
    dynamic({"a": 123, "b": 234, "c": 345}),
    dynamic({"dd": 123, "ee": 234, "ff": 345})
]
| project keys = bag_keys(myjson)
Its output is:
|---------|
| keys |
|---------|
| [ |
| "a", |
| "b", |
| "c" |
| ] |
|---------|
| [ |
| "dd", |
| "ee", |
| "ff" |
| ] |
|---------|
If you want to have every key in a separate row, use mv-expand, like this:
datatable(myjson: dynamic) [
    dynamic({"a": 123, "b": 234, "c": 345}),
    dynamic({"dd": 123, "ee": 234, "ff": 345})
]
| project keys = bag_keys(myjson)
| mv-expand keys
The output of this query will be:
|------|
| keys |
|------|
| a |
| b |
| c |
| dd |
| ee |
| ff |
|------|
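If you need the values alongside the keys, mv-expand over the bag itself yields one single-property bag per row, from which both halves can be pulled out. A sketch, reusing the datatable from above:

datatable(myjson: dynamic) [
    dynamic({"a": 123, "b": 234, "c": 345})
]
| mv-expand kv = myjson
| extend key = tostring(bag_keys(kv)[0])
| extend value = kv[key]
| project key, value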
The extend and mv-expand operators help resolve this kind of scenario.
Solution:
extend rows = parse_json(Parameters_getMetadataList_s)
| mv-expand rows
| project Parameters_LOAD_GROUP_s,rows
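Wired into the original query's columns, that looks roughly like this (a sketch; it assumes Parameters_getMetadataList_s holds a JSON array of objects with a dbMapName property, as in the question):

AzureDiagnostics
| where Category == 'PipelineRuns'
| extend rows = parse_json(Parameters_getMetadataList_s)
| mv-expand rows
| extend dbMapName = tostring(rows.dbMapName)
| project Parameters_LOAD_GROUP_s, dbMapName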

invalid string interpolation: `$$', `$'ident or `$'BlockExpr expected -> Spark SQL

The error I am getting:
invalid string interpolation: `$$', `$'ident or `$'BlockExpr expected
Spark SQL:
val sql =
s"""
|SELECT
| ,CAC.engine
| ,CAC.user_email
| ,CAC.submit_time
| ,CAC.end_time
| ,CAC.duration
| ,CAC.counter_name
| ,CAC.counter_value
| ,CAC.usage_hour
| ,CAC.event_date
|FROM
| xyz.command AS CAC
| INNER JOIN
| (
| SELECT DISTINCT replace(split(get_json_object(metadata_payload, '$.configuration.name'), '_')[1], 'acc', '') AS account_id
| FROM xyz.metadata
| ) AS QCM
| ON QCM.account_id = CAC.account_id
|WHERE
| CAC.event_date BETWEEN '2019-10-01' AND '2019-10-05'
|""".stripMargin
val df = spark.sql(sql)
df.show(10, false)
You added the s prefix, which means you want the string to be interpolated: all tokens prefixed with $ will be replaced with the local variable of the same name. From your code it looks like you do not use this feature, so you can just remove the s prefix from the string:
val sql =
"""
|SELECT
| ,CAC.engine
| ,CAC.user_email
| ,CAC.submit_time
| ,CAC.end_time
| ,CAC.duration
| ,CAC.counter_name
| ,CAC.counter_value
| ,CAC.usage_hour
| ,CAC.event_date
|FROM
| xyz.command AS CAC
| INNER JOIN
| (
| SELECT DISTINCT replace(split(get_json_object(metadata_payload, '$.configuration.name'), '_')[1], 'acc', '') AS account_id
| FROM xyz.metadata
| ) AS QCM
| ON QCM.account_id = CAC.account_id
|WHERE
| CAC.event_date BETWEEN '2019-10-01' AND '2019-10-05'
|""".stripMargin
Otherwise, if you really need the interpolation, you have to escape the $ sign by doubling it ($$), like this:
val sql =
s"""
|SELECT
| ,CAC.engine
| ,CAC.user_email
| ,CAC.submit_time
| ,CAC.end_time
| ,CAC.duration
| ,CAC.counter_name
| ,CAC.counter_value
| ,CAC.usage_hour
| ,CAC.event_date
|FROM
| xyz.command AS CAC
| INNER JOIN
| (
| SELECT DISTINCT replace(split(get_json_object(metadata_payload, '$$.configuration.name'), '_')[1], 'acc', '') AS account_id
| FROM xyz.metadata
| ) AS QCM
| ON QCM.account_id = CAC.account_id
|WHERE
| CAC.event_date BETWEEN '2019-10-01' AND '2019-10-05'
|""".stripMargin

Understanding dequeue_rt_stack() for RT scheduling class linux

The enqueue_task_rt function in ./kernel/sched/rt.c is responsible for queuing a task on the run queue. enqueue_task_rt contains a call to enqueue_rt_entity, which in turn calls dequeue_rt_stack. Most of the code seems logical, but I am a bit lost with the function dequeue_rt_stack, as I am unable to understand what it does. Can somebody tell me what logic I am missing, or suggest some good reading?
Edit: The following is the code for the dequeue_rt_stack function:
static void dequeue_rt_stack(struct sched_rt_entity *rt_se)
{
    struct sched_rt_entity *back = NULL;

    /* macro for_each_sched_rt_entity is defined as
       for (; rt_se; rt_se = rt_se->parent) */
    for_each_sched_rt_entity(rt_se) {
        rt_se->back = back;
        back = rt_se;
    }

    for (rt_se = back; rt_se; rt_se = rt_se->back) {
        if (on_rt_rq(rt_se))
            __dequeue_rt_entity(rt_se);
    }
}
More specifically, I do not understand why there is a need for this code:
for_each_sched_rt_entity(rt_se) {
    rt_se->back = back;
    back = rt_se;
}
What is its relevance?
When a task is to be added to some queue, it must first be removed from the queue that it currently is on, if any.
With the group scheduler, a task is always at the lowest level of the tree, and might have multiple ancestors:
NULL
^
|
+-----parent------+
| |
| top-level group |
| |
+-----------------+
^ ^_____________
| \
+-----parent------+ +-----parent------+
| | | |
| mid-level group | | other group | ...
| | | |
+-----------------+ +-----------------+
^ ^_____________
| \
+-----parent------+ +-----------------+
| | | |
| task | | other task | ...
| | | |
+-----------------+ +-----------------+
To remove the task from the tree, it must be removed from all groups' queues, and this must be done first at the top-level group (otherwise, the scheduler might try to run an already partially-removed task). Therefore, dequeue_rt_stack uses the back pointers to construct a list in the opposite direction:
NULL back
^ |
| V
+-parent----------+
| |
| top-level group |
| |
+----------back---+
^ | ^_____________
| V \
+-parent----------+ +-----parent------+
| | | |
| mid-level group | | other group | ...
| | | |
+----------back---+ +-----------------+
^ | ^_____________
| V \
+-parent----------+ +-----------------+
| | | |
| task | | other task | ...
| | | |
+----------back---+ +-----------------+
|
V
NULL
That back list can then be used to walk down the tree to remove the entities in the correct order.
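To see the pointer reversal in isolation, here is a toy userspace sketch; the struct and names merely mimic sched_rt_entity and are not kernel code:

#include <stdio.h>
#include <stddef.h>

/* Toy stand-in for struct sched_rt_entity: each entity knows its
 * parent group; "back" is only filled in on demand. */
struct entity {
    const char *name;
    struct entity *parent;
    struct entity *back;
};

int main(void)
{
    struct entity top  = { "top-level group", NULL, NULL };
    struct entity mid  = { "mid-level group", &top, NULL };
    struct entity task = { "task",            &mid, NULL };

    /* First pass (the loop the question asks about): walk UP via
     * ->parent while reversing the direction into ->back pointers. */
    struct entity *back = NULL;
    for (struct entity *e = &task; e; e = e->parent) {
        e->back = back;
        back = e;
    }

    /* Second pass: "back" now points at the topmost entity, so we
     * can walk DOWN the hierarchy and dequeue top-first. */
    for (struct entity *e = back; e; e = e->back)
        printf("dequeue %s\n", e->name);  /* top, mid, task */

    return 0;
}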
I am new to kernel hacking, and this is my first time answering a Linux kernel question, but maybe this helps.
I read the source code, and I think it relates to group scheduling.
When the kernel is built with:
#ifdef CONFIG_RT_GROUP_SCHED
several scheduling entities can be collected into one scheduling group.
static void enqueue_rt_entity(struct sched_rt_entity *rt_se, bool head)
{
    dequeue_rt_stack(rt_se);
    for_each_sched_rt_entity(rt_se)
        __enqueue_rt_entity(rt_se, head);
}
The function dequeue_rt_stack(rt_se) extracts all the scheduling entities that belong to the group from their queues; the loop that follows then adds them back to the run queue.
Hierarchical group I/O scheduling
CFS group scheduling

How to reuse Cucumber step definition with a table for the last parameter?

This code:
Then %{I should see the following data in the "Feeds" data grid:
| Name |
| #{name} |}
And this one:
Then "I should see the following data in the \"Feeds\" data grid:
| Name |
| #{name} |"
And this:
Then "I should see the following data in the \"Feeds\" data grid:\n| Name |\n| #{name} |"
And even this:
Then <<EOS
I should see the following data in the "Feeds" data grid:
| Name |
| #{name} |
EOS
Gives me:
Your block takes 2 arguments, but the Regexp matched 1 argument.
(Cucumber::ArityMismatchError)
tests/endtoend/step_definitions/instruments_editor_steps.rb:29:in `/^the editor shows "([^"]*)" in the feeds list$/'
melomel-0.6.0/lib/melomel/cucumber/data_grid_steps.rb:59:in `/^I should see the following data in the "([^"]*)" data grid:$/'
tests/endtoend/instruments_editor.feature:11:in `And the editor shows "myFeed" in the feeds list'
This one:
Then "I should see the following data in the \"Feeds\" data grid: | Name || #{name} |"
And this one:
Then "I should see the following data in the \"Feeds\" data grid:| Name || #{name} |"
Gives:
Undefined step: "I should see the following data in the "Feeds" data grid:| Name || myFeed |" (Cucumber::Undefined)
./tests/endtoend/step_definitions/instruments_editor_steps.rb:31:in `/^the editor shows "([^"]*)" in the feeds list$/'
tests/endtoend/instruments_editor.feature:11:in `And the editor shows "myFeed" in the feeds list'
I've found the answer myself:
steps %Q{
  Then I should see the following data in the "Feeds" data grid:
    | Name |
    | #{name} |
}
Note on the above: it might seem obvious, but the newline after the first '{' is crucial.
Another way:
Given /^My basic step:$/ do |table|
  # do table operation
end

Given /^My referring step:$/ do |table|
  table.hashes.each do |row|
    row_as_table = %{
      |prop1|prop2|
      |#{row['prop1']}|#{row['prop2']}|
    }
    Given %{My basic step:}, Cucumber::Ast::Table.parse(row_as_table, "", 0)
  end
end
You can also write it this way, using #table
Then /^some other step$/ do
  Then %{I should see the following data in the "Feeds" data grid:}, table(%{
    | Name |
    | #{name} |
  })
end
Consider using
Given /^events with:-$/ do |table|
  Given %{I am on the event admin page}
  table.hashes.each do |row|
    Given %{an event with:-}, Cucumber::Ast::Table.new([row]).transpose
  end
end
I find that much more elegant than building up the table by hand.
events with:- gets a table like this
| Form | Element | Label |
| foo | bar | baz |
and an event with:- gets a table like
| Form | foo |
| Element | bar |
| Label | baz |
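On newer Cucumber versions, where calling Given/Then from inside a step definition and Cucumber::Ast::Table are deprecated or gone, the step helper plays the same role. A sketch; exact APIs vary by version:

Then(/^the editor shows "([^"]*)" in the feeds list$/) do |name|
  step %{I should see the following data in the "Feeds" data grid:}, table(%{
    | Name |
    | #{name} |
  })
end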
