invalid string interpolation: `$$', `$'ident or `$'BlockExpr expected (Spark SQL)

The error I am getting:
invalid string interpolation: `$$', `$'ident or `$'BlockExpr expected
Spark SQL:
val sql =
s"""
|SELECT
| ,CAC.engine
| ,CAC.user_email
| ,CAC.submit_time
| ,CAC.end_time
| ,CAC.duration
| ,CAC.counter_name
| ,CAC.counter_value
| ,CAC.usage_hour
| ,CAC.event_date
|FROM
| xyz.command AS CAC
| INNER JOIN
| (
| SELECT DISTINCT replace(split(get_json_object(metadata_payload, '$.configuration.name'), '_')[1], 'acc', '') AS account_id
| FROM xyz.metadata
| ) AS QCM
| ON QCM.account_id = CAC.account_id
|WHERE
| CAC.event_date BETWEEN '2019-10-01' AND '2019-10-05'
|""".stripMargin
val df = spark.sql(sql)
df.show(10, false)

You added the `s` prefix, which means you want the string to be interpolated: every token prefixed with `$` will be replaced by the local variable of the same name. From your code it looks like you do not use this feature, so you can simply remove the `s` prefix from the string:
val sql =
"""
|SELECT
| ,CAC.engine
| ,CAC.user_email
| ,CAC.submit_time
| ,CAC.end_time
| ,CAC.duration
| ,CAC.counter_name
| ,CAC.counter_value
| ,CAC.usage_hour
| ,CAC.event_date
|FROM
| xyz.command AS CAC
| INNER JOIN
| (
| SELECT DISTINCT replace(split(get_json_object(metadata_payload, '$.configuration.name'), '_')[1], 'acc', '') AS account_id
| FROM xyz.metadata
| ) AS QCM
| ON QCM.account_id = CAC.account_id
|WHERE
| CAC.event_date BETWEEN '2019-10-01' AND '2019-10-05'
|""".stripMargin
Otherwise, if you really do need interpolation, you have to escape the `$` sign by doubling it (`$$`), like this:
val sql =
s"""
|SELECT
| ,CAC.engine
| ,CAC.user_email
| ,CAC.submit_time
| ,CAC.end_time
| ,CAC.duration
| ,CAC.counter_name
| ,CAC.counter_value
| ,CAC.usage_hour
| ,CAC.event_date
|FROM
| xyz.command AS CAC
| INNER JOIN
| (
| SELECT DISTINCT replace(split(get_json_object(metadata_payload, '$$.configuration.name'), '_')[1], 'acc', '') AS account_id
| FROM xyz.metadata
| ) AS QCM
| ON QCM.account_id = CAC.account_id
|WHERE
| CAC.event_date BETWEEN '2019-10-01' AND '2019-10-05'
|""".stripMargin

Related

Parse `key1=value1 key2=value2` in Kusto

I'm running Cilium inside an Azure Kubernetes cluster and want to parse the Cilium log messages in Azure Log Analytics. The log messages have a format like
key1=value1 key2=value2 key3="if the value contains spaces, it's wrapped in quotation marks"
For example:
level=info msg="Identity of endpoint changed" containerID=a4566a3e5f datapathPolicyRevision=0
I couldn't find a matching parse_xxx method in the docs (e.g. https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/parsecsvfunction). Is it possible to write a custom function to parse this kind of log message?
Not a fun format to parse... But this should work:
let LogLine = "level=info msg=\"Identity of endpoint changed\" containerID=a4566a3e5f datapathPolicyRevision=0";
print LogLine
| extend KeyValuePairs = array_concat(
    extract_all("([a-zA-Z_]+)=([a-zA-Z0-9_]+)", LogLine),
    extract_all("([a-zA-Z_]+)=\"([a-zA-Z0-9_ ]+)\"", LogLine))
| mv-apply KeyValuePairs on
(
    extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))
    | summarize dict=make_bag(p)
)
The output will be:
| print_0 | dict |
|--------------------|-----------------------------------------|
| level=info msg=... | { |
| | "level": "info", |
| | "containerID": "a4566a3e5f", |
| | "datapathPolicyRevision": "0", |
| | "msg": "Identity of endpoint changed" |
| | } |
|--------------------|-----------------------------------------|
With the help of Slavik N, I came up with a query that works for me:
let containerIds = KubePodInventory
| where Namespace startswith "cilium"
| distinct ContainerID
| summarize make_set(ContainerID);
ContainerLog
| where ContainerID in (containerIds)
| extend KeyValuePairs = array_concat(
    extract_all("([a-zA-Z0-9_-]+)=([^ \"]+)", LogEntry),
    extract_all("([a-zA-Z0-9_]+)=\"([^\"]+)\"", LogEntry))
| mv-apply KeyValuePairs on
(
    extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))
    | summarize JSONKeyValuePairs=parse_json(make_bag(p))
)
| project TimeGenerated, Level=JSONKeyValuePairs.level, Message=JSONKeyValuePairs.msg, PodName=JSONKeyValuePairs.k8sPodName, Reason=JSONKeyValuePairs.reason, Controller=JSONKeyValuePairs.controller, ContainerID=JSONKeyValuePairs.containerID, Labels=JSONKeyValuePairs.labels, Raw=LogEntry

Azure Log Analytics parse json

I have a query which returns a few columns. In one of the columns I am parsing JSON to retrieve an object value, but it contains multiple entries, and I want to retrieve and display each entry in a loop.
Below is the query:
let forEach_table = AzureDiagnostics
| where Parameters_LOAD_GROUP_s contains 'LOAD(AUTO)';
let ParentPlId = '';
let ParentPlName = '';
let commonKey = '';
forEach_table
| where Category == 'PipelineRuns'
| extend pplId = parse_json(Predecessors_s)[0].PipelineRunId, pplName = parse_json(Predecessors_s)[0].PipelineName
| extend dbMapName = tostring(parse_json(Parameters_getMetadataList_s)[0].dbMapName)
| summarize count(runId_g) by Resource, Status = status_s, Name=pipelineName_s, Loadgroup = Parameters_LOAD_GROUP_s, dbMapName, Parameters_LOAD_GROUP_s, Parameters_getMetadataList_s, pipelineName_s, Category, CorrelationId, start_t, end_t, TimeGenerated
| project ParentPL_ID = ParentPlId, ParentPL_Name = ParentPlName, LoadGroup_Name = Loadgroup, Map_Name = dbMapName, Status,Metadata = Parameters_getMetadataList_s, Category, CorrelationId, start_t, end_t
| project-away ParentPL_ID, ParentPL_Name, Category, CorrelationId
Here, in the above code,
extend dbMapName = tostring(parse_json(Parameters_getMetadataList_s)[0].dbMapName)
I am retrieving the 0th element by default, but I would like to retrieve all elements in sequence. Can somebody suggest how I can achieve this?
bag_keys() is just what you need.
For example, take a look at this query:
datatable(myjson: dynamic) [
    dynamic({"a": 123, "b": 234, "c": 345}),
    dynamic({"dd": 123, "ee": 234, "ff": 345})
]
| project keys = bag_keys(myjson)
Its output is:
|---------|
| keys |
|---------|
| [ |
| "a", |
| "b", |
| "c" |
| ] |
|---------|
| [ |
| "dd", |
| "ee", |
| "ff" |
| ] |
|---------|
If you want to have every key in a separate row, use mv-expand, like this:
datatable(myjson: dynamic) [
    dynamic({"a": 123, "b": 234, "c": 345}),
    dynamic({"dd": 123, "ee": 234, "ff": 345})
]
| project keys = bag_keys(myjson)
| mv-expand keys
The output of this query will be:
|------|
| keys |
|------|
| a |
| b |
| c |
| dd |
| ee |
| ff |
|------|
The extend and mv-expand operators help in resolving this kind of scenario.
Solution:
extend rows = parse_json(Parameters_getMetadataList_s)
| mv-expand rows
| project Parameters_LOAD_GROUP_s,rows

How to create a data frame based on a date value passed as a string in pyspark?

I have a data set like below:
file: test.txt
149|898|20180405
135|379|20180428
135|381|20180406
31|898|20180429
31|245|20180430
135|398|20180422
31|448|20180420
31|338|20180421
I have created a data frame by executing the code below.
from pyspark.sql import SparkSession, SQLContext, Row

spark = SparkSession.builder.appName("test").getOrCreate()
sc = spark.sparkContext
sqlContext = SQLContext(sc)
df_transac = spark.createDataFrame(sc.textFile("test.txt")\
    .map(lambda x: x.split("|")[:3])\
    .map(lambda r: Row(cCode=r[0], pCode=r[1], mDate=r[2])))
df_transac.show()
+-----+-----+----------+
|cCode|pCode| mDate|
+-----+-----+----------+
| 149| 898| 20180405 |
| 135| 379| 20180428 |
| 135| 381| 20180406 |
| 31| 898| 20180429 |
| 31| 245| 20180430 |
| 135| 398| 20180422 |
| 31| 448| 20180420 |
| 31| 338| 20180421 |
+-----+-----+----------+
My df.printSchema() output looks like below:
df_transac.printSchema()
root
|-- customerCode: string (nullable = true)
|-- productCode: string (nullable = true)
|-- quantity: string (nullable = true)
|-- date: string (nullable = true)
but I want to create a data frame based on my input dates, i.e. date1="20180425" and date2="20180501".
my expected output is:
+-----+-----+----------+
|cCode|pCode| mDate|
+-----+-----+----------+
| 135| 379| 20180428 |
| 31| 898| 20180429 |
| 31| 245| 20180430 |
+-----+-----+----------+
Please help me with how I can achieve this.
Here is a simple filter applied to your df:
df_transac.where("mdate between '{}' and '{}'".format(date1,date2)).show()
+-----+-----+--------+
|cCode|pCode| mDate|
+-----+-----+--------+
| 135| 379|20180428|
| 31| 898|20180429|
| 31| 245|20180430|
+-----+-----+--------+
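Equivalently, the same date-range filter can be written with column expressions instead of a SQL string. A minimal sketch, assuming date1 and date2 hold the yyyyMMdd strings from the question:
from pyspark.sql.functions import col

date1, date2 = "20180425", "20180501"

# Keep rows whose mDate falls in the inclusive range [date1, date2];
# plain string comparison works because the dates are zero-padded yyyyMMdd.
df_transac.where(col("mDate").between(date1, date2)).show()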

OpenCV - Thin Plate Spline

How can I convert an image from one shape to another using a thin plate spline in OpenCV with Python 3? In C++ we have the shape transformer class; how can we use it in OpenCV with Python 3?
The thin plate spline shape transformer does exist for OpenCV in Python 3.
You can use the help function to get more info on which methods exist and how to use them, like this:
>>> help(cv2.createThinPlateSplineShapeTransformer())  # the () parentheses matter!
Help on ThinPlateSplineShapeTransformer object:
class ThinPlateSplineShapeTransformer(ShapeTransformer)
| Method resolution order:
| ThinPlateSplineShapeTransformer
| ShapeTransformer
| Algorithm
| builtins.object
|
| Methods defined here:
|
| __new__(*args, **kwargs) from builtins.type
| Create and return a new object. See help(type) for accurate
signature.
|
| __repr__(self, /)
| Return repr(self).
|
| getRegularizationParameter(...)
| getRegularizationParameter() -> retval
|
| setRegularizationParameter(...)
| setRegularizationParameter(beta) -> None
|
| ----------------------------------------------------------------------
| Methods inherited from ShapeTransformer:
|
| applyTransformation(...)
| applyTransformation(input[, output]) -> retval, output
|
| estimateTransformation(...)
| estimateTransformation(transformingShape, targetShape, matches) ->
None
|
| warpImage(...)
| warpImage(transformingImage[, output[, flags[, borderMode[,
borderValue]]]]) -> output
|
| ----------------------------------------------------------------------
| Methods inherited from Algorithm:
|
| clear(...)
| clear() -> None
|
| getDefaultName(...)
| getDefaultName() -> retval
|
| save(...)
| save(filename) -> None
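Beyond reading the help text, a minimal usage sketch might look like the following. The point coordinates and file names are hypothetical; the general flow is estimateTransformation followed by warpImage, and depending on which warp direction you want, you may need to swap the two point sets:
import cv2
import numpy as np

# Hypothetical source image and landmark points.
img = cv2.imread("source.png")

# Corresponding points in the two shapes, shape (1, N, 2), float32.
source_pts = np.array([[[50, 50], [200, 50], [50, 200], [200, 200]]], dtype=np.float32)
target_pts = np.array([[[60, 40], [210, 60], [40, 210], [190, 190]]], dtype=np.float32)

# One DMatch per point pair, mapping index i in one shape to index i in the other.
matches = [cv2.DMatch(i, i, 0) for i in range(source_pts.shape[1])]

tps = cv2.createThinPlateSplineShapeTransformer()
tps.estimateTransformation(source_pts, target_pts, matches)

# Warp the image according to the estimated thin plate spline.
warped = tps.warpImage(img)
cv2.imwrite("warped.png", warped)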

Nested query in Log Analytics

Hi, I'm trying to get to a log event by nesting a query in the "where" of another query. Is this possible?
AzureDiagnostics
| where resource_workflowName_s == "[Workflow Name]"
| where resource_runId_s == (AzureDiagnostics | where trackedProperties_PayloadID_g == "[GUID]" | distinct resource_runId_s)
try:
AzureDiagnostics
| where resource_workflowName_s == "[Workflow Name]"
| where resource_runId_s in (
    toscalar(AzureDiagnostics
    | where trackedProperties_PayloadID_g == "[GUID]"
    | distinct resource_runId_s))
