How to check the backfill property of a materialized view after its creation in Azure Data Explorer?

I have created a materialized view in ADX with the backfill property set. How can I check the backfill property after the view has been created, using a Kusto command?
Example:
.create async ifnotexists materialized-view with (**backfill=true, docString="Asset Trends", effectiveDateTime=datetime(2022-06-08)**) AssetTrend on table Variables
{
    Variables
    | summarize
        Normal = countif(value <= 1),
        CheckSUM = countif(value > 1 and value <= 250),
        OutofSpecification = countif(value > 250 and value <= 500),
        MaintenanceRequired = countif(value > 500 and value <= 750),
        Failure = countif(value > 750 and value <= 1000)
        by bin(timestamp, 1s), model, objectId, tenantId, variable
}

Update (following Yifat's comment):
.show materialized-view MyMV
| project EffectiveDateTime
EffectiveDateTime: 2022-08-29T11:25:48.2667521Z
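One way to locate the ClientActivityId of the backfill command is to search the recent command history (a sketch; the time window and text filters here are assumptions):
.show commands
| where StartedOn > ago(7d)
| where Text has "materialized-view" and Text has "MyMV"
| project ClientActivityId, StartedOn, State, Text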
Then, with the relevant ClientActivityId, run this:
.show commands
| where ClientActivityId == ...
| project ResourcesUtilization.ScannedExtentsStatistics
| evaluate bag_unpack(ResourcesUtilization_ScannedExtentsStatistics)
| distinct *
MaxDataScannedTime:   2022-08-29T11:25:52.0952791Z
MinDataScannedTime:   2022-08-29T11:25:48.2667522Z
ScannedExtentsCount:  8
ScannedRowsCount:     120000000
TotalExtentsCount:    8
TotalRowsCount:       120000000
Here is the information from the table the MV was based on:
.show table r100k details
| project TotalExtents, TotalRowCount, MinExtentsCreationTime, MaxExtentsCreationTime
TotalExtents:            8
TotalRowCount:           120000000
MinExtentsCreationTime:  2022-08-29T11:25:48.2667522Z
MaxExtentsCreationTime:  2022-08-29T11:25:52.0952791Z

Related

Grafana azure log analytics transfer query from logs

I have this query that works in Azure Logs when I set the scope to the specific Application Insights resource I want to use:
let usg_events = dynamic(["*"]);
let mainTable = union pageViews, customEvents, requests
| where timestamp > ago(1d)
| where isempty(operation_SyntheticSource)
| extend name =replace("\n", "", name)
| where '*' in (usg_events) or name in (usg_events)
;
let queryTable = mainTable;
let cohortedTable = queryTable
| extend dimension =tostring(client_CountryOrRegion)
| extend dimension = iif(isempty(dimension), "<undefined>", dimension)
| summarize hll = hll(user_Id) by tostring(dimension)
| extend Users = dcount_hll(hll)
| order by Users desc
| serialize rank = row_number()
| extend dimension = iff(rank > 5, 'Other', dimension)
| summarize merged = hll_merge(hll) by tostring(dimension)
| project ["Country or region"] = dimension, Counts = dcount_hll(merged);
cohortedTable
but trying to use the same query in Grafana just gives an error:
"'union' operator: Failed to resolve table expression named 'pageViews'"
This is the same error I get in Azure Logs if I don't set the scope to the specific Application Insights resource. So my question is: how do I make Grafana target this specific scope inside the logs? The query just gets the countries of the users that log in.
As far as I know, there is currently no option/feature to set the scope in Grafana.
The scope is available only in the Azure Log Analytics workspace.
If you want this feature, please raise a ticket in the Grafana Community, where such issues are officially addressed.

How to add more than one dimension table while creating a materialized view in Azure Data Explorer

I am trying to create a materialized view with two dimension tables, but I am getting an error with the syntax below:
.create async materialized-view with (backfill=true, dimensionTables = 'SalesUserTable','ProductTable') SalesSummary on table PurchaseTable
{
PurchaseTable | join kind = inner SalesUserTable on $left.SalesUserId == $right.SalesUserID
| join kind = inner ProductTable on $left.ProductID == $right.ProductID
| extend TotalPrice = (Quantity*Price)
| summarize
TotalQuantity = sum(Quantity),
TotalPrice = sum(TotalPrice)
by SalesUserId, ProductID, SalesUserName=Name, ProductName=Name1
}
Please help me with the right syntax to add multiple dimension tables.
Put table names in a single set of quotes:
.create async materialized-view with (backfill=true, dimensionTables = 'SalesUserTable, ProductTable') SalesSummary on table PurchaseTable
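For reference, a sketch of the full corrected command using the query body from the question (the only change from the question is the dimensionTables value):
.create async materialized-view with (backfill=true, dimensionTables = 'SalesUserTable, ProductTable') SalesSummary on table PurchaseTable
{
    PurchaseTable
    | join kind=inner SalesUserTable on $left.SalesUserId == $right.SalesUserID
    | join kind=inner ProductTable on $left.ProductID == $right.ProductID
    | extend TotalPrice = Quantity * Price
    | summarize
        TotalQuantity = sum(Quantity),
        TotalPrice = sum(TotalPrice)
        by SalesUserId, ProductID, SalesUserName = Name, ProductName = Name1
}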

How to cache subquery result in WITH clause in Spark SQL

I wonder if Spark SQL supports caching the result of a query defined in a WITH clause.
The Spark SQL query is something like this:
with base_view as
(
select some_columns from some_table
WHERE
expensive_udf(some_column) = true
)
... multiple query join based on this view
While this query works with Spark SQL, I noticed that the UDF was applied to the same data set multiple times.
In this use case, the UDF is very expensive. So I'd like to cache the query result of base_view so the subsequent queries would benefit from the cached result.
P.S. I know you can create and cache a table with the given query and then reference it in the subqueries. In this specific case, though, I can't create any tables or views.
That is not possible. The WITH result cannot be persisted after execution or substituted into a new Spark SQL invocation.
The WITH clause allows you to give a name to a temporary result set so it can be reused several times within a single query. I believe what he's asking for is a materialized view.
This can be done by executing several SQL queries.
spark.sql("""
-- first, cache the result of the expensive subquery as a named view
CACHE TABLE base_view AS
SELECT some_columns
FROM some_table
WHERE expensive_udf(some_column) = true
""")
spark.sql("""
-- then run the queries that reference base_view
... multiple query join based on this view
""")
Not sure if you are still interested in a solution, but the following workaround accomplishes the same thing:
spark.sql("""
create temp view my_view as
WITH base_view as
(
  select some_columns
  from some_table
  WHERE expensive_udf(some_column) = true
)
SELECT *
from base_view
""");
spark.sql("""CACHE TABLE my_view""");
Now you can use the my_view temp view to join to other tables, as shown below:
spark.sql("""
select mv.col1, t2.col2, t3.col3
from my_view mv
join tab2 t2
  on mv.col2 = t2.col2
join tab3 t3
  on mv.col3 = t3.col3
""");
Remember to uncache the view after use:
spark.sql("""UNCACHE TABLE my_view""");
Hope this helps.
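If creating even a temporary view is not an option, a similar effect can be had by caching at the DataFrame level and joining through the DataFrame API instead of SQL. A minimal sketch in Scala, assuming an existing SparkSession named spark, the registered UDF expensive_udf, and placeholder table/column names from the question:
// Cache the expensive filter once; the UDF runs only when this DataFrame is first materialized.
val baseDf = spark.sql("""
  SELECT some_columns
  FROM some_table
  WHERE expensive_udf(some_column) = true
""").cache()

// Reuse the cached result in as many joins as needed
// (assumes some_column is among the selected columns and other_table is a hypothetical table to join).
val result = baseDf.join(spark.table("other_table"), Seq("some_column"))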

How to use a filter in subselect

I want to perform a subselect on a related set of data. The data in that subselect needs to be filtered using data from the main query:
customEvents
| extend envId = tostring(customDimensions.EnvironmentId)
| extend organisation = tostring(customDimensions.OrganisationName)
| extend version = tostring(customDimensions.Version)
| extend app = tostring(customDimensions.Appname)
| where customDimensions.EventName contains "ApiSessionStartStart"
| extend dbInfo = toscalar(
customEvents
| extend dbInfo = tostring(customDimensions.dbInfo)
| extend serverEnvId = tostring(customDimensions.EnvironmentId)
| where customDimensions.EventName == "ServiceSessionStart" or customDimensions.EventName == "ServiceSessionContinuation"
| where serverEnvId = envId // This gives an error
| project dbInfo
| take 1)
| order by timestamp desc
| project timestamp, customDimensions.OrganisationName, customDimensions.Version, customDimensions.onBehalfOf, customDimensions.userId, customDimensions.Appname, customDimensions.apiKey, customDimensions.remoteIp, session_Id , dbInfo, envId
The above query results in an error:
Failed to resolve entity 'envId'
How can I filter the data in the subselect based on the field envId in the main query?
I believe you'd need to use join instead, where you'd join to get that value from the second query.
docs for join: https://docs.loganalytics.io/docs/Language-Reference/Tabular-operators/join-operator
The left-hand side of the join is your "outer" query, and the right-hand side of the join would be that "inner" query, though instead of doing take 1, you'd probably do a simpler query that just gets the distinct values of serverEnvId and dbInfo.
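A sketch of what that join-based rewrite could look like, reusing the names and event filters from the question (untested; the projected columns are trimmed for brevity):
customEvents
| extend envId = tostring(customDimensions.EnvironmentId)
| where customDimensions.EventName contains "ApiSessionStartStart"
| join kind=leftouter (
    customEvents
    | where customDimensions.EventName == "ServiceSessionStart" or customDimensions.EventName == "ServiceSessionContinuation"
    | extend envId = tostring(customDimensions.EnvironmentId)
    | extend dbInfo = tostring(customDimensions.dbInfo)
    | distinct envId, dbInfo
) on envId
| order by timestamp desc
| project timestamp, organisation = tostring(customDimensions.OrganisationName), session_Id, dbInfo, envId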

How many events are stored in my PredictionIO event server?

I imported an unknown number of events into my PIO eventserver and now I want to know that number (in order to measure and compare recommendation engines). I could not find an API for that, so I had a look at the MySQL database my server uses. I found two tables:
mysql> select count(*) from pio_event_1;
+----------+
| count(*) |
+----------+
| 6371759 |
+----------+
1 row in set (8.39 sec)
mysql> select count(*) from pio_event_2;
+----------+
| count(*) |
+----------+
| 2018200 |
+----------+
1 row in set (9.79 sec)
Both tables look very similar, so I am still unsure.
Which table is relevant? What is the difference between pio_event_1 and pio_event_2?
Is there a command or REST API where I can look up the number of stored events?
You could go through the Spark shell, as described in the troubleshooting docs.
Launch the shell with
pio-shell --with-spark
Then find all events for your app and count them
import io.prediction.data.store.PEventStore
PEventStore.find(appName="MyApp1")(sc).count
You could also filter to find different subsets of events by passing more parameters to find; see the API docs for more details. The LEventStore is also an option.
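For example, a minimal sketch of a filtered count; the entity type and event names below are hypothetical, and the parameters follow the PEventStore.find signature:
import io.prediction.data.store.PEventStore

// Count only "buy" and "rate" events recorded against "user" entities (hypothetical names).
PEventStore.find(
  appName = "MyApp1",
  entityType = Some("user"),
  eventNames = Some(Seq("buy", "rate"))
)(sc).count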
Connect to your database
\c db_name
List tables
\dt;
Run query
select count(*) from pio_event_1;
PHP
<?php
$dbconn = pg_connect("host=localhost port=5432 dbname=db_name user=postgres");
$result = pg_query($dbconn, "select count(*) from pio_event_1");
if (!$result) {
echo "An error occurred.\n";
exit;
}
// Not the best way, but output the total number of events.
while ($row = pg_fetch_row($result)) {
echo '<P><center>'.number_format($row[0]) .' Events</center></P>';
} ?>
