I am trying to calculate the statistical mode (value that occurs with highest frequency) of a dataset using SPARQL.
I can generate a list of data values and their frequencies like so:
SELECT (COUNT(?o) AS ?no) ?o
WHERE {
  ?s ?p ?o
  FILTER isLiteral(?o)
}
GROUP BY ?o ORDER BY DESC(?no)
The results look like this:
| 410 | "yes"^^<http://www.w3.org/2001/XMLSchema#string>
| 19 | "true"^^<http://www.w3.org/2001/XMLSchema#string>
| 12 | "Offical"^^<http://www.w3.org/2001/XMLSchema#string> ...
However, I just want the first line of data i.e. the value and frequency of the most common object value in the dataset.
I have tried using MAX like so:
SELECT (MAX(?no) AS ?maxNo)
{
  SELECT (COUNT(?o) AS ?no) ?o
  WHERE {
    ?s ?p ?o
    FILTER isLiteral(?o)
  }
  GROUP BY ?o ORDER BY DESC(?no)
}
and can get the count back like so:
---------
| maxNo |
=========
| 410 |
---------
but what I want to get back is both the count of the most frequently occurring data value and the value itself, like so:
| 410 | "yes"^^<http://www.w3.org/2001/XMLSchema#string>
I have tried binding ?o in the sub-query and adding ?o to the outer SELECT but both give me syntax errors.
What else can I try?
Thanks for reading.
ORDER-LIMIT, also known as "top n", should get you the answer you are looking for.
SELECT (COUNT(?o) AS ?no) ?o
WHERE {
?s ?p ?o
FILTER isLiteral(?o)
}
GROUP BY ?o
ORDER BY DESC(?no)
LIMIT 1
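For intuition, the same ORDER-LIMIT ("top n") idea can be sketched outside of SPARQL. A minimal Python sketch, with made-up literal values standing in for the ?o bindings:

```python
from collections import Counter

# Hypothetical literal values, standing in for the ?o bindings in the query above
values = ["yes", "true", "yes", "Official", "yes", "true"]

# most_common(1) plays the role of GROUP BY ?o ORDER BY DESC(?no) LIMIT 1:
# it returns the single (value, frequency) pair with the highest count
top_value, top_count = Counter(values).most_common(1)[0]
print(top_value, top_count)  # yes 3
```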
I am very new to KQL, and I am stuck on this query. I want the query to display which users have had sign-ins from different states. I created this query, but I do not know how to count the results in the "names" column.
SigninLogs
| project tostring(LocationDetails.state), UserDisplayName
| extend p = pack('Locations', LocationDetails_state)
| summarize names = make_set(p) by UserDisplayName
This generates a column "names" with a row like so:
[{"Locations":"Arkansas"},{"Locations":"Iowa"},{"Locations":""}]
Here is a simple query that grabs all sign-ins from users and another column with the locations.
SigninLogs
| where ResultType == "0"
| summarize by UserDisplayName, tostring(LocationDetails.state)
Is there a way to combine the duplicate user rows and then display each location in the second column? If so, could I count each location in order to filter to users where the location count is > 1?
I am looking to have query display which users have had sign-ins from different states
Assuming I understood your question correctly, this could work (using array_length()):
SigninLogs
| project State = tostring(LocationDetails.state), UserDisplayName
| summarize States = make_set(State) by UserDisplayName
| where array_length(States) > 1 // filter users who had sign-ins from *more than 1 state*
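The make_set / array_length logic can be sketched in plain Python; the user names and states here are invented for illustration:

```python
# Hypothetical sign-in rows: (user, state)
signins = [
    ("alice", "Arkansas"), ("alice", "Iowa"),
    ("bob", "Iowa"), ("bob", "Iowa"),
]

# make_set(State) by UserDisplayName: collect the distinct states per user
states_by_user = {}
for user, state in signins:
    states_by_user.setdefault(user, set()).add(state)

# where array_length(States) > 1: keep users seen in more than one state
multi_state_users = [u for u, s in states_by_user.items() if len(s) > 1]
print(multi_state_users)  # ['alice']
```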
In T-SQL, when grouping results, you can also get a running total row when specifying "WITH ROLLUP".
How can I achieve this in Kusto? Consider the following query:
customEvents | summarize counter = count() by name
The query above gives me a list of event names and how often each occurred. This is what I need, but I also want a row with the running total (the count of all events).
It feels like there should be an easy way to achieve this, but I haven't found anything in the docs ...
You can write two queries: the first counts each event by name, the second counts all events. Then use the union operator to combine them.
The query looks like this:
customEvents
| count
| extend name = "total",counter=Count
| project name,counter
| union
(customEvents
| summarize counter = count() by name)
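The shape of that union (one count per event name, plus a single "total" row) can be sketched in plain Python; the event names are made up:

```python
from collections import Counter

# Hypothetical event stream
events = ["click", "view", "click", "purchase"]

per_event = Counter(events)                 # summarize counter = count() by name
rows = [("total", sum(per_event.values()))] # the "| count" row, labeled "total"
rows += sorted(per_event.items())           # union with the per-name counts
print(rows)  # [('total', 4), ('click', 2), ('purchase', 1), ('view', 1)]
```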
I have Cassandra table with one column defined as set.
How can I achieve something like this:
SELECT * FROM <table> WHERE <set_column_name> NOT CONTAINS <value>
A proper secondary index was already created.
From the documentation:
SELECT select_expression FROM keyspace_name.table_name WHERE
relation AND relation ... ORDER BY ( clustering_column ( ASC | DESC
)...) LIMIT n ALLOW FILTERING
then later:
relation is:
column_name op term
and finally:
op is = | < | > | <= | >= | CONTAINS | CONTAINS KEY
So there's no native way to perform such query. You have to workaround by designing a new table to specifically satisfy this query.
I'm trying to create a column in an output table that shows each item's percentage of the total:
Something like this:
ITEM | COUNT | PERCENTAGE
item 1 | 4 | 80
item 2 | 1 | 20
I can easily get a table with ITEM and COUNT rows, but I can't figure out how to get the total (5 in this case) as a number so I can calculate the percentage in the PERCENTAGE column.
someTable
| where name == "Some Name"
| summarize COUNT = count() by ITEM = tostring( customDimensions.["SomePar"])
| project ITEM, COUNT, PERCENTAGE = (C/?)*100
Any ideas? Thank you.
It's a bit messy to create a query like that.
I've done it based on the customEvents table in AI, so take a look and see if you can adapt it to your specific situation.
You have to create a table that contains the total count of records, then join this table. Since you can only join on a common column, you need a column that always has the same value; I chose appName for that.
So the whole query looks like:
let totalEvents = customEvents
// | where name contains "Opened form"
| summarize count() by appName
| project appName, count_ ;
customEvents
// | where name contains "Opened form"
| join kind=leftouter totalEvents on appName
| summarize count() by name, count_
| project name, totalCount = count_ , itemCount = count_1, percentage = (todouble(count_1) * 100 / todouble(count_))
If you need a filter you have to apply it to both tables.
It is not even necessary to do a join or create a table containing your totals.
Just calculate the total and save it in a let statement, like so:
let totalEvents = toscalar(customEvents
| where timestamp > "someDate"
and name == "someEvent"
| summarize count());
Then you can simply add a column to the table where you need the percentage calculation:
| extend total = totalEvents
This will add a new column to your table filled with the total you calculated.
After that you can calculate the percentages as described in the other two answers.
| extend percentages = todouble(count_)*100/todouble(total)
where count_ is the column created by the summarize count() that you presumably run before adding the percentages.
Hope this also helps someone.
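The toscalar approach boils down to: compute the total once, then derive each percentage from it. A plain-Python sketch using the item counts from the question:

```python
# Per-item counts, matching the example table in the question
counts = {"item 1": 4, "item 2": 1}

# toscalar(... | summarize count()): the total computed once, up front
total = sum(counts.values())

# extend percentages = todouble(count_) * 100 / todouble(total)
percentages = {item: 100.0 * c / total for item, c in counts.items()}
print(percentages)  # {'item 1': 80.0, 'item 2': 20.0}
```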
I think the following is more intuitive: just extend both result sets with a dummy property and join on that...
requests
| summarize count()
| extend a="b"
| join (
requests
| summarize count() by name
| extend a="b"
) on a
| project name, percentage = (todouble(count_1) * 100 / todouble(count_))
This might work too:
someTable
| summarize count() by item
| as T
| extend percent = 100.0*count_/toscalar(T | summarize sum(count_))
| sort by percent desc
| extend row_cumsum(percent)
From these tables:
select group, group_ids
from some.groups_and_ids;
Result:
  group  | group_ids
---------+-----------
 winners | 1$4
 losers  | 4
 others  | 2$3$4
and:
select id,name from some.ids_and_names;
 id | name
----+---------
  1 | bob
  2 | robert
  3 | dingus
  4 | norbert
How would you go about returning something like:
winners | bob, norbert
losers | norbert
others | robert, dingus, norbert
with normalized (group_name, id) as (
select group_name, unnest(string_to_array(group_ids,'$')::int[])
from groups_and_ids
)
select n.group_name, string_agg(p.name,',' order by p.name)
from normalized n
join ids_and_names p on p.id = n.id
group by n.group_name;
The first part (the common table expression) normalizes your broken table design by creating a proper view on the groups_and_ids table. The actual query then joins the ids_and_names table to the normalized version of your groups and then aggregates the names per group.
Note I renamed group to group_name because group is a reserved keyword.
SQLFiddle: http://sqlfiddle.com/#!15/2205b/2
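The split-then-join pipeline can also be sketched in plain Python using the sample data from the question (note this version keeps the ids' original order rather than ordering by name, unlike the string_agg above):

```python
# Sample data from the question
groups_and_ids = {"winners": "1$4", "losers": "4", "others": "2$3$4"}
ids_and_names = {1: "bob", 2: "robert", 3: "dingus", 4: "norbert"}

# string_to_array(group_ids, '$') + unnest, then join to names and re-aggregate
result = {
    group: ",".join(ids_and_names[int(i)] for i in id_str.split("$"))
    for group, id_str in groups_and_ids.items()
}
print(result["winners"])  # bob,norbert
print(result["others"])   # robert,dingus,norbert
```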
Is it possible to redesign your database? Putting all the group_ids into one column makes life hard. If your table was e.g.
group | group_id
winners | 1
winners | 4
losers | 4
etc., this would be trivially easy. As it is, the query below will do it, although I hesitated to post it, since it encourages bad database design (IMHO)!
P.S. I took the liberty of renaming some columns, because they are reserved words. You can escape them, but why make life difficult for yourself?
select group_name, array_to_string(array_agg(username), ', ')  -- aggregate the names, then make them one string
from (
  select group_name, theids, username
  from ids_and_names
  inner join (
    select group_name, unnest(string_to_array(group_ids, '$')) as theids  -- unnest the split string to get rows
    from groups_and_ids
  ) i on i.theids = cast(id as text)
) a
group by group_name