I am developing a Java application that uses ARQ to execute SPARQL queries against a Fuseki endpoint over TDB.
The application needs a query that returns each person's place of birth, plus one other person who was born in the same place.
To start, I wrote this SPARQL query, which returns person IDs and each person's place of birth:
prefix fb: <http://rdf.freebase.com/ns/>
prefix fn: <http://www.w3.org/2005/xpath-functions#>
select ?person_id ?place_of_birth
where {
?person_id fb:type.object.type fb:people.person .
?person_id fb:people.person.place_of_birth ?place_of_birth_id .
?place_of_birth_id fb:type.object.name ?place_of_birth .
FILTER (langMatches(lang(?place_of_birth),"en"))
}
LIMIT 10
----------------------------------
| person_id | place_of_birth |
==================================
| fb:m.01vtj38 | "El Centro"@en |
| fb:m.01vsy7t | "Brixton"@en |
| fb:m.09prqv | "Pittsburgh"@en |
----------------------------------
After that, I added a subquery (https://jena.apache.org/documentation/query/sub-select.html) to fetch another person born in the same place, but I get more than one related person and I only need one:
prefix fb: <http://rdf.freebase.com/ns/>
prefix fn: <http://www.w3.org/2005/xpath-functions#>
select ?person_id ?place_of_birth ?other_person_id
where {
?person_id fb:type.object.type fb:people.person .
?person_id fb:people.person.place_of_birth ?place_of_birth_id .
?place_of_birth_id fb:type.object.name ?place_of_birth .
{
select ?other_person_id
where {
?place_of_birth_id fb:location.location.people_born_here ?other_person_id .
}
}
FILTER (langMatches(lang(?place_of_birth),"en"))
}
LIMIT 10
---------------------------------------------------
| person_id | place_of_birth | other_person_id |
===================================================
| fb:m.01vtj38 | "El Centro"@en | fb:m.01vtj38 |
| fb:m.01vtj38 | "El Centro"@en | fb:m.01vsy7t |
| fb:m.01vtj38 | "El Centro"@en | fb:m.09prqv |
---------------------------------------------------
I tried adding a LIMIT 1 subquery, but that does not seem to work (the query executes but never finishes):
prefix fb: <http://rdf.freebase.com/ns/>
prefix fn: <http://www.w3.org/2005/xpath-functions#>
select ?person_id ?place_of_birth ?other_person_id
where {
?person_id fb:type.object.type fb:people.person .
?person_id fb:people.person.place_of_birth ?place_of_birth_id .
?place_of_birth_id fb:type.object.name ?place_of_birth .
{
select ?other_person_id
where {
?place_of_birth_id fb:location.location.people_born_here ?other_person_id .
}
LIMIT 1
}
FILTER (langMatches(lang(?place_of_birth),"en"))
}
LIMIT 3
Is there a way to return only one result in the subquery, or is that not possible in SPARQL?
You can use limits with subqueries
You can use limits in subqueries. Here's an example:
select ?x ?y where {
values ?x { 1 2 3 4 }
{
select ?y where {
values ?y { 5 6 7 8 }
}
limit 2
}
}
limit 5
---------
| x | y |
=========
| 1 | 5 |
| 1 | 6 |
| 2 | 5 |
| 2 | 6 |
| 3 | 5 |
---------
As you can see, you get two values from the subquery (5 and 6), and these are combined with the bindings from the outer query, from which we get five rows in total (because of the limit).
Subqueries are evaluated innermost first
However, keep in mind that subqueries are evaluated from the innermost outward. That means that in your query,
select ?person_id ?place_of_birth ?other_person_id
where {
?person_id fb:type.object.type fb:people.person .
?person_id fb:people.person.place_of_birth ?place_of_birth_id .
?place_of_birth_id fb:type.object.name ?place_of_birth .
{
select ?other_person_id
where {
?place_of_birth_id fb:location.location.people_born_here ?other_person_id .
}
LIMIT 1
}
FILTER (langMatches(lang(?place_of_birth),"en"))
}
LIMIT 3
you are finding one match for
?place_of_birth_id fb:location.location.people_born_here ?other_person_id .
and passing the ?other_person_id binding out into the outer query. The rest of the outer query doesn't use ?other_person_id, though, so it has no real effect on the results. Note, too, that the ?place_of_birth_id inside the subquery is not joined with the outer ?place_of_birth_id: the subquery is evaluated on its own and projects only ?other_person_id, so you get one arbitrary person from anywhere, not one person per place.
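If you want the subquery's rows joined on the place, the subquery must also project ?place_of_birth_id. A sketch (untested; note that LIMIT 1 would still apply to the subquery as a whole, not once per place, so this alone does not give you one other person per place):
select ?person_id ?place_of_birth ?other_person_id
where {
  ?person_id fb:type.object.type fb:people.person .
  ?person_id fb:people.person.place_of_birth ?place_of_birth_id .
  ?place_of_birth_id fb:type.object.name ?place_of_birth .
  {
    # projecting ?place_of_birth_id lets these rows join with the outer pattern
    select ?place_of_birth_id ?other_person_id
    where {
      ?place_of_birth_id fb:location.location.people_born_here ?other_person_id .
    }
  }
  FILTER (langMatches(lang(?place_of_birth),"en"))
}
LIMIT 10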
What to do instead
If you need only one other person
The application needs a query that returns the place of birth of each person and other person that was born in the same place.
Conceptually, you could look at this as picking a person, finding their place of birth, and sampling one more person from the people born in that place. You can actually write the query like that, too:
select ?person_id ?place_of_birth (sample(?other_person_idx) as ?other_person_id)
where {
?person_id fb:type.object.type fb:people.person .
?person_id fb:people.person.place_of_birth ?place_of_birth_id .
?place_of_birth_id fb:type.object.name ?place_of_birth .
FILTER (langMatches(lang(?place_of_birth),"en"))
?place_of_birth_id fb:location.location.people_born_here ?other_person_idx .
filter ( ?other_person_idx != ?person_id )
}
group by ?person_id ?place_of_birth
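Since the question mentions a Java application using ARQ against a Fuseki endpoint, here is a minimal sketch of running that query remotely. The endpoint URL is an assumption; substitute your own dataset's query URL:
import org.apache.jena.query.*;

public class BirthplaceQuery {
    public static void main(String[] args) {
        // Hypothetical Fuseki endpoint; adjust to your deployment.
        String endpoint = "http://localhost:3030/ds/query";
        String queryString =
            "prefix fb: <http://rdf.freebase.com/ns/> " +
            "select ?person_id ?place_of_birth (sample(?o) as ?other_person_id) " +
            "where { ?person_id fb:type.object.type fb:people.person . " +
            "  ?person_id fb:people.person.place_of_birth ?p . " +
            "  ?p fb:type.object.name ?place_of_birth . " +
            "  FILTER (langMatches(lang(?place_of_birth),\"en\")) " +
            "  ?p fb:location.location.people_born_here ?o . " +
            "  FILTER (?o != ?person_id) } " +
            "group by ?person_id ?place_of_birth";
        // Execute against the remote SPARQL endpoint and print each row.
        try (QueryExecution qexec =
                 QueryExecutionFactory.sparqlService(endpoint, queryString)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.get("person_id") + " | "
                    + row.get("place_of_birth") + " | "
                    + row.get("other_person_id"));
            }
        }
    }
}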
If you need more than one
This is a much trickier problem if you need more than one "other result" for each result. That's the problem in "Nested queries in sparql with limits". There's an approach in "How to limit SPARQL solution group size?" that can be used for this, as sketched below.
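For reference, a rough sketch of that counting trick adapted to the predicates used here (untested): for each candidate, count how many candidates sort at or before it, and keep only the ones with a small count, e.g. at most two others per person:
select ?person_id ?other_person_id
where {
  ?person_id fb:people.person.place_of_birth ?place_of_birth_id .
  ?place_of_birth_id fb:location.location.people_born_here ?other_person_id .
  ?place_of_birth_id fb:location.location.people_born_here ?other .
  # ?other ranges over the candidates sorting at or before ?other_person_id
  filter ( str(?other) <= str(?other_person_id) )
}
group by ?person_id ?other_person_id
having ( count(?other) <= 2 )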
01-10-2022 14:05:11.584 INFO - Destination IP:0:0:0:0:0:0:0:1 | Source System IP:0:0:0:0:0:0:0:1 | BrowserName:Chrome | BrowserVersion:105 | requestURI:/dashboard | Feature name:Dashboard | Application:null | SubFeature name:Line Check | UserId:tushar | ApiCalled:/ruambot/api/getGraph() | ApiStatus:Success | Login Time:01-10-2022 13:46:42
How do I handle a key like "SubFeature name" (a key containing a space) when parsing this with GREEDYDATA?
Use a grok filter to split the message into two parts, then apply a kv filter to the second part of the message:
filter {
  grok {
    # Split on the first " - " (the one after the log level) rather than a
    # greedy "-", so the hyphens inside the date values don't shift the split.
    match => ["message", "%{DATA:message_part1} - %{GREEDYDATA:message_part2}"]
  }
  kv {
    source => "message_part2"
    field_split => "|"
    value_split => ":"
  }
}
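Keys and values parsed this way keep their surrounding spaces, because the message pads each "|" with blanks. The kv filter's trim_key and trim_value options can strip them; a sketch of the kv part with trimming (the fields in the comment are an estimate for the sample line above):
kv {
  source => "message_part2"
  field_split => "|"
  value_split => ":"
  trim_key => " "   # e.g. " BrowserName " -> "BrowserName"
  trim_value => " "
}
# Roughly yields: "BrowserName" => "Chrome", "BrowserVersion" => "105",
# "UserId" => "tushar", "ApiStatus" => "Success", ...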
I'm running Cilium inside an Azure Kubernetes cluster and want to parse the Cilium log messages in Azure Log Analytics. The log messages have a format like
key1=value1 key2=value2 key3="if the value contains spaces, it's wrapped in quotation marks"
For example:
level=info msg="Identity of endpoint changed" containerID=a4566a3e5f datapathPolicyRevision=0
I couldn't find a matching parse_xxx function in the docs (e.g. https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/parsecsvfunction). Is it possible to write a custom function to parse this kind of log message?
Not a fun format to parse... But this should work:
let LogLine = "level=info msg=\"Identity of endpoint changed\" containerID=a4566a3e5f datapathPolicyRevision=0";
print LogLine
| extend KeyValuePairs = array_concat(
extract_all("([a-zA-Z_]+)=([a-zA-Z0-9_]+)", LogLine),
extract_all("([a-zA-Z_]+)=\"([a-zA-Z0-9_ ]+)\"", LogLine))
| mv-apply KeyValuePairs on
(
extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))
| summarize dict=make_bag(p)
)
The output will be:
| print_0 | dict |
|--------------------|-----------------------------------------|
| level=info msg=... | { |
| | "level": "info", |
| | "containerID": "a4566a3e5f", |
| | "datapathPolicyRevision": "0", |
| | "msg": "Identity of endpoint changed" |
| | } |
|--------------------|-----------------------------------------|
With the help of Slavik N, I came up with a query that works for me:
let containerIds = KubePodInventory
| where Namespace startswith "cilium"
| distinct ContainerID
| summarize make_set(ContainerID);
ContainerLog
| where ContainerID in (containerIds)
| extend KeyValuePairs = array_concat(
extract_all("([a-zA-Z0-9_-]+)=([^ \"]+)", LogEntry),
extract_all("([a-zA-Z0-9_]+)=\"([^\"]+)\"", LogEntry))
| mv-apply KeyValuePairs on
(
extend p = pack(tostring(KeyValuePairs[0]), tostring(KeyValuePairs[1]))
| summarize JSONKeyValuePairs=parse_json(make_bag(p))
)
| project TimeGenerated, Level=JSONKeyValuePairs.level, Message=JSONKeyValuePairs.msg, PodName=JSONKeyValuePairs.k8sPodName, Reason=JSONKeyValuePairs.reason, Controller=JSONKeyValuePairs.controller, ContainerID=JSONKeyValuePairs.containerID, Labels=JSONKeyValuePairs.labels, Raw=LogEntry
I have a query that returns a few columns. In one of them I parse JSON to retrieve an object value, but the JSON contains multiple entries, and I want to retrieve and display each entry in a loop.
Below is the query:
let forEach_table = AzureDiagnostics
| where Parameters_LOAD_GROUP_s contains 'LOAD(AUTO)';
let ParentPlId = '';
let ParentPlName = '';
let commonKey = '';
forEach_table
| where Category == 'PipelineRuns'
| extend pplId = parse_json(Predecessors_s)[0].PipelineRunId, pplName = parse_json(Predecessors_s)[0].PipelineName
| extend dbMapName = tostring(parse_json(Parameters_getMetadataList_s)[0].dbMapName)
| summarize count(runId_g) by Resource, Status = status_s, Name=pipelineName_s, Loadgroup = Parameters_LOAD_GROUP_s, dbMapName, Parameters_LOAD_GROUP_s, Parameters_getMetadataList_s, pipelineName_s, Category, CorrelationId, start_t, end_t, TimeGenerated
| project ParentPL_ID = ParentPlId, ParentPL_Name = ParentPlName, LoadGroup_Name = Loadgroup, Map_Name = dbMapName, Status,Metadata = Parameters_getMetadataList_s, Category, CorrelationId, start_t, end_t
| project-away ParentPL_ID, ParentPL_Name, Category, CorrelationId
Here, in the line
extend dbMapName = tostring(parse_json(Parameters_getMetadataList_s)[0].dbMapName)
I retrieve the 0th element by default, but I would like to retrieve all elements in sequence. Can somebody suggest how to achieve this?
bag_keys() is just what you need.
For example, take a look at this query:
datatable(myjson: dynamic) [
dynamic({"a": 123, "b": 234, "c": 345}),
dynamic({"dd": 123, "ee": 234, "ff": 345})
]
| project keys = bag_keys(myjson)
Its output is:
|---------|
| keys |
|---------|
| [ |
| "a", |
| "b", |
| "c" |
| ] |
|---------|
| [ |
| "dd", |
| "ee", |
| "ff" |
| ] |
|---------|
If you want to have every key in a separate row, use mv-expand, like this:
datatable(myjson: dynamic) [
dynamic({"a": 123, "b": 234, "c": 345}),
dynamic({"dd": 123, "ee": 234, "ff": 345})
]
| project keys = bag_keys(myjson)
| mv-expand keys
The output of this query will be:
|------|
| keys |
|------|
| a |
| b |
| c |
| dd |
| ee |
| ff |
|------|
The extend and mv-expand operators help in resolving this kind of scenario.
Solution:
extend rows = parse_json(Parameters_getMetadataList_s)
| mv-expand rows
| project Parameters_LOAD_GROUP_s, rows
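For illustration, a self-contained sketch with made-up data (the column name follows the question; the JSON content is invented) showing how mv-expand yields one row per array element instead of only the 0th:
datatable(Parameters_getMetadataList_s: string) [
    '[{"dbMapName":"map1"},{"dbMapName":"map2"},{"dbMapName":"map3"}]'
]
| extend rows = parse_json(Parameters_getMetadataList_s)
| mv-expand rows                               // one output row per element
| project dbMapName = tostring(rows.dbMapName)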
I have this pattern and I want the grok filter for this:
24 May 2016 23:04:03,003 [] [] [] INFO [listenerContainer-35] com.newworld.mmp.orderlist.NewDataUtil - | 1464048002998 | 201605233157123319 | Account | 6757862509896 | DHW | 2016-05-23T23:59:56.621Z | 2016-05-24T00:00:02.676Z | STARTED PROCESSING
I wrote the pattern but it is incomplete:
%{MONTHDAY} %{MONTH} 20%{YEAR} %{HOUR}:?%{MINUTE}(?::?%{SECOND}) %{DATA:junk} %{DATA:junk} %{DATA:junk} %{LOGLEVEL:level} %{DATA:junk1} %{JAVACLASS:class}
Neither %{POSINT:mynewint} nor %{NUMBER:mynewint} works for the 1464048002998 field, as in:
%{MONTHDAY} %{MONTH} 20%{YEAR} %{HOUR}:?%{MINUTE}(?::?%{SECOND}) %{DATA:junk} %{DATA:junk} %{DATA:junk} %{LOGLEVEL:level} %{DATA:junk1} %{JAVACLASS:class}- | %{POSINT:mynewint}
I need help with this and with the complete grok expression.
Your log line:
24 May 2016 23:04:03,003 [] [] [] INFO [listenerContainer-35] com.newworld.mmp.orderlist.NewDataUtil - | 1464048002998 | 201605233157123319 | Account | 6757862509896 | DHW | 2016-05-23T23:59:56.621Z | 2016-05-24T00:00:02.676Z | STARTED PROCESSING
SAMPLE GROK PATTERN that matches your log record:
%{MONTHDAY:MonthDay} %{MONTH:Month} %{YEAR:Year} %{TIME:Time} \[] \[] \[] %{LOGLEVEL:LogLevel} %{NOTSPACE:ListenerContainer} %{JAVACLASS:JavaClass} - \| %{NUMBER:Number1} \| %{NUMBER:Number2} \| %{WORD:Account} \| %{NUMBER:Number3} \| %{WORD:DHW} \| %{TIMESTAMP_ISO8601:Timestamp1} \| %{TIMESTAMP_ISO8601:Timestamp2} \| %{JAVALOGMESSAGE:LogMessage}
This will give output fields as follows:
MonthDay = 24
Month = May
Year = 2016
Time = 23:04:03,003
LogLevel = INFO
ListenerContainer = [listenerContainer-35]
JavaClass = com.newworld.mmp.orderlist.NewDataUtil
Number1 = 1464048002998
Number2 = 201605233157123319
Account = Account
Number3 = 6757862509896
DHW = DHW
Timestamp1 = 2016-05-23T23:59:56.621Z
Timestamp2 = 2016-05-24T00:00:02.676Z
LogMessage = STARTED PROCESSING
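If you are running this in Logstash, a minimal filter block wrapping that pattern might look like the following sketch (the source field name "message" is the usual default; adjust it to your pipeline):
filter {
  grok {
    match => {
      "message" => "%{MONTHDAY:MonthDay} %{MONTH:Month} %{YEAR:Year} %{TIME:Time} \[] \[] \[] %{LOGLEVEL:LogLevel} %{NOTSPACE:ListenerContainer} %{JAVACLASS:JavaClass} - \| %{NUMBER:Number1} \| %{NUMBER:Number2} \| %{WORD:Account} \| %{NUMBER:Number3} \| %{WORD:DHW} \| %{TIMESTAMP_ISO8601:Timestamp1} \| %{TIMESTAMP_ISO8601:Timestamp2} \| %{JAVALOGMESSAGE:LogMessage}"
    }
  }
}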
You can build and test your own grok filters on the following site: http://grokconstructor.appspot.com/do/construction
I have a very simple grammar to parse statements.
Here are examples of the type of statements that need to be parsed:
a.b.c
a.b.c == "88"
The issue I am having is that array notation is not matching. For example, things that are not working:
a.b[0].c
a[3][4]
I hope someone can point out what I am doing wrong here. (I am testing in ANTLRWorks)
Here is the grammar (generationUnit is my entry point):
grammar RatBinding;
generationUnit: testStatement | statement;
arrayAccesor : identifier arrayNotation+;
arrayNotation: '[' Number ']';
testStatement:
(statement | string | Number | Bool )
(greaterThanAndEqual
| lessThanOrEqual
| greaterThan
| lessThan | notEquals | equals)
(statement | string | Number | Bool )
;
part: identifier | arrayAccesor;
statement: part ('.' part )*;
string: ('"' identifier '"') | ('\'' identifier '\'');
greaterThanAndEqual: '>=';
lessThanOrEqual: '<=';
greaterThan: '>';
lessThan: '<';
notEquals : '!=';
equals: '==';
identifier: Letter (Letter|Digit)*;
Bool : 'true' | 'false';
ArrayLeft: '\u005B';
ArrayRight: '\u005D';
Letter
: '\u0024' |
'\u0041'..'\u005a' |
'\u005f' |
'\u0061'..'\u007a' |
'\u00c0'..'\u00d6' |
'\u00d8'..'\u00f6' |
'\u00f8'..'\u00ff' |
'\u0100'..'\u1fff' |
'\u3040'..'\u318f' |
'\u3300'..'\u337f' |
'\u3400'..'\u3d2d' |
'\u4e00'..'\u9fff' |
'\uf900'..'\ufaff'
;
Digit
: '\u0030'..'\u0039' |
'\u0660'..'\u0669' |
'\u06f0'..'\u06f9' |
'\u0966'..'\u096f' |
'\u09e6'..'\u09ef' |
'\u0a66'..'\u0a6f' |
'\u0ae6'..'\u0aef' |
'\u0b66'..'\u0b6f' |
'\u0be7'..'\u0bef' |
'\u0c66'..'\u0c6f' |
'\u0ce6'..'\u0cef' |
'\u0d66'..'\u0d6f' |
'\u0e50'..'\u0e59' |
'\u0ed0'..'\u0ed9' |
'\u1040'..'\u1049'
;
WS : [ \r\t\u000C\n]+ -> channel(HIDDEN)
;
You referenced the non-existent rule Number in the arrayNotation parser rule.
A Digit rule does exist in the lexer, but it will only match a single-digit number. For example, 1 is a Digit, but 10 is two separate Digit tokens, so a[10] won't match the arrayAccesor rule. You probably want to resolve this in two parts:
Create a Number token consisting of one or more digits.
Number
: Digit+
;
Mark Digit as a fragment rule to indicate that it doesn't form tokens on its own, but is merely intended to be referenced from other lexer rules.
fragment // prevents a Digit token from being created on its own
Digit
: ...
You will not need to change arrayNotation because it already references the Number rule you created here.
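To verify the fix, here is a small hypothetical ANTLR 4 test harness; it assumes the RatBindingLexer and RatBindingParser classes generated from this grammar are on the classpath:
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;

public class RatBindingTest {
    public static void main(String[] args) {
        // "a.b[10].c" previously failed because "10" lexed as two Digit tokens.
        for (String input : new String[] { "a.b[0].c", "a[3][4]", "a.b[10].c" }) {
            RatBindingLexer lexer = new RatBindingLexer(CharStreams.fromString(input));
            RatBindingParser parser = new RatBindingParser(new CommonTokenStream(lexer));
            // Print the parse tree starting at the entry rule.
            System.out.println(parser.generationUnit().toStringTree(parser));
        }
    }
}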
Bah, waste of space. I used Number instead of Digit in my array declaration.