How to Generate Counts of Elements Returned from Map Function? - couchdb

I have a map function
function (doc) {
    for (var n = 0; n < doc.Observations.length; n++) {
        emit(doc.Scenario, doc.Observations[n].Label);
    }
}
The above returns the following:
{"key":"Splunk","value":"Organized"},
{"key":"Splunk","value":"Organized"},
{"key":"Splunk","value":"Organized"},
{"key":"Splunk","value":"Generate"},
{"key":"Splunk","value":"Ingest"}
I'm looking to design a reduce function that will return the counts of the above values, something akin to:
Organized: 3
Generate: 1
Ingest: 1
My map function has to filter on my Scenario field, which is why I emit it as the key.
I've tried a number of the built-in reduce functions, but I end up getting a count of rows, or nothing at all, as the available functions don't apply.
I just need the counts of each of the elements that appear in the value field. Also, the values shown here are representative; there could be hundreds of distinct values in that field, for what that's worth.
I really appreciate the help!
Here's sample input:
{
  "_id": "dummyId",
  "test": "test",
  "Team": "Alpha",
  "CreatedOnUtc": "2019-06-20T21:39:09.5940830Z",
  "CreatedOnLocal": "2019-06-20T17:39:09.5940830-04:00",
  "Participants": [
    {
      "Name": "A",
      "Role": "Person"
    }
  ],
  "Observations": [
    {
      "Label": "Report"
    },
    {
      "Label": "Ingest"
    },
    {
      "Label": "Generate"
    },
    {
      "Label": "Ingest"
    }
  ]
}

You can make the emitted value part of the key and associate an increment with that key so a count is maintained; grouping the reduced view then returns exactly the counts you are asking for.
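For example, a minimal sketch of that idea (assuming CouchDB's built-in _sum reduce): emit a compound [Scenario, Label] key with 1 as the value:
function (doc) {
    // one row per observation, keyed by [Scenario, Label]
    for (var n = 0; n < doc.Observations.length; n++) {
        emit([doc.Scenario, doc.Observations[n].Label], 1);
    }
}
With the reduce function set to the built-in _sum, querying the view with group_level=2 returns one row per [Scenario, Label] pair, e.g. {"key":["Splunk","Organized"],"value":3}, and you can still filter on Scenario with a key range such as startkey=["Splunk"]&endkey=["Splunk",{}].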

Related

Groovy: How do I iterate through a map to create a new map with values based on a specific condition

I am in no way an expert with Groovy, so please don't hold that against me.
I have JSON that looks like this:
{
  "metrics": [
    {
      "name": "metric_a",
      "help": "This tracks your A stuff.",
      "type": "GAUGE",
      "labels": [
        "pool"
      ],
      "unit": "",
      "aggregates": [],
      "meta": [
        {
          "category": "CAT A",
          "deployment": "environment-a"
        }
      ],
      "additional_notes": "Some stuff (potentially)"
    },
    ...
  ]
  ...
}
I'm using it as a source for automated documentation of all the metrics. So I'm iterating through it in various ways to get the information I need, and so far so good, I'm most of the way there. The problem is that this all needs to be organized per deployment environment, meaning multiple metrics will share the same value for deployment.
My thought was that I could create a map with deployment as the key, and as the value the name of every metric with a matching deployment. Once I have that map, it should be easy for me to organize things the way they should be, but I can't figure out how to build it. The result is that all the metric names get added to every key, which is expected since I'm not doing anything to filter them out. I was thinking that groupBy would make sense here, but I can't figure out how to use it effectively, and frankly I'm not sure it will solve my problem by itself. Here is my code so far:
parentChild = [:]
children = []
metrics.each { metric ->
    def metricName = metric.name
    def depName = metric.meta.findResult{ it.deployment }
    children.add(metricName)
    parentChild.put(depName, children)
}
What is the best way to create a new map where the values for each key are based off a specific condition?
EDIT: The desired result would be each key in the resulting map would be a unique deployment value from all the metrics (as a string). Each value would be name of each metric that contains that deployment (as an array).
[environment-a:
[metric_a,metric_b,metric_c,...],
environment-b:
[metric_d,metric_e,metric_f,...]
...]
I would use a combo of withDefault() to pre-fill each map-entry value with a fresh TreeSet instance (a sorted, no-duplicates set) and a standard inject().
I reduced your sample data to the bare minimum and added some new nodes:
import groovy.json.*
String input = '''\
{
  "metrics": [
    {
      "name": "metric_a",
      "meta": [
        {
          "deployment": "environment-a"
        }
      ]
    },
    {
      "name": "metric_b",
      "meta": [
        {
          "deployment": "environment-a"
        }
      ]
    },
    {
      "name": "metric_c",
      "meta": [
        {
          "deployment": "environment-a"
        },
        {
          "deployment": "environment-b"
        }
      ]
    },
    {
      "name": "metric_d",
      "meta": [
        {
          "deployment": "environment-b"
        }
      ]
    }
  ]
}'''
def json = new JsonSlurper().parseText input
def groupedByDeployment = json.metrics.inject( [:].withDefault{ new TreeSet() } ){ res, metric ->
    metric.meta.each{ res[ it.deployment ] << metric.name }
    res
}
assert groupedByDeployment.toString() == '[environment-a:[metric_a, metric_b, metric_c], environment-b:[metric_c, metric_d]]'
If your metrics.meta array is supposed to have a single value, you can simplify the code by replacing the line:
metric.meta.each{ res[ it.deployment ] << metric.name }
with
res[ metric.meta.first().deployment ] << metric.name

Moving specific Index Data into a new Index within Elasticsearch

I have several million docs that I need to move into a new index, but there is a condition on which docs should flow into the index. Say I have a field named offsets that needs to be queried against. The values I need to query for are: [1,7,99,32, ....., 10000432] (a very large list) in the offsets field.
Does anyone have thoughts on how I can move the specific docs with those values in the list into a new Elasticsearch index? My first thought was reindexing with a query, but there is no pattern to the offsets list.
Would it be a Python loop appending each doc to a new index? Looking for any guidance.
Thanks
Are the documents really large, or can you add them into a jsonl file for bulk ingestion?
In what form is the selector list, the one shown as "[1,7,99,32, ....., 10000432]"?
I'd do it in Pandas, but here is an idea in ES parlance.
Whatever you do, use the _bulk API, or the job will never finish.
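For reference, a _bulk request body is newline-delimited JSON alternating an action line and a document line, which is why jsonl is the natural intermediate format. A minimal sketch with made-up documents:
POST _bulk
{ "index": { "_index": "dest" } }
{ "offsets": 1 }
{ "index": { "_index": "dest" } }
{ "offsets": 7 }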
You can put all the ids into a query body and run it against the index, for example with an ids query:
GET my_index/_search
{
  "query": {
    "ids": {
      "values": ["1", "4", "100"]
    }
  }
}
and write the hits out as jsonl to ingest.
You can do the above for the reindex API.
{
  "source": {
    "index": "source",
    "query": {
      "match": {
        "company": "cat"
      }
    }
  },
  "dest": {
    "index": "dest",
    "routing": "=cat"
  }
}
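Combining the two ideas, and assuming the values live in a field called offsets (the name is taken from the question), a single reindex with a terms query should move everything in one call instead of one call per value:
POST _reindex
{
  "source": {
    "index": "source",
    "query": {
      "terms": {
        "offsets": [1, 7, 99, 32, 10000432]
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}
By default a terms query accepts up to 65,536 values (the index.max_terms_count setting), so a very large list may need to be split into chunks.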
Unfortunately, I was facing a time crunch, and had to throw in a personalized loop to query a very specific subset of docs:
df = pd.read_csv('C://code//part_1_final.csv')
offsets = df['OFFSET'].tolist()
# Offsets are the "unique" values I need to identify the docs by.
# There is no pattern in these values, thus I must go one by one.
missedDocs = []
for i in offsets:
    print(i)
    try:
        client.reindex({
            "source": {
                "index": "<source_index>",
                "query": {
                    "bool": {
                        "must": [
                            { "match": { "<index_field_1>": "1" } },
                            { "match": { "<index_field_that_needs_values_to_match>": i } }
                        ]
                    }
                }
            },
            "dest": {
                "index": "<dest_index>"
            }
        })
    except KeyError:
        print('error')
        # missedDocs.append(query)
        print('DOC ERROR')

Power Automate FIlter Array with Array Object as Attribute

I have an Object-Array1 with some attributes that are Object-Array2. I want to filter my Object-Array1 down to only those elements that contain a special value in Object-Array2. How do I do this? Example:
{
  "value": [
    {
      "title": "aaa",
      "ID": 1,
      "Responsible": [
        {
          "EMail": "abc#def.de",
          "Id": 1756
        },
        {
          "EMail": "xyz#xyz.com",
          "Id": 289
        }
      ]
    },
    {
      "title": "bbbb",
      "ID": 2,
      "Responsible": [
        {
          "EMail": "tzu#iop.de",
          "Id": 1756
        }
      ]
    }
  ]
}
I want to filter my Object-Array1 (with title & ID) down to only those elements that contain abc#def.de.
How do I do this in Power Automate with the "Filter Array" action? I tried it one way, but it didn't work.
Firstly, you haven't entered an expression, you've entered text. That will never work.
Secondly, even if you did set that as an expression, I don't think you'll be able to make it work over an array, at least, not without specifying more properties and making it a little more complex.
I think the easiest way is to use a contains statement after turning the item into a string. The expression I am using on the left-hand side is:
string(item()?['Responsible'])
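Putting that together, the whole condition in the Filter Array action's advanced mode would look something like this (a sketch; the address is the one from the question):
@contains(string(item()?['Responsible']), 'abc#def.de')
Each item whose stringified Responsible array contains that address passes the filter, which leaves only the Object-Array1 elements you want.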

CosmosDB sort results by a value into an array

I have some CosmosDB documents like the following:
{
  "ProductId": 1,
  "Status": true,
  "Code": "123456",
  "IsRecall": false,
  "ScanLog": [
    {
      "Location": {
        "type": "Point",
        "coordinates": [
          13.5957758,
          42.7111538
        ]
      },
      "TimeStamp": 201602160957190600,
      "ScanType": 0,
      "UserId": "1004"
    },
    {
      "Location": {
        "type": "Point",
        "coordinates": [
          13.5957907,
          42.7111359
        ]
      },
      "TimeStamp": 201602161246336640,
      "ScanType": 0,
      "UserId": "1004"
    }
  ]
}
How can I order the query results by the TimeStamp property? I've tried using this query
SELECT c.Code, b.TimeStamp FROM c JOIN b IN c.ScanLog ORDER BY b.TimeStamp
but I receive this error
Order-by over correlated collections is not supported.
What is the correct way to do this?
JOINs with ORDER BY are currently not supported.
However, here is a user defined function (UDF) that will do the trick:
function sortScanLog (scanLog) {
    function compareTimeStamps(a, b) {
        return a.TimeStamp - b.TimeStamp;
    }
    return scanLog.sort(compareTimeStamps);
}
You use it with a query like this:
SELECT c.ProductId, udf.sortScanLog(c.ScanLog) as ScanLog FROM c
If you want the opposite sort order, simply swap the a and b. So, the signature of the compareTimeStamps inner function would be:
function compareTimeStamps(b, a)
Alternatively, you can sort client-side after the results are returned.
Right now, ORDER BY clauses mixed with JOINs are not supported: the engine can look at indexed properties for JOIN operations but cannot re-order results based on the JOIN result.
You'd have to go with something like Larry offered, or do the JOIN in the query and sort in your own code once the results arrive; if you use C#, you can sort them with LINQ, for example.
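For example, if you drop the ORDER BY, fetch the documents, and sort in application code, the per-document sort is a one-liner (a sketch in JavaScript; results is assumed to be the array of returned documents):
// sort each returned document's ScanLog ascending by TimeStamp
results.forEach(function (doc) {
    doc.ScanLog.sort(function (a, b) { return a.TimeStamp - b.TimeStamp; });
});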

Performing a query on the lowest level of a tree-structured Dojo store

Let's say we have a nested data structure like so:
[
{
"name": "fruits",
"items": [
{ "name": "apple" ...}
{ "name": "lemon" ...}
{ "name": "peach" ...}
]
}
{
"name": "veggies",
"items": [
{ "name": "carrot" ...}
{ "name": "cabbage" ...}
]
}
{
"name": "meat",
"items": [
{ "name": "steak" ...}
{ "name": "pork" ...}
]
}
]
The above data is placed in a dojo/store/Memory. I want to perform a query for items that contain the letter "c", but only on the lower level (I don't want to query the categories).
With a generic dojo/store/Memory, its query function only applies a filter on the top level, so the code
store.query(function(item) {
    return item.name.indexOf("c") != -1;
});
will only perform the query on the category names (fruits, veggies, etc) instead of the actual items.
Is there a straight-forward way to perform this query on the child nodes, and if there's a match, return the matching children as well as the parent? For instance, the "c" query would return the "fruits" node with its "peach" child only, "veggies" would remain intact, and "meat" would be left out of the query results entirely.
You can of course define your own filter function in the store's query method. I haven't verified that this code runs perfectly, but the idea is to keep only categories with at least one matching child, then map each one to a copy holding only its matching children (d_array here is dojo/_base/array):
var results = store.query(function(item) {
    // keep only categories that have at least one matching child
    return d_array.some(item.items, function(child) {
        return child.name.indexOf("c") != -1;
    });
}).map(function(item) {
    // return a copy of the parent holding only the matching children
    return {
        name: item.name,
        items: d_array.filter(item.items, function(child) {
            return child.name.indexOf("c") != -1;
        })
    };
});
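The returned results can then be consumed as usual, for example (assuming the results variable from the sketch above):
results.forEach(function(category) {
    console.log(category.name, category.items);
});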
Hope this helps.
