I searched some prior threads, but can't quite figure out what I'm doing wrong in my N1QL update query for Couchbase. The query I'm running to select my target updates works fine:
SELECT p FROM `NoSQLDB` AS c UNNEST c.phones as p
WHERE c.type='com.model' AND p.typeCode != 'B_BUS'
this provides the following results:
[
  {
    "p": {
      "type": "com.model.Phone",
      "typeCode": "H_HOME"
    }
  },
  {
    "p": {
      "type": "com.model.Phone",
      "typeCode": "H_HOME"
    }
  }
]
This is the data I want to update. However, when I run the following update statement to change typeCode, it executes but doesn't actually change anything:
UPDATE `NoSQLDB` AS c SET p.typeCode = 'B_BUS'
FOR p WITHIN c.phones WHEN c.type='com.model' AND
p.typeCode != 'B_BUS' END LIMIT 1
Thinking the parent type might not be available, I also tried the following, which did nothing as well:
UPDATE `NoSQLDB` AS c SET p.typeCode = 'B_BUS'
FOR p WITHIN c.phones WHEN p.type='com.model.Phone' AND
p.typeCode != 'B_BUS' END LIMIT 1
In case it's of any relevance, the parent JSON looks like the following:
[
  {
    "NoSQLDB": {
      "id": "01234",
      "keyType": "BANANA",
      "phones": [
        {
          "type": "com.model.Phone",
          "typeCode": "H_HOME"
        },
        {
          "type": "com.model.Phone",
          "typeCode": "B_BUS"
        }
      ],
      "type": "com.model",
      "updatedTimestamp": "2021-03-24T17:53:52.997+0000"
    }
  }
]
Any insight on where I went astray is greatly appreciated; thank you in advance!
All - Thanks to anyone who was reviewing, but one of my team members figured it out. In case anyone ever happens across this issue, please see the update query that worked (the key differences: the document-level filter on c.type moves out of the WHEN condition into a WHERE clause, and the array is referenced as plain phones with FOR p IN):
UPDATE `NoSQLDB` AS c
SET p.typeCode = 'B_BUS' FOR p IN phones WHEN p.typeCode != 'B_BUS' END
WHERE c.type = 'com.model'
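To verify the change in the same round trip, N1QL's UPDATE also accepts a RETURNING clause; a minimal sketch against the same keyspace as above:
UPDATE `NoSQLDB` AS c
SET p.typeCode = 'B_BUS' FOR p IN phones WHEN p.typeCode != 'B_BUS' END
WHERE c.type = 'com.model'
RETURNING c.phones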
I am in no way an expert with Groovy, so please don't hold that against me.
I have JSON that looks like this:
{
  "metrics": [
    {
      "name": "metric_a",
      "help": "This tracks your A stuff.",
      "type": "GAUGE",
      "labels": [
        "pool"
      ],
      "unit": "",
      "aggregates": [],
      "meta": [
        {
          "category": "CAT A",
          "deployment": "environment-a"
        }
      ],
      "additional_notes": "Some stuff (potentially)"
    },
    ...
  ]
  ...
}
I'm using it as a source for automated documentation of all the metrics. So, I'm iterating through it in various ways to get the information I need. So far so good, I'm most of the way there. The problem is this all needs to be organized per the deployment environment. Meaning, multiple metrics will share the same value for deployment.
My thought was I could create a map with deployment as the key and, as the value, the names of all metrics with a matching deployment. Once I have that map, it should be easy for me to organize things the way they should be. I can't figure out how to do that. The result of my current code is that all the metric names are added under every key, which is expected since I'm not doing anything to filter them out. I was thinking that groupBy would make sense here, but I can't figure out how to use it effectively and, frankly, I'm not sure it will solve my problem by itself. Here is my code so far:
parentChild = [:]
children = []
metrics.each { metric ->
    def metricName = metric.name
    def depName = metric.meta.findResult{ it.deployment }
    children.add(metricName)
    parentChild.put(depName, children)
}
What is the best way to create a new map where the values for each key are based off a specific condition?
EDIT: The desired result would be that each key in the resulting map is a unique deployment value from all the metrics (as a string). Each value would be the name of each metric that contains that deployment (as an array).
[environment-a: [metric_a, metric_b, metric_c, ...],
 environment-b: [metric_d, metric_e, metric_f, ...],
 ...]
I would use a combo of withDefault() to pre-fill each map-entry value with a fresh TreeSet-instance (sorted no-duplicates set) and standard inject().
I reduced your sample data to the bare minimum and added some new nodes:
import groovy.json.*
String input = '''\
{
  "metrics": [
    {
      "name": "metric_a",
      "meta": [
        {
          "deployment": "environment-a"
        }
      ]
    },
    {
      "name": "metric_b",
      "meta": [
        {
          "deployment": "environment-a"
        }
      ]
    },
    {
      "name": "metric_c",
      "meta": [
        {
          "deployment": "environment-a"
        },
        {
          "deployment": "environment-b"
        }
      ]
    },
    {
      "name": "metric_d",
      "meta": [
        {
          "deployment": "environment-b"
        }
      ]
    }
  ]
}'''
def json = new JsonSlurper().parseText input
def groupedByDeployment = json.metrics.inject( [:].withDefault{ new TreeSet() } ){ res, metric ->
    metric.meta.each{ res[ it.deployment ] << metric.name }
    res
}
assert groupedByDeployment.toString() == '[environment-a:[metric_a, metric_b, metric_c], environment-b:[metric_c, metric_d]]'
If your metrics.meta array is supposed to have a single value, you can simplify the code by replacing the line:
metric.meta.each{ res[ it.deployment ] << metric.name }
with
res[ metric.meta.first().deployment ] << metric.name
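For completeness, since the question mentions groupBy: an equivalent sketch using collectMany plus groupBy instead of inject (reusing the json variable parsed above):
def byDeployment = json.metrics.
    collectMany{ m -> m.meta.collect{ [dep: it.deployment, name: m.name] } }.
    groupBy{ it.dep }.
    collectEntries{ dep, pairs -> [dep, pairs.name as TreeSet] }
assert byDeployment.toString() == '[environment-a:[metric_a, metric_b, metric_c], environment-b:[metric_c, metric_d]]'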
I have several million docs that I need to move into a new index, but there is a condition on which docs should flow into the index. Say I have a field named offsets that needs to be queried against. The values I need to query for are: [1,7,99,32, ....., 10000432] (a very large list) in the offsets field.
Does anyone have thoughts on how I can move the specific docs with those values into a new Elasticsearch index? My first thought was reindexing with a query, but there is no pattern to the offsets list.
Would it be a python loop appending each doc to a new index? Looking for any guidance.
Thanks
Are the documents really large, or can you add them into a jsonl file for bulk ingestion?
In what form is the selector list, the one shown as "[1,7,99,32, ....., 10000432]"?
I'd do it in Pandas, but here is an idea in ES parlance.
Whatever you do, do use the _bulk API, or the job will never finish.
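For reference, the _bulk API consumes newline-delimited JSON, one action line followed by one document line; the index name and documents below are hypothetical:
{ "index": { "_index": "new_index", "_id": "1" } }
{ "offsets": 1 }
{ "index": { "_index": "new_index", "_id": "7" } }
{ "offsets": 7 }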
You can run a query based upon a file, as per:
GET my_index/_search?_file="myquery_file"
You can put all the ids into a file, myquery_file, as below:
{
  "query": {
    "ids": {
      "values": ["1", "4", "100"]
    }
  },
  "format": "jsonl"
}
and output as jsonl to ingest.
You can do the same for the reindex API:
{
  "source": {
    "index": "source",
    "query": {
      "match": {
        "company": "cat"
      }
    }
  },
  "dest": {
    "index": "dest",
    "routing": "=cat"
  }
}
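Since there is no pattern to the offsets, another option is to enumerate them in a single terms query inside the reindex body (index and field names here are assumptions). Note that very long term lists hit the index.max_terms_count limit (65,536 by default), so the list may need to be split into chunks, or indexed into a document and referenced via a terms lookup:
POST _reindex
{
  "source": {
    "index": "source",
    "query": {
      "terms": {
        "offsets": [1, 7, 99, 32, 10000432]
      }
    }
  },
  "dest": {
    "index": "dest"
  }
}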
Unfortunately, I was facing a time crunch and had to throw in a personalized loop to query a very specific subset of indices.
import pandas as pd
from elasticsearch import Elasticsearch

client = Elasticsearch()  # assumption: an already-configured client

df = pd.read_csv('C://code//part_1_final.csv')
offsets = df['OFFSET'].tolist()
# Offsets are the "unique" values I need to identify the docs by.
# There is no pattern in these values, thus I must go one by one..
missedDocs = []
for i in offsets:
    print(i)
    try:
        client.reindex({
            "source": {
                "index": "<source_index>",
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"<index_field_1>": "1"}},
                            {"match": {"<field_that_needs_values_to_match>": i}}
                        ]
                    }
                }
            },
            "dest": {
                "index": "<dest_index>"
            }
        })
    except KeyError:
        print('error')
        # missedDocs.append(query)
        print('DOC ERROR')
Thanks for taking the time out to read this. I want to find a way of parsing the json below. I'm really struggling to get the correct values out. I am getting this info from an API, and want to save this data into a database.
I am really struggling to parse info_per_type because I first need to get the available_types. These can change depending on the info available (i.e. I might get 2 different types in the next call; there's a total of 4), so my code needs to be flexible enough to deal with this:
```
{
  "data": [
    {
      "home_team": "Ravenna",
      "id": 82676,
      "available_types": [
        "type_a",
        "type_b"
      ],
      "info_per_type": {
        "type_a": {
          "options": {
            "X": 0.302,
            "X2": 0.61,
            "X3": 0.692,
            "X4": 0.698,
            "X5": 0.39,
            "X6": 0.308
          },
          "status": "pending",
          "output": "12",
          "option_values": {
            "X": 3.026,
            "X2": 1.347,
            "X3": 1.516,
            "X4": 1.316,
            "X5": 2.936,
            "X6": 2.339
          }
        },
        "type_b": {
          "options": {
            "yes": 0.428,
            "no": 0.572
          },
          "status": "pending",
          "output": "no",
          "option_values": {
            "yes": null,
            "no": null
          }
        }
      }
    }
  ]
}
```
So far, I can get the available_types out. But after that, I'm stuck. I have tried eval and exec but I can't seem to get that working either.
```
r = requests.get(url, headers=headers).text
arrDetails = json.loads(r)
arrDetails = arrDetails['data']
x = arrDetails[0]['available_types']
print(x[1]) #I get the correct value here
y = exec("y = arrDetails[0]['info_per_type']['" + x[1] + "']")
print(y)
```
When I print out y I get None. What I want is some way to reference that part of the json file, as the results within that node are what I need. Any help would be HIGHLY appreciated!
Something like this should work. There's no need for eval or exec here (exec() always returns None, which is why y prints as None); you can index straight into the parsed structure:
for row in arrDetails['data']:
    for available_type in row['available_types']:
        print(row['info_per_type'][available_type])
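If the end goal is database rows, a minimal sketch that flattens each available type into tuples, reusing arrDetails from the question (the column layout is just an assumption):
rows = []
for row in arrDetails['data']:
    for t in row['available_types']:
        info = row['info_per_type'][t]
        for opt, prob in info['options'].items():
            # one tuple per option: (record id, type, status, output, option, probability, value)
            rows.append((row['id'], t, info['status'], info['output'],
                         opt, prob, info['option_values'].get(opt)))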
I have a map function:
function (doc) {
    for (var n = 0; n < doc.Observations.length; n++) {
        emit(doc.Scenario, doc.Observations[n].Label);
    }
}
The above returns the following:
{"key":"Splunk","value":"Organized"},
{"key":"Splunk","value":"Organized"},
{"key":"Splunk","value":"Organized"},
{"key":"Splunk","value":"Generate"},
{"key":"Splunk","value":"Ingest"}
I"m looking to design a reduce function that will then return the counts of the above values, something akin to:
Organized: 3
Generate: 1
Ingest: 1
My map function has to filter on my Scenario field, which is why I have it as the emitted key in the map function.
I've tried using a number of the built-in reduce functions, but I end up getting a count of rows, or nothing at all, as the available functions don't apply.
I just need the counts of each of the elements that appear in the values field. Also, the values present here are representative; there could be 100s of different values in the values field, for what that's worth.
I really appreciate the help!
Here's sample input:
{
  "_id": "dummyId",
  "test": "test",
  "Team": "Alpha",
  "CreatedOnUtc": "2019-06-20T21:39:09.5940830Z",
  "CreatedOnLocal": "2019-06-20T17:39:09.5940830-04:00",
  "Participants": [
    {
      "Name": "A",
      "Role": "Person"
    }
  ],
  "Observations": [
    {
      "Label": "Report"
    },
    {
      "Label": "Ingest"
    },
    {
      "Label": "Generate"
    },
    {
      "Label": "Ingest"
    }
  ]
}
You can emit the "value" as your key and associate an increment with that key so that a count is maintained. Then the view output should look the way you are requesting.
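A minimal sketch of that idea, keeping Scenario in the key so it can still be filtered on: emit a composite key with a 1, and pair it with the built-in _sum (or _count) reduce:
// map
function (doc) {
    for (var n = 0; n < doc.Observations.length; n++) {
        emit([doc.Scenario, doc.Observations[n].Label], 1);
    }
}
// reduce (built-in)
_sum
Querying the view with group_level=2 (e.g. ?group_level=2&startkey=["Splunk"]&endkey=["Splunk",{}]) then returns one row per Scenario/Label pair with its count, which gives Organized: 3, Generate: 1, Ingest: 1 for the sample above.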
I'm trying to execute some aggregate queries against data in TSI. For example:
{
  "searchSpan": {
    "from": "2018-08-25T00:00:00Z",
    "to": "2019-01-01T00:00:00Z"
  },
  "top": {
    "sort": [
      {
        "input": {
          "builtInProperty": "$ts"
        }
      }
    ]
  },
  "aggregates": [
    {
      "dimension": {
        "uniqueValues": {
          "input": {
            "builtInProperty": "$esn"
          },
          "take": 100
        }
      },
      "measures": [
        {
          "count": {}
        }
      ]
    }
  ]
}
The above query, however, does not return any records, although there are many events stored in TSI for that specific searchSpan. Here is the response:
{
  "warnings": [],
  "events": []
}
The query is based on the examples in the documentation, which can be found here, and which is actually lacking crucial information on requirements; even some of the examples do not work...
Any help would be appreciated. Thanks!
@Vladislav,
I'm sorry to hear you're having issues. In reviewing your API call, I see two fixes that should help remedy this issue:
1) It looks like you're using our /events API with a payload meant for the /aggregates API. Notice the "events" in the response. Additionally, "top" will be redundant for the /aggregates API, as we don't support a top-level limit clause there.
2) We do not enforce the "count" property to be present in the limit clause ("take", "top" or "sample"), and it looks like you did not specify it, so by default the value was set to 0; that's why the call is returning 0 events.
I would recommend that you use the /aggregates API rather than /events, and that "count" is specified in the limit clause to ensure you get some data back.
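Putting both points together, a sketch of the corrected request; the environment FQDN is a placeholder and the api-version may differ for your environment:
POST https://<environmentFqdn>/aggregates?api-version=2016-12-12
{
  "searchSpan": {
    "from": "2018-08-25T00:00:00Z",
    "to": "2019-01-01T00:00:00Z"
  },
  "aggregates": [
    {
      "dimension": {
        "uniqueValues": {
          "input": {
            "builtInProperty": "$esn"
          },
          "take": 100
        }
      },
      "measures": [
        {
          "count": {}
        }
      ]
    }
  ]
}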
Additionally, I'll note your feedback on documentation. We are ramping up a new hire on documentation now, so we hope to improve the quality soon.
I hope this helps!
Andrew