I am trying to rename nested fields in Elasticsearch documents while migrating to Amazon Elasticsearch.
In each document, I want to make the following changes:
1. If the value field holds JSON, rename value to value-keyword and remove "value-whitespace" and "value-standard" if present.
2. If the value field is longer than 15 characters, rename value to value-standard.
"_source": {
"applicationid" : "appid",
"interactionId": "716bf006-7280-44ea-a52f-c79da36af1c5",
"interactionInfo": [
{
"value": """{"edited":false}""",
"value-standard": """{"edited":false}""",
"value-whitespace" : """{"edited":false}"""
"title": "msgMeta"
},
{
"title": "msg",
"value": "hello testing",
},
{
"title": "testing",
"value": "I have a text that can be done and changed only the size exist more than 20 so we applied value-standard ",
}
],
"uniqueIdentifier": "a21ed89c-b634-4c7f-ca2c-8be6f31ae7b3",
}
}
The end result should be:
"_source": {
"applicationid" : "appid",
"interactionId": "716bf006-7280-44ea-a52f-c79da36af1c5",
"interactionInfo": [
{
"value-keyword": """{"edited":false}""",
"title": "msgMeta"
},
{
"title": "msg",
"value": "hello testing",
},
{
"title": "testing",
"value-standard": "I have a text that can be done and changed only the size exist more than 20 and so we applied value-standard ",
}
],
"uniqueIdentifier": "a21ed89c-b634-4c7f-ca2c-8be6f31ae7b3",
}
}
For 2), you can do it like this:
filter {
  if [_source][interactionInfo][2][value] =~ /.{15,15}/ {
    mutate {
      rename => ["[_source][interactionInfo][2][value]","[_source][interactionInfo][2][value-standard]"]
    }
  }
}
The regex .{15,15} matches any run of exactly 15 characters, and since the match is unanchored it succeeds for any string that is at least 15 characters long (for strictly more than 15, use .{16,}). If the field is shorter than 15 characters, the regex doesn't match and the mutate#rename isn't applied.
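To see why an unanchored pattern works as a length check, here is a quick illustration in Python, whose re.search scans the string the same way (a sketch for intuition, not part of the Logstash config):

import re

pattern = re.compile(r".{15,15}")  # any run of exactly 15 characters

print(bool(pattern.search("hello testing")))        # False: only 13 characters
print(bool(pattern.search("hello testing again")))  # True: contains a 15-character run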
For 1), one possible solution would be to try to parse the field with the json filter, and if there's no _jsonparsefailure tag, rename the field.
Found the solution for this one. I used a ruby filter in Logstash to check each document, including the nested documents. Note that the JSON check has to run before the length check; otherwise long JSON strings would be caught by the length rule first.
Here is the Ruby code:
require 'json'

def register(param)
end

def filter(event)
  infoarray = event.get("interactionInfo")
  infoarray.each { |x|
    next unless x.include?("value")
    value = x["value"]
    if validate_json(value)
      # JSON value: rename to value-keyword and drop the other variants
      apply_only_keyword(x)
    elsif !value.nil? && value.length > 15
      # long plain-text value: rename to value-standard
      x["value-standard"] = x.delete("value")
    end
  }
  event.set("interactionInfo", infoarray)
  return [event]
end

def validate_json(value)
  return false if value.nil?
  JSON.parse(value)
  return true
rescue JSON::ParserError => e
  return false
end

def apply_only_keyword(x)
  x["value-keyword"] = x["value"]
  x.delete("value")
  x.delete("value-standard") if x.include?("value-standard")
  x.delete("value-whitespace") if x.include?("value-whitespace")
end
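If you want to sanity-check the two rules outside Logstash first, here is a minimal Python sketch of the same transformation (the function names are illustrative, not part of the pipeline):

import json

def is_json(value):
    # True when the string parses as JSON
    try:
        json.loads(value)
        return True
    except (TypeError, ValueError):
        return False

def transform(doc):
    # Mirrors the ruby filter: JSON values become value-keyword,
    # long plain-text values become value-standard
    for x in doc["interactionInfo"]:
        if "value" not in x:
            continue
        value = x["value"]
        if is_json(value):
            x["value-keyword"] = x.pop("value")
            x.pop("value-standard", None)
            x.pop("value-whitespace", None)
        elif len(value) > 15:
            x["value-standard"] = x.pop("value")
    return doc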
I have a CSV file that contains 10 user IDs, and the requirement is to construct a payload for every 5 users in the CSV. Below is my code:
def start = (vars.get('__jm__Thread Group__idx') as int)
def offset = 5
def payload = [:]
def data = []
def file = 'C:/path/dataset.csv'
start.upto(offset, { index ->
    def lineFromCsv = new File(file).readLines().get(index)
    data.add(['userId': lineFromCsv.split(',')[0], 'groupId': lineFromCsv.split(',')[1]])
})
payload.put('data', data)
log.info("%%%The Payload is%%%:" + payload)
vars.put('payload', new groovy.json.JsonBuilder(payload).toPrettyString())
My 1st question is why there were 6 items in the first payload (1st iteration) when I was expecting 5. There were 5 items in the 2nd payload (2nd iteration), as expected. Every payload is supposed to have the same number of items.
My 2nd question is: how do I make the 2nd payload start parsing from where the 1st payload left off? The 2nd payload is supposed to contain the next 5 users in the CSV; there should not be any overlapping items between payloads.
Below is the payload:
1st payload:
POST data:
{
  "data": [
    { "userId": "fakeUser3k0000002", "groupId": "1" },
    { "userId": "fakeUser3k0000003", "groupId": "2" },
    { "userId": "fakeUser3k0000004", "groupId": "2" },
    { "userId": "fakeUser3k0000005", "groupId": "3" },
    { "userId": "fakeUser3k0000006", "groupId": "4" },
    { "userId": "fakeUser3k0000007", "groupId": "5" }
  ]
}
2nd payload:
POST data:
{
  "data": [
    { "userId": "fakeUser3k0000003", "groupId": "2" },
    { "userId": "fakeUser3k0000004", "groupId": "2" },
    { "userId": "fakeUser3k0000005", "groupId": "3" },
    { "userId": "fakeUser3k0000006", "groupId": "4" },
    { "userId": "fakeUser3k0000007", "groupId": "5" }
  ]
}
def start = (vars.get('__jm__Thread Group__idx') as int)
def offset = 5
I guess Thread Group is your JMeter loop name. You have to build the loop like this:

(start * offset).step((start + 1) * offset, 1) { index ->
    println index
}

so for start = 5 you'll have indices 25 to 29.
In order to get 5 items in the first payload you need to set the offset to 4, because the __jm__Thread Group__idx variable is 0 during the first iteration and Groovy's upto is inclusive, so 0.upto(5) visits 6 indices (0 through 5); you can check the variable using a Debug Sampler and View Results Tree listener combination.
In order to start the 2nd iteration from position 4, you need to store the offset value into a JMeter variable after constructing the first payload and read it during the 2nd iteration.
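For intuition, the non-overlapping batching both answers describe boils down to slicing the rows by iteration * batch size. Here is that arithmetic sketched in Python (the rows are stand-ins for the CSV lines):

rows = ["user%02d,group%d" % (i, i) for i in range(10)]  # stand-ins for the 10 CSV lines

batch_size = 5
for iteration in range(2):                  # JMeter's __jm__Thread Group__idx is 0-based
    start = iteration * batch_size          # 0 for the 1st payload, 5 for the 2nd
    batch = rows[start:start + batch_size]  # 5 rows, no overlap between iterations
    print(iteration, batch)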
I have a list example_list containing two dict objects; it looks like this:
[
  {
    "Meta": {
      "ID": "1234567",
      "XXX": "XXX"
    },
    "bbb": {
      "ccc": {
        "ddd": {
          "eee": {
            "fff": {
              "xxxxxx": "xxxxx"
            },
            "www": [
              {
                "categories": {
                  "ppp": [
                    {
                      "content": {
                        "name": "apple",
                        "price": "0.111"
                      },
                      "xxx": "xxx"
                    }
                  ]
                },
                "date": "A2020-01-01"
              }
            ]
          }
        }
      }
    }
  },
  {
    "Meta": {
      "ID": "78945612",
      "XXX": "XXX"
    },
    "bbb": {
      "ccc": {
        "ddd": {
          "eee": {
            "fff": {
              "xxxxxx": "xxxxx"
            },
            "www": [
              {
                "categories": {
                  "ppp": [
                    {
                      "content": {
                        "name": "banana",
                        "price": "12.599"
                      },
                      "xxx": "xxx"
                    }
                  ]
                },
                "date": "A2020-01-01"
              }
            ]
          }
        }
      }
    }
  }
]
Now I want to filter the items and only keep each "ID" and the corresponding "price" value; the expected result can be something like:
[{"ID": "1234567", "price": "0.111"}, {"ID": "78945612", "price": "12.599"}]
or something like {"1234567": "0.111", "78945612": "12.599"}.
Here's what I've tried:
map_list = []
map_dict = {}
for item in example_list:
    # get 'ID' for each item in 'Meta'
    map_dict['ID'] = item['Meta']['ID']
    # get 'price'
    data_list = item['bbb']['ccc']['ddd']['eee']['www']
    for data in data_list:
        for dataitem in data['categories']['ppp']:
            map_dict['price'] = dataitem["content"]["price"]
            map_list.append(map_dict)
print(map_list)
The result doesn't look right; it feels like the items aren't iterating properly. It gives me this result:
[{"ID": "78945612", "price": "12.599"}, {"ID": "78945612", "price": "12.599"}]
It gave me a duplicated result for the second ID, but where is the first ID?
Can someone take a look for me please? Thanks.
Update:
From some comments on another question, I understand the reason the output keeps being overwritten is that the key names in the dict are always the same, but I'm not sure how to fix this because the key and value need to be extracted at different levels of the for loops. Any help would be appreciated, thanks.
As @Scott Hunter has mentioned, you need to create a new map_dict each time around the inner loop. Here is a quick fix to your solution (I am sadly not able to test it right now, but it seems right to me):
map_list = []
for item in example_list:
    # get 'price'
    data_list = item['bbb']['ccc']['ddd']['eee']['www']
    for data in data_list:
        for dataitem in data['categories']['ppp']:
            map_dict = {}  # a fresh dict per item, so earlier entries aren't overwritten
            map_dict['ID'] = item['Meta']['ID']
            map_dict['price'] = dataitem["content"]["price"]
            map_list.append(map_dict)
print(map_list)
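With a fresh map_dict per inner iteration, this now prints both entries: [{'ID': '1234567', 'price': '0.111'}, {'ID': '78945612', 'price': '12.599'}].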
But what you are doing here is basically just "forcing" your way through ... I recommend you take a break and check out some kind of tutorial, which will help you understand how it really works behind the scenes. This is how I would have written it:
list_dicts = []
for example in example_list:
    for www in example['bbb']['ccc']['ddd']['eee']['www']:
        for ppp_item in www['categories']['ppp']:
            list_dicts.append({
                'ID': example['Meta']['ID'],
                'price': ppp_item["content"]["price"]
            })
Good luck with this problem and hope it helps :)
You need to create a new dictionary for map_dict for each ID.
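The underlying issue is that a Python list stores references: appending the same dict twice gives you two views of one object, so the last write wins everywhere. A tiny demonstration:

shared = {}
out = []
for i in [1, 2]:
    shared['ID'] = i    # mutates the same object each pass
    out.append(shared)  # appends a reference, not a copy
print(out)  # [{'ID': 2}, {'ID': 2}] - both entries point to the same dict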
I wrote an Elasticsearch query which checks a condition (status = "APPROVED") and gets all approved_by objects.
This is my index (portfolio):
{
  "settings": {},
  "mappings": {
    "portfolio": {
      "properties": {
        "status": {
          "type": "keyword",
          "normalizer": "lcase_ascii_normalizer"
        },
        "archived_at": {
          "type": "date"
        },
        "approved_by": {
          "properties": {
            "id": { "type": "text" },
            "name": { "type": "text" }
          }
        }
      }
    }
  }
}
Currently I have 60 objects whose status is approved, so when I run the query it should return 60 objects, but I am getting only one object (I debugged the code: all 60 objects come back as expected, but only a single object is returned). Please help, guys.
My query:
profiles = client.search(index='portfolio', doc_type='portfolio',
                         scroll='10m', size=1000,
                         body={
                             "query": {"match": {"status": "APPROVED"}}
                         })
sid = profiles['_scroll_id']
scroll_size = len(profiles['hits']['hits'])
while scroll_size > 0:
    for info in profiles['hits']['hits']:
        item = info['_source']
        approved_by_obj = item.get('approved_by')
        if approved_by_obj:
            return (jsonify({"approved_by": approved_by_obj}))
Expected output format:
{
  "approved_by": {
    "id": "system",
    "name": "system"
  }
}
You're getting only one result because you're returning from inside the loop, which breaks out of it completely on the first match.
So, instead of returning there, append each found approved_by object to a list and return that 60-member list at the end:
profiles = client.search(index='portfolio', doc_type='portfolio',
                         scroll='10m', size=1000,
                         body={
                             "query": {"match": {"status": "APPROVED"}}
                         })
sid = profiles['_scroll_id']
scroll_size = len(profiles['hits']['hits'])
approved_hits_sources = []  # <-- add this
while scroll_size > 0:
    for info in profiles['hits']['hits']:
        item = info['_source']
        approved_by_obj = item.get('approved_by')
        if approved_by_obj:
            approved_hits_sources.append({"approved_by": approved_by_obj})  # <-- append, don't return
    profiles = client.scroll(scroll_id=sid, scroll='10m')  # fetch the next scroll batch
    sid = profiles['_scroll_id']
    scroll_size = len(profiles['hits']['hits'])
return jsonify({"approved_hits_sources": approved_hits_sources})
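Optionally, once the loop finishes you can release the scroll context instead of waiting for the 10m timeout to expire (assuming a reasonably recent elasticsearch-py client):

client.clear_scroll(scroll_id=sid)  # free the server-side scroll resources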
In my logic app, I have a JSON object (parsed from an API response) that contains an object array.
How can I find a specific element based on attribute values? Example below, where I want to find the (first) active one:
{
  "MyList" : [
    {
      "Descrip" : "This is the first item",
      "IsActive" : "N"
    },
    {
      "Descrip" : "This is the second item",
      "IsActive" : "N"
    },
    {
      "Descrip" : "This is the third item",
      "IsActive" : "Y"
    }
  ]
}
Well... the answer is in plain sight: there's a Filter array action, which works on a JSON object (from the Parse JSON action). Coupling this with a first() expression (for example @first(body('Filter_array')), assuming the default action name) gives the desired outcome.
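Outside Logic Apps, the same "filter, then take the first" logic is a one-liner; here is a quick Python sketch of what the two steps compute:

data = {"MyList": [
    {"Descrip": "This is the first item", "IsActive": "N"},
    {"Descrip": "This is the second item", "IsActive": "N"},
    {"Descrip": "This is the third item", "IsActive": "Y"},
]}

# Filter array + first(): the first element whose IsActive is "Y", or None
first_active = next((item for item in data["MyList"] if item["IsActive"] == "Y"), None)
print(first_active)  # {'Descrip': 'This is the third item', 'IsActive': 'Y'}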
You can use the Parse JSON action to parse your JSON and a Condition to filter for the IsActive attribute.
Use the following Schema to parse the JSON:
{
  "type": "object",
  "properties": {
    "MyList": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "Descrip": {
            "type": "string"
          },
          "IsActive": {
            "type": "string"
          }
        },
        "required": [
          "Descrip",
          "IsActive"
        ]
      }
    }
  }
}
Here is how it looks (I included the sample data you provided to test it):
Then you can add the Condition:
And perform whatever action you want within the If true section.
I'm using Elasticsearch + Logstash + Kibana for Windows event log analysis, and I get the following log:
{
  "_index": "logstash-2015.04.16",
  "_type": "logs",
  "_id": "Ov498b0cTqK8W4_IPzZKbg",
  "_score": null,
  "_source": {
    "EventTime": "2015-04-16 14:12:45",
    "EventType": "AUDIT_FAILURE",
    "EventID": "4656",
    "Message": "A handle to an object was requested.\r\n\r\nSubject:\r\n\tSecurity ID:\t\tS-1-5-21-2832557239-2908104349-351431359-3166\r\n\tAccount Name:\t\ts.tekotin\r\n\tAccount Domain:\t\tIAS\r\n\tLogon ID:\t\t0x88991C8\r\n\r\nObject:\r\n\tObject Server:\t\tSecurity\r\n\tObject Type:\t\tFile\r\n\tObject Name:\t\tC:\\Folders\\Общая (HotSMS)\\Test_folder\\3\r\n\tHandle ID:\t\t0x0\r\n\tResource Attributes:\t-\r\n\r\nProcess Information:\r\n\tProcess ID:\t\t0x4\r\n\tProcess Name:\t\t\r\n\r\nAccess Request Information:\r\n\tTransaction ID:\t\t{00000000-0000-0000-0000-000000000000}\r\n\tAccesses:\t\tReadData (or ListDirectory)\r\n\t\t\t\tReadAttributes\r\n\t\t\t\t\r\n\tAccess Reasons:\t\tReadData (or ListDirectory):\tDenied by\tD:(D;OICI;CCDCLCSWRPWPLOCRSDRC;;;S-1-5-21-2832557239-2908104349-351431359-3166)\r\n\t\t\t\tReadAttributes:\tGranted by ACE on parent folder\tD:(A;OICI;0x1200a9;;;S-1-5-21-2832557239-2908104349-351431359-3166)\r\n\t\t\t\t\r\n\tAccess Mask:\t\t0x81\r\n\tPrivileges Used for Access Check:\t-\r\n\tRestricted SID Count:\t0",
    "ObjectServer": "Security",
    "ObjectName": "C:\\Folders\\Общая (HotSMS)\\Test_folder\\3",
    "HandleId": "0x0",
    "PrivilegeList": "-",
    "RestrictedSidCount": "0",
    "ResourceAttributes": "-",
    "#timestamp": "2015-04-16T11:12:45.802Z"
  },
  "sort": [
    1429182765802,
    1429182765802
  ]
}
I get many log messages with different EventIDs, and when I receive a log entry with EventID 4656, I want to replace the value "4656" with the string "Access Failure". Is there a way to do so?
You can do it when you are loading with Logstash -- just do something like this:
filter {
  if [EventID] == "4656" {
    mutate {
      replace => [ "EventID", "Access Failure" ]
    }
  }
}
If you have a lot of values, look at translate{}:
translate {
  dictionary => [
    "4656", "Access Failure",
    "1234", "Another Value"
  ]
  field => "EventID"
  destination => "EventName"
}
I don't think translate{} will let you replace the original field. You could remove it, though, in favor of the new field.
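For intuition, translate{} amounts to a dictionary lookup with a configurable destination field; in Python terms:

event_names = {"4656": "Access Failure", "1234": "Another Value"}

event = {"EventID": "4656"}
event["EventName"] = event_names.get(event["EventID"], event["EventID"])  # fall back to the raw ID
print(event)  # {'EventID': '4656', 'EventName': 'Access Failure'}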
Use the mutate filter's replace option: it replaces a field with a new value. The new value can include %{foo} strings to help you build a new value from other parts of the event.
Example:
filter {
  if [source] == "your code like 4656" {
    mutate {
      replace => { "message" => "%{source_host}: My new message" }
    }
  }
}