How can I avoid a forest of apostrophes? - python-3.x

Using Python 3.7, I have this confusing-looking, nested dictionary:
dict = \
{
    'HBL_Posts':
        {'vNames': ['id_no', 'display_msg_no', 'thread', 'headline', 'category', 'author',
                    'auth_addr', 'author_pic_line', 'postbody',
                    'last_msg_no', 'mf_lnk', 'subject_header'],
         'data_fname': '_Posts_plain.htm', 'tpl_fname': '_Posts_tpl.htm', 'addrs_fname': '_addrs.csv'},
    'MOTM':
        {'vNames': ['work_month', 'zoom', 'zoom_id', 'headline', 'description', 'subject_header'],
         'data_fname': '_Posts_plain.htm', 'tpl_fname': '_Posts_tpl.htm', 'addrs_fname': '_addrs.csv'},
    'MOTM recording':
        {'vNames': ['topic', 'description', 'wDate', 'box', 'chat'],
         'data_fname': '_Recording_data.htm', 'tpl_fname': '_Recording_tpl.htm', 'addrs_fname': '_addrs.csv'},
    'Enticement':
        {'vNames': ['enticing_post', 'headline', 'hb_preface', 'postscript'],
         'data_fname': '_Entice_data.htm', 'tpl_fname': '_Entice_tpl.htm', 'addrs_fname': '_entice.csv'}
}
If I initially set each variable to its own name, like HBL_Posts = 'HBL_Posts', I can substitute this much clearer and less typo-prone code:
dict = \
{
    HBL_Posts:
        {vNames: [id_no, display_msg_no, thread, headline, category, author,
                  auth_addr, author_pic_line, postbody,
                  last_msg_no, mf_lnk, subject_header],
         data_fname: _Posts_plain.htm, tpl_fname: _Posts_tpl.htm, addrs_fname: _addrs.csv},
    MOTM:
        {vNames: [work_month, zoom, zoom_id, headline, description, subject_header],
         data_fname: _Posts_plain.htm, tpl_fname: _Posts_tpl.htm, addrs_fname: _addrs.csv},
    MOTM recording:
        {vNames: [topic, description, wDate, box, chat],
         data_fname: _Recording_data.htm, tpl_fname: _Recording_tpl.htm, addrs_fname: _addrs.csv},
    Enticement:
        {vNames: [enticing_post, headline, hb_preface, postscript],
         data_fname: _Entice_data.htm, tpl_fname: _Entice_tpl.htm, addrs_fname: _entice.csv}
}
In fact I accomplished this by just doing all the required assignments, one at a time. But that is about as complicated as the original dictionary setup, with the apostrophes. What I'd like is a function that would enable me to do this neatly and economically.
def self_name(s):
    [?????]
Then I could have a list of all the variables, vars_lst, and loop through it setting each to the literal version of itself:
for item in vars_lst:
    item = self_name(item)
To avoid having to use apostrophes in setting up vars_lst, I would accept doing:
HBL_Posts = vNames = id_no = . . . = ''
After many, many hours of struggle, I have been unable to supply the needed code for the self_name function. How can I do that, or how can I find another way of avoiding so many apostrophes?

Indent it like JSON:
{
    "HBL_Posts": {
        "vNames": [
            "id_no",
            "display_msg_no",
            "thread",
            "headline",
            "category",
            "author",
            "auth_addr",
            "author_pic_line",
            "postbody",
            "last_msg_no",
            "mf_lnk",
            "subject_header"
        ],
        "data_fname": "_Posts_plain.htm",
        "tpl_fname": "_Posts_tpl.htm",
        "addrs_fname": "_addrs.csv"
    },
    "MOTM": {
        "vNames": [
            "work_month",
            "zoom",
            "zoom_id",
            "headline",
            "description",
            "subject_header"
        ],
        "data_fname": "_Posts_plain.htm",
        "tpl_fname": "_Posts_tpl.htm",
        "addrs_fname": "_addrs.csv"
    },
    "MOTM recording": {
        "vNames": [
            "topic",
            "description",
            "wDate",
            "box",
            "chat"
        ],
        "data_fname": "_Recording_data.htm",
        "tpl_fname": "_Recording_tpl.htm",
        "addrs_fname": "_addrs.csv"
    },
    "Enticement": {
        "vNames": [
            "enticing_post",
            "headline",
            "hb_preface",
            "postscript"
        ],
        "data_fname": "_Entice_data.htm",
        "tpl_fname": "_Entice_tpl.htm",
        "addrs_fname": "_entice.csv"
    }
}
or even store that in a .json file and load it via:
import json

with open('my_file.json', 'r') as f:
    my_dict = json.load(f)
JSON is easy for most people to read, and the indentation is easy to see. Plus, it is easy to save to and read from a file, so you don't have to clutter your code.
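Writing the dictionary back out is just as simple; a minimal sketch (the filename my_file.json is only an example):
import json

# my_dict is the nested configuration dictionary shown above
with open('my_file.json', 'w') as f:
    json.dump(my_dict, f, indent=4)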
FYI:
You can pretty print a dictionary using:
import json
my_dict = ...
print(json.dumps(my_dict, indent=4))
which is how I printed your dictionary.

Related

Groovy: How do I iterate through a map to create a new map with values based on a specific condition

I am in no way an expert with Groovy, so please don't hold that against me.
I have JSON that looks like this:
{
    "metrics": [
        {
            "name": "metric_a",
            "help": "This tracks your A stuff.",
            "type": "GAUGE",
            "labels": [
                "pool"
            ],
            "unit": "",
            "aggregates": [],
            "meta": [
                {
                    "category": "CAT A",
                    "deployment": "environment-a"
                }
            ],
            "additional_notes": "Some stuff (potentially)"
        },
        ...
    ]
    ...
}
I'm using it as a source for automated documentation of all the metrics, so I'm iterating through it in various ways to get the information I need. So far so good; I'm most of the way there. The problem is that this all needs to be organized per deployment environment, meaning multiple metrics will share the same value for deployment.
My thought was I could create a map with deployment as the key, and as the value the name of every metric that has a matching deployment. Once I have that map, it should be easy for me to organize things the way they should be. I can't figure out how to do that. The result is that all the metric names are added, which is expected since I'm not doing anything to filter them out. I was thinking that groupBy would make sense here, but I can't figure out how to use it effectively, and frankly I'm not sure it will solve my problem by itself. Here is my code so far:
parentChild = [:]
children = []
metrics.each { metric ->
    def metricName = metric.name
    def depName = metric.meta.findResult { it.deployment }
    children.add(metricName)
    parentChild.put(depName, children)
}
What is the best way to create a new map where the values for each key are based off a specific condition?
EDIT: In the desired result, each key in the resulting map would be a unique deployment value from across all the metrics (as a string), and each value would be the name of every metric that contains that deployment (as an array).
[environment-a:
     [metric_a, metric_b, metric_c, ...],
 environment-b:
     [metric_d, metric_e, metric_f, ...]
 ...]
I would use a combo of withDefault() to pre-fill each map-entry value with a fresh TreeSet instance (a sorted, no-duplicates set) and the standard inject().
I reduced your sample data to the bare minimum and added some new nodes:
import groovy.json.*

String input = '''\
{
  "metrics": [
    {
      "name": "metric_a",
      "meta": [
        {
          "deployment": "environment-a"
        }
      ]
    },
    {
      "name": "metric_b",
      "meta": [
        {
          "deployment": "environment-a"
        }
      ]
    },
    {
      "name": "metric_c",
      "meta": [
        {
          "deployment": "environment-a"
        },
        {
          "deployment": "environment-b"
        }
      ]
    },
    {
      "name": "metric_d",
      "meta": [
        {
          "deployment": "environment-b"
        }
      ]
    }
  ]
}'''

def json = new JsonSlurper().parseText input

def groupedByDeployment = json.metrics.inject( [:].withDefault{ new TreeSet() } ){ res, metric ->
    metric.meta.each{ res[ it.deployment ] << metric.name }
    res
}
assert groupedByDeployment.toString() == '[environment-a:[metric_a, metric_b, metric_c], environment-b:[metric_c, metric_d]]'
If your metrics.meta array is supposed to have a single value, you can simplify the code by replacing the line:
metric.meta.each{ res[ it.deployment ] << metric.name }
with
res[ metric.meta.first().deployment ] << metric.name

python regex usage: how to start with, least match, get content in middle [duplicate]

I wrote some code to get data from a web API. I was able to parse the JSON data from the API, but the result I get looks quite complex. Here is one example:
>>> my_json
{'name': 'ns1:timeSeriesResponseType', 'declaredType': 'org.cuahsi.waterml.TimeSeriesResponseType', 'scope': 'javax.xml.bind.JAXBElement$GlobalScope', 'value': {'queryInfo': {'creationTime': 1349724919000, 'queryURL': 'http://waterservices.usgs.gov/nwis/iv/', 'criteria': {'locationParam': '[ALL:103232434]', 'variableParam': '[00060, 00065]'}, 'note': [{'value': '[ALL:103232434]', 'title': 'filter:sites'}, {'value': '[mode=LATEST, modifiedSince=null]', 'title': 'filter:timeRange'}, {'value': 'sdas01', 'title': 'server'}]}}, 'nil': False, 'globalScope': True, 'typeSubstituted': False}
Looking through this data, I can see the specific data I want: the 1349724919000 value that is labelled as 'creationTime'.
How can I write code that directly gets this value?
I don't need any searching logic to find this value. I can see what I need when I look at the response; I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way. I read some tutorials, so I understand that I need to use [] to access elements of the nested lists and dictionaries; but I can't figure out exactly how it works for a complex case.
More generally, how can I figure out what the "path" is to the data, and write the code for it?
For reference, let's see what the original JSON would look like, with pretty formatting:
>>> print(json.dumps(my_json, indent=4))
{
    "name": "ns1:timeSeriesResponseType",
    "declaredType": "org.cuahsi.waterml.TimeSeriesResponseType",
    "scope": "javax.xml.bind.JAXBElement$GlobalScope",
    "value": {
        "queryInfo": {
            "creationTime": 1349724919000,
            "queryURL": "http://waterservices.usgs.gov/nwis/iv/",
            "criteria": {
                "locationParam": "[ALL:103232434]",
                "variableParam": "[00060, 00065]"
            },
            "note": [
                {
                    "value": "[ALL:103232434]",
                    "title": "filter:sites"
                },
                {
                    "value": "[mode=LATEST, modifiedSince=null]",
                    "title": "filter:timeRange"
                },
                {
                    "value": "sdas01",
                    "title": "server"
                }
            ]
        }
    },
    "nil": false,
    "globalScope": true,
    "typeSubstituted": false
}
That lets us see the structure of the data more clearly.
In the specific case, first we want to look at the corresponding value under the 'value' key in our parsed data. That is another dict; we can access the value of its 'queryInfo' key in the same way, and similarly the 'creationTime' from there.
To get the desired value, we simply put those accesses one after another:
my_json['value']['queryInfo']['creationTime'] # 1349724919000
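Spelled out step by step (the intermediate names are just for illustration), the same chain of accesses looks like this:
value = my_json['value']                    # the nested dict under 'value'
query_info = value['queryInfo']             # the dict under 'queryInfo'
creation_time = query_info['creationTime']  # 1349724919000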
"I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way."
If you access the API again, the new data might not match the code's expectation. You may find it useful to add some error handling. For example, use .get() to access dictionaries in the data, rather than indexing:
name = my_json.get('name') # will return None if 'name' doesn't exist
Another way is to test for a key explicitly:
if 'name' in resp_dict:
    name = resp_dict['name']
else:
    pass
However, these approaches may fail if further accesses are required. A placeholder result of None isn't a dictionary or a list, so attempts to access it that way will fail again (with TypeError). Since "Simple is better than complex" and "it's easier to ask for forgiveness than permission", the straightforward solution is to use exception handling:
try:
    creation_time = my_json['value']['queryInfo']['creationTime']
except (TypeError, KeyError):
    print("could not read the creation time!")
    # or substitute a placeholder, or raise a new exception, etc.
Here is an example of loading a single value from simple JSON data, and converting back and forth to JSON:
import json

# create a simple dictionary
data = {"test1": "1", "test2": "2", "test3": "3"}

# serialize the dictionary to a JSON string
json_str = json.dumps(data)

# parse the JSON string back into a dictionary
resp = json.loads(json_str)

# print the parsed response
print(resp)

# extract an element from the response
print(resp['test1'])
Try this.
Here, I fetch only statecode from the COVID API (a JSON array).
import requests
r = requests.get('https://api.covid19india.org/data.json')
x = r.json()['statewise']
for i in x:
    print(i['statecode'])
Try this:
from functools import reduce
import re

def deep_get_imps(data, key: str):
    split_keys = re.split("[\\[\\]]", key)
    out_data = data
    for split_key in split_keys:
        if split_key == "":
            return out_data
        elif isinstance(out_data, dict):
            out_data = out_data.get(split_key)
        elif isinstance(out_data, list):
            try:
                sub = int(split_key)
            except ValueError:
                return None
            else:
                length = len(out_data)
                out_data = out_data[sub] if -length <= sub < length else None
        else:
            return None
    return out_data

def deep_get(dictionary, keys):
    return reduce(deep_get_imps, keys.split("."), dictionary)
Then you can use it like below:
res = {
    "status": 200,
    "info": {
        "name": "Test",
        "date": "2021-06-12"
    },
    "result": [{
        "name": "test1",
        "value": 2.5
    }, {
        "name": "test2",
        "value": 1.9
    }, {
        "name": "test1",
        "value": 3.1
    }]
}
>>> deep_get(res, "info")
{'name': 'Test', 'date': '2021-06-12'}
>>> deep_get(res, "info.date")
'2021-06-12'
>>> deep_get(res, "result")
[{'name': 'test1', 'value': 2.5}, {'name': 'test2', 'value': 1.9}, {'name': 'test1', 'value': 3.1}]
>>> deep_get(res, "result[2]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[-1]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[2].name")
'test1'
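One property worth noting: with this implementation, a missing or invalid path yields None rather than raising an exception, for example:
>>> deep_get(res, "info.missing") is None
True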

JSON Extract to dataframe using python

I have a JSON file, and the structure of the file is as below.
I am trying to get all the details into a dataframe or tabular form. I tried using json_normalize and could not get the actual result.
{
    "body": [{
        "_id": {
            "s": 0,
            "i": "5ea6c8ee24826b48cc560e1c"
        },
        "fdfdsfdsf": "V2_1_0",
        "dsd": "INDIA-",
        "sdsd": "df-as-3e-ds",
        "dsd": 123,
        "dsds": [{
            "dsd": "s_10",
            "dsds": [{
                "dsdsd": "OFFICIAL",
                "dssd": {
                    "dsds": {
                        "sdsd": "IND",
                        "dsads": 0.0
                    }
                },
                "sadsad": [{
                    "fdsd": "ABC",
                    "dds": {
                        "dsd": "INR",
                        "dfdsfd": -1825.717444
                    },
                    "dsss": [{
                        "id": "A:B",
                        "dsdsd": "A.B"
                    }]
                }, {
                    "name": "dssadsa",
                    "sadds": {
                        "sdsads": "INR",
                        "dsadsad": 180.831415
                    },
                    "xcs": "L:M",
                    "sds": "L.M"
                }]
            }]
        }]
    }]
}
This structure is far too nested to put directly into a dataframe. First, you'll need to use the ol' flatten_json function. This function isn't in a library (to my knowledge), but you see it around a lot. Save it somewhere.
def flatten_json(nested_json):
    """
    Flatten json object with nested keys into a single level.

    Args:
        nested_json: A nested json object.
    Returns:
        The flattened json object if successful, None otherwise.
    """
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out
Applying it to your data:
import json
import pandas as pd

with open('deeply_nested.json', 'r') as f:
    flattened_json = flatten_json(json.load(f))

df = pd.json_normalize(flattened_json)
df.columns
Index(['body_0__id_s', 'body_0__id_i', 'body_0_schemaVersion',
'body_0_snapUUID', 'body_0_jobUUID', 'body_0_riskSourceID',
'body_0_scenarioSets_0_scenario',
'body_0_scenarioSets_0_modelSet_0_modelPolicyLabel',
'body_0_scenarioSets_0_modelSet_0_valuation_pv_unit',
'body_0_scenarioSets_0_modelSet_0_valuation_pv_value',
'body_0_scenarioSets_0_modelSet_0_measures_0_name',
'body_0_scenarioSets_0_modelSet_0_measures_0_value_unit',
'body_0_scenarioSets_0_modelSet_0_measures_0_value_value',
'body_0_scenarioSets_0_modelSet_0_measures_0_riskFactors_0_id',
'body_0_scenarioSets_0_modelSet_0_measures_0_riskFactors_0_underlyingRef',
'body_0_scenarioSets_0_modelSet_0_measures_1_name',
'body_0_scenarioSets_0_modelSet_0_measures_1_value_unit',
'body_0_scenarioSets_0_modelSet_0_measures_1_value_value',
'body_0_scenarioSets_0_modelSet_0_measures_1_riskFactors',
'body_0_scenarioSets_0_modelSet_0_measures_1_underlyingRef'],
dtype='object')

How to iterate through indexed field to add field from another index

I'm rather new to Elasticsearch, so I'm coming here in hope of finding advice.
I have two indices in Elasticsearch, from two different CSV files.
index_1 has this mapping:
{'settings': {
     'number_of_shards': 3
 },
 'mappings': {
     'properties': {
         'place': {'type': 'keyword'},
         'address': {'type': 'keyword'},
     }
 }
}
The file is about 400 000 documents long.
index_2, with a much smaller file (about 50 documents), has this mapping:
{'settings': {
     'number_of_shards': 1
 },
 'mappings': {
     'properties': {
         'place': {'type': 'text'},
         'address': {'type': 'keyword'},
     }
 }
}
The field "place" in index_2 is all of the unique values from the field "place" in index_1.
In both indices the "address" fields are postcodes of datatype keyword with a structure: 0000AZ.
Based on the "place" keyword field in index_1, I want to assign the term of the "address" field from index_2.
I have tried using the pandas library, but the index_1 file is too large. I have also tried creating modules based on pandas and elasticsearch, quite unsuccessfully, although I believe this is a promising direction. A good solution would stay within the elasticsearch library as much as possible, as these indices will later be used for further analysis.
If I understand correctly, it sounds like you want to use updateByQuery.
The request body should look a little like this:
{
    'query': {'term': {'place': "placeToMatch"}},
    'script': 'ctx._source.address = "updatedZipCode"'
}
This will update the address field of all documents with the matched place.
EDIT:
So what we want to do is use updateByQuery while iterating over all the documents in index2.
First step: get all the documents from index2; we'll just do this using the basic search feature:
{
    "index": 'index2',
    "size": 100, // get all documents; once size is over 10,000 you'll have to paginate
    "body": {"query": {"match_all": {}}}
}
Now we iterate over all the results and use updateByQuery for each of the results:
// sudo
doc = response[i]
// update by query request.
{
index: 'index1',
body: {
'query': {'term': {'address': doc._source.address}},
'script': 'ctx._source.place = "`${doc._source.place}`"'
}
}
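For illustration, here is a rough Python sketch of that loop using the official elasticsearch client; the client setup and index names are assumptions to adapt to your cluster, and passing the value through script params sidesteps the string quoting in the pseudocode above:
from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a locally reachable cluster

# step 1: fetch all (~50) documents from index2
response = es.search(index='index2', body={'query': {'match_all': {}}, 'size': 100})

# step 2: for each index2 document, update the matching index1 documents
for doc in response['hits']['hits']:
    source = doc['_source']
    es.update_by_query(
        index='index1',
        body={
            'query': {'term': {'address': source['address']}},
            'script': {
                'source': 'ctx._source.place = params.place',
                'params': {'place': source['place']},
            },
        },
    )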

delete if key = value Groovy

I have a Ruby script that I need to convert to Groovy; it's a simple case of removing an entry from a collection if key == value.
So in my setup I make a request to the GitHub API:
def jsonParse(def json) {
    new groovy.json.JsonSlurperClassic().parseText(json)
}

def request = sh script: """curl https://api.github.com/repos/org/${repo}/releases?access_token=${env.TOKEN}""", returnStdout: true
def list = jsonParse(request)
return list
This returns output like so
[
    [prerelease: 'true', author: [surname: 'surname', book: 'title'], surname: 'surname'],
    [prerelease: 'false', author: [surname: 'surname', book: 'title'], surname: 'surname']
]
In Ruby I would do the following:
array.delete_if { |key| key['prerelease'] == true }
How would I approach this with Groovy? If an explanation could be provided, that would also be great, so I can learn from it.
Update
Using the approach from @Rao, my list is exactly the same:
def request = sh script: """curl https://api.github.com/repos/org/${repo}/releases?access_token=${env.TOKEN}""", returnStdout: true
def list = jsonParse(request)
list.removeAll(list.findAll{it.prerelease == 'true'})
return list
Raw response
[
{"prerelease": true, "author": [ {"surname": "surname", "book": "title"}, "surname": "surname"],
{"prerelease": false, "author": [ {"surname": "surname", "book": "title"}, "surname": "surname"]
]
E.g. with
array = array.findAll { it.prerelease != 'true' }
findAll returns a new list containing only the elements for which the closure is true, so this keeps every entry whose prerelease is not 'true'. I guess you don't need any more explanation?
The sample data is a list of maps.
You need to remove the items from the list where prerelease is equal to 'true'; note this is a string here, since it is embedded between quotes.
Here is a script which produces the filtered list.
def list = [
    [prerelease: 'true', author: [surname: 'surname', book: 'title'], surname: 'surname'],
    [prerelease: 'true', author: [surname: 'surname', book: 'title'], surname: 'surname'],
    [prerelease: 'false', author: [surname: 'surname', book: 'title'], surname: 'surname']
]
// Remove those elements (maps) from the list that match the condition
list.removeAll(list.findAll { it.prerelease == 'true' })
// Show the updated list
println list
You can quickly try it online: Demo
EDIT: based on the OP's comment. Isn't the output below what you want?
EDIT2: based on the OP's changed response, as JSON:
def json = """[
    {"prerelease": true, "author": [ {"surname": "surname", "book": "title"}]},
    {"prerelease": false, "author": [ {"surname": "surname", "book": "title"}]}
]"""
list = new groovy.json.JsonSlurper().parseText(json)
println new groovy.json.JsonBuilder(list.findAll { it.prerelease != true }).toPrettyString()
