How to replace null with 'null' in a Python dict - python-3.x

This might be a silly question. I would like to replace null with 'null' in a dict in Python.
mydict = {"headers": {"ai5": "8fa683e59c02c04cb781ac689686db07", "debug": null, "random": null, "sdkv": "7.6"}, "post": {"event": "ggstart", "ts": "1462759195259"}, "params": {}, "bottle": {"timestamp": "2016-05-09 02:00:00.004906", "game_id": "55107008"}}
I can't do any string operation on it in Python because it throws an error:
NameError: name 'null' is not defined
I have a huge file with 18000 records of this kind, so I can't do it manually.
Please help.

You don't need to replace anything. Just load the file with the json module and the data becomes a dict; every null is parsed as None.
import json
import pprint
with open('x.json') as f:
    data = json.load(f)

pprint.pprint(data)
Input (x.json)
{
    "headers": {
        "ai5": "8fa683e59c02c04cb781ac689686db07",
        "debug": null,
        "random": null,
        "sdkv": "7.6"
    },
    "post": {
        "event": "ggstart",
        "ts": "1462759195259"
    },
    "params": {},
    "bottle": {
        "timestamp": "2016-05-09 02:00:00.004906",
        "game_id": "55107008"
    }
}
Output
{'bottle': {'game_id': '55107008', 'timestamp': '2016-05-09 02:00:00.004906'},
 'headers': {'ai5': '8fa683e59c02c04cb781ac689686db07',
             'debug': None,
             'random': None,
             'sdkv': '7.6'},
 'params': {},
 'post': {'event': 'ggstart', 'ts': '1462759195259'}}
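If the 18000 records are stored one JSON object per line (a guess about the file layout, since the question doesn't say), you can parse them one at a time instead of loading a single document. A minimal sketch, assuming that layout and a hypothetical file name records.jsonl:
import json

records = []
with open('records.jsonl') as f:        # hypothetical file name
    for line in f:
        line = line.strip()
        if not line:                    # skip blank lines
            continue
        records.append(json.loads(line))  # every JSON null becomes Python's None

print(len(records))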

You can try something like this. It replaces every null with 'null' and stores the result in a new file.
Whether you then overwrite the original file with the new one, or keep both, is up to you.
import re

with open("test.txt") as f_handle, open("result.txt", "w") as f_2:
    for f_string in f_handle:
        print(f_string)
        # \bnull\b matches the bare word null without consuming the surrounding punctuation
        f_result = re.sub(r'\bnull\b', "'null'", f_string)
        print(f_result)
        f_2.write(f_result)

import json
x = json.loads(mydict)  # note: json.loads needs the raw JSON text (a string), not an already-built dict
and every null in the parsed result becomes Python's None (not the string 'null' or '').
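A minimal demonstration of that behaviour, using a made-up JSON string:
import json

raw = '{"debug": null, "sdkv": "7.6"}'  # made-up sample input
parsed = json.loads(raw)
print(parsed)                   # {'debug': None, 'sdkv': '7.6'}
print(parsed['debug'] is None)  # True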

Related

How to convert a dictionary according to a JSON schema, Python 3

I have a JSON schema which specifies the format of a dictionary in Python 3.
INPUT_SCHEME = {
    "type": "object",
    "properties": {
        "a1": {
            "type": "object",
            "properties": {
                "a1_1": {"type": ["string", "null"]},
                "a1_2": {"type": ["number", "null"]},
            },
            "additionalProperties": False,
            "minProperties": 2,
        },
        "a2": {
            "type": "array",
            "items": {"type": ["number", "null"]},
        },
        "a3": {
            "type": ["number", "null"],
        },
        "a4": {
            "type": "object",
            "properties": {
                "a4_1": {"type": ["string", "null"]},
                "a4_2": {
                    "type": "object",
                    "properties": {
                        "a4_2_1": {"type": ["string", "null"]},
                        "a4_2_2": {"type": ["number", "null"]},
                    },
                    "additionalProperties": False,
                    "minProperties": 2,
                },
            },
            "additionalProperties": False,
            "minProperties": 2,
        },
        "a5": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "a5_1": {"type": ["string", "null"]},
                    "a5_2": {"type": ["number", "null"]},
                },
                "additionalProperties": False,
                "minProperties": 2,
            },
        },
    },
    "additionalProperties": False,
    "minProperties": 5,
}
I want to write a function that converts an arbitrary input dictionary to the format defined by INPUT_SCHEME.
The rules are:
if the input dict is missing a field, fill that field with None or an empty list in the output dict;
if the input dict has a key that is not defined in INPUT_SCHEME, remove it from the output dict.
For example, suppose I have a_input, where only 'a1' is correct: 'a2', 'a3', and 'a4' are missing, each element in 'a5' is missing one property, and 'a6' is an undefined field.
The function I want should convert a_input to a_output, and you can use jsonschema.validate to check the result.
a_input = {
    'a1': {'a1_1': 'apple', 'a1_2': 20.5},
    'a5': [{'a5_1': 'pear'}, {'a5_2': 18.5}],
    'a6': [1, 2, 3, 4],
}
a_output = {
    'a1': {'a1_1': 'apple', 'a1_2': 20.5},
    'a2': [],
    'a3': None,
    'a4': {
        'a4_1': None,
        'a4_2': {
            'a4_2_1': None,
            'a4_2_2': None,
        }
    },
    'a5': [
        {
            'a5_1': 'pear',
            'a5_2': None,
        },
        {
            'a5_1': None,
            'a5_2': 18.5,
        }
    ]
}
jsonschema.validate(a_output, schema=INPUT_SCHEME)
I tried to write the function but could not get it working, mainly because there are too many if-else checks plus the nested structure, and I got lost. Could you please help me?
Thanks.
def my_func(a_from):
    a_to = dict()
    for key_1 in INPUT_SCHEME['properties'].keys():
        if key_1 not in a_from:
            a_to[key_1] = None  # This is incorrect, since the structure of a_to[key_1] depends on INPUT_SCHEME.
            continue
        layer_1 = INPUT_SCHEME['properties'][key_1]
        if 'properties' in layer_1:  # like a1, a4
            for key_2 in layer_1['properties'].keys():
                layer_2 = layer_1['properties'][key_2]
                ...
                # but it can be a nest of layers. Like a4, there are 3 layers. In a real case, it can have more layers.
        elif 'items' in layer_1:
            if 'properties' in layer_1['items']:  # like a5
                ...
            else:  # like a2
                ...
        else:  # like a3
            ...
    return a_to
A recursive algorithm suits this.
I divided it into two functions, since removing undefined properties and filling non-existent ones from the schema are two different tasks. You can merge them into one if you wish.
For filling non-existent properties, I just create arrays, objects and Nones, and then recurse inwards.
For removing undefined properties, I compare against the schema keys and remove unmatched keys, again recursing inwards.
The code below includes comments and type checks:
def fill_nonexistent_properties(input_dictionary, schema):
    """
    Fill missing properties in input_dictionary according to the schema.
    """
    properties = schema['properties']
    missing_properties = set(properties).difference(input_dictionary)

    # Fill all missing properties.
    for key in missing_properties:
        value = properties[key]
        if value['type'] == 'array':
            input_dictionary[key] = []
        elif value['type'] == 'object':
            input_dictionary[key] = {}
        else:
            input_dictionary[key] = None

    # Recurse inside all properties.
    for key, value in properties.items():
        # If it's an array of objects, recurse inside each item.
        if value['type'] == 'array' and value['items']['type'] == 'object':
            object_list = input_dictionary[key]
            if not isinstance(object_list, list):
                raise ValueError(
                    f"Invalid JSON object: {key} is not a list.")
            for item in object_list:
                if not isinstance(item, dict):
                    raise ValueError(
                        f"Invalid JSON object: {key} is not a list of objects.")
                fill_nonexistent_properties(item, value['items'])
        # If it's an object, recurse inside it.
        elif value['type'] == 'object':
            obj = input_dictionary[key]
            if not isinstance(obj, dict):
                raise ValueError(
                    f"Invalid JSON object: {key} is not a dictionary.")
            fill_nonexistent_properties(obj, value)
def remove_undefined_properties(input_dictionary, schema):
    """
    Remove properties in input_dictionary that are not defined in the schema.
    """
    properties = schema['properties']
    undefined_properties = set(input_dictionary).difference(properties)

    # Remove all undefined properties.
    for key in undefined_properties:
        del input_dictionary[key]

    # Recurse inside all existing properties.
    for key, value in input_dictionary.items():
        property_schema = properties[key]
        # If it's an array of objects, recurse inside each item.
        if isinstance(value, list):
            if not property_schema['type'] == 'array':
                raise ValueError(
                    f"Invalid JSON object: {key} is not a list.")
            # We're only dealing with objects inside arrays.
            if not property_schema['items']['type'] == 'object':
                continue
            for item in value:
                # Make sure each item is an object.
                if not isinstance(item, dict):
                    raise ValueError(
                        f"Invalid JSON object: {key} is not a list of objects.")
                remove_undefined_properties(item, property_schema['items'])
        # If it's an object, recurse inside it.
        elif isinstance(value, dict):
            # Make sure the object is supposed to be an object.
            if not property_schema['type'] == 'object':
                raise ValueError(
                    f"Invalid JSON object: {key} is not an object.")
            remove_undefined_properties(value, property_schema)
import pprint
pprint.pprint(a_input)
fill_nonexistent_properties(a_input, INPUT_SCHEME)
remove_undefined_properties(a_input, INPUT_SCHEME)
print("-"*10, "OUTPUT", "-"*10)
pprint.pprint(a_input)
Output:
{'a1': {'a1_1': 'apple', 'a1_2': 20.5},
 'a5': [{'a5_1': 'pear'}, {'a5_2': 18.5}],
 'a6': [1, 2, 3, 4]}
---------- OUTPUT ----------
{'a1': {'a1_1': 'apple', 'a1_2': 20.5},
 'a2': [],
 'a3': None,
 'a4': {'a4_1': None, 'a4_2': {'a4_2_1': None, 'a4_2_2': None}},
 'a5': [{'a5_1': 'pear', 'a5_2': None}, {'a5_1': None, 'a5_2': 18.5}]}
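To confirm the converted dictionary really matches the schema, you can run the check the question mentions; a short sketch, assuming the jsonschema package is installed:
import jsonschema

# Raises jsonschema.exceptions.ValidationError if the converted dict doesn't satisfy the schema.
jsonschema.validate(a_input, schema=INPUT_SCHEME)
print("a_input now satisfies INPUT_SCHEME")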

How to access data in dictionary within list in python

I am currently working on a Python program that queries the public GitHub API to get a GitHub user's email address. The parsed response is a huge list containing a lot of dictionaries.
My code so far
import requests
import json

# username = ''
username = 'FamousBern'
base_url = 'https://api.github.com/users/{}/events/public'
url = base_url.format(username)

try:
    res = requests.get(url)
    r = json.loads(res.text)
    # print(r)  # List slicing
    print(type(r))  # List that has a lot of dictionaries
    for i in r:
        if 'payload' in i:
            print(i['payload'][6])
    # matches = []
    # for match in r:
    #     if 'author' in match:
    #         matches.append(match)
    # print(matches)
    # print(r[18:])
except Exception as e:
    print(e)

# data = res.json()
# print(data)
# print(type(data))
# email = data['author']
# print(email)
By manually accessing this URL in the Chrome browser I get the following:
[
    {
        "id": "15069094667",
        "type": "PushEvent",
        "actor": {
            "id": 32365949,
            "login": "FamousBern",
            "display_login": "FamousBern",
            "gravatar_id": "",
            "url": "https://api.github.com/users/FamousBern",
            "avatar_url": "https://avatars.githubusercontent.com/u/32365949?"
        },
        "repo": {
            "id": 332684394,
            "name": "FamousBern/FamousBern",
            "url": "https://api.github.com/repos/FamousBern/FamousBern"
        },
        "payload": {
            "push_id": 6475329882,
            "size": 1,
            "distinct_size": 1,
            "ref": "refs/heads/main",
            "head": "f9c165226201c19fd6a6acd34f4ecb7a151f74b3",
            "before": "8b1a9ac283ba41391fbf1168937e70c2c8590a79",
            "commits": [
                {
                    "sha": "f9c165226201c19fd6a6acd34f4ecb7a151f74b3",
                    "author": {
                        "email": "bernardberbell#gmail.com",
                        "name": "FamousBern"
                    },
                    "message": "Changed input functionality",
                    "distinct": true,
                    "url": "https://api.github.com/repos/FamousBern/FamousBern/commits/f9c165226201c19fd6a6acd34f4ecb7a151f74b3"
                }
            ]
        },
The JSON object is huge as well; I just sliced it. I am interested in getting the email address in the author dictionary.
You're attempting to index into a dict with i['payload'][6], which will raise a KeyError because the payload dict is keyed by strings, not positions.
My personal preferred way of checking for key membership in nested dicts is using the get method with a default of an empty dict.
import requests
import json

username = 'FamousBern'
base_url = 'https://api.github.com/users/{}/events/public'
url = base_url.format(username)

res = requests.get(url)
r = json.loads(res.text)

# for each dict in the list
for event in r:
    # using .get() means you can chain .get()s for nested dicts
    # and they won't fail even if the key doesn't exist
    commits = event.get('payload', dict()).get('commits', list())
    # also using .get() with an empty list default means
    # you can always iterate over commits
    for commit in commits:
        # email = commit.get('author', dict()).get('email', None)
        # is also an option if you're not sure if those keys will exist
        email = commit['author']['email']
        print(email)
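To make the .get() chaining concrete, here is a tiny standalone sketch with a made-up event that has no payload key; instead of raising, the inner loop simply has nothing to iterate over:
event = {"type": "WatchEvent"}  # made-up event with no 'payload' key
commits = event.get('payload', dict()).get('commits', list())
print(commits)  # [] -- the chained .get()s fall back to the defaults instead of raising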

python regex usage: how to start with, least match, get content in middle [duplicate]

I wrote some code to get data from a web API. I was able to parse the JSON data from the API, but the result I gets looks quite complex. Here is one example:
>>> my_json
{'name': 'ns1:timeSeriesResponseType', 'declaredType': 'org.cuahsi.waterml.TimeSeriesResponseType', 'scope': 'javax.xml.bind.JAXBElement$GlobalScope', 'value': {'queryInfo': {'creationTime': 1349724919000, 'queryURL': 'http://waterservices.usgs.gov/nwis/iv/', 'criteria': {'locationParam': '[ALL:103232434]', 'variableParam': '[00060, 00065]'}, 'note': [{'value': '[ALL:103232434]', 'title': 'filter:sites'}, {'value': '[mode=LATEST, modifiedSince=null]', 'title': 'filter:timeRange'}, {'value': 'sdas01', 'title': 'server'}]}}, 'nil': False, 'globalScope': True, 'typeSubstituted': False}
Looking through this data, I can see the specific data I want: the 1349724919000 value that is labelled as 'creationTime'.
How can I write code that directly gets this value?
I don't need any searching logic to find this value. I can see what I need when I look at the response; I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way. I read some tutorials, so I understand that I need to use [] to access elements of the nested lists and dictionaries; but I can't figure out exactly how it works for a complex case.
More generally, how can I figure out what the "path" is to the data, and write the code for it?
For reference, let's see what the original JSON would look like, with pretty formatting:
>>> print(json.dumps(my_json, indent=4))
{
    "name": "ns1:timeSeriesResponseType",
    "declaredType": "org.cuahsi.waterml.TimeSeriesResponseType",
    "scope": "javax.xml.bind.JAXBElement$GlobalScope",
    "value": {
        "queryInfo": {
            "creationTime": 1349724919000,
            "queryURL": "http://waterservices.usgs.gov/nwis/iv/",
            "criteria": {
                "locationParam": "[ALL:103232434]",
                "variableParam": "[00060, 00065]"
            },
            "note": [
                {
                    "value": "[ALL:103232434]",
                    "title": "filter:sites"
                },
                {
                    "value": "[mode=LATEST, modifiedSince=null]",
                    "title": "filter:timeRange"
                },
                {
                    "value": "sdas01",
                    "title": "server"
                }
            ]
        }
    },
    "nil": false,
    "globalScope": true,
    "typeSubstituted": false
}
That lets us see the structure of the data more clearly.
In the specific case, first we want to look at the corresponding value under the 'value' key in our parsed data. That is another dict; we can access the value of its 'queryInfo' key in the same way, and similarly the 'creationTime' from there.
To get the desired value, we simply put those accesses one after another:
my_json['value']['queryInfo']['creationTime'] # 1349724919000
The question says: "I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way."
If you access the API again, the new data might not match the code's expectation. You may find it useful to add some error handling. For example, use .get() to access dictionaries in the data, rather than indexing:
name = my_json.get('name') # will return None if 'name' doesn't exist
Another way is to test for a key explicitly:
if 'name' in resp_dict:
    name = resp_dict['name']
else:
    pass
However, these approaches may fail if further accesses are required. A placeholder result of None isn't a dictionary or a list, so attempts to access it that way will fail again (with TypeError). Since "Simple is better than complex" and "it's easier to ask for forgiveness than permission", the straightforward solution is to use exception handling:
try:
    creation_time = my_json['value']['queryInfo']['creationTime']
except (TypeError, KeyError):
    print("could not read the creation time!")
    # or substitute a placeholder, or raise a new exception, etc.
Here is an example of loading a single value from simple JSON data, converting back and forth to JSON along the way:
import json

# start with a plain Python dict
data = {"test1": "1", "test2": "2", "test3": "3"}
# serialize the dict to a JSON string
json_str = json.dumps(data)
# parse the JSON string back into a dict
resp = json.loads(json_str)
# print the parsed response
print(resp)
# extract one element from the response
print(resp['test1'])
Try this.
Here, I fetch only statecode from the COVID API (a JSON array).
import requests
r = requests.get('https://api.covid19india.org/data.json')
x = r.json()['statewise']
for i in x:
    print(i['statecode'])
Try this:
from functools import reduce
import re
def deep_get_imps(data, key: str):
    split_keys = re.split("[\\[\\]]", key)
    out_data = data
    for split_key in split_keys:
        if split_key == "":
            return out_data
        elif isinstance(out_data, dict):
            out_data = out_data.get(split_key)
        elif isinstance(out_data, list):
            try:
                sub = int(split_key)
            except ValueError:
                return None
            else:
                length = len(out_data)
                out_data = out_data[sub] if -length <= sub < length else None
        else:
            return None
    return out_data

def deep_get(dictionary, keys):
    return reduce(deep_get_imps, keys.split("."), dictionary)
Then you can use it like below:
res = {
    "status": 200,
    "info": {
        "name": "Test",
        "date": "2021-06-12"
    },
    "result": [{
        "name": "test1",
        "value": 2.5
    }, {
        "name": "test2",
        "value": 1.9
    }, {
        "name": "test1",
        "value": 3.1
    }]
}
>>> deep_get(res, "info")
{'name': 'Test', 'date': '2021-06-12'}
>>> deep_get(res, "info.date")
'2021-06-12'
>>> deep_get(res, "result")
[{'name': 'test1', 'value': 2.5}, {'name': 'test2', 'value': 1.9}, {'name': 'test1', 'value': 3.1}]
>>> deep_get(res, "result[2]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[-1]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[2].name")
'test1'
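One behaviour worth noting, following from the code above (the missing paths here are made up): when any step of the path does not exist, deep_get returns None rather than raising:
>>> deep_get(res, "info.missing") is None
True
>>> deep_get(res, "result[10]") is None
True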

JSON Extract to dataframe using python

I have a JSON file and the structure of the file is as below
I am trying to get all the details into a dataframe or tabular form. I tried to denormalize it but could not get the actual result.
{
    "body": [{
        "_id": {
            "s": 0,
            "i": "5ea6c8ee24826b48cc560e1c"
        },
        "fdfdsfdsf": "V2_1_0",
        "dsd": "INDIA-",
        "sdsd": "df-as-3e-ds",
        "dsd": 123,
        "dsds": [{
            "dsd": "s_10",
            "dsds": [{
                "dsdsd": "OFFICIAL",
                "dssd": {
                    "dsds": {
                        "sdsd": "IND",
                        "dsads": 0.0
                    }
                },
                "sadsad": [{
                    "fdsd": "ABC",
                    "dds": {
                        "dsd": "INR",
                        "dfdsfd": -1825.717444
                    },
                    "dsss": [{
                        "id": "A:B",
                        "dsdsd": "A.B"
                    }]
                }, {
                    "name": "dssadsa",
                    "sadds": {
                        "sdsads": "INR",
                        "dsadsad": 180.831415
                    },
                    "xcs": "L:M",
                    "sds": "L.M"
                }]
            }]
        }]
    }]
}
This structure is far too nested to put directly into a dataframe. First, you'll need to use the ol' flatten_json function. This function isn't in a library (to my knowledge), but you see it around a lot. Save it somewhere.
def flatten_json(nested_json):
    """
    Flatten json object with nested keys into a single level.
    Args:
        nested_json: A nested json object.
    Returns:
        The flattened json object if successful, None otherwise.
    """
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out
Applying it to your data:
import json
import pandas as pd

with open('deeply_nested.json', 'r') as f:
    flattened_json = flatten_json(json.load(f))

df = pd.json_normalize(flattened_json)
df.columns
Index(['body_0__id_s', 'body_0__id_i', 'body_0_schemaVersion',
       'body_0_snapUUID', 'body_0_jobUUID', 'body_0_riskSourceID',
       'body_0_scenarioSets_0_scenario',
       'body_0_scenarioSets_0_modelSet_0_modelPolicyLabel',
       'body_0_scenarioSets_0_modelSet_0_valuation_pv_unit',
       'body_0_scenarioSets_0_modelSet_0_valuation_pv_value',
       'body_0_scenarioSets_0_modelSet_0_measures_0_name',
       'body_0_scenarioSets_0_modelSet_0_measures_0_value_unit',
       'body_0_scenarioSets_0_modelSet_0_measures_0_value_value',
       'body_0_scenarioSets_0_modelSet_0_measures_0_riskFactors_0_id',
       'body_0_scenarioSets_0_modelSet_0_measures_0_riskFactors_0_underlyingRef',
       'body_0_scenarioSets_0_modelSet_0_measures_1_name',
       'body_0_scenarioSets_0_modelSet_0_measures_1_value_unit',
       'body_0_scenarioSets_0_modelSet_0_measures_1_value_value',
       'body_0_scenarioSets_0_modelSet_0_measures_1_riskFactors',
       'body_0_scenarioSets_0_modelSet_0_measures_1_underlyingRef'],
      dtype='object')

How to get data from json format using groovy?

I have a request in SoapUI which returns a JSON response.
I'm using Groovy to retrieve the content of the response.
Response:
<item><response>{
    "timestamp": "2016-04-01T16:40:34",
    "data": [
        {
            "deleted_at": null,
            "userid": "b6d66002-8da4-4c03-928c-46871f084fb8",
            "updated_by": null,
            "created_at": "2016-03-01T16:40:34",
            "updated_at": "2016-03-01T16:40:34",
            "created_by": null,
            "value": "hBeO",
            "setting": "test",
            "name": "test2"
        }
    ],
    "success": true
}</response></item>
From this response I want to retrieve each node, like:
deleted_at
created_at
So I use this Groovy script:
import groovy.json.JsonSlurper
def response = context.expand( '${set_settings#Response#declare namespace ns1=\'https://wato.io/ns/20160131\'; //ns1:set_settings_resp[1]/ns1:item[1]/ns1:response[1]}' )
def slurper = new JsonSlurper()
def result = slurper.parseText(response)
testRunner.testCase.setPropertyValue("user_id", result.data.userid)
and I receive this error message:
groovy.lang.MissingMethodException: No signature of method: com.eviware.soapui.impl.wsdl.WsdlTestCasePro.setPropertyValue() is applicable for argument types: (java.lang.String, java.util.ArrayList) values: [userid, [b6df6662-8da4-4c03-928c-46871f084fb8]] Possible solutions: setPropertyValue(java.lang.String, java.lang.String), getPropertyValue(java.lang.String) error at line: 8
It works only for the timestamp node.
Any help please.
Thank you.
It's because result.data is a list, so result.data.userid returns a list (containing one item) rather than a string.
You just need to take the first item from that list, so try:
testRunner.testCase.setPropertyValue("user_id", result.data.userid.head())

Resources