Need help to format a python dictionary string - string

I am unable to convert a file that I downloaded into a dictionary object so that I can access each element. I think the quotation marks are missing from the keys, which prevents me from using json.loads() etc. Could you please help me with a solution? I have given the contents of the download below. I need to format it.
{
success: true,
results: 2,
rows: [{
Symbol: "LITL",
CompanyName: "LancoInfratechLimited",
ISIN: "INE785C01048",
Ind: "-",
Purpose: "Results",
BoardMeetingDate: "26-Sep-2017",
DisplayDate: "19-Sep-2017",
seqId: "102121067",
Details: "toconsiderandapprovetheUn-AuditedFinancialResultsoftheCompanyonstandalonebasisfortheQuarterendedJune30,2017."
}, {
Symbol: "PETRONENGG",
CompanyName: "PetronEngineeringConstructionLimited",
ISIN: "INE742A01019",
Ind: "-",
Purpose: "Results",
BoardMeetingDate: "28-Sep-2017",
DisplayDate: "21-Sep-2017",
seqId: "102128225",
Details: "Toconsiderandapprove,interalia,theUnauditedFinancialResultsoftheCompanyforthequarterendedonJune30,2017."
}]
}

Here is one way to do it if you have a string of the dict. It is a little hacky but should work well.
import json
import re
# Match each bare word that is immediately followed by a colon (an unquoted key).
regex = re.compile(r'(\w+(?=:))')
# I had the string in a file; otherwise just assign the value here,
# however you already have it stored.
with open('test_string', 'r') as f:
    string = f.read()
string = regex.sub(r'"\1"', string)
python_object = json.loads(string)
# Now you can access python_object just like any normal Python dict.
print(python_object["results"])
Here is the dict after it has been put through the regex. Now you can read it in with json
{
"success": true,
"results": 2,
"rows": [{
"Symbol": "LITL",
"CompanyName": "LancoInfratechLimited",
"ISIN": "INE785C01048",
"Ind": "-",
"Purpose": "Results",
"BoardMeetingDate": "26-Sep-2017",
"DisplayDate": "19-Sep-2017",
"seqId": "102121067",
"Details": "toconsiderandapprovetheUn-AuditedFinancialResultsoftheCompanyonstandalonebasisfortheQuarterendedJune30,2017."
}, {
"Symbol": "PETRONENGG",
"CompanyName": "PetronEngineeringConstructionLimited",
"ISIN": "INE742A01019",
"Ind": "-",
"Purpose": "Results",
"BoardMeetingDate": "28-Sep-2017",
"DisplayDate": "21-Sep-2017",
"seqId": "102128225",
"Details": "Toconsiderandapprove,interalia,theUnauditedFinancialResultsoftheCompanyforthequarterendedonJune30,2017."
}]
}
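One caveat with the regex above: it quotes any word that sits directly before a colon, so a colon inside a string value (a time such as 16:18, for instance) would get mangled. A slightly stricter variant is sketched below. It only quotes a word when it follows "{" or ",", which is where keys normally appear. The names KEY_PATTERN and quote_keys are made up for this example, and the input is a shortened copy of the download. It is still a heuristic rather than a real parser, so a comma inside a value followed by word-and-colon text would still confuse it.

```python
import json
import re

# Shortened copy of the downloaded text, with unquoted keys
raw = '''{
success: true,
results: 1,
rows: [{
Symbol: "LITL",
DisplayDate: "19-Sep-2017"
}]
}'''

# Quote a bare key only when it follows "{" or "," (plus optional whitespace)
KEY_PATTERN = re.compile(r'([{,]\s*)(\w+)\s*:')

def quote_keys(text):
    """Wrap unquoted object keys in double quotes so json.loads accepts the text."""
    return KEY_PATTERN.sub(r'\1"\2":', text)

data = json.loads(quote_keys(raw))
print(data["rows"][0]["Symbol"])  # LITL
```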

Related

python string(js variable type) to dictionary

How do I convert this string to a dictionary
I'm trying regex r"},\s\]", but didn't get the desired result.
HTTP request body
Content-Type: text/html
var sample = {
"date": "2022. 03. 23. 16:18",
"list":
[
{
"name": "USD",
"var1":"1236.26",
"var2":"1193.74",
"var3":"1226.90",
"var4":"1203.10",
"var5":"1215.00"
},
{
"name": "JPY",
"var1":"1020.81",
"var2":"985.71",
"var3":"1013.09",
"var4":"993.43",
"var5":"1003.26"
},
]
}
First we need to get rid of any trailing commas. Note that this version also strips the spaces inside string values, such as the date:
_sample = sample.replace("\n", "").replace(" ", "").replace(",]", "]")
You can also use regex for this:
import re
_sample = re.sub(r",\s*]", "]", sample)  # \s* also covers ",]" with no whitespace in between
Then use json.loads() to parse it into a dict.
import json
outcome = json.loads(_sample)
Beyond the original question
If it is also possible that we'd have trailing commas after the last key-value of a dictionary: e.g. {"var5": "1215.00",} you can update the above code to:
_sample = sample.replace("\n", "").replace(" ", "").replace(",}", "}").replace(",]", "]")
Or with regex:
import json
import re
regex = r",\s*([\]}])"  # group 1 is the closing bracket to keep
sample = """{
"date": "2022. 03. 23. 16:18",
"list":
[
{
"name": "USD",
"var1": "1236.26",
"var2": "1193.74",
"var3": "1226.90",
"var4": "1203.10",
"var5": "1215.00"
},
{
"name": "JPY",
"var1": "1020.81",
"var2": "985.71",
"var3": "1013.09",
"var4": "993.43",
"var5": "1003.26"
},
]
}"""
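# Drop each trailing comma but keep the closing bracket that follows it
to_dict = json.loads(re.sub(r",\s*([\]}])", r"\1", sample))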
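If the response body really starts with `var sample = ` as in the question, that prefix has to go as well before json.loads() will accept the text. Here is a minimal sketch, assuming the body contains exactly one top-level object. The variable names body and json_text are made up for this example, and the data is a shortened copy of the question's.

```python
import json
import re

# Shortened copy of the response body from the question
body = '''var sample = {
"date": "2022. 03. 23. 16:18",
"list":
[
{
"name": "USD",
"var1":"1236.26"
},
]
}'''

# Keep only the text from the first "{" to the last "}"
start, end = body.index("{"), body.rindex("}")
json_text = body[start:end + 1]

# Remove a comma that is followed only by whitespace and a closing bracket
json_text = re.sub(r",\s*([\]}])", r"\1", json_text)

parsed = json.loads(json_text)
print(parsed["date"])  # 2022. 03. 23. 16:18
```

Because nothing else in the text is touched, the spaces inside the date value survive.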

python regex usage: how to start with , least match , get content in middle [duplicate]

I wrote some code to get data from a web API. I was able to parse the JSON data from the API, but the result I get looks quite complex. Here is one example:
>>> my_json
{'name': 'ns1:timeSeriesResponseType', 'declaredType': 'org.cuahsi.waterml.TimeSeriesResponseType', 'scope': 'javax.xml.bind.JAXBElement$GlobalScope', 'value': {'queryInfo': {'creationTime': 1349724919000, 'queryURL': 'http://waterservices.usgs.gov/nwis/iv/', 'criteria': {'locationParam': '[ALL:103232434]', 'variableParam': '[00060, 00065]'}, 'note': [{'value': '[ALL:103232434]', 'title': 'filter:sites'}, {'value': '[mode=LATEST, modifiedSince=null]', 'title': 'filter:timeRange'}, {'value': 'sdas01', 'title': 'server'}]}}, 'nil': False, 'globalScope': True, 'typeSubstituted': False}
Looking through this data, I can see the specific data I want: the 1349724919000 value that is labelled as 'creationTime'.
How can I write code that directly gets this value?
I don't need any searching logic to find this value. I can see what I need when I look at the response; I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way. I read some tutorials, so I understand that I need to use [] to access elements of the nested lists and dictionaries; but I can't figure out exactly how it works for a complex case.
More generally, how can I figure out what the "path" is to the data, and write the code for it?
For reference, let's see what the original JSON would look like, with pretty formatting:
>>> print(json.dumps(my_json, indent=4))
{
"name": "ns1:timeSeriesResponseType",
"declaredType": "org.cuahsi.waterml.TimeSeriesResponseType",
"scope": "javax.xml.bind.JAXBElement$GlobalScope",
"value": {
"queryInfo": {
"creationTime": 1349724919000,
"queryURL": "http://waterservices.usgs.gov/nwis/iv/",
"criteria": {
"locationParam": "[ALL:103232434]",
"variableParam": "[00060, 00065]"
},
"note": [
{
"value": "[ALL:103232434]",
"title": "filter:sites"
},
{
"value": "[mode=LATEST, modifiedSince=null]",
"title": "filter:timeRange"
},
{
"value": "sdas01",
"title": "server"
}
]
}
},
"nil": false,
"globalScope": true,
"typeSubstituted": false
}
That lets us see the structure of the data more clearly.
In the specific case, first we want to look at the corresponding value under the 'value' key in our parsed data. That is another dict; we can access the value of its 'queryInfo' key in the same way, and similarly the 'creationTime' from there.
To get the desired value, we simply put those accesses one after another:
my_json['value']['queryInfo']['creationTime'] # 1349724919000
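For the more general question of finding the path, one option is to let Python list every path for you and then copy the one you need. The helper below is only a sketch: iter_paths is a made-up name, and sample is a cut-down stand-in for the parsed response.

```python
def iter_paths(node, prefix=""):
    """Yield (path, value) pairs for every leaf in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_paths(value, f"{prefix}[{key!r}]")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_paths(value, f"{prefix}[{index}]")
    else:
        yield prefix, node

# Cut-down stand-in for the parsed API response
sample = {
    "value": {
        "queryInfo": {
            "creationTime": 1349724919000,
            "note": [{"value": "sdas01", "title": "server"}],
        }
    }
}

paths = dict(iter_paths(sample, "my_json"))
for path, value in paths.items():
    print(path, "=", value)
```

Each printed path is already valid indexing code, so it can be pasted straight into a script.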
"I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way."
If you access the API again, the new data might not match the code's expectation. You may find it useful to add some error handling. For example, use .get() to access dictionaries in the data, rather than indexing:
name = my_json.get('name') # will return None if 'name' doesn't exist
Another way is to test for a key explicitly:
if 'name' in my_json:
    name = my_json['name']
else:
    pass  # handle the missing key here
However, these approaches may fail if further accesses are required. A placeholder result of None isn't a dictionary or a list, so attempts to access it that way will fail again (with TypeError). Since "Simple is better than complex" and "it's easier to ask for forgiveness than permission", the straightforward solution is to use exception handling:
try:
    creation_time = my_json['value']['queryInfo']['creationTime']
except (TypeError, KeyError):
    print("could not read the creation time!")
    # or substitute a placeholder, or raise a new exception, etc.
Here is an example of loading a single value from simple JSON data, and converting back and forth to JSON:
import json
# start with a plain Python dict
data = {"test1": "1", "test2": "2", "test3": "3"}
# serialize the dict to a JSON string
json_str = json.dumps(data)
# parse the JSON string back into a dict
resp = json.loads(json_str)
# print the parsed dict
print(resp)
# extract a single value from it
print(resp['test1'])
Try this.
Here, I fetch only statecode from the COVID API (a JSON array).
import requests
r = requests.get('https://api.covid19india.org/data.json')
x = r.json()['statewise']
for i in x:
    print(i['statecode'])
Try this:
from functools import reduce
import re

def deep_get_imps(data, key: str):
    # "result[2]" is split into ["result", "2", ""]
    split_keys = re.split("[\\[\\]]", key)
    out_data = data
    for split_key in split_keys:
        if split_key == "":
            return out_data
        elif isinstance(out_data, dict):
            out_data = out_data.get(split_key)
        elif isinstance(out_data, list):
            try:
                sub = int(split_key)
            except ValueError:
                return None
            else:
                length = len(out_data)
                out_data = out_data[sub] if -length <= sub < length else None
        else:
            return None
    return out_data

def deep_get(dictionary, keys):
    return reduce(deep_get_imps, keys.split("."), dictionary)
Then you can use it like below:
res = {
"status": 200,
"info": {
"name": "Test",
"date": "2021-06-12"
},
"result": [{
"name": "test1",
"value": 2.5
}, {
"name": "test2",
"value": 1.9
},{
"name": "test1",
"value": 3.1
}]
}
>>> deep_get(res, "info")
{'name': 'Test', 'date': '2021-06-12'}
>>> deep_get(res, "info.date")
'2021-06-12'
>>> deep_get(res, "result")
[{'name': 'test1', 'value': 2.5}, {'name': 'test2', 'value': 1.9}, {'name': 'test1', 'value': 3.1}]
>>> deep_get(res, "result[2]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[-1]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[2].name")
'test1'

JSON Extract to dataframe using python

I have a JSON file and the structure of the file is as below
I am trying to get all the details into a dataframe or tabular form. I tried using denormalize but could not get the expected result.
{
"body": [{
"_id": {
"s": 0,
"i": "5ea6c8ee24826b48cc560e1c"
},
"fdfdsfdsf": "V2_1_0",
"dsd": "INDIA-",
"sdsd": "df-as-3e-ds",
"dsd": 123,
"dsds": [{
"dsd": "s_10",
"dsds": [{
"dsdsd": "OFFICIAL",
"dssd": {
"dsds": {
"sdsd": "IND",
"dsads": 0.0
}
},
"sadsad": [{
"fdsd": "ABC",
"dds": {
"dsd": "INR",
"dfdsfd": -1825.717444
},
"dsss": [{
"id": "A:B",
"dsdsd": "A.B"
}
]
}, {
"name": "dssadsa",
"sadds": {
"sdsads": "INR",
"dsadsad": 180.831415
},
"xcs": "L:M",
"sds": "L.M"
}
]
}
]
}
]
}
]
}
This structure is far too nested to put directly into a dataframe. First, you'll need to use the ol' flatten_json function. This function isn't in a library (to my knowledge), but you see it around a lot. Save it somewhere.
def flatten_json(nested_json):
    """
    Flatten json object with nested keys into a single level.
    Args:
        nested_json: A nested json object.
    Returns:
        The flattened json object if successful, None otherwise.
    """
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out
Applying it to your data:
import json
import pandas as pd

with open('deeply_nested.json', 'r') as f:
    flattened_json = flatten_json(json.load(f))
df = pd.json_normalize(flattened_json)
df.columns
Index(['body_0__id_s', 'body_0__id_i', 'body_0_schemaVersion',
'body_0_snapUUID', 'body_0_jobUUID', 'body_0_riskSourceID',
'body_0_scenarioSets_0_scenario',
'body_0_scenarioSets_0_modelSet_0_modelPolicyLabel',
'body_0_scenarioSets_0_modelSet_0_valuation_pv_unit',
'body_0_scenarioSets_0_modelSet_0_valuation_pv_value',
'body_0_scenarioSets_0_modelSet_0_measures_0_name',
'body_0_scenarioSets_0_modelSet_0_measures_0_value_unit',
'body_0_scenarioSets_0_modelSet_0_measures_0_value_value',
'body_0_scenarioSets_0_modelSet_0_measures_0_riskFactors_0_id',
'body_0_scenarioSets_0_modelSet_0_measures_0_riskFactors_0_underlyingRef',
'body_0_scenarioSets_0_modelSet_0_measures_1_name',
'body_0_scenarioSets_0_modelSet_0_measures_1_value_unit',
'body_0_scenarioSets_0_modelSet_0_measures_1_value_value',
'body_0_scenarioSets_0_modelSet_0_measures_1_riskFactors',
'body_0_scenarioSets_0_modelSet_0_measures_1_underlyingRef'],
dtype='object')
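To see how those column names come about, here is a small self-contained sketch of the same flattening idea run on a tiny made-up document. The sep parameter and the doc data are additions for illustration only. Every dict key and list index on the way down is joined with an underscore.

```python
def flatten_json(nested, sep="_"):
    """Flatten nested dicts/lists into one dict keyed by the joined path."""
    out = {}

    def flatten(node, name=""):
        if isinstance(node, dict):
            for key, value in node.items():
                flatten(value, name + key + sep)
        elif isinstance(node, list):
            for index, value in enumerate(node):
                flatten(value, name + str(index) + sep)
        else:
            # Strip the trailing separator before storing the leaf value
            out[name[:-len(sep)]] = node

    flatten(nested)
    return out

# Tiny made-up document with the same kind of nesting
doc = {"body": [{"_id": {"s": 0}, "tags": ["a", "b"]}]}
flat = flatten_json(doc)
print(flat)  # {'body_0__id_s': 0, 'body_0_tags_0': 'a', 'body_0_tags_1': 'b'}
```

Note that list positions become part of the key, so two records of different lengths end up with different sets of columns.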

Access a dictionary value based on the list of keys

I have a nested dictionary with keys and values as shown below.
j = {
"app": {
"id": 0,
"status": "valid",
"Garden": {
"Flowers":
{
"id": "1",
"state": "fresh"
},
"Soil":
{
"id": "2",
"state": "stale"
}
},
"BackYard":
{
"Grass":
{
"id": "3",
"state": "dry"
},
"Soil":
{
"id": "4",
"state": "stale"
}
}
}
}
Currently, I have a Python method that returns the route of keys leading to a value. For example, if I want to access the "1" value, the method returns a list of strings with the keys on the way to "1", that is, ["app","Garden", "Flowers"].
I am designing a service using flask and I want to be able to return a json output such as the following based on the route of the keys. Thus, I would return an output such as below.
{
"id": "1",
"state": "fresh"
}
The Problem:
I am unsure how to produce the result shown above, since I need to walk the dictionary "j" in order to build it.
I tried something as the following.
def build_dictionary(key_chain):
d_temp = list(d.keys())[0]
...unsure on how to
#Here key_chain contains the ["app","Garden", "Flowers"] sent to from the method which parses the dictionary to store the key route to the value, in this case "1".
Can someone please help me to build the dictionary which I would send to the jsonify method. Any help would be appreciated.
Hope this is what you are asking:
def build_dictionary(key_chain, j):
    # Step one level deeper for each key in the chain
    for k in key_chain:
        j = j.get(k)
    return j
kchain = ["app","Garden", "Flowers"]
>>> build_dictionary(kchain, j)
{'id': '1', 'state': 'fresh'}
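Note that with .get(), a wrong key in the middle of the chain makes the next .get() call fail on None with an AttributeError. If that matters for your Flask service, a variant along these lines may help. It is only a sketch: get_by_path and its default parameter are made-up names, and the dictionary is a trimmed copy of yours.

```python
from functools import reduce
from operator import getitem

def get_by_path(data, key_chain, default=None):
    """Follow key_chain through nested containers; return default on a bad path."""
    try:
        return reduce(getitem, key_chain, data)
    except (KeyError, IndexError, TypeError):
        return default

# Trimmed copy of the dictionary from the question
j = {
    "app": {
        "id": 0,
        "status": "valid",
        "Garden": {
            "Flowers": {"id": "1", "state": "fresh"},
            "Soil": {"id": "2", "state": "stale"},
        },
    }
}

print(get_by_path(j, ["app", "Garden", "Flowers"]))  # {'id': '1', 'state': 'fresh'}
print(get_by_path(j, ["app", "Pond", "Fish"]))       # None
```

The returned dict can be passed to jsonify as it is, and the None case can be turned into a 404 or similar.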

Creating Nested JSON from Dataframe

I have a dataframe and have to convert it into nested JSON.
countryname name text score
UK ABC Hello 5
Right now, I have some code that generates JSON, grouping countryname and name.
However, I want to firstly group by countryname and then group by name. Below is the code and output:
cols = test.columns.difference(['countryname','name'])
j = (test.groupby(['countryname','name'])[cols]
        .apply(lambda x: x.to_dict('records'))  # 'records' is the full spelling of the 'r' alias
        .reset_index(name='results')
        .to_json(orient='records'))
test_json = json.dumps(json.loads(j), indent=4)
Output:
[
{
"countryname":"UK"
"name":"ABC"
"results":[
{
"text":"Hello"
"score":"5"
}
]
}
]
However, I am expecting an output like this:
[
{
"countryname":"UK"
{
"name":"ABC"
"results":[
{
"text":"Hello"
"score":"5"
}
]
}
}
]
Can anyone please help in fixing this?
This would be the valid JSON. Note that the commas between members are required.
[
{
"countryname":"UK",
"name":"ABC",
"results":[
{
"text":"Hello",
"score":"5"
}
]
}
]
The other output you are trying to achieve is not valid JSON either, because the nested object needs a key of its own. A valid version would be:
[{
"countryname": "UK",
"you need a name in here": {
"name": "ABC",
"results": [{
"text": "Hello",
"score": "5"
}]
}
}]
I added a placeholder key there so you can decide what name to use.
For custom JSON output you will need a custom function that reshapes your object first.
l = df.to_dict('records')[0]  # first row of the dataframe as a dict
print(l, type(l))  # {'countryname': 'UK', 'name': 'ABC', 'text': 'Hello', 'score': 5} <class 'dict'>
e = l['countryname']
print(e)  # UK
o = [{
    "countryname": l['countryname'],
    "you need a name in here": {
        "name": l['name'],
        "results": [{
            "text": l['text'],
            "score": l['score']
        }]
    }
}]
print(o)  # [{'countryname': 'UK', 'you need a name in here': {'name': 'ABC', 'results': [{'text': 'Hello', 'score': 5}]}}]
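The snippet above only handles the first row. For several rows, the grouping by country and then by name can be done with plain dictionaries. The following is a rough sketch, not the only way to do it: rows stands in for what df.to_dict('records') would return, the extra rows are invented for illustration, and the "names" key is a made-up choice for the missing key name.

```python
import json
from collections import defaultdict

# Stand-in for df.to_dict('records'); the second and third rows are invented
rows = [
    {"countryname": "UK", "name": "ABC", "text": "Hello", "score": 5},
    {"countryname": "UK", "name": "ABC", "text": "Bye", "score": 3},
    {"countryname": "UK", "name": "XYZ", "text": "Hi", "score": 4},
]

# First level: country, second level: name, value: list of result dicts
grouped = defaultdict(lambda: defaultdict(list))
for row in rows:
    grouped[row["countryname"]][row["name"]].append(
        {"text": row["text"], "score": row["score"]}
    )

# Reshape the grouped data into the nested list-of-dicts layout
nested = [
    {
        "countryname": country,
        "names": [
            {"name": name, "results": results}
            for name, results in names.items()
        ],
    }
    for country, names in grouped.items()
]

print(json.dumps(nested, indent=4))
```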

Resources