Insert multiple dicts into mysqldb using mysql - python-3.x

I have been struggling with this for hours and I feel like crying now as I'm unable to fathom out what is happening.
Here is a simplified version of my data:
{
    "first_name": {"0": "OBELISK", "1": "RA"},
    "golongan": {"0": 88, "1": 99},
    "last_name": {"0": "GOD", "1": "GOD"},
    "nik": {"0": 666679, "1": 666678},
    "status_aktif": {"0": 1, "1": 1},
    "tgl_kerja": {
        "0": "Sat, 20 Nov 2021 16:28:00 GMT",
        "1": "Thu, 25 Nov 2021 16:28:00 GMT"
    }
}
This is the code I have:
@app.route('/karyawan-import-excel')
def import_excel():
    # df = pd.read_csv(r'C:\xampp\htdocs\python\coba_read_write_excel\testing.csv')
    df = pd.read_excel(r'C:\xampp\htdocs\python\coba_read_write_excel\testing.xlsx')
    data = df.to_dict()
    cur = mysql.connection.cursor(curMysql)
    sql = "INSERT INTO zzz_customers (name, address) VALUES (%s, %s)"
    val = [
        (data.get('nik'), data.get('first_name'))
        # ('Peter', 'Lowstreet 4'),
        # ('Amy', 'Apple st 652')
    ]
    cur.executemany(sql, val)
    mysql.connection.commit()

The data structure you have created by doing data = df.to_dict() is a nested dictionary. There's not a straightforward way that I know of to get a data structure like that into a MySQL table.
Instead you can make this small change to that line, as shown below, which will give you a list of dicts -- an easier data structure to work with, which you can then insert into a MySQL table.
data = df.to_dict('records') # produces list of dicts
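With the sample data above, data would then look roughly like this (note that read_excel may return the tgl_kerja values as pandas Timestamps rather than the strings shown):

[
    {'first_name': 'OBELISK', 'golongan': 88, 'last_name': 'GOD',
     'nik': 666679, 'status_aktif': 1, 'tgl_kerja': 'Sat, 20 Nov 2021 16:28:00 GMT'},
    {'first_name': 'RA', 'golongan': 99, 'last_name': 'GOD',
     'nik': 666678, 'status_aktif': 1, 'tgl_kerja': 'Thu, 25 Nov 2021 16:28:00 GMT'},
]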
Then you can "get" -- as you were trying to do -- just the values you want into a list of tuples:
list_of_tuples = [
    (d['nik'], d['first_name']) for d in data
]
cur.executemany(sql, list_of_tuples)
# ...
This is what list_of_tuples should look like:
In [11]: list_of_tuples
Out[11]: [(666679, 'OBELISK'), (666678, 'RA')]
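Putting it all together, a minimal sketch of the corrected route might look like this (assuming mysql is the configured Flask MySQL connection object from the question and that the default cursor class is fine; the return line is purely illustrative):

import pandas as pd

@app.route('/karyawan-import-excel')
def import_excel():
    df = pd.read_excel(r'C:\xampp\htdocs\python\coba_read_write_excel\testing.xlsx')
    data = df.to_dict('records')  # list of dicts, one per spreadsheet row

    sql = "INSERT INTO zzz_customers (name, address) VALUES (%s, %s)"
    list_of_tuples = [(d['nik'], d['first_name']) for d in data]

    cur = mysql.connection.cursor()
    cur.executemany(sql, list_of_tuples)
    mysql.connection.commit()
    return 'Imported %d rows' % cur.rowcount  # illustrative only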

Related

find and replace element in file using python 3.7

I am learning Python, and in the code below I am trying to replace all values where "id": null, appears with "id": na,.
There may not always be an id in every block. If c does not have that pattern in it then it is skipped. Thank you :)
file
{
    "objects": [
        {
            "version": "1",
            "id": null,
            "date": 1,
            ...,
        },
        {
            "version": "1",
            "id": 1xx1,
            "date": 1,
            ...,
        },
desired
{
    "objects": [
        {
            "version": "1",
            "id": na,
            "date": 1,
            ...,
        },
        {
            "version": "1",
            "id": 1xx1,
            "date": 1,
            ...,
        },
python3
import json

with open('file.json', encoding='utf8') as in_file:
    data = json.load(in_file)

for c in data['objects']:  # read all elements in object into c
    if c = "\"id\""\: null,:  # check to see if element in c is "id": null,
        data[c] = "\"id\""\: na,  # replace c with
In Python you want to use None (a NoneType), not null. You could check it explicitly with obj['id'] is None, but None is falsy, so you can just use if not obj['id']:
import json

with open('file.json', encoding='utf8') as in_file:
    data = json.load(in_file)

for obj in data['objects']:
    if not obj['id']:
        obj['id'] = 'na'

print(data)
"id": an is invalid as Json and in a Python dictionary I think you want to either initialise an as a variable or use a string "id": "an"
Or more concisely use comprehensions:
import json

with open('file.json', encoding='utf8') as in_file:
    print(json.dumps({
        'objects': [
            {k: 'na' if k == 'id' and not v else v for k, v in entry.items()}
            for entry in json.load(in_file)['objects']
        ]
    }))
As @Mohamed Fathallah suggests, use json.dump() to write the data to a file or json.dumps() to display it as a JSON-formatted string.
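For reference, a minimal sketch of that difference (out.json is just an example file name):

import json

data = {'objects': [{'id': 'na'}]}

# json.dumps() returns the JSON text as a Python string
text = json.dumps(data, indent=2)

# json.dump() writes the JSON text straight to an open file
with open('out.json', 'w', encoding='utf8') as f:
    json.dump(data, f, indent=2)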
The error is in the following code:
if c = "\"id\""\: null,:  # check to see if element in c is "id": null,
    data[c] = "\"id\""\: na,  # replace c with
As @Dan-Dev mentioned, the null of C#, C++, etc. is None in Python. You can use the code below; the comments explain what it does.
import json

with open('file.json', 'r') as file:
    data = json.load(file)

for obj in data['objects']:
    if obj.get('id') is None:
        obj['id'] = 'na'

# if you want to write back
with open('data.json', 'w') as file:
    json.dump(data, file, indent=2)

Is there a shorter way of exporting python nested dictionary to CSV?

I have a nested dictionary which comprises multiple lists and dictionaries. The "stations" key contains the values which I want to convert to a CSV file. I am only after certain values. A snippet of the dictionary is below:
data = { "brands": {...},
"fueltypes": {...},
"stations": {"items": [
{
"brandid": "",
"stationid": "",
"brand": "Shell",
"code": "2126",
"name": "Cumnock General Store",
"address": "31 Obley St, CUMNOCK NSW 2867",
"location": {
"latitude": -32.928744,
"longitude": 148.755153
},
"state": "NSW"
},
{
"brandid": "",
"stationid": "",
"brand": "Shell",
"code": "2200",
"name": "Tea Tree Cafe",
"address": "160 Mount Darragh Rd, SOUTH PAMBULA NSW 2549",
"location": {
"latitude": -36.944277,
"longitude": 149.845399
},
"state": "NSW"
}....]}}
In order to obtain certain values in the "stations" key, I created blank lists for each of those values and appended accordingly. After that I used the zip function to combine the lists and converted the result to a CSV. The code that I have used is below:
import csv

Station_Code = []
Station_Name = []
Latitude = []
Longitude = []
Address = []
Brand = []

for k, v in data["stations"].items():
    for item in range(len(v)):
        Station_Code.append(v[item]["code"])
        Station_Name.append(v[item]["name"])
        Latitude.append(v[item]["location"]["latitude"])
        Longitude.append(v[item]["location"]["longitude"])
        Address.append(v[item]["address"])
        Brand.append(v[item]["brand"])
        #print(f'{v[item]["code"]} - {v[item]["name"]} - {v[item]["location"]["latitude"]}')

rows = zip(Station_Code, Station_Name, Latitude, Longitude, Address, Brand)

with open("Exported_File.csv", "w") as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
Are there any alternative/shorter ways of extracting this information?
If you're using pandas, there's a fairly easy way to do this.
import pandas as pd

# Convert dict to a pandas DataFrame
df = pd.DataFrame(data["stations"]["items"])

# 'location' is a dict, so we need to extract the 'latitude' and 'longitude'.
df['latitude'] = df['location'].apply(lambda x: x['latitude'])
df['longitude'] = df['location'].apply(lambda x: x['longitude'])

# Select subset of columns for final csv
df = df[['code', 'name', 'latitude', 'longitude', 'address', 'brand']]
df.to_csv('exported-file.csv', index=False, header=False)
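As a side note, if your pandas version has pd.json_normalize (pandas 1.0+), it can flatten the nested location dict in one step, producing dotted column names; a sketch:

import pandas as pd

# nested dicts are expanded into 'location.latitude' / 'location.longitude' columns
df = pd.json_normalize(data["stations"]["items"])
df = df[['code', 'name', 'location.latitude', 'location.longitude', 'address', 'brand']]
df.to_csv('exported-file.csv', index=False, header=False)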

Convert CSV to dictionary without using libraries

I have this CSV:
color,property,type,id
red,house,building,02
I'm trying to convert a CSV to a dictionary with the following structure:
{
    "0": {"val1": 1, "val2": 2, "val3": 3, ..., "valn": n},
    "1": {"val1": 45, "val2": 7, "val3": None, ..., "valn": 68},
}
where val1, val2 and so on are the header names of the columns and "0" and "1" are the row numbers.
So, if the CSV content is like this:
color,property,type,id
red,house,building,02
blue,department,flat,04
we should get:
{
    "0": {"color": "red", "property": "house", "type": "building", "id": "02"},
    "1": {"color": "blue", "property": "department", "type": "flat", "id": "04"},
}
How can I achieve this result without using any library? I'd like to implement it from scratch and not use the csv library or the like.
Thank you.
Try this approach:
inp = """color,property,type,id
red,house,building,02
blue,department,flat,04
cyan,,flat,10
"""
lines = inp.split('\n')
colnames = list(map(lambda x: x.strip(), lines[0].split(',')))
lines = lines[1:]
res = {}
for i, line in enumerate(lines[:-1]):
res[i] = {
colname: val if val != '' else None
for colname, val in zip(colnames, map(lambda x: x.strip(), line.split(',')))
}
print(res)
However, for additional features like type deduction the code will be more complex; you can follow the answers to this question.
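Note that the keys of res above come out as integers; if you want the string keys from the desired output, and want to read from an actual file, here is a sketch along the same lines (input.csv is a hypothetical file name):

def csv_to_dict(path):
    with open(path, encoding='utf8') as f:
        # skip blank lines such as a trailing newline
        lines = [line.strip() for line in f if line.strip()]
    colnames = [c.strip() for c in lines[0].split(',')]
    return {
        str(i): {
            col: (val if val != '' else None)
            for col, val in zip(colnames, (v.strip() for v in row.split(',')))
        }
        for i, row in enumerate(lines[1:])
    }

print(csv_to_dict('input.csv'))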

Convert list of Pandas Dataframe JSON objects

I have a Dataframe with one column where each cell in the column is a JSON object.
players
0    {"name": "tony", "age": 57}
1    {"name": "peter", "age": 46}
I want to convert this to a data frame as:
name age
tony 57
peter 46
Any ideas how I do this?
Note: the original JSON object looks like this...
{
    "players": [{
        "age": 57,
        "name": "tony"
    },
    {
        "age": 46,
        "name": "peter"
    }]
}
Use the DataFrame constructor if the values are dicts:
# if the values are strings instead, first parse them with ast.literal_eval:
#print (type(df.loc[0, 'players']))
#<class 'str'>
#import ast
#df['players'] = df['players'].apply(ast.literal_eval)

print (type(df.loc[0, 'players']))
<class 'dict'>

df = pd.DataFrame(df['players'].values.tolist())
print (df)
   age   name
0   57   tony
1   46  peter
But better is to use json_normalize on the JSON object, as suggested by @jpp:
from pandas.io.json import json_normalize

json = {
    "players": [{
        "age": 57,
        "name": "tony"
    },
    {
        "age": 46,
        "name": "peter"
    }]
}

df = json_normalize(json, 'players')
print (df)
   age   name
0   57   tony
1   46  peter
This can do it:
df = df['players'].apply(pd.Series)
However, it's slow:
In [20]: timeit df.players.apply(pd.Series)
1000 loops, best of 3: 824 us per loop
@jezrael's suggestion is faster:
In [24]: timeit pd.DataFrame(df.players.values.tolist())
1000 loops, best of 3: 387 us per loop
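(A version note: in pandas 1.0+ json_normalize is exposed at the top level, so -- with the nested object renamed here to avoid shadowing the json module -- the call would be:)

import pandas as pd

raw = {"players": [{"age": 57, "name": "tony"}, {"age": 46, "name": "peter"}]}
df = pd.json_normalize(raw, 'players')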

Flattening a Pandas JSON Dataframe for a specific path

I have the following JSON:
ds = [{
    "name": "groupA",
    "subGroups": [{
        "subGroup": 1,
        "categories": [{
            "category1": {
                "value": 10
            }
        },
        {
            "category2": {}
        },
        {
            "category3": {}
        }]
    }]
},
{
    "name": "groupB",
    "subGroups": [{
        "subGroup": 1,
        "categories": [{
            "category1": {
                "value": 500
            }
        },
        {
            "category2": {}
        },
        {
            "category3": {}
        }]
    }]
}]
I can get a dataframe for all the categories by doing:
json_normalize(ds, record_path=["subGroups", "categories"], meta=['name', ['subGroups', 'subGroup']], record_prefix='cat.')
This will give me:
    cat.category1 cat.category2 cat.category3  subGroups.subGroup    name
0   {'value': 10}           NaN           NaN                   1  groupA
1             NaN            {}           NaN                   1  groupA
2             NaN           NaN            {}                   1  groupA
3  {'value': 500}           NaN           NaN                   1  groupB
4             NaN            {}           NaN                   1  groupB
5             NaN           NaN            {}                   1  groupB
But, I don't care about category2 and category3 at all. I only care about category1.
So I'd prefer something like:
    cat.category1  subGroups.subGroup    name
0   {'value': 10}                   1  groupA
1  {'value': 500}                   1  groupB
Any ideas how I get to this?
And even better, I really want the value of value in category1. So something like:
   cat.category1.value  subGroups.subGroup    name
0                   10                   1  groupA
1                  500                   1  groupB
Any ideas?
The problem is that category1 is not considered a record by json_normalize. An informal definition of a record is a key in a dictionary that maps to a list of dicts. You can't access category1 (and therefore value) through the record_path argument because it doesn't map to a list of dicts.
This is the best solution I could find:
import numpy as np
import pandas as pd

df = pd.io.json.json_normalize(ds,
                               record_path=['subGroups', 'categories'],
                               errors='ignore',
                               meta=['name',
                                     ['subGroups', 'subGroup'],
                                    ],
                               record_prefix='cat.')

df = df.drop(['cat.category2', 'cat.category3'], axis=1)

# replace each {'value': x} dict with its value, and everything else with NaN
for i in range(df.shape[0]):
    row = df.at[i, 'cat.category1']
    if isinstance(row, dict) and 'value' in row:
        df.at[i, 'cat.category1'] = row['value']
    else:
        df.at[i, 'cat.category1'] = np.nan

# EDIT: if you want to remove rows for which the cat.category1 column has NaN values
df = df[pd.notnull(df['cat.category1'])]
This gives df in the desired form of the dataframe.
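As a sketch of a slightly shorter alternative to that loop (assuming the same df), Series.str.get also works element-wise on dicts and returns NaN where an element is missing or lacks the key:

# pull 'value' out of each dict in the column; NaN entries and missing keys stay NaN
df['cat.category1'] = df['cat.category1'].str.get('value')
df = df[df['cat.category1'].notna()]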
On the other hand, if your JSON structure looked like this (notice the list brackets around the value dict):
ds = [{
    "name": "groupA",
    "subGroups": [{
        "subGroup": 1,
        "categories": [{
            "category1": [{
                "value": 10
            }]
        }]
    }]
},
{
    "name": "groupB",
    "subGroups": [{
        "subGroup": 1,
        "categories": [{
            "category1": [{
                "value": 500
            }]
        }]
    }]
}]
You would be able to use json_normalize like this:
df = pd.io.json.json_normalize(ds,
                               record_path=['subGroups', 'categories', 'category1'],
                               errors='ignore',
                               meta=['name',
                                     ['subGroups', 'subGroup'],
                                    ],
                               record_prefix='cat.')
And you would get this:
   cat.value    name  subGroups.subGroup
0         10  groupA                   1
1        500  groupB                   1
Try using YAML for this purpose: it has yaml.dump() to write output in a human-readable format, and other functions to rewrite the output as JSON.
Check the basic video tutorial here:
https://www.youtube.com/watch?v=hSuHnuNC8L4
