Error While Creating DataFrame for Dictionary Variable - python-3.x

I am working with the USDA product database, which is in JSON format. I have parsed it with the json package and loaded the USDA food data into a DataFrame.
However, a few of the variables in the data are dictionaries. I am trying to create a DataFrame for the 'nutrients' variable, but I get an error while creating it.
Please help me get rid of the error; my code is below:
nutrients = []
for rec in db:
    fnuts = DataFrame(rec['nutrients'])
    fnuts['id'] = rec['id']
    nutrients.append(fnuts)

I also tried:
import pandas as pd
nutrients = pd.DataFrame(db1['nutrients'])
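A likely fix (a minimal, untested sketch, assuming db is the list of food records loaded with json.load): build one frame per record in a list, then combine them with pd.concat at the end. The missing nutrients = [] assignment and the final concat are usually the pieces that trip this example up.

import json
import pandas as pd

# assumption: the USDA file is a JSON array of food records
with open('database.json') as f:
    db = json.load(f)

nutrients = []
for rec in db:
    fnuts = pd.DataFrame(rec['nutrients'])  # one frame per food record
    fnuts['id'] = rec['id']                 # tag each row with the food id
    nutrients.append(fnuts)

# combine the per-record frames into a single DataFrame
nutrients = pd.concat(nutrients, ignore_index=True)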


Python: get data from an MS Access query with pandas and sqlalchemy

Hello, I need your help with this error.
I don't have experience in Python; I am new to it.
I need to export a query from MS Access to Excel.
I created the connection to my database and the connection seems OK.
However, when I call the DataFrame from pandas I get this message:
Empty DataFrame
Columns: [Cofor6, TypePce, Ste, CompteDeNfact]
Index: []
This is my code:
import pandas as pd
import sqlalchemy as sa

connection_string = (
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=C:\xxx\xx\Python_Projects\xxx\xx\xx\mydb.mdb;"
    r"ExtendedAnsiSQL=1;"
)
connection_url = sa.engine.URL.create("access+pyodbc", query={"odbc_connect": connection_string})
engine = sa.create_engine(connection_url)
print('connected')
RAFAELquery = '''SELECT SAP_CONSO.Cofor6, SAP_CONSO.TypePce, SAP_CONSO.Ste,
       Count(SAP_CONSO.Nfact) AS CompteDeNfact
FROM SAP_CONSO
GROUP BY SAP_CONSO.Cofor6, SAP_CONSO.TypePce, SAP_CONSO.Ste, SAP_CONSO.SaiParFlux
HAVING SAP_CONSO.TypePce Not Like 'y*' AND SAP_CONSO.SaiParFlux Like '*E'
ORDER BY SAP_CONSO.TypePce DESC;'''
df = pd.read_sql_query(RAFAELquery, engine)
print(df)
Can anyone help me? Thanks a lot.
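One thing worth checking (an untested suggestion, assuming the LIKE patterns are what empty the result): queries copied out of the Access query designer use the * wildcard, but over ODBC the ANSI-92 wildcards % and _ apply, so Not Like 'y*' and Like '*E' can silently match nothing. A sketch of the same query with the HAVING clause rewritten for ANSI wildcards:

# hypothetical rewrite: same query with ANSI wildcards for ODBC
RAFAELquery = '''SELECT SAP_CONSO.Cofor6, SAP_CONSO.TypePce, SAP_CONSO.Ste,
       Count(SAP_CONSO.Nfact) AS CompteDeNfact
FROM SAP_CONSO
GROUP BY SAP_CONSO.Cofor6, SAP_CONSO.TypePce, SAP_CONSO.Ste, SAP_CONSO.SaiParFlux
HAVING SAP_CONSO.TypePce Not Like 'y%' AND SAP_CONSO.SaiParFlux Like '%E'
ORDER BY SAP_CONSO.TypePce DESC;'''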

How to get values based on 2 user inputs in Python

As per the data below, how can I use Python to get the Headers column value for a given input from the DB and Table columns?
DB      Table   Headers
Oracle  Cust    Id,Name,Mail,Phone,City,County
Oracle  Cli     Cid,shopNo,State
Oracle  Addr    Street,Area,City,Country
SqlSer  Usr     Name,Id,Addr
SqlSer  Log     LogId,Env,Stg
MySql   Loc     Flat,Add,Pin,Country
MySql   Data    Id,Txt,TaskId,No
Output: suppose I pass Oracle and Cli as parameters; then it should return the value "Cid,shopNo,State" as a list.
I tried a Python dictionary, but a dictionary takes two parts, a key and a value, and I have three values. How can I do this?
It looks like your data is in some sort of tabular format. In that case I would recommend the pandas package, which is very convenient for working with tabular data.
pandas can read data into a DataFrame from a CSV file using pandas.read_csv. You can then filter this DataFrame using the column names and the required values.
In the example below I assume that your data is tab (\t) separated. I read the data from a string using io.StringIO; normally you would just use pandas.read_csv('filename.csv').
import pandas as pd
import io
data = """DB\tTable\tHeaders
Oracle\tCust\tId,Name,Mail,Phone,City,County
Oracle\tCli\tCid,shopNo,State
Oracle\tAddr\tStreet,Area,City,Country
SqlSer\tUsr\tName,Id,Addr
SqlSer\tLog\tLogId,Env,Stg
MySql\tLoc\tFlat,Add,Pin,Country
MySql\tData\tId,Txt,TaskId,No"""
dataframe = pd.read_csv(io.StringIO(data), sep='\t')
db_is_oracle = dataframe['DB'] == 'Oracle'
table_is_cli = dataframe['Table'] == 'Cli'
filtered_dataframe = dataframe[db_is_oracle & table_is_cli]
print(filtered_dataframe)
This will result in:
DB Table Headers
1 Oracle Cli Cid,shopNo,State
Or to get the actual headers of the first match:
print(filtered_dataframe['Headers'].iloc[0])
>>> Cid,shopNo,State
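If you would rather not pull in pandas, a plain dictionary works too: the trick for "three values" is to key the dictionary on a (DB, Table) tuple. A minimal sketch, assuming the same tab-separated file:

headers = {}
with open('filename.csv') as f:
    next(f)  # skip the header row
    for line in f:
        db, table, cols = line.rstrip('\n').split('\t')
        headers[(db, table)] = cols.split(',')  # tuple key -> list of headers

print(headers[('Oracle', 'Cli')])  # ['Cid', 'shopNo', 'State']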

Is there any way to create neo4j relationships from a pandas DataFrame?

I have created nodes using the py2neo package:
from py2neo import Graph
from py2neo import Node, Relationship
This is my pandas DataFrame.
I can create the nodes successfully, but when I try to create a relationship I get an error:
graph.create(Relationship(pmid, "Having_author", auth))
TypeError: Values of type <class 'pandas.core.series.Series'> are not supported
I have also referred to a Stack Overflow question but am still getting the error!
Here is the link
Is there any other way to create a relationship with a pandas DataFrame?
Your code is failing because the ID for a node must be a literal (an integer or a string), but you set the ID to a Series when you wrote ...id = data['PMID']). It appears py2neo allowed you to create the node object with a faulty ID, but it really shouldn't have, because every relationship with that node will fail while the ID is bad.
Recreate the Node objects with an integer for the ID, and then the Relationship between them should be created without issues.
Note, I haven't tested this code, but this is how you would loop through a df and create nodes as you go.
for i, row in data.iterrows():
    pmid = row['PMID']  # the integer PMID based on your df
    pmi_node = Node("PMID row " + str(i), id=pmid)  # create node
    authid = row['AU']  # the author name string based on your df
    auth_node = Node("Auth row " + str(i), id=authid)  # create node
    graph.create(pmi_node | auth_node)
    # create a relationship between the PMID and Auth nodes for that row
    graph.create(Relationship(pmi_node, "Having_author", auth_node))
PS -- The SO link you referenced is not using the py2neo package; it is simply sending Cypher query strings to the database with the neo4j Python package. I'd recommend that route if you are a beginner.
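For reference, a minimal sketch of that route with the official neo4j driver (the connection URI, credentials, and label names here are placeholders):

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for _, row in data.iterrows():
        # MERGE avoids duplicate nodes; query parameters keep values properly typed
        session.run(
            "MERGE (p:Pmid {id: $pmid}) "
            "MERGE (a:Author {name: $au}) "
            "MERGE (p)-[:HAVING_AUTHOR]->(a)",
            pmid=int(row['PMID']), au=row['AU'],
        )
driver.close()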
I converted the Series objects with zip, and then it worked for me:
from tqdm import tqdm

for pmid, au in tqdm(zip(data.PMID, data.AU), total=data.shape[0]):
    pmid_node = Node("pmid", name=int(pmid))
    au_node = Node("author", name=au)
    graph.create(Relationship(pmid_node, "HAVING_AUTHOR", au_node))

How to solve InvalidRequestError encountered during execution of pandas.to_sql() using sqlalchemy connection?

I am trying to replace an existing table in a MySQL database. I used the piece of code below to write the DataFrame called frame to a database table:
import pandas as pd
import sqlalchemy
from sqlalchemy.types import VARCHAR

database_username = 'root'
database_password = '1234'
database_ip = 'localhost'
database_name = 'my_new_database'
database_connection = sqlalchemy.create_engine(
    'mysql+mysqlconnector://{0}:{1}@{2}/{3}'.format(
        database_username, database_password, database_ip, database_name),
    pool_size=3, pool_recycle=3600)
frame.to_sql(name='table1', con=database_connection, schema=database_name,
             if_exists='replace', chunksize=1000,
             dtype={'Enrollment No': VARCHAR(
                 frame.index.get_level_values('Enrollment No').str.len().max())})
table1 gets created successfully. But when I rerun the last line of the above code, i.e. frame.to_sql(), it throws the error below:
InvalidRequestError: Could not reflect: requested table(s) not available in Engine(mysql+mysqlconnector://root:***@localhost/my_new_database) schema 'my_new_database': (table1)
I want to know why this error is thrown when the table already exists, even though I've used if_exists='replace' and why it works correctly only when creating the table for the first time. What must be done to avoid getting this error?
N.B.: Answers to similar questions only suggest using the table name in lowercase, which I'm following by naming the table as 'table1'.
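One workaround often suggested for this reflection error (untested here, and an assumption about your setup): drop the schema= argument, since the target database is already selected in the connection URL. pandas then reflects the table in the engine's default schema, which avoids a mismatch between the URL and the schema keyword when to_sql reflects the existing table.

# hypothetical: the same call without schema=, relying on the database in the URL
frame.to_sql(name='table1', con=database_connection,
             if_exists='replace', chunksize=1000,
             dtype={'Enrollment No': VARCHAR(
                 frame.index.get_level_values('Enrollment No').str.len().max())})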

Extracting data from a complex MongoDB database using PyMongo and converting it to a .csv file

I have a complex MongoDB database, consisting of documents nested up to 7 levels deep. I need to use PyMongo to extract the data, and then convert the extracted data to a .csv file.
You can try using json_normalize.
It flattens the JSON and reads the data into a DataFrame, which can later be stored as a CSV.
For example:
from pandas.io.json import json_normalize  # in newer pandas: pd.json_normalize

# mongo_value is your mongo query
mongo_aggregate = db.events.aggregate(mongo_value)
mongo_df = json_normalize(list(mongo_aggregate))
# print(mongo_df)
mongo_columns = list(mongo_df.columns.values)
# keep just the column name instead of properties.something.something.column_name
for w in range(len(mongo_columns)):
    mongo_columns[w] = mongo_columns[w].split('.')[-1].lower()
mongo_df.columns = mongo_columns
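To finish the conversion the question asks for, the flattened frame can then be written out with to_csv (the filename here is a placeholder):

mongo_df.to_csv('output.csv', index=False)  # write the flattened data to a CSV file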
For reference, read this: https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.io.json.json_normalize.html
