I have a requirement to compare DB-migrated data to a CSV file created in S3 for the same table, using a Python script with the pandas library.
While doing this, I am facing a dtype issue because the data types change when the data moves to the CSV file. For example, the dataframe created from the table has dtype object, whereas the CSV file has dtype float.
As a result, df1table.equals(df2csv) returns False.
I also tried to change the dtype of the table dataframe, but got an error saying strings can't be converted to float. I am also facing an issue with the NULL values of the table dataframe compared to the CSV dataframe.
I need a generic solution which works for every table and its corresponding CSV file.
Is there a better way to compare them? For example: convert both dataframes to the same type and compare.
Looking forward to your reply. Thanks!
To prevent pandas from inferring the data type, you can pass dtype=object as a parameter to pd.read_csv:
df2csv = pd.read_csv('file.csv', dtype=object)  # add other params as needed
Example:
df1 = pd.read_csv('data.csv')
df2 = pd.read_csv('data.csv', dtype=object)
# df1
A B C
0 0.988888 1.789871 12.7
# df2
A B C
0 0.988887546565 1.789871131 12.7
CSV file:
A,B,C
0.988887546565,1.789871131,12.7
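If you also need a generic approach that handles the NULL values mentioned in the question, one option is to normalise both dataframes to plain strings before comparing. A minimal sketch, assuming df1table is the dataframe built from the table and file.csv is the exported file (both names are illustrative):
import pandas as pd
# Read the CSV without type inference and without turning empty cells into NaN,
# so every value stays a plain string.
df2csv = pd.read_csv('file.csv', dtype=str, keep_default_na=False)
# Normalise the table-side dataframe the same way: fill NULL/NaN with empty
# strings, then cast everything to string.
df1norm = df1table.fillna('').astype(str)
print(df1norm.equals(df2csv))
This only works if both sides format values identically (for example, no trailing zeros added on one side); otherwise compare column by column after applying the same string normalisation.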
I am building an API to save CSVs from the SharePoint REST API using Python 3, and I am using a public dataset as an example. The original CSV has 3 columns, Group, Team and FIFA Ranking, with the corresponding data in the rows.
After using data=response.content, the output of data is:
b'Group,Team,FIFA Ranking\r\nA,Qatar,50\r\nA,Ecuador,44\r\nA,Senegal,18\r\nA,Netherlands,8\r\nB,England,5\r\nB,Iran,20\r\nB,United States,16\r\nB,Wales,19\r\nC,Argentina,3\r\nC,Saudi Arabia,51\r\nC,Mexico,13\r\nC,Poland,26\r\nD,France,4\r\nD,Australia,38\r\nD,Denmark,10\r\nD,Tunisia,30\r\nE,Spain,7\r\nE,Costa Rica,31\r\nE,Germany,11\r\nE,Japan,24\r\nF,Belgium,2\r\nF,Canada,41\r\nF,Morocco,22\r\nF,Croatia,12\r\nG,Brazil,1\r\nG,Serbia,21\r\nG,Switzerland,15\r\nG,Cameroon,43\r\nH,Portugal,9\r\nH,Ghana,61\r\nH,Uruguay,14\r\nH,South Korea,28\r\n'
How do I convert the above to a CSV that pandas can manipulate, with the columns being Group, Team and FIFA Ranking and the corresponding data filled in dynamically, so that this method works for any CSV?
I tried:
data=response.content.decode('utf-8', 'ignore').split(',')
however, when I convert the data variable to a dataframe and then export the CSV, all the values end up in one column.
I tried:
data=response.content.decode('utf-8') or data=response.content.decode('utf-8', 'ignore') without the split
however, pandas does not accept this as a valid dataframe and complains about invalid use of the DataFrame constructor.
I tried:
data=json.loads(response.content)
however, the content is not valid JSON, so this raises json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Given:
data = b'Group,Team,FIFA Ranking\r\nA,Qatar,50\r\nA,Ecuador,44\r\nA,Senegal,18\r\n' #...
If you just want a CSV version of your data, you can simply do:
with open("foo.csv", "wt", encoding="utf-8", newline="") as file_out:
    file_out.write(data.decode())  # decode the bytes and write them out unchanged
If your objective is to load this data into a pandas dataframe and the CSV is not actually important, you can:
import io
import pandas
foo = pandas.read_csv(io.StringIO(data.decode()))
print(foo)
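Note that pandas.read_csv can also consume the raw bytes directly through io.BytesIO, so the explicit decode step is optional; a small variant of the same idea:
import io
import pandas
# read_csv accepts a binary buffer and decodes it itself (utf-8 by default)
foo = pandas.read_csv(io.BytesIO(data))
print(foo)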
I am trying to read a table from a HANA database in Python using the SQLAlchemy library. Typically, I would use the pandas package and the pd.read_sql() method for this operation. However, for some reason, the environment I am using does not support the pandas package. Therefore, I need to read the table without the pandas library. So far, the following is what I have been able to do:
query = ('''SELECT * FROM "<schema_name>"."<table_name>"'''
         ''' WHERE <conditional_clauses>''')
with engine.connect() as con:
    table = con.execute(query)
    row = table.fetchone()
However, while this technique allows me to read the table row by row, I do not get the column names of the table.
How can I fix this?
Thanks
I do not get the column names of the table
You won't get the column names of the table but you can get the column names (or aliases) of the result set:
import sqlalchemy as sa

with engine.begin() as conn:
    row = conn.execute(sa.text("SELECT 1 AS foo, 2 AS bar")).fetchone()
    print(row.items())  # [('foo', 1), ('bar', 2)]
    #
    # or, for just the column names
    #
    print(row.keys())   # ['foo', 'bar']
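Applied to the original query, the same idea gives you the column names of the result set alongside the rows; a rough sketch, reusing the engine and query from the question:
import sqlalchemy as sa
with engine.begin() as conn:
    result = conn.execute(sa.text(query))
    columns = list(result.keys())  # column names/aliases of the result set
    rows = [dict(zip(columns, row)) for row in result]
print(columns)
print(rows[0] if rows else 'no rows returned')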
Question 1: The file phone.txt stores the lines in the format code:number
import pandas as pd
import sqlite3
con = sqlite3.connect('database.db')
data = pd.read_csv('phone.txt', sep='\t', header=None)
data.to_sql('post_table', con, if_exists='replace', index=False)
I want to load all the data from the phone.txt file into the database.db database, but everything gets loaded into one column. I need it loaded into two columns:
code
number
How can I do this?
Question 2: After loading the information into the database, how can I find the number by code? For example, how do I find which number corresponds to code = 7 (answer: 9062621390)?
Question 1
In your example pandas is not able to distinguish between the code and the number, since your file is :-separated. When reading your file you need to change the separator to : and also specify the column names, since your CSV doesn't seem to have a header, like so:
data = pd.read_csv('phone.txt',
                   sep=':',
                   names=['code', 'number'])
Question 2
After putting your data into the database you can query it as follows:
number = pd.read_sql_query('SELECT number FROM post_table WHERE code = (?)',
                           con,
                           params=(code,))
where con is your sqlite connection.
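Putting both parts together, a short sketch using the file and table names from the question (with code 7 as the example lookup):
import pandas as pd
import sqlite3
con = sqlite3.connect('database.db')
data = pd.read_csv('phone.txt', sep=':', names=['code', 'number'])
data.to_sql('post_table', con, if_exists='replace', index=False)
number = pd.read_sql_query('SELECT number FROM post_table WHERE code = (?)',
                           con,
                           params=(7,))
print(number.iloc[0, 0])  # e.g. 9062621390 for code 7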
I have a complex MongoDB database, consisting of documents nested up to 7 levels deep. I need to use PyMongo to extract the data, and then convert the extracted data to a .csv file.
You can try using json_normalize.
It is used to flatten the JSON and read the data into a dataframe, which can be stored as a CSV later.
For example:
from pandas.io.json import json_normalize  # in newer pandas this is available as pandas.json_normalize
# mongo_value is your mongo aggregation pipeline
mongo_aggregate = db.events.aggregate(mongo_value)
mongo_df = json_normalize(list(mongo_aggregate))
# print(mongo_df)
mongo_columns = list(mongo_df.columns.values)
# just pick the column_name instead of properties.something.something.column_name
for w in range(len(mongo_columns)):
    mongo_columns[w] = mongo_columns[w].split('.')[-1].lower()
mongo_df.columns = mongo_columns
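Once the columns are renamed, the flattened dataframe can be written out with to_csv; the filename here is just an example:
mongo_df.to_csv('mongo_export.csv', index=False)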
For reference, read this: https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.io.json.json_normalize.html