How to get values based on 2 user inputs in Python - python-3.x

Given the data below, how can I use Python to get the value of the Headers column corresponding to a given pair of values from the DB and Table columns?
DB Table Headers
Oracle Cust Id,Name,Mail,Phone,City,County
Oracle Cli Cid,shopNo,State
Oracle Addr Street,Area,City,Country
SqlSer Usr Name,Id,Addr
SqlSer Log LogId,Env,Stg
MySql Loc Flat,Add,Pin,Country
MySql Data Id,Txt,TaskId,No
Output: if I pass Oracle and Cli as parameters, it should return the value "Cid,shopNo,State" as a list.
I tried a Python dictionary, but it only takes two values, a key and a value, while I have three columns. How can I do this?

Looks like your data is in some sort of tabular format. In that case I would recommend using the pandas package, which is very convenient if you are working with tabular data.
pandas can read data into a DataFrame from a CSV file using pandas.read_csv. You can then filter this DataFrame using the column names and the required values.
In the example below I assume that your data is tab (\t) separated. I read in the data from a string using io.StringIO. Normally you would just use pandas.read_csv('filename.csv').
import pandas as pd
import io
data = """DB\tTable\tHeaders
Oracle\tCust\tId,Name,Mail,Phone,City,County
Oracle\tCli\tCid,shopNo,State
Oracle\tAddr\tStreet,Area,City,Country
SqlSer\tUsr\tName,Id,Addr
SqlSer\tLog\tLogId,Env,Stg
MySql\tLoc\tFlat,Add,Pin,Country
MySql\tData\tId,Txt,TaskId,No"""
dataframe = pd.read_csv(io.StringIO(data), sep='\t')
db_is_oracle = dataframe['DB'] == 'Oracle'
table_is_cli = dataframe['Table'] == 'Cli'
filtered_dataframe = dataframe[db_is_oracle & table_is_cli]
print(filtered_dataframe)
This will result in:
DB Table Headers
1 Oracle Cli Cid,shopNo,State
Or to get the actual headers of the first match:
print(filtered_dataframe['Headers'].iloc[0])
>>> Cid,shopNo,State
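If you would still like the dictionary approach from the question, a (DB, Table) tuple works as a single key; a minimal sketch with the values hard-coded from the table above:
# Minimal sketch of the dictionary approach: use a (DB, Table) tuple as the key
# and keep the Headers string as the value (values copied from the data above).
lookup = {
    ('Oracle', 'Cust'): 'Id,Name,Mail,Phone,City,County',
    ('Oracle', 'Cli'): 'Cid,shopNo,State',
    ('Oracle', 'Addr'): 'Street,Area,City,Country',
    ('SqlSer', 'Usr'): 'Name,Id,Addr',
    ('SqlSer', 'Log'): 'LogId,Env,Stg',
    ('MySql', 'Loc'): 'Flat,Add,Pin,Country',
    ('MySql', 'Data'): 'Id,Txt,TaskId,No',
}

def get_headers(db, table):
    # Split the stored string so the result comes back as a list.
    return lookup[(db, table)].split(',')

print(get_headers('Oracle', 'Cli'))  # ['Cid', 'shopNo', 'State']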

Related

pandas db created dataframe compare with csv created dataframe having type issue while comparing

I have a requirement to compare DB-migrated data with a CSV file created in S3 for the same table, using a Python script with the pandas library.
While doing this, I am facing a dtype issue, because the data type changes when the data moves to the CSV file. For example, the DataFrame created from the table has dtype object, whereas the one from the CSV file has dtype float.
So when I call df1table.equals(df2csv), the result is False.
I also tried to change the dtype of the table DataFrame, but got an error saying the string can't be converted to float. I am also facing an issue with null values when comparing the table DataFrame to the CSV DataFrame.
I need a generic solution that works for any table and its respective CSV file.
Is there a better way to compare them, for example by converting both DataFrames to the same type before comparing?
Looking forward to your reply. Thanks!
To prevent Pandas inferring the data type, you can use dtype=object as parameter of pd.read_csv:
df2csv = pd.read_csv('file.csv', dtype=object)  # plus any other params you need
Example:
df1 = pd.read_csv('data.csv')
df2 = pd.read_csv('data.csv', dtype=object)
# df1
A B C
0 0.988888 1.789871 12.7
# df2
A B C
0 0.988887546565 1.789871131 12.7
CSV file:
A,B,C
0.988887546565,1.789871131,12.7
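Following up on "change both data frames into the same type and compare", one simple option is to cast both to strings before calling equals; a rough sketch, assuming df1table and df2csv are the two frames from the question:
# Sketch: bring both frames to a common string representation before comparing.
# Note that numeric formatting can still differ (e.g. '1' vs '1.0'), so this is
# a starting point rather than a complete generic solution.
left = df1table.fillna('').astype(str)
right = df2csv.fillna('').astype(str)
print(left.equals(right))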

How do I convert my response with byte characters to readable CSV - PYTHON

I am building an API to save CSVs from the SharePoint REST API using Python 3. I am using a public dataset as an example. The original CSV has 3 columns, Group, Team, and FIFA Ranking, with the corresponding data in the rows.
After using data=response.content, the output of data is:
b'Group,Team,FIFA Ranking\r\nA,Qatar,50\r\nA,Ecuador,44\r\nA,Senegal,18\r\nA,Netherlands,8\r\nB,England,5\r\nB,Iran,20\r\nB,United States,16\r\nB,Wales,19\r\nC,Argentina,3\r\nC,Saudi Arabia,51\r\nC,Mexico,13\r\nC,Poland,26\r\nD,France,4\r\nD,Australia,38\r\nD,Denmark,10\r\nD,Tunisia,30\r\nE,Spain,7\r\nE,Costa Rica,31\r\nE,Germany,11\r\nE,Japan,24\r\nF,Belgium,2\r\nF,Canada,41\r\nF,Morocco,22\r\nF,Croatia,12\r\nG,Brazil,1\r\nG,Serbia,21\r\nG,Switzerland,15\r\nG,Cameroon,43\r\nH,Portugal,9\r\nH,Ghana,61\r\nH,Uruguay,14\r\nH,South Korea,28\r\n'
How do I convert the above to a CSV that pandas can manipulate, with the columns being Group, Team, and FIFA Ranking and then the corresponding data, in a way that works dynamically for any CSV?
I tried:
data=response.content.decode('utf-8', 'ignore').split(',')
however, when I convert the data variable to a DataFrame and then export it to CSV, all the values end up in one column.
I tried:
data=response.content.decode('utf-8') or data=response.content.decode('utf-8', 'ignore') without the split
however, pandas does not accept this as valid input for a DataFrame and raises an error about invalid use of the DataFrame constructor.
I tried:
data=json.loads(response.content)
however, the content itself is not valid JSON, so you get the error json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0).
Given:
data = b'Group,Team,FIFA Ranking\r\nA,Qatar,50\r\nA,Ecuador,44\r\nA,Senegal,18\r\n' #...
If you just want a CSV version of your data you can simply do:
with open("foo.csv", "wt", encoding="utf-8", newline="") as file_out:
file_out.writelines(data.decode())
If your objective is to load this data into a pandas dataframe and the CSV is not actually important, you can:
import io
import pandas
foo = pandas.read_csv(io.StringIO(data.decode()))
print(foo)
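If you would rather skip the decode step, pandas can also read the raw bytes through io.BytesIO; an alternative sketch (not part of the answer above):
import io
import pandas

# Alternative: feed the undecoded bytes straight to read_csv.
foo = pandas.read_csv(io.BytesIO(data))
print(foo)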

Reading a database table without Pandas

I am trying to read a table from a HANA database in Python using SQLAlchemy library. Typically, I would use the Pandas package and use the pd.read_sql() method for this operation. However, for some reason, the environment I am using does not support the Pandas package. Therefore, I need to read the table without the Pandas library. So far, the following is what I have been able to do:
query = ('''SELECT * FROM "<schema_name>"."<table_name>"'''
         ''' WHERE <conditional_clauses>''')
with engine.connect() as con:
    table = con.execute(query)
    row = table.fetchone()
However, while this technique allows me to read the table row by row, I do not get the column names of the table.
How can I fix this?
Thanks
I do not get the column names of the table
You won't get the column names of the table but you can get the column names (or aliases) of the result set:
import sqlalchemy as sa

with engine.begin() as conn:
    row = conn.execute(sa.text("SELECT 1 AS foo, 2 AS bar")).fetchone()
    print(row.items())  # [('foo', 1), ('bar', 2)]
    #
    # or, for just the column names
    #
    print(row.keys())   # ['foo', 'bar']
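Applied to the SELECT * query from the question, you can read the column names from the result object before fetching the rows; a minimal sketch, assuming the same engine, placeholder names, and the sqlalchemy import shown above:
with engine.begin() as conn:
    result = conn.execute(sa.text('SELECT * FROM "<schema_name>"."<table_name>"'))
    columns = list(result.keys())       # column names (or aliases) of the result set
    for row in result:
        print(dict(zip(columns, row)))  # pair each value with its column name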

How to load information in two columns?

Question 1: The file phone.txt stores the lines in the format code:number
import pandas as pd
import sqlite3
con = sqlite3.connect('database.db')
data = pd.read_csv('phone.txt', sep='\t', header=None)
data.to_sql('post_table', con, if_exists='replace', index=False)
I want to load all the data from the phone.txt file into the database.db database, but everything ends up in one column. I need it loaded into two columns:
code
number
How can I do this?
Question 2: after loading the information into the database, how can I find the number by code? For example, how do I find out which number belongs to code = 7 (answer: 9062621390)?
Question 1
In your example pandas is not able to distinguish between the code and the number, since your file is :-separated. When reading your file you need to change the separator to ':' and also specify column names, since your CSV doesn't seem to have a header, like so:
data = pd.read_csv('phone.txt',
                   sep=':',
                   names=['code', 'number'])
Question 2
After putting your data into the database you can query it as follows:
number = pd.read_sql_query('SELECT number FROM post_table WHERE code = (?)',
                           con,
                           params=(code,))
where con is your sqlite connection.
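For the concrete example in the question (code = 7), the single value can then be pulled out of the returned one-row frame; a small sketch reusing the same query:
# Sketch: look up the number for code 7 and print the single value.
result = pd.read_sql_query('SELECT number FROM post_table WHERE code = (?)',
                           con,
                           params=(7,))
print(result['number'].iloc[0])  # expected 9062621390 for the sample data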

Extracting data from a complex MongoDB database using PyMongo and converting it to a .csv file

I have a complex MongoDB database, consisting of documents nested up to 7 levels deep. I need to use PyMongo to extract the data, and then convert the extracted data to a .csv file.
You can try using json_normalize.
It is used to flatten the JSON and reads the data into a DataFrame, which can be stored as a CSV later.
For example:
from pandas.io.json import json_normalize
# mongo_value is your mongo query
mongo_aggregate = db.events.aggregate(mongo_value)
mongo_df = json_normalize(list(mongo_aggregate))
# print(mongo_df)
mongo_columns = list(mongo_df.columns.values)
#just picks the column_name instead of properties.something.something.column_name
for w in range(len(mongo_columns)):
    mongo_columns[w] = mongo_columns[w].split('.')[-1].lower()
mongo_df.columns = mongo_columns
For reference, read this: https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.io.json.json_normalize.html
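To complete the .csv step from the question, the flattened frame can then be written out with to_csv; a one-line sketch (the file name is just an example):
# Write the flattened documents out to a CSV file.
mongo_df.to_csv('events.csv', index=False)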
