Why are utf-8 emojis not getting rendered in my pandas dataframe when I read from the SQL database? - python-3.x

I have the following line to read from a csv file:
coronavirus_df = pd.read_csv('Path\coronavirus_March-3-2020.csv')
I have these other lines to read from MSSQL:
import pandas as pd
import pyodbc
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=MyServer;'
                      'Database=Mydb;'
                      'Trusted_Connection=yes;')
cursor = conn.cursor()
sql_tweets_df = pd.read_sql_query('SELECT * FROM my table',conn)
In both cases, I can get the data from the data sources and create a data frame, but there is an important difference:
coronavirus_df['text'].loc[9] gives the result:
-> 'YEP 👍some more text.'
sql_tweets_df['Text'].loc[9] gives this other result:
-> 'YEP ðŸ‘\x8d some more text'
Why is this happening? The emoji is not rendered when I'm getting the information from the database.
In both the database and the Excel file, that record seems to be precisely the same.
I'm using Python 3 and Jupyter notebooks.
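The garbled text looks like classic mojibake: the UTF-8 bytes of the emoji (F0 9F 91 8D for 👍) appear to be decoded with a single-byte Windows code page somewhere between the ODBC driver and Python. A minimal sketch of one thing worth trying, assuming the tweets sit in a VARCHAR column holding UTF-8 bytes (this depends on the driver and the column's type/collation, so treat it as an experiment rather than a guaranteed fix), is to tell pyodbc explicitly how to decode character data before querying:
import pandas as pd
import pyodbc

conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=MyServer;'
                      'Database=Mydb;'
                      'Trusted_Connection=yes;')

# Decode CHAR/VARCHAR data as UTF-8 instead of the driver/locale default;
# NVARCHAR data normally arrives as UTF-16LE already.
conn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')

sql_tweets_df = pd.read_sql_query('SELECT * FROM my table', conn)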

Related

Import a random csv as a table on the fly - Postgresql and Python

I am using a pgadmin client. I have multiple csv files.
I would like to import each csv file as a table.
When I tried the below
a) Click create table
b) Enter the name of table and save it.
c) I see the table name
d) Click on "Import csv"
e) selected columns as "header"
f) Clicked "Import"
But I got an error message as below
ERROR: extra data after last expected column
CONTEXT: COPY Test_table, line 2: "32,F,52,Single,WHITE,23/7/2180 12:35,25/7/2180..."
I also tried the python psycopg2 version as shown below
import psycopg2
conn = psycopg2.connect("host='xxx.xx.xx.x' port='5432' dbname='postgres' user='abc' password='xxx'")
cur = conn.cursor()
f = open(r'test.csv', 'r')
cur.copy_from(f, public.test, sep=',')  # while I see the 'test' table under my schema, how can I give the schema name here? I don't know why it says the table is not defined
f.close()
UndefinedTable: relation "public.test" does not exist
May I check whether it is possible to import some random csv as a table using the pgadmin import?
Pandas will do this easily: DataFrame.to_sql creates a table with a structure matching the CSV.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
The CSV is first read with read_csv into a DataFrame:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Regards Niels
As I understand the requirement, a new table is wanted for every CSV. The code below illustrates that. It can be customized and the data types can be refined; see the documentation for pandas.DataFrame.to_sql. I think, actually, that the heavy lifting is done by SQLAlchemy.
import io
import pandas as pd
import psycopg2  # driver behind the postgresql+psycopg2 connection string

# First "csv", kept in memory with StringIO for the example
buf_t1 = io.StringIO()
buf_t1.write("a,b,c,d\n")
buf_t1.write("1,2,3,4\n")
buf_t1.seek(0)
df_t1 = pd.read_csv(buf_t1)
df_t1.to_sql(name="t1", con="postgresql+psycopg2://host/db", index=False, if_exists='replace')
#
# Second "csv", with a text column
buf_t2 = io.StringIO()
buf_t2.write("x,y,z,t\n")
buf_t2.write("1,2,3,'Hello World'\n")
buf_t2.seek(0)
df_t2 = pd.read_csv(buf_t2)
df_t2.to_sql(name="t2", con="postgresql+psycopg2://host/db", index=False, if_exists='replace')
This will result in two new tables, t1 and t2, defined like this:
create table t1
(
a bigint,
b bigint,
c bigint,
d bigint
);
create table t2
(
x bigint,
y bigint,
z bigint,
t text
);
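On the psycopg2 error itself: in recent psycopg2 versions copy_from quotes the table argument as a single identifier, so a schema-qualified name like public.test is looked up as a relation literally named "public.test" and fails. A sketch of a common workaround, assuming the CSV's header row matches the table's columns (file name, table name, and connection string are taken from the question), is copy_expert with an explicit COPY statement:
import psycopg2

conn = psycopg2.connect("host='xxx.xx.xx.x' port='5432' dbname='postgres' user='abc' password='xxx'")
cur = conn.cursor()
with open(r'test.csv', 'r') as f:
    # COPY lets us schema-qualify the table and skip the header row
    cur.copy_expert("COPY public.test FROM STDIN WITH CSV HEADER", f)
conn.commit()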

Extract Salesforce Objects and Load Them into SQLite Database Tables - Python3

I am trying to collect data from Salesforce and then load it into SQLite tables.
Here is my code:
from simple_salesforce import Salesforce, SFType, SalesforceLogin
from pandas import DataFrame, read_csv
import json
import pandas as pd
from pprint import pprint as pp
#Connect to salesforce site
session_id, instance = SalesforceLogin(username=username, password=password, security_token=security_token)
#Create Instance
sf = Salesforce(instance=instance, session_id=session_id)
desc = sf.Opportunity.describe()
# Below is what you need
field_names = [field['name'] for field in desc['fields']]
soql = "SELECT {} FROM Opportunity".format(','.join(field_names))
results = sf.query_all(soql)
sf_df = pd.DataFrame(results['records']).drop(columns='attributes')
sf_df.to_csv('/Users/ma/test1.csv')
This collects the Opportunity table and writes it to a CSV file. Any suggestions on how to improve this step, and also the next step, which is to create an SQLite table out of the Salesforce-generated CSV files? I am new to Salesforce and SQLite and am stuck on these steps.
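Since the query results are already in a DataFrame, one possible follow-up is to skip the intermediate CSV and write straight to an SQLite table with to_sql. A minimal sketch, where the database file name salesforce.db and the table name opportunity are just placeholders:
import sqlite3

sqlite_conn = sqlite3.connect('salesforce.db')  # placeholder file name
# Create (or replace) a table from the DataFrame's columns and rows
sf_df.to_sql('opportunity', sqlite_conn, if_exists='replace', index=False)
sqlite_conn.close()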

Binding Teradata Query in python not returning anything

I am trying to automate some usual DB queries via Python and was testing SQL parameterization.
import teradata
import pyodbc
import sys
from pandas import DataFrame
import pandas as pd
import warnings
warnings.filterwarnings('ignore')
udaExec = teradata.UdaExec(appName="HelloWorld", version="1.0",
                           logConsole=False)
session = udaExec.connect(method="odbc", system="db",
                          username="username", password="password")
t = 'user_id'  # dynamic column to be selected
cursor = session.cursor();
"""The below query returned only the user_id column
>>> sw_overall1
0
0 user_id
"""
sw_overall1 = cursor.execute("""select distinct ? from
    table""", (t,)).fetchall()
sw_overall1 = DataFrame(sw_overall1)
cursor = session.cursor();
#The below query returned the correct result
sw_overall2 = cursor.execute("""select distinct user_id from
    table""").fetchall()
Am I doing the binding incorrectly? Without binding, I get the correct output.
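For what it's worth, ODBC parameter markers bind values, not identifiers, so the ? is most likely being substituted as the string literal 'user_id', and SELECT DISTINCT of a constant just returns that constant. If the column really has to be dynamic, a sketch of the usual approach (the whitelist below is hypothetical) is to build the identifier into the SQL text yourself and keep ? binding only for data values:
allowed_columns = {'user_id', 'account_id'}  # hypothetical whitelist of selectable columns
t = 'user_id'
if t not in allowed_columns:
    raise ValueError("unexpected column name")

cursor = session.cursor()
# Identifier interpolated into the text; any data values would still use ? binding
sw_overall1 = DataFrame(cursor.execute(
    "select distinct {} from table".format(t)).fetchall())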

Querying from Microsoft SQL to a Pandas Dataframe

I am trying to write a program in Python3 that will run a query on a table in Microsoft SQL and put the results into a Pandas DataFrame.
My first try was the code below, but for some reason I don't understand, the columns do not appear in the order I listed them in the query; the order they appear in, and therefore the labels they are given, change between runs, stuffing up the rest of my program:
import pandas as pd, pyodbc
result_port_mapl = []
# Use pyodbc to connect to SQL Database
con_string = 'DRIVER={SQL Server};SERVER=' + <server> + ';DATABASE=' + <database>
cnxn = pyodbc.connect(con_string)
cursor = cnxn.cursor()
# Run SQL Query
cursor.execute("""
SELECT <field1>, <field2>, <field3>
FROM result
""")
# Put data into a list
for row in cursor.fetchall():
    temp_list = [row[2], row[1], row[0]]
    result_port_mapl.append(temp_list)
# Make list of results into dataframe with column names
## FOR SOME REASON HERE row[1] AND row[0] DO NOT CONSISTENTLY APPEAR IN THE
## SAME ORDER AND SO THEY ARE MISLABELLED
result_port_map = pd.DataFrame(result_port_mapl, columns={'<field1>', '<field2>', '<field3>'})
I have also tried the following code:
import pandas as pd, pyodbc
# Use pyodbc to connect to SQL Database
con_string = 'DRIVER={SQL Server};SERVER='+ <server> +';DATABASE=' + <database>
cnxn = pyodbc.connect(con_string)
cursor = cnxn.cursor()
# Run SQL Query
cursor.execute("""
SELECT <field1>, <field2>, <field3>
FROM result
""")
# Put data into DataFrame
# This becomes one column with a list in it with the three columns
# divided by a comma
result_port_map = pd.DataFrame(cursor.fetchall())
# Get column headers
# This gives the error "AttributeError: 'pyodbc.Cursor' object has no
# attribute 'keys'"
result_port_map.columns = cursor.keys()
If anyone could suggest why either of those errors is happening, or provide a more efficient way to do it, it would be greatly appreciated.
Thanks
Why not just use read_sql? Like:
import pandas as pd, pyodbc
con_string = 'DRIVER={SQL Server};SERVER='+ <server> +';DATABASE=' + <database>
cnxn = pyodbc.connect(con_string)
query = """
SELECT <field1>, <field2>, <field3>
FROM result
"""
result_port_map = pd.read_sql(query, cnxn)
result_port_map.columns.tolist()
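One likely reason for the shifting labels in the first attempt: columns={'<field1>', '<field2>', '<field3>'} passes a Python set, and sets have no defined order, so the headers can land on different columns from run to run. Passing a list keeps the order stable:
result_port_map = pd.DataFrame(result_port_mapl, columns=['<field1>', '<field2>', '<field3>'])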

pandas creating a dataframe from mysql database

So I have been trying to create a DataFrame from a MySQL database using pandas and Python, but I have encountered an issue which I need help on.
The issue is that when writing the DataFrame to Excel, it only writes the last row, i.e., it overwrites all the previous entries and only the last row is written. Please see the code below:
import pandas as pd
import numpy
import csv
import pymysql

with open('C:path_to_file\\extract_job_details.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        jobid = str(row[1])
        statement = """select jt.job_id ,jt.vendor_data_type,jt.id as TaskId,jt.create_time as CreatedTime,jt.job_start_time as StartedTime,jt.job_completion_time,jt.worker_path, j.id as JobId from dspe.job_task jt JOIN dspe.job j on jt.job_id = j.id where jt.job_id = %(jobid)s"""
        df_mysql = pd.read_sql(statement, con=mysql_cn, params={'jobid': jobid})
        try:
            with pd.ExcelWriter(timestr + 'testResult.xlsx', engine='xlsxwriter') as writer:
                df_mysql.to_excel(writer, sheet_name='Sheet1')
        except pymysql.err.OperationalError as error:
            code, message = error.args

mysql_cn.close()
Please can anyone help me identify where I am going wrong?
PS: I am new to pandas and Python.
Thanks Carlos
I'm not really sure what you're trying to do reading from disk and a database at the same time...
First, you don't need csv when you're already using Pandas:
df = pd.read_csv("path/to/input/csv")
Next you can simply provide a file path as an argument to to_excel instead of an ExcelWriter instance:
df.to_excel("path/to/desired/excel/file")
If it doesn't actually need to be an excel file you can use:
df.to_csv("path/to/desired/csv/file")
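On the original overwriting problem: a new ExcelWriter targeting the same file is created for every row of the job-id CSV, so each iteration replaces the previous output. A sketch of one way around it, assuming the same statement, mysql_cn connection and timestr prefix as in the question, is to collect the per-job results and write the Excel file once after the loop:
frames = []
for row in reader:
    jobid = str(row[1])
    frames.append(pd.read_sql(statement, con=mysql_cn, params={'jobid': jobid}))

# Single write after the loop, so nothing gets overwritten
pd.concat(frames).to_excel(timestr + 'testResult.xlsx', sheet_name='Sheet1', index=False)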
