Extract large data from PostgreSQL using Python (preferably in DataFrame format) - psycopg2

I have imported many large CSV files into tables in my PostgreSQL database. I know how to connect to the database with this code:
import psycopg2

try:
    connection = psycopg2.connect(user="xxx",
                                  password="xxx",
                                  host="xxx",
                                  port="xxx",
                                  database="xxx")
    cursor = connection.cursor()
    # Print PostgreSQL connection properties
    print(connection.get_dsn_parameters(), "\n")
    # Print PostgreSQL version
    cursor.execute("SELECT version();")
    record = cursor.fetchone()
    print("You are connected to - ", record, "\n")
except (Exception, psycopg2.Error) as error:
    print("Error while connecting to PostgreSQL", error)
finally:
    # Closing database connection.
    if connection:
        cursor.close()
        connection.close()
        print("PostgreSQL connection is closed")
But I struggle to extract data from there. Is it possible to transform these tables into DataFrame format, since I will be doing some ML analysis on them?
I'm new to PostgreSQL; please help me with this issue.

There are a few ways to do it.
A very simple way is to fetch all rows with fetchall() and iterate over them:
cursor.execute(query)
rows = cursor.fetchall()
data = []
for row in rows:
    data.append({'field1': row[0], 'field2': row[1]})
If you are using a pandas DataFrame, you could then do:
rows = pd.DataFrame(rows, columns=['field1', 'field2'])
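For ML work on large tables, the usual route is to let pandas build the DataFrame straight from the connection with pandas.read_sql_query; a minimal sketch (the connection values and my_table are placeholders):
import pandas as pd
import psycopg2

connection = psycopg2.connect(user="xxx", password="xxx",
                              host="xxx", port="xxx", database="xxx")

# Load an entire table into a DataFrame in one call.
df = pd.read_sql_query("SELECT * FROM my_table;", connection)

# For tables too large for memory, stream the result in chunks instead.
for chunk in pd.read_sql_query("SELECT * FROM my_table;", connection, chunksize=50000):
    handle(chunk)  # handle() stands in for your per-chunk processing

connection.close()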

Related

How to dynamically input the table name and fetch the results in SQLite?

I have just started learning SQLite and am creating a project with a .sqlite file that contains multiple tables. I want to ask the user to input the table_name, and then the program will fetch the columns present in that particular table.
So far I have done this.
app_database.py
import sqlite3

def column_names(table_name):
    conn = sqlite3.connect('northwind_small.sqlite')
    c = conn.cursor()
    c.execute("PRAGMA table_info(table_name)")
    columns = c.fetchall()
    for c in columns:
        print(c[1])
    conn.commit()
    conn.close()
our-app.py
import app_database
table_name = input("Enter the table name = ")
app_database.column_names(table_name)
When I run our-app.py, I don't get anything:
C:\Users\database-project>python our-app.py
Enter the table name = Employee
C:\Users\database-project>
Can anyone tell me how I should proceed?
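The PRAGMA statement receives the literal string table_name rather than the variable's value, so it matches no table and fetchall() returns an empty list. (Reusing c as the loop variable also shadows the cursor, which is worth avoiding.) PRAGMA does not accept bound parameters, so one approach is to validate the name against sqlite_master before interpolating it; a minimal sketch:
import sqlite3

def column_names(table_name):
    conn = sqlite3.connect('northwind_small.sqlite')
    cur = conn.cursor()
    # Validate the user-supplied name before interpolating it into the PRAGMA.
    cur.execute("SELECT name FROM sqlite_master WHERE type='table' AND name=?",
                (table_name,))
    if cur.fetchone() is None:
        print("No such table:", table_name)
    else:
        cur.execute("PRAGMA table_info({})".format(table_name))
        for col in cur.fetchall():
            print(col[1])  # field 1 of each table_info row is the column name
    conn.close()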

Python, SQLite3: showing existing tables and data

I have been given a .db file, that has already been populated with both Tables and Data. However, no description of the content of the database has been made available.
Is there a way for me to retrieve individual lists of the different tables and their respective sets of columns, using sqlite3 and Python?
This code shows the tables along with their column names; once you have the tables and their columns, you can fetch the data.
import sqlite3

def readDb():
    connection = sqlite3.connect('data.db')
    connection.row_factory = sqlite3.Row  # rows become dict-like, so .keys() works
    cursor = connection.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
    rows = cursor.fetchall()
    tabs = []
    for row in rows:
        for r in row:
            tabs.append(r)
    d = {}
    for tab in tabs:
        cursor.execute("SELECT * FROM " + tab + ";")
        rows = cursor.fetchone()  # first row of the table
        t = []
        for row in rows.keys():  # keys() gives the column names
            t.append(row)
        d[tab] = t
    connection.commit()
    return d

print(readDb())
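One caveat with the code above: fetchone() returns None for an empty table, so rows.keys() would raise an AttributeError. A variant that reads the column names from table metadata instead, as a minimal sketch (assuming the same data.db filename):
import sqlite3

def readSchema(db_path='data.db'):
    connection = sqlite3.connect(db_path)
    cursor = connection.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
    schema = {}
    for (table,) in cursor.fetchall():
        # PRAGMA table_info lists columns even when the table holds no rows.
        cursor.execute("PRAGMA table_info({})".format(table))
        schema[table] = [row[1] for row in cursor.fetchall()]
    connection.close()
    return schema

print(readSchema())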

fetchall method converting PostgreSQL timestamptz field to a different timezone

First question on here, so let me know if more information is needed. I am using the Python psycopg2-binary==2.7.7 package in an attempt to pull PostgreSQL 9.6.11 timestamptz fields out of a database.
However, the psycopg2 package seems to be coercing the timestamptz date-times to a different timezone than the one stored in the database.
For instance, the following query will return the correct offset if run in a PostgreSQL client:
SQL
SELECT row_to_json(t)
FROM (
    SELECT '2019-01-24T08:24:00-05:00'::timestamptz AS tz
) t;
Result
{"tz":"2019-01-24 08:24:00-05"}
However, if I run the same query via the psycopg2.cursor.fetchall method, I get a different offset than expected/returned:
import time
import psycopg2
import logging

logger = logging.getLogger()

def getRows(query, printRows=False, **kwargs):
    try:
        cs = "dbname={dbname} user={dbuser} password={dbpass} host={server} port={port}".format(
            **kwargs)
        con = psycopg2.connect(cs)
        con.set_session(readonly=True, autocommit=True)
    except Exception:
        logger.exception("-->>>>Something went wrong connecting to db")
        return None
    end = None
    try:
        start = time.time()
        cur = con.cursor()
        cur.execute(query)
        rows = cur.fetchall()
        if printRows:
            for i in rows:
                print(i)
        cur.close()
        con.commit()
        con.close()
        end = time.time()
        logger.info(
            "-->>>>Query took {} seconds...".format(round(end - start, 2)))
        return rows
    except Exception:
        end = time.time()
        cur.close()
        con.commit()
        con.close()
        logger.exception("-->>>>Something went wrong with the query...")
        logger.info(
            "-->>>>Query took {} seconds...".format(round(end - start, 2)))

if __name__ == '__main__':
    test = getRows("""SELECT row_to_json(t) AS "result"
                      FROM (
                          SELECT '2019-01-24T08:24:00-05:00'::timestamptz AS tz
                      ) t;
                      """, printRows=True, **DBSECRETS)
    print(test[0][0])
Result
{'tz': '2019-01-24T05:24:00-08:00'}
As seen above, the EST timezone (offset of -05:00) given to PostgreSQL is being converted to a -08:00 offset via the psycopg2 package.
I've checked the psycopg2 documentation but could not find any conclusive examples to fix this issue. Specifically, I've checked here:
http://initd.org/psycopg/docs/cursor.html#cursor.tzinfo_factory
It turns out that the SQL client, DBeaver, coerces a timestamptz to the local OS timezone, which in this case is EST.
How to change DBeaver timezone / How to stop DBeaver from converting date and time
The PostgreSQL server, however, has a native timezone of Pacific time (PST). Thus, psycopg2 was interpreting the timestamptz correctly according to the server, i.e. PST.
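If you want psycopg2 to hand back timestamps in a particular zone regardless of the server default, you can change the session time zone; a minimal sketch (the connection details are placeholders and the zone name is an example):
import psycopg2

con = psycopg2.connect("dbname=test user=usr host=localhost password=pswd")
cur = con.cursor()
# Render timestamptz values in Eastern time for this session only.
cur.execute("SET TIME ZONE 'America/New_York';")
cur.execute("SELECT '2019-01-24T08:24:00-05:00'::timestamptz;")
print(cur.fetchone()[0])  # a datetime carrying a -05:00 UTC offset
con.close()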

Python 3 script not writing to Postgres table

The first part of the script returns all of my AD users with values converted to Python str: draft = [('Display Name', 'username'),]
I want to write this to my main_associate table (Postgres 9.5), avoiding duplicates. I know I have records in the list that are not duplicates and should be written. This returns no errors but doesn't write my records:
try:
    new_conn = psycopg2.connect("dbname='test' user='usr' host='localhost' password='pswd'")
except:
    print("Unable to connect to the associates database.")

sql = """INSERT INTO main_associate(displayname,username) VALUES(%s,%s)
         ON CONFLICT (username) DO NOTHING"""

one_cur = new_conn.cursor()
for grp in draft:
    #print(grp)
    one_cur.execute(sql, (grp[0], grp[1],))

new_conn.commit
one_cur.close()
new_conn.close()
The immediate bug is that new_conn.commit is missing its parentheses, so commit() is never called and nothing is persisted; new_conn.commit() alone would fix your version. Alternatively, if you install SQLAlchemy, the PostgreSQL dialect has first-class support for ON CONFLICT:
from sqlalchemy import create_engine, MetaData
from sqlalchemy.dialects.postgresql import insert

engine = create_engine('postgresql://postgres:pswd@localhost/test')
meta = MetaData()
meta.reflect(bind=engine)
table = meta.tables['main_associate']
for grp in draft:
    ins = insert(table).values(displayname=grp[0], username=grp[1]) \
                       .on_conflict_do_nothing(index_elements=['username'])
    engine.execute(ins)
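If you would rather stay with plain psycopg2, psycopg2.extras.execute_batch (added in 2.7) sends the inserts in batches rather than one round trip per row; a sketch reusing the question's table and credentials:
import psycopg2
from psycopg2 import extras

new_conn = psycopg2.connect("dbname='test' user='usr' host='localhost' password='pswd'")
sql = """INSERT INTO main_associate(displayname, username) VALUES (%s, %s)
         ON CONFLICT (username) DO NOTHING"""
with new_conn.cursor() as cur:
    # draft is the list of (displayname, username) tuples from the question.
    extras.execute_batch(cur, sql, draft)
new_conn.commit()  # note the parentheses: commit must actually be called
new_conn.close()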

How to use a passed parameter as the table name in a SELECT query in Python?

I have the following function which extracts data from a table, but I want to pass the table name to the function as a parameter:
def extract_data(table):
    try:
        tableName = table
        conn_string = "host='localhost' dbname='Aspentiment' user='postgres' password='pwd'"
        conn = psycopg2.connect(conn_string)
        cursor = conn.cursor()
        cursor.execute("SELECT aspects_name, sentiments FROM ('%s') " % (tableName))
        rows = cursor.fetchall()
        return rows
    finally:
        if conn:
            conn.close()
When I call the function as extract_data(Harpar), where Harpar is the table name, it gives an error that 'Harpar' is not defined. Any help?
The NameError happens because Harpar is passed as a bare name, which Python reads as an undefined variable; call extract_data('Harpar') with the table name as a string. As for composing the query itself:
Update: As of psycopg2 version 2.7:
You can now use the sql module of psycopg2 to compose dynamic queries of this type:
from psycopg2 import sql
query = sql.SQL("SELECT aspects_name, sentiments FROM {}").format(sql.Identifier(tableName))
cursor.execute(query)
Pre < 2.7:
Use the AsIs adapter along these lines:
from psycopg2.extensions import AsIs
cursor.execute("SELECT aspects_name, sentiments FROM %s;",(AsIs(tableName),))
Without the AsIs adapter, psycopg2 will quote the table name as a string literal, producing invalid SQL. Note that AsIs performs no escaping at all, so only use it with trusted input; the sql module above is the safer choice.
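Putting it together, the question's function with the table name composed safely, as a sketch using the same connection string:
import psycopg2
from psycopg2 import sql

def extract_data(table_name):
    conn = psycopg2.connect("host='localhost' dbname='Aspentiment' "
                            "user='postgres' password='pwd'")
    try:
        cur = conn.cursor()
        # sql.Identifier quotes the name as an identifier, not a string literal.
        query = sql.SQL("SELECT aspects_name, sentiments FROM {}").format(
            sql.Identifier(table_name))
        cur.execute(query)
        return cur.fetchall()
    finally:
        conn.close()

rows = extract_data('Harpar')  # the table name is passed as a string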
