I have imported many large CSV files into tables in my PostgreSQL database. I know how to connect to the database with this code:
import psycopg2

connection = None
try:
    connection = psycopg2.connect(user="xxx",
                                  password="xxx",
                                  host="xxx",
                                  port="xxx",
                                  database="xxx")
    cursor = connection.cursor()
    # Print PostgreSQL connection properties
    print(connection.get_dsn_parameters(), "\n")
    # Print PostgreSQL version
    cursor.execute("SELECT version();")
    record = cursor.fetchone()
    print("You are connected to - ", record, "\n")
except (Exception, psycopg2.Error) as error:
    print("Error while connecting to PostgreSQL", error)
finally:
    # Closing database connection; guard against a failed connect.
    if connection:
        cursor.close()
        connection.close()
        print("PostgreSQL connection is closed")
But I struggle to extract data from there. Is it possible to transform these tables into DataFrame format, since I will be doing some ML analysis on them?
I'm new to PostgreSQL, so please help me with this issue.
There are a few ways to do it.
A very simple way is to fetch all of the rows with fetchall() and build the records yourself:
cursor.execute(query)
rows = cursor.fetchall()

data = []
for row in rows:
    data.append({'field1': row[0], 'field2': row[1]})
If you are using a pandas DataFrame, you could do:
import pandas as pd

rows = pd.DataFrame(rows, columns=['field1', 'field2'])
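pandas can also run the query and build the DataFrame in one step, taking the column names straight from the result set. A minimal sketch, assuming the open psycopg2 connection from above and a hypothetical table name my_table:
import pandas as pd

# read_sql issues the query over the open DBAPI connection and
# names the DataFrame columns after the result's columns.
df = pd.read_sql("SELECT * FROM my_table;", connection)
print(df.head())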
I have just started learning SQLite and am creating a project which has a .sqlite file containing multiple tables. I want to ask the user to input the table_name, and then the program will fetch the columns present in that particular table.
So far I have done this.
app_database.py
import sqlite3

def column_names(table_name):
    conn = sqlite3.connect('northwind_small.sqlite')
    c = conn.cursor()
    c.execute("PRAGMA table_info(table_name)")
    columns = c.fetchall()
    for c in columns:
        print(c[1])
    conn.commit()
    conn.close()
our-app.py
import app_database
table_name = input("Enter the table name = ")
app_database.column_names(table_name)
When I run our-app.py, I don't get any output:
C:\Users\database-project>python our-app.py
Enter the table name = Employee
C:\Users\database-project>
Can anyone tell me how I should proceed?
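The PRAGMA in column_names asks about a table literally named table_name rather than the value of the table_name argument, so it matches nothing and prints nothing. A minimal sketch of the fix (PRAGMA arguments cannot be bound as query parameters, so the name has to be formatted into the statement):
import sqlite3

def column_names(table_name):
    conn = sqlite3.connect('northwind_small.sqlite')
    c = conn.cursor()
    # "?" placeholders are not allowed inside PRAGMA, so interpolate
    # the name itself; validate it first if it comes from user input.
    c.execute("PRAGMA table_info({})".format(table_name))
    for col in c.fetchall():
        print(col[1])  # field 1 of each table_info row is the column name
    conn.close()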
I have been given a .db file that has already been populated with both tables and data. However, no description of the content of the database has been made available.
Is there a way for me to retrieve individual lists of the different tables and their respective sets of columns, using sqlite3 and Python?
This code helps you list the tables together with their column names; once you have the tables and their columns, you can fetch the data.
import sqlite3

def readDb():
    connection = sqlite3.connect('data.db')
    connection.row_factory = sqlite3.Row
    cursor = connection.cursor()
    # List every user table in the database.
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
    tabs = [row['name'] for row in cursor.fetchall()]
    d = {}
    for tab in tabs:
        # Fetch a single row; with sqlite3.Row its keys are the column names.
        cursor.execute("SELECT * FROM " + tab + ";")
        row = cursor.fetchone()
        d[tab] = list(row.keys()) if row is not None else []
    connection.close()
    return d

print(readDb())
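A variant that also reports columns for empty tables (where fetchone() has nothing to return) is to ask SQLite for the schema directly with PRAGMA table_info. A minimal sketch, assuming the same data.db file:
import sqlite3

def read_schema(path='data.db'):
    connection = sqlite3.connect(path)
    cursor = connection.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
    tables = [name for (name,) in cursor.fetchall()]
    schema = {}
    for table in tables:
        # table_info describes the columns without needing any rows;
        # field 1 of each returned row is the column name.
        cursor.execute("PRAGMA table_info({})".format(table))
        schema[table] = [col[1] for col in cursor.fetchall()]
    connection.close()
    return schema

print(read_schema())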
First question on here, so let me know if more information is needed. I am using the Python psycopg2-binary==2.7.7 package in an attempt to pull PostgreSQL 9.6.11 timestamptz fields out of a database.
With that said, the 'psycopg2' package seems to be coercing the timestamptz date-times to a different timezone than is present in the database.
For instance, the following query will return the correct offset if run in a PostgreSQL client:
SQL
SELECT row_to_json(t)
FROM (
    SELECT '2019-01-24T08:24:00-05:00'::timestamptz AS tz
) t;
Result
{"tz":"2019-01-24 08:24:00-05"}
However, if I run the same query via the psycopg2.cursor.fetchall method, I get a different offset than expected/returned:
import time
import psycopg2
import logging

logger = logging.getLogger()

def getRows(query, printRows=False, **kwargs):
    try:
        cs = "dbname={dbname} user={dbuser} password={dbpass} host={server} port={port}".format(
            **kwargs)
        con = psycopg2.connect(cs)
        con.set_session(readonly=True, autocommit=True)
    except Exception:
        logger.exception("-->>>>Something went wrong connecting to db")
        return None
    end = None
    try:
        start = time.time()
        cur = con.cursor()
        cur.execute(query)
        rows = cur.fetchall()
        if printRows:
            for i in rows:
                print(i)
        cur.close()
        con.commit()
        con.close()
        end = time.time()
        logger.info(
            "-->>>>Query took {} seconds...".format(round(end - start, 2)))
        return rows
    except Exception:
        end = time.time()
        cur.close()
        con.commit()
        con.close()
        logger.exception("-->>>>Something went wrong with the query...")
        logger.info(
            "-->>>>Query took {} seconds...".format(round(end - start, 2)))

if __name__ == '__main__':
    test = getRows("""SELECT row_to_json(t) AS "result"
        FROM (
            SELECT '2019-01-24T08:24:00-05:00'::timestamptz AS tz
        ) t;
        """, printRows=True, **DBSECRETS)
    print(test[0][0])
Result
{'tz': '2019-01-24T05:24:00-08:00'}
As seen above, the EST offset (-05:00) sent to PostgreSQL is being converted to a -08:00 offset by the psycopg2 package.
I've checked the psycopg2 documentation but could not find any conclusive examples to fix this issue. Specifically, I've checked here:
http://initd.org/psycopg/docs/cursor.html#cursor.tzinfo_factory
It turns out that the SQL client, DBeaver, coerces a timestamptz to the local OS timezone, which in this case is EST.
How to change DBeaver timezone / How to stop DBeaver from converting date and time
The PostgreSQL server, however, has a native timezone of Pacific time or PST. Thus, the psycopg2 package was interpreting the timestamptz correctly according to the server, i.e. PST.
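If you want psycopg2 to hand timestamps back in a particular zone regardless of the server default, you can set the session time zone yourself. A minimal sketch, with placeholder connection parameters and America/New_York standing in for whatever zone you actually want:
import psycopg2

con = psycopg2.connect(dbname="test", user="usr",
                       password="pswd", host="localhost")  # placeholders
cur = con.cursor()
# timestamptz values are rendered in the session's time zone, so
# override the server default (PST here) for this session.
cur.execute("SET TIME ZONE 'America/New_York';")
cur.execute("SELECT '2019-01-24T08:24:00-05:00'::timestamptz;")
print(cur.fetchone()[0])  # tz-aware datetime with a -05:00 offset
con.close()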
The first part of the script returns all of my AD users with values converted to Python str: draft = [('Display Name', 'username'),]
I want to write this to my main_associate table (Postgres 9.5) avoiding duplicates. I know I have records in the list that are not duplicates and should be written. This returns no errors but doesn't write my records:
try:
    new_conn = psycopg2.connect("dbname='test' user='usr' host='localhost' password='pswd'")
except:
    print("Unable to connect to the associates database.")

sql = """INSERT INTO main_associate(displayname,username) VALUES(%s,%s)
         ON CONFLICT (username) DO NOTHING"""

one_cur = new_conn.cursor()
for grp in draft:
    #print(grp)
    one_cur.execute(sql, (grp[0],grp[1],))
new_conn.commit
one_cur.close()
new_conn.close()
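Before reaching for another library, note that new_conn.commit near the end only references the method and never calls it, so the inserts are discarded when the connection closes. The original loop works once the call is actually made:
one_cur = new_conn.cursor()
for grp in draft:
    one_cur.execute(sql, (grp[0], grp[1]))
new_conn.commit()  # the parentheses matter: commit must be called
one_cur.close()
new_conn.close()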
If you install SQLAlchemy, you could do it like this (note that ON CONFLICT is PostgreSQL-specific, so the insert has to come from the postgresql dialect, and the host separator in the URL is "@"):
from sqlalchemy import create_engine, MetaData
from sqlalchemy.dialects.postgresql import insert

engine = create_engine('postgresql://postgres:pswd@localhost/test')
meta = MetaData()
meta.reflect(bind=engine)
table = meta.tables['main_associate']

for grp in draft:
    ins = insert(table).values(displayname=grp[0], username=grp[1]) \
                       .on_conflict_do_nothing(index_elements=['username'])
    engine.execute(ins)
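engine.execute(ins) is SQLAlchemy 1.x style and was removed in SQLAlchemy 2.0. Under 2.0 you would open a transaction explicitly; a minimal sketch:
# SQLAlchemy 2.0 style: run the inserts inside one transaction.
with engine.begin() as conn:
    for grp in draft:
        conn.execute(insert(table)
                     .values(displayname=grp[0], username=grp[1])
                     .on_conflict_do_nothing(index_elements=['username']))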
I have the following function which extracts data from a table, but I want to pass the table name to the function as a parameter...
def extract_data(table):
    try:
        tableName = table
        conn_string = "host='localhost' dbname='Aspentiment' user='postgres' password='pwd'"
        conn = psycopg2.connect(conn_string)
        cursor = conn.cursor()
        cursor.execute("SELECT aspects_name, sentiments FROM ('%s') " % (tableName))
        rows = cursor.fetchall()
        return rows
    finally:
        if conn:
            conn.close()
When I call the function as extract_data(Harpar), where Harpar is the table name, it gives an error that 'Harpar' is not defined. Any help?
Update: As of psycopg2 version 2.7:
You can now use the sql module of psycopg2 to compose dynamic queries of this type:
from psycopg2 import sql
query = sql.SQL("SELECT aspects_name, sentiments FROM {}").format(sql.Identifier(tableName))
cursor.execute(query)
Pre-2.7:
Use the AsIs adapter along these lines:
from psycopg2.extensions import AsIs
cursor.execute("SELECT aspects_name, sentiments FROM %s;",(AsIs(tableName),))
Without the AsIs adapter, psycopg2 would quote the table name as a string literal, which is not valid SQL in that position.
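Separately, the "'Harpar' is not defined" error comes from the call site, not the query: extract_data(Harpar) passes an undefined Python name, so the table name must be quoted as a string, extract_data('Harpar'). A minimal sketch of the whole function with the sql module applied, assuming the same connection string as in the question:
import psycopg2
from psycopg2 import sql

def extract_data(table_name):
    conn = None
    try:
        conn_string = "host='localhost' dbname='Aspentiment' user='postgres' password='pwd'"
        conn = psycopg2.connect(conn_string)
        cursor = conn.cursor()
        # sql.Identifier quotes the table name safely; a plain "%s"
        # parameter would render it as a string literal instead.
        query = sql.SQL("SELECT aspects_name, sentiments FROM {}").format(
            sql.Identifier(table_name))
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        if conn:
            conn.close()

rows = extract_data('Harpar')  # pass the table name as a string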