I have to transfer data from one PostgreSQL database (old) into another PostgreSQL database (new). The old one is encoded in WIN1252; the new one in UTF-8.
I have already tried different methods, e.g. pandas.to_sql, SQLAlchemy, psycopg2 and so on, but all of them fail with encoding "issues". After some research, the most plausible explanation looks like an issue on the driver side. As far as I know psycopg2 uses the Unicode driver, but with my source database version (PostgreSQL 9.4.20 on x86_64) I have to use the ANSI driver to bypass these encoding issues.
I tested with an ETL tool whether the affected table can be exported without encoding issues, and it worked without problems. Based on that test I am fairly sure this is not a real encoding issue but a driver handling issue.
When I used a sample to test whether loading the data works in general, I already noticed that pandas is too slow: I have to load 1.2 million records, and this runs forever. The PostgreSQL COPY method is therefore my preferred method. From my perspective psycopg2 uses the standard connection string (https://halvar.at/python/odbc_dsn_connection_strings/), but I have to use the ANSI driver.
I tried to pass an SQLAlchemy-style URL to the psycopg2 connector, but this does not work:
stage_engine_string = ("{PostgreSQL ANSI}+psycopg2://" + str(stage_user) + ":" + str(stage_password) + "@" + str(stage_host) + ":" + str(stage_port) + "/" + str(stage_database))
because
conn = psycopg2.connect(**params)
only accepts plain connection arguments such as
host =
database =
user =
password =
port =
Before I tried the above, I tried for example
cur.copy_to(open("sql_tmp_export.csv", "w", encoding="utf-8", errors="ignore"), "table", sep=";", columns=("no","description"))
conn.decode("win1250").encode("utf8")
and
conn.set_client_encoding("win1250")
but I receive an encoding issue every time. Based on the PostgreSQL docs, switching between UTF-8 and WIN1250 should never be a problem.
In the ETL tool I had a similar issue, but was able to solve it by sending
set client_encoding = 'windows-1250'
after establishing the connection to the database. But if I try the same in psycopg2 with
cur.execute("set client_encoding = 'windows-1250'; select * from table")
I still get the encoding issue.
Any clue whether there is an option to choose the driver when building up a psycopg2 connection? I think this would solve my issue.
My real issue (getting the data from the database) wasn't solved, because of follow-up issues. If you want to get into it, I'm happy to discuss it on my next question: Downloading a postgreSQL pg_dump file from a remote server using Python
But I was able to solve this question: if you want to use the ANSI driver, you have to install the latest ODBC driver from https://www.postgresql.org/ftp/odbc/versions/msi/
Then you can switch from a psycopg2 connection to a pyodbc connection.
import pyodbc
conn_str = (
"DRIVER={PostgreSQL Ansi(x64)};"
"DATABASE="+database+";"
"UID="+user+";"
"PWD="+password+";"
"SERVER="+host+";"
"PORT="+port+";"
)
conn = pyodbc.connect(conn_str)
cur = conn.execute("SELECT 1")
row = cur.fetchone()
print(row)
cur.close()
conn.close()
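To actually move the 1.2 million rows, the pyodbc connection above can be combined with a streaming CSV export. This is only a sketch: build_odbc_conn_str mirrors the connection string above, and export_table_to_csv with its table and column arguments are hypothetical names, not part of any library API.

```python
import csv

def build_odbc_conn_str(driver, database, user, password, host, port):
    """Assemble a pyodbc connection string for the PostgreSQL ANSI
    driver; key/value pairs are joined with semicolons."""
    return (
        f"DRIVER={{{driver}}};"
        f"DATABASE={database};"
        f"UID={user};"
        f"PWD={password};"
        f"SERVER={host};"
        f"PORT={port};"
    )

def export_table_to_csv(conn, table, columns, path):
    """Stream a table into a UTF-8 CSV file via fetchmany(), so the
    1.2 million rows never have to fit into memory at once."""
    cur = conn.cursor()
    cur.execute(f"SELECT {', '.join(columns)} FROM {table}")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=";")
        writer.writerow(columns)
        while True:
            rows = cur.fetchmany(10_000)
            if not rows:
                break
            writer.writerows(rows)
    cur.close()
```

Usage would then be roughly `export_table_to_csv(conn, "table", ("no", "description"), "sql_tmp_export.csv")` with the pyodbc connection from above.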
My general problem has been fixed now as well, but the solution was strange. In case someone is stuck on something similar: I simply ran the same script twice, the first time with LIMIT and OFFSET.
def any_postgres_method_to_load_data_from_db():
    conn = some_lib.connect(var1, var2)
    cur = conn.cursor()
    sql_pre_statement = """\
    set client_encoding = "Windows-1250"
    """
    cur.execute(sql_pre_statement)
    sql_statement = """\
    select * from n
    """
    df = pandas.read_sql_query(sql_statement, conn)
    df.to_csv("sql_tmp_export.csv", index=False)
The script above returned several encoding issues. After running the slightly adjusted version below once, I was able to run the original one successfully.
def any_postgres_method_to_load_data_from_db():
    conn = some_lib.connect(var1, var2)
    cur = conn.cursor()
    sql_pre_statement = """\
    set client_encoding = "Windows-1250"
    """
    cur.execute(sql_pre_statement)
    sql_statement = """\
    select * from n offset 500 limit 1000
    """
    df = pandas.read_sql_query(sql_statement, conn)
    df.to_csv("sql_tmp_export.csv", index=False)
I can't really explain this. I just have the feeling that there was something strange in the cache on the remote database.
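The limit/offset workaround can also be generalized into a chunked export loop, which makes a single bad batch easier to isolate. This is only a sketch under stated assumptions: chunked_statements and export_in_chunks are names made up here, the connection is any DB-API/psycopg2-style connection, and the total row count is assumed known.

```python
import pandas

def chunked_statements(table, chunk_size, total_rows):
    """Yield LIMIT/OFFSET SELECT statements that page through a table,
    mirroring the manual offset/limit workaround above."""
    for offset in range(0, total_rows, chunk_size):
        yield f"select * from {table} offset {offset} limit {chunk_size}"

def export_in_chunks(conn, table, chunk_size, total_rows, path):
    """Read the table chunk by chunk and append each chunk to a single
    CSV file, so one failing batch does not abort the whole export."""
    first = True
    for stmt in chunked_statements(table, chunk_size, total_rows):
        df = pandas.read_sql_query(stmt, conn)
        # Write the header only for the first chunk, then append.
        df.to_csv(path, mode="w" if first else "a", header=first, index=False)
        first = False
```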
DataStax driver for Cassandra version 3.25.0, Python version 3.9.
Session.execute() fetches the first 100 records. As per the documentation, the driver is supposed to transparently fetch the next pages as we reach the end of the first page. However, it fetches the same page again and again, and hence the first 100 records are all that is ever accessible. The for loop that prints records runs forever.
ssl_context.verify_mode = CERT_NONE
cluster = Cluster(contact_points=[db_host], port=db_port,
                  auth_provider=PlainTextAuthProvider(db_user, db_pwd),
                  ssl_context=ssl_context)
session = cluster.connect()

query = "SELECT * FROM content_usage"
statement = SimpleStatement(query, fetch_size=100)
results = session.execute(statement)
for row in results:
    print(f"{row}")
I could see other similar threads, but they are not answered either. Has anyone encountered this issue before? Any help is appreciated.
I'm a bit confused by the initial statement of the problem. You mentioned that the initial page of results is fetched repeatedly and that these are the only results available to your program. You also indicated that the for loop responsible for printing results turns into an infinite loop when you run the program. These statements seem contradictory to me; how can you know what the driver has fetched if you never get any output? I'm assuming that's what you meant by "goes infinite"... if I'm wrong please correct me.
The following code seems to run as expected against Cassandra 4.0.0 using cassandra-driver 3.25.0 on Python 3.9.0:
import argparse
import logging
import time

from cassandra.cluster import Cluster, SimpleStatement

def setupLogging():
    log = logging.getLogger()
    log.setLevel('DEBUG')
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s"))
    log.addHandler(handler)

def setupSchema(session):
    session.execute("""create keyspace if not exists "baz" with replication = {'class':'SimpleStrategy', 'replication_factor':1};""")
    session.execute("""create table if not exists baz.qux (run_ts bigint, idx int, uuid timeuuid, primary key (run_ts,idx))""")
    session.execute("""truncate baz.qux""")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--debug', action='store_true')
    args = parser.parse_args()

    cluster = Cluster()
    session = cluster.connect()

    if args.debug:
        setupLogging()
    setupSchema(session)

    run_ts = int(time.time())
    insert_stmt = session.prepare("""insert into baz.qux (run_ts,idx,uuid) values (?,?,now())""")
    for idx in range(10000):
        session.execute(insert_stmt, [run_ts, idx])

    query = "select * from baz.qux"
    stmt = SimpleStatement(query, fetch_size=100)
    results = session.execute(stmt)
    for row in results:
        print(f"{row}")

    cluster.shutdown()
$ time (python foo.py | wc -l)
10000
real 0m12.452s
user 0m3.786s
sys 0m2.197s
You might try running your sample app with debug logging enabled (see sample code above for how to enable this). It sounds like something might be off in your Cassandra configuration (or perhaps your client setup); the additional logging might help you identify what (if anything) is getting in the way.
The logic in your code only calls execute() once, so the contents of results will only ever be the same list of 100 rows.
You need to call execute() in your loop to get the next page of results like this:
query = "SELECT * FROM content_usage"
statement = SimpleStatement(query, fetch_size=100)
for row in session.execute(statement):
    process_row(row)
For more info, see Paging with the Python driver. Cheers!
Below is the code snippet that finally worked for me, after restricting the driver version to 3.20:
statement = session.prepare(query)

# Execute the query once and retrieve the first page of results
results = session.execute(statement, params)
for row in results.current_rows:
    process_row(row)

# Fetch more pages until they are exhausted
while results.has_more_pages:
    page_state = results.paging_state
    results = session.execute(statement, parameters=params, paging_state=page_state)
    for row in results.current_rows:
        process_row(row)
I am trying to execute the below block of code with cx_Oracle using bind variables, but I get the error mentioned below every time. Not sure what is missing.
Does anyone have an idea?
Code :
a = input("Please enter your name ::")
conn = cx_Oracle.connect('hello/123@oracle')
cur = conn.cursor()
text1 = "select customer from visitors where name = :myvalue;"
cur.execute(text1,myvalue=str(a))
ERROR observed :
cx_Oracle.DatabaseError: ORA-00933: SQL command not properly ended
Remove the semi-colon at the end of your SQL statement.
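To make the fix mechanical, a small helper can strip the terminator; strip_trailing_semicolon is just an illustrative name, not a cx_Oracle API. Unlike SQL*Plus, cx_Oracle sends the statement to the server verbatim, and the server rejects the client-side ';' with ORA-00933.

```python
def strip_trailing_semicolon(sql):
    """Drop a trailing ';' -- cx_Oracle passes statements to the server
    verbatim, and Oracle rejects the SQL*Plus-style terminator."""
    return sql.rstrip().rstrip(";")

text1 = strip_trailing_semicolon("select customer from visitors where name = :myvalue;")
# text1 is now safe to pass to cur.execute(text1, myvalue=str(a))
```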
I have been playing around with pyodbc, and for some reason, when I get the Server property from input(), it cannot find the server; but if I take the same server name and declare it as a variable beforehand, the exact same code works fine. Does anyone know what is going on here?
Code with input()
import pyodbc
driver = '{ODBC Driver 17 for SQL Server}'
instance = input("Please Enter your SQL Instance: ")
connectionstring = f'Driver={driver}; Server={instance}; Trusted_Connection=yes'
conn = pyodbc.connect(connectionstring)
cursor = conn.cursor()
cursor.execute('SELECT name FROM sys.databases')
for row in cursor:
    print(row)
Output
Code output server name from Input()
Code with Variable
import pyodbc
driver = '{ODBC Driver 17 for SQL Server}'
instance = 'DESKTOP-J7PBL8S\\NORTHWIND'
connectionstring = f'Driver={driver}; Server={instance}; Trusted_Connection=yes'
conn = pyodbc.connect(connectionstring)
cursor = conn.cursor()
cursor.execute('SELECT name FROM sys.databases')
for row in cursor:
    print(row)
Output
Code output server name from variable
I have tried using str() on the input, with no luck. I'm not really sure why it fails when the server name comes from an input, because input() returns a string and the connection string is exactly the same as in the code that works.
I am mostly curious why this is the case; I'm not really looking to use input() in any real project.
I tried:
ins1 = input("Please Enter your SQL Instance: ")
ins2 = 'DESKTOP-J7PBL8S\\NORTHWIND'
print(ins1)
print(ins2)
Then I passed DESKTOP-J7PBL8S\\NORTHWIND as the input, and the output was:
DESKTOP-J7PBL8S\\NORTHWIND
DESKTOP-J7PBL8S\NORTHWIND
So I think we found the problem: input() returns the raw string without interpreting the special meaning of \.
So if you type just one \ -> DESKTOP-J7PBL8S\NORTHWIND at the prompt, you should be able to connect to the intended server.
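This is easy to verify without a database: input() keeps exactly the characters you type, while in a string literal the escape sequence \\ collapses to a single backslash. A small demonstration:

```python
# Simulate what input() returns when you type two backslashes at the
# prompt: input() performs no escape processing, so both characters stay.
typed = "DESKTOP-J7PBL8S" + "\\" * 2 + "NORTHWIND"

# In a string literal, the escape sequence \\ is a single backslash.
literal = "DESKTOP-J7PBL8S\\NORTHWIND"

print(repr(typed))    # 'DESKTOP-J7PBL8S\\\\NORTHWIND'
print(repr(literal))  # 'DESKTOP-J7PBL8S\\NORTHWIND'
```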
I am trying to connect to an Oracle database instance (ORAD) with a Python script.
Here is the basic script:
import cx_Oracle
conn = cx_Oracle.connect("username/password@//server:1560/orad")
c = conn.cursor()
c.execute('select distinct * from table1')
for row in c:
    print(row)
conn.close()
I currently have the instance's port, SID, and hostname, too, if that helps.
Running this script yields a cx_Oracle.DatabaseError: ORA-12514: TNS:listener does not currently know of service requested in connect descriptor error, while using the other connection string (the one that is commented out) yields SyntaxError: invalid syntax.
I am unsure of what I am doing wrong. I did check my TNSNAMES.ORA file, which contains a few ifile links to other files that are DBA-secured (I don't have access to see or edit them).
I have viewed this post and this post, but I don't have the IP, just the host name.
Any assistance would be appreciated.
The following answer and script worked:
def get_data(database, username, password, sql_statement):
    import cx_Oracle
    dsn_tns = cx_Oracle.makedsn('<server>', '<port>', '<sid>')
    connection = cx_Oracle.connect(username, password, dsn_tns)
    c = connection.cursor()
    c.execute(sql_statement)
    # Print the returned dataset
    for row in c:
        print(row)
    # Close the connection
    connection.close()
Can someone help me solve this problem?
I'm using Blender 2.74 and Python 3.4 with the correct connector for MySQL. (By the way, I'm just a beginner in using Blender and Python.)
What I want is to make a login UI and save the entered name into the database, but my code seems a bit off or wrong. When I try to run the code, it doesn't save the value from the variable, but when I run it in a Python IDE (PyCharm) it works.
Here's the code:
import sys
sys.path.append(r'C:\Python34\Lib\site-packages')
sys.path.append(r'C:\Python34\DLLs')
import mysql.connector
import bge

bge.render.showMouse(1)
cont = bge.logic.getCurrentController()
own = cont.owner
sensor = cont.sensors["enter"]
pname = own.get("prpText")
enter = cont.sensors["enter"]
numpadenter = cont.sensors["numpadenter"]

if enter.positive or numpadenter.positive:
    db = mysql.connector.connect(user='root', password='', host='localhost', database='dbname')
    cursor = db.cursor()
    cursor.execute("INSERT INTO tblname VALUE(%s", (pname))
    # these are the other things I have tried so far:
    # add_player = ("INSERT INTO storymode " "(PlayerName) " "VALUES (%s)")
    # data_player = (pname)
    # cursor.execute(add_player, data_player)
    # cursor.execute("INSERT INTO storymode" "(PlayerName)" "VALUES (%(pname)s)")
    db.commit()
    db.close()
The error is:
mysql.connector.errors.ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '%s' at line 1.
Can someone tell me what I need to do here? Do I need some add-ons for it to work?
Thank you very much for reading my post and to everyone who shares their opinion.
Looks like you're missing a closing parenthesis and an 'S' in your SQL INSERT statement:
INSERT INTO tblname VALUE (%s
needs to be
INSERT INTO tblname VALUES (%s)
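There is a second problem hiding in the original execute() call: the query parameters must be a sequence, and (pname) is just pname in parentheses, not a tuple. A sketch of the corrected call, with the table and column names taken from the question's own commented-out attempts and a hypothetical player name:

```python
pname = "Alice"  # hypothetical player name coming from the game UI

sql = "INSERT INTO storymode (PlayerName) VALUES (%s)"
params = (pname,)  # the trailing comma makes this a one-element tuple

# With a live connection, the fixed statement would then be executed as:
# cursor.execute(sql, params)
# db.commit()
```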