Errors writing a dataframe to DB2 using Pandas to_sql - python-3.x

I am trying to load data from a pandas dataframe into an IBM DB2 Data Warehouse environment. The table already exists, so I am just appending rows to it, and I have built the dataframe to mirror every field in the table exactly.
I am using pandas' to_sql method to write the dataframe to the table. I have verified that I am connected to the database, but when I run the code I get the following error:
AttributeError: 'function' object has no attribute 'cursor'
I didn't see anything in the pandas documentation about having to define a cursor when using to_sql. Any help would be appreciated.
I tried writing a direct SQL INSERT statement rather than using to_sql, but couldn't get that to work properly either. I already have a to_csv call that writes the dataframe to a CSV file, so I would like to reuse the same dataframe for the table insert.
I cannot share much code, as this is a company project, but the table has 15 columns with differing datatypes (decimal, character, timestamp).
This is my to_sql statement:
`output_df.to_sql(name='PD5', con=self.db2_conn, schema='REBTEAM', if_exists='append', index=False)`
I expect the table to be loaded with the rows; my test file has 880 rows, so the table should end up with 880 rows.
Here is the entire error message I'm getting:
Warning (from warnings module):
File "C:\Users\dt24358\lib\site-packages\pandas\core\generic.py", line 2531
dtype=dtype, method=method)
UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Users\dt24358\lib\tkinter\__init__.py", line 1705, in __call__
return self.func(*args)
File "C:\Users\dt24358\Scripts\Pricing Tool\Rebate_GUI_SQL.py", line 100, in <lambda>
command= lambda: self.submit_click(self.path, self.fileName, self.save_location, self.request_var.get(), self.execution_var.get(),self.dt_user_id, self.rebateAggregator))
File "C:\Users\dt24358\Scripts\Pricing Tool\Rebate_GUI_SQL.py", line 210, in submit_click
output_df.to_sql(name='PD5', con=self.db2_conn, schema='REBTEAM', if_exists='append', index=False)
File "C:\Users\dt24358\lib\site-packages\pandas\core\generic.py", line 2531, in to_sql
dtype=dtype, method=method)
File "C:\Users\dt24358\lib\site-packages\pandas\io\sql.py", line 460, in to_sql
chunksize=chunksize, dtype=dtype, method=method)
File "C:\Users\dt24358\lib\site-packages\pandas\io\sql.py", line 1546, in to_sql
table.create()
File "C:\Users\dt24358\lib\site-packages\pandas\io\sql.py", line 572, in create
if self.exists():
File "C:\Users\dt24358\lib\site-packages\pandas\io\sql.py", line 560, in exists
return self.pd_sql.has_table(self.name, self.schema)
File "C:\Users\dt24358\lib\site-packages\pandas\io\sql.py", line 1558, in has_table
return len(self.execute(query, [name, ]).fetchall()) > 0
File "C:\Users\dt24358\lib\site-packages\pandas\io\sql.py", line 1426, in execute
cur = self.con.cursor()
AttributeError: 'function' object has no attribute 'cursor'
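The last frame shows pandas calling self.con.cursor(), so whatever was passed as con is a function object rather than an open connection, most likely a connection factory handed over without being called (con=self.db2_conn instead of con=self.db2_conn()). Note also that to_sql officially supports only SQLAlchemy connectables (or a raw sqlite3 connection); with any other DBAPI object, pandas falls back to a legacy SQLite-oriented code path that will misbehave against DB2. A minimal sketch of the SQLAlchemy route, assuming the ibm_db_sa dialect is installed (pip install sqlalchemy ibm_db_sa) and using placeholder credentials:

from sqlalchemy import create_engine

# Placeholders only -- substitute your real host, port, credentials, and database.
engine = create_engine("db2+ibm_db://user:password@host:50000/BLUDB")
output_df.to_sql(name='PD5', con=engine, schema='REBTEAM',
                 if_exists='append', index=False)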

Related

Why am I not able to create a table in SQLyog using PyCharm, even though PyCharm is connected to SQLyog?

I was trying to connect to a database and create tables in it using Python, working in PyCharm. I successfully connected PyCharm to a SQLyog database:
import pymysql

def CreateConn():
    return pymysql.connect(host="localhost", database="myfirstDB", user="root", password="", port=3306)

CreateConn()
But when I tried to create a table from code, it raised a few lines of error that I don't understand. I changed the SQL database engine to SQLite and also tried changing the IDE to Jupyter, and it still errors; I don't know why.
I tried the code below to create the table in SQLyog:
def CreateTable():
    conn = CreateConn()
    cursor = conn.cursor()  # helps execute the query
    query = "create table student(sid int primary key auto_increment,name VARCHAR(50),email VARCHAR(50),city VARCHAR(50)"
    cursor.execute(query)
    conn.commit()
    print("table created")
    conn.close()

CreateTable()
I expected the student table to be created in the SQLyog database, but instead I got the following error:
Traceback (most recent call last):
File "C:\Users\asus\PycharmProjects\pythonProject\Database\Database.py", line 29, in <module>
CreateTable() #CALLING CREATE TABLE FUNCTION
File "C:\Users\asus\PycharmProjects\pythonProject\Database\Database.py", line 24, in CreateTable
cursor.execute(query)
File "C:\Users\asus\PycharmProjects\pythonProject\venv\lib\site-packages\pymysql\cursors.py", line 148, in execute
result = self._query(query)
File "C:\Users\asus\PycharmProjects\pythonProject\venv\lib\site-packages\pymysql\cursors.py", line 310, in _query
conn.query(q)
File "C:\Users\asus\PycharmProjects\pythonProject\venv\lib\site-packages\pymysql\connections.py", line 548, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "C:\Users\asus\PycharmProjects\pythonProject\venv\lib\site-packages\pymysql\connections.py", line 775, in _read_query_result
result.read()
File "C:\Users\asus\PycharmProjects\pythonProject\venv\lib\site-packages\pymysql\connections.py", line 1156, in read
first_packet = self.connection._read_packet()
File "C:\Users\asus\PycharmProjects\pythonProject\venv\lib\site-packages\pymysql\connections.py", line 725, in _read_packet
packet.raise_for_error()
File "C:\Users\asus\PycharmProjects\pythonProject\venv\lib\site-packages\pymysql\protocol.py", line 221, in raise_for_error
err.raise_mysql_exception(self._data)
File "C:\Users\asus\PycharmProjects\pythonProject\venv\lib\site-packages\pymysql\err.py", line 143, in raise_mysql_exception
raise errorclass(errno, errval)
pymysql.err.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1")
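The error's "near ''" points at the very end of the statement: the CREATE TABLE query is missing the closing parenthesis of its column list, so MySQL runs out of input mid-statement. A corrected version of the function, using the same names as above:

def CreateTable():
    conn = CreateConn()
    cursor = conn.cursor()
    # The column list now ends with a closing ')' -- it was missing before,
    # which is exactly what the 1064 syntax error "near ''" was pointing at.
    query = ("create table student("
             "sid int primary key auto_increment,"
             "name VARCHAR(50),email VARCHAR(50),city VARCHAR(50))")
    cursor.execute(query)
    conn.commit()
    print("table created")
    conn.close()

CreateTable()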

Selecting rows in a heavy CSV

I am looking for a way to select rows based on a word in the line, so I use this script:
import pandas
import datetime
df = pandas.read_csv(
    r"C:StockEtablissement_utf8(1)\StockEtablissement_utf8.csv",
    sep=",",
)
communes = ["PERPIGNAN"]
print()
df = df[~df["libelleCommuneEtablissement"].isin(communes)]
print()
My script works fine with a normal CSV, but with a heavy CSV (4 GB) it says:
Traceback (most recent call last):
File "C:lafinessedufiness.py", line 5, in <module>
df = pandas.read_csv(r'C:StockEtablissement_utf8(1)\StockEtablissement_utf8.csv',
File "C:\Users\\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers\readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers\readers.py", line 581, in _read
return parser.read(nrows)
File "C:\Users\\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers\readers.py", line 1250, in read
index, columns, col_dict = self._engine.read(nrows)
File "C:\Users\\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 225, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas\_libs\parsers.pyx", line 805, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas\_libs\parsers.pyx", line 883, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 1026, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas\_libs\parsers.pyx", line 1072, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas\_libs\parsers.pyx", line 1172, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas\_libs\parsers.pyx", line 1731, in pandas._libs.parsers._try_int64
MemoryError: Unable to allocate 128. KiB for an array with shape (16384,) and data type int64
Do you know how I can fix this error?
The pd.read_csv() function has an option to read the file in chunks, rather than loading it all at once. Use iterator=True and specify a reasonable chunk size (rows per chunk).
import pandas as pd

path = r'C:StockEtablissement_utf8(1)\StockEtablissement_utf8.csv'
it = pd.read_csv(path, sep=',', iterator=True, chunksize=10_000)
communes = ['PERPIGNAN']
filtered_chunks = []
for chunk_df in it:
    # '@communes' refers back to the local Python variable inside query()
    chunk_df = chunk_df.query('libelleCommuneEtablissement not in @communes')
    filtered_chunks.append(chunk_df)
df = pd.concat(filtered_chunks)
As you can see, you don't have enough memory available for Pandas to load that file entirely into memory.
One reason is that, based on Python38-32 in the traceback, you're running a 32-bit version of Python, whose address space tops out at 2 to 4 gigabytes per process anyway. If your system is 64-bit, you should switch to the 64-bit version of Python; that's one obstacle fewer.
If that doesn't help, you'll also just need more memory. You could enlarge Windows's virtual memory (page file), or install more physical memory in your system.
If those don't help, then you'll have to come up with a better approach than loading the big CSV entirely into memory.
For one, if you really only care about rows containing the string PERPIGNAN (in any column; you can still filter precisely again in your code), you could run grep PERPIGNAN data.csv > data_perpignan.csv and work with that, assuming you have grep; you can do the same filtering with a short Python script, as sketched below.
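A sketch of that short Python script, under the assumptions that the first line of the file is the header and that the output filename below is purely illustrative:

# Pure-Python stand-in for: grep PERPIGNAN data.csv > data_perpignan.csv
# (unlike grep, this also keeps the header row)
with open("StockEtablissement_utf8.csv", encoding="utf-8") as src, \
        open("data_perpignan.csv", "w", encoding="utf-8") as dst:
    dst.write(next(src))  # header row
    for line in src:
        if "PERPIGNAN" in line:
            dst.write(line)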
Since read_csv() accepts any iterable of lines, you can also just do something like
def lines_from_file_including_strings(file, strings):
    for i, line in enumerate(file):
        if i == 0 or any(string in line for string in strings):
            yield line

communes = ["PERPIGNAN", "PARIS"]
with open("StockEtablissement_utf8.csv") as f:
    df = pd.read_csv(lines_from_file_including_strings(f, communes), sep=",")
for an initial filter.

Pandas to_sql TypeError unsupported operand type

I am doing a database insertion with pandas' to_sql, moving millions of rows into a database through SQLAlchemy. I've created a small test CSV with only 4 rows so that I know exactly what data is in the file.
Here is the csv format
column_one,column_two,column_three,column_four
0001-1234,db38ad21b3,https://example.com,2
0034-1201,38db21adb3,https://example-two.com,3
My database table is defined with the exact same column names.
df = pd.read_csv("test_repositories.csv",
header=0,
sep=',',
quotechar='"',
dtype={'column_one': str,
'column_two': str,
'column_three': str,
'column_four': int},
error_bad_lines=False)
df = df.where(pd.notnull(df), None)
df.to_sql(self.staging_table, db.engine, self.chunksize, method='multi')
This seems like it should work; however, I keep getting the following TypeError, which says the operation schema + "." + name cannot add an int and a str:
File "/ingest/utils.py", line 59, in copy_csv_to_temp_table
df.to_sql(self.staging_table, db.engine, self.chunksize, method='multi')
File "/venv/lib/python3.8/site-packages/pandas/core/generic.py", line 2776, in to_sql
sql.to_sql(
File "/venv/lib/python3.8/site-packages/pandas/io/sql.py", line 590, in to_sql
pandas_sql.to_sql(
File "/venv/lib/python3.8/site-packages/pandas/io/sql.py", line 1382, in to_sql
table = SQLTable(
File "/venv/lib/python3.8/site-packages/pandas/io/sql.py", line 700, in __init__
self.table = self._create_table_setup()
File "/venv/lib/python3.8/site-packages/pandas/io/sql.py", line 966, in _create_table_setup
return Table(self.name, meta, *columns, schema=schema)
File "<string>", line 2, in __new__
File "/venv/lib/python3.8/site-packages/sqlalchemy/util/deprecations.py", line 139, in warned
return fn(*args, **kwargs)
File "/venv/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 537, in __new__
key = _get_table_key(name, schema)
File "/venv/lib/python3.8/site-packages/sqlalchemy/sql/schema.py", line 77, in _get_table_key
return schema + "." + name
TypeError: unsupported operand type(s) for +: 'int' and 'str'
I understand what this error means. However, I don't understand why schema or name would cause a problem as all the column names are clearly strings. Any help is appreciated.
The function signature is:
DataFrame.to_sql(name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None, method=None)
Note that schema is the 3rd positional argument defaulted to None.
Therefore, in
df.to_sql(self.staging_table, db.engine, self.chunksize, method='multi')
what you intend as the chunksize is being interpreted as the schema argument. Pass chunksize by name instead, e.g.:
df.to_sql(self.staging_table, db.engine, chunksize=self.chunksize, method='multi')
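More generally, passing everything after name and con by keyword sidesteps this whole class of bug. A sketch using the question's names (the if_exists and index values here are illustrative assumptions, not taken from the question):

df.to_sql(
    name=self.staging_table,
    con=db.engine,
    if_exists='append',   # assumption: choose the behaviour you actually need
    index=False,          # assumption: drop the dataframe index on insert
    chunksize=self.chunksize,
    method='multi',
)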

Not able to parse a CSV file with pandas

I am writing a Python script that generates two different CSV files and then reads them back with pandas. I can read file1 without trouble, but I get an error while reading file2, which has the same format (same column names) as file1 but different (or the same) values. Below are the error I am getting and the sample code I am using.
Error:
Traceback (most recent call last):
File "MSReport.py", line 168, in <module>
fail = pd.read_csv('/home/cisapp/msLogFailure.csv', sep=',')
File "/home/cisapp/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/cisapp/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/home/cisapp/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "/home/cisapp/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/home/cisapp/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 532, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
Code:
df = pd.read_csv(BASE_LOCATION+'/msLog_Success.csv', engine='python')
f_output = df.groupby('MSISDN').last()
#print(df)
print(f_output)
fail = pd.read_csv(BASE_LOCATION+'/msLogFailure.csv', engine='python')
fail = fail['MSISDN']
fail = fail.tolist()
for i in fail:
    succ = f_output[f_output.MSISDN != i]
In the sample code above, the first read (df = pd.read_csv(BASE_LOCATION+'/msLog_Success.csv', engine='python')) works, but the second (fail = pd.read_csv(BASE_LOCATION+'/msLogFailure.csv', engine='python')) fails with the error above. Please help me resolve this.
Note: I am running the code with Python 3.
I faced the same problem and resolved it, so you can check the ideas below.
Check the delimiter and specify it explicitly, as in these examples:
pd.read_csv(BASE_LOCATION+'/msLog_Success.csv', encoding='utf-16', sep='\t')
pd.read_csv(BASE_LOCATION+'/msLog_Success.csv', delim_whitespace=True)
You can also add an 'r' prefix before the file path.
Otherwise, share a sample of the file.
Your sample of msLogFailure file looks OK - 6 column names and 6 data fields.
I looked for posts about this exact error message and found advice to:
read the input file into a string variable,
read_csv from this string, e.g. pd.read_csv(io.StringIO(txt),...).
Maybe this will help.
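A minimal sketch of that idea, reusing the question's BASE_LOCATION; it also surfaces the usual root cause, since read_csv raises "No columns to parse from file" when the file is empty or contains only blank lines:

import io
import pandas as pd

path = BASE_LOCATION + '/msLogFailure.csv'
with open(path) as f:
    txt = f.read()

if not txt.strip():
    # An empty (or whitespace-only) file is what triggers EmptyDataError.
    raise ValueError(path + " is empty")

fail = pd.read_csv(io.StringIO(txt), sep=',')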

How can I read the csv file in pandas which is separated with ";"?

I started working with pandas in Python 3.4 a couple of days ago and chose to work on the Book-Crossing data set.
The book-information table is like this:
The book-rating table is like this:
I want to take the "ISBN" and "Book-Title" columns from the book-information table, merge them with the book-rating table on matching "ISBN" values, and then write the result to another CSV file.
I used the code below:
udata = pd.read_csv('1', names = ('User_ID', 'ISBN', 'Book-Rating'), encoding="ISO-8859-1", sep=';', usecols=[0,1,2])
uitem = pd.read_csv('2', names = ('ISBN', 'Book-Title'), encoding="ISO-8859-1", sep=';', usecols=[0,1])
ratings = pd.merge(udata, uitem, on='ISBN')
ratings.to_csv('ratings.csv', index=False)
Unfortunately it doesn't work and it gives an error:
Traceback (most recent call last):
File "C:\Users\masoud\Desktop\Dataset\data2\a.py", line 2, in <module>
udata = pd.read_csv('2.csv', names = ('User_ID', 'ISBN', 'Book-Rating'),encoding="ISO-8859-1", sep=';', usecols=[0,1,2])
File "C:\WinPython-64bit-3.4.3.6\python-3.4.3.amd64\lib\site-packages\pandas\io\parsers.py", line 491, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\WinPython-64bit-3.4.3.6\python-3.4.3.amd64\lib\site-packages\pandas\io\parsers.py", line 278, in _read
return parser.read()
File "C:\WinPython-64bit-3.4.3.6\python-3.4.3.amd64\lib\site-packages\pandas\io\parsers.py", line 740, in read
ret = self._engine.read(nrows)
File "C:\WinPython-64bit-3.4.3.6\python-3.4.3.amd64\lib\site-packages\pandas\io\parsers.py", line 1187, in read
data = self._reader.read(nrows)
File "pandas\parser.pyx", line 758, in pandas.parser.TextReader.read (pandas\parser.c:7919)
File "pandas\parser.pyx", line 780, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:8175)
File "pandas\parser.pyx", line 833, in pandas.parser.TextReader._read_rows (pandas\parser.c:8868)
File "pandas\parser.pyx", line 820, in pandas.parser.TextReader._tokenize_rows (pandas\parser.c:8736)
File "pandas\parser.pyx", line 1732, in pandas.parser.raise_parser_error (pandas\parser.c:22105)
pandas.parser.CParserError: Error tokenizing data. C error: Expected 8 fields in line 6452, saw 9
I was wondering if anybody could fix the error?
In the first and second rows, set the separator to a semicolon:
sep=';'
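If the separator is already ';' and the tokenizing error persists, line 6452 most likely contains an extra ';' or an unescaped quote. One workaround (a sketch, not a fix for the underlying data) is to skip and log the malformed lines; the flag is error_bad_lines=False in pandas versions contemporary with this question, renamed to on_bad_lines='skip' in pandas 1.3+:

udata = pd.read_csv('1', names=('User_ID', 'ISBN', 'Book-Rating'),
                    encoding="ISO-8859-1", sep=';', usecols=[0, 1, 2],
                    error_bad_lines=False, warn_bad_lines=True)
# pandas >= 1.3: replace the last two flags with on_bad_lines='skip'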
