I am attempting to update a column in an Oracle database with new values that have been calculated and stored in a pandas DataFrame. The table in the database is protein_info and the column I want to update is pct. I'm getting the following error when I run my code:
Traceback (most recent call last):
File "./update_nsaf.py", line 81, in
df.to_sql(protein_info, engine, index=False, if_exists='replace')
AttributeError: type object 'protein_info' has no attribute 'lower'
The relevant code:
df = df[['id', 'pct']]
engine=create_engine('oracle://scott:tiger#localhost:5432/mydatabase', echo=False)
connect = engine.raw_connection()
df.to_sql(protein_info, engine, index=False, if_exists='replace')
sql = """
UPDATE protein_info
SET protein_info.pct = pct
FROM protein_info
WHERE protein_info.id = id
"""
connect.execute(sql)
connect.close()
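The traceback shows pandas calling .lower() on the first argument, which means to_sql received a bare Python name rather than a string: the table name must be passed as text. A minimal sketch of the corrected call (connection URL assumed; the '#' in the original looks like a typo for the usual user:password@host form):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('oracle://scott:tiger@localhost:5432/mydatabase', echo=False)

# df is the DataFrame computed earlier in the question
df = df[['id', 'pct']]
# the table name is a string, not a bare identifier
df.to_sql('protein_info', engine, index=False, if_exists='replace')

Since if_exists='replace' rewrites the whole table, the follow-up UPDATE may be unnecessary; if it is still needed, note that a raw DBAPI connection executes SQL through a cursor, and that Oracle does not accept the UPDATE ... FROM form, so a correlated subquery or MERGE is the usual route there.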
I want to delete rows based on a datetime filter.
I created a table with a DateTime column (without timezone) using a script like this:
class VolumeInfo(Base):
    ...
    date: datetime.datetime = Column(DateTime, nullable=False)
Then I try to delete rows using a filter like this:
days_interval = 10
to_date = datetime.datetime.combine(
    datetime.datetime.utcnow().date(),
    datetime.time(0, 0, 0, 0),
).replace(tzinfo=None)
from_date = to_date - datetime.timedelta(days=days_interval)
query = delete(VolumeInfo).where(VolumeInfo.date < from_date)
Unexpectedly, sometimes there is no error, but sometimes I get this error:
Traceback (most recent call last):
...
File "script.py", line 381, in delete_volumes
db.execute(query)
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/session.py", line 1660, in execute
) = compile_state_cls.orm_pre_session_exec(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/persistence.py", line 1843, in orm_pre_session_exec
update_options = cls._do_pre_synchronize_evaluate(
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/persistence.py", line 2007, in _do_pre_synchronize_evaluate
matched_objects = [
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/persistence.py", line 2012, in <listcomp>
and eval_condition(state.obj())
File "/usr/local/lib/python3.10/site-packages/sqlalchemy/orm/evaluator.py", line 211, in evaluate
return operator(eval_left(obj), eval_right(obj))
TypeError: can't compare offset-naive and offset-aware datetimes
I am using Python 3.10 in Docker (image python:3.10-slim) and a PostgreSQL database with the psycopg2 driver.
I have already tried several options, but this error still appears every once in a while.
How can I solve this, or where did I make a mistake?
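One reading of the traceback: the ORM delete defaults to the "evaluate" synchronization strategy, which re-runs the WHERE clause in Python against objects already loaded in the session. If any such object carries a timezone-aware datetime, comparing it with the naive from_date raises exactly this TypeError, which would explain why the error only appears sometimes. A minimal sketch that sidesteps the in-Python comparison (synchronize_session is a standard SQLAlchemy execution option; the rest is from the question):

from sqlalchemy import delete

query = (
    delete(VolumeInfo)
    .where(VolumeInfo.date < from_date)
    # let the database do the comparison; avoids evaluating tz-aware
    # session objects against the naive from_date in Python
    .execution_options(synchronize_session='fetch')  # or False
)
db.execute(query)
db.commit()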
I encoded a value from an input file and inserted it into an SQLite DB:
cur.execute('''INSERT INTO Locations (address, geodata)
    VALUES ( ?, ? )''', (memoryview(address.encode()), memoryview(data.encode())))
Now I'm trying to decode it, but I'm getting:
Traceback (most recent call last):
File "return.py", line 9, in
print(c.decode('utf-8'))
AttributeError: 'tuple' object has no attribute 'decode'
My code looks like this:
import sqlite3
conn = sqlite3.connect('geodata.sqlite')
cur = conn.cursor()
cur.execute('SELECT address FROM Locations')
for c in cur:
    print(c.decode('utf-8'))
Regardless of how many columns are selected, rows are returned as tuples. You get the first element of the tuple the usual way, by indexing.
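A minimal sketch of the loop with indexing added; because the values were stored as memoryviews, sqlite3 hands them back as bytes, which do have .decode():

import sqlite3

conn = sqlite3.connect('geodata.sqlite')
cur = conn.cursor()
cur.execute('SELECT address FROM Locations')
for row in cur:
    address = row[0]  # first (and only) selected column
    print(address.decode('utf-8'))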
I saw a couple of posts, post1 and post2, which are relevant to my question. However, while following the post1 solution I am running into the error below.
joinedDF = df.join(df_agg, "company")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark/python/pyspark/sql/dataframe.py", line 1050, in join
jdf = self._jdf.join(other._jdf, on, how)
AttributeError: 'NoneType' object has no attribute '_jdf'
Entire code snippet:
df = spark.read.format("csv").option("header", "true").load("/home/ec2-user/techcrunch/TechCrunchcontinentalUSA.csv")
df_agg = df.groupby("company").agg(func.sum("raisedAmt").alias("TotalRaised")).orderBy("TotalRaised", ascending = False).show()
joinedDF = df.join(df_agg, "company")
On the second line you have .show() at the end:
df_agg = df.groupby("company").agg(func.sum("raisedAmt").alias("TotalRaised")).orderBy("TotalRaised", ascending = False).show()
remove it like this:
df_agg = df.groupby("company").agg(func.sum("raisedAmt").alias("TotalRaised")).orderBy("TotalRaised", ascending = False)
and your code should work.
You used an action on that DataFrame and assigned the result to the df_agg variable; that's why your variable is NoneType (in Python) or Unit (in Scala): show() only prints the DataFrame and returns nothing.
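A minimal sketch of the corrected flow, keeping the aggregated DataFrame in df_agg and calling show() separately (session setup assumed; the path and column names are from the question):

from pyspark.sql import SparkSession
from pyspark.sql import functions as func

spark = SparkSession.builder.getOrCreate()

df = spark.read.format("csv").option("header", "true").load(
    "/home/ec2-user/techcrunch/TechCrunchcontinentalUSA.csv")
df_agg = (df.groupby("company")
            .agg(func.sum("raisedAmt").alias("TotalRaised"))
            .orderBy("TotalRaised", ascending=False))
df_agg.show()  # an action: it prints and returns None, so don't assign it
joinedDF = df.join(df_agg, "company")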
I am reading a CSV file with pandas and then trying to find a word like "Net income" in the first column. Then I want to use the whole row, which has the structure string/number/number/number/..., to do some calculations with the numbers.
The problem is that find is not working.
data = pd.read_csv(name)
data.str.find('Net income')
I am using CSV files from here: Income Statement for Deutsche Lufthansa AG (DLAKF) from Morningstar.com.
I found this: Python | Pandas Series.str.find() - GeeksforGeeks, but I still get this error:
Traceback (most recent call last):
File "C:\Users\thoma\Desktop\python programme\manage.py", line 16, in <module>
data.str.find('Net income')
File "C:\Users\thoma\AppData\Roaming\Python\Python37\site-packages\pandas\core\generic.py", line 5067, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'str'
So, it works now. But I still have a question. After using the describe function with pandas I get this:
<bound method NDFrame.describe of 2014-12 615
2015-12 612
2016-12 636
2017-12 713
2018-12 736
Name: Goodwill, dtype: object>
I have problems using the data. How can I, for example, use the second column here? I tried to create a new table:
new_Table['Goodwill'] = data1['Goodwill'].describe
but this does not work.
I also would like to add more "second" columns to new_Table.
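Regarding the describe output above: the "<bound method NDFrame.describe of ...>" text is the method object itself, printed because describe was referenced without parentheses. Calling it returns an ordinary Series that can be assigned; a minimal sketch using the names from the question:

summary = data1['Goodwill'].describe()  # note the parentheses
new_Table['Goodwill'] = summary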
Hi, you should select the column first, like df['col name'].str.find(x); .str requires a Series, not a DataFrame.
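A minimal sketch of that idea, assuming the label sits in the first column; str.contains is swapped in for find here because it yields a boolean mask that selects the matching row directly:

import pandas as pd

data = pd.read_csv(name)  # name is the CSV path from the question
labels = data.iloc[:, 0].astype(str)  # first column as a Series
row = data[labels.str.contains('Net income', na=False)]
print(row)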
I recommend setting your header row if pandas isn't recognizing named rows in your CSV file.
Something like:
new_header = data.iloc[0] #grab the first row for the header
data = data[1:] #take the data less the header row
data.columns = new_header
From there you can summarize each column by name:
data['Net Income'].describe()
Edit: I looked at the CSV file; I recommend reshaping the data before analyzing columns. Something like:
data = data.transpose()
So in summary:
data = pd.read_csv(name)
data = data.transpose()        # flip the columns/rows
new_header = data.iloc[0]      # grab the first row for the header
data = data[1:]                # take the data less the header row
data.columns = new_header
data['Net Income'].describe()  # analyze
I have a large matrix (15000 rows x 2500 columns) stored using PyTables and cannot see how to iterate over the columns of a row. In the documentation I only see how to access each column by name manually.
I have columns like:
ID
X20160730_Day10_123a_2
X20160730_Day10_123b_1
X20160730_Day10_123b_2
The ID column value is a string like '10692.RFX7', but all other cell values are floats. This selection works and I can iterate over the rows of results, but I cannot see how to iterate over the columns and check their values:
from tables import *
import numpy

def main():
    h5file = open_file('carlo_seth.h5', mode='r', title='Three-file test')
    table = h5file.root.expression.readout
    condition = '(ID == b"10692.RFX7")'
    for row in table.where(condition):
        print(row['ID'].decode())
        for col in row.fetch_all_fields():
            print("{0}\t{1}".format(col, row[col]))
    h5file.close()

if __name__ == '__main__':
    main()
If I just iterate with "for col in row", nothing happens. With the code as above, I get a stack trace:
10692.RFX7
Traceback (most recent call last):
File "tables/tableextension.pyx", line 1497, in tables.tableextension.Row.__getitem__ (tables/tableextension.c:17226)
KeyError: b'10692.RFX7'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "tables/tableextension.pyx", line 126, in tables.tableextension.get_nested_field_cache (tables/tableextension.c:2532)
KeyError: b'10692.RFX7'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./read_carlo_pytable.py", line 31, in <module>
main()
File "./read_carlo_pytable.py", line 25, in main
print("{0}\t{1}".format(col, row[col]))
File "tables/tableextension.pyx", line 1501, in tables.tableextension.Row.__getitem__ (tables/tableextension.c:17286)
File "tables/tableextension.pyx", line 133, in tables.tableextension.get_nested_field_cache (tables/tableextension.c:2651)
File "tables/utilsextension.pyx", line 927, in tables.utilsextension.get_nested_field (tables/utilsextension.c:8707)
AttributeError: 'numpy.bytes_' object has no attribute 'encode'
Closing remaining open files:carlo_seth.h5...done
You can access a column value by name in each row (using one of the question's column names):
for row in table:
    print(row['X20160730_Day10_123a_2'])
Iterate over all columns:
names = table.coldescrs.keys()
for row in table:
    for name in names:
        print(name, row[name])
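A minimal sketch combining this with the question's filter; table.colnames is the standard PyTables list of column names, so iterating over names (rather than over cell values, which is what fetch_all_fields() yields) avoids the KeyError:

import tables

with tables.open_file('carlo_seth.h5', mode='r') as h5file:
    table = h5file.root.expression.readout
    for row in table.where('(ID == b"10692.RFX7")'):
        for name in table.colnames:
            print("{0}\t{1}".format(name, row[name]))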