Following query in SQLite Studio for a date in 2021 or 2022 returns all rows correspondingly:
select * FROM daily_reports
WHERE strftime('%Y', daily_reports.reporting_date) in ('2021','2022');
Same query in Python returns no rows and the where clause
created is ... where 1 = 0:
q = ...
q = q.filter((func.strftime('%Y', DailyReportItem.reporting_date) in ['2021','2022']))
How to make it work?
The problem is that the statement you passed into .filter() does not get converted into SQL, it is executed as false in Python right away and then passed down to .filter() as argument.
The way you do the IN operator in SQLAlchemy is via DailyReportItem.reporting_date.in_(['2021', '2022']).
So in your case it should be something like this: q.filter(func.strftime('%Y', DailyReportItem.reporting_date).in_(['2021', '2022']))
I would rather just check if date is bigger than the beginning of the year 2021 and lower than the end of 2022. This way you would also get a better use of indexes.
first post / total Python novice so be patient with my slow understanding!
I have a dataframe containing a list of transactions by order of transaction date.
I've appended an additional new field/column called ["DB/CR"], that dependant on the presence of "-" in the ["Amount"] field populates 'Debit', else 'Credit' in the absence of "-".
Noting the transactions are in date order, I've included another new field/column called [Top x]. The output of which is I want to populate and incremental independent number (starting at 1) for both debits and credits on a segregated basis.
As such, I have created a simple loop with a associated 'if' / 'elif' (prob could use else as it's binary) statement that loops through the data sent row 0 to the last row in the df and using an if statement 1) "Debit" or 2) "Credit" increments the number for each independently by "Debit" 'i' integer, and "Credit" 'ii' integer.
The code works as expected in terms of output of the 'Top x'; however, I always receive a warning "A value is trying to be set on a copy of a slice from a DataFrame".
Trying to perfect my script, without any warnings I've been trying to understand what I'm doing incorrect but not getting it in terms of my use case scenario.
Appreciate if someone can kindly shed light on / propose how the code needs to be refactored to avoid receiving this error.
Code (the df source data is an imported csv):
#top x debits/credits
i = 0
ii = 0
for ind in df.index:
if df["DB/CR"][ind] == "Debit":
i = i+1
df["Top x"][ind] = i
elif df["DB/CR"][ind] == "Credit":
ii = ii+1
df["Top x"][ind] = ii
Interpreter
df["Top x"][ind] = i
G:\Finances Backup\venv\Statementsv.03.py:173: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df["Top x"][ind] = ii
Many thanks :)
You should use df.loc["DB/CR", ind] = "Debit"
Use iterrows() to iterate over the DF. However, updating DF while iterating is not preferable
see documentation here
Refer to the documentation here Iterrows()
You should never modify something you are iterating over. This is not
guaranteed to work in all cases. Depending on the data types, the
iterator returns a copy and not a view, and writing to it will have no
effect.
I'm using python 3.6 to do this. Below are just a few important columns that I'm interested to query out.
Auto-Gen Index : Product Container : Ship Date :.......
0 : Large Box : 2017-01-09:.......
1 : Large Box : 2012-07-15:.......
2 : Small Box : 2012-07-18:.......
3 : Large Box : 2012-07-31:.......
I would like to query rows that indicate Large Box as their product container and the shipping date must be in the period of July in the year of 2012.
file_name = r'''Sample-Superstore-Subset-Excel.xlsx'''
df = read_excel(file_name, sheet_name = my_sheet)
lb = df.loc[df['Product Container'] == 'Large Box'] //Get large box
july = lb[(lb['Ship Date'] > '2012-07-01') & (lb['Ship Date'] < '2012-07-31')]
I just wonder how to use query and where condition by python(pd.query())?
If your question is when to use loc vs where, see my answer here:
Think of loc as a filter - give me only the parts of the df that
conform to a condition.
where originally comes from numpy. It runs over an array and checks if
each element fits a condition. So it gives you back the entire array,
with a result or NaN. A nice feature of where is that you can also get
back something different, e.g. df2 = df.where(df['Goals']>10,
other='0'), to replace values that don't meet the condition with 0.
If you are asking when to use query, AFAIK there is no real reason to do besides performance. If you have a very large dataset, query is expected to be faster. More on high-level performance here.
I have data in a csv file as follows:
60,27702,1938470,13935,18513,8
60,32424,1933740,16103,15082,11
60,20080,1946092,9335,14970,2
60,28236,1937936,13799,16871,6
60,22717,1943455,10809,16726,4
120,37702,2938470,23935,28513,8
120,42424,2933740,26103,25082,11
120,30080,2946092,2335,24970,2
120,38236,2937936,23799,26871,6
120,32717,2943455,20809,26726,4
180,47702,3938470,33935,8513,8
180,52424,3933740,36103,5082,11
180,40080,3946092,3335,4970,2
180,48236,3937936,33799,6871,6
180,42717,3943455,30809,6726,4
I then used the following code to insert column heading:
df = pd.read_csv("contikiMAC_new_out.csv", names=['Energest','CPU','LPM','Transmit','Listen','ID'])
I used df.groupby(['ID']) to see the data in group according to column 'ID'.
The problem is the data in column 'LPM' gets reset after some time so I would like to add the previous value with the new value whenever the new value in LPM column is smaller for specific 'ID' .
I tried doing :
for x in df.groupby(['ID']):
for i in df.ID:
if (df.loc[i, 'LPM'] < df.loc[i - 1, 'LPM']):
df.loc[i, 'LPM'] = df.loc[i, 'LPM'] + df.loc[i - 1, 'LPM']
But actually not getting the fruitful result I desire because it mixes with the 'LPM' value of different 'ID' and the process takes a long time. Can anyone please help me in suggesting a way to write the data group wise in a csv file based on 'ID' after performing the sum operation ?
The data structure I like to see is as follows:
60,27702,1938470,13935,18513,8
120,37702,2938470,23935,28513,8
180,47702,3938470,33935,37026,8
60,32424,1933740,16103,15082,11
120,42424,2933740,26103,25082,11
180,52424,3933740,36103,30164,11
60,20080,1946092,9335,14970,2
120,30080,2946092,2335,24970,2
180,40080,3946092,3335,29940,2
60,28236,1937936,13799,16871,6
120,38236,2937936,23799,26871,6
180,48236,3937936,33799,33742,6
60,22717,1943455,10809,16726,4
120,32717,2943455,20809,26726,4
180,42717,3943455,30809,33452,4
If I understood your problem correctly, DataFrame.shift is what you're looking for.
Something like:
df['LPM_prev'] = df.groupby(['ID'])['LPM'].shift(1)
And then you can work with that column
I need to import certain information from an Excel file into an Access DB and in order to do this, I am using DAO.
The user gets the excel source file from a system, he does not need to directly interact with it. This source file has 10 columns and I would need to retrieve only certain records from it.
I am using this to retrieve all the records:
Set destinationFile = CurrentDb
Set dbtmp = OpenDatabase(sourceFile, False, True, "Excel 8.0;")
DoEvents
Set rs = dbtmp.OpenRecordset("SELECT * FROM [EEX_Avail_Cap_ALL_DEU_D1_S_Y1$A1:J65536]")
My problem comes when I want to retrieve only certain records using a WHERE clause. The name of the field where I want to apply the clause is 'Date (UCT)' (remember that the user gets this source file from another system) and I can not get the WHERE clause to work on it. If I apply the WHERE clause on another field, whose name does not have ( ) or spaces, then it works. Example:
Set rs = dbtmp.OpenRecordset("SELECT * FROM [EEX_Avail_Cap_ALL_DEU_D1_S_Y1$A1:J65536] WHERE Other = 12925")
The previous instruction will retrieve only the number of records where the field Other has the value 12925.
Could anyone please tell me how can I achieve the same result but with a field name that has spaces and parenthesis i.e. 'Date (UCT)' ?
Thank you very much.
Octavio
Try enclosing the field name in square brackets:
SELECT * FROM [EEX_Avail_Cap_ALL_DEU_D1_S_Y1$A1:J65536] WHERE [Date (UCT)] = 12925
or if it's a date we are looking for:
SELECT * FROM [EEX_Avail_Cap_ALL_DEU_D1_S_Y1$A1:J65536] WHERE [Date (UCT)] = #02/14/13#;
To use date literal you must enclose it in # characters and write the date in MM/DD/YY format regardless of any regional settings on your machine