SQLite date string comparison. SAME FORMAT not working - string

Please Help! This has been driving me nuts. YES I have looked at dozens of Stack Overflow answers for similar questions, and they all say "Your dates should be in a consistent accepted format".
All of my SQLite dates are stored in the following string format:
'yyyy-MM-dd HH:mm:ss'
I'm trying to do date comparison but NOTHING works. For instance:
SELECT transactiondate
FROM transactions
WHERE transactiondate >= '2010-02-19 00:00:00'
Doesn't work.. I get 0 records returned.
Even though when I run it without the WHERE clause I get returned date strings like:
'2017-02-25 00:00:00'
In fact, simply doing:
SELECT transactiondate
FROM transactions
WHERE '2017-02-25 00:00:00' >= '2010-02-19 00:00:00'
Does work, i.e. it compares the dates successfully in its raw form but not when referring to the table column value.
PLEASE.. WTF am I doing wrong?
Also, I've tried every permutation of datetime(...), date(...) in the SQL to no avail. I have also tried using ... BETWEEN ... AND ..., it doesn't help.
Thanks in advance if anyone can figure this out.

You got the ' in the front and end included in your data. In column transactiondate.
Then you also need to include ' chars in your WHERE filter too. Two ways to do that in SQLite:
select * from transactions where transactiondate >= "'2010-02-19 00:00:00'";
select * from transactions where transactiondate >= '''2010-02-19 00:00:00''';
However, I suggest you clean up your data instead:
update transactions
set transactiondate=substr(transactiondate,2,length(transactiondate)-2)
where transactiondate like "'%'";
Now you can use your original WHERE without the extra ' chars.

OK, This is not an answer. But in SO you can't do this kind of thing in a comment.
I can't replicate the condition you have. The code, in particular the SELECT, works well enough executed from Python.
>>> import sqlite3
>>> conn = sqlite3.connect(':memory:')
>>> cur = conn.cursor()
>>> cur.execute('CREATE TABLE transactions (transactionsdate string)')
<sqlite3.Cursor object at 0x000000000531CCE0>
>>> cur.execute('''INSERT INTO transactions VALUES ('2010-02-10 00:00:00')''')
<sqlite3.Cursor object at 0x000000000531CCE0>
>>> cur.execute('''INSERT INTO transactions VALUES ('2017-02-10 00:00:00')''')
<sqlite3.Cursor object at 0x000000000531CCE0>
>>> list(cur.execute("SELECT * from transactions where transactionsdate >= '2010-02-19 00:00:00'"))
[('2017-02-10 00:00:00',)]
>>> list(cur.execute("SELECT * from transactions"))
[('2010-02-10 00:00:00',), ('2017-02-10 00:00:00',)]
Stab in the dark: Can you dump the contents of the field into a text file and examine it using (say) Notepad++, which would enable to to see non-printing characters that are interfering?

Related

How to query date by year?

Following query in SQLite Studio for a date in 2021 or 2022 returns all rows correspondingly:
select * FROM daily_reports
WHERE strftime('%Y', daily_reports.reporting_date) in ('2021','2022');
Same query in Python returns no rows and the where clause
created is ... where 1 = 0:
q = ...
q = q.filter((func.strftime('%Y', DailyReportItem.reporting_date) in ['2021','2022']))
How to make it work?
The problem is that the statement you passed into .filter() does not get converted into SQL, it is executed as false in Python right away and then passed down to .filter() as argument.
The way you do the IN operator in SQLAlchemy is via DailyReportItem.reporting_date.in_(['2021', '2022']).
So in your case it should be something like this: q.filter(func.strftime('%Y', DailyReportItem.reporting_date).in_(['2021', '2022']))
I would rather just check if date is bigger than the beginning of the year 2021 and lower than the end of 2022. This way you would also get a better use of indexes.

Python Warning Panda Dataframe "Simple Issue!" - "A value is trying to be set on a copy of a slice from a DataFrame"

first post / total Python novice so be patient with my slow understanding!
I have a dataframe containing a list of transactions by order of transaction date.
I've appended an additional new field/column called ["DB/CR"], that dependant on the presence of "-" in the ["Amount"] field populates 'Debit', else 'Credit' in the absence of "-".
Noting the transactions are in date order, I've included another new field/column called [Top x]. The output of which is I want to populate and incremental independent number (starting at 1) for both debits and credits on a segregated basis.
As such, I have created a simple loop with a associated 'if' / 'elif' (prob could use else as it's binary) statement that loops through the data sent row 0 to the last row in the df and using an if statement 1) "Debit" or 2) "Credit" increments the number for each independently by "Debit" 'i' integer, and "Credit" 'ii' integer.
The code works as expected in terms of output of the 'Top x'; however, I always receive a warning "A value is trying to be set on a copy of a slice from a DataFrame".
Trying to perfect my script, without any warnings I've been trying to understand what I'm doing incorrect but not getting it in terms of my use case scenario.
Appreciate if someone can kindly shed light on / propose how the code needs to be refactored to avoid receiving this error.
Code (the df source data is an imported csv):
#top x debits/credits
i = 0
ii = 0
for ind in df.index:
if df["DB/CR"][ind] == "Debit":
i = i+1
df["Top x"][ind] = i
elif df["DB/CR"][ind] == "Credit":
ii = ii+1
df["Top x"][ind] = ii
Interpreter
df["Top x"][ind] = i
G:\Finances Backup\venv\Statementsv.03.py:173: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df["Top x"][ind] = ii
Many thanks :)
You should use df.loc["DB/CR", ind] = "Debit"
Use iterrows() to iterate over the DF. However, updating DF while iterating is not preferable
see documentation here
Refer to the documentation here Iterrows()
You should never modify something you are iterating over. This is not
guaranteed to work in all cases. Depending on the data types, the
iterator returns a copy and not a view, and writing to it will have no
effect.

Query date range and product size from xlsx file

I'm using python 3.6 to do this. Below are just a few important columns that I'm interested to query out.
Auto-Gen Index : Product Container : Ship Date :.......
0 : Large Box : 2017-01-09:.......
1 : Large Box : 2012-07-15:.......
2 : Small Box : 2012-07-18:.......
3 : Large Box : 2012-07-31:.......
I would like to query rows that indicate Large Box as their product container and the shipping date must be in the period of July in the year of 2012.
file_name = r'''Sample-Superstore-Subset-Excel.xlsx'''
df = read_excel(file_name, sheet_name = my_sheet)
lb = df.loc[df['Product Container'] == 'Large Box'] //Get large box
july = lb[(lb['Ship Date'] > '2012-07-01') & (lb['Ship Date'] < '2012-07-31')]
I just wonder how to use query and where condition by python(pd.query())?
If your question is when to use loc vs where, see my answer here:
Think of loc as a filter - give me only the parts of the df that
conform to a condition.
where originally comes from numpy. It runs over an array and checks if
each element fits a condition. So it gives you back the entire array,
with a result or NaN. A nice feature of where is that you can also get
back something different, e.g. df2 = df.where(df['Goals']>10,
other='0'), to replace values that don't meet the condition with 0.
If you are asking when to use query, AFAIK there is no real reason to do besides performance. If you have a very large dataset, query is expected to be faster. More on high-level performance here.

Reordering data by manipulating column wise in Python

I have data in a csv file as follows:
60,27702,1938470,13935,18513,8
60,32424,1933740,16103,15082,11
60,20080,1946092,9335,14970,2
60,28236,1937936,13799,16871,6
60,22717,1943455,10809,16726,4
120,37702,2938470,23935,28513,8
120,42424,2933740,26103,25082,11
120,30080,2946092,2335,24970,2
120,38236,2937936,23799,26871,6
120,32717,2943455,20809,26726,4
180,47702,3938470,33935,8513,8
180,52424,3933740,36103,5082,11
180,40080,3946092,3335,4970,2
180,48236,3937936,33799,6871,6
180,42717,3943455,30809,6726,4
I then used the following code to insert column heading:
df = pd.read_csv("contikiMAC_new_out.csv", names=['Energest','CPU','LPM','Transmit','Listen','ID'])
I used df.groupby(['ID']) to see the data in group according to column 'ID'.
The problem is the data in column 'LPM' gets reset after some time so I would like to add the previous value with the new value whenever the new value in LPM column is smaller for specific 'ID' .
I tried doing :
for x in df.groupby(['ID']):
for i in df.ID:
if (df.loc[i, 'LPM'] < df.loc[i - 1, 'LPM']):
df.loc[i, 'LPM'] = df.loc[i, 'LPM'] + df.loc[i - 1, 'LPM']
But actually not getting the fruitful result I desire because it mixes with the 'LPM' value of different 'ID' and the process takes a long time. Can anyone please help me in suggesting a way to write the data group wise in a csv file based on 'ID' after performing the sum operation ?
The data structure I like to see is as follows:
60,27702,1938470,13935,18513,8
120,37702,2938470,23935,28513,8
180,47702,3938470,33935,37026,8
60,32424,1933740,16103,15082,11
120,42424,2933740,26103,25082,11
180,52424,3933740,36103,30164,11
60,20080,1946092,9335,14970,2
120,30080,2946092,2335,24970,2
180,40080,3946092,3335,29940,2
60,28236,1937936,13799,16871,6
120,38236,2937936,23799,26871,6
180,48236,3937936,33799,33742,6
60,22717,1943455,10809,16726,4
120,32717,2943455,20809,26726,4
180,42717,3943455,30809,33452,4
If I understood your problem correctly, DataFrame.shift is what you're looking for.
Something like:
df['LPM_prev'] = df.groupby(['ID'])['LPM'].shift(1)
And then you can work with that column

Importing data from Excel into Access using DAO and WHERE clause

I need to import certain information from an Excel file into an Access DB and in order to do this, I am using DAO.
The user gets the excel source file from a system, he does not need to directly interact with it. This source file has 10 columns and I would need to retrieve only certain records from it.
I am using this to retrieve all the records:
Set destinationFile = CurrentDb
Set dbtmp = OpenDatabase(sourceFile, False, True, "Excel 8.0;")
DoEvents
Set rs = dbtmp.OpenRecordset("SELECT * FROM [EEX_Avail_Cap_ALL_DEU_D1_S_Y1$A1:J65536]")
My problem comes when I want to retrieve only certain records using a WHERE clause. The name of the field where I want to apply the clause is 'Date (UCT)' (remember that the user gets this source file from another system) and I can not get the WHERE clause to work on it. If I apply the WHERE clause on another field, whose name does not have ( ) or spaces, then it works. Example:
Set rs = dbtmp.OpenRecordset("SELECT * FROM [EEX_Avail_Cap_ALL_DEU_D1_S_Y1$A1:J65536] WHERE Other = 12925")
The previous instruction will retrieve only the number of records where the field Other has the value 12925.
Could anyone please tell me how can I achieve the same result but with a field name that has spaces and parenthesis i.e. 'Date (UCT)' ?
Thank you very much.
Octavio
Try enclosing the field name in square brackets:
SELECT * FROM [EEX_Avail_Cap_ALL_DEU_D1_S_Y1$A1:J65536] WHERE [Date (UCT)] = 12925
or if it's a date we are looking for:
SELECT * FROM [EEX_Avail_Cap_ALL_DEU_D1_S_Y1$A1:J65536] WHERE [Date (UCT)] = #02/14/13#;
To use date literal you must enclose it in # characters and write the date in MM/DD/YY format regardless of any regional settings on your machine

Resources