.strftime doesn't apply zero padding on '%Y' in python:3.7-slim Docker image - python-3.x

I found a strange quirk in the slim version of the python Docker image with regard to date formatting. If you pass it a date with a year below 1000, %Y-%m-%d formatting doesn’t yield a zero-padded year part:
$ docker run -ti python:3.7-slim /bin/bash
root@71f21d562837:/# python
Python 3.7.5 (default, Nov 23 2019, 06:10:46)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import date
>>> d = date(197,1,1)
>>> d.strftime('%Y-%m-%d')
'197-01-01'
But running this on the same python version locally on my macbook does yield 4 digits for the year:
$ python
Python 3.7.5 (default, Nov 1 2019, 02:16:32)
[Clang 11.0.0 (clang-1100.0.33.8)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import date
>>> d = date(197,1,1)
>>> d.strftime('%Y-%m-%d')
'0197-01-01'
The Python docs suggest that %Y should be zero-padded: the documented examples for %Y are 0001, 0002, ..., 9999 (%y is the two-digit year without century).
Same quirk for version 3.6-slim.
The problem with this is that some systems (like BigQuery) require the zero padding.
What would be the most elegant/least hacky workaround for this? I'm building a custom image derived from python:3.7-slim. I'm open to using a different image with a small footprint, or making an elegant code change.

You can always use a manual workaround to get identical formatting on all platforms:
from datetime import date
d = date(197, 1, 1)
dstr = d.strftime('%Y-%m-%d')
# Left-pad the year part to four digits; this covers 1-, 2- and 3-digit years
year, rest = dstr.split('-', 1)
dstr = year.zfill(4) + '-' + rest
print(dstr)
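Since strftime() is handed off to the platform C library, its output can differ between systems. A possible way around that for this particular format is to skip strftime() entirely. The sketch below shows two options: explicit format specifiers and date.isoformat(). The helper name iso_padded is made up for illustration.

```python
from datetime import date


def iso_padded(d: date) -> str:
    """Format a date as YYYY-MM-DD with a zero-padded, four-digit year."""
    # Pure-Python formatting, so the platform strftime() is never called.
    return f"{d.year:04d}-{d.month:02d}-{d.day:02d}"


d = date(197, 1, 1)
print(iso_padded(d))   # 0197-01-01
print(d.isoformat())   # 0197-01-01
```

For the plain YYYY-MM-DD layout, isoformat() is probably the shortest option; the f-string version is easier to adapt if the separators or field order need to change.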

Related

sqlite3.OperationalError('near "(": syntax error') in Google Colab

I'm observing some odd behavior with the sqlite3 module (version 2.6.0): ROW_NUMBER() throws an error only in Google Colab (Python 3.6.9), whereas the same code works fine in my local Python 3.6.9 and Python 3.9.1 instances. Can you help me debug this further?
Code
import sqlite3, sys
try:
    print('Py.version : ' + (sys.version))
    print('sqlite3.version : ' + (sqlite3.version))
    print('sqlite3.sqlite_version : ' + (sqlite3.sqlite_version)+'\n')
    conn = sqlite3.connect(':memory:')
    conn.execute('''CREATE TABLE team_data(team text, total_goals integer);''')
    conn.commit()
    conn.execute("INSERT INTO team_data VALUES('Real Madrid', 53);")
    conn.execute("INSERT INTO team_data VALUES('Barcelona', 47);")
    conn.commit()
    sql='''
    SELECT
        team,
        ROW_NUMBER () OVER (
            ORDER BY total_goals
        ) RowNum
    FROM
        team_data
    '''
    print('### DB Output ###')
    cursor = conn.execute(sql)
    for row in cursor:
        print(row)
except Exception as e:
    print('ERROR : ' + str(e))
finally:
    conn.close()
Output
Google Colab (ROW_NUMBER() causes SQL to fail):
Py.version : 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0]
sqlite3.version : 2.6.0
sqlite3.sqlite_version : 3.22.0
### DB Output ###
ERROR : near "(": syntax error
Local Python 3.6.9 (Succeeds):
Py.version : 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 14:00:49) [MSC v.1915 64 bit (AMD64)]
sqlite3.version : 2.6.0
sqlite3.sqlite_version : 3.33.0
### DB Output ###
('Barcelona', 1)
('Real Madrid', 2)
Local Python 3.9.1 (Succeeds):
Py.version : 3.9.1 (default, Dec 11 2020, 09:29:25) [MSC v.1916 64 bit (AMD64)]
sqlite3.version : 2.6.0
sqlite3.sqlite_version : 3.33.0
### DB Output ###
('Barcelona', 1)
('Real Madrid', 2)
Note: Above SQL and code is simplified for error reproduction purposes only
The query in question uses a window function, and support for window functions was added in SQLite 3.25. Colab reports 3.22.0, which predates that. You can check the library version (as opposed to the Python package version) with sqlite3.sqlite_version or, as @forpas shared, with the query select sqlite_version().
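If upgrading the library is not an option, the version check can also be done in code. The sketch below tests sqlite3.sqlite_version_info against 3.25 and falls back to a correlated subquery on older libraries. The fallback query is an assumption of this example, not something from the answers, and it only reproduces ROW_NUMBER() when total_goals has no ties.

```python
import sqlite3

# Window functions such as ROW_NUMBER() need SQLite 3.25 or newer.
HAS_WINDOW_FUNCTIONS = sqlite3.sqlite_version_info >= (3, 25, 0)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team_data(team TEXT, total_goals INTEGER)")
conn.executemany(
    "INSERT INTO team_data VALUES (?, ?)",
    [("Real Madrid", 53), ("Barcelona", 47)],
)

if HAS_WINDOW_FUNCTIONS:
    sql = """
        SELECT team, ROW_NUMBER() OVER (ORDER BY total_goals) AS RowNum
        FROM team_data
        ORDER BY RowNum
    """
else:
    # Fallback for old libraries: count the rows at or below each row.
    # Assumes total_goals values are unique.
    sql = """
        SELECT t.team,
               (SELECT COUNT(*) FROM team_data AS u
                WHERE u.total_goals <= t.total_goals) AS RowNum
        FROM team_data AS t
        ORDER BY RowNum
    """

rows = conn.execute(sql).fetchall()
conn.close()
print(rows)
```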
You can upgrade your sqlite version. Use this code.
!add-apt-repository -y ppa:sergey-dryabzhinsky/packages
!apt update
!apt install sqlite3
# MENU: Runtime > Restart runtime
import sqlite3
sqlite3.sqlite_version # '3.33.0'

split username & password from URL in 3.8+ (splituser is deprecated, no alternative)

I'm trying to filter out the user and password from a URL.
(I could've split it manually at the last '@' sign, but I'd rather use a parser.)
Python gives a deprecation warning for splituser(), but urlparse() doesn't seem to handle user/password.
Should I just trust the last '@' sign, or is there a new version of splituser?
Python 3.8.2 (default, Jul 16 2020, 14:00:26)
[GCC 9.3.0] on linux
>>> url="http://usr:pswd@www.site.com/path&var=val"
>>> import urllib.parse
>>> urllib.parse.splituser(url)
<stdin>:1: DeprecationWarning: urllib.parse.splituser() is deprecated as of 3.8, use urllib.parse.urlparse() instead
('http://usr:pswd', 'www.site.com/path&var=val')
>>> urllib.parse.urlparse(url)
ParseResult(scheme='http', netloc='usr:pswd@www.site.com', path='/path&var=val', params='', query='', fragment='')
# neither with allow_fragments:
>>> urllib.parse.urlparse(url,allow_fragments=True)
ParseResult(scheme='http', netloc='us:passw@ktovet.com', path='/all', params='', query='var=val', fragment='')
(Edit: the repr() output is partial & misleading; see my answer.)
It's all there, clear and accessible.
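What went wrong: the repr() here is misleading because it shows only the six tuple fields (scheme, netloc, path, params, query, fragment); username, password, hostname and port are attributes computed from netloc and don't appear in it.
The values are available through explicit attribute access: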
>>> url = 'http://usr:pswd@www.sharat.uk:8082/nativ/page?vari=valu'
>>> p = urllib.parse.urlparse(url)
>>> p.port
8082
>>> p.hostname
'www.sharat.uk'
>>> p.password
'pswd'
>>> p.username
'usr'
>>> p.path
'/nativ/page'
>>> p.query
'vari=valu'
>>> p.scheme
'http'
Or as a one-liner (I just needed the domain):
>>> urllib.parse.urlparse('http://usr:pswd@www.sharat.uk:8082/nativ/page?vari=valu').hostname
'www.sharat.uk'
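To actually drop the credentials from the URL rather than just read them, one possible approach is to rebuild netloc from hostname and port and reassemble the URL. This is a minimal sketch: strip_credentials is a hypothetical helper name, and it ignores IPv6 literals, which would need their brackets restored.

```python
from urllib.parse import urlparse, urlunparse


def strip_credentials(url: str) -> str:
    """Return the URL without its user:password@ part."""
    parts = urlparse(url)
    # hostname is lower-cased by urlparse; port is None when absent.
    host = parts.hostname or ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # ParseResult is a namedtuple, so _replace() returns an updated copy.
    return urlunparse(parts._replace(netloc=netloc))


url = "http://usr:pswd@www.sharat.uk:8082/nativ/page?vari=valu"
parts = urlparse(url)
print(parts.username, parts.password)
print(strip_credentials(url))
```

For the URL above, this prints `usr pswd` followed by `http://www.sharat.uk:8082/nativ/page?vari=valu`.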
Looking at the source code for splituser, looks like they simply use str.rpartition:
def splituser(host):
    warnings.warn("urllib.parse.splituser() is deprecated as of 3.8, "
                  "use urllib.parse.urlparse() instead",
                  DeprecationWarning, stacklevel=2)
    return _splituser(host)

def _splituser(host):
    """splituser('user[:passwd]@host[:port]') --> 'user[:passwd]', 'host[:port]'."""
    user, delim, host = host.rpartition('@')
    return (user if delim else None), host
which yes, relies on the last occurrence of @.
EDIT: urlparse still has all these fields, see Berry's answer

How to determine the appropriate timezone to apply for historical dates in a given region in python3

I'm using python3 on Ubuntu 20.04.
I have a trove of files with naive datetime strings in them, dating back more than 20 years. I know that all of these datetimes are in the Pacific Timezone. I would like to convert them all to UTC datetimes.
However, whether each one is relative to PDT or PST is a bigger question. Since the dates on which PDT/PST switch have changed over the last 20 years, it's not just a matter of applying a simple day/month threshold to decide whether to use the PDT or PST offset. Is there an elegant way to make this determination and apply it?
Note upfront, for Python 3.9+: use zoneinfo from the standard library; there is no longer any need for a third-party library.
Here's what you can do to set the timezone and convert to UTC. dateutil takes the DST changes from the IANA database.
from datetime import datetime
import dateutil.tz  # import the tz submodule explicitly; a bare "import dateutil" may not load it
datestrings = ['1991-04-06T00:00:00', # PST
'1991-04-07T04:00:00', # PDT
'1999-10-30T00:00:00', # PDT
'1999-10-31T02:01:00', # PST
'2012-03-11T00:00:00', # PST
'2012-03-11T02:00:00'] # PDT
# to naive datetime objects
dateobj = [datetime.fromisoformat(s) for s in datestrings]
# set timezone:
tz_pacific = dateutil.tz.gettz('US/Pacific')
dtaware = [d.replace(tzinfo=tz_pacific) for d in dateobj]
# with pytz use localize() instead of replace
# check if has DST:
# for d in dtaware: print(d.dst())
# 0:00:00
# 1:00:00
# 1:00:00
# 0:00:00
# 0:00:00
# 1:00:00
# convert to UTC:
dtutc = [d.astimezone(dateutil.tz.UTC) for d in dtaware]
# check output
# for d in dtutc: print(d.isoformat())
# 1991-04-06T08:00:00+00:00
# 1991-04-07T11:00:00+00:00
# 1999-10-30T07:00:00+00:00
# 1999-10-31T10:01:00+00:00
# 2012-03-11T08:00:00+00:00
# 2012-03-11T09:00:00+00:00
Now if you'd like to be absolutely sure that DST (PDT vs. PST) is set correctly, you'd have to set up test cases and verify them against the IANA database, I guess...
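For Python 3.9+, the zoneinfo route mentioned at the top of this answer could look roughly like the sketch below. It assumes IANA time zone data is available on the system (on Windows, the tzdata package from PyPI is typically needed), and pacific_to_utc is a helper name made up for this example. "America/Los_Angeles" is the canonical IANA key that "US/Pacific" links to.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

PACIFIC = ZoneInfo("America/Los_Angeles")


def pacific_to_utc(s: str) -> datetime:
    """Parse a naive ISO string as Pacific wall time and convert it to UTC."""
    naive = datetime.fromisoformat(s)
    # With zoneinfo, replace(tzinfo=...) is enough; the offset (PST or PDT)
    # is looked up from the historical rules for that date.
    return naive.replace(tzinfo=PACIFIC).astimezone(timezone.utc)


for s in ["1991-04-06T00:00:00", "1991-04-07T04:00:00", "1999-10-30T00:00:00"]:
    print(s, "->", pacific_to_utc(s).isoformat())
```

Wall times that fall into a DST gap or overlap are ambiguous by nature; zoneinfo resolves them through the datetime fold attribute, so those cases are worth checking separately.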

Python3.6.6 argparse for negative string values with list of arguments

I'm having issues with negative strings being fed into argparse and resulting in errors. I am wondering if anyone has figured out a way around this. Unfortunately, I need the strings to be prepended by a negative sign in some cases, so I cannot fix this by removing the negation part.
I've taken a look at some other stackoverflow pages, such as How to parse positional arguments with leading minus sign (negative numbers) using argparse but still have no solution for this. I'm sure that someone must have a solution for this!
Here is what I have tried and am seeing:
PyDev console: starting.
Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:07:29)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
import argparse
argparser = argparse.ArgumentParser()
# arguments:
argparser.add_argument("--configfile", "--config", type=str, default=None,
help="A config file to parse (see src/configs/sample_config.ini for more details).")
argparser.add_argument("--starttime", "--st", type=str, default="-1h#m",
help="Starting time for the dump; default: one hour ago rounded to the last minute. \n"
"Supported structure: -[0-9]+[h,m,s]+(#[h,m,s])?; for example: -1h#m, -10h, ... OR now")
argparser.add_argument("--endtime", "--et", type=str, default="#m",
help="Ending time for the dump; default: current time rounded to the last minute. \n"
"Supported structure: -[0-9]+[h,m,s]+(#[h,m,s])?; for example: -1h#m, -10h, ... OR now")
argparser.add_argument("--indexes", "--i", type=str, nargs="+", default="-config-",
help="Provide one or more indexes (comma-separated with quotes around the list; "
"ex: \"index1, index two, index3\") to search on. Default is \"-config-\" "
"which means that the indeces will be gathered from file names; see the sample "
"config for details.")
argparser.add_argument("--format", "--f", type=str, default="csv",
help="Write data out to CSV file.")
import shlex
args = argparser.parse_args(shlex.split("--configfile /somepath/sample_config_test04.ini --endtime now --indexes \"index one\" index2 index3 --format csv --starttime -5h#m"))
usage: pydevconsole.py [-h] [--configfile CONFIGFILE] [--starttime STARTTIME]
[--endtime ENDTIME] [--indexes INDEXES [INDEXES ...]]
[--format FORMAT]
pydevconsole.py: error: argument --starttime/--st: expected one argument
Process finished with exit code 2
As you can see the above fails, but the following works fine, so I am sure that it's an issue with the "-":
PyDev console: starting.
Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:07:29)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
import argparse
argparser = argparse.ArgumentParser()
# arguments:
argparser.add_argument("--configfile", "--config", type=str, default=None,
help="A config file to parse (see src/configs/sample_config.ini for more details).")
argparser.add_argument("--starttime", "--st", type=str, default="-1h#m",
help="Starting time for the dump; default: one hour ago rounded to the last minute. \n"
"Supported structure: -[0-9]+[h,m,s]+(#[h,m,s])?; for example: -1h#m, -10h, ... OR now")
argparser.add_argument("--endtime", "--et", type=str, default="#m",
help="Ending time for the dump; default: current time rounded to the last minute. \n"
"Supported structure: -[0-9]+[h,m,s]+(#[h,m,s])?; for example: -1h#m, -10h, ... OR now")
argparser.add_argument("--indexes", "--i", type=str, nargs="+", default="-config-",
help="Provide one or more indexes (comma-separated with quotes around the list; "
"ex: \"index1, index two, index3\") to search on. Default is \"-config-\" "
"which means that the indeces will be gathered from file names; see the sample "
"config for details.")
argparser.add_argument("--format", "--f", type=str, default="csv",
help="Write data out to CSV file.")
import shlex
args = argparser.parse_args(shlex.split("--configfile /somepath/sample_config_test04.ini --endtime now --indexes \"index one\" index2 index3 --format csv --starttime 5h#m"))
print(args)
Namespace(configfile='/somepath/sample_config_test04.ini', endtime='now', format='csv', indexes=['index one', 'index2', 'index3'], starttime='5h#m')
You may be wondering why I'm calling the code like this; that's because I need to do some unittest-ing of the argparse calls that I am running, so I need to be able to call it from both the commandline as well as from my unittesting code.
If I call the same code from commandline without any quotes or \ in front of the -5h#m, that seems to work fine, but only for commandline (gets converted to \-5h#m). I have tried --starttime \"-5h#m\" and --starttime \-5h#m, -5h#m, '\-5h#m', etc. but nothing seems to be accepted and correctly parsed by argparse other than cmdl input.
The error is typically:
test.py: error: argument --starttime/--st: expected one argument
Any help would be greatly appreciated!
Updates: changing the input to look like -configfile=/somepath/sample_config_test04.ini -endtime=now -indexes="index one, index2, index3" -format=csv -starttime=-5h#m seems to work from the command line.
NOTE: I'd like to keep this answer mainly because the other suggested answer has a very weirdly phrased title that I would not have been able to find with a Google search for what I needed. I also updated the question to reflect that list-valued arguments were part of it.
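The update above points at a form that also works from test code: attaching the value to the option with "=" so that argparse does not read the leading "-" as the start of another option. The sketch below is a trimmed-down parser with only two of the arguments from the question, fed through shlex.split() the same way.

```python
import argparse
import shlex

parser = argparse.ArgumentParser()
parser.add_argument("--starttime", "--st", type=str, default="-1h#m")
parser.add_argument("--indexes", "--i", type=str, nargs="+", default="-config-")

# "--starttime=-5h#m" is a single token, so argparse splits it at "="
# and takes "-5h#m" as the value instead of looking for a new option.
argv = shlex.split('--indexes "index one" index2 index3 --starttime=-5h#m')
args = parser.parse_args(argv)
print(args)
```

The space-separated form (--starttime -5h#m) fails because argparse classifies any argument starting with "-" as an option string unless it looks like a negative number.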

Developing module and using it in Spyder

I'm trying to develop a python module, which I then want to use in Spyder.
Here is how my files are organized in my module :
testing_the_module.py
myModule
-> __init__.py
-> sql_querying.py #contains a function called sql()
testing_the_module.py contains :
import myModule
print(myModule.sql_querying.sql(query = "show tables")) # how this function works is not relevant
__init__.py contains
import myModule.sql_querying
When I use the command line, it works :
> python3 .\testing_the_module.py
[{
'query': 'show tables',
'result': ['table1', 'table2']
}]
It also works if I use the python console :
> python3
Python 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import myModule
>>> print(myModule.sql_querying.sql(query = "show tables"))
[{
'query': 'show tables',
'result': ['table1', 'table2']
}]
However, when using Spyder, I can't get it to work. Here is what I get when I run (with F9) each of those lines :
import myModule
# no error message
print(myModule.sql_querying.sql(query = "show tables"))
AttributeError: module 'myModule' has no attribute 'sql_querying'
Any idea of why and how to make it work in Spyder ?
Edit to answer comment :
In [665]: sys.path
Out[665]:
['',
'C:\\ProgramData\\Anaconda3\\python36.zip',
'C:\\ProgramData\\Anaconda3\\DLLs',
'C:\\ProgramData\\Anaconda3\\lib',
'C:\\ProgramData\\Anaconda3',
'C:\\ProgramData\\Anaconda3\\lib\\site-packages',
'C:\\ProgramData\\Anaconda3\\lib\\site-packages\\Sphinx-1.5.6-py3.6.egg',
'C:\\ProgramData\\Anaconda3\\lib\\site-packages\\win32',
'C:\\ProgramData\\Anaconda3\\lib\\site-packages\\win32\\lib',
'C:\\ProgramData\\Anaconda3\\lib\\site-packages\\Pythonwin',
'C:\\ProgramData\\Anaconda3\\lib\\site-packages\\setuptools-27.2.0-py3.6.egg',
'C:\\ProgramData\\Anaconda3\\lib\\site-packages\\IPython\\extensions',
'C:\\Users\\fmalaussena\\.ipython']
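The package layout from the question can be reproduced in a throwaway directory to check how the submodule becomes an attribute of the package. This is only a sketch under assumptions: the body of sql() is a stub, the temporary directory is hypothetical, and the reload step is one thing to try in a long-lived console such as Spyder's, where an older copy of myModule may still be cached. The question does not confirm that this is the cause.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a temporary copy of the layout: myModule/__init__.py + sql_querying.py
root = Path(tempfile.mkdtemp())
pkg = root / "myModule"
pkg.mkdir()
# A relative import in __init__.py binds the submodule as a package attribute.
(pkg / "__init__.py").write_text("from . import sql_querying\n")
(pkg / "sql_querying.py").write_text(
    "def sql(query):\n"
    "    # stub standing in for the real function\n"
    "    return [{'query': query, 'result': []}]\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import myModule

# In a console that has been running for a while, reloading picks up
# changes made to __init__.py after the first import.
myModule = importlib.reload(myModule)
result = myModule.sql_querying.sql(query="show tables")
print(result)
```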
