Change locale for Google Colab - locale

I want to change the locale setting (to change the date format) in Google Colab.
The following works for me in Jupyter Notebook but not in Google Colab:
locale.setlocale(locale.LC_TIME, 'de_DE.UTF-8')
It always returns the error: unsupported locale setting
I have already looked at many other solutions and tried everything.
One solution I have seen changes only the time zone:
!rm /etc/localtime
!ln -s /usr/share/zoneinfo/Asia/Bangkok /etc/localtime
!date

I figured this one out after a long time:
In Colab, you will have to install the desired locales. You do this with:
!sudo dpkg-reconfigure locales
This will prompt for numeric input, e.g. 268 and 269 for Hungarian, so you enter 268 269.
After installation, it will also prompt for the default locale. Here you need to select your desired custom locale. This time it is a numeric selection out of 3-5 options, depending on how many locales you selected in the previous step. In my case I selected 3, and the default locale became hu_HU.
You need to restart the Colab runtime: Ctrl + M then .
You need to activate the locale:
import locale
locale.setlocale(locale.LC_ALL, 'hu_HU')  # make sure you do it for the LC_ALL context
The custom locale is now ready to use with pandas:
pd.to_datetime('2021-01-01').day_name() returns Friday, but
pd.to_datetime('2021-01-01').day_name('hu_HU') returns Péntek
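Because the set of installed locales differs between runtimes, the activation step can be written defensively. The following is only a minimal sketch and not part of the original answer: try_set_locale is a hypothetical helper name, and the candidate list is an example.

```python
import locale


def try_set_locale(candidates, category=locale.LC_TIME):
    """Try each locale name in order; return the accepted one, or None."""
    for name in candidates:
        try:
            # setlocale returns the new locale string on success
            return locale.setlocale(category, name)
        except locale.Error:
            # "unsupported locale setting": this locale is not installed
            continue
    return None


# Hypothetical candidate list: prefer Hungarian, fall back to the C locale.
chosen = try_set_locale(['hu_HU.UTF-8', 'hu_HU', 'C'])
print('active LC_TIME locale:', chosen)
```

If the helper returns None, none of the candidates is installed and the dpkg-reconfigure step still has to be done.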

I wasn't successful using the German locale on Google Colab, but the desired formatting can be obtained by combining a locale override for the decimal separator with explicit date formatting.
German formatting rules can be found here.
For custom string formatting, a nice cheatsheet is here.
from datetime import datetime, timedelta
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import locale
german_format_str_full = '%Y-%m-%d, %H.%M Uhr'
german_format_str_date = '%Y-%m-%d'
# generating plot data; xs are dates with a non-obvious step
xs = np.arange(datetime(year=2021, month=11, day=28, hour=23, minute=59, second=59),
               datetime(year=2021, month=12, day=6, hour=23, minute=59, second=59),
               timedelta(hours=5, minutes=47, seconds=27))
ys = np.sin(np.arange(0, len(xs), 1))  # whatever
# use overwritten locale for comma as decimal point -- German formatting
plt.rcParams['axes.formatter.use_locale'] = True
locale._override_localeconv["decimal_point"]= ','
# plot
fig, ax = plt.subplots(figsize=(9,4))
ax.plot(xs,ys, 'o-')
# set formatting string using mdates from matplotlib
ax.xaxis.set_major_formatter(mdates.DateFormatter(german_format_str_date))
# rotate formatted ticks or use autoformat 'fig.autofmt_xdate()'
plt.xticks(rotation=70)
plt.title('Google Colab plot with German locale style')
plt.show()
It gives me this plot:
If you need to check what the formatting settings look like on your machine, you can use locale.nl_langinfo(locale.D_T_FMT). For example:
import locale
from datetime import datetime
now = datetime.now()
# find local date time formatting on Google Colab
local_format_str = locale.nl_langinfo(locale.D_T_FMT)
print('local_format_str on Google Colab: ', local_format_str)
print('now in Google Colab default format:', now.strftime(local_format_str))
german_format_str_full = '%Y-%m-%d, %H.%M Uhr'
german_format_str_date = '%Y-%m-%d'
print('now in German format, full:',now.strftime(german_format_str_full))
print('now in German format, only date:',now.strftime(german_format_str_date))
ridiculous_format = '%Y->%m-->%d'
print('now ridiculous_format:',now.strftime(ridiculous_format))
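Format strings such as the ones above only cover numeric fields; names produced by %A or %B still follow the active locale. One way around that, sketched here under the assumption that hard-coded name tables are acceptable, is to look the names up manually. GERMAN_WEEKDAYS, GERMAN_MONTHS and format_german_long are hypothetical names introduced for this illustration.

```python
from datetime import datetime

# Lookup tables indexed by datetime.weekday() (Monday == 0) and month - 1.
GERMAN_WEEKDAYS = ['Montag', 'Dienstag', 'Mittwoch', 'Donnerstag',
                   'Freitag', 'Samstag', 'Sonntag']
GERMAN_MONTHS = ['Januar', 'Februar', 'März', 'April', 'Mai', 'Juni',
                 'Juli', 'August', 'September', 'Oktober', 'November',
                 'Dezember']


def format_german_long(dt):
    """Format a datetime with German day and month names, without any locale."""
    weekday = GERMAN_WEEKDAYS[dt.weekday()]
    month = GERMAN_MONTHS[dt.month - 1]
    return f'{weekday}, {dt.day}. {month} {dt.year}'


print(format_german_long(datetime(2021, 12, 1)))  # Mittwoch, 1. Dezember 2021
```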

Based on this answer I was able to load German locales. However, it needs to be done in two steps: install the new German locale, then restart the kernel and load the German locale.
In short:
import os
# Install de_DE
!/usr/share/locales/install-language-pack de_DE
!dpkg-reconfigure locales
# Restart Python process to pick up the new locales
os.kill(os.getpid(), 9)
More detailed version:
It turned out that the list of available locales is pretty short which can be checked like this:
import locale
from datetime import datetime
now = datetime.now()
# find local date time formatting on Google Colab
local_format_str = locale.nl_langinfo(locale.D_T_FMT)
print('local_format_str on Google Colab: ', local_format_str)
print('now in Google Colab default format:', now.strftime(local_format_str))
print('Loading avaliable locales via real names...')
for real_name in set(locale.locale_alias.values()):
    try:
        locale.setlocale(locale.LC_ALL, real_name)
        print('success: real_name = ', real_name)
    except locale.Error:
        # locale is not installed on this machine
        pass
print('Loading avaliable locales via aliases...')
for alias, real_name in locale.locale_alias.items():
    try:
        locale.setlocale(locale.LC_ALL, alias)
        print('success: alias = ', alias, ' , real_name = ', real_name)
    except locale.Error:
        # alias does not resolve to an installed locale
        pass
With output:
local_format_str on Google Colab: %a %b %e %H:%M:%S %Y
now in Google Colab default format: Wed Dec 1 12:10:52 2021
Loading avaliable locales via real names...
success: real_name = en_US.UTF-8
success: real_name = C
Loading avaliable locales via aliases...
As we can see, there is no German locale, so it needs to be installed with this code:
import os
# Install de_DE
!/usr/share/locales/install-language-pack de_DE
!dpkg-reconfigure locales
# Restart Python process to pick up the new locales
os.kill(os.getpid(), 9)
giving the output:
Generating locales (this might take a while)...
de_DE.ISO-8859-1... done
Generation complete.
dpkg-trigger: error: must be called from a maintainer script (or with a --by-package option)
Type dpkg-trigger --help for help about this utility.
Generating locales (this might take a while)...
de_DE.ISO-8859-1... done
en_US.UTF-8... done
Generation complete.
Then we load the German locale with locale.setlocale(locale.LC_ALL, 'german'), and the same code as at the beginning (remember to import the packages again) gives us:
Loading avaliable locales via real names...
success: real_name = C
success: real_name = en_US.UTF-8
success: real_name = de_DE.ISO8859-1
Loading avaliable locales via aliases...
success: alias = deutsch , real_name = de_DE.ISO8859-1
success: alias = german , real_name = de_DE.ISO8859-1
and the default formatting is more German:
local_format_str on Google Colab: %a %d %b %Y %T %Z
now in Google Colab default format: Mi 01 Dez 2021 12:12:03
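Once a locale is installed, it does not have to be switched on for the whole process. The sketch below (temporary_locale is a hypothetical helper, not something from the answer) sets a locale only for the duration of a with block and then restores the previous one. It uses 'C' so that it runs anywhere; on a runtime prepared as described above, the name reported by the listing (for example 'de_DE.ISO8859-1') would be passed instead.

```python
import locale
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def temporary_locale(name, category=locale.LC_TIME):
    """Set a locale for the duration of a with block, then restore the old one."""
    previous = locale.setlocale(category)  # no second argument: query only
    try:
        locale.setlocale(category, name)
        yield
    finally:
        locale.setlocale(category, previous)


# 'C' is always available; swap in an installed locale name on your runtime.
with temporary_locale('C'):
    formatted = datetime(2021, 12, 1).strftime('%A %d %B %Y')
print(formatted)  # Wednesday 01 December 2021
```

Note that setlocale changes process-wide state, so this pattern is not safe when several threads format dates at the same time.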

Related

How to determine the appropriate timezone to apply for historical dates in a given region in python3

I'm using python3 on Ubuntu 20.04.
I have a trove of files with naive datetime strings in them, dating back more than 20 years. I know that all of these datetimes are in the Pacific Timezone. I would like to convert them all to UTC datetimes.
However, whether they are relative to PDT or PST is a bigger question. Since when PDT/PST changes has changed over the last 20 years, it's not just a matter of doing a simple date/month threshold to figure out whether to apply the pdt or pst timezone. Is there an elegant way to make this determination and apply it?
Note upfront, for Python 3.9+: use zoneinfo from the standard library, no need anymore for a third party library. Example.
Here's what you can do to set the timezone and convert to UTC. dateutil will take DST changes from the IANA database.
from datetime import datetime
import dateutil.tz  # import the tz submodule explicitly; a bare "import dateutil" may not load it
datestrings = ['1991-04-06T00:00:00', # PST
'1991-04-07T04:00:00', # PDT
'1999-10-30T00:00:00', # PDT
'1999-10-31T02:01:00', # PST
'2012-03-11T00:00:00', # PST
'2012-03-11T02:00:00'] # PDT
# to naive datetime objects
dateobj = [datetime.fromisoformat(s) for s in datestrings]
# set timezone:
tz_pacific = dateutil.tz.gettz('US/Pacific')
dtaware = [d.replace(tzinfo=tz_pacific) for d in dateobj]
# with pytz use localize() instead of replace
# check if has DST:
# for d in dtaware: print(d.dst())
# 0:00:00
# 1:00:00
# 1:00:00
# 0:00:00
# 0:00:00
# 1:00:00
# convert to UTC:
dtutc = [d.astimezone(dateutil.tz.UTC) for d in dtaware]
# check output
# for d in dtutc: print(d.isoformat())
# 1991-04-06T08:00:00+00:00
# 1991-04-07T11:00:00+00:00
# 1999-10-30T07:00:00+00:00
# 1999-10-31T10:01:00+00:00
# 2012-03-11T08:00:00+00:00
# 2012-03-11T09:00:00+00:00
Now if you'd like to be absolutely sure that DST (PDT vs. PST) is set correctly, you'd have to set up test cases and verify them against IANA, I guess...
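For Python 3.9+, the same conversion can be sketched with zoneinfo from the standard library. This is an illustration rather than the answer's own code: pacific_to_utc is a hypothetical helper name, and 'America/Los_Angeles' is assumed here as the IANA key for Pacific time.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# Assumption: 'America/Los_Angeles' is used as the IANA key for Pacific time.
PACIFIC = ZoneInfo('America/Los_Angeles')


def pacific_to_utc(naive_iso):
    """Interpret a naive ISO string as Pacific wall-clock time and convert to UTC."""
    naive = datetime.fromisoformat(naive_iso)
    aware = naive.replace(tzinfo=PACIFIC)  # PST or PDT is chosen from the IANA rules
    return aware.astimezone(timezone.utc)


for s in ['1991-04-06T00:00:00', '1999-10-30T00:00:00']:
    print(s, '->', pacific_to_utc(s).isoformat())
```

Wall-clock times inside the autumn overlap occur twice; with the default fold=0 the first occurrence (still DST) is used, so such values may need separate handling.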

Python 3: time.tzset() alternative for Windows?

I am new to Python. I am reading about dates and times from the lovely book 'Python 3 Standard Library by Example' by Doug Hellmann and I stumbled upon this code snippet:
import time
import os
def show_zone_info():
    print(f'''\
TZ : {os.environ.get('TZ', '(not set)')}
tzname: {time.tzname}
Zone : {time.timezone} ({time.timezone / 3600})
DST : {time.daylight}
Time : {time.ctime()}
''')
if __name__ == '__main__':
    print('Default: ')
    show_zone_info()
    ZONES = [
        'GMT',
        'Europe/Amsterdam'
    ]
    for zone in ZONES:
        os.environ['TZ'] = zone
        # time.tzset() # Only available on Unix
        print(zone, ':')
        show_zone_info()
The problem is that time.tzset() is only available on Unix, and without it on a Windows machine the timezone doesn't change during the run time of the code. What is the alternative to time.tzset() on Windows? (I am running Python 3.8.3 on Windows 10 at the time of asking this question.)
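One portable direction, sketched here as an assumption rather than a drop-in replacement for time.tzset(), is to avoid changing the process-wide time zone and to ask for each zone explicitly. The sketch assumes Python 3.9 or later for zoneinfo (on Windows the tzdata package from PyPI is typically needed as the data source); zone_info is a hypothetical helper name.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+


def zone_info(zone_name, when=None):
    """Describe a zone at a given instant without touching process-wide state."""
    when = when or datetime.now(timezone.utc)
    local = when.astimezone(ZoneInfo(zone_name))
    return {
        'tzname': local.tzname(),
        # hours east of UTC (time.timezone counts seconds west of UTC instead)
        'utcoffset_hours': local.utcoffset().total_seconds() / 3600,
        'dst': bool(local.dst()),
        'time': local.isoformat(),
    }


instant = datetime(2021, 7, 1, 12, 0, tzinfo=timezone.utc)
for zone in ['GMT', 'Europe/Amsterdam']:
    print(zone, ':', zone_info(zone, instant))
```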

Why is the date format in Backoffice on local env different from the date format on dev env?

The date format on local env is not the same as on DEV env.
=== English locale ===
"Jan 23, 2035 3:00:00 AM" - LOCAL
"Jan 23, 35, 3:00:00 AM" - DEV
=== Chinese locale ===
"2035. 1. 23 오전 3:00:00" - LOCAL
"35. 1. 23. 오전 3:00:00" - DEV
Why does it look different on DEV, and how can I fix it?
You can check the locale settings in hac with Groovy. The locale settings may differ between the two environments, and the date/time format depends on the locale. You can check the locale setting by running the Groovy script below in hac under Console > Script Languages.
import java.text.DateFormat;
import java.util.Date;
import java.text.SimpleDateFormat;
DateFormat formatter = DateFormat.getDateInstance(DateFormat.SHORT, Locale.getDefault());
String pattern = ((SimpleDateFormat)formatter).toPattern();
String localPattern = ((SimpleDateFormat)formatter).toLocalizedPattern();
print Locale.getDefault();
print "\n";
print pattern;
print "\n";
print localPattern;

A question about using scapy.sniff to get the 'Ethernet Frame' data from pcap files

Aim: Get the arrival time from the pcap files
Language: python3.7
Tools: Scapy.sniff
Above all, I want to get the arrival time data from the .pcap file. When I use Wireshark, I see this data in the Ethernet frame, but when I use scapy.sniff(offline='.pcap') I only get the Ether, TCP, IP and other layers. So how can I get that data?
Thanks a lot!
>>> from scapy.all import *
>>> a = sniff(offline='***.pcap')
>>> a[0]
[out]:
<Ether dst=*:*:*:*:*:* src=*:*:*:*:*:* type=** |<IP version=4 ihl=5 tos=0x20 len=52 id=14144 flags=DF frag=0 ttl=109 proto=tcp chksum=0x5e3b src=*.*.*.* dst=*.*.*.* |<TCP sport=gcsp dport=http seq=1619409885 ack=1905830025 dataofs=8 reserved=0 flags=A window=65535 chksum=0xfdb5 urgptr=0 options=[('NOP', None), ('NOP', None), ('SAck', (1905831477, 1905831485))] |>>>
The packet time from the pcap is available in the time member:
print(a[0].time)
It's kept as a Unix timestamp (seconds since the epoch). Depending on the Scapy version this may be a float or a decimal type, so converting it with float() first is a safe default. To get it in a form more easily understandable, you may want to use the datetime module:
>>> from datetime import datetime
>>> dt = datetime.fromtimestamp(float(a[0].time))
>>> print(dt)
2018-11-12 03:03:00.259780
The scapy documentation isn't great. It can be very instructive to use the interactive help facility. For example, in the interpreter:
$ python
>>> from scapy.all import *
>>> a = sniff(offline='mypcap.pcap')
>>> help(a[0])
This will show you all the methods and attributes of the object represented by a[0]. In your case, that is an instance of class Ether(scapy.packet.Packet).
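Building on the time member, a list of packet timestamps can be turned into absolute arrival times and inter-arrival gaps. This is a minimal sketch: arrival_times is a hypothetical helper, and it takes plain numbers so that it runs without Scapy or a capture file.

```python
from datetime import datetime, timezone


def arrival_times(timestamps):
    """Return (UTC datetime, seconds since the previous packet) pairs."""
    result = []
    previous = None
    for ts in timestamps:
        ts = float(ts)  # also accepts decimal-like timestamp values
        delta = 0.0 if previous is None else ts - previous
        result.append((datetime.fromtimestamp(ts, tz=timezone.utc), delta))
        previous = ts
    return result


# With Scapy this could be called as arrival_times(pkt.time for pkt in a);
# plain epoch values are used here instead.
for when, delta in arrival_times([0.0, 1.5, 4.0]):
    print(when.isoformat(), f'+{delta:.3f}s')
```

Passing tz=timezone.utc makes the result independent of the machine's local time zone, unlike a bare datetime.fromtimestamp(ts).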

Python3.6.6 argparse for negative string values with list of arguments

I'm having issues with negative strings being fed into argparse and resulting in errors. I am wondering if anyone has figured out a way around this. Unfortunately, I need the strings to be prepended by a negative sign in some cases, so I cannot fix this by removing the negation part.
I've taken a look at some other stackoverflow pages, such as How to parse positional arguments with leading minus sign (negative numbers) using argparse but still have no solution for this. I'm sure that someone must have a solution for this!
Here is what I have tried and am seeing:
PyDev console: starting.
Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:07:29)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
import argparse
argparser = argparse.ArgumentParser()
# arguments:
argparser.add_argument("--configfile", "--config", type=str, default=None,
help="A config file to parse (see src/configs/sample_config.ini for more details).")
argparser.add_argument("--starttime", "--st", type=str, default="-1h#m",
help="Starting time for the dump; default: one hour ago rounded to the last minute. \n"
"Supported structure: -[0-9]+[h,m,s]+(#[h,m,s])?; for example: -1h#m, -10h, ... OR now")
argparser.add_argument("--endtime", "--et", type=str, default="#m",
help="Ending time for the dump; default: current time rounded to the last minute. \n"
"Supported structure: -[0-9]+[h,m,s]+(#[h,m,s])?; for example: -1h#m, -10h, ... OR now")
argparser.add_argument("--indexes", "--i", type=str, nargs="+", default="-config-",
help="Provide one or more indexes (comma-separated with quotes around the list; "
"ex: \"index1, index two, index3\") to search on. Default is \"-config-\" "
"which means that the indeces will be gathered from file names; see the sample "
"config for details.")
argparser.add_argument("--format", "--f", type=str, default="csv",
help="Write data out to CSV file.")
import shlex
args = argparser.parse_args(shlex.split("--configfile /somepath/sample_config_test04.ini --endtime now --indexes \"index one\" index2 index3 --format csv --starttime -5h#m"))
usage: pydevconsole.py [-h] [--configfile CONFIGFILE] [--starttime STARTTIME]
[--endtime ENDTIME] [--indexes INDEXES [INDEXES ...]]
[--format FORMAT]
pydevconsole.py: error: argument --starttime/--st: expected one argument
Process finished with exit code 2
As you can see the above fails, but the following works fine, so I am sure that it's an issue with the "-":
PyDev console: starting.
Python 3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:07:29)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
import argparse
argparser = argparse.ArgumentParser()
# arguments:
argparser.add_argument("--configfile", "--config", type=str, default=None,
help="A config file to parse (see src/configs/sample_config.ini for more details).")
argparser.add_argument("--starttime", "--st", type=str, default="-1h#m",
help="Starting time for the dump; default: one hour ago rounded to the last minute. \n"
"Supported structure: -[0-9]+[h,m,s]+(#[h,m,s])?; for example: -1h#m, -10h, ... OR now")
argparser.add_argument("--endtime", "--et", type=str, default="#m",
help="Ending time for the dump; default: current time rounded to the last minute. \n"
"Supported structure: -[0-9]+[h,m,s]+(#[h,m,s])?; for example: -1h#m, -10h, ... OR now")
argparser.add_argument("--indexes", "--i", type=str, nargs="+", default="-config-",
help="Provide one or more indexes (comma-separated with quotes around the list; "
"ex: \"index1, index two, index3\") to search on. Default is \"-config-\" "
"which means that the indeces will be gathered from file names; see the sample "
"config for details.")
argparser.add_argument("--format", "--f", type=str, default="csv",
help="Write data out to CSV file.")
import shlex
args = argparser.parse_args(shlex.split("--configfile /somepath/sample_config_test04.ini --endtime now --indexes \"index one\" index2 index3 --format csv --starttime 5h#m"))
print(args)
Namespace(configfile='/somepath/sample_config_test04.ini', endtime='now', format='csv', indexes=['index one', 'index2', 'index3'], starttime='5h#m')
You may be wondering why I'm calling the code like this; that's because I need to do some unittest-ing of the argparse calls that I am running, so I need to be able to call it from both the commandline as well as from my unittesting code.
If I call the same code from the command line without any quotes or \ in front of the -5h#m, it seems to work fine, but only from the command line (it gets converted to \-5h#m). I have tried --starttime \"-5h#m\" and --starttime \-5h#m, -5h#m, '\-5h#m', etc., but nothing other than command-line input seems to be accepted and correctly parsed by argparse.
The error is typically:
test.py: error: argument --starttime/--st: expected one argument
Any help would be greatly appreciated!
Update: changing the input to look like -configfile=/somepath/sample_config_test04.ini -endtime=now -indexes="index one, index2, index3" -format=csv -starttime=-5h#m seems to work from the command line.
NOTE: I'd like to keep this answer mainly because the other suggested answer has a very weirdly phrased title that I would not have been able to find by doing a Google search for what I needed. I did update the question to reflect that I also had list items as part of it.
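The update above suggests that joining the option and its value with = avoids the problem. A reduced sketch of that form, using the long option names from the question and calling the parser from code as a unit test would, might look like this (only two of the arguments are kept for brevity):

```python
import argparse
import shlex

parser = argparse.ArgumentParser()
parser.add_argument('--starttime', '--st', type=str, default='-1h#m')
parser.add_argument('--indexes', '--i', type=str, nargs='+', default='-config-')

# Joining option and value with '=' keeps argparse from reading '-5h#m'
# as the start of a new option.
args = parser.parse_args(
    shlex.split('--indexes "index one" index2 --starttime=-5h#m'))
print(args.starttime)  # -5h#m
print(args.indexes)    # ['index one', 'index2']
```

With a space instead of =, argparse classifies '-5h#m' as option-like because it starts with '-' and is not a negative number, which produces the "expected one argument" error shown earlier.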

Resources