Encoding issue with python3 and click package - python-3.x

When the click library detects that the runtime is Python 3 but the encoding is ASCII, it aborts the program:
RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII as encoding for the environment. Either switch to Python 2 or consult http://click.pocoo.org/python3/ for mitigation steps.
I found the cause in my case: when I connect to my Linux host from my Mac, Terminal.app sets the SSH session locale to my Mac locale (es_ES.UTF-8). However, that locale is not installed on my Linux host (only en_US.utf-8 is).
I applied an initial workaround (it had many issues, see the accepted answer):
import locale, codecs, os
# locale.getpreferredencoding() == 'ANSI_X3.4-1968'
if codecs.lookup(locale.getpreferredencoding()).name == 'ascii':
    os.environ['LANG'] = 'en_US.utf-8'
EDIT: For a better patch see my accepted answer.
All my linux hosts have installed 'en_US.utf-8' as locale (Fedora uses it as default).
My question is: is there a better (more robust) way to choose or force the locale in a Python 3 script? For instance, by setting one of the locales available on the system.
Maybe there is a different approach to fix this issue but I didn't find it.
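The idea of "setting one of the available locales in the system" could be sketched roughly as follows. This is only an illustration: the helper names list_system_locales and first_utf8_locale are hypothetical, and the sketch assumes a POSIX host where the `locale -a` command exists.

```python
import subprocess

def list_system_locales():
    """Return the names printed by `locale -a` (assumes a POSIX system)."""
    out = subprocess.run(["locale", "-a"], stdout=subprocess.PIPE,
                         universal_newlines=True, errors="replace",
                         check=True).stdout
    return out.split()

def first_utf8_locale(names, preferred=("en_US",)):
    """Pick a UTF-8 locale from names, favouring the preferred languages."""
    # Accept spellings such as en_US.utf8, en_US.UTF-8 and ca_ES.utf8@valencia
    utf8 = [n for n in names
            if n.split("@")[0].lower().replace("-", "").endswith(".utf8")]
    for lang in preferred:
        for name in utf8:
            if name.startswith(lang + "."):
                return name
    return utf8[0] if utf8 else None
```

The result of first_utf8_locale(list_system_locales()) could then be used as the value for LANG instead of a hard-coded 'en_US.utf-8'.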

If you are running Python 3.7 or later, you should not need to do anything. If you are on Python 3.6, see the original solution.
EDIT 2017-12-08
I've seen that PEP 538, targeted at Python 3.7, will change the entire behavior of Python 3 encoding management during startup. I think the new approach will fix the original problem: https://www.python.org/dev/peps/pep-0538/
IMHO the changes targeted at Python 3.7 for encoding issues should have been planned years ago, but better late than never, I guess.
EDIT 2015-09-01
There is an open issue (enhancement), http://bugs.python.org/issue15216, that will make it easy to change the encoding of an already created (but not yet used) stream (sys.std*). However, it is targeted at Python 3.7, so we'll have to wait for a while.
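For readers on Python 3.7 or later, changing the encoding of an existing text stream is available as io.TextIOWrapper.reconfigure(). A minimal sketch, where force_utf8 is a hypothetical helper name and an in-memory stream stands in for sys.stdout:

```python
import io

def force_utf8(stream, errors="replace"):
    """Switch an existing text stream to UTF-8 in place (Python 3.7+)."""
    stream.reconfigure(encoding="utf-8", errors=errors)
    return stream

# Demonstration on an in-memory stream that starts out as ASCII
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="ascii")
force_utf8(stream)
stream.write("ñ")
stream.flush()
encoded = buffer.getvalue()
```

In a real script the same call would be applied to sys.stdout and sys.stderr at startup, before anything has been written.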
Original solution that targets python version 3.6
NOTE: this solution should not be needed by anyone running Python 3.7 or later, see PEP 538.
Well, my initial workaround had many flaws: it got past the click library's encoding check, but the encoding itself was not fixed, so I got exceptions whenever the input parameters or the output contained non-ASCII characters.
I had to implement a more complex method with 3 steps: set the locale, correct the encoding of the standard streams, and re-encode the command-line parameters. I also added a "friendly" exit in case the first attempt to set the locale doesn't work as expected:
def prevent_ascii_env():
    """
    To avoid issues reading unicode chars from stdin or writing to stdout, we need to ensure that the
    python3 runtime is correctly configured. If it is not, we try to force utf-8,
    and if that isn't possible we exit with a friendlier message than the original one.
    """
    import locale, codecs, os, sys
    # locale.getpreferredencoding() == 'ANSI_X3.4-1968'
    if codecs.lookup(locale.getpreferredencoding()).name == 'ascii':
        os.environ['LANG'] = 'en_US.utf-8'
        if codecs.lookup(locale.getpreferredencoding()).name == 'ascii':
            print("The current locale is not correctly configured in your system")
            print("Please set the LANG env variable to the proper value before calling this script")
            sys.exit(-1)
        # Once we have the proper locale.getpreferredencoding() we can change the current stdin/out streams
        _, encoding = locale.getdefaultlocale()
        import io
        sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding=encoding, errors="replace", line_buffering=True)
        sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding=encoding, errors="replace", line_buffering=True)
        sys.stdin = io.TextIOWrapper(sys.stdin.detach(), encoding=encoding, errors="replace", line_buffering=True)
        # And finally we need to re-encode the input parameters
        for i, p in enumerate(sys.argv):
            sys.argv[i] = os.fsencode(p).decode()
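The last step works because, under an ASCII locale, Python 3 decodes command-line arguments with the surrogateescape error handler, so the original bytes can be recovered and decoded again as UTF-8. The following sketch reproduces that round trip with explicit codecs; reencode_arg is a hypothetical name used only for illustration.

```python
def reencode_arg(arg, wrong="ascii", right="utf-8"):
    """Undo a decode that used the wrong codec plus surrogateescape."""
    raw = arg.encode(wrong, "surrogateescape")  # recover the original bytes
    return raw.decode(right)

# What a UTF-8 argument looks like after being decoded as ASCII
raw_bytes = "niño".encode("utf-8")
as_seen_under_ascii = raw_bytes.decode("ascii", "surrogateescape")
fixed = reencode_arg(as_seen_under_ascii)
```

Here as_seen_under_ascii contains lone surrogate code points in place of the "ñ", while fixed is the readable string again.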
This patch solves almost all issues, but it has a caveat: shutil.get_terminal_size() raises a ValueError because sys.__stdout__ has been detached. The click library uses that function to print the help, so to fix it I had to monkey-patch click:
def wrapper_get_terminal_size():
    """
    Replace the original function termui.get_terminal_size (click lib) by a new one
    that uses a fallback if ValueError exception has been raised
    """
    from click import termui, formatting
    old_get_term_size = termui.get_terminal_size
    def _wrapped_get_terminal_size():
        try:
            return old_get_term_size()
        except ValueError:
            import os
            sz = os.get_terminal_size()
            return sz.columns, sz.lines
    termui.get_terminal_size = _wrapped_get_terminal_size
    formatting.get_terminal_size = _wrapped_get_terminal_size
With these changes all my scripts now work fine when the environment has a wrong locale configured but the system supports en_US.utf-8 (the Fedora default locale).
If you find any issue with this approach or have a better solution, please add a new answer.

This is an old thread, but this answer might help others (or myself) in the future. If you are on *nix, check whether LC_ALL is set:
env | grep LC_ALL
If it is set, unset it. That's all there is to it:
unset LC_ALL
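Unsetting LC_ALL helps because, on POSIX systems, LC_ALL overrides LC_CTYPE, which in turn overrides LANG. A small sketch of that lookup order, where effective_ctype_locale is a hypothetical helper and empty values are treated as unset:

```python
import os

def effective_ctype_locale(env=None):
    """Return (variable, value) of the setting that decides the character encoding."""
    env = os.environ if env is None else env
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # an empty string counts as unset
            return var, value
    return None, "C"  # nothing set: the C/POSIX locale applies
```

Calling effective_ctype_locale() in the failing environment shows which variable is actually forcing the ASCII locale.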

If you are running python 3.6 then you will still get this error. Here is a simple solution that the authors of click recommend:
#!/bin/bash
# before your python code executes set two environment variables
export LANG=en_US.utf8
export LC_ALL=en_US.utf8
NOTE: replace the values with whatever locale your system is configured to use.
NOTE: this solution is also given in the PEP 538 document.

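I haven't seen this simple method mentioned (re-exec the script with a proper environment before doing anything else), so I'll add it for future travellers who are stuck on an old Python version for some reason. Add it right below the imports so that it runs first:
import os, sys
# .get() avoids a KeyError when LC_ALL or LANG is not set at all
if os.environ.get("LC_ALL") != "C.UTF-8" or os.environ.get("LANG") != "C.UTF-8":
    # Note: the new process receives only these two environment variables
    os.execve(sys.executable,
              [os.path.realpath(__file__)] + sys.argv,
              {"LC_ALL": "C.UTF-8", "LANG": "C.UTF-8"})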

Related

Issue with Python tkinter / pypdftk / subprocess(?)

I have been using this whole script flawlessly on my PC. I attempted to put it on my coworker's PC, but this particular part doesn't seem to work. I am using a tkinter interface to take data from psql and fill a premade fillable PDF using pypdftk, then either saving it using asksaveasfilename and opening it with subprocess.Popen, or not saving it and opening it as a temp file using subprocess.run. On my PC both work great. On my coworker's PC, neither works.
On my coworker's PC, the save option opens the save dialog with all the correct info as far as I can tell, and lets me go through the process of saving a file as it normally would, but then the file just doesn't save and never actually appears. If I open it as a temp file, it throws the exception.
import tkinter as tk
from tkinter import *
from tkinter.ttk import *
import tkinter.messagebox
import pypdftk
from tkinter.filedialog import asksaveasfilename
import os.path
import os
import subprocess
from pathlib import Path
def file_handler(form_path, data, fname):
    try:
        tl2 = tk.Toplevel()
        tl2.wm_title('File Handler')
        w = 340
        h = 55
        ws = tl2.winfo_screenwidth()
        hs = tl2.winfo_screenheight()
        x = (ws/2) - (w/2)
        y = (hs/2) - (h/2)
        tl2.geometry('%dx%d+%d+%d' % (w, h, x, y))
        def save_and_open():
            savefile = asksaveasfilename(defaultextension=".pdf", initialdir="C:\\Desktop", filetypes=[('pdf file', '*.pdf')], initialfile=fname)
            generated_pdf = pypdftk.fill_form(form_path, data, savefile)
            subprocess.Popen(generated_pdf,shell=True)
        def open_without_save():
            try:
                generated_pdf = pypdftk.fill_form(form_path, data)
                os.rename(generated_pdf, generated_pdf+".pdf")
                generated_pdf = generated_pdf+".pdf"
                subprocess.run(generated_pdf,shell=True)
            except:
                tk.messagebox.showerror("Unable to open", "An error has occurred. Please try again.")
            else:
                tl2.destroy()
            finally:
                if os.path.exists(generated_pdf):
                    os.remove(generated_pdf)
                    print("Successfully removed temp file.")
        save = tk.Button(tl2,text='Save and Open', width=20, command=save_and_open)
        nosave = tk.Button(tl2,text='Open without saving', width=20,command=open_without_save)
        save.grid(row=0, columnspan=2, sticky='NESW', padx=5, pady=10, ipadx=5, ipady=5)
        nosave.grid(row=0, column=2, columnspan=2, sticky='NESW', padx=5, pady=10, ipadx=5, ipady=5)
        tl2.mainloop()
    except:
        tk.messagebox.showerror("Unable to open", "An error has occurred. Please try again.")
As far as I can tell, everything works until you get into the save_and_open and open_without_save functions. I left in all the libraries I believe are relevant.
I should also mention, I am quite a novice at python. So if any of this is ugly coding, feel free to shame me for it.
update:
I now believe the problem to be here in the pypdftk.py file:
if os.getenv('PDFTK_PATH'):
    PDFTK_PATH = os.getenv('PDFTK_PATH')
else:
    PDFTK_PATH = '/usr/bin/pdftk'
    if not os.path.isfile(PDFTK_PATH):
        PDFTK_PATH = 'pdftk'
My error states pdftk is not a known command. My guess is that there is no environment variable, then it looks to the /usr/bin and cannot find the pdftk file, so it's just making "pdftk" a string? I don't know much about /usr/bin, but is there a way to check that?
What's going on in lines 19-21
if os.getenv('PDFTK_PATH'): is checking to see if the environment variable PDFTK_PATH even exists on your machine. If so, the constant PDFTK_PATH is set to the value provided by the PDFTK_PATH environment key/variable.
Otherwise, it sets PDFTK_PATH to /usr/bin/pdftk. Two things are happening here... First, it provides a path to the binary, i.e., /usr/bin. Second, it provides the name of the binary, i.e., pdftk. In other words, it sets PDFTK_PATH to path + executable filename.
(NOTE: The directory /usr/bin is where most executables are stored on machines running Unix-like operating systems, e.g., macOS, Ubuntu, etc. This is alluded to in the repo.)
To err on the side of caution, if not os.path.isfile(PDFTK_PATH): checks to see if the pdftk binary can indeed be found in the /usr/bin/ folder. If not, it sets PDFTK_PATH to pdftk, i.e., the bare command name with no directory, which the operating system then has to locate through its normal executable search (the PATH environment variable).
Farther down, the try block runs a test call on whatever the value of PDFTK_PATH was ultimately set to. See lines 46-49. If the binary, pdftk.exe, is not where PDFTK_PATH says it is, then you get the error that you got.
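To check on a given machine which of these cases applies, the same lookup can be reproduced with the standard library. This is a sketch only: resolve_pdftk is a hypothetical helper that is not part of pypdftk, and it uses shutil.which to mimic the PATH search that a bare pdftk command would trigger.

```python
import os
import shutil

def resolve_pdftk(env=None, default="/usr/bin/pdftk"):
    """Return the pdftk path that would be used, or None if nothing is found."""
    env = os.environ if env is None else env
    if env.get("PDFTK_PATH"):          # 1. explicit environment variable
        return env["PDFTK_PATH"]
    if os.path.isfile(default):        # 2. conventional Unix location
        return default
    # 3. bare command name: search the directories listed in PATH
    return shutil.which("pdftk", path=env.get("PATH"))
```

If resolve_pdftk() returns None, the "pdftk is not a known command" error is expected, and either installing pdftk or setting PDFTK_PATH should resolve it.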
Concerning the r-string
As for turning the string literal into an r-string, that actually did nothing. The r prefix simply makes \ a literal backslash, rather than allowing the \ to continue to function as the cue for an escape sequence. You'll notice that neither /usr/bin/pdftk nor pdftk, i.e., the strings to which you prepended the r, contains any backslashes.
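A quick way to convince yourself is to compare the literals directly. The Windows-style path below is only an illustrative value, chosen because it contains backslashes:

```python
plain = "/usr/bin/pdftk"
raw = r"/usr/bin/pdftk"
same_without_backslash = (plain == raw)  # no backslashes, so r changes nothing

windows_plain = "C:\temp\new"   # \t and \n are read as tab and newline
windows_raw = r"C:\temp\new"    # backslashes are kept literally
differs_with_backslash = (windows_plain != windows_raw)
```

Only the second pair differs, which is the situation the r prefix exists for.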
What I think happened...
After you took the advice of acw1668, and installed the PDF Toolkit (pdftk); and reinstalled the pypdftk package, you had probably fixed the problem. I don't know, but maybe you had not restarted your IDE/editor during these steps? (Side note: Sometimes you need to restart your IDE/editor after changes have been made to your machine's environment variables/settings.)
The short answer: If you're on a Windows machine, the install of the pdftk toolkit added PDFTK_PATH to your environment; or, if you're on a Unix-based machine, the install placed the binary in the /usr/bin directory.
Regardless, I assure you the r had nothing to do with it. Now that you know it's working, let's prove it... take out the r, you'll see that it is still working.
I was able to fix this problem by going into the pypdftk.py file and changing the paths to raw strings like so:
if os.getenv('PDFTK_PATH'):
    PDFTK_PATH = os.getenv('PDFTK_PATH')
else:
    PDFTK_PATH = r'/usr/bin/pdftk'
    if not os.path.isfile(PDFTK_PATH):
        PDFTK_PATH = r'pdftk'

debug python code and change variables at runtime in pyCharm

I am using PyCharm 2018.3.4 to develop Python scripts and I am still pretty new to this language; I normally write PowerShell code. I have a question concerning debugging. Take this unfinished code as an example.
#! python3
# phoneAndEmail.py - Finds phone numbers and email addresses on the clipboard.
import pyperclip, re, sys
phoneRegex = re.compile(r'''(
(\(?([\d \-\)\–\+\/\(]+)\)?([ .-–\/]?)([\d]+))
)''', re.VERBOSE)
mailRegex = re.compile(r'''(
\w+@\w+\.\w+
)''', re.VERBOSE)
text = pyperclip.paste()
def getMatches(text, regex):
    return regex.findall(text)
Emails = getMatches(text,mailRegex) #I want to play with this variable at runtime
phone = getMatches(text,phoneRegex)
I am at the stage where I want to analyze the variable Emails at runtime. So I set a breakpoint and can view the contents of the variable just fine. However, I also want to run some methods and play with their input parameters at runtime. Does anyone know how this is possible? If it is possible with another IDE, that would be fine too.
In PyCharm 2018.3.4 you simply do this by clicking on the Console tab and then on the Show Command Prompt button.
Now you can type in Emails or Emails.pop(), for example, in order to manipulate variables.
You can also refer to this post, where this was discussed for an older version: change variable in Pycharm debugger
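As a rough idea of what such a console session operates on, here is a self-contained sketch. It assumes the mail pattern is meant to match an @ sign, and the sample text is made up for illustration.

```python
import re

mailRegex = re.compile(r'''(
    \w+@\w+\.\w+
)''', re.VERBOSE)

def getMatches(text, regex):
    return regex.findall(text)

# Made-up sample input standing in for the clipboard contents
Emails = getMatches("write to bob@example.com or amy@example.org", mailRegex)
last = Emails.pop()  # the kind of call one would type at the debug console
```

Outside an IDE, placing the built-in breakpoint() call (Python 3.7+) after the Emails assignment drops into pdb, where the same expressions can be evaluated.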

how to (re)set default file open() encoding for a whole python 3 script?

Python reads the default file content encoding from the system.
Another S.O. question demonstrates that behavior.
I'd like to override that globally, at the script level. I do NOT want to specify it in every call to "open()".
For example, if my Windows has a CP1255 legacy codepage, I'd like to do:
magic_set_file_open_encoding('utf8')
data = open('file').read() # contents assumed utf8
Why this is very silly:
python3 was "designed for unicode". So why's the backward cowardliness?
Windows' system encoding is for LEGACY features, and not a basis for a system of government.
scripts' behavior is thus unpredictable and whimsical.
As an improvement to the "less hacky way" of this answer, the following overrides only the encoding and keeps the language specification:
import locale
locale.setlocale(locale.LC_ALL, (locale.getlocale()[0], "utf8"))
# or alternatively:
locale.setlocale(locale.LC_ALL, f"{locale.getlocale()[0]}.utf8")
You can override open for your file:
import functools
open = functools.partial(open, encoding='utf8')
# replace *open* with a *new open func* that defaults to UTF-8 encoding
with open('somefile') as f:
    ...
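One caveat of a plain functools.partial is that binary-mode calls such as open(path, 'rb') then fail, because binary mode does not accept an encoding argument. A slightly more careful sketch follows; open_utf8 and round_trip are hypothetical names used only for this illustration.

```python
import builtins
import os
import tempfile

def open_utf8(file, mode="r", *args, **kwargs):
    """Like open(), but text mode defaults to UTF-8 instead of the locale encoding."""
    # Skip binary mode, and skip calls that already pass encoding positionally
    if "b" not in mode and len(args) < 2:
        kwargs.setdefault("encoding", "utf-8")
    return builtins.open(file, mode, *args, **kwargs)

def round_trip(text):
    """Write text with open_utf8 and return the raw bytes that reached the disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open_utf8(path, "w") as f:
            f.write(text)
        with open_utf8(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

On Python 3.7 and later, running the interpreter with -X utf8 (or with PYTHONUTF8=1 in the environment) changes the default for open() without touching the code at all.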
Another, less hacky and more global approach could be:
import locale
locale.setlocale(locale.LC_ALL, 'en_US.utf-8')

Encoding error using google adwords api

I am using the Google AdWords API. Currently my only code is:
from googleads import adwords
adwords_client = adwords.AdWordsClient.LoadFromStorage()
This results in an error displaying Your default encoding, cp1252, is not UTF-8. Please run this script with UTF-8 encoding to avoid errors.
I am using Python 3.6, which should be UTF-8 by default. What is the source of this error/how is it avoided?
It turns out that this is actually a warning emitted by googleads whenever the default encoding returned by locale.getdefaultlocale() is not UTF-8.
If your script runs without issues, I feel that you can safely ignore it. Otherwise it might be worth a try to set a different locale at the beginning of your code:
import locale
locale.setlocale(locale.LC_ALL, NEW_LOCALE)
I take it that you are running Windows, so I'm not sure what the proper locale definitions are. On Linux, you could use en_US.UTF-8, but that's probably not going to work for you.
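Since the right locale name differs between platforms, one option is to try several candidates and keep the first one the system accepts. A minimal sketch, where set_first_available is a hypothetical helper and the candidate names are examples rather than a definitive list:

```python
import locale

def set_first_available(candidates, category=locale.LC_ALL):
    """Try each locale name in turn; return the one that was set, or None."""
    for name in candidates:
        try:
            locale.setlocale(category, name)
            return name
        except locale.Error:  # raised for names the system does not support
            continue
    return None
```

For example, set_first_available(["en_US.UTF-8", "C.UTF-8", "C"]) ends with the always-available C locale as a last resort.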
Try importing the _locale module.
import _locale
_locale._getdefaultlocale = (lambda *args: ['en_US', 'UTF-8'])

pcap file viewing library in python 3

I'm looking at trying to read pcap files from various CTF events.
Ideally, I would like something that can do the breakdown of information such as wireshark, but just being able to read the timestamp and return the packet as a bytestring of some kind would be welcome.
The problem is that there is little or no python 3 support with all the commonly cited libraries: dpkt, pylibpcap, pcapy, etc.
Does anyone know of a pcap library that works with python 3?
To my knowledge, there are at least 2 packages that seem to work with Python 3: pure-pcapfile and dpkt:
pure-pcapfile is easy to install in python 3 using pip. It's very easy to use but still limited to decoding Ethernet and IP data. The rest is left to you. But it works right out of the box.
dpkt doesn't work right out of the box and needs some manipulation first. They are porting it to Python 3 and plan to have a Python 2 and 3 compatible version for version 2.0. Unfortunately, it's not there yet. However, it is far more complete than pure-pcapfile and can decode many protocols. If your packet embeds several layers of protocols, it will decode them automatically for you. The only problem is that you need to make a few corrections here and there to make it work (as of the time of writing).
pure-pcapfile
The only one that I found working for Python 3 so far is pcapfile. You can find it at https://pypi.python.org/pypi/pypcapfile/ or install it by doing pip3 install pypcapfile.
There are just basic functionalities but it works very well for me and has been updated quite recently (at the time of writing this message):
from pcapfile import savefile
file = open('mypcapfile.pcap', 'rb')
pcapfile = savefile.load_savefile(file,verbose=True)
If everything goes well, you should see something like this:
[+] attempting to load mypcapfile.pcap
[+] found valid header
[+] loaded 1234 packets
[+] finished loading savefile.
A few remarks now. I'm using Python 3.4.3. Doing import pcapfile on its own will not import the submodules (I'm still a beginner with Python), only the basic information and functions from the package. Next, you have to explicitly open your file in binary read mode by passing 'rb' as the mode to the open() function. The documentation doesn't say this explicitly.
The rest is like in the documentation:
packet = pcapfile.packets[12]
to access the packet number 12 (the 13th packet then, the first one being at 0). And you have basic functionalities like
packet.timestamp
to get a timestamp or
packet.raw()
to get raw data.
The documentation mentions functions to do packet decoding of some standard formats like Ethernet and IP.
dpkt
dpkt is not available for Python 3, so you need to do the following, assuming you have access to a command line. The code is available at https://github.com/kbandla/dpkt.git and you must download it first:
git clone https://github.com/kbandla/dpkt.git
cd dpkt
git checkout --track origin/migrate_py3
git pull
These 4 commands do the following:
clone (download) the code from its git repository on GitHub
go into the newly created directory named dpkt
switch to the branch named migrate_py3, which contains the Python 3 code. As you can see from the name of this branch, it's still experimental. So far it works for me.
(just in case) download the code again
Then copy the directory named dpkt into your project or wherever Python 3 can find it.
Later on, in Python 3 here is what you have to do to get started:
import dpkt
file = open('mypcapfile.pcap','rb')
will open your file. Don't forget the 'rb' binary mode in Python 3 (same thing as in pure-pcapfile).
pcap = dpkt.pcap.Reader(file)
will read and decode your file
for ts, buf in pcap:
    eth = dpkt.ethernet.Ethernet(buf)
    print(eth)
will, for example, decode the Ethernet frames and print them. Then read the documentation on how to use dpkt. If your packets contain an IP or TCP layer, then dpkt.ethernet.Ethernet(buf) will decode them as well. Also note that in the for loop, we have access to the timestamps in ts.
You may want to iterate in a less constrained way, in which case the following will help:
(ts,buf) = next(pcap)
eth = dpkt.ethernet.Ethernet(buf)
where the first line gets the next tuple from the pcap file. If pcap is False then you read everything.
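If neither library is an option, the minimum asked for in the question (a timestamp plus the packet as a byte string) can be sketched with the standard library alone. The following assumes the classic pcap file layout with microsecond timestamps (a 24-byte global header followed by 16-byte record headers); read_pcap is a hypothetical helper, and pcapng files are not handled.

```python
import io
import struct

def read_pcap(fileobj):
    """Yield (timestamp, raw_bytes) for each record of a classic pcap file."""
    header = fileobj.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    magic = struct.unpack("<I", header[:4])[0]
    if magic == 0xA1B2C3D4:
        endian = "<"   # file was written little-endian
    elif magic == 0xD4C3B2A1:
        endian = ">"   # file was written big-endian
    else:
        raise ValueError("not a classic microsecond pcap file")
    while True:
        record = fileobj.read(16)
        if len(record) < 16:
            break      # end of file
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", record)
        yield ts_sec + ts_usec / 1000000, fileobj.read(incl_len)

# Synthetic one-packet capture built in memory for demonstration
sample = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
sample += struct.pack("<IIII", 10, 500000, 3, 3) + b"abc"
packets = list(read_pcap(io.BytesIO(sample)))
```

With a real capture, the same generator would be fed a file opened in 'rb' mode, just as in the two library examples.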

Resources