PyX unicode text - python-3.x

So I am trying to generate PostScript from Python.
Currently I am trying with PyX 0.14.1 on Python 3.4.2,
but I am open to suggestions if you know something simpler.
I was mostly following the suggestions found on the PyX
mailing list in this thread, but that was for Python 2 and is quite old.
The following shows my current code after many changes:
from pyx import *
text.set(cls=text.LatexRunner, texenc='utf-8')
text.preamble(r'\usepackage{ucs}')
text.preamble(r'\usepackage[utf8x]{inputenc}')
c = canvas.canvas()
c.text(5, 5, "Sören Sundstrøm".encode("utf8"))
p = document.page(c, paperformat=document.paperformat.A4,
                  centered=0)
d = document.document([p])
d.writePSfile('test.ps')
PyX stops with a TexResultError. The interesting part of the error
shows what's happening in TeX:
pyx.text.TexResultError: unhandled TeX response (might be an error)
The expression passed to TeX was:
\ProcessPyXBox{b'S\xc3\xb6ren Sundstr\xc3\xb8m'%
}{1}%
\PyXInput{7}%
After parsing the return message from TeX, the following was left:
*
*! Undefined control sequence.
<argument> b'S\xc
3\xb 6ren Sundstr\xc 3\xb 8m'
<*> }{1}
(cut after 5 lines; use errordetail.full for all output)
So it looks like LaTeX is not receiving UTF-8,
but an escaped representation of the UTF-8 bytes.
My question: How do I pass the string to canvas.text correctly?
Or is my preamble wrong?
I also tried to follow this answer by wobsta here on SO,
but besides being much too complicated, it does not work for me either
(it looks like PyX does not understand a Metafont message in this case).
Running latex directly on a simple utf-8 input file with the same preamble
works fine by the way.

Looking into the PyX code revealed the problem.
The text module prepares an io.TextIOWrapper with UTF-8 encoding to be used for the TeX input. The string parameters of text.preamble and canvas.text are passed verbatim to that wrapper, so in Python 3 you just pass a plain str; no encoding is necessary, because the wrapper does the encoding itself.
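In other words, the fix for the code above is simply to drop the encode call; a minimal sketch of the difference:
c.text(5, 5, "Sören Sundstrøm")             # correct: plain str, PyX encodes it
c.text(5, 5, "Sören Sundstrøm".encode())    # wrong: TeX receives the bytes repr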
My original, unsimplified code had another problem, which made this first problem difficult to solve. So for completeness, here is the second problem and its solution. My original code had this order of operations:
from pyx import *
c = canvas.canvas()
# doing other stuff with canvas
text.set(cls=text.LatexRunner, texenc='utf-8')
text.preamble(r'\usepackage{ucs}')
text.preamble(r'\usepackage[utf8x]{inputenc}')
c.text(5, 5, "Sören Sundstrøm")
p = document.page(c, paperformat=document.paperformat.A4,
                  centered=0)
d = document.document([p])
d.writePSfile('test.ps')
This does not work either, because when a canvas is created, it keeps a reference to a text.defaulttexrunner that is set up with the text module's settings at that moment. Changing the text module's settings afterwards never influences the canvas instance. So you have to set up the text module before you create the canvas you want to draw text into, as in the corrected sketch below.
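For completeness, here is the question's code with both fixes applied (text module configured before the canvas is created, and a plain str passed to c.text):
from pyx import *

# Configure the text module first, so the canvas picks up
# a texrunner with these settings.
text.set(cls=text.LatexRunner, texenc='utf-8')
text.preamble(r'\usepackage{ucs}')
text.preamble(r'\usepackage[utf8x]{inputenc}')

c = canvas.canvas()
c.text(5, 5, "Sören Sundstrøm")  # plain str; the wrapper encodes it

p = document.page(c, paperformat=document.paperformat.A4,
                  centered=0)
d = document.document([p])
d.writePSfile('test.ps')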
Thanks to anyone who looked into this.

Related

Julia: Using ProtoBuf to read messages from gzipped file

A sensor provides a stream of frames containing object coordinates, which are stored in ProtoBuf format in a gzipped file. I would like to read this file in Julia.
Using protoc, I have generated the ProtoBuf files for both Python and Julia: coordinate_push.py and coordinate_push.jl.
My Python code is as follows:
import gzip
from google.protobuf.internal.decoder import _DecodeVarint32
from src.proto import coordinate_push

frameList = []
with gzip.open(filePath) as f:
    data = f.read()

next_pos, pos = 0, 0
while pos < len(data):
    msg = coordinate_push.CoordinatesFrame()
    next_pos, pos = _DecodeVarint32(data, pos)     # read the length prefix
    msg.ParseFromString(data[pos:pos + next_pos])  # then one frame
    frameList.append(msg)
    pos += next_pos
I'd like to rewrite the above in Julia, and don't know where to start. Part of the problem is that I haven't fully understood the Python script (IO is not my strong point).
I understand that I need:
to open the gzip file, presumably using using GZip; file = GZip.open(file_path, "r")
to read in the data, along the lines of using ProtoBuf; data = readproto(iob, CoordinatesFrame())
What I don't understand is:
how to define iob, and especially how to link it to file (in the Julia Protobuf manual, we had iob = PipeBuffer(), but here it's a gzip-file that we'd like to read)
how to replicate the while-loop in Julia, and in particular the mysterious _DecodeVarint32 (I'm on Windows, if it's related to that.)
whether the file coordinate_push.jl has to be in the same directory as my main file, and if not, how I can properly import it (it is currently in a proto subfolder, and in Python I'd import it using from src.proto import coordinate_push)
Insight on any of the three points would be highly appreciated.
You should open an issue on the Gzip GitHub repo and ask this first part of your question there (I am not a Gzip expert unfortunately).
On the second point, I suggest looking at https://github.com/JuliaIO/FileIO.jl/blob/master/README.md for lots of examples of FileIO loops, which seems to be exactly what you need to replicate that Python loop. For the second part of this question, your best bet for that function is to try to hunt down its definition on GitHub or in the docs somewhere.
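For reference, the Python helper _DecodeVarint32 (it lives in google.protobuf.internal.decoder) just decodes a base-128 length prefix, so it is not Windows-specific. A minimal Python sketch of the logic you would have to replicate in Julia:
def decode_varint32(buf, pos):
    # Protobuf varint: 7 payload bits per byte, least-significant group
    # first; the high bit of each byte says another byte follows.
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos  # (decoded length, new read position)
        shift += 7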
For the third question, coordinate_push.jl does not need to be in the same folder as your "main file" (I am not sure what you mean by this, so perhaps it would help to add context on the structure of your files). To import that file, all you need to do is add include("path/to/coordinate_push.jl") at the top of the file you want to call/run the code from. It's worth noting that the path can be either the absolute path or the relative project path (in some cases).

web2py XML helper sanitize line breaks under python3

In my web2py app I'm processing a list of items, where the user can click on a link for each item to select it. An item has a UUID, a title and a description. For better orientation, the item description is also displayed as the link title. To prevent injections and to escape tags in the description, I'm using the XML sanitizer as follows:
A(this_item.title,
  callback=URL('item', 'select',
               vars=dict(uuid=this_item.uuid), user_signature=True),
  _title=XML(str_replace(this_item.description,
                         {'\r\n': '&#13;&#10;', '<': '&lt;', '>': '&gt;'}),
             sanitize=True))
Using Python 2 everything was fine. Since I switched to Python 3, I have the following problem: when the description contains line breaks, the sanitizer is not working anymore. For example, the following string produced by my str_replace routine is fine to be sanitized by the XML helper under Python 2, but not under Python 3:
Header&#13;&#10;&#13;&#10;Line1&#13;&#10;Line2&#13;&#10;Line3
Sanitizing line breaks escaped by &#13;&#10; is the problem with Python 3 (but not with Python 2). Everything else is no problem for the XML helper to sanitize (e.g. less than or greater than; I need these, since if there is no description, it is generated as <no description>).
How can line breaks be sanitized by the XML helper running web2py under Python 3?
Thanks for any support!
Best regards
Clemens
This is down to a change in python's HTMLParser class between 3.4 and 3.5, where convert_charrefs started defaulting to True:
Python 3.4 DeprecationWarning convert_charrefs
I think the following fix in your web2py yatl source should correct it:
https://github.com/web2py/yatl/compare/master...timnyborg:patch-1
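The change is easy to see in isolation. A small sketch (the Collector class is purely illustrative):
from html.parser import HTMLParser

class Collector(HTMLParser):
    # Records which parser callbacks fire.
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []
    def handle_data(self, data):
        self.events.append(('data', data))
    def handle_charref(self, name):
        self.events.append(('charref', name))

old = Collector(convert_charrefs=False)
old.feed('a&#13;&#10;b'); old.close()
print(old.events)  # separate charref events -- the sanitizer sees them

new = Collector(convert_charrefs=True)  # the 3.5+ default
new.feed('a&#13;&#10;b'); new.close()
print(new.events)  # one data event with the refs already converted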

Encoding issue, from html form data, to python print

In Python 3, I'm getting the data from an HTML form. To simplify to the maximum, my Python code looks like this:
#!/usr/bin/python3
import cgi
form = cgi.FieldStorage()
print('Content-type: text/html; charset=utf-8\n')
data = form.getvalue('nom')
print(data)
Now it prints the name filled in the HTML form (like it's supposed to); however, when that name has an accent (for example Valérie), the accented character is printed as a ? (in this case Python prints Val?rie).
I know it's a problem of encoding (Python being notorious for this), and I've searched quite a bit (encode, decode, locale, etc.), but unfortunately I didn't get it to work. If anyone knows how to fix this and have it print Valérie, I'd really appreciate it ;-)
EDIT: got it to work using print(data.encode('utf-8').decode('latin-1'))
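An alternative sketch that avoids the encode/decode round-trip, assuming the underlying problem is the encoding of stdout in the CGI environment, is to rewrap stdout as UTF-8 explicitly:
#!/usr/bin/python3
import cgi
import io
import sys

# Force UTF-8 output regardless of the locale the web server provides.
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

form = cgi.FieldStorage()
print('Content-type: text/html; charset=utf-8\n')
print(form.getvalue('nom'))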
Take care.

Unpickling from converted string in python/numpy

I have a ton of numpy ndarrays that are stored pickled to strings. That may have been a poor design choice, but it's what I did, and now the pickled strings seem to have been converted or something along the way. When I try to unpickle, I notice they are of type str and I get the following error:
TypeError: 'str' does not support the buffer interface
when I invoke
numpy.loads(bin_str)
where bin_str is the thing I'm trying to unpickle. If I print out bin_str, it looks like
b'\x80\x02cnumpy.core.multiarray\n_reconstruct\nq\x00cnumpy\nndarray\nq\x01K\x00\x85q\x02c_codecs\nencode\nq\x03X\x01\x00\x00\ ...
continuing for some time, so the info seems to be there; I'm just not quite sure how to convert it into whatever string format numpy/pickle need. On a whim I tried
numpy.loads( bytearray(bin_str, encoding='utf-8') )
and
numpy.loads( bin_str.encode() )
which both throw an error _pickle.UnpicklingError: unpickling stack underflow. Any ideas?
PS: I'm on python 3.3.2 and numpy 1.7.1
Edit
I discovered that if I do the following:
open('temp.txt', 'wb').write(...)
return numpy.load( 'temp.txt' )
I get back my array, where ... denotes copying and pasting the output of print(bin_str) from another window. I've tried writing bin_str to a file directly to unpickle, but that doesn't work; it complains that TypeError: 'str' does not support the buffer interface. A few sane ways of converting bin_str to something that can be written directly to a binary file result in pickle errors when trying to read it back.
Edit 2
So I guess what's happened is that my binary pickle string ended up encoded inside of a normal string, something like:
"b'pickle'"
which is unfortunate and I haven't figured out how to deal with that, except this ridiculous and convoluted way to get it back:
open('temp.py', 'w').write('foo = ' + bin_str)
from temp import foo
numpy.loads( foo )
This seems like a very shameful solution to the problem, so please give me a better one!
It sounds like your saved strings are the reprs of the original bytes instances returned by your pickling code. That's a bit unfortunate, but not too bad. repr is intended to return a "machine friendly" representation of an object, and it can often be reversed by using eval:
import numpy as np
import pickle
# this part has already happened
orig_obj = np.array([1,2,3])
orig_pickle = pickle.dumps(orig_obj)
saved_str = repr(orig_pickle) # this was a mistake, but it's already done
# this is what you need to do to get something equivalent to orig_obj back
reconstructed_pickle = eval(saved_str)
reconstructed_obj = pickle.loads(reconstructed_pickle)
# test
if np.all(reconstructed_obj == orig_obj):
    print("It worked!")
Obligatory note that using eval can be dangerous: Be aware that eval can run any Python code it wants, so don't call it with untrusted data. However, pickle data has the same risks (a malicious Pickle string can run arbitrary code upon unpickling), so you're not losing much safety in this situation. I'm guessing that you trust your data in this case anyway.
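If the eval still feels risky, ast.literal_eval only evaluates literal syntax (including bytes literals), so it is a safer drop-in here, under the same assumption that saved_str is the repr of the pickle bytes:
import ast
import pickle

reconstructed_pickle = ast.literal_eval(saved_str)  # parses "b'...'" without running code
reconstructed_obj = pickle.loads(reconstructed_pickle)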

Python 3.1 server-side can't output Unicode string to client

I'm using a free web host but am choosing not to work with any Python framework, and I am stuck trying to print Chinese characters, saved in the source file (with emacs, encoded in utf-8), to the resulting HTML page. I thought Unicode "just works" in Python 3.1, so I am baffled. I found three solutions that aren't working; I might just be missing a detail or two.
The host is Alwaysdata, and it has been straightforward to use, so I have little clue about details of how they put together the parts. All I do is upload or edit (with ssh) Python files to a www folder, change permissions, point a browser to the right URL, and it works.
My first attempt works in local IDLE (and also in the server's interactive Python shell, which makes me even more confused about why it won't work when passed to a browser):
#!/usr/bin/python3.1
mystr = "你好世界"
print("Content-Type: text/html\n\n")
print("""<!DOCTYPE html>
<html><head><meta charset="utf-8"></head>
<body>""")
print(mystr)
The error is:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3:
ordinal not in range(128)
Then I tried
print(mystr.encode("utf-8"))
resulting in no error, but the following undesired output to the browser:
b'\xe4\xbd\xa0\xe5\xa5\xbd\xe4\xb8\x96\xe7\x95\x8c'
Third, I added the following lines, but got an error:
import sys
sys.setdefaultencoding("utf-8")
AttributeError: 'module' object has no attribute 'setdefaultencoding'
Finally, replacing print with f.write:
import codecs
f = codecs.open(sys.stdout, "w", "utf-8")
mystr = "你好世界"
...
f.write(mystr)
error:
TypeError: invalid file: <_io.TextIOWrapper name='<stdout>'
encoding='ANSI_X3.4-1968'>
How do I get the output to work? Do I need to use a framework for a quick fix?
It sounds like you are using CGI, which is a stupid API as it's using stdout, made for output to humans, to output to your browser. This is the basic source of your problems.
You need to encode it in UTF-8, and then write to sys.stdout.buffer instead of sys.stdout.
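Concretely, that means a minimal sketch like:
import sys

mystr = "你好世界"
out = sys.stdout.buffer  # binary stdout; bypasses the ASCII text wrapper
out.write(b"Content-Type: text/html; charset=utf-8\n\n")
out.write(mystr.encode("utf-8"))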
And after that, get yourself a web framework. Really, you'll be a lot happier.
