web2py XML helper: sanitizing line breaks under Python 3

In my web2py app I’m processing a list of items, where the user can click a link on each item to select it. An item has a UUID, a title and a description. For better orientation, the item description is also displayed as the link's title attribute. To prevent injection and to escape tags in the description, I’m using the XML sanitizer as follows:
A(this_item.title,
  callback=URL('item', 'select',
               vars=dict(uuid=this_item.uuid), user_signature=True),
  _title=XML(str_replace(this_item.description,
                         {'\r\n': '&#13;', '<': '&lt;', '>': '&gt;'}),
             sanitize=True))
With Python 2 everything was fine. Since switching to Python 3, the sanitizer no longer works when the description contains line breaks. For example, the following string produced by my str_replace routine is sanitized correctly by the XML helper under Python 2 but not under Python 3:
Header&#13;&#13;Line1&#13;Line2&#13;Line3
Sanitizing line breaks escaped as &#13; is the problem with Python 3 (but not with Python 2). Everything else is sanitized by the XML helper without problems (e.g. less than or greater than, which I need because a missing description is generated as <no description>).
How can line breaks be sanitized by the XML helper when running web2py under Python 3?
Thanks for any support!
Best regards
Clemens

This is down to a change in Python's HTMLParser class between 3.4 and 3.5, where convert_charrefs started defaulting to True:
Python 3.4 DeprecationWarning convert_charrefs
I think the following fix in your web2py yatl source should correct it:
https://github.com/web2py/yatl/compare/master...timnyborg:patch-1
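The behaviour change can be reproduced with the standard library alone. The following is a minimal sketch, not the yatl code: `EventRecorder` and `parse_events` are hypothetical names introduced for illustration, and the input assumes the line break was escaped as the numeric character reference `&#13;`.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical helper that records which handler each piece of input reaches."""

    def __init__(self, convert_charrefs):
        super().__init__(convert_charrefs=convert_charrefs)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_charref(self, name):
        # Only called when convert_charrefs is False.
        self.events.append(("charref", name))


def parse_events(text, convert_charrefs):
    """Feed text to a fresh parser and return the recorded handler calls."""
    parser = EventRecorder(convert_charrefs)
    parser.feed(text)
    parser.close()
    return parser.events


# Behaviour with the old default (convert_charrefs=False).
old_style = parse_events("Line1&#13;Line2", convert_charrefs=False)

# Behaviour with the default used since Python 3.5 (convert_charrefs=True).
new_style = parse_events("Line1&#13;Line2", convert_charrefs=True)
```

With convert_charrefs=False the reference is reported through handle_charref, so a sanitizer can re-emit it. With convert_charrefs=True the reference arrives already decoded inside handle_data, and a sanitizer that relies on handle_charref never sees it.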

Related

Selenium unusual output in python

I am scraping comments from a website. My code worked before, but now it gives me unusual output.
Part of my code:
comments = driver.find_elements_by_class_name("comment-text")
time.sleep(1)
print(comments[1])
The output:
<selenium.webdriver.remote.webelement.WebElement (session="bb6ae0409dd8ec8c191f9bd84f79bea7", element="5f5d4a2f-7a93-41fe-9ca5-aa4e7c525792")>
You want
print(comments[1].text)
You were printing the WebElement object itself, so Python showed its representation (the session and element IDs). I'm assuming you want the text contained in the element, which means you need .text.

Why pandas profiling isn't showing any output in ipython?

I have a quick question about pandas_profiling. I'm trying to use it, but instead of showing the report it prints something like this:
<pandas_profiling.ProfileReport at 0x23c02ed77b8>
Where am I making a mistake? Does it have anything to do with IPython? I'm using IPython in Anaconda.
Try this:
pfr = pandas_profiling.ProfileReport(df)
pfr.to_notebook_iframe()
pandas_profiling creates an object that then needs to be displayed or written out. One standard way of doing so is to save it as an HTML file:
profile.to_file(outputfile="sample_file_name.html")
("profile" being the variable you used to save the profile itself)
It doesn't have to do with ipython specifically - the difference is that because you're going line by line (instead of running a full block of code, including the reporting step) it's showing you the object itself. The code above should allow you to see the report once you open it up.
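The difference between the object and its rendered output can be illustrated without pandas_profiling. In this sketch `Report` is a hypothetical stand-in class, not the library's ProfileReport, and `to_html` is an illustrative method name.

```python
class Report:
    """Hypothetical stand-in for a report object (not pandas_profiling's class)."""

    def __init__(self, title):
        self.title = title

    def to_html(self):
        # Rendering is an explicit step; nothing is produced until it is called.
        return "<h1>{}</h1>".format(self.title)


report = Report("Sample profile")

# Without a custom __repr__, Python falls back to the default representation,
# which only names the class and the object's memory address.
default_repr = repr(report)

# The actual content only appears once the rendering method is called.
html = report.to_html()
```

Evaluating `report` on its own line in an interactive session shows `default_repr`, which has the same shape as the output in the question.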

Encoding issue, from html form data, to python print

I'm getting in Python 3 the data from an HTML form. To simplify to the maximum, my Python code looks like this:
#!/usr/bin/python3
import cgi
form = cgi.FieldStorage()
print('Content-type: text/html; charset=utf-8\n')
data = form.getvalue('nom')
print(data)
Now it prints (like it's supposed to) the name filled in the HTML form, however when that name has an accent (for example Valérie), then the accented character is printed as a ? (in this case Python prints Val?rie).
I know it's a problem of encoding (Python being notorious for this), and I've searched quite a bit (encode, decode, locale, etc...) but didn't get it to work unfortunately. If anyone knows how to fix this and have it print Valérie, I'd really appreciate it ;-)
EDIT: got it to work using print(data.encode('utf-8').decode('latin-1'))
Take care.
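A possible reason the workaround in the EDIT works can be sketched as follows. This assumes the script's stdout was encoding text as Latin-1 while the browser expected UTF-8, as declared in the header; the question does not state the server's actual encoding.

```python
name = "Valérie"

# What a Latin-1 stdout would send for the original string: é becomes the
# single byte 0xE9, which is not valid UTF-8 on its own.
latin1_bytes = name.encode("latin-1")

# The workaround first produces a mojibake string ...
workaround = name.encode("utf-8").decode("latin-1")

# ... whose Latin-1 encoding is exactly the UTF-8 encoding of the original.
sent_bytes = workaround.encode("latin-1")
```

Under that assumption, the round trip only makes a Latin-1 stdout emit UTF-8 bytes. Writing `name.encode("utf-8")` directly to `sys.stdout.buffer` would avoid depending on the stdout encoding at all.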

Pyx unicode text

So I am trying to generate postscript from Python.
Currently trying with PyX 0.14.1 on Python 3.4.2,
but I am open to suggestions, if you know something simpler.
I was following mostly the suggestions found on the PyX
mailing list in this thread. This was Python2 and is quite old.
The following shows my current code after many changes:
from pyx import *
text.set(cls=text.LatexRunner, texenc='utf-8')
text.preamble(r'\usepackage{ucs}')
text.preamble(r'\usepackage[utf8x]{inputenc}')
c = canvas.canvas()
c.text(5, 5, "Sören Sundstrøm".encode("utf8"))
p = document.page(c, paperformat=document.paperformat.A4,
                  centered=0)
d = document.document([p])
d.writePSfile('test.ps')
PyX stops with a TexResultError. The interesting part of the error
shows what's happening in TeX:
pyx.text.TexResultError: unhandled TeX response (might be an error)
The expression passed to TeX was:
\ProcessPyXBox{b'S\xc3\xb6ren Sundstr\xc3\xb8m'%
}{1}%
\PyXInput{7}%
After parsing the return message from TeX, the following was left:
*
*! Undefined control sequence.
<argument> b'S\xc
3\xb 6ren Sundstr\xc 3\xb 8m'
<*> }{1}
(cut after 5 lines; use errordetail.full for all output)
So it looks like latex is receiving not utf-8,
but an escaped representation of utf-8.
My question: How do I pass the string to canvas.text correctly?
Or is my preamble wrong?
I also tried to follow this answer by wobsta here on SO,
but besides being much too complicated, it does not work for me either.
(Looks like PyX does not understand a metafont message in this case).
Running latex directly on a simple utf-8 input file with the same preamble
works fine by the way.
Looking into the PyX code revealed the problem.
The text module prepares an io.TextIOWrapper with utf-8 encoding to be used for TeX input. The string parameters in text.preamble and canvas.text are passed verbatim to the wrapper, so in Python 3 you just pass a string without any encoding necessary. Encoding will be done by the wrapper.
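The effect can be reproduced with the standard library. This is a sketch of the mechanism only, not PyX's code: `tex_input` is a hypothetical stand-in for the wrapper described above, and the formatting step assumes the pre-encoded bytes were turned into text somewhere along the way.

```python
import io

name = "Sören Sundstrøm"

# A text wrapper configured for UTF-8, similar in spirit to the one described above.
raw = io.BytesIO()
tex_input = io.TextIOWrapper(raw, encoding="utf-8")

# Passing a str: the wrapper performs the encoding itself.
tex_input.write(name)
tex_input.flush()
written = raw.getvalue()

# Formatting pre-encoded bytes into text yields their repr, not the characters.
as_text = "{!r}".format(name.encode("utf8"))
```

Here `written` holds the proper UTF-8 bytes, while `as_text` is the escaped form `b'S\xc3\xb6ren Sundstr\xc3\xb8m'` that shows up in the TeX error message.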
My original unsimplified code had another problem which made it difficult to solve this first problem. So for completeness here's the second problem and its solution. My original code had this order of operations:
from pyx import *
c = canvas.canvas()
# doing other stuff with canvas
text.set(cls=text.LatexRunner, texenc='utf-8')
text.preamble(r'\usepackage{ucs}')
text.preamble(r'\usepackage[utf8x]{inputenc}')
c.text(5, 5, "Sören Sundstrøm")
p = document.page(c, paperformat=document.paperformat.A4,
                  centered=0)
d = document.document([p])
d.writePSfile('test.ps')
This does not work either, because when a canvas is created it keeps a reference to a text.defaulttexrunner which is set up with the current settings of the text module. Later changes to the text module settings never influence the existing canvas instance. So you have to set up the text module before you create the canvas you want to draw text on.
Thanks to anyone who looked into this.

Python 3.1 server-side can't output Unicode string to client

I'm using a free web host but choosing not to work with any Python framework, and am stuck trying to print Chinese characters saved in the source file (using emacs to save file encoded in utf-8) to the resulting HTML page. I thought Unicode "just works" in Python 3.1 so I am baffled. I found three solutions that aren't working. I might just be missing a detail or two.
The host is Alwaysdata, and it has been straightforward to use, so I have little clue about details of how they put together the parts. All I do is upload or edit (with ssh) Python files to a www folder, change permissions, point a browser to the right URL, and it works.
My first attempt works in local IDLE (and also in the server's interactive Python shell, which makes it even more confusing that it fails when the output is passed to a browser):
#!/usr/bin/python3.1
mystr = "你好世界"
print("Content-Type: text/html\n\n")
print("""<!DOCTYPE html>
<html><head><meta charset="utf-8"></head>
<body>""")
print(mystr)
The error is:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3:
ordinal not in range(128)
Then I tried
print(mystr.encode("utf-8"))
resulting in no error, but the following undesired output to the browser:
b'\xe4\xbd\xa0\xe5\xa5\xbd\xe4\xb8\x96\xe7\x95\x8c'
Third, the following lines were added but got an error:
import sys
sys.setdefaultencoding("utf-8")
AttributeError: 'module' object has no attribute 'setdefaultencoding'
Finally, replacing print with f.write:
import codecs
f = codecs.open(sys.stdout, "w", "utf-8")
mystr = "你好世界"
...
f.write(mystr)
error:
TypeError: invalid file: <_io.TextIOWrapper name='<stdout>'
encoding='ANSI_X3.4-1968'>
How do I get the output to work? Do I need to use a framework for a quick fix?
It sounds like you are using CGI, which is a stupid API as it's using stdout, made for output to humans, to output to your browser. This is the basic source of your problems.
You need to encode it in UTF-8, and then write to sys.stdout.buffer instead of sys.stdout.
And after that, get yourself a web framework. Really, you'll be a lot happier.
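A minimal sketch of the sys.stdout.buffer approach is shown below. `write_utf8` and `render_page` are hypothetical helper names introduced for illustration; the page layout follows the first attempt in the question.

```python
import sys


def render_page(body):
    """Build the CGI response (header, blank line, HTML) as one text string."""
    return (
        "Content-Type: text/html; charset=utf-8\r\n\r\n"
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"></head>\n'
        "<body>" + body + "</body></html>\n"
    )


def write_utf8(text, stream=None):
    """Encode text as UTF-8 and write the bytes to a binary stream.

    Defaults to sys.stdout.buffer, the binary layer underneath sys.stdout,
    so the locale-dependent encoding of sys.stdout is bypassed.
    """
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(text.encode("utf-8"))
    stream.flush()
```

Calling `write_utf8(render_page("你好世界"))` from the CGI script sends UTF-8 bytes regardless of the ANSI_X3.4-1968 (ASCII) encoding reported for stdout in the question.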

Resources