error parsing dtd python lxml

I've been stuck on this little problem for weeks. None of these links, nor many others, have helped:
Parsing dtd file with lxml library (python)
Error parsing a DTD using lxml
get errors when import lxml.etree to python
The problem is this: when I try to parse any DTD file with lxml, I get this message:
Traceback (most recent call last):
  File "/Users/benutzer/Documents/tester.py", line 7, in <module>
    dtd=etree.DTD(dtd_path)
  File "src/lxml/dtd.pxi", line 294, in lxml.etree.DTD.__init__ (src/lxml/lxml.etree.c:186845)
lxml.etree.DTDParseError: error parsing DTD
The example DTD file is taken from the Oxygen samples list, but it cannot be parsed, and neither can other files. I'm using a MacBook Pro with Python 3.4. Interestingly, a colleague of mine does not get this message and is apparently able to parse files with lxml. Everything works fine on Windows laptops as well, so I assume the problem lies somewhere on my laptop. If any of you has an idea where it might lurk, that would be great.
Here is the code I'm using for this:
from lxml import etree
import os

# Build an absolute path to personal.dtd, located next to this script
dtd_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'personal.dtd')
dtd = etree.DTD(dtd_path)
If you need more info from me just ask.
Thanks in advance!
P.S. After following MattDMo's comment, I can parse with this piece of code without any trouble, but when I insert it into the larger program the error comes back. The path now seems to be OK. Should I include the part of the code where I use these lines?
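As far as I can tell, lxml raises the same generic DTDParseError when the path does not lead to a readable file, and the P.S. suggests the path was indeed the culprit: a snippet that works on its own can break inside a larger program if that program changes the working directory or lives in another folder. The helper below is a sketch, not a guaranteed fix; load_dtd and its base_dir parameter are hypothetical names. It checks the path before parsing and, if parsing still fails, surfaces libxml2's own messages from the exception's error_log instead of the bare "error parsing DTD".

```python
import os

from lxml import etree


def load_dtd(dtd_path, base_dir=None):
    """Parse a DTD file, failing early with a clear message if the path is wrong."""
    # Resolve relative paths against base_dir (e.g. the script's own folder)
    # instead of the current working directory, which a larger program may change.
    if base_dir is not None and not os.path.isabs(dtd_path):
        dtd_path = os.path.join(base_dir, dtd_path)
    if not os.path.isfile(dtd_path):
        raise FileNotFoundError("DTD file not found: %s" % dtd_path)
    try:
        return etree.DTD(dtd_path)
    except etree.DTDParseError as exc:
        # error_log holds libxml2's messages, including line numbers.
        details = "\n".join(str(entry) for entry in exc.error_log)
        raise RuntimeError("could not parse %s:\n%s" % (dtd_path, details)) from exc
```

In the original script this would be called as load_dtd('personal.dtd', base_dir=os.path.dirname(os.path.realpath(__file__))), so the result no longer depends on where the larger program happens to be running from.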

Related

Rodeo giving error on Excel import working in Spyder

Full disclosure: I am a total beginner when it comes to Python in particular and programming in general. So please bear with me.
Today I tried for the first time to play around with some datasets on my own, outside the sandboxed environment of online courses.
I downloaded both Anaconda and Rodeo (which I somehow feel more at home with than, say, Spyder or Jupyter).
I wrote this code, and it works in Spyder:
import numpy as np
import pandas as pd
myexcel="C:/Users/myname/folder/subfolder/file.xlsx"
xl=pd.ExcelFile(myexcel)
mydf=xl.parse(0)
print(mydf.head())
However, if I try to run the same code in Rodeo, I get the following error message (only part of it is shown here):
----> 4 xl=pd.ExcelFile(myexcel)
ImportError: No module named 'xlrd'
I gather that in Rodeo the script fails because the xlrd package is missing, and after checking with help("modules") it is indeed not there. But I don't fully understand the problem: if xlrd is essential for running this code, why doesn't it fail in Spyder?
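One plausible explanation, which is an assumption and not confirmed in the question, is that Rodeo runs a different Python interpreter from the one Anaconda's Spyder uses. Anaconda's distribution has historically shipped xlrd, but a separate Python installation would not have it. Running the sketch below in both tools shows which interpreter each one uses and where, if anywhere, xlrd is found. The function name describe_environment is hypothetical; it only uses sys and importlib.util.find_spec.

```python
import importlib.util
import sys


def describe_environment(module_name="xlrd"):
    """Report the running interpreter and where `module_name` would be imported from."""
    spec = importlib.util.find_spec(module_name)  # None if the module is not installed
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "module": module_name,
        "found_at": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    print(describe_environment())
```

If the two executables differ, installing xlrd into the interpreter Rodeo uses, or pointing Rodeo at Anaconda's Python if its settings allow that, would be the thing to try.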

IAC-protocol interface error on python 3

I would like to work with Excel sheets (.xls, probably via .ods conversion) from Python while preserving all of the sheet's original content. Compared with xlutils (http://www.python-excel.org/), the iac-protocol (http://pythonhosted.org/iac-protocol/index.html) seems to me a better fit and a more elegant tool for preserving a sheet's styles, formulas, drop-down boxes, etc. One of the steps to launch iac's server or interpreter (iacs/iaci) is to initialize the interface, which includes, among others, this command:
import iac.app.libreoffice.calc as localc
While import iac.app.libreoffice works fine, moving down to the calc level with
import iac.app.libreoffice.calc
throws the following error:
>>> import iac.app.libreoffice.calc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/site-packages/iac/app/libreoffice/calc.py", line 11, in <module>
    from uno import getComponentContext
ImportError: cannot import name 'getComponentContext'
From what I've learned so far on this forum, it might be caused by a name clash between two modules. This is where I'm stuck: how do I find out which other module defines a function with that name, and how do I fix it? Both iac-protocol and unotools were installed via pip3. I did not create a function with that name in any script.
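The traceback says that the module Python finds under the name uno has no getComponentContext, which points less at a clash of function names and more at a different module named uno being found before LibreOffice's PyUNO bridge (for example, if something named uno was installed from PyPI). That is an assumption worth checking. The sketch below lists every uno package or uno.py along sys.path in search order; the first hit is the one import uno picks up. It only checks plain .py files and package directories, not compiled extensions, and find_module_files is a hypothetical helper name.

```python
import os
import sys


def find_module_files(module_name):
    """List candidate source files for `module_name` along sys.path, in search order."""
    hits = []
    for entry in sys.path:
        base = entry or os.getcwd()  # an empty entry means the current directory
        # Within one directory, a package takes precedence over a same-named .py file.
        candidates = (
            os.path.join(base, module_name, "__init__.py"),  # regular package
            os.path.join(base, module_name + ".py"),         # plain module
        )
        for candidate in candidates:
            if os.path.isfile(candidate):
                hits.append(candidate)
    return hits


if __name__ == "__main__":
    # The first path printed is the one `import uno` will use.
    for path in find_module_files("uno"):
        print(path)
```

A quicker cross-check in the same python3 session is import uno followed by print(uno.__file__); if that file is not LibreOffice's own uno.py, the shadowing copy is the thing to remove.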
Thank you in advance for any advice!
Python 3.4 on Scientific Linux release 7.3 (Nitrogen), LibreOffice 5.0.6.2 00(Build:2)

Some questions to narrow down the problem:
Did you start libreoffice listening on a socket first?
Did you import anything else before import iac.app.libreoffice.calc?
What happens when you start python in a terminal and enter from uno import getComponentContext?
I installed iac-protocol on Linux Mint and was able to import iac.app.libreoffice.calc and then use it. The installation process was complex, so I wouldn't be surprised if there is some problem with how your packages were installed, or possibly it does not work on RHEL-based systems. For one thing, it required me to install gnumeric.
The Calc "Hello World" code that worked for me is as follows.
libreoffice "--accept=socket,host=localhost,port=18100;urp;StarOffice.ServiceManager" --norestore --nofirststartwizard --nologo --calc &
python3
>>> import iac.app.libreoffice.calc as localc
>>> doc = localc.Interface.current_document()
>>> sheet = doc.getSheets().getByIndex(0)
>>> cell = sheet.getCellByPosition(0,0)
>>> cell.setString("Hello, World!")
One more thought: Have you considered using straight PyUNO starting from import uno instead of a wrapper library? That would avoid dependency on some of the extra libraries which may be causing the problem. Also there is better documentation for straight PyUNO.

pickle a zipfile.ZipFile with python >= 3.6

I came across some code that no longer works in Python 3.6 but worked fine in all earlier versions. I found out that the problem is actually a field somewhere in a class that holds a ZipFile. Here is a short program that raises the error:
from pickle import dumps
import io
import zipfile
raw = b""
foo = zipfile.ZipFile(io.BytesIO(raw), mode="w")
dumps(foo)
I get this error:
Traceback (most recent call last):
File "bla.py", line 8, in <module>
dumps(foo)
TypeError: can't pickle _thread.RLock objects
So the test program can be even shorter:
from pickle import dumps
import threading
dumps(threading.RLock())
I diffed zipfile.py between Python 3.5 and 3.6 but cannot spot any difference with respect to the _lock field in ZipFile, so it seems there were changes in the threading module; however, threading.py shows no obvious changes between the versions either.
Why is it no longer picklable? Do I need to do something before I can pickle a ZipFile?
Edit: OK, after searching for a while, I stumbled across this Python bug tracker entry: https://bugs.python.org/msg284751
So the fact that a ZipFile is picklable in Python < 3.6 is actually the bug...
I think I need to change a lot of code now...

Just to give an answer to this question: that ZipFile objects are picklable is actually a bug (https://bugs.python.org/msg284751), which was fixed in Python 3.6.
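Since the old behaviour was the bug, classes that hold a ZipFile need to stop pickling it directly. One way to adapt them, sketched here with hypothetical names (ZipHolder, make_archive), is to keep the archive's raw bytes as the picklable state, reopen the ZipFile lazily, and drop it in __getstate__.

```python
import io
import pickle
import zipfile


class ZipHolder:
    """Hypothetical class that needs a ZipFile but must stay picklable."""

    def __init__(self, data):
        self.data = data   # raw bytes of a complete zip archive (picklable)
        self._zip = None   # ZipFile opened on demand; never pickled

    @property
    def zip(self):
        if self._zip is None:
            self._zip = zipfile.ZipFile(io.BytesIO(self.data))
        return self._zip

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_zip"] = None  # ZipFile holds an RLock, so leave it out
        return state


def make_archive(files):
    """Build zip bytes from a {name: text} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)
    return buf.getvalue()


holder = ZipHolder(make_archive({"hello.txt": "hi"}))
holder.zip.namelist()                       # opens the archive
clone = pickle.loads(pickle.dumps(holder))  # only the bytes are pickled
```

This fits archives that are already complete and opened for reading. A ZipFile still being written in "w" mode would have to be closed first, because the central directory is only written on close.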

Import parent directory for brief tests

I have searched this site top to bottom, yet I have not found a single way to actually accomplish what I want in Python 3.x. This is a simple toy app, so I figured I could write some simple test cases with asserts and call it a day. It does generate reports and such, so I would like to make sure my code doesn't do anything wonky when I change it.
My current directory structure is (only relevant parts included):
project
-model
  __init__.py
  my_file.py
-test
  my_file_test.py
I am having a hell of a time getting my_file_test.py to import my_file.py.
Like I said, I've searched this site top to bottom and no solution has worked. My version of Python is 3.2.3, running on Fedora 17.
Previously attempted solutions:
https://stackoverflow.com/questions/5078590/dynamic-imports-relative-imports-in-python-3
Importing modules from parent folder
Can anyone explain python's relative imports?
How to accomplish relative import in python
In virtually every attempt I get an error to the effect of:
ImportError: No module named *
OR
ValueError: Attempted relative import in non-package
What is going on here? I have tried every accepted answer on SO as well as all over the interwebs. I'm not doing anything that fancy here, but as a .NET/Java/Ruby programmer I find this the absolute definition of intuitiveness.
EDIT: If it matters, I tried loading the class I am trying to import in the REPL and got the following:
>>> import datafileclass
>>> datafileclass.methods
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
>>> x = datafileclass('sample_data/sample_input.csv')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'module' object is not callable
If it matters... I know the functionality in the class works, but I can't import it, which right now prevents me from testing and will certainly cause integration issues later. (Names changed to protect the innocent.)
I'm within a couple of weeks of the desired functionality for this iteration of the library... any help would be useful. I would have done it in Ruby, but the client wants Python as a learning experience.
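The TypeError in the REPL session is a separate issue from the import path: import datafileclass binds the module, so datafileclass(...) calls the module rather than the class defined inside it. The standard library's datetime module, which contains a class of the same name, reproduces it exactly. In the question's case the class would be reached as datafileclass.<ClassName> or via from datafileclass import <ClassName>; the real class name is not given, so that placeholder stays a placeholder. The helper name below is hypothetical.

```python
import datetime
from datetime import datetime as DateTime  # import the class itself


def call_module_by_mistake():
    """Reproduce the error by calling the module object instead of the class inside it."""
    try:
        datetime(2020, 1, 1)  # `datetime` here is the module, like `datafileclass` above
    except TypeError as exc:
        return str(exc)
    return None


# Correct: reach the class through the module...
d1 = datetime.datetime(2020, 1, 1)
# ...or use the class imported directly.
d2 = DateTime(2020, 1, 1)
```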

Structure your code like this:
project
-model
  __init__.py
  my_file.py
-tests
  __init__.py
  test_my_file.py
Importantly, your tests directory should also be a package (that is, contain an empty __init__.py file).
Then in test_my_file.py use from model import my_file, and from the top-level directory run python -m tests.test_my_file. This invokes test_my_file as a module, which makes Python put the current (top-level) directory on its import path.
Even better, you can use pytest or nose, and running py.test will pick up the tests automatically.
I realise this doesn't answer your question, but it's going to be a lot easier for you to work with Python standard practices rather than against them. That means structuring your project with tests in their own top-level directory.
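To check the suggested layout end to end, the sketch below writes the tree into a temporary directory and runs python -m tests.test_my_file from the top-level directory, as the answer describes. The contents of my_file.py (a greet function) and the helper build_and_run are hypothetical placeholders; it assumes Python 3.5+ for subprocess.run, which is newer than the 3.2.3 in the question.

```python
import os
import subprocess
import sys

# Hypothetical file contents: my_file.greet() stands in for the real code under test.
FILES = {
    "model/__init__.py": "",
    "model/my_file.py": "def greet(name):\n    return 'hello ' + name\n",
    "tests/__init__.py": "",
    "tests/test_my_file.py": (
        "from model import my_file\n"
        "assert my_file.greet('x') == 'hello x'\n"
        "print('ok')\n"
    ),
}


def build_and_run(root):
    """Create the project tree under `root`, then run the test module with -m."""
    for rel, text in FILES.items():
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(text)
    # Run from the top-level directory so `model` is importable.
    result = subprocess.run(
        [sys.executable, "-m", "tests.test_my_file"],
        cwd=root,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stdout.strip()
```

Running python tests/test_my_file.py directly instead would put tests/ rather than the project root on the path, which is one way to end up with the "No module named" error quoted in the question.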

Robobrowser and local files

I am a beginner using Python 3.6.4 and RoboBrowser 0.5.3.
I have saved an HTML web page and I am trying to extract information from it.
Most likely incorrectly, I took inspiration from a similar question about BeautifulSoup. The BeautifulSoup solution works for me (BeautifulSoup 4.6.0).
In contrast, the following, based on roboBrowser, does not seem to work:
from robobrowser import RoboBrowser
br = RoboBrowser(parser='html.parser')
br.open(open("my_file.html"))
with the error:
MissingSchema: Invalid URL "<_io.TextIOWrapper name='my_file.html' mode='r' encoding='UTF-8'>": No schema supplied. Perhaps you meant http://<_io.TextIOWrapper name='my_file.html' mode='r' encoding='UTF-8'>?
I understand that the code expected a "http"-based url. I tried prepending "file://" to the absolute path of my file, to no avail.
Is there any way to tell the library that it is a local file, or is such functionality simply not part of RoboBrowser?
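MissingSchema is an exception from the requests library, which suggests RoboBrowser passes whatever open() receives straight to requests as a URL. requests has no built-in handler for file:// URLs, which would explain why prepending file:// did not help. One workaround, sketched here under the assumption that a local HTTP round-trip is acceptable, is to serve the folder over HTTP and open an http:// URL instead. The helper serve_directory is a hypothetical name. It uses only the standard library, but it needs Python 3.7+ (for ThreadingHTTPServer and the directory argument), which is newer than the 3.6.4 mentioned above.

```python
import functools
import http.server
import threading


class _QuietHandler(http.server.SimpleHTTPRequestHandler):
    def log_message(self, format, *args):  # silence per-request logging
        pass


def serve_directory(directory, port=0):
    """Serve `directory` on localhost in a background thread; return (server, base_url)."""
    handler = functools.partial(_QuietHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)  # port 0: pick a free one
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    host, real_port = server.server_address[:2]
    return server, "http://%s:%d/" % (host, real_port)
```

With the server running, br = RoboBrowser(parser='html.parser') followed by br.open(base_url + 'my_file.html') should then go through requests like any other page. Call server.shutdown() when done.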
