Can't parse html using xml.etree.ElementTree - python-3.x

I am trying to parse the XML of google.com; however, I am getting a 'not well-formed' error. Why is this? Thanks
➜ testing cat code.py
from urllib.request import urlopen; from xml.etree.ElementTree import fromstring
fromstring(urlopen('https://www.google.com').read().replace(b'<!doctype html>',b'<!DOCTYPE html>'))
➜ testing python3 code.py
Traceback (most recent call last):
File "code.py", line 2, in <module>
fromstring(urlopen('https://www.google.com').read().replace(b'<!doctype html>',b'<!DOCTYPE html>'))
File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xml/etree/ElementTree.py", line 1315, in XML
parser.feed(text)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 1826
➜ testing

You are getting the error because you are trying to parse HTML with an XML parser. HTML is usually not well-formed XML (void tags such as <br> are left unclosed and attribute values may be unquoted), so xml.etree.ElementTree rejects it. Use a library with an HTML parser instead. I would also recommend fetching the page with requests. Together:
import requests
import lxml.html as lh
req = requests.get('https://www.google.com')
lh.fromstring(req.text)
and it should work.
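To see the difference without a network call or third-party packages, here is a minimal sketch using only the standard library. The sample markup, parses_as_xml, TagCollector and collect_tags are illustrative names, not part of the question or of lxml; the sample merely imitates the kind of HTML (an unclosed <br>, an unquoted attribute) that makes a strict XML parser fail.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: valid HTML, but not well-formed XML
# (unclosed <br>, unquoted attribute value).
HTML_SAMPLE = (
    "<!DOCTYPE html><html><body>"
    "<p>Hello<br>world <a href=x>link</a></p>"
    "</body></html>"
)


def parses_as_xml(markup):
    """Return True if the strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Collect the names of all start tags seen by the lenient HTML parser."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def collect_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


print(parses_as_xml(HTML_SAMPLE))  # the XML parser rejects the sample
print(collect_tags(HTML_SAMPLE))   # the HTML parser walks it anyway
```

html.parser only emits events; for a real document tree, an HTML-aware library such as lxml.html (as in the answer) is the more convenient choice.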

Related

asyncio import issues - no attribute 'StreamReader'

I have had asyncio and websockets work fine several times, but for some reason it sometimes refuses to run and will refuse to ever run again. I have had this happen across multiple devices, with code as simple as just imports:
import asyncio
import json
import websockets
Interestingly, when using Pydroid3 on Android, any code I write with asyncio works fine, but only until I save it to a file. Once it's been saved, it stops working. I can copy all the text and paste it to a new, unsaved file and it again works fine until saved. This awful solution does not work for Windows, unfortunately. I am using Python 3.9.0 for Windows. The stacktrace produced by running the code shown above is as follows:
Traceback (most recent call last):
File "C:\Users\user\Documents\AtomTests\socket.py", line 1, in <module>
import asyncio
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\asyncio\__init__.py", line 8, in <module>
from .base_events import *
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 23, in <module>
import socket
File "C:\Users\user\Documents\AtomTests\socket.py", line 3, in <module>
import websockets
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\__init__.py", line 3, in <module>
from .auth import * # noqa
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\auth.py", line 12, in <module>
from .exceptions import InvalidHeader
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\exceptions.py", line 33, in <module>
from .http import Headers, HeadersLike
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\http.py", line 70, in <module>
async def read_request(stream: asyncio.StreamReader) -> Tuple[str, "Headers"]:
AttributeError: partially initialized module 'asyncio' has no attribute 'StreamReader' (most likely due to a circular import)
[Finished in 0.158s]
I've searched a bit for this error, but either it's uncommon or I'm just blind, because I couldn't find anything. Has anyone else had this happen to them?
Your local socket.py file is shadowing Python’s socket module. Rename your file and your imports will work.
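The traceback shows the problem directly: asyncio's own `import socket` resolves to C:\Users\user\Documents\AtomTests\socket.py instead of the standard library. As a rough sketch for spotting this kind of collision, the helper below lists .py files in a directory whose names match standard-library modules. The name find_shadowing_files is hypothetical, sys.stdlib_module_names only exists on Python 3.10 and later, and the small fallback set used on older versions is an arbitrary assumption, not a complete list.

```python
import sys
from pathlib import Path

# Assumption: on Python < 3.10 there is no sys.stdlib_module_names,
# so fall back to a small, incomplete hand-picked set.
FALLBACK_NAMES = frozenset({"socket", "asyncio", "json", "random", "email"})


def find_shadowing_files(directory):
    """Return names of .py files in `directory` that share a name with a stdlib module."""
    stdlib_names = getattr(sys, "stdlib_module_names", FALLBACK_NAMES)
    return sorted(
        path.name
        for path in Path(directory).glob("*.py")
        if path.stem in stdlib_names
    )


# Check the current working directory for colliding file names.
print(find_shadowing_files("."))
```

Printing `socket.__file__` after `import socket` is another quick check: if the path points into your project folder rather than the Python installation, the local file is the one being imported.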

ImportError: cannot import name 'entrenamiento'

I'm picking Python up again after a long time.
I'm developing a little program to help me learn a new language (Japanese).
I tried to make a class and import it, but it didn't work. Just for testing, I created a very simple class, and when I tried to import it I got an error.
Here is the code (both files, trainer.py and prueba.py, are in the same folder):
file trainer.py
class trainer:
    def entrenamiento(t,dicc):
        print(t)
        print(dicc)
file prueba.py
from trainer import entrenamiento
entrenamiento(1,2)
When I run prueba.py I get the following:
C:\Users\nico\AppData\Local\Programs\Python\Python36-32\python.exe C:/Users/nico/PycharmProjects/japanese/prueba.py
Traceback (most recent call last):
File "C:/Users/nico/PycharmProjects/japanese/prueba.py", line 1, in <module>
from trainer import entrenamiento
ImportError: cannot import name 'entrenamiento'
Process finished with exit code 1
I also tried with a different code in prueba.py:
import trainer
trainer.entrenamiento(1,2)
and I got this:
C:\Users\nico\AppData\Local\Programs\Python\Python36-32\python.exe
C:/Users/nico/PycharmProjects/japanese/prueba.py
Traceback (most recent call last):
File "C:/Users/nico/PycharmProjects/japanese/prueba.py", line 3, in <module>
trainer.entrenamiento(1,2)
AttributeError: module 'trainer' has no attribute 'entrenamiento'
Process finished with exit code 1
Finally, just for checking I tried the following
file trainer.py
class trainer:
    print('hello world')
file prueba.py
import trainer
and I got no error
C:\Users\nico\AppData\Local\Programs\Python\Python36-32\python.exe
C:/Users/nico/PycharmProjects/japanese/prueba.py
hello world
Process finished with exit code 0
I'm working with Python 3.6.5 and PyCharm 2018.1.4 Community Edition
Is there any mistake in my coding or maybe a configuration issue?
I thank you in advance for your help
You made a small mistake with the import in prueba.py.
It should be:
from trainer import trainer
trainer.entrenamiento(1,2)
where the first trainer refers to the module trainer.py and the second refers to the class.
You can now access the function defined inside the class using the syntax class.function, e.g. trainer.entrenamiento(1,2).
I would recommend renaming either trainer.py or the class trainer, since sharing one name is confusing.
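One possible variation, sketched here in a single file for brevity: mark the function as a static method so that it can be called on the class and on instances alike, or expose a module-level function so that `from trainer import entrenamiento` works as the question originally intended. The capitalised class name Trainer, the return value and the module-level wrapper are assumptions added for illustration; they are not in the original code.

```python
class Trainer:
    """Hypothetical renamed class (capitalised to avoid clashing with the module name)."""

    @staticmethod
    def entrenamiento(t, dicc):
        # Same behaviour as the original, plus a return value for easy checking.
        print(t)
        print(dicc)
        return t, dicc


def entrenamiento(t, dicc):
    """Module-level wrapper, so `from trainer import entrenamiento` would succeed."""
    return Trainer.entrenamiento(t, dicc)


Trainer.entrenamiento(1, 2)    # call on the class
Trainer().entrenamiento(1, 2)  # call on an instance, works thanks to @staticmethod
entrenamiento(1, 2)            # call the module-level function
```

Without @staticmethod, the call on an instance would raise a TypeError, because Python would pass the instance as an extra first argument.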

Learning Python extracting data from a website

I am trying to write a script to get data from an internal website that exports to Excel. That data (metric data) gets broken into smaller pieces and emailed to technicians. I am trying to log in to the website using robobrowser, but I keep getting this:
C:\Users\user\AppData\Local\Programs\Python\Python36-32\Aging.py
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\Aging.py", line 3, in <module>
from robobrowser import RoboBrowser
File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\lib\site-packages\robobrowser-0.5.3-py3.6.egg\robobrowser\__init__.py", line 3, in <module>
from .browser import RoboBrowser
File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\lib\site-packages\robobrowser-0.5.3-py3.6.egg\robobrowser\browser.py", line 7, in <module>
from bs4 import BeautifulSoup
File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\lib\site-packages\bs4\__init__.py", line 30, in <module>
from .builder import builder_registry, ParserRejectedMarkup
File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\lib\site-packages\bs4\builder\__init__.py", line 308, in <module>
from . import _htmlparser
File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\lib\site-packages\bs4\builder\_htmlparser.py", line 7, in <module>
from html.parser import (
ImportError: cannot import name 'HTMLParseError'
Here is the code:
import webbrowser
import re
from robobrowser import RoboBrowser
#Set BR module
br = RoboBrowser()
#open a website
br.open("https://www.whatever.com")
form = br.get_form()
form ['username'] = "username"
form ['password'] = "password"
br.submit_form(form)
Any help would be appreciated.
You should try upgrading BeautifulSoup (and RoboBrowser, to be safe). What's happening is that when you import robobrowser, RoboBrowser imports BeautifulSoup (bs4), which imports its own _htmlparser module, which in turn tries to import HTMLParseError from the standard library's html.parser. The traceback shows that _htmlparser.py itself was found; the import that fails is the one for HTMLParseError.
HTMLParseError was removed from html.parser in Python 3.5, and you are running Python 3.6, so the most likely cause is an out-of-date BeautifulSoup release that still expects it. Installing a current version (for example with pip install --upgrade beautifulsoup4) should fix the problem.
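A quick way to check both sides of this is sketched below, assuming Python 3.8 or later for importlib.metadata. The helper names has_html_parse_error and installed_version are illustrative only. The first tests whether the running interpreter still provides html.parser.HTMLParseError (it was removed in Python 3.5), the second reports which beautifulsoup4 release is installed, if any.

```python
import html.parser
import sys
from importlib import metadata  # available on Python 3.8+


def has_html_parse_error():
    """True only on old interpreters that still ship html.parser.HTMLParseError."""
    return hasattr(html.parser, "HTMLParseError")


def installed_version(distribution="beautifulsoup4"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


print("Python:", sys.version_info[:2])
print("html.parser has HTMLParseError:", has_html_parse_error())
print("beautifulsoup4 version:", installed_version())
```

If the interpreter lacks HTMLParseError and the reported beautifulsoup4 version is an old one, upgrading the package is the likely fix.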

Users in my group can't import boxsdk in python ( SyntaxError: invalid syntax )

When I import the necessary libraries from the Python Box SDK into my projects, it works perfectly, but whenever another user in my group tries to use the same library, they get the following error:
Traceback (most recent call last):
File "/home/-------/.../---------.py", line 2, in <module>
from boxsdk import Client, OAuth2
File "/usr/local/lib/python2.7/dist-packages/boxsdk/__init__.py", line 5, in <module>
from .auth import JWTAuth, OAuth2
File "/usr/local/lib/python2.7/dist-packages/boxsdk/auth/__init__.py", line 8, in <module>
from .jwt_auth import JWTAuth
File "/usr/local/lib/python2.7/dist-packages/boxsdk/auth/jwt_auth.py", line 11, in <module>
import jwt
File "/usr/local/lib/python2.7/dist-packages/jwt/__init__.py", line 17, in <module>
from .jwk import (
File "/usr/local/lib/python2.7/dist-packages/jwt/jwk.py", line 60
def is_sign_key(self) -> bool:
^
SyntaxError: invalid syntax
This error occurs whether the user uses sudo or not as well as if they import the libraries with this:
from boxsdk import Client, OAuth2
or
import boxsdk
***************** UPDATE
We are all using Python 2.7.12
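is_sign_key() uses a type annotation (the -> bool return annotation), which is Python 3 syntax.
Compare the output of python --version for you and your colleagues. Python 2 interpreters, including 2.7.12, cannot parse function annotations, so the jwt package installed under /usr/local/lib/python2.7 appears to be a release written for Python 3.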
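As a small illustration of the version dependence, the sketch below checks an interpreter version against the point where function annotations became valid syntax (Python 3.0) and confirms that the offending line parses on the interpreter running the check. The function names are hypothetical, and the source string is reduced to the one signature from the traceback with a placeholder body.

```python
import ast
import sys

# The signature from the traceback, with a placeholder body added.
ANNOTATED_SOURCE = "def is_sign_key(self) -> bool:\n    return True\n"


def interpreter_supports_annotations(version_info=sys.version_info):
    """Function annotations are valid syntax from Python 3.0 onwards."""
    return tuple(version_info[:2]) >= (3, 0)


def source_parses(source):
    """Return True if the running interpreter can parse the given source."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(interpreter_supports_annotations((2, 7, 12)))  # the version from the update
print(interpreter_supports_annotations())            # the interpreter running this
print(source_parses(ANNOTATED_SOURCE))
```

Running `python --version` (or printing sys.version_info) under each user's account shows which interpreter their scripts actually use.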

Running neo4j-Python code in Eclipse with Pydev under ArchLinux

I installed neo4j on Arch Linux (AUR link) and want to test it with Python 3.2.
I am using Python 3.2 and Eclipse with PyDev.
I tried the following code from the neo4j website, although I think it was still Python 2.7 code, so I tried to convert it to Python 3.2.
Here's the code:
import os
libpath = '/usr/share/java/neo4j'
os.environ['CLASSPATH'] = ';'.join( [ os.path.abspath(p) for p in
    os.listdir(libpath)])
from neo4j import GraphDatabase
# Create a database
db = GraphDatabase('/home/USERNAME/.db/neo4j/HelloWorld')
# All write operations happen in a transaction
with db.transaction:
    firstNode = db.node(name='Hello')
    secondNode = db.node(name='world!')
    # Create a relationship with type 'knows'
    relationship = firstNode.knows(secondNode, name='graphy')
# Read operations can happen anywhere
message = ' '.join([firstNode['name'], relationship['name'], secondNode['name']])
print(message)
# Delete the data
with db.transaction:
    firstNode.knows.single.delete()
    firstNode.delete()
    secondNode.delete()
# Always shut down your database when your application exits
db.shutdown()
But I get following error message:
Traceback (most recent call last):
File "/home/USERNAME/PATH/TO/src/neo4j-HelloWorld.py", line 12, in <module>
from neo4j import GraphDatabase
File "/usr/lib/python3.2/site-packages/neo4j_embedded-1.6-py3.2.egg/neo4j/__init__.py", line 29, in <module>
from neo4j.core import GraphDatabase, Direction, NotFoundException, BOTH, ANY, INCOMING, OUTGOING
File "/usr/lib/python3.2/site-packages/neo4j_embedded-1.6-py3.2.egg/neo4j/core.py", line 19, in <module>
from _backend import *
ImportError: No module named _backend
I just can't figure out what's wrong!
I tried to set the CLASSPATH as described here, but it doesn't change anything.
I would really appreciate any help!
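Did you run the neo4j package code through 2to3?
If not, I suggest you do.
I think the problem is that the import rules changed in 3.x: implicit relative imports were removed, so a module inside a package must import its siblings explicitly. See PEP 328 for details.
For example, the offending import in core.py should probably read from ._backend import *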
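The explicit relative import can be tried out in isolation. The sketch below builds a throwaway package in a temporary directory and imports it. The names demo_pkg, GREETING and build_demo_package are made up for the demonstration and have nothing to do with neo4j's real modules; only the file layout (a core.py next to a _backend.py) mirrors the traceback.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root):
    """Create demo_pkg/ with a core.py that imports its sibling _backend.py."""
    pkg = Path(root) / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "_backend.py").write_text("GREETING = 'hello'\n")
    # Explicit relative import: the leading dot means "from this package".
    # A bare `from _backend import *` would raise ImportError on Python 3.
    (pkg / "core.py").write_text("from ._backend import *\n")
    return pkg


with tempfile.TemporaryDirectory() as root:
    build_demo_package(root)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        core = importlib.import_module("demo_pkg.core")
        greeting = core.GREETING
    finally:
        sys.path.remove(root)

print(greeting)
```

Changing the generated line to `from _backend import *` reproduces the "No module named" error from the question on Python 3, because the interpreter then looks for a top-level _backend module rather than the sibling inside the package.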
