Learning Python extracting data from a website

I am trying to write a script that pulls metric data from an internal website that exports to Excel. The data then gets broken into smaller pieces and emailed to technicians. I am trying to log in to the website using robobrowser, but I keep getting this:
C:\Users\user\AppData\Local\Programs\Python\Python36-32\Aging.py
Traceback (most recent call last):
File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\Aging.py", line 3, in <module>
from robobrowser import RoboBrowser
File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\lib\site-packages\robobrowser-0.5.3-py3.6.egg\robobrowser\__init__.py", line 3, in <module>
from .browser import RoboBrowser
File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\lib\site-packages\robobrowser-0.5.3-py3.6.egg\robobrowser\browser.py", line 7, in <module>
from bs4 import BeautifulSoup
File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\lib\site-packages\bs4\__init__.py", line 30, in <module>
from .builder import builder_registry, ParserRejectedMarkup
File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\lib\site-packages\bs4\builder\__init__.py", line 308, in <module>
from . import _htmlparser
File "C:\Users\user\AppData\Local\Programs\Python\Python36-32\lib\site-packages\bs4\builder\_htmlparser.py", line 7, in <module>
from html.parser import (
ImportError: cannot import name 'HTMLParseError'
Here is the code:
import webbrowser
import re
from robobrowser import RoboBrowser
# Set up the browser
br = RoboBrowser()
# Open a website
br.open("https://www.whatever.com")
form = br.get_form()
form['username'] = "username"
form['password'] = "password"
br.submit_form(form)
Any help would be appreciated.

You should try upgrading BeautifulSoup (and probably RoboBrowser, to be safe). What's happening is that when you import robobrowser, RoboBrowser imports BeautifulSoup (bs4), which imports _htmlparser (a module that is part of the BeautifulSoup package), which in turn tries to import HTMLParseError from the standard library's html.parser. That name was deprecated in Python 3.3 and removed in Python 3.5, so the import fails on your Python 3.6.
This means the installed BeautifulSoup is an out-of-date version that predates that removal. If you upgrade it, for example with pip install --upgrade beautifulsoup4, it should fix the problem.
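To confirm that the interpreter, rather than a corrupted file, is the reason the import fails, a small check along these lines may help. It is only a sketch using the standard library; the helper name has_html_parse_error is made up for illustration.

```python
import html.parser
import sys


def has_html_parse_error():
    """Return True if this interpreter's html.parser still exposes HTMLParseError."""
    return hasattr(html.parser, "HTMLParseError")


version = ".".join(str(part) for part in sys.version_info[:3])
if has_html_parse_error():
    print("Python", version, "still provides html.parser.HTMLParseError")
else:
    # Any library that does "from html.parser import HTMLParseError" will
    # raise ImportError on this interpreter and needs to be upgraded.
    print("Python", version, "has no html.parser.HTMLParseError")
```

If the second message is printed, any installed package that still imports that name has to be upgraded before it can load.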

Related

Django unable to get models module

I am learning Django and am not sure what is causing the error ModuleNotFoundError: No module named 'models', even though the files are in the same folder.
The files are available here.
File "/Users/mayanksharma/Library/Mobile Documents/com~apple~CloudDocs/for Stackoverflow/app-7-django/blogs/urls.py", line 1, in <module>
from . import views
File "/Users/mayanksharma/Library/Mobile Documents/com~apple~CloudDocs/for Stackoverflow/app-7-django/blogs/views.py", line 2, in <module>
from models import Post
ModuleNotFoundError: No module named 'models'

Did you mean this?
from .models import Post
Alternatively, replace from . import views with:
import sys
sys.path.append('..')
import views

This had something to do with how Python resolves the import path. Instead of from .models import Post or from . import views, I used from app.models import Post, and that made it work.
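The difference between the failing and the working import can be reproduced outside Django. The sketch below builds a throwaway package in a temporary directory and imports two modules from it. The names demo_blogs, post_models, views and broken_views are hypothetical stand-ins, not the asker's actual project layout.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root):
    """Create a tiny package with one explicit-relative and one bare import."""
    pkg = Path(root) / "demo_blogs"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "post_models.py").write_text("class Post:\n    pass\n")
    # Explicit relative import: resolved inside the package.
    (pkg / "views.py").write_text("from .post_models import Post\n")
    # Bare import: Python 3 looks for a top-level module on sys.path.
    (pkg / "broken_views.py").write_text("from post_models import Post\n")


def try_import(name):
    """Return 'ok' or the ModuleNotFoundError message for a module name."""
    try:
        importlib.import_module(name)
        return "ok"
    except ModuleNotFoundError as exc:
        return str(exc)


with tempfile.TemporaryDirectory() as root:
    build_demo_package(root)
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        relative_result = try_import("demo_blogs.views")
        bare_result = try_import("demo_blogs.broken_views")
    finally:
        sys.path.remove(root)

print("relative import:", relative_result)
print("bare import:", bare_result)
```

The bare form only works when the package directory itself is on sys.path, which is why it can appear to work when a file is run directly but fail when Django imports it as part of a package.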

asyncio import issues - no attribute 'StreamReader'

I have had asyncio and websockets work fine several times, but for some reason it sometimes refuses to run and will refuse to ever run again. I have had this happen across multiple devices, with code as simple as just imports:
import asyncio
import json
import websockets
Interestingly, when using Pydroid3 on Android, any code I write with asyncio works fine, but only until I save it to a file. Once it's been saved, it stops working. I can copy all the text and paste it to a new, unsaved file and it again works fine until saved. This awful solution does not work for Windows, unfortunately. I am using Python 3.9.0 for Windows. The stacktrace produced by running the code shown above is as follows:
Traceback (most recent call last):
File "C:\Users\user\Documents\AtomTests\socket.py", line 1, in <module>
import asyncio
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\asyncio\__init__.py", line 8, in <module>
from .base_events import *
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\asyncio\base_events.py", line 23, in <module>
import socket
File "C:\Users\user\Documents\AtomTests\socket.py", line 3, in <module>
import websockets
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\__init__.py", line 3, in <module>
from .auth import * # noqa
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\auth.py", line 12, in <module>
from .exceptions import InvalidHeader
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\exceptions.py", line 33, in <module>
from .http import Headers, HeadersLike
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\websockets\http.py", line 70, in <module>
async def read_request(stream: asyncio.StreamReader) -> Tuple[str, "Headers"]:
AttributeError: partially initialized module 'asyncio' has no attribute 'StreamReader' (most likely due to a circular import)
[Finished in 0.158s]
I've searched a bit for this error, but either it's uncommon or I'm just blind, because I couldn't find anything. Has anyone else had this happen to them?

Your local socket.py file is shadowing Python’s socket module. Rename your file and your imports will work.
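One way to spot this kind of shadowing is to ask where Python would load a module from and compare that with the standard library directory. The following is a rough sketch, not a complete diagnostic; the helper name is_shadowed is made up for illustration.

```python
import importlib.util
import sysconfig
from pathlib import Path


def is_shadowed(module_name):
    """Return True if module_name would be loaded from outside the stdlib directory."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin in (None, "built-in", "frozen"):
        # Not found, or compiled into the interpreter: no file can shadow it here.
        return False
    stdlib_dir = Path(sysconfig.get_paths()["stdlib"]).resolve()
    origin = Path(spec.origin).resolve()
    return stdlib_dir not in origin.parents


for name in ("socket", "asyncio", "json"):
    print(name, "shadowed:", is_shadowed(name))
```

Run from a directory that contains its own socket.py, this would be expected to report socket as shadowed, because the script's directory is searched before the standard library.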

Can't parse html using xml.etree.ElementTree

I am trying to parse the XML of google.com; however, I am getting a 'not well-formed' error. Why is this? Thanks
➜ testing cat code.py
from urllib.request import urlopen; from xml.etree.ElementTree import fromstring
fromstring(urlopen('https://www.google.com').read().replace(b'<!doctype html>',b'<!DOCTYPE html>'))
➜ testing python3 code.py
Traceback (most recent call last):
File "code.py", line 2, in <module>
fromstring(urlopen('https://www.google.com').read().replace(b'<!doctype html>',b'<!DOCTYPE html>'))
File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xml/etree/ElementTree.py", line 1315, in XML
parser.feed(text)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 1826
➜ testing

You are probably getting the error message because you are trying to parse HTML with an XML parser. HTML is usually not well-formed XML (unquoted attributes, unclosed tags), so the XML parser rejects it. Try a library with an HTML parser instead, such as lxml.html. I would also recommend fetching the page with requests. So together:
import requests
import lxml.html as lh
req = requests.get('https://www.google.com')
lh.fromstring(req.text)
and it should work.
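The same point can be illustrated without third-party packages or network access. In this sketch, SAMPLE_HTML is a made-up snippet (not Google's actual markup) and TagCollector is a hypothetical helper built on the standard library's html.parser.

```python
from html.parser import HTMLParser
from xml.etree.ElementTree import ParseError, fromstring

# Typical HTML: an unquoted attribute, an unclosed <p>, and a void <br>.
SAMPLE_HTML = (
    "<!DOCTYPE html><html><head><meta charset=utf-8></head>"
    "<body><p>Hello<br>world</body></html>"
)


class TagCollector(HTMLParser):
    """Record the name of every start tag the HTML parser encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def xml_parse_fails(markup):
    """Return True if the strict XML parser rejects the markup."""
    try:
        fromstring(markup)
    except ParseError:
        return True
    return False


collector = TagCollector()
collector.feed(SAMPLE_HTML)
print("XML parser rejects it:", xml_parse_fails(SAMPLE_HTML))
print("HTML parser saw tags:", collector.tags)
```

The XML parser stops at the first construct that is not well-formed, while the HTML parser is designed to tolerate it.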

Users in my group can't import boxsdk in python ( SyntaxError: invalid syntax )

When I import the necessary libraries from the Python Box SDK into my projects it works perfectly, but whenever another user in my group tries to use the same library, they get the following error:
Traceback (most recent call last):
File "/home/-------/.../---------.py", line 2, in <module>
from boxsdk import Client, OAuth2
File "/usr/local/lib/python2.7/dist-packages/boxsdk/__init__.py", line 5, in <module>
from .auth import JWTAuth, OAuth2
File "/usr/local/lib/python2.7/dist-packages/boxsdk/auth/__init__.py", line 8, in <module>
from .jwt_auth import JWTAuth
File "/usr/local/lib/python2.7/dist-packages/boxsdk/auth/jwt_auth.py", line 11, in <module>
import jwt
File "/usr/local/lib/python2.7/dist-packages/jwt/__init__.py", line 17, in <module>
from .jwk import (
File "/usr/local/lib/python2.7/dist-packages/jwt/jwk.py", line 60
def is_sign_key(self) -> bool:
^
SyntaxError: invalid syntax
This error occurs whether the user uses sudo or not as well as if they import the libraries with this:
from boxsdk import Client, OAuth2
or
import boxsdk
UPDATE: We are all using Python 2.7.12

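is_sign_key() uses a return type annotation (-> bool), which is Python 3 syntax. The traceback shows the jwt package being loaded from python2.7/dist-packages, and a Python 2 interpreter cannot parse annotations.
Compare the output of python --version for you and your colleagues, and check which interpreter the package was installed for. Older Python interpreters won't recognize type annotations.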
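A quick way to see whether a given interpreter accepts this syntax is to compile a one-line function that uses it. This is a minimal sketch; ANNOTATED_SOURCE and parses_on_this_interpreter are names made up for illustration, and the snippet is written so that it also runs under Python 2.7.

```python
import sys

# The same kind of signature that appears at jwk.py line 60 in the traceback.
ANNOTATED_SOURCE = "def is_sign_key(self) -> bool:\n    return True\n"


def parses_on_this_interpreter(source):
    """Return True if this interpreter can compile the given source text."""
    try:
        compile(source, "<annotated>", "exec")
    except SyntaxError:
        return False
    return True


print(sys.version.split()[0], parses_on_this_interpreter(ANNOTATED_SOURCE))
```

Running it with each user's python executable shows which interpreters can load a package written with annotations.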

Running neo4j-Python code in Eclipse with Pydev under ArchLinux

I installed neo4j on ArchLinux (AUR Link) and want to test it using Python 3.2 in Eclipse with Pydev.
I tried the following code from the neo4j website, although I think it was still Python 2.7 code, so I tried to convert it to Python 3.2.
Here's the code:
import os
libpath = '/usr/share/java/neo4j'
# Join full paths to the files in libpath using the platform's path separator
os.environ['CLASSPATH'] = os.pathsep.join(
    [os.path.join(libpath, p) for p in os.listdir(libpath)])
from neo4j import GraphDatabase
# Create a database
db = GraphDatabase('/home/USERNAME/.db/neo4j/HelloWorld')
# All write operations happen in a transaction
with db.transaction:
    firstNode = db.node(name='Hello')
    secondNode = db.node(name='world!')
    # Create a relationship with type 'knows'
    relationship = firstNode.knows(secondNode, name='graphy')
# Read operations can happen anywhere
message = ' '.join([firstNode['name'], relationship['name'], secondNode['name']])
print(message)
# Delete the data
with db.transaction:
    firstNode.knows.single.delete()
    firstNode.delete()
    secondNode.delete()
# Always shut down your database when your application exits
db.shutdown()
But I get following error message:
Traceback (most recent call last):
File "/home/USERNAME/PATH/TO/src/neo4j-HelloWorld.py", line 12, in <module>
from neo4j import GraphDatabase
File "/usr/lib/python3.2/site-packages/neo4j_embedded-1.6-py3.2.egg/neo4j/__init__.py", line 29, in <module>
from neo4j.core import GraphDatabase, Direction, NotFoundException, BOTH, ANY, INCOMING, OUTGOING
File "/usr/lib/python3.2/site-packages/neo4j_embedded-1.6-py3.2.egg/neo4j/core.py", line 19, in <module>
from _backend import *
ImportError: No module named _backend
I just can't figure out what's wrong!
I tried to set the CLASSPATH as described here, but it doesn't change anything.
I would really appreciate any help!

Did you run the code through 2to3?
If not, I suggest you do.
I think the problem is that implicit relative imports were removed in Python 3.x; see PEP 328 for details.
For example, the offending import in core.py should probably say from ._backend import *
