Python Sumy - no module named: sumy.parsers.html - python-3.5

I started using Sumy, a paragraph summariser for Python. When I run their sample code, it gives me this error:
from sumy.parsers.html import HtmlParser
ImportError: No module named sumy.parsers.html
I'm on the right version, so shouldn't it work?
This is the sample code I used from their website:
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals
from sumy.parsers.html import HtmlParser
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer as Summarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words
LANGUAGE = "czech"
SENTENCES_COUNT = 10
if __name__ == "__main__":
    url = "http://www.zsstritezuct.estranky.cz/clanky/predmety/cteni/jak-naucit-dite-spravne-cist.html"
    parser = HtmlParser.from_url(url, Tokenizer(LANGUAGE))
    # or for plain text files
    # parser = PlaintextParser.from_file("document.txt", Tokenizer(LANGUAGE))
    stemmer = Stemmer(LANGUAGE)
    summarizer = Summarizer(stemmer)
    summarizer.stop_words = get_stop_words(LANGUAGE)
    for sentence in summarizer(parser.document, SENTENCES_COUNT):
        print(sentence)

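It won't work if your own script is named 'sumy' (for example sumy.py), because Python then imports your file instead of the installed package. Check that first. Also check which Python version sumy is installed for, and run the script with that interpreter.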
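A name clash like this can be checked with a few lines of standard-library Python. This is only a minimal sketch: `find_shadowing_entries` is a hypothetical helper written for this answer (it is not part of Sumy), and it assumes the clash comes from a `sumy.py` file or a `sumy/` package folder sitting next to your script.

```python
import os
import sys


def find_shadowing_entries(package, directory):
    """Return paths in `directory` that Python would import instead of `package`."""
    candidates = [
        package + ".py",                        # a plain module, e.g. sumy.py
        os.path.join(package, "__init__.py"),   # a local package folder
    ]
    return [
        os.path.join(directory, entry)
        for entry in candidates
        if os.path.exists(os.path.join(directory, entry))
    ]


# sys.path[0] is the folder of the script being run (empty in an interactive session).
script_dir = sys.path[0] or os.getcwd()
print("Interpreter: %s" % sys.executable)
for clash in find_shadowing_entries("sumy", script_dir):
    print("Possible name clash: %s" % clash)
```

If a path is printed, renaming that file (and deleting any stale `sumy.pyc` next to it) is the first thing to try. The interpreter line shows which Python is actually running the script, which should match the one sumy was installed for.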

Related

Importing sklearn module in pyscript

How do I import modules of the form
"from sklearn.tree import DecisionTreeRegressor" in PyScript?
Importing modules works as follows:
Include the relevant package in the environment:
<py-env>
- scikit-learn
</py-env>
Then import the module as you would in any other Python file:
<py-script>
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier()
...
</py-script>

Is there a way to use the ffmpeg binary/unix executable in a py2app application to run ffmpeg on computers without it installed?

I wrote a small Python script that is essentially just a text-to-speech tool. It uses pydub's AudioSegment to convert the mp3 from gTTS to an ogg file that can be played in pygame. My GitHub repository can be found here: https://github.com/AnupPlays/TTS
This is the main function:
def webscrape():
    global x
    global b
    b.state(['disabled'])
    src = "sound.mp3"
    dst = "sound.ogg"
    murl = str(url.get())
    response = requests.get(murl)
    response.raise_for_status()
    parse = bs4.BeautifulSoup(response.text, 'html.parser')
    x = str(parse.get_text())
    print(x)
    text = gTTS(x)
    text.save("sound.mp3")
    AudioSegment.from_mp3(src).export(dst, format='ogg')
    b.state(['!disabled'])
This is a list of my imports:
#Imports
import os
import sys
import pygame
#google text to speech
from gtts import gTTS
#requests and BeautifulSoup
import requests
import bs4
#pygame audio player
from pygame import mixer
#tkinter ui
from tkinter import *
from tkinter import ttk
from tkinter import filedialog
from tkinter import messagebox
#mp3 -> ogg
from os import path
from pydub import AudioSegment
For anyone wondering: using Homebrew, you can install the dependencies for this (ffmpeg) and then copy those dependencies into your packaged app.
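Once a copy of the binary is inside the app bundle, pydub can be pointed at it through its `AudioSegment.converter` attribute. The following is only a sketch under stated assumptions: it assumes the executable was added to the app's resources under the name `ffmpeg`, and that the packaged app can find its resources folder through the `RESOURCEPATH` environment variable. `find_ffmpeg` is a hypothetical helper, not part of pydub or py2app.

```python
import os
import shutil


def find_ffmpeg(environ=None):
    """Return a path to ffmpeg, preferring a copy bundled with the app."""
    environ = os.environ if environ is None else environ
    resource_dir = environ.get("RESOURCEPATH")  # assumed to be set inside the bundle
    if resource_dir:
        candidate = os.path.join(resource_dir, "ffmpeg")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Fall back to whatever ffmpeg is on PATH (e.g. during development).
    return shutil.which("ffmpeg")


try:
    from pydub import AudioSegment
except ImportError:  # pydub not installed; nothing to configure
    AudioSegment = None

ffmpeg_path = find_ffmpeg()
if AudioSegment is not None and ffmpeg_path:
    # Tell pydub which executable to use for conversions.
    AudioSegment.converter = ffmpeg_path
```

Run this once at startup, before the first `AudioSegment.from_mp3(...)` call, so the conversion in `webscrape()` uses the bundled binary on machines without ffmpeg installed.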

Remove punctuation and stop words from a data frame

My data frame looks like this:
State           text
Delhi           170 kw for330wp, shipping and billing in delhi...
Gujarat         4kw rooftop setup for home Photovoltaic Solar...
Karnataka       language barrier no requirements 1kw rooftop ...
Madhya Pradesh  Business PartnerDisqualified Mailed questionna...
Maharashtra     Rupdaypur, panskura(r.s) Purba Medinipur 150kw...
I want to remove punctuation and stop words from this data frame. I wrote the following code, but it's not working:
import nltk
nltk.download('stopwords')
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import string
import collections
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import matplotlib.cm as cm
import matplotlib.pyplot as plt
% matplotlib inline
import nltk
from nltk.corpus import stopwords
import string
from sklearn.feature_extraction.text import CountVectorizer
import re
def message_cleaning(message):
    Test_punc_removed = [char for char in message if char not in string.punctuation]
    Test_punc_removed_join = ''.join(Test_punc_removed)
    Test_punc_removed_join_clean = [word for word in Test_punc_removed_join.split() if word.lower() not in stopwords.words('english')]
    return Test_punc_removed_join_clean

df['text'] = df['text'].apply(message_cleaning)
AttributeError: 'set' object has no attribute 'words'
Problem: I believe you have a name conflict for stopwords. There is probably a line somewhere in your notebook where you assign:
stopwords = stopwords.words("english")
That would explain the issue: after such a line, the name stopwords refers to your variable and no longer to the NLTK corpus module, so stopwords.words(...) fails. Since the error mentions a set, the assignment was probably wrapped in set(...).
Solution: Make things unambiguous.
First, assign the stop words to a separately named variable (that is also faster than calling stopwords.words() every time):
from nltk.corpus import stopwords
english_stop_words = set(stopwords.words("english"))
Then use that in your function:
Test_punc_removed_join_clean = [
    word for word in Test_punc_removed_join.split()
    if word.lower() not in english_stop_words
]
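Put together, the cleaning step might look like the sketch below. To keep it runnable without downloading the NLTK corpus, it uses a tiny hard-coded stop word set as a stand-in; that set and the `stop_words` parameter are assumptions of this example, and in the notebook you would pass `set(stopwords.words("english"))` instead.

```python
import string

# Stand-in for set(stopwords.words("english")); deliberately tiny.
ENGLISH_STOP_WORDS = frozenset({"a", "an", "and", "for", "in", "of", "the", "to"})

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def message_cleaning(message, stop_words=ENGLISH_STOP_WORDS):
    """Strip punctuation, then drop stop words; returns a list of tokens."""
    no_punct = message.translate(_PUNCT_TABLE)
    return [word for word in no_punct.split() if word.lower() not in stop_words]


print(message_cleaning("170 kw for330wp, shipping and billing in delhi"))
```

Because the stop words live under their own name, nothing in the notebook can accidentally overwrite the imported `stopwords` module, and the set lookup is done once per word rather than rebuilding the list on every call. With a data frame, the call stays `df['text'] = df['text'].apply(message_cleaning)`.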

picamera on raspberry pi zero module not installed

SUMMARY:
Unable to run a time-lapse Python 3 script due to a module not being installed. I am running Raspbian Lite on a Raspberry Pi Zero W.
THINGS I'VE TRIED:
I've tried installing the picamera module for Python. I also googled the error and came across https://raspberrypi.stackexchange.com/questions/88339/importerror-no-module-named-picamera
Here is a list of installed modules. I can't see picamera in it:
help('modules')
Please wait a moment while I gather a list of all available modules...
BaseHTTPServer aifc httplib sets
Bastion antigravity ihooks sgmllib
CDROM anydbm imageop sha
CGIHTTPServer argparse imaplib shelve
Canvas array imghdr shlex
ConfigParser ast imp shutil
Cookie asynchat importlib signal
DLFCN asyncore imputil site
Dialog atexit inspect sitecustomize
DocXMLRPCServer audiodev io smtpd
FileDialog audioop itertools smtplib
FixTk base64 json sndhdr
HTMLParser bdb keyword socket
IN binascii lib2to3 spwd
MimeWriter binhex linecache sqlite3
Queue bisect linuxaudiodev sre
RPi bsddb locale sre_compile
ScrolledText bz2 logging sre_constants
SimpleDialog cPickle lsb_release sre_parse
SimpleHTTPServer cProfile macpath ssl
SimpleXMLRPCServer cStringIO macurl2path stat
SocketServer calendar mailbox statvfs
StringIO cgi mailcap string
TYPES cgitb markupbase stringold
Tix chunk marshal stringprep
Tkconstants cmath math strop
Tkdnd cmd md5 struct
Tkinter code mhlib subprocess
UserDict codecs mimetools sunau
UserList codeop mimetypes sunaudio
UserString collections mimify symbol
_LWPCookieJar colorsys mmap symtable
_MozillaCookieJar commands modulefinder sys
__builtin__ compileall multifile sysconfig
__future__ compiler multiprocessing syslog
_abcoll contextlib mutex tabnanny
_ast cookielib netrc tarfile
_bisect copy new telnetlib
_bsddb copy_reg nis tempfile
_codecs crypt nntplib termios
_codecs_cn csv ntpath test
_codecs_hk ctypes nturl2path textwrap
_codecs_iso2022 curses numbers this
_codecs_jp datetime opcode thread
_codecs_kr dbhash operator threading
_codecs_tw dbm optparse time
_collections decimal os timeit
_csv difflib os2emxpath tkColorChooser
_ctypes dircache ossaudiodev tkCommonDialog
_ctypes_test dis parser tkFileDialog
_curses distutils pdb tkFont
_curses_panel dl pickle tkMessageBox
_elementtree doctest pickletools tkSimpleDialog
_functools dumbdbm pipes toaiff
_hashlib dummy_thread pkgutil token
_heapq dummy_threading platform tokenize
_hotshot email plistlib trace
_io encodings popen2 traceback
_json ensurepip poplib ttk
_locale errno posix tty
_lsprof exceptions posixfile turtle
_md5 fcntl posixpath types
_multibytecodec filecmp pprint unicodedata
_multiprocessing fileinput profile unittest
_osx_support fnmatch pstats urllib
_pyio formatter pty urllib2
_random fpformat pwd urlparse
_sha fractions py_compile user
_sha256 ftplib pyclbr uu
_sha512 functools pydoc uuid
_socket future_builtins pydoc_data warnings
_sqlite3 gc pyexpat wave
_sre genericpath quopri weakref
_ssl getopt random webbrowser
_strptime getpass re whichdb
_struct gettext readline wsgiref
_symtable glob repr xdrlib
_sysconfigdata grp resource xml
_sysconfigdata_nd gzip rexec xmllib
_testcapi hashlib rfc822 xmlrpclib
_threading_local heapq rlcompleter xxsubtype
_warnings hmac robotparser zipfile
_weakref hotshot runpy zipimport
_weakrefset htmlentitydefs sched zlib
abc htmllib select
CODE BELOW:
from time import sleep
import picamera
with picamera.PiCamera() as camera:
    camera.resolution = (1024, 768)
WAIT_TIME = 300
with picamera.PiCamera() as camera:
    camera.resolution = (1024, 768)
    for filename in camera.capture_continuous('/home/pi/camera/img{timestamp:%H-%M-%S-%f}.jpg'):
        sleep(WAIT_TIME)
The expected result is timestamped images appearing in the camera folder every 5 minutes.
This is not stated in the picamera documentation, but it could be that the script is trying to run under Python 2.
The solution I found is the following:
You need to have both of these packages installed: python-picamera (for Python 2) and python3-picamera (for Python 3).
So running:
sudo apt-get install python-picamera
fixed it.
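To see which interpreter a script really runs under, and whether that interpreter can import picamera, a short check like the one below can help. It is a minimal sketch: `module_available` is a hypothetical helper, and the code is written to run under both Python 2 and Python 3 so it can be started with either `python` or `python3`.

```python
import sys


def module_available(name):
    """Return True if `name` can be imported by the running interpreter."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


print("Interpreter: %s" % sys.executable)
print("Python version: %d.%d" % sys.version_info[:2])
print("picamera importable: %s" % module_available("picamera"))
```

If the version printed is 2.x and picamera is missing, either install `python-picamera` or start the script with `python3` after installing `python3-picamera`.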

To generate wordcloud in python jupyter notebook environment

I am working on corpus analysis of non-English text and have run into several problems, such as clustering with k-means.
Now I am having trouble generating a word cloud in a Python 3.5.2 Jupyter notebook.
I installed wordcloud with the command pip install wordcloud, then ran the following code:
# Simple WordCloud
from os import path
from scipy.misc import imread
import matplotlib.pyplot as plt
import random
from wordcloud import WordCloud, STOPWORDS
text = 'all your base are belong to us all of your base base base'
wordcloud = WordCloud(font_path='/Library/Fonts/Verdana.ttf',
                      relative_scaling=1.0,
                      stopwords={'to', 'of'}  # stopwords expects a set of words, not one string
                      ).generate(text)
plt.imshow(wordcloud)
plt.axis("off")
plt.show()
But I got the following error:
ImportError Traceback (most recent call last)
in ()
5 import random
6
----> 7 from wordcloud import WordCloud, STOPWORDS
8
9 text = 'all your base are belong to us all of your base base base'
ImportError: No module named 'wordcloud'
Please help me with this.
