SBI (python script for Google search by image) is not working today - google-image-search

I have been using sbi to run Google search by image from my own web page, in order to extract Google's "best guess". It worked well until yesterday, but today the program fails with errors.
My Code:
import sbi
result = sbi.search_by(url='http://weknowyourdreams.com/images/apple/apple-09.jpg')
print result.best_guess
Errors:
File "/Users/Documents/workspace/src/imagesearch.py", line 4, in <module>
result = sbi.search_by(url='http://weknowyourdreams.com/images/apple/apple-09.jpg')
File "/Users/Envs/code1/lib/python2.7/site-packages/sbi.py", line 154, in search_by
url = a['href']
File "/Users/Envs/code1/lib/python2.7/site-packages/bs4/element.py", line 905, in __getitem__
return self.attrs[key]
KeyError: 'href'
Any ideas?
Thanks!

Go to the sbi.py file and comment out lines 129-171.
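The traceback shows where it breaks: sbi.py line 154 does url = a['href'], and bs4's Tag.__getitem__ looks the key up in self.attrs, so any <a> tag on the results page that has no href attribute raises KeyError: 'href'. A less drastic alternative to commenting code out might be Tag.get('href'), which returns None for a missing attribute. The sketch below only illustrates that pattern on a made-up HTML snippet; I have not checked what the surrounding lines of sbi.py expect, so any patch to sbi.py itself is an assumption.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: the first anchor has no href, like the one that breaks sbi.py.
html = '<a name="anchor-only">no link</a><a href="/search?q=apple">apple</a>'
soup = BeautifulSoup(html, "html.parser")

hrefs = []
for a in soup.find_all("a"):
    href = a.get("href")  # returns None instead of raising KeyError
    if href is None:
        continue  # skip anchors without an href attribute
    hrefs.append(href)

print(hrefs)  # ['/search?q=apple']
```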

Related

How to deal with "method() takes 1 positional argument but 2 were given" in functional code?

I have written a very small function to clear a Google Calendar using the API. The API generally works. The reference code from the Google Developers pages is:
service.calendars().clear('primary').execute()
and my function is:
def clear_gcal(service):
    someCal = '''my_calendar_address'''
    service.calendars().clear(someCal).execute()
    print("Some Google Cal cleared")
When I run this with a service variable that works (I also use it to add events to the calendar without errors), I get "TypeError: method() takes 1 positional argument but 2 were given". All the solutions I can find for this involve self in OOP, but my code is purely functional (and I prefer it that way here). How can I deal with this seemingly common type of error? Thank you,
Traceback (most recent call last):
File "<ipython-input-87-d7bf7ff34210>", line 1, in <module> runfile('C:/Users/b017646/ExportCal/main.py', wdir='C:/Users/b017646/ExportCal')
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile execfile(filename, namespace)
File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/b017646/ExportCal/main.py", line 17, in <module> gcal.clear_gcal(service)
File "C:\Users\b017646\ExportCal\gcal.py", line 48, in clear_gcal service.calendars().clear(deaCal).execute()
TypeError: method() takes 1 positional argument but 2 were given
GregersDK
Issue:
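You are passing the calendar ID as a positional argument. The methods generated by the Google API Python client accept keyword arguments only (the generated function is literally named method, which is where the method() in the error message comes from), so the ID has to be passed as calendarId.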
Solution:
Pass it by keyword instead (for your own calendar, use calendarId=someCal):
service.calendars().clear(calendarId='primary').execute()
Reference:
calendars().clear(calendarId=*)
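A minimal sketch of the corrected function follows. Making the calendar ID a parameter is my own design choice, not something the API requires. The _FakeRequest, _FakeCalendars and _FakeService classes are hypothetical stand-ins that mimic the keyword-only signature of the generated methods, so the example runs without credentials and also reproduces the original error.

```python
def clear_gcal(service, calendar_id):
    """Delete all events from one calendar using a Calendar API v3 service."""
    # Generated client methods take keyword arguments only, so the ID
    # must be passed as calendarId=..., never positionally.
    service.calendars().clear(calendarId=calendar_id).execute()
    print("Google Calendar cleared:", calendar_id)


# Hypothetical stand-ins for a real `service`, so this runs without credentials.
class _FakeRequest:
    def execute(self):
        return {}  # placeholder response


class _FakeCalendars:
    def __init__(self):
        self.calls = []

    def clear(self, **kwargs):  # keyword-only, like the generated method
        self.calls.append(kwargs)
        return _FakeRequest()


class _FakeService:
    def __init__(self):
        self._calendars = _FakeCalendars()

    def calendars(self):
        return self._calendars


service = _FakeService()
clear_gcal(service, "my_calendar_address")
print(service.calendars().calls)  # [{'calendarId': 'my_calendar_address'}]

# Reproduce the original mistake: self plus one positional argument = 2.
try:
    service.calendars().clear("my_calendar_address")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # ...takes 1 positional argument but 2 were given
```

With a real service object (for example one built with googleapiclient.discovery.build('calendar', 'v3', ...)), only clear_gcal is needed.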

MemoryError saving a very large workbook with openpyxl

I'm new to Python, although I wrote C about 25 years ago. This is my first Python program.
I've been trying to convert a very large CSV file (0.5 million rows, 80 columns) to an XLSX file using openpyxl. Appending all the rows to the sheet works, but when I save the workbook, it crashes with a MemoryError.
I'm using Python 3.6 (32-bit).
Does anyone have any hints? Any comments are much appreciated. Thanks!
The code and the error are pasted below:
#!python3
import os, sys, csv, openpyxl, datetime, lxml
os.chdir('xxxxxxxxxxxxxxx')
# field sizes are large in input csv so need to increase the size of the field size limit
csv.field_size_limit(sys.maxsize)
# reading in the temporary working file
print('Reading cleaned file...')
with open('input_data.csv') as input_data:
    dataReader = csv.reader(input_data,delimiter=';')
    inputData = list(dataReader)
now=datetime.datetime.now()
dateStamp=now.strftime("%y%m%d")
newDatadump=dateStamp + ' output_data.xlsx'
# Deletes any old temporary working file.
if os.path.exists (newDatadump):
    os.remove(newDatadump)
#writes an excel file
wb=openpyxl.Workbook(write_only=True)
sheet=wb.create_sheet()
print('Writing '+newDatadump+'...')
#debugging
numberOfRows=int(len(inputData))
print('number of rows',numberOfRows)
#create output file
for line in inputData:
    sheet.append(line)
print('Phew...')
wb.save(newDatadump)
print('through...')
Output:
RESTART: xxxxxxxxxxx
Reading cleaned file...
Writing 180810 output_data.xlsx...
number of rows 551628
Phew...
Then I get the MemoryError. Here is its stack trace:
Traceback (most recent call last):
File "C:/Users/Simon/Network Drive/DATA/992 test python/cleaning a file example for internet.py", line 38, in <module>
print('through...')
File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\workbook\workbook.py", line 365, in save
save_dump(self, filename)
File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\writer\excel.py", line 313, in save_dump
writer.save(filename)
File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\writer\excel.py", line 266, in save
self.write_data()
File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\writer\excel.py", line 83, in write_data
self._write_worksheets()
File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\writer\excel.py", line 203, in _write_worksheets
xml = ws._write()
File "C:\Users\Simon\AppData\Local\Programs\Python\Python36-32\lib\site-packages\openpyxl\worksheet\write_only.py", line 261, in _write
out = src.read()
MemoryError
Try installing lxml, as suggested in the docs:
pip install lxml
That should solve the issue, as it did in my case.
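Since the question mentions a 32-bit Python 3.6, it may also help to confirm which interpreter actually runs the script, whether it is a 32- or 64-bit build (a 32-bit process on Windows typically has only about 2 GB of address space), and whether lxml is importable from that same interpreter, because pip can install into a different Python than the one IDLE uses. This is a small stdlib-only diagnostic sketch, not part of the original answer:

```python
import importlib.util
import platform
import sys


def environment_report():
    """Summarize facts relevant to openpyxl MemoryErrors (diagnostic sketch)."""
    return {
        "python": platform.python_version(),
        "executable": sys.executable,
        # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
        "is_64bit": sys.maxsize > 2**32,
        # True if `import lxml` would succeed in *this* interpreter.
        "lxml_available": importlib.util.find_spec("lxml") is not None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```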

PyPDF2, why am I getting an index error? List index out of range

I'm following along with Al Sweigart's book 'Automate the Boring Stuff', and I'm stuck on an index error. I'm using PyPDF2 to open an encrypted PDF document. Since the book is from 2015, I checked the PyPDF2.PdfFileReader docs to see whether I'm missing anything, but as far as I can tell everything is the same. So I'm not sure what's wrong here.
My Code
import PyPDF2
pdfReader = PyPDF2.PdfFileReader('encrypted.pdf')
pdfReader.isEncrypted  # True
pdfReader.getPage(0)
gives:
Traceback (most recent call last):
File "<pyshell#65>", line 1, in <module>
pdfReader.getPage(0)
File "/home/user67/.local/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1176, in getPage
self._flatten()
File "/home/user67/.local/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1505, in _flatten
catalog = self.trailer["/Root"].getObject()
File "/home/user67/.local/lib/python3.6/site-packages/PyPDF2/generic.py", line 516, in __getitem__
return dict.__getitem__(self, key).getObject()
File "/home/user67/.local/lib/python3.6/site-packages/PyPDF2/generic.py", line 178, in getObject
return self.pdf.getObject(self).getObject()
File "/home/user67/.local/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1617, in getObject
raise utils.PdfReadError("file has not been decrypted")
PyPDF2.utils.PdfReadError: file has not been decrypted
pdfReader.decrypt('rosebud')
1
pageObj = pdfReader.getPage(0)
Traceback (most recent call last):
File "<pyshell#67>", line 1, in <module>
pageObj = pdfReader.getPage(0)
File "/home/user67/.local/lib/python3.6/site-packages/PyPDF2/pdf.py",line 1177, in getPage
return self.flattenedPages[pageNumber]
IndexError: list index out of range
Before asking, I searched Google and found a link with a "proposed fix". However, I'm too new at this to see what the fix actually is; I can't make heads or tails of it.
I figured it out. The issue is caused by calling pdfReader.getPage(0) in the IDLE shell before decrypting the file. If you remove that call, or start over and skip it after getting the error, everything works as it should.
I got the same error. I was working in the console and called reader.getPage(0) before decrypting.
Don't call getPage(#) or access pages[#] before decrypting.
Use code like this:
import PyPDF2

reader = PyPDF2.PdfFileReader("file.pdf")
# reader.pages[0]  # do not access pages before decrypting
if reader.isEncrypted:
    reader.decrypt('')
reader.pages[0]
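The two tracebacks suggest why the order matters: the first getPage call goes into _flatten and fails, and the later one fails on return self.flattenedPages[pageNumber]. In PyPDF2 1.x, _flatten appears to set self.flattenedPages = [] before reading the catalog, and getPage only calls _flatten while that cache is None, so the early failure leaves an empty cache behind. The LazyPageReader class below is a hypothetical toy model of that lazy cache, not PyPDF2 code, written only to show the mechanism:

```python
class LazyPageReader:
    """Toy model of a reader that builds its page list lazily (not PyPDF2)."""

    def __init__(self, pages, encrypted=True):
        self._pages = pages
        self._encrypted = encrypted
        self.flattened_pages = None  # cache; None means "not built yet"

    def decrypt(self, password):
        self._encrypted = False  # toy model: any password works
        return 1

    def _flatten(self):
        self.flattened_pages = []  # cache is created *before* the check below
        if self._encrypted:
            raise RuntimeError("file has not been decrypted")
        self.flattened_pages.extend(self._pages)

    def get_page(self, n):
        if self.flattened_pages is None:  # only rebuilds while cache is None
            self._flatten()
        return self.flattened_pages[n]


# Wrong order: page access before decrypt leaves an empty cache behind.
bad = LazyPageReader(["page 1"])
try:
    bad.get_page(0)
except RuntimeError:
    pass
bad.decrypt("rosebud")
try:
    bad.get_page(0)
except IndexError as exc:
    print("IndexError:", exc)  # list index out of range

# Right order: decrypt first, then read pages.
good = LazyPageReader(["page 1"])
good.decrypt("rosebud")
print(good.get_page(0))  # page 1
```

In an interactive session, the practical fix is the same as in the answers: create a fresh reader and decrypt it before touching any page.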

How to visualize Pyfst transducers via dot files

I am learning how to create transducers with Pyfst and I am trying to visualize the ones I create. The ultimate goal is to be able to write the transducers to dot files and see them in Graphviz.
I took some sample code to see how to visualize the following acceptor:
a = fst.Acceptor()
a.add_arc(0, 1, 'x', 0.1)
a[1].final = -1
a.draw()
When I use draw(), which comes with the package, I get an error:
File "/Users/.../tests.py", line 42, in <module>
a.draw()
File "_fst.pyx", line 816, in fst._fst.StdVectorFst.draw
(fst/_fst.cpp:15487)
File "/Users/.../venv-3.6/lib/python3.6/re.py", line 191, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: cannot use a string pattern on a bytes-like object
If I try to write the acceptor above to a .dot file with this function:
import os
from os.path import split, join

def fst_dot(dot_object, filename):
    # dot_files_folder_path: output folder, assumed to be defined elsewhere
    path, file = split(filename)
    new_path = join(dot_files_folder_path, path)
    if not os.path.exists(new_path):
        os.makedirs(new_path)
    if hasattr(dot_object, 'dotFormat'):
        draw_string = dot_object.dotFormat()
    else:
        draw_string = dot_object.draw()
    with open(join(dot_files_folder_path, filename + ".dot"), "w") as f:
        f.write(draw_string)
I get the same error:
File "/Users/...tests.py", line 43, in <module>
fst_dot(a, 'acceptor')
File "/Users/...tests.py", line 22, in fst_dot
draw_string = dot_object.draw()
File "_fst.pyx", line 816, in fst._fst.StdVectorFst.draw
(fst/_fst.cpp:15487)
File "/Users/.../venv-3.6/lib/python3.6/re.py", line 191, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: cannot use a string pattern on a bytes-like object
So both errors are the same: the problem is inside draw().
According to the pyfst site, draw() produces the dot-format representation of the transducer.
I can't understand how to fix the error. Any help would be greatly appreciated.
I am using OSX and PyCharm.
You might try using Python2 to see if that helps at all.
However, I think you'll be better off using the Python bindings included with OpenFST 1.5+, which can also write GraphViz .dot files. Documentation is available here:
http://www.openfst.org/twiki/bin/view/FST/PythonExtension
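The traceback ends in re.sub raising "cannot use a string pattern on a bytes-like object", which means draw() hands a bytes value to re.sub together with a str pattern. Python 2 would not complain, because its str is a byte string, while Python 3 keeps bytes and str strictly apart; that is why trying Python 2 may help. The snippet below reproduces the message with the standard library only and shows the two usual fixes (decode the bytes, or use a bytes pattern). The sample data is hypothetical, and exactly where pyfst does this inside _fst.pyx is something I have not checked.

```python
import re

data = b"digraph FST {\n}\n"  # hypothetical bytes, e.g. from a C++ extension

try:
    re.sub(r"FST", "T", data)  # str pattern, bytes subject
except TypeError as exc:
    message = str(exc)
    print(message)  # cannot use a string pattern on a bytes-like object

# Fix 1: decode the bytes to str before using str patterns.
fixed_text = re.sub(r"FST", "T", data.decode("utf-8"))

# Fix 2: keep bytes, but use a bytes pattern and a bytes replacement.
fixed_bytes = re.sub(rb"FST", b"T", data)

print(fixed_text)
print(fixed_bytes)
```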
I recommend the fstdraw command from OpenFST. First save the FST from Python:
a.write('tmp.fst')
then run this in a shell:
$ fstdraw tmp.fst > tmp.dot
EDIT:
Finally, I found that UFAL's fork of pyfst works fine with Python 3.
https://github.com/UFAL-DSG/pyfst

python requests invalid schema error

I am trying to scrape texts from an online corpus. The texts are arranged in a tree on the site: clicking a link on page A opens a B page, and clicking a link on a B page opens a C page containing the text. Page A has about 50 links, and each B page has between 3 and about 150. C pages sometimes contain links too, but I am not interested in those.
Here is what I did: I opened page A, parsed it with BeautifulSoup, collected the links I wanted, and saved them to a .txt file. Then I did the following:
Url_List=[]
with open("Aramaic_Url_List.txt", "r") as Url_List:
    urls=Url_List.read()
A_url_list=urls.splitlines()
Yeni_A_url_list=[showsubtexts for showsubtexts in A_url_list if len(showsubtexts)>52]
This gave me all the links I wanted from page A as a list.
Then I wrote a small script to test whether I could get the links on a B page from one element of Yeni_A_url_list:
import requests
from bs4 import BeautifulSoup

data2=requests.get(Yeni_A_url_list[1].strip())
data2.raise_for_status()
data2_Metin=data2.text
soup_data2=BeautifulSoup(data2_Metin, "lxml")
for link in soup_data2.find_all("a"):
    print(link.get("href"))
The strip() probably does nothing there, but I thought it wouldn't hurt. The script worked well for a single element, so I thought it was time to write a function that collects all the B-level links for every link on page A. Here is my function:
from time import sleep

def ListedenLinkAl(h):
    if h in Yeni_A_url_list:
        print(h)
        g=requests.get(h)
        g.raise_for_status()
        data_mtn=g.text
        data_soup=BeautifulSoup(data_mtn,"lxml")
        oP=[b.get("href") for b in data_soup.find_all("a")]
        tk=list(set(oP))
        sleep(3)
        return tk
The print shows me which link the function is working on, and sleep is there so I don't overload the server (for some reason, time.sleep gave me a syntax error). The function also worked for a single element of the list, i.e. ListedenLinkAl(Yeni_A_url_list[1]) worked.
So I thought it was time to apply the function to every element of Yeni_A_url_list with a list comprehension:
Temiz_url_Listesi=[ListedenLinkAl(x) for x in Yeni_A_url_list]
And I received the following error:
In [45]: Temiz_url_Listesi=[ListedenLinkAl(x) for x in Yeni_A_url_list]
http://cal1.cn.huc.edu/showsubtexts.php?keyword=21200
Traceback (most recent call last):
File "<ipython-input-45-8e4811c83c3f>", line 1, in <module>
Temiz_url_Listesi=[ListedenLinkAl(x) for x in Yeni_A_url_list]
File "<ipython-input-45-8e4811c83c3f>", line 1, in <listcomp>
Temiz_url_Listesi=[ListedenLinkAl(x) for x in Yeni_A_url_list]
File "<ipython-input-36-390e6ed1eae5>", line 6, in ListedenLinkAl
g=requests.get(h)
File "/home/dk/anaconda3/lib/python3.5/site-packages/requests/api.py", line 67, in get
return request('get', url, params=params, **kwargs)
File "/home/dk/anaconda3/lib/python3.5/site-packages/requests/api.py", line 53, in request
return session.request(method=method, url=url, **kwargs)
File "/home/dk/anaconda3/lib/python3.5/site-packages/requests/sessions.py", line 468, in request
resp = self.send(prep, **send_kwargs)
File "/home/dk/anaconda3/lib/python3.5/site-packages/requests/sessions.py", line 570, in send
adapter = self.get_adapter(url=request.url)
File "/home/dk/anaconda3/lib/python3.5/site-packages/requests/sessions.py", line 644, in get_adapter
raise InvalidSchema("No connection adapters were found for '%s'" % url)
InvalidSchema: No connection adapters were found for 'http://cal1.cn.huc.edu/showsubtexts.php?keyword=21200'
In [46]:
I have no idea why the function works for a single element in the list, but not in the list comprehension.
It looks like there are extra characters around the URL; use str.strip() to clean it up:
g = requests.get(h.strip())
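One more hint, offered as a guess: the single-element test used Yeni_A_url_list[1] (with .strip()), while the list comprehension failed on the very first URL it printed, i.e. element 0. That suggests the problem may be limited to the first line of the file, which is exactly where an invisible UTF-8 byte-order mark (U+FEFF) would sit if the file was saved with one; print(repr(h)) would show what is really there. str.strip() with no arguments removes whitespace only, and U+FEFF is not whitespace, so the sketch below strips both explicitly and reads the file with the 'utf-8-sig' codec. The helper names and the sample file are hypothetical.

```python
import os
import tempfile

# Whitespace plus U+FEFF (byte-order mark). str.strip() with no arguments
# removes only whitespace, and U+FEFF is not whitespace, so list it explicitly.
JUNK = " \t\r\n\ufeff"


def clean_url(raw):
    """Remove whitespace and stray byte-order marks from both ends of a URL."""
    return raw.strip(JUNK)


def read_urls(path):
    """Read one URL per line, skipping blank lines.

    The 'utf-8-sig' codec drops a leading BOM if the file has one.
    """
    with open(path, encoding="utf-8-sig") as f:
        return [clean_url(line) for line in f if clean_url(line)]


# Demo with a hypothetical file whose first line starts with a BOM.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write("\ufeffhttp://example.com/a\n  http://example.com/b \n\n")
    path = tmp.name

with open(path, encoding="utf-8") as f:
    print(repr(f.readline()))  # repr() makes the hidden BOM visible
print(read_urls(path))  # ['http://example.com/a', 'http://example.com/b']
os.remove(path)
```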
