Can an except block of python have 2 conditions simultaneously? - python-3.x

I was trying to learn stock prediction with the help of this GitHub project, but when I run the main.py file given in the repository via cmd, I encounter an error:
File "/Stock-Predictor/src/tweetstream/streamclasses.py", line 101
except urllib2.HTTPError, exception:
^
SyntaxError: invalid syntax
The code below is part of a PyPI module named tweetstream, i.e. tweetstream/streamclasses.py, which gave the error above while being used in a Twitter sentiment analysis project.
import time
import urllib
import urllib2
import socket
from platform import python_version_tuple
import anyjson
from . import AuthenticationError, ConnectionError, USER_AGENT


class BaseStream(object):
    """A network connection to Twitters streaming API

    :param username: Twitter username for the account accessing the API.
    :param password: Twitter password for the account accessing the API.
    :keyword count: Number of tweets from the past to get before switching to
        live stream.
    :keyword url: Endpoint URL for the object. Note: you should not
        need to edit this. It's present to make testing easier.

    .. attribute:: connected

        True if the object is currently connected to the stream.

    .. attribute:: url

        The URL to which the object is connected

    .. attribute:: starttime

        The timestamp, in seconds since the epoch, the object connected to the
        streaming api.

    .. attribute:: count

        The number of tweets that have been returned by the object.

    .. attribute:: rate

        The rate at which tweets have been returned from the object as a
        float. see also :attr: `rate_period`.

    .. attribute:: rate_period

        The amount of time to sample tweets to calculate tweet rate. By
        default 10 seconds. Changes to this attribute will not be reflected
        until the next time the rate is calculated. The rate of tweets vary
        with time of day etc. so it's useful to set this to something
        sensible.

    .. attribute:: user_agent

        User agent string that will be included in the request. NOTE: This can
        not be changed after the connection has been made. This property must
        thus be set before accessing the iterator. The default is set in
        :attr: `USER_AGENT`.
    """

    def __init__(self, username, password, catchup=None, url=None):
        self._conn = None
        self._rate_ts = None
        self._rate_cnt = 0
        self._username = username
        self._password = password
        self._catchup_count = catchup
        self._iter = self.__iter__()

        self.rate_period = 10  # in seconds
        self.connected = False
        self.starttime = None
        self.count = 0
        self.rate = 0
        self.user_agent = USER_AGENT
        if url: self.url = url

    def __enter__(self):
        return self

    def __exit__(self, *params):
        self.close()
        return False

    def _init_conn(self):
        """Open the connection to the twitter server"""
        headers = {'User-Agent': self.user_agent}
        postdata = self._get_post_data() or {}
        if self._catchup_count:
            postdata["count"] = self._catchup_count
        poststring = urllib.urlencode(postdata) if postdata else None
        req = urllib2.Request(self.url, poststring, headers)
        password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
        password_mgr.add_password(None, self.url, self._username, self._password)
        handler = urllib2.HTTPBasicAuthHandler(password_mgr)
        opener = urllib2.build_opener(handler)
        try:
            self._conn = opener.open(req)
        except urllib2.HTTPError, exception:  # ___________________________problem here
            if exception.code == 401:
                raise AuthenticationError("Access denied")
            elif exception.code == 404:
                raise ConnectionError("URL not found: %s" % self.url)
            else:  # re raise. No idea what would cause this, so want to know
                raise
        except urllib2.URLError, exception:
            raise ConnectionError(exception.reason)

The second item in the except clause is an identifier that the handler body can use to access the exception object. The try/except syntax changed between Python 2 and Python 3, and your code uses the Python 2 syntax.
Python 2 (language reference):
try:
    ...
except <expression>, <identifier>:
    ...
Python 3 (language reference, rationale):
try:
    ...
except <expression> as <identifier>:
    ...
Note that <expression> can be a single exception class or a tuple of exception classes, to catch more than one type in a single except clause. So, to answer your title question, you could use the following to handle more than one possible exception being thrown:
try:
    x = array[5]  # NameError if array doesn't exist, IndexError if it is too short
except (IndexError, NameError) as e:
    print(e)  # which was it?

Use...
try:
    # code here
except MyFirstError:
    # exception handling
except AnotherError:
    # exception handling
You can repeat this many times.
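For instance, a small runnable sketch of that pattern (the exception types here are just illustrative, not from the original post):

try:
    value = int("42")
    result = 10 / (value - 42)      # raises ZeroDivisionError here
except ValueError as e:
    print("bad input:", e)
except ZeroDivisionError as e:
    print("cannot divide:", e)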

Related

APIError(code=-1099): Not found, unauthenticated, or unauthorized

I hope that you are having a good day.
I am working with the Binance API through python-binance.
My bot runs fine most of the time, but for some reason after a while it randomly stops and gives me this error message: "APIError(code=-1099): Not found, unauthenticated, or unauthorized".
The error seems to happen at the same line of code every time.
from binance.client import Client
import time
from datetime import datetime
import pandas as pd

class LastPrice():
    def __init__(self, symbol):
        self.symbol = symbol
        self.api_key = "my key"
        self.secret_key = "my secret"

    def get_client(self):
        try:
            client = Client(self.api_key, self.secret_key)
        except Exception as e:
            print(e)
            pass
        return client

    def get_last_price(self):
        last_price_string = self.get_client().get_ticker(symbol=self.symbol)['lastPrice']
        last_price = float(last_price_string)
        return last_price_string

    def get_last_price_and_time(self):
        last_price_string = self.get_client().get_ticker(symbol=self.symbol)['lastPrice']
        close_time = self.get_client().get_ticker(symbol=self.symbol)['closeTime']
        last_price = float(last_price_string)
        price_and_time = [close_time, last_price]
        return price_and_time
[screenshot: error message]
I have included a screenshot of the error message I get. Despite using the same code to generate the client every time, I only get the error in that specific code. My code stops because the client is referenced before assignment, but I am looking for the cause of the preceding APIError.
Any idea what that error message means in the context of my code? Thanks in advance!
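As a side note on the "client referenced before assignment" part: below is a minimal sketch (not from the original post) of a get_client variant that retries a few times and raises instead of returning an unbound name; the retry count and backoff are assumptions.

import time
from binance.client import Client

def get_client(api_key, secret_key, retries=3):
    """Return a Client, retrying on transient errors; raise if all attempts fail (sketch)."""
    last_error = None
    for attempt in range(retries):
        try:
            return Client(api_key, secret_key)
        except Exception as e:          # e.g. a transient APIError
            last_error = e
            time.sleep(2 ** attempt)    # simple backoff between attempts
    raise RuntimeError("could not create Binance client") from last_error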

How to get the processed results from dramatiq python?

import dramatiq
from dramatiq.brokers.redis import RedisBroker
from dramatiq.results import Results
from dramatiq.results.backends import RedisBackend

broker = RedisBroker(host="127.0.0.1", port=6379)
broker.declare_queue("default")
dramatiq.set_broker(broker)
# backend = RedisBackend()
# broker.add_middleware(Results(backend=backend))

@dramatiq.actor()
def print_words(text):
    print('This is ' + text)

print_words('sync')

a = print_words.send('async')
a.get_results()
I was checking out alternatives to Celery and found Dramatiq. I'm just getting started with Dramatiq and I'm unable to retrieve results. I even tried setting the backend and 'save_results' to True, but I always get AttributeError: 'Message' object has no attribute 'get_results'.
Any idea how to get the result?
You were on the right track with adding a result backend. The way to instruct an actor to store results is store_results=True, not save_results, and the method to retrieve results is get_result(), not get_results().
When you call get_result() with block=False, you need to wait until the worker has stored the result, for example like this:
while True:
    try:
        res = a.get_result(backend=backend)
        break
    except dramatiq.results.errors.ResultMissing:
        # do something like retry N times.
        time.sleep(1)
print(res)
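Putting that together, a minimal end-to-end sketch (assuming a Redis server on 127.0.0.1:6379 and a dramatiq worker running for this module) might look like this:

import dramatiq
from dramatiq.brokers.redis import RedisBroker
from dramatiq.results import Results
from dramatiq.results.backends import RedisBackend

result_backend = RedisBackend()
broker = RedisBroker(host="127.0.0.1", port=6379)
broker.add_middleware(Results(backend=result_backend))  # enable the results middleware
dramatiq.set_broker(broker)

@dramatiq.actor(store_results=True)   # note: store_results, not save_results
def print_words(text):
    print('This is ' + text)
    return text                       # the return value is what gets stored

message = print_words.send('async')
# get_result (singular); block for up to 10 seconds while the worker stores it.
print(message.get_result(block=True, timeout=10_000))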

Catch "HTTP500 Error"

I'm querying Facebook with the following code, iterating over a list of page names to get each page's numeric ID and store it in a dictionary. However, I keep running into an HTTP 500 error; it doesn't show up with the short list I present here, though. See the code:
import json
import urllib.request

def FB_IDs(page_name, access_token=access_token):
    """ get page's numeric information """
    # construct URL
    base = "https://graph.facebook.com/v2.4"
    node = "/" + str(page_name)
    parameters = "/?access_token=%s" % access_token
    url = base + node + parameters
    # retrieve data
    with urllib.request.urlopen(url) as url:
        data = json.loads(url.read().decode())
    return data

pages_ids_dict = {}
for page in pages:
    pages_ids_dict[page] = FB_IDs(page, access_token)['id']
How can I automate this and avoid the error?
There is a pretty standard helper function for this, which you might want to look at:
import time
import urllib.request

### HELPER FUNCTION ###
def request_until_succeed(url):
    """ helper function to catch HTTP error 500 """
    req = urllib.request.Request(url)
    success = False
    while success is False:
        try:
            response = urllib.request.urlopen(req)
            if response.getcode() == 200:
                success = True
        except Exception as e:
            print(e)
            time.sleep(5)
            print("Error for URL")  # or, to include the URL: print("Error for URL %s: %s" % (url, datetime.datetime.now()))
    return response.read()
Call that helper from your own function so every request goes through it, and you should be fine.
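For example, your FB_IDs function could be rewired to go through the helper like this (a sketch; it assumes request_until_succeed, access_token, and pages are defined as above):

import json

def FB_IDs(page_name, access_token=access_token):
    """ get page's numeric information, retrying on HTTP 500 """
    url = "https://graph.facebook.com/v2.4/%s/?access_token=%s" % (page_name, access_token)
    return json.loads(request_until_succeed(url).decode())

pages_ids_dict = {page: FB_IDs(page, access_token)['id'] for page in pages}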

How to check if boto3 S3.Client.upload_fileobj succeeded?

I want to save the result of a long running job on S3. The job is implemented in Python, so I'm using boto3. The user guide says to use S3.Client.upload_fileobj for this purpose, which works fine, except that I can't figure out how to check whether the upload has succeeded. According to the documentation, the method doesn't return anything and doesn't raise an error. The Callback parameter seems to be intended for progress tracking rather than error checking. It is also unclear whether the method call is synchronous or asynchronous.
If the upload failed for any reason, I would like to save the contents to disk and log an error. So my question is: how can I check whether a boto3 S3.Client.upload_fileobj call succeeded and do some error handling if it failed?
I use a combination of head_object and wait_until_exists.
import boto3
from botocore.exceptions import ClientError, WaiterError

session = boto3.Session()
s3_client = session.client('s3')
s3_resource = session.resource('s3')

def upload_src(src, filename, bucketName):
    success = False
    try:
        bucket = s3_resource.Bucket(bucketName)
    except ClientError as e:
        bucket = None
    try:
        # In case filename already exists, get current etag to check if the
        # contents change after upload
        head = s3_client.head_object(Bucket=bucketName, Key=filename)
    except ClientError:
        etag = ''
    else:
        etag = head['ETag'].strip('"')
    try:
        s3_obj = bucket.Object(filename)
    except (ClientError, AttributeError):
        s3_obj = None
    try:
        s3_obj.upload_fileobj(src)
    except (ClientError, AttributeError):
        pass
    else:
        try:
            s3_obj.wait_until_exists(IfNoneMatch=etag)
        except WaiterError as e:
            pass
        else:
            head = s3_client.head_object(Bucket=bucketName, Key=filename)
            success = head['ContentLength']
    return success
There is a wait_until_exists() helper function on the boto3 resource object that seems to be intended for this purpose.
This is how we are using it:
s3_client.upload_fileobj(file, BUCKET_NAME, file_path)
s3_resource.Object(BUCKET_NAME, file_path).wait_until_exists()
I would recommend performing the following operations:
try:
    response = upload_fileobj()
except Exception as e:
    # save the contents to the disk and log an error
if response is None:
    # poll every 10 s, using head_object(), to check whether the file was
    # uploaded successfully:
    #   - if head_object() returns a success response: break
    #   - if you get an error accessing the object: save the contents to the
    #     disk and log an error
So, basically, poll using head_object().
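A runnable sketch of that idea (bucket_name, key, and fileobj are placeholders, and the 10-second poll interval and attempt count are assumptions, not part of the original answer):

import time
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

def upload_and_verify(fileobj, bucket_name, key, attempts=6):
    try:
        s3.upload_fileobj(fileobj, bucket_name, key)
    except ClientError as e:
        # save the contents to disk and log the error here
        print("upload failed:", e)
        return False
    for _ in range(attempts):
        try:
            s3.head_object(Bucket=bucket_name, Key=key)  # object is visible
            return True
        except ClientError:
            time.sleep(10)  # poll every 10 s as suggested above
    return False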

Is selenium thread safe for scraping with Python?

I am executing a Python script with threading where, given a "query" term that I put in the Queue, I create the URL with the query parameters, set the cookies, and parse the webpage to return the products and the URLs of those products. Here's the script.
Task: for a given set of queries, store the top 20 product IDs in a file, or fewer if the query returns fewer results.
I remember reading that Selenium is not thread safe. I just want to make sure this problem occurs because of that limitation, and whether there is a way to make it work in concurrent threads. The main problem is that the script is I/O bound, so it is very slow at scraping about 3000 URLs.
from pyvirtualdisplay import Display
from data_mining.scraping import scraping_conf as sf  # custom file with rules for scraping
import Queue
import threading
import urllib2
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By

num_threads = 5
COOKIES = sf.__MERCHANT_PARAMS[merchant_domain]['COOKIES']
query_args = sf.__MERCHANT_PARAMS[merchant_domain]['QUERY_ARGS']


class ThreadUrl(threading.Thread):
    """Threaded Url Grab"""

    def __init__(self, queue, out_queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.out_queue = out_queue

    def url_from_query(self, query):
        for key, val in query_args.items():
            if query_args[key] == 'query':
                query_args[key] = query
                print "query", query
        try:
            url = base_url + urllib.urlencode(query_args)
            print "url"
            return url
        except Exception as e:
            log()
            return None

    def init_driver_and_scrape(self, base_url, query, url):
        # Will use Pyvirtual display later
        #display = Display(visible=0, size=(1024, 768))
        #display.start()
        fp = webdriver.FirefoxProfile()
        fp.set_preference("browser.download.folderList", 2)
        fp.set_preference("javascript.enabled", True)
        driver = webdriver.Firefox(firefox_profile=fp)
        driver.delete_all_cookies()
        driver.get(base_url)
        for key, val in COOKIES[exp].items():
            driver.add_cookie({'name': key, 'value': val, 'path': '/', 'domain': merchant_domain, 'secure': False, 'expiry': None})
        print "printing cookie name & value"
        for cookie in driver.get_cookies():
            if cookie['name'] in COOKIES[exp].keys():
                print cookie['name'], "-->", cookie['value']
        driver.get(base_url + 'search=junk')  # To counter any refresh issues
        driver.implicitly_wait(20)
        driver.execute_script("window.scrollTo(0, 2000)")
        print "url inside scrape", url
        if url is not None:
            flag = True
            i = -1
            row_data, row_res = (), ()
            while flag:
                i = i + 1
                try:
                    driver.get(url)
                    key = sf.__MERCHANT_PARAMS[merchant_domain]['GET_ITEM_BY_ID'] + str(i)
                    print key
                    item = driver.find_element_by_id(key)
                    href = item.get_attribute("href")
                    prod_id = eval(sf.__MERCHANT_PARAMS[merchant_domain]['PRODUCTID_EVAL_FUNC'])
                    row_res = row_res + (prod_id,)
                    print url, row_res
                except Exception as e:
                    log()
                    flag = False
            driver.delete_all_cookies()
            driver.close()
            return query + "|" + str(row_res) + "\n"  # row_data, row_res
        else:
            return [query + "|" + "None"] + "\n"

    def run(self):
        while True:
            # grabs host from queue
            query = self.queue.get()
            url = self.url_from_query(query)
            print "query, url", query, url
            data = self.init_driver_and_scrape(base_url, query, url)
            self.out_queue.put(data)
            # signals to queue job is done
            self.queue.task_done()


class DatamineThread(threading.Thread):
    """Threaded Url Grab"""

    def __init__(self, out_queue):
        threading.Thread.__init__(self)
        self.out_queue = out_queue

    def run(self):
        while True:
            # grabs host from queue
            data = self.out_queue.get()
            fh.write(str(data) + "\n")
            # signals to queue job is done
            self.out_queue.task_done()


start = time.time()

def log():
    logging_hndl = logging.getLogger("get_results_url")
    logging_hndl.exception("Stacktrace from " + "get_results_url")

df = pd.read_csv(fh_query, sep='|', skiprows=0, header=0, usecols=None, error_bad_lines=False)  # read all queries
query_list = list(df['query'].values)[0:3]

def main():
    exp = "Control"
    # spawn a pool of threads, and pass them queue instance
    for i in range(num_threads):
        t = ThreadUrl(queue, out_queue)
        t.setDaemon(True)
        t.start()
    # populate queue with data
    print query_list
    for query in query_list:
        queue.put(query)
    for i in range(num_threads):
        dt = DatamineThread(out_queue)
        dt.setDaemon(True)
        dt.start()
    # wait on the queue until everything has been processed
    queue.join()
    out_queue.join()

main()
print "Elapsed Time: %s" % (time.time() - start)
While I should be getting all search results from each URL page, I get only the first (i=0) search card, and this doesn't execute for all queries/URLs. What am I doing wrong?
What I expect -
url inside scrape http://<masked>/search=nike+costume
searchResultsItem0
url inside scrape http://<masked>/search=red+tops
searchResultsItem0
url inside scrape http://<masked>/search=halloween+costumes
searchResultsItem0
and more searchResultsItem(s), like searchResultsItem1, searchResultsItem2, and so on.
What I get
url inside scrape http://<masked>/search=nike+costume
searchResultsItem0
url inside scrape http://<masked>/search=nike+costume
searchResultsItem0
url inside scrape http://<masked>/search=nike+costume
searchResultsItem0
The skeleton code was taken from
http://www.ibm.com/developerworks/aix/library/au-threadingpython/
Additionally, when I use pyvirtualdisplay, will that work with threading as well? I also used processes with the same Selenium code, and it gave the same error. Essentially it opens up 3 Firefox browsers with the exact same URLs, while it should be opening them for different items in the queue. Here I stored the rules in a file, imported as sf, which has all the custom attributes of a base domain.
Since setting the cookies is an integral part of my script, I can't use dryscrape.
EDIT:
I tried to localize the error, and here's what I found.
In the custom rules file (which I import as "sf" above), I had defined QUERY_ARGS as:
__MERCHANT_PARAMS = {
    "some_domain.com": {
        COOKIES: { <a dict of dict, masked here>
        },
        ... more such rules
        QUERY_ARGS: {'search': 'query'}
    }
}
So what is really happening is that calling
query_args = sf.__MERCHANT_PARAMS[merchant_domain]['QUERY_ARGS']
should return the dict {'search': 'query'}, but instead it raises
AttributeError: 'module' object has no attribute '_ThreadUrl__MERCHANT_PARAMS'
This is where I don't understand where the thread gets the '_ThreadUrl__' prefix from. I also tried re-initializing query_args inside the url_from_query method, but this doesn't work.
Any pointers on what I am doing wrong?
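For what it's worth, the '_ThreadUrl__' prefix comes from Python's name mangling: any identifier with two leading underscores that appears inside a class body is rewritten to _ClassName__name at compile time. A tiny illustration of the mechanism (not from the original post):

class Config:
    pass

cfg = Config()
cfg.__QUERY_ARGS = {'search': 'query'}   # set outside a class body: the name stays "__QUERY_ARGS"

class ThreadUrlDemo:
    def get_args(self):
        # Inside a class body this reference is compiled as
        # cfg._ThreadUrlDemo__QUERY_ARGS, which does not exist.
        return cfg.__QUERY_ARGS

ThreadUrlDemo().get_args()   # AttributeError: ... '_ThreadUrlDemo__QUERY_ARGS'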
I may be replying pretty late to this. However, I tested it on Python 2.7, and both options, multithreading and multiprocessing, work with Selenium, opening two separate browsers.
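The usual guidance is one WebDriver instance per thread rather than a driver shared between threads; a minimal sketch of that pattern (not the asker's code; example.com is a placeholder):

import threading
from urllib.parse import quote_plus
from selenium import webdriver

def worker(query):
    driver = webdriver.Firefox()   # each thread creates its own browser instance
    try:
        driver.get("https://example.com/search?q=" + quote_plus(query))
        print(query, driver.title)
    finally:
        driver.quit()

threads = [threading.Thread(target=worker, args=(q,)) for q in ("nike costume", "red tops")]
for t in threads:
    t.start()
for t in threads:
    t.join()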
