I have a program which downloads pages from a site, finds links to pictures in them, and downloads those pictures. If I run this program on a computer with a fast and stable internet connection, everything works perfectly for days and weeks. But if I run it on a computer with a slow or unstable internet connection, I have one problem: the try-except block doesn't seem to work correctly.
--- this function downloads content - any content (page or picture)
def downl(self, addr, cook, head2, errmess):
    global result
    try:
        result = requests.get(addr, cookies=cook, headers=head2)
    except:
        print(errmess)  # error message
        time.sleep(5)
    return result
I pass this function a link to a page, another function finds the picture_link in that page, and then I pass that picture_link to the same function (downl). After this I save the result of downl as a .jpg file. As I said, on a computer with a normal internet connection everything works fine: as a result I have 5, 10 or 5000 pictures on my HDD.
But let me show a little example of what happens with a bad internet connection. Suppose we have 2 pages with 1 picture on each page.
step 1) downloading 1st page (def downl)
step 2) taking picture_link from it
step 3) downloading picture (def downl)
step 4) saving 1st picture to HDD as 1.jpg
step 5) downloading 2nd page (def downl)
step 6) taking picture_link from it
step 7) downloading picture (def downl) and receiving the error message (errmess)
step 8) saving 2nd picture to HDD as 2.jpg
Just for example: the 1st picture may be a normal jpg with proper content. The second picture will be a file with a jpg extension, but its content will be the 2nd page itself (an ordinary html file saved with the wrong extension ".jpg").
In other words: there was a problem with the internet connection while downloading the second picture, the program printed the error message about it (errmess), but INSTEAD of retrying ENDLESSLY (as my function is supposed to) it somehow PASSED through the try-except block and returned the previous result (the 2nd page), which was saved as the 2nd picture.
Please help! How do I make this try-except (or requests) keep retrying FOREVER, UNTIL it downloads what it is supposed to download (no matter what goes wrong with the internet connection), instead of passing through with the previous result?
Thanks very much for your time and attention.
Then you need a while True loop like this:
def downl(self, addr, cook, head2, errmess):
    global result
    while True:
        try:
            result = requests.get(addr, cookies=cook, headers=head2)
            return result
        except:
            print(errmess)  # error message
            time.sleep(5)
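A slightly more defensive variant can also help here: the bare except plus the global result is what lets a stale response from an earlier call slip through when the new request fails. This is only a sketch of that idea, assuming requests is used as above; the downl_retry name, the timeout value and the raise_for_status() check are my additions, not part of the original code:

import time
import requests

def downl_retry(addr, cook, head2, errmess, timeout=30):
    # Retry until this particular request succeeds, so a failed attempt
    # can never hand back a result left over from an earlier call.
    while True:
        try:
            result = requests.get(addr, cookies=cook, headers=head2, timeout=timeout)
            result.raise_for_status()  # treat HTTP errors (404, 500, ...) as failures too
            return result
        except requests.RequestException:
            print(errmess)  # error message
            time.sleep(5)

Catching requests.RequestException instead of everything also means that a typo or a KeyboardInterrupt won't be silently swallowed and retried.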
I'm trying to download images with Python 3.9.1
Other than the first 2-3 images, all images are 1 kb in size. How do I download all pictures? Please, help me.
Sample book: http://web2.anl.az:81/read/page.php?bibid=568450&pno=1
import urllib.request
import os
bibID = input("ID: ")
first = int(input("First page: "))
last = int(input("Last page: "))
if not os.path.exists(bibID):
    os.makedirs(bibID)
for i in range(first, last+1):
    url = f"http://web2.anl.az:81/read/img.php?bibid={bibID}&pno={i}"
    urllib.request.urlretrieve(url, f"{bibID}/{i}.jpg")
Doesn't look like there is an issue with your script. It has to do with the API you are hitting and the sequence of requests it requires.
A GET http://web2.anl.az:81/read/img.php?bibid=568450&pno=<page> just on its own doesn't work right away; instead, it returns No FILE HERE.
The reason this happens is that retrieval of the images is tied to your cookie. You first need to initiate the read session that's generated when you first visit the page and click the TƏSDİQLƏYIRƏM button.
From what I could tell you need to do the following:
POST http://web2.anl.az:81/read/page.php?bibid=568450 with a Content-Type: multipart/form-data body. It should have a single key/value pair, approve: TƏSDİQLƏYIRƏM - this starts a session and generates a cookie for you, which you have to send with all of your API calls from now on.
E.g.
requests.post('http://web2.anl.az:81/read/page.php?bibid=568450', files=dict(approve='TƏSDİQLƏYIRƏM'))
Do the following in your for-loop of pages:
a. GET http://web2.anl.az:81/read/page.php?bibid=568450&pno=<page number> - page won't show up if you don't do this first
b. GET http://web2.anl.az:81/read/img.php?bibid=568450&pno=<page number> - finally get the image!
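Putting both steps together, a rough sketch using requests.Session so the cookie from the approval POST is reused automatically (the bibid is the one from the question; the page range and output file naming are just illustrative, borrowed from the question's script):

import os
import requests

bib_id = "568450"
first, last = 1, 10  # example page range

session = requests.Session()  # keeps the cookie from the approval POST

# Step 1: approve the read session (multipart form with the single key 'approve')
session.post(
    f"http://web2.anl.az:81/read/page.php?bibid={bib_id}",
    files=dict(approve="TƏSDİQLƏYIRƏM"),
)

os.makedirs(bib_id, exist_ok=True)
for page in range(first, last + 1):
    # Step 2a: view the page first, otherwise the image is not served
    session.get(f"http://web2.anl.az:81/read/page.php?bibid={bib_id}&pno={page}")
    # Step 2b: now the image request should return actual JPEG data
    img = session.get(f"http://web2.anl.az:81/read/img.php?bibid={bib_id}&pno={page}")
    with open(f"{bib_id}/{page}.jpg", "wb") as f:
        f.write(img.content)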
I have been reading through a lot of documentation on PRAW and bs4, and I've looked at other people's examples of how to do this, but I just can't get anything working the way I would like. I thought it would be a pretty simple script, but every example I find is either written in Python 2 or just doesn't work at all.
I would like a script to download the top 10 images from a given Subreddit and save them to a folder.
If anyone could point me in the right direction that would be great.
Cheers
The high level flow will look something like this -
Iterate through the top posts of your subreddit.
Extract the url of the submission.
Check if the url is an image.
Save the image to your desired folder.
Stop once you have 10 images.
Here's an example of how this could be implemented-
import urllib.request
import praw

# Assumes an authenticated Reddit instance; fill in your own app credentials.
reddit = praw.Reddit(client_id="...", client_secret="...", user_agent="...")

subreddit = reddit.subreddit("aww")
count = 0
# Iterate through top submissions
for submission in subreddit.top(limit=None):
    # Get the link of the submission
    url = str(submission.url)
    # Check if the link is an image
    if url.endswith("jpg") or url.endswith("jpeg") or url.endswith("png"):
        # Retrieve the image and save it in the current folder
        urllib.request.urlretrieve(url, f"image{count}")
        count += 1
    # Stop once you have 10 images
    if count == 10:
        break
I was simply trying to generate a summary that would show the run_metadata as follows:
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
summary = sess.run([x, y], options=run_options, run_metadata=run_metadata)
train_writer.add_run_metadata(paths.logs, 'step%d' % step)
train_writer.add_summary(paths.logs, step)
I made sure the path to the logs folder exists; this is confirmed by the fact that the summary file is generated, but no metadata is present. To be honest, I am not sure a file is actually generated for the metadata, but when I open TensorBoard, the graph looks fine and the session runs dropdown menu is populated. When I select any of the runs, it shows a progress bar "Parsing metadata.pbtxt" that stops and hangs about halfway through.
This prevents me from gathering any additional info about my graph. Am I missing something? A similar issue happened when trying to run this tutorial locally (MNIST summary tutorial). I feel like I am missing something simple. Does anyone have an idea about what could cause this issue? Why would my TensorBoard hang when trying to load session run data?
I can't believe I made it work right after posting the question but here it goes. I noticed that this line:
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
was giving me an error so I removed the params and turned it into
run_options = tf.RunOptions()
without realizing that this is what caused the metadata not to be parsed. Once I researched the error message:
Couldn't open CUDA library cupti64_90.dll
I looked into this GitHub thread and moved the file into the bin folder. After that I ran my code again with the trace_level param, had no errors, and the metadata was successfully parsed.
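For reference, here is roughly what the working pattern looks like once the trace level is back (TF 1.x style, matching the question; merged_summary, train_op and train_writer are placeholder names for your own summary op, training op and FileWriter):

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

summary, _ = sess.run([merged_summary, train_op],
                      options=run_options,
                      run_metadata=run_metadata)

# add_run_metadata takes the RunMetadata object and a tag that must be unique per step
train_writer.add_run_metadata(run_metadata, 'step%d' % step)
train_writer.add_summary(summary, step)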
Hello, I have built a script that goes onto a website, selects a size and auto checks out an item for me. It works very well, but I have 2 concerns.
1. I want this script to run faster. Before, the script ran so fast that it basically added to cart and went to the checkout page before the item could even load into the cart (which resulted in errors), so I added this to my code:
wait = WebDriverWait(driver, 10) and this one, which I mainly use to wait until the item has loaded into the cart and all the "add to cart" buttons have shown up:
wait.until(EC.presence_of_element_located((By.NAME, 'commit')))
but I want this script to run faster. I tried changing the
wait = WebDriverWait(driver, 10) into something like
wait = WebDriverWait(driver, 1) and
wait = WebDriverWait(driver, 100), but I see no difference. Is there anything I can do to make the script run faster? (It doesn't have to be the wait= part; I'll take anything I can get, even to shave off milliseconds.)
2. I am currently using the send_keys option for autofill, which is PAINFULLY SLOW. Is there anything I can use that will fill in all the fields instantly, all together? I know there are JavaScript snippets similar to this that can do it, but I'm not sure how to write JavaScript or, more importantly, how to combine it with my script.
Can anyone help me out I just want my selenium python chromedriver script to run as fast as possible.
Thank you.
(For my script I'm using Select for the size, just .click(), a couple of if statements that depend on how many items they want to cart, and lots of def fweuf / fweuf() - I forget what those are called, lol.)
For sending values with JS you can do this:
js= "document.getElementById('YOURELEMENT').value = '" + str(YOURVALUE) + "';"
driver.execute_script(js)
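A slightly safer variant passes the element and the value to the script as arguments, so quotes inside the value can't break the JS string (the element id and the value here are made up for the example):

from selenium.webdriver.common.by import By

# Locate the field with Selenium once, then set its value in a single JS call.
field = driver.find_element(By.ID, "billing_address")  # hypothetical element id
driver.execute_script("arguments[0].value = arguments[1];", field, "123 Example St")

Note that setting .value directly skips the keyboard events, so pages that listen for input or change events may still need those triggered separately.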
Hope this helps.
This question is a followup to a previous question.
The snippet of code below almost works... it runs without error, yet gives back a None value for results_list. This means it is accessing the file (I think) but just can't extract anything from it.
I have a file, sample.wav, living publicly here: https://storage.googleapis.com/speech_proj_files/sample.wav
I am trying to access it by specifying source_uri='gs://speech_proj_files/sample.wav'.
I don't understand why this isn't working. I don't think it's a permissions problem. My session is instantiated fine. The code chugs for a second, yet always comes up with no result. How can I debug this?? Any advice is much appreciated.
from google.cloud import speech
speech_client = speech.Client()
audio_sample = speech_client.sample(
    content=None,
    source_uri='gs://speech_proj_files/sample.wav',
    encoding='LINEAR16',
    sample_rate_hertz=44100)
results_list = audio_sample.async_recognize(language_code='en-US')
Ah, that's my fault from the last question. That's the async_recognize command, not the sync_recognize command.
That library has three recognize commands. sync_recognize reads the whole file and returns the results. That's probably the one you want. Remove the letter "a" and try again.
Here's an example Python program that does this: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/cloud-client/transcribe.py
FYI, here's a summary of the other types:
async_recognize starts a long-running, server-side operation to transcribe the whole file. You can poll the server with the operation.poll() method to see whether it has finished and, when complete, get the results via operation.results.
The third type is streaming_recognize, which sends you results continually as they are processed. This can be useful for long files where you want some results immediately, or if you're continuously uploading live audio.
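A rough sketch of that synchronous path, keeping the same client and parameters as the question's snippet (treat this as an outline - the exact method and attribute names depend on your google-cloud-speech version):

from google.cloud import speech

speech_client = speech.Client()
audio_sample = speech_client.sample(
    content=None,
    source_uri='gs://speech_proj_files/sample.wav',
    encoding='LINEAR16',
    sample_rate_hertz=44100)

# sync_recognize blocks until the whole file has been processed
results = audio_sample.sync_recognize(language_code='en-US')
for result in results:
    print(result.transcript, result.confidence)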
I finally got something to work:
import time
from google.cloud import speech

speech_client = speech.Client()
sample = speech_client.sample(
    content=None,
    source_uri='gs://speech_proj_files/sample.wav',
    encoding='LINEAR16',
    sample_rate_hertz=44100)

retry_count = 100
operation = sample.async_recognize(language_code='en-US')
while retry_count > 0 and not operation.complete:
    retry_count -= 1
    time.sleep(10)
    operation.poll()  # API call

print(operation.complete)
print(operation.results[0].transcript)
print(operation.results[0].confidence)
Then something like:
for op in operation.results:
    print(op.transcript)