How to be sure that a downloaded page is complete in Python - python-3.x

I am downloading a huge JSON-based page. Most of the time it downloads successfully, but sometimes it downloads only partially. How can I be sure that the download completed?
My example code is as follows:
import json
import logging
import urllib.request

mac_sonuclari_url = "http://mservice.fanatik.com.tr/LeagueStage?TournamentID={}&includeFixture=1"
with urllib.request.urlopen(mac_sonuclari_url.format(1)) as url:
    try:
        data = json.loads(url.read().decode())
    except Exception as err:
        logging.error("{}: Error Getting URL: {} with Error: {}".format(fna, mac_sonuclari_url.format(1), err))
Unfortunately, I can't catch a partial download with try/except; no error is raised, but my code breaks later because it doesn't get all the data it needs.
Is there any way to tell that the page loaded completely?
Thanks a lot

[From the comments]
You could check the size of the bytes you read against the returned 'Content-Length' header. If all data has been retrieved, the two sizes should agree.
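A minimal sketch of that check, reusing the question's mac_sonuclari_url; what to do on a mismatch (retry, raise) is up to you, and note that Content-Length may be absent (e.g. for chunked responses), in which case the check is skipped here:

import json
import urllib.request

def read_complete(url_string):
    # Read the whole body and compare its size with the Content-Length header.
    with urllib.request.urlopen(url_string) as resp:
        body = resp.read()
        expected = resp.headers.get("Content-Length")
        if expected is not None and len(body) != int(expected):
            raise IOError("partial download: got {} of {} bytes".format(len(body), expected))
        return body

data = json.loads(read_complete(mac_sonuclari_url.format(1)).decode())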

Related

Selenium Connection Refused

I am scraping Google search pages using Python/Selenium, and since last night I have been encountering a MaxRetryError: [Errno 61] Connection refused error. I debugged my code and found that the error begins in this code block right here:
domain = pattern.search(website)
counter = 2
# keep running this until the url appears like normal
while domain is None:
    counter += 1
    # close chrome and try again
    print('link not found, closing chrome and restarting ...\nwaiting {} seconds...'.format(counter))
    chrome.quit()
    time.sleep(counter)
    # chrome = webdriver.Chrome()
    time.sleep(10)  # tried inserting a time.sleep to delay the request
    chrome.get('https://google.com')  # error is right here; this is the second chrome.get in this script
    target = chrome.find_element_by_name('q')
    target.send_keys(college)
    target.send_keys(Keys.RETURN)
    # parse the webpage
    soup = BeautifulSoup(chrome.page_source, 'html.parser')
    website = soup.find('cite', attrs={'class': 'iUh30'}).text
    print('tried to get URL, is this it? : {}\n'.format(website))
    pattern = re.compile(r'\w+\.(edu|com)')
    domain = pattern.search(website)
I keep getting the following error:
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='ADDRESS', port=PORT): Max retries exceeded with url: /session/92ca3da95353ca5972fb5c520b704be4/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x11100e4e0>: Failed to establish a new connection: [Errno 61] Connection refused',))
As you can see in the code block above, I inserted a time.sleep(), but it doesn't appear to help at all. For context, this script is part of a function, which another script repeatedly calls in a loop. But again, I make sure to add delays between each call of the webdriver.get() method. As of now, my script fails at the first iteration of this loop.
I tried googling the issue, but the closest thing I found was this. It appears to describe the same exact error, and the top answer identifies the same method as the cause, but I don't really understand what its Solution and Conclusion sections are saying. I get that the MaxRetryError is confusing to debug, but what precisely is the solution?
It mentions a max_retries argument and tracebacks, but I don't know what they mean in this context. Is there any way I can catch this error in the context of Selenium? I have seen some threads on Stack Exchange about catching this error, but only in the context of urllib3. In my case, I would need to catch the same error for the Selenium package.
Thanks for any advice
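Since the traceback above shows the raw urllib3.exceptions.MaxRetryError escaping from the driver call, one possibility (an assumption, not something confirmed in this thread) is to catch that class directly around chrome.get() and restart the browser:

from selenium import webdriver
from urllib3.exceptions import MaxRetryError

chrome = webdriver.Chrome()
try:
    chrome.get('https://google.com')
except MaxRetryError as err:
    # The HTTP connection to the chromedriver process is gone; restart the browser.
    print('driver connection refused, restarting: {}'.format(err))
    chrome = webdriver.Chrome()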
My code still runs into issues every once in a while (which could be solved by using proxies), but I think I found the source of the issue. This loop anticipates that the first pattern match will return a .edu or .com, but does not anticipate a .org. Therefore, my code runs indefinitely when the first search result returns a .org. Here is the source of the issue:
website = soup.find('cite', attrs={'class': 'iUh30'}).text
print('tried to get URL, is this it? : {}\n'.format(website))
pattern = re.compile(r'\w+\.(edu|com)') # does not anticipate .org's
Now my code runs okay, though I do run into errors when the code runs for too long (in this case the source of the issue is much clearer).
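Presumably the fix is just to widen the alternation (a guess at the final pattern; add whatever other TLDs your results can return):

import re
pattern = re.compile(r'\w+\.(edu|com|org)')  # now matches .org results too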
You are quitting the Chrome driver too early. After you call chrome.quit(), the driver session is gone, so the subsequent chrome.get('https://google.com') fails, and the automatic connection retries surface as the MaxRetryError.
Try removing the call to chrome.quit().
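If you do need a fresh browser on each retry, a minimal sketch of the alternative (the search-and-parse steps from the question are elided): pair quit() with a new webdriver.Chrome() so that get() always has a live session.

import time
from selenium import webdriver

chrome = webdriver.Chrome()
domain = None
counter = 2
while domain is None:
    counter += 1
    time.sleep(counter)
    # quit() and re-create as a pair; calling get() on a quit driver
    # is exactly what raises the MaxRetryError.
    chrome.quit()
    chrome = webdriver.Chrome()
    chrome.get('https://google.com')
    # ... send the query, parse the result, and recompute `domain` as in the question ...
    break  # stand-in for the real exit condition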

LIEF: how to catch exceptions

Good morning folks,
I'm facing a problem. I am using the lief package (Python 3) to parse some ELF binaries. It happens that some exceptions are raised, and I would like to catch them so I can apply some modifications depending on which exception was raised. Unfortunately, I cannot manage to catch them properly.
Here's what I tried:
try:
    binary = lief.ELF.parse(sys.argv[-1])
except lief.exception:
    print("lief exception")
except:
    print("all exceptions")
It always prints "all exceptions".
Here's the link to the API: API LINK
Thank you for your help!
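One way to find out which exception class is actually reaching the bare except (so it can then be caught specifically) is to log its concrete type; a minimal sketch mirroring the question's call:

import sys
import lief

try:
    binary = lief.ELF.parse(sys.argv[-1])
except Exception as err:
    # Print the concrete class name: this is what belongs in the except clause.
    print("caught {}: {}".format(type(err).__name__, err))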

Vimeo API: Upload from a File Using a Form

I followed the docs for the Vimeo Node.js API to upload a file. It's quite simple, and I have it working by running it directly in Node, except that it requires me to pass the full path of the file I want to upload. Code is here:
function uploadFile() {
    let file = '/Users/full/path/to/file/bulls.mp4';
    let video_id; // the eventual end URI of the uploaded video
    lib.streamingUpload(file, function(error, body, status_code, headers) {
        if (error) {
            throw error;
        }
        lib.request(headers.location, function(error, body, status_code, headers) {
            console.log(body);
            video_id = body.uri;
            // after it's done uploading, and the result is returned, update info
            updateVideoInfo(video_id);
        });
    }, function(upload_size, file_size) {
        console.log("You have uploaded " +
            Math.round((upload_size / file_size) * 100) + "% of the video");
    });
}
Now I want to integrate this into a form generated in my React app, except that the result of evt.target.files[0] is not a full path; the result is this:
File {name: "bulls.mp4", lastModified: 1492637558000, lastModifiedDate: Wed Apr 19 2017 14:32:38 GMT-0700 (PDT), webkitRelativePath: "", size: 1359013595…}
Just for the sake of it, I piped that into my already-working upload function, and it didn't work for the reasons specified. Am I missing something? If not, I just want to clarify what I actually have to do. So now I'm looking at the official Vimeo guide and wanted to make sure that is the right road to go down. See: https://developer.vimeo.com/api/upload/videos
So if I'm reading it right, you make several requests to achieve the same goal?
1) Do a GET to https://api.vimeo.com/me to find out the remaining upload quota the account has.
2) Do a POST to https://api.vimeo.com/me/videos to get an upload ticket. Use type: streaming if I want a resumable upload, such as the one provided by Vimeo's streamingUpload() function.
3) Do a PUT to https://1234.cloud.vimeo.com/upload?ticket_id=abcdef124567890.
4) Do a PUT to https://1234.cloud.vimeo.com/upload?ticket_id=abcdef124567890, but without file data and with the header Content-Range: bytes */*, any time I want to check the bytes uploaded.
Sound right? Or can you simply use a form and I got it wrong somewhere. Let me know. Thanks.
There's some example code in this project that might be worth checking out: https://github.com/websemantics/vimeo-upload.
Your description is mostly correct for the streaming system, but I want to clarify the last two points.
3) In this step, you should make a PUT request to that url with a Content-Length header describing the full size of the file (as described here: https://developer.vimeo.com/api/upload/videos#upload-your-video)
4) In this step, the reason you are checking bytes uploaded is for when you have completed the upload, or when your connection in the PUT request dies. We save as many bytes as possible, and we respond to the request in step 4 with how many bytes we received. This lets you resume step 3 where you left off instead of starting again at the very beginning.
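A rough sketch of steps 3 and 4 in Python with the requests library (the ticket URL is the placeholder from the question, the file name comes from the earlier code, and the Range response header is read as described above):

import os
import requests

upload_link = "https://1234.cloud.vimeo.com/upload?ticket_id=abcdef124567890"  # from step 2
path = "bulls.mp4"
size = os.path.getsize(path)

# Step 3: PUT the whole file, declaring its full size up front.
with open(path, "rb") as f:
    requests.put(upload_link, data=f, headers={"Content-Length": str(size)})

# Step 4: a body-less PUT with Content-Range: bytes */*; the server's Range
# response header reports how many bytes it stored, so step 3 can be resumed.
check = requests.put(upload_link, headers={"Content-Range": "bytes */*"})
print(check.status_code, check.headers.get("Range"))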
For stability we highly recommend the resumable uploader, but if you are looking for simplicity we do offer a simple POST uploader that uses an HTML form. The docs for that are here: https://developer.vimeo.com/api/upload/videos#simple-upload

How do I init XOD in WebViewer? "DisplayModes" is undefined

I'm trying to load a XOD document into a PDFTron WebViewer. As far as I can tell from the documentation and samples, this should be a simple plug-and-play operation: it should simply work when you point it at a file. Ideally, in my example, the document would be fetched from a service, like so:
fetch('/myservice/GetXOD')
    .then(function(data) {
        $(function() {
            var viewerElement = document.getElementById("viewer");
            var myWebViewer = new PDFTron.WebViewer({
                initialDoc: data.body
            }, viewerElement);
        });
    });
Unfortunately I get the following error:
Uncaught TypeError: Cannot read property 'DisplayModes' of undefined
The reason I'm doing it in a fetch is that I'm rendering a Handlebars template and pass the data to instantiate in a callback. However, I've isolated the code into an otherwise empty HTML document, and in the simplified example below I'm simply pointing at the XOD provided by PDFTron on page load (no fetch this time).
$(function() {
    var viewerElement = document.getElementById("viewer");
    var myWebViewer = new PDFTron.WebViewer({
        initialDoc: 'GettingStarted.xod' // Using the XOD provided by PDFTron
    }, viewerElement);
});
This unfortunately returns a different error (HTTP status 416).
Uncaught Error: Error loading document: Error retrieving file: /doc/WebViewer_Developer_Guide.xod?_=-22,. Received return status 416.
The same error appears when I run the samples from PDFTron on localhost.
I'm at a complete loss as to how I should debug this further - all the samples assume everything works out of the box.
I should note that I can actually get PDFs working just fine on localhost, but not on the server. XODs are problematic both on the server and on localhost.
I'm sorry to hear you are having troubles running our samples.
Your error message says 416 which means "Requested range not satisfiable". Perhaps your development servers do not support byte range requests (https://en.wikipedia.org/wiki/Byte_serving).
Could you try passing the option streaming: true to the WebViewer constructor? When streaming is true, WebViewer just requests the entire file up front, which shouldn't be a problem for any server. It is a problem for WebViewer itself if the file is large, though, because the file then has to be completely downloaded and parsed at the start, which is slow.

Java exception: "Can't get a Writer while an OutputStream is already in use" when running xAgent

I am trying to implement Paul Calhoun's Apache FOP solution for creating PDFs from XPages (from Notes In 9 #102). I am getting the following Java exception when trying to run the xAgent that does the processing: Can't get a Writer while an OutputStream is already in use.
The only change that I have made to Paul's code was the package name. I have isolated when the exception happens to the SSJS line: var jce: DominoXMLFO2PDF = new DominoXMLFO2PDF(); All that line does is instantiate the class; there is no custom constructor. I don't believe it is the code itself, but some configuration issue. The SSJS code is in the beforeRenderResponse event where it should be; I haven't changed anything on the xAgent.
I have copied the jar files from Paul's sample database to mine, and I have verified that the build paths are the same between the two databases. Everything compiles fine (after I did all this). This exception appears to be an XPages-only exception.
Here's what's really going on with this error:
XPages are essentially servlets... everything that happens in an XPage is just layers on top of a servlet engine. There are basically two types of data that a servlet can send back to whatever is initiating the connection (e.g. a browser): text and binary.
An ordinary XPage sends text -- specifically, HTML. Some xAgents also send text, such as JSON or XML. In any of these scenarios, however, Domino uses a Java Writer to send the response content, because Writers are optimized for sending Character data.
When we need to send binary content, we use an OutputStream instead, because streams are optimized for sending generic byte data. So if we're sending PDF, DOC/XLS/PPT, images, etc., we need to use a stream, because we're sending binary data, not text.
The catch (as you'll soon see, that's a pun) is that we can only use one per response.
Once any HTTP client is told what the content type of a response is, it makes assumptions about how to process that content. So if you tell it to expect application/pdf, it's expecting to only receive binary data. Conversely, if you tell it to expect application/json, it's expecting to only receive character data. If the response includes any data that doesn't match the promised content type, that nearly always invalidates the entire response.
So Domino in its infinite wisdom protects us from making this mistake by only allowing us to send one or the other in a single request, and throws an exception if we disobey that rule.
Unfortunately... if there's any exception in our code when we're trying to send binary content, Domino wants to report that to the consumer... which tries to invoke the output writer to send HTML reporting that something went wrong. Except we already got a handle on the output stream, so Domino isn't allowed to get a handle on the output writer, because that would violate its own rule against only using one per response. This, in turn, throws the exception you reported, masking the exception that actually caused the problem (in your case, probably a ClassNotFoundException).
So how do we make sure that we see the real problem, and not this misdirection? We try:
try {
    /*
     * Move all your existing code here...
     */
} catch (e) {
    print("Error generating dynamic PDF: " + e.toString());
} finally {
    facesContext.responseComplete();
}
There are two reasons this is a preferred approach:
If something goes wrong with our code, we don't let Domino throw an exception about it. Instead, we log it (instead of using print to send it to the console and log, you could also toss it to OpenLog, or whatever your preferred logging mechanism happens to be). This means that Domino doesn't try to report the error to the user, because we've promised that we already reported it to ourselves.
By moving the crucial facesContext.responseComplete() call (which is what ultimately tells Domino not to send any content of its own) to the finally block, this ensures it will get executed. If we left it inside the try block, it would get skipped if an exception occurs, because we'd skip straight to the catch... so even though Domino isn't reporting our exception because we caught it, it still tries to invoke the response writer because we didn't tell it not to.
If you follow the above pattern, and something's wrong with your code, then the browser will receive an incomplete or corrupt file, but the log will tell you what went wrong, rather than reporting an error that has nothing to do with the root cause of the problem.
I almost deleted this question, but decided to answer it myself, since there is very little out on Google when you search for this exception.
The issue was in the xAgent: there was an importPackage line that was incorrect. Fixing this made everything work. The exception verbiage, "Can't get a Writer while an OutputStream is already in use", is quite misleading. I don't know what else triggers this exception, but an alternative description would be "Java class ??yourClass?? not found".
If you found this question, then you likely have the same issue. I would ignore what the exception actually says and check your package statements throughout your application. The Java code will error on its own, but the SSJS that references the Java will not error until runtime; focus on that code.
Updating the response headers after writing the body can solve this kind of problem. Example:
HttpServletResponse response = (HttpServletResponse) facesContext.getExternalContext().getResponse();
response.getWriter().write("<html><body>...</body></html>");
response.setContentType("text/html");
response.setHeader("Cache-Control", "no-cache");
response.setCharacterEncoding("UTF-8");
