Downloading file from Dropbox API for use in Python Environment with Apache Tika on Heroku - python-3.x

I'm trying to use Dropbox as a cloud-based file receptacle for an app/script. The script, written in Python, needs to take PDFs from the Dropbox and use the tika-python wrapper to convert to string.
I'm able to connect to the Dropbox API and use the files_download_to_file() method to download the PDFs to disk, and then use the tika from_file() method to pull that download file from the disk to process. Example:
# Download ex.pdf to local disk
dbx.files_download_to_file('/my_local_path/ex_on_disk.pdf', '/my_dropbox_path/ex.pdf')
from tika import parser
parsed = parser.from_file('ex_on_disk.pdf')
The problem is that I'm planning on running this app on something like Heroku. I don't think I'm able to save anything locally and then access it again. I'm not sure how to get something from the Dropbox API that can be directly referenced by the tika wrapper to run the same as above. I think the PHP SDK has a file_get_contents and a file_put_contents set of methods but it doesn't appear to have a companion in the Python SDK.
I've tried using the shareable links in place of a filename but that hasn't worked. Any ideas? I know there's also the files_download method which downloads the FileMetadata object but I have no idea what to do with this and am having trouble finding more about it.
TLDR; How can I reference a file on Dropbox with a filename string such as 'example.pdf' to be used in another function that is trying to read a file from disk, without saving that Dropbox file to disk?

I figured it out. I used the files_download method to get the byte string and then use the from_buffer method of tika instead:
md, response = dbx.files_download(path)
file_contents = response.content
parsed = parser.from_buffer(file_contents)

Related

use getObject() from forge-api npm, how to make the return result as a download link?

I am using forge-api getObject() to download the excel from BIM360 hub. I set up express sever in the backend and make the call in the frontend.
I could get the result of the object and it looks like this:
So my question is:
How can I convert the result as a download link correctly? I could download the excel, but the excel can not be opened...
My code looks like this:
backend:
frontend:
I think all you need to modify in your backend code is to return content.body, instead of content
See e.g. https://github.com/Autodesk-Forge/forge-derivatives-explorer/blob/master/routes/data.management.js#L296
It might even be better if you generated a pre-signed URL for the file and passed that to the client. In that case, the file would not be downloaded to your server first and then to the client, but directly to the client in a single step.
https://forge.autodesk.com/en/docs/data/v2/reference/http/buckets-:bucketKey-objects-:objectName-signed-POST/

How to Download a File (from URL) in Typescript

Update: This question used to ask about Google Cloud Storage, but I have since realized the issue actually is reproducable merely trying to save the download to local disk. Thus, I am rephrasing the question to be entirely about file downloads in Typescript and to no longer mention Google Cloud Storage.
When attempting to download and save a file in Typescript with WebRequests (though I experienced the same issue with requests and request-promises), all the code seems to execute correctly, but the resultant file is corrupted and cannot be viewed. For example, if I download an image, the file is not viewable in any applications.
// Seems to work correctly
const download = await WebRequest.get(imageUrl);
// `Buffer.from()` also takes an `encoding` parameter, but it's unclear how to determine the encoding of a download
const imageBuffer = Buffer.from(download.content);
// I *think* this line is straightforward
const imageByteArray = new Uint8Array(imageBuffer);
// Saves a corrupted file
const file = fs.writeFileSync("/path/to/file.png", imageByteArray);
I suspect the issue lies within the Buffer.from call not correctly interpreting the downloaded content, but I'm not sure how to do it right. Any help would be greatly appreciated.
Thanks so much!
From what I saw in the examples for web-request, download.content is just a string. If you want to upload a string to Cloud Storage using the node SDK, you can use File.save, passing that string directly.
Alternatively, you could use one the solutions seen here.

Can we read files from server path using any fs method in NodeJs

In my case I need to read file/icon.png from cloud storage/bucket which is a token base URL/path. Token resides in header of request.
I tried to use fs.readFile('serverpath') but it gave back error as 'ENOENT' i.e. 'No such file or directory' is existed, but file is existed on that path. So are these methods are eligible to make calls and read files from server or they work only with static path, if that is so then in my case how to read file from cloud bucket/server.
Here i need to pass that file-path to UI, to show this icon.
Use this lib to handle GCS operations.
https://www.npmjs.com/package/#google-cloud/storage
If you do need use fs, install https://cloud.google.com/storage/docs/gcs-fuse, mount bucket to your local filesystem, then use fs as you normally would.
I would like to complement Cloud Ace's answer by saying that if you have Storage Object Admin permission you can make the URL of the image public and use it like any other public URL.
If you don't want to make the URL public you can get temporary access to the file by creating a signed URL.
Otherwise, you'll have to download the file using the GCS Node.js Client.
I posted this as an answer as it is quite long to be a comment.

How to download and write a jar file in Node.js?

So I'm working on a Minecraft launcher (because why not, good experience), and I'm stuck when it comes to downloading the libraries.
I have a valid jar URL here. When you download it in the browser, it works fine. But, when you download it with Node.js, 7-zip gives this error when trying to open it:
An attempt was made to move the file pointer before the beginning of the file.
I'm using a module called snekfetch, but I've also tried it with request. Both items gave the same issue. Here's my current test code:
request.get('https://libraries.minecraft.net/com/google/code/gson/gson/2.8.0/gson-2.8.0.jar').then(r => {
fs.writeFileSync('./mything.jar', r.body);
});
Am I doing something wrong to download the jar file?
Okay, so now that I've seen this answer, I need to modify the question. I've gotten it to work using pipes, but I need inline-code because this is a for loop that's downloading (hence my usage of writeFileSync, and in my actual code I use await for the request). Is it even possible to download and write without piping?
It turns out this is an issue with the snekfetch library. Switching to snekfetch v3 fixed it.
You can check out the status of the issue here.

Refreshing URL glitch - cloud updating/rewriting

Hello I've created an VBA script which saves me jpg from excel and then gsync uploads it on gDrive, but here comes the thing. The URL for downloading is volatile and I need full resolution image.
There is link so you can open in awful google UI
and I would like to open THIS => volatile link :(
Or can I use this VBA to upload image on some other Cloud directly from excel?
You can either use the Drive SDK to get the file details including the latest temporary download link:
https://developers.google.com/drive/v2/reference/files/get
...or you can make a parent or grandparent folder public and work out a URL direct to the image using its filename, like this:
http://gappstips.com/gmail/use-google-drive-to-host-your-gmail-signature/
http://drive.google.com/uc?export=view&id= "ID"
If you change URL in view mode you can view that image dirrect.
There are other files types.

Resources