How can I read the contents of a DocumentFile in Android?

I have added a feature to an application that allows the user to create a local backup on external storage. That part is working great, but I am having trouble restoring that backup.
Using the SAF, I have found an easy way for the user to select the file they want to restore. The file they choose comes back to the app as a DocumentFile.
How can I open the DocumentFile and read its contents on API 26 and up?
So far I have tried two things, and neither worked:
1)
new File(myDocumentFile.getUri().getPath())
2)
I've tried this answer, but it didn't work. It is also quite hacky, and I don't really need access to the java.io.File; I just need a way to read the contents.
How does Google want us to approach reading users' files?

Kotlin code to read the lines of a text file (where 'file' is a DocumentFile):
val lines = contentResolver.openInputStream(file.uri)?.use { input ->
    // openInputStream returns a nullable stream; use {} closes it when done
    BufferedReader(InputStreamReader(input)).readLines()
}

Related

Reading GeoJSON in databricks, no mount point set

We have recently made changes to how we connect to ADLS from Databricks which have removed mount points that were previously established within the environment. We are using databricks to find points in polygons, as laid out in the databricks blog here: https://databricks.com/blog/2019/12/05/processing-geospatial-data-at-scale-with-databricks.html
Previously, a chunk of code read a GeoJSON file from ADLS into the notebook and then broadcast it to the cluster(s):
nights = gpd.read_file("/dbfs/mnt/X/X/GeoSpatial/Hex_Nights_400Buffer.geojson")
a_nights = sc.broadcast(nights)
However, the new changes that have been made have removed the mount point and we are now reading files in using the string:
"wasbs://Z#Y.blob.core.windows.net/X/Personnel/*.csv"
This works fine for CSV and Parquet files, but will not load a GeoJSON! When we try this, we get an error saying "File not found". We have checked and the file is still within ADLS.
We then tried to copy the file temporarily to "dbfs" which was the only way we had managed to read files previously, as follows:
dbutils.fs.cp("wasbs://Z#Y.blob.core.windows.net/X/GeoSpatial/Nights_new.geojson", "/dbfs/tmp/temp_nights")
nights = gpd.read_file(filename="/dbfs/tmp/temp_nights")
dbutils.fs.rm("/dbfs/tmp/temp_nights")
a_nights = sc.broadcast(nights)
This works fine on the first use within the code, but then a second GeoJSON run immediately after (which we tried to write to temp_days) fails at the gpd.read_file stage, saying file not found! We have checked with dbutils.fs.ls() and can see the file in the temp location.
So some questions for you kind folks:
Why did we previously have to use "/dbfs/" when reading in GeoJSON but not CSV files, before the changes to our environment?
What is the correct way to read GeoJSON files into Databricks without a mount point set?
Why does our process fail upon trying to read the second created temp GeoJSON file?
Thanks in advance for any assistance - very new to Databricks...!
Pandas (and GeoPandas) use the local file API to access files, and the /dbfs mount is what exposes DBFS through that local file API. In your specific case, the problem is that when you called dbutils.fs.cp you didn't specify that the destination should be local, so by default the file was copied onto DBFS at the path /dbfs/tmp/temp_nights (really dbfs:/dbfs/tmp/temp_nights). As a result the local file API doesn't see it: you would need to read /dbfs/dbfs/tmp/temp_nights instead, or copy the file into /tmp/temp_nights.
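As a quick sketch of that mapping, reusing the question's placeholder storage URL (not tested against your environment):
# Copy to an explicit DBFS destination so the local /dbfs view lines up with it.
# The wasbs:// URL and file names are the question's placeholders.
dbutils.fs.cp("wasbs://Z#Y.blob.core.windows.net/X/GeoSpatial/Nights_new.geojson",
              "dbfs:/tmp/temp_nights")
nights = gpd.read_file("/dbfs/tmp/temp_nights")  # the same file, seen through the local file API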
But the better way would be to copy the file locally; you just need to specify that the destination is local, which is done with the file:// prefix, like this:
dbutils.fs.cp("wasbs://Z#Y.blob.core.windows.net/...Nights_new.geojson",
              "file:///tmp/temp_nights")
and then read the file from /tmp/temp_nights:
nights = gpd.read_file(filename="/tmp/temp_nights")
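From there the rest of the question's original flow should work unchanged, for example a_nights = sc.broadcast(nights), and the local copy under /tmp can be removed once the broadcast is done.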

How to delete an Encrypted File in Kotlin/Android Studio

I am using Kotlin to create and write to an encrypted file stored locally within the app to temporarily store login credentials. On logout, I want to delete this file.
To create the file I am using the EncryptedFile.Builder method as below:
val encryptedFileName = "UserAuthData_.txt"
val mainKey = MasterKey.Builder(applicationContext)
    .setKeyScheme(MasterKey.KeyScheme.AES256_GCM)
    .build()
val encryptedFile = EncryptedFile.Builder(
    applicationContext,
    File(applicationContext.applicationInfo.dataDir, encryptedFileName),
    mainKey,
    EncryptedFile.FileEncryptionScheme.AES256_GCM_HKDF_4KB
).build()
I want to delete this file but can't find any way of doing so.
Any ideas would be greatly appreciated.
Thank you!
You can simply delete the file by pointing a File at the same path you passed to the builder:
// Same directory and file name that were used when building the EncryptedFile
File(applicationContext.applicationInfo.dataDir, encryptedFileName).delete()

Google Drive File Stream files creation date

I know this has been asked a few times already, but not in this context I think, and the other questions were asked a few years ago, so I'm hoping maybe something has changed.
So my issue is: I am uploading files to Google Drive using Google Drive File Stream. While the uploading goes smoothly, I have a problem with the files' creation date: it is always changed to the timestamp of the time the file got uploaded, not the actual, local file creation date. It is a serious problem, as I am going to use this to back up huge amounts of data and preserve all the metadata I can, and the creation date is crucial. Is there a way to either upload it with the creation date intact, or to change it after the upload? From what I've seen this seems not to be possible, but I have to try and make it work. Any help and insight will be appreciated. I'm using Drive File Stream with Python.
EDIT: I didn't make it clear enough: the issue here is that I don't want to use the Google Drive API at all, but rather deal with this using only the Google Drive File Stream interface, if that's possible.
create
If you check the documentation for files.create, you will find that the acceptable metadata for file creation does include a createdTime.
You should then just add this to the metadata you use when uploading the file. As you did not post your code, I have grabbed the standard example from the documentation and added the created time, as follows.
from googleapiclient.http import MediaFileUpload

# 'THETIME' is a placeholder for an RFC 3339 timestamp, e.g. '2015-02-14T14:30:00.000Z'
file_metadata = {'name': 'photo.jpg', 'createdTime': 'THETIME'}
media = MediaFileUpload('files/photo.jpg', mimetype='image/jpeg')
file = drive_service.files().create(body=file_metadata,
                                    media_body=media,
                                    fields='id').execute()
print('File ID: %s' % file.get('id'))
Update
In the event that you want to update the ones you have already created, you could use the following method.
If you check the documentation for files.create, you will find that the response is just a File resource.
If you check the File resource, you will see that createdTime is writable.
You should run a files.update and reset the createdTime to the proper time.
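A minimal sketch of that call with the google-api-python-client, following the answer above; the file ID and the RFC 3339 timestamp are placeholders, and whether Drive accepts createdTime in an update body is taken from the answer rather than verified here:
# drive_service is the authorized service object from the earlier example;
# 'FILE_ID' and the timestamp below are placeholders.
updated = drive_service.files().update(
    fileId='FILE_ID',
    body={'createdTime': '2015-02-14T14:30:00.000Z'},
    fields='id, createdTime'
).execute()
print('createdTime is now %s' % updated.get('createdTime'))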

Flask: How to read "file" from request.files multiple times?

For a Flask web app, I know I can't read a "file" from request.files multiple times because it's a stream, so once I read it, it's empty. But I need to use the "file" multiple times without saving it locally, and I'm having trouble doing it.
For example, from this
image = request.files["image"]
I'd like to have something like
image2 = image.copy
and perform different operations on image and image2.
Can someone please help me with this?
image = request.files["image"]
# Seek the pointer to the beginning of the file to read again
request.files["image"].seek(0)
After reading a file, just run f.stream.seek(0); this points back to the beginning of the file stream, and then you are able to read the file from the beginning again. You can simply put the following snippet in a loop and see it in action.
import csv
import io

f.stream.seek(0)  # f is the uploaded FileStorage from request.files
stream = io.StringIO(f.stream.read().decode("UTF8"), newline=None)
reader = csv.reader(stream)
for row in reader:
    print(row)

Xref table not zero-indexed. ID numbers for objects will be corrected. won't continue

I am trying to open a pdf to get the number of pages. I am using PyPDF2.
Here is my code:
import PyPDF2

def pdfPageReader(file_name):
    try:
        reader = PyPDF2.PdfReader(file_name, strict=True)
        number_of_pages = len(reader.pages)
        print(f"{file_name} = {number_of_pages}")
        return number_of_pages
    except:
        return "1"
But then I run into this error:
PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]
I tried both strict=True and strict=False. When it is True, it displays this message and then nothing happens; I waited for 30 minutes, but nothing happened. When it is False, it just displays nothing and does nothing; if I press Ctrl+C in the terminal (cmd, Windows 10) it cancels that open and continues (I run this on a batch of PDF files). Only one file in the batch has this problem.
My questions are, how do I fix this, or how do I skip this, or how can I cancel this and move on with the other pdf files?
If somebody has a similar problem and it even crashes the program with this error message:
File "C:\Programy\Anaconda3\lib\site-packages\PyPDF2\pdf.py", line 1604, in getObject
% (indirectReference.idnum, indirectReference.generation, idnum, generation))
PyPDF2.utils.PdfReadError: Expected object ID (14 0) does not match actual (13 0); xref table not zero-indexed.
It helped me to set the strict argument to False for my PDF reader:
pdf_reader = PdfReader(input_file, strict=False)
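If the goal is also to skip files that still fail and move on with the rest of the batch, a hedged sketch along these lines may help; the file names are placeholders, the PyPDF2.errors import path assumes a recent PyPDF2 release, and this only catches files that raise an error rather than ones that hang:
from PyPDF2 import PdfReader
from PyPDF2.errors import PdfReadError

for name in ["first.pdf", "second.pdf", "scanned.pdf"]:  # placeholder batch
    try:
        pages = len(PdfReader(name, strict=False).pages)
        print(f"{name} = {pages}")
    except PdfReadError as err:
        print(f"skipping {name}: {err}")  # move on to the next file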
For anybody else who may be running into this problem, and found that strict=False didn't help, I was able to solve the problem by just re-saving a new copy of the file in Adobe Acrobat Reader. I just opened the PDF file inside an actual copy of Adobe Acrobat Reader (the plain ol' free version on Windows), did a "Save as...", and gave the file a new name. Then I ran my script again using the newly saved copy of my PDF file.
Apparently, the PDF file I was using, which was generated directly from my scanner, was somehow corrupt, even though I could open and view it just fine in Reader. Making a duplicate copy of the file by re-saving it in Acrobat Reader somehow seemed to correct whatever was missing.
I had the same problem and looked for a way to skip it. I am not a programmer, but looking at the documentation about warnings, there is a piece of code that helps you avoid this hindrance.
Although I wouldn't recommend this as a solution, the piece of code that I used for my purpose is below (just copied and pasted from the documentation at the link).
import sys

if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")
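A narrower variant, if silencing every warning feels too broad, is to filter only PyPDF2's read warnings; the import path below is an assumption that holds for recent PyPDF2 releases (older ones expose it from PyPDF2.utils):
import warnings
from PyPDF2.errors import PdfReadWarning  # assumption: recent PyPDF2; older versions use PyPDF2.utils

warnings.filterwarnings("ignore", category=PdfReadWarning)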
This happens to me when the file was created by a printer/scanner combo that generates PDFs. I could read in the PDF with only a warning, though, so I read it in and then rewrote it as a new file. I could then append the new one.
from PyPDF2 import PdfMerger, PdfReader, PdfWriter

reader = PdfReader("scanner_generated.pdf", strict=False)
writer = PdfWriter()
for page in reader.pages:
    writer.add_page(page)

with open("fixedPDF.pdf", "wb") as fp:
    writer.write(fp)

merger = PdfMerger()
merger.append("fixedPDF.pdf")
I had the exact same problem, and the solutions did help but didn't solve it completely, at least not the one about setting strict=False and re-saving the document in Acrobat Reader.
Anyway, I still got a stream error, but I was able to fix it using an online PDF repair tool. I used sejda.com, but please be aware that you are uploading your PDF to some website, so make sure there is nothing sensitive in it.
