How to compress uploaded files in Django using 7-Zip with subprocess.call? - python-3.x

I'm trying to compress uploaded files in Django using 7-Zip.
I have successfully implemented compression and decompression of files using 7-Zip in Python, but I can't figure out how to integrate this with Django's uploaded files, so that whenever a file is uploaded, a .7z archive of that file is created and stored on disk.
Code used in Python for compression:
import subprocess
from py7zr import unpack_7zarchive
import shutil
exe = r"C:\Program Files\7-Zip\7zG.exe"
source = r"C:\profiles\Living in the Light_ A guide to personal transformation.pdf"
target = r"C:\profiles\Living1.7z"
def compress(source, target):
    # Pass the arguments as a list so paths with spaces need no manual quoting
    subprocess.call([exe, "a", "-t7z", target, source, "-mx=9"])
    print('file compressed')
def uncompress(target):
    # Register py7zr as the handler for .7z archives, then unpack
    shutil.register_unpack_format('7zip', ['.7z'], unpack_7zarchive)
    shutil.unpack_archive(target, r'C:\Users\098an\Pictures\Camera Roll')
    print('file uncompressed')

I believe there are Django-specific tools that you should use instead of 7-Zip. One problem with calling 7-Zip from a web application is that it depends on an external executable: if the machine running the code doesn't have it installed, the compression step will fail.
I know two ways to work around this. If you are simply looking to compress images, you can refer to this dev.to article on compressing images specifically. The other way is to require users to upload their files as archives, as described here, restricting the accepted file extensions to .zip, .rar, or whatever you deem necessary.
EDIT
There are ways to automatically zip all types of files, such as this PyPI package, but I'm not sure how well it will work, as it's not a mainstream package. If you want to conserve upload space, try setting a size limit instead.
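For the approach the question describes, one option is to write the upload to disk first and then call 7-Zip on the saved file. The following is only a minimal sketch, assuming the console `7z` executable is on the server's PATH and that `upload` behaves like a Django `UploadedFile` (it has `.name` and `.chunks()`). The names `build_7z_command`, `save_and_compress` and the `run` parameter are hypothetical and introduced here for illustration; they are not part of Django.

```python
import os
import subprocess


def build_7z_command(source, target, exe="7z"):
    """Return the argument list for '7z a' (add to archive) at maximum compression."""
    # A list avoids manual quoting of paths that contain spaces.
    return [exe, "a", "-t7z", target, source, "-mx=9"]


def save_and_compress(upload, dest_dir, exe="7z", run=subprocess.run):
    """Write an uploaded file into dest_dir, then create <name>.7z next to it.

    `upload` is assumed to look like a Django UploadedFile: it has a `.name`
    attribute and a `.chunks()` iterator that yields bytes.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # basename() drops any path components from the client-supplied name.
    source = os.path.join(dest_dir, os.path.basename(upload.name))
    with open(source, "wb") as out:
        for chunk in upload.chunks():
            out.write(chunk)
    target = source + ".7z"
    # check=True raises CalledProcessError if 7-Zip exits with an error code.
    run(build_7z_command(source, target, exe), check=True)
    return target
```

In a view this could be called as `save_and_compress(request.FILES["file"], settings.MEDIA_ROOT)`, where the `"file"` field name is an assumption about the form.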

Related

Downloading S3 files in Google Colab

I am working on a project where some of the data is provided through an S3FileSystem. I can read the data using S3FileSystem.open(path), but there are more than 360 files and it takes at least 3 minutes to read a single file. Is there any way to download these files to my system and read them from there, instead of reading directly from the S3FileSystem? There is another reason: once my Colab session reconnects, I have to re-read all those files, which takes a lot of time. I am using the following code to read the files:
import s3fs
import xarray as xr
fs_s3 = s3fs.S3FileSystem(anon=True)
s3path = 'file_name'
remote_file_obj = fs_s3.open(s3path, mode='rb')
ds = xr.open_dataset(remote_file_obj, engine='h5netcdf')
Is there any way of downloading those files?
You can use the other s3fs (the FUSE-based file system, not the Python package) to mount the bucket, then copy the files to Colab.
how to mount
After mounting, you can copy a file with:
!cp /s3/yourfile.zip /content/
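If mounting is not an option, the files can probably also be copied down with the Python s3fs package itself. This is a minimal sketch, assuming `fs` is an `s3fs.S3FileSystem`, whose `get(rpath, lpath)` method is inherited from fsspec. The helper name `download_files` and the bucket paths in the comments are hypothetical.

```python
import os


def download_files(fs, remote_paths, local_dir):
    """Copy each remote file into local_dir, skipping files already present."""
    os.makedirs(local_dir, exist_ok=True)
    local_paths = []
    for rpath in remote_paths:
        # Use the last component of the remote key as the local file name.
        lpath = os.path.join(local_dir, rpath.rstrip("/").split("/")[-1])
        if not os.path.exists(lpath):
            fs.get(rpath, lpath)
        local_paths.append(lpath)
    return local_paths


# Assumed usage in Colab (bucket and prefix are placeholders):
#   import s3fs
#   fs = s3fs.S3FileSystem(anon=True)
#   paths = download_files(fs, fs.ls("bucket/prefix"), "/content/data")
#   ds = xr.open_dataset(paths[0], engine="h5netcdf")
```

Because existing files are skipped, pointing `local_dir` at storage that survives a session reset (for example a mounted Google Drive folder, if one is available) should avoid downloading everything again after a reconnect.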

Use images in s3 with SageMaker without .lst files

I am trying to create (what I thought was) a simple image classification pipeline between s3 and SageMaker.
Images are stored in an s3 bucket with their class labels in their file names currently, e.g.
My-s3-bucket-dir
cat-1.jpg
dog-1.jpg
cat-2.jpg
..
I've been trying to leverage several related example .py scripts, but most seem to download data sets already in .rec format or contain special manifest or annotation files I don't have.
All I want is to pass the images from s3 to the SageMaker image classification algorithm that's located in the same region, IAM account, etc. I suppose this means I need a .lst file.
When I try to create the .lst manually, it isn't accepted, and the manual work takes too long to be good practice.
How can I automatically generate the .lst file (or otherwise send the images/classes for training)?
Things I read made it sound like im2rec.py was a solution, but I don't see how. The example I'm working with now is
Image-classification-fulltraining-highlevel.ipynb
but it seems to download the data as .rec,
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-train.rec')
download('http://data.mxnet.io/data/caltech-256/caltech-256-60-val.rec')
which just skips working with the .jpeg files. I found another that converts them to .rec but again it has essentially the .lst already as .json and just converts it.
I have mostly been working in a Python Jupyter notebook within the AWS console (in my browser) but I have also tried using their GUI.
How can I simply and automatically generate the .lst or otherwise get the data/class info into SageMaker without manually creating a .lst file?
Update
It looks like im2rec.py can't be run against s3. You'd have to completely download everything from all s3 buckets into the notebook's storage...
"Please note that [...] im2rec.py is running locally, therefore cannot take input from the S3 bucket. To generate the list file, you need to download the data and then use the im2rec tool." - AWS SageMaker Team
There are 3 options to provide annotated data to the Image Classification algo: (1) packing labels in recordIO files, (2) storing labels in a JSON manifest file ("augmented manifest" option), (3) storing labels in a list file. All options are documented here: https://docs.aws.amazon.com/sagemaker/latest/dg/image-classification.html.
The augmented manifest and .lst options are quick, since they only require creating an annotation file, usually with a short for loop. RecordIO requires the im2rec.py tool, which is a little more work.
Using .lst files is reasonably easy: you just need to create them with a quick for loop, like this:
# assuming train_index, train_class, train_pics store the pic index, class and path
with open('train.lst', 'a') as file:
    for index, cl, pic in zip(train_index, train_class, train_pics):
        file.write(str(index) + '\t' + str(cl) + '\t' + pic + '\n')
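For the naming scheme in the question (cat-1.jpg, dog-1.jpg, ...), the index, class and path lists can be derived from the file names themselves. This is a rough sketch under the assumption that the label is the text before the last `-` in each file name and that class indices should be numeric and start at 0. `build_lst_lines` and `write_lst` are hypothetical helper names.

```python
def build_lst_lines(image_paths):
    """Return (.lst lines, label-to-index mapping) for the given image paths."""
    # Label is the file-name text before the last '-', e.g. 'cat' in 'cat-1.jpg'.
    labels = [p.rsplit("/", 1)[-1].rsplit("-", 1)[0] for p in image_paths]
    # Map each distinct label to a numeric class index starting at 0.
    class_ids = {name: i for i, name in enumerate(sorted(set(labels)))}
    lines = [
        f"{index}\t{class_ids[label]}\t{path}"
        for index, (label, path) in enumerate(zip(labels, image_paths))
    ]
    return lines, class_ids


def write_lst(lines, lst_path):
    """Write the tab-separated lines to lst_path, one image per line."""
    with open(lst_path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

The list of paths does not require downloading the images: the object keys could be collected with boto3 (for example via `list_objects_v2`) and passed in directly, then `write_lst(lines, "train.lst")` produces the file to upload.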

Watson Data Platform how to unzip the zip file in the data assets

How to unzip the zip file in the data assets of the Watson Data Platform?
from io import BytesIO
import zipfile
zip_ref = zipfile.ZipFile(BytesIO(streaming_body_1.read()), 'r')
zip_ref.extractall(WHICH DIRECTORY FOR THE DATA ASSETS)
zip_ref.close()
streaming_body_1 is the streaming body object for the zip file, which I uploaded to the DATA ASSETS section.
How can I unzip the zip file in the Data Assets, given that I don't know the exact key path of the DATA ASSETS section?
I am trying to do this in the Jupyter notebook of the project.
Thank you!
When you upload a file to your project it is stored in the project's assigned cloud storage, which should now be Cloud Object Storage by default. (Check your project settings.) To work with uploaded files (which are just one type of data asset; there are others) in a notebook, you first have to download them from the cloud storage to make them accessible in the kernel's file system, and then perform the desired file operation (e.g. read, extract, ...).
Assuming you've uploaded your ZIP file, you should be able to generate code that reads the ZIP file using the tooling:
1. Click the 1010 (Data icon) on the upper right hand side.
2. Select "Insert to code" > "Insert StreamingBody object".
3. Consume the StreamingBody as desired.
I ran a quick test and it worked like a charm:
...
# "Insert StreamingBody object" generated code
...
from io import BytesIO
import zipfile
zip_ref = zipfile.ZipFile(BytesIO(streaming_body_1.read()), 'r')
print(zip_ref.namelist())
zip_ref.close()
Edit 1: If your archive is a compressed tar file use the following code instead:
...
# "Insert StreamingBody object" generated code
...
import tarfile
from io import BytesIO
tf = tarfile.open(fileobj=BytesIO(streaming_body_1.read()), mode="r:gz")
tf.getnames()
Edit 2: To avoid the read timeout you'll have to change the generated code from
config=Config(signature_version='oauth'),
to
config=Config(signature_version='oauth',connect_timeout=50, read_timeout=70),
With those changes in place I was able to download and extract training_data.tar.gz from the repo you've mentioned.
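As for which directory to pass to extractall: since the notebook works on the kernel's own file system, any local directory should do. A minimal sketch, assuming the archive fits in memory. `extract_zip_bytes` and the `"extracted"` directory name are hypothetical.

```python
import os
import zipfile
from io import BytesIO


def extract_zip_bytes(data, dest_dir):
    """Extract a ZIP archive held in memory into dest_dir; return member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(BytesIO(data), "r") as zip_ref:
        zip_ref.extractall(dest_dir)
        return zip_ref.namelist()


# Assumed usage with the generated StreamingBody object:
#   names = extract_zip_bytes(streaming_body_1.read(), "extracted")
```

The extracted files then live only in the kernel's file system, not in the project's data assets.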

I want to add additional files to existing archives (ZIP / RAR) or have the files added when compressing

I know how to do this for one archive at a time, but I want to add files to multiple archives in the same folder in one go, if that is possible. I understand that I can do this with a batch file... but I don't know how to write the script.
So... I have several zip files in one folder. I want to add a specific text file and a specific image file to each/all of those zips. I don't want any other modifications of the zip files.
Or... is there a way to set WinRAR so that specific files will be automatically added whenever an archive is created?
Thanks
import zipfile
z = zipfile.ZipFile('cal.zip', mode='a', compression=zipfile.ZIP_DEFLATED)
z.write('/your/file/path') # or, z.writestr('your-filename', 'file-content')
z.close()
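To apply this to every zip in a folder, the same append call can be wrapped in a loop. A minimal sketch, with `add_files_to_zips` as a hypothetical helper name. Note that the zipfile module only handles ZIP archives, so this does not cover RAR files.

```python
import glob
import os
import zipfile


def add_files_to_zips(folder, extra_files):
    """Append each file in extra_files to every .zip archive in folder."""
    updated = []
    for zip_path in sorted(glob.glob(os.path.join(folder, "*.zip"))):
        with zipfile.ZipFile(zip_path, mode="a", compression=zipfile.ZIP_DEFLATED) as z:
            existing = set(z.namelist())
            for path in extra_files:
                # Store the file at the archive root under its own name.
                arcname = os.path.basename(path)
                if arcname not in existing:  # avoid duplicate entries on re-runs
                    z.write(path, arcname=arcname)
        updated.append(zip_path)
    return updated
```

For the case in the question this could be called as `add_files_to_zips(r"C:\archives", ["readme.txt", "logo.png"])`, where both the folder and the file names are placeholders. Existing entries in each archive are left untouched.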

Zip Directory with Python

I'm trying to zip a bunch of folders individually. The folders contain files. I wrote a script that seems to work perfectly, except that the resulting zip files are not actually compressed. They're the same size as the original directory!
Here is my code:
import os, zipfile
workspace = "C:\\ziptest"
dirList = os.listdir(workspace)
def zipDir(path, zip):
    for root, dirs, files in os.walk(path):
        for file in files:
            zip.write(os.path.join(root, file))
for item in dirList:
    zip = zipfile.ZipFile('%s.zip' % item, 'w')
    zipDir('C:\\ziptest\\%s' % item, zip)
    zip.close()
I'm not a Python expert, but a quick lookup shows that zip.write takes another argument, such as zipfile.ZIP_DEFLATED. I grabbed that from here. I quote:
The third, optional argument to the write method controls what compression method to use. Or rather, it controls whether data should be compressed at all. The default is zipfile.ZIP_STORED, which stores the data in the archive without any compression at all. If the zlib module is installed, you can also use zipfile.ZIP_DEFLATED, which gives you “deflate” compression.
The reference is here. Look for the constant ZIP_DEFLATED; its definition:
The numeric constant for the usual ZIP compression method. This requires the zlib module. No other compression methods are currently supported.
I suppose that means that only default compression is supported... hope that helps!
Is there any reason you don't just call the shell command, like
import subprocess
def zipDir(path, zip):
    # 7z expects the archive name first, then the files to add
    subprocess.call(['7z', 'a', '-tzip', zip, path])
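Staying within the zipfile module, the compression method mentioned above can be set once when the archive is opened instead of on every write call. A minimal sketch, with `zip_dir` as a hypothetical name. It also stores paths relative to the zipped folder, which is a change from the original script.

```python
import os
import zipfile


def zip_dir(path, zip_path):
    """Zip the contents of directory `path` into `zip_path` using deflate."""
    # ZIP_DEFLATED replaces the default ZIP_STORED (no compression).
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(path):
            for name in files:
                full = os.path.join(root, name)
                # Keep archive paths relative to `path`, not absolute.
                zf.write(full, arcname=os.path.relpath(full, path))


# Assumed usage for the folders in the question:
#   for item in os.listdir(workspace):
#       zip_dir(os.path.join(workspace, item), '%s.zip' % item)
```

The output archive should be written outside the folder being zipped, otherwise os.walk may pick up the partially written zip file.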

Resources