How do I unzip large files (>1 GB) from my Drive into Colab?

I tried to unzip large zipfiles from my Drive into my Colab and got this error:
BadZipFile: zipfiles that span multiple disks are not supported
How do I unzip large files from Drive into Colab?

I do this in one of two ways:
!unzip file_location -d file_destination  # -d sets the destination directory; add -q for quiet output
Or, more elaborately:
import zipfile
from google.colab import drive
drive.mount('/content/drive/')
zip_ref = zipfile.ZipFile("/content/drive/My Drive/ML/DataSet.zip", 'r')
zip_ref.extractall("/tmp")
zip_ref.close()
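One thing worth trying, offered as a minimal sketch rather than a confirmed fix: copy the archive from the mounted Drive to the VM's local disk first, check that the copy is a readable zip, and only then extract it. The helper name copy_and_extract and its parameters are hypothetical and do not come from the answer above.

```python
import shutil
import zipfile
from pathlib import Path


def copy_and_extract(src_zip, work_dir, dest_dir):
    """Copy src_zip into work_dir, then extract the copy into dest_dir.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    local_zip = work_dir / Path(src_zip).name
    shutil.copyfile(src_zip, local_zip)  # read the whole file off Drive once

    # is_zipfile looks for the end-of-central-directory record, which is
    # missing when the file is truncated or is not a zip at all.
    if not zipfile.is_zipfile(local_zip):
        raise zipfile.BadZipFile(f"{local_zip} is not a complete zip archive")

    # The context manager closes the archive even if extraction fails.
    with zipfile.ZipFile(local_zip) as zf:
        zf.extractall(dest_dir)
        names = sorted(zf.namelist())
    return names
```

In Colab, after drive.mount('/content/drive/'), this could be called as copy_and_extract("/content/drive/My Drive/ML/DataSet.zip", "/content", "/tmp"). If the BadZipFile check fires, the problem is with the file itself rather than with the extraction command.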

Related

Unzipping a zipped file in Google Colab

I am trying to unzip a zip file in Google Colab and I get this error:
Archive: object_detection.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of object_detection.zip or
object_detection.zip.zip, and cannot find object_detection.zip.ZIP, period.
from google.colab import drive
drive.mount('/content/drive')
!cp '/content/drive/My Drive/slim.zip' slim.zip
!unzip object_detection.zip
I have already uploaded my zip files to Drive.
The most likely reason is that your zip file is corrupt, or that you are not using the correct link. Try downloading the file and then uploading it to Colab; it should work.
I had the same problem, and it was because the file had not finished uploading.
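Both answers point at an incomplete or corrupt file, and that can be checked before running unzip. The sketch below uses only the standard library; the function name describe_zip and the status strings are made up for illustration.

```python
import os
import zipfile


def describe_zip(path):
    """Return a short status string for the archive at path (illustrative helper)."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    # False when the end-of-central-directory record cannot be found,
    # which is what happens with a partially uploaded archive.
    if not zipfile.is_zipfile(path):
        return "not a zip (possibly truncated)"
    with zipfile.ZipFile(path) as zf:
        bad = zf.testzip()  # first member that fails its CRC check, or None
    return "ok" if bad is None else f"corrupt member: {bad}"
```

Comparing os.path.getsize(path) with the size of the original file on your own machine is another quick way to spot an upload that did not finish.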

How to access files downloaded from kaggle into a Colaboratory notebook?

I am having some difficulty working with multiple files that were downloaded to the /content directory of a Colaboratory notebook. So far, I have successfully downloaded and extracted a Kaggle dataset in a Colaboratory notebook using the following code:
!kaggle datasets download -d iarunava/cell-images-for-detecting-malaria -p /content
!unzip \cell-images-for-detecting-malaria.zip
I was also able to use Pillow to import a single file from the dataset into my Colaboratory session (I obtained the filename from the output produced during the extraction):
from PIL import Image
img = Image.open('cell_images/Uninfected/C96P57ThinF_IMG_20150824_105445_cell_139.png')
How can I access multiple extracted files from /content without knowing their names in advance?
Thank you!
After some further experimentation, I found that the Python os module works the same way in Colab notebooks as it does on an individual computer. For example, in a Colab notebook the command
os.getcwd()
returns '/content' as an output.
Also, the command os.listdir() returns the names of all the files I downloaded and extracted.
You can use glob. glob.glob(pattern) returns all paths that match the pattern. For example, the code below collects all the .png files in img_dir.
png = glob.glob(os.path.join(img_dir, '*.png'))
png = np.array(png)
png will contain an array of filenames (this assumes glob, os, and numpy as np have already been imported).
In your case you can use:
png = glob.glob('cell_images/Uninfected/*.png')
png = np.array(png)
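If the images sit in several class folders, as the cell_images/Uninfected path in the question suggests, a recursive search avoids writing one pattern per folder. This is a sketch using pathlib; the function name list_labeled_images is hypothetical, and treating the parent folder name as the label is an assumption about how the dataset is laid out.

```python
from pathlib import Path


def list_labeled_images(root, pattern="*.png"):
    """Return sorted (path, label) pairs for files under root matching pattern.

    The label is the name of the folder that directly contains the file
    (an assumption about the dataset layout).
    """
    return sorted((str(p), p.parent.name) for p in Path(root).rglob(pattern))
```

In the notebook from the question this could be called as list_labeled_images('cell_images'), which walks every subfolder of cell_images without any filename being known in advance.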

Can't unzip file in GCP JupyterLab notebook: 'End-of-central-directory signature not found.'

I am running Python 3.
I am trying to unzip my train.zip dataset on a GCP notebook instance, but whenever I try, I get various errors.
First I tried:
import os
from zipfile import ZipFile
os.path.exists('/home/jupyter/kerasProject/original_dataset/train.zip')
True
zip_file = ZipFile('/home/jupyter/kerasProject/original_dataset/train.zip')
Error:
BadZipFile: File is not a zip file
Then I tried:
!unzip '/home/jupyter/kerasProject/original_dataset/train.zip'
Error:
Archive: /home/jupyter/kerasProject/original_dataset/train.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of /home/jupyter/kerasProject/original_dataset/train.zip or
/home/jupyter/kerasProject/original_dataset/train.zip.zip, and cannot find /home/jupyter/kerasProject/original_dataset/train.zip.ZIP, period.
In both cases it does not recognise the file as a .zip file. I tried re-uploading it, but that doesn't help. I can unzip it on my Mac with no problems, so I don't think it is corrupt. What have I done wrong?
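One possible explanation, which is an assumption and not something the question confirms, is that the bytes stored on the instance differ from the file on the Mac. A quick way to test that is to look at the first few bytes: zip files begin with one of a small set of "PK" signatures. The names ZIP_SIGNATURES, sniff_header and looks_like_zip below are illustrative only.

```python
# Signatures a zip file can start with: a regular local file header,
# an empty archive, or a spanned archive marker.
ZIP_SIGNATURES = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")


def sniff_header(path, n=4):
    """Return the first n bytes of the file at path."""
    with open(path, "rb") as f:
        return f.read(n)


def looks_like_zip(path):
    """Return True when the file starts with a known zip signature."""
    return sniff_header(path) in ZIP_SIGNATURES
```

If looks_like_zip returns False, the file on the instance is something other than the archive. If it returns True but the file is smaller than the original on the Mac, the upload was probably cut short, because a truncated archive still starts with a valid signature.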

How to extract a zip file using python 3

How do I extract a zip file using Python when the zip file is in a different directory from the script?
I tried the following, but I get an error because the source path is not accepted. Please help me solve this problem.
from zipfile import ZipFile
def func(source, target):
    with ZipFile('source', 'target'):
        ZipFile.Extractall('target')
Use this code. To reach another directory, either hard-code its path or use a relative path: "../" moves up out of the current directory, and "folder/" moves into a folder inside it. For example, "../script.py" or "folder/script.py". You can locate your .zip file the same way.
import zipfile
with zipfile.ZipFile("file.zip","r") as zip_ref:
    zip_ref.extractall("targetdir")
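For comparison, here is a sketch of the question's func with its problems addressed. The original passed the literal strings 'source' and 'target' instead of the variables, used 'target' as the file mode, and called Extractall, which does not exist (the method is extractall, called on the opened archive). The up-front existence check and the return value are additions for illustration.

```python
from pathlib import Path
from zipfile import ZipFile


def func(source, target):
    """Extract the zip file at source into the directory target."""
    # Resolve relative paths against the current working directory so the
    # error message shows exactly where Python looked.
    source = Path(source).expanduser().resolve()
    if not source.is_file():
        raise FileNotFoundError(f"No zip file at {source}")
    with ZipFile(source, "r") as zf:  # pass the variable, not the string 'source'
        zf.extractall(target)         # lowercase method on the opened archive
    return source
```

Because relative paths are resolved against the working directory rather than the script's location, passing an absolute path for source is the least surprising option when the zip file lives elsewhere.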
For just unpacking, shutil should suffice:
import shutil
shutil.unpack_archive('path-to-zipfile')
You'll have to check the source path of the zip file, which is resolved relative to your current working directory. To find your current working directory, you can try:
import os
print(os.getcwd())
See also: "Unzipping files in python" and "Relative paths in Python".

Error in colab in finding directory and file

img = load_img('train/cat/cat.12499.jpg')
I'm using Colab with the same code that I used in Jupyter, but now I get the error below, even though the directory and file are there:
No such file or directory: 'train/cat/cat.12499.jpg'
I copied the directory to Google Drive!
I solved it with these commands:
from google.colab import drive
drive.mount('/content/drive')
and I modified the code like this:
img = load_img("drive/My Drive/app/train/cat/cat.12499.jpg") # t
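When a "No such file or directory" error like this appears, it can help to find out how much of the path actually exists, since a relative path is resolved against the notebook's working directory rather than against Drive. The helper below is a small sketch using only the standard library; its name deepest_existing is made up for illustration.

```python
import os


def deepest_existing(path):
    """Return the longest leading part of path that exists on disk."""
    path = os.path.abspath(path)  # relative paths resolve against os.getcwd()
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path
```

For example, deepest_existing('train/cat/cat.12499.jpg') shows which folder the lookup stops at, which makes it clear whether the train directory is missing from the working directory altogether.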

Resources