I am using Google Colaboratory and mounting Google Drive. When I access a csv file, it gives me the following error:
OSError: [Errno 5] Input/output error.
This did not happen before.
How can I access the csv file as I used to?
I have tried this, but it did not work:
Input/output error while using google colab with google drive
This happened after running the following code.
for segment_id in tqdm(range(segment_num)):
    with h5py.File(os.path.join(INPUT_PATH, "train.h5"), "r") as f:
        train_answers.append(f['time_to_failure'][segment_id*segment_interval + SEGMENT_LENGTH])
The tqdm bar progressed to 37% and then gave the following error.
OSError: Unable to open file (file read failed: time = Thu May 2 14:14:09 2019
, filename = './drive/My Drive/Kaggle/LANL-Earthquake-Prediction/input/train.h5', file descriptor = 74, errno = 5, error message = 'Input/output error', buf = 0x7ffc31926d00, total read size = 8, bytes this sub-read = 8, bytes actually read = 18446744073709551615, offset = 0)
Since then, large files such as train.csv (9 GB) that are stored on Google Drive cannot be read from Google Colaboratory. It gives the following error.
OSError: [Errno 5] Input/output error
Does anyone have the same problem?
Does anyone know how to solve this?
There are quotas set by Google that are not necessarily shown while using Colab. I have run into the same problem. Basically, once the limit is passed you get the [Errno 5] Input/output error regardless of the file or the operation you were doing.
The problem seems to be solved since I asked to increase the storage quota (limited to 1 TB total per week).
You can access the quota page by visiting the page below and clicking on Quotas:
https://cloud.google.com/docs/quota
If you don't ask to increase the quota, you might have to wait 7-14 days until your usage is reset to 0 and you can use the full quota again.
I hope this helps!
I've encountered the same error (during too-intensive testing of transfer learning). According to Google, the reason may be too many I/O operations on small files, or shared resources being used more intensively; every reason relates to the use of Google Drive. In most cases the quota should be refreshed after about a day.
You may also try another solution (for impatient users like me): copy your resources (in my case a zipped folder data containing folders train and validation with images) as a zip file to your Google Drive and then unzip it directly onto the Colab VM with:
!unzip -qq '/content/gdrive/My Drive/CNN/Datafiles/data.zip'
You can then access the data from the folder /content/data/... (and say goodbye to the I/O error ;) )
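If you prefer to do the copy-and-unzip step in Python rather than with shell commands, a minimal sketch of the same idea looks like this (the archive path and the /content/gdrive mount point are assumptions; adjust them to your own layout):

import shutil
import zipfile

from google.colab import drive

drive.mount('/content/gdrive')  # follow the authentication prompt

# Assumed location of the archive on Drive -- change to match your folders.
src = '/content/gdrive/My Drive/CNN/Datafiles/data.zip'
local_zip = '/content/data.zip'

shutil.copy(src, local_zip)        # one large sequential read from Drive
with zipfile.ZipFile(local_zip) as zf:
    zf.extractall('/content')      # creates /content/data/... on the VM disk

This way Drive sees a single big read instead of thousands of small ones, which is exactly the access pattern that triggers the quota problem described above.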
Related
I dumped a Jupyter Notebook session using dill.dump_session(filename), and at one point it told me that the disk storage was full. However, I made some space on the disk and tried again. Now I am unable to load back the session using dill.load_session(filename).
I get the following error:
~/.local/lib/python3.6/site-packages/dill/_dill.py in load_session(filename, main)
408 unpickler._main = main
409 unpickler._session = True
--> 410 module = unpickler.load()
411 unpickler._session = False
412 main.__dict__.update(module.__dict__)
EOFError: Ran out of input
And the file (i.e. filename) is about 30 GB of data.
How can I retrieve my data from the file?
BTW, I’m running all this on Google Cloud, and it’s costing me a fortune to keep the instance up and running.
I have tried using undill and other unpickle methods.
For example, I tried this:
open(file, 'a').close()
try:
    with open(file, "rb") as Score_file:
        unpickler = pickle.Unpickler(Score_file)
        scores = unpickler.load()
        return scores
But got this error:
6 with open(file, "rb") as Score_file:
7 unpickler = pickle.Unpickler(Score_file);
----> 8 scores = unpickler.load();
9
10 return scores
ModuleNotFoundError: No module named '__builtin__'
I know this probably isn't the answer you want to hear, but... it sounds like you may have a corrupt pickle file. If that's the case, you can get the data back only if you edit it by hand, and can understand what the pickled strings are and how they are structured. Note that there are some very rare cases that an object will dump, but not load -- however, it's much more likely you have a corrupt file. Either way, the resolution is the same... a hand edit is the only way to potentially save what you have pickled.
Also, note that if you use dump_session, you really should use load_session (as it does a sequence of steps on top of a standard load, reversing what is done in dump_session). That's not directly relevant to this issue, however; the issue is most likely an incomplete or corrupt pickle file.
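If you want to gauge how much of the stream is still intact before attempting a hand edit, one minimal sketch (the filename below is a placeholder for your dump_session file) is to disassemble the pickle opcode stream with the standard library's pickletools module; it stops at the first opcode it can no longer parse:

import pickletools

filename = 'session.pkl'  # placeholder -- your dump_session file

# The disassembly output can be enormous for a 30 GB pickle, so write it to a file.
with open(filename, 'rb') as f, open('pickle_disassembly.txt', 'w') as out:
    try:
        pickletools.dis(f, out=out)
    except Exception as exc:  # typically ValueError or EOFError on a truncated stream
        print('disassembly stopped:', exc)

The point where the disassembly stops tells you roughly where the file is truncated or corrupted, which is where any hand edit would have to start.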
I have an npy file (largeFIle.npy) saved in the same "Colab Notebooks" folder on my Google Drive that my Google Colab notebook is saved in. I'm trying to load the data into my notebook with the code below, but I'm getting the error below. This code works fine when I run it locally on my laptop with the notebook in the same folder as the file. Is there something different I need to do when loading data with notebooks in Google Colab? I'm very new to Colab.
code:
import numpy as np

dataset_name = 'largeFIle.npy'
dataset = np.load(dataset_name, encoding='bytes')
Error:
FileNotFoundError Traceback (most recent call last)
<ipython-input-6-db02a0bfcf1d> in <module>()
----> 1 dataset = np.load(dataset_name, encoding='bytes')
/usr/local/lib/python3.6/dist-packages/numpy/lib/npyio.py in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
370 own_fid = False
371 if isinstance(file, basestring):
--> 372 fid = open(file, "rb")
373 own_fid = True
374 elif is_pathlib_path(file):
FileNotFoundError: [Errno 2] No such file or directory: 'largeFIle.npy'
When you launch a new notebook on Colab, it connects you to a remote machine for 12 hours, and all you have there is the notebook and the preinstalled packages. To access your folders on Drive, you need to connect the remote instance to your Drive and authenticate it.
This thing bugged me for some time when I was beginning too, so I'm creating a gist and I'll update it as I learn more. For your case, check out section 2 (Connecting with Drive). You don't have to edit or understand anything; just copy the cell and run it. It will run a bunch of functions and then give you an authentication link. You need to go to that link and sign in with Google, and you'll get an access token there. Put it back in the input box and press Enter. If it doesn't work or there's some error, run the cell again.
In the next part I mount my drive to the folder '/drive'. So now everything that's on your Drive exists in this folder, including your notebook. Next, you can change your working directory. For me, I keep all my notebooks in the '/Colab' folder; edit it accordingly.
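As a minimal sketch of the same flow using the google.colab drive helper (the '/content/drive' mount point and the 'Colab Notebooks' folder are assumptions based on the question):

import numpy as np
from google.colab import drive

drive.mount('/content/drive')  # opens the authentication link described above

# Assumes largeFIle.npy sits in the "Colab Notebooks" folder on your Drive.
dataset_path = '/content/drive/My Drive/Colab Notebooks/largeFIle.npy'
dataset = np.load(dataset_path, encoding='bytes')
print(type(dataset))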
Hope it helps you. Feel free to suggest edits to the gist as you learn more. :)
Have you set up your Google Drive with Google Colab using this method? After mounting Google Drive, use the command below for your problem (assuming you have stored largeFIle.npy in the Colab Notebooks folder):
dataset = np.load('drive/Colab Notebooks/largeFIle.npy', encoding='bytes')
I have uploaded a file to my Azure file storage account and created a SAS (shared access signature). Let's pretend the file in question is called fileA.nc
Now, with Python3, I am attempting to read fileA.nc:
from netCDF4 import Dataset
url = 'https://<my-azure-resource-group>.file.core.windows.net/<some-file-share>/fileA.nc<SAS-token>'
dataset = Dataset(url)
print(dataset.variables.keys())
The above code does not work, instead giving me the following error:
Traceback (most recent call last):
  File "yadaYadaYada/test.py", line 8, in <module>
    dataset = Dataset(url)
  File "netCDF4/_netCDF4.pyx", line 1848, in netCDF4._netCDF4.Dataset.__init__ (netCDF4/_netCDF4.c:13983)
OSError: NetCDF: Malformed or unexpected Constraint
This is line 8:
dataset = Dataset(url)
I know the URL provided works. If I paste it into the browser, the file downloads...
I have checked the netCDF4 documentation, which says this:
Remote OPeNDAP-hosted datasets can be accessed for reading over http if a URL is provided to the Dataset constructor instead of a filename. However, this requires that the netCDF library be built with OPeNDAP support, via the --enable-dap configure option (added in version 4.0.1).
However, I have no idea how to tell whether, when PyCharm installed netcdf4, it used the --enable-dap option, but I cannot imagine why it would not. Besides, if I put in a URL that points to some HTML, I get the HTML back in the error dump, so from that I would think netcdf4 is actually trying to load a remote dataset and the problem lies somewhere else.
I'd really appreciate some help here. Maybe someone knows of another Python 3 netCDF library that will allow me to load my datasets from Azure?
UPDATE
Okay, I can now confirm that the Python netcdf4 library does come with OPeNDAP support enabled:
Hello again, netCDF4 1.0.4 with OpenDAP support is now available in the conda repository on Unix. To install: $ conda install netcdf4
Ilan
I have found a solution. It turns out that you cannot read directly from an Azure File share, even though when you paste the link to a file in the browser, the file begins to download.
What I needed to do was mount the File Share on my OS. In my case I was using Windows, but this can be done with Linux too. The following command should be modified accordingly and then run in Command Prompt:
net use <drive-letter>: \\<storage-account-name>.file.core.windows.net\<share-name>
example :
net use z: \\samples.file.core.windows.net\logs
Once the File Share is mounted, you can read from it as if it were an external HDD. You may need to add permissions, but I didn't.
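With the share mapped to a drive letter (z: in the example above), a minimal sketch of the read is simply:

from netCDF4 import Dataset

# Open the file through the mounted drive letter instead of the SAS URL.
dataset = Dataset(r'Z:\fileA.nc')
print(dataset.variables.keys())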
Here is the link to the documentation for mounting the File Share: Documentation
When I try to access a VSAM sequential dataset (which is also open in CICS) from batch, I use EXTEND mode to open the file and append some data to it.
Earlier it was working fine. All of a sudden, it is not working and I am getting a file status 93 error code, which means "Resource not available".
OPEN EXTEND <filename>
For KSDS datasets I have used EXCI (External CICS Interface) calls to access them from batch even though they were open online.
But I do not know how to do the same for ESDS.
Could someone help me resolve this error?
I am facing a problem in converting a large Oracle table to a SAS dataset. I did this earlier and the method worked. However, this time, it is giving me the following error messages.
SAS code:
option compress = yes;
libname sasdata ".";
libname myora oracle user=scott password=tiger path=XYZDATA ;
data sasdata.expt_tabl;
set myora.expt_tabl;
run;
Log file:
You are running SAS 9. Some SAS 8 files will be automatically converted
by the V9 engine; others are incompatible. Please see
http://support.sas.com/rnd/migration/planning/platform/64bit.html
PROC MIGRATE will preserve current SAS file attributes and is
recommended for converting all your SAS libraries from any
SAS 8 release to SAS 9. For details and examples, please see
http://support.sas.com/rnd/migration/index.html
This message is contained in the SAS news file, and is presented upon
initialization. Edit the file "news" in the "misc/base" directory to
display site-specific news and information in the program log.
The command line option "-nonews" will prevent this display.
NOTE: SAS initialization used:
real time 1.63 seconds
cpu time 0.03 seconds
1 option compress = yes;
2 libname sasdata ".";
NOTE: Libref SASDATA was successfully assigned as follows:
Engine: V9
Physical Name: /******/dibyendu
3 libname myora oracle user=scott password=XXXXXXXXXX path=XYZDATA ;
NOTE: Libref MYORA was successfully assigned as follows:
Engine: ORACLE
Physical Name: XYZDATA
4 data sasdata.expt_tabl;
5 set myora.expt_tabl;
6 run;
NOTE: There were 6422133 observations read from the data set MYORA.EXPT_TABL.DATA.
ERROR: Expecting page 1, got page -1 instead.
ERROR: Page validation error while reading SASDATA.EXPT_TABL.DATA.
ERROR: Expecting page 1, got page -1 instead.
ERROR: Page validation error while reading SASDATA.EXPT_TABL.DATA.
ERROR: File SASDATA.EXPT_TABL.DATA is damaged. I/O processing did not complete.
NOTE: The data set SASDATA.EXPT_TABL.DATA has 6422133 observations and 49 variables.
ERROR: Expecting page 1, got page -1 instead.
ERROR: Page validation error while reading SASDATA.EXPT_TABL.DATA
ERROR: Expecting page 1, got page -1 instead.
ERROR: Page validation error while reading SASDATA.EXPT_TABL.DATA.
ERROR: Expecting page 1, got page -1 instead.
2 The SAS System 21:40 Monday, April 1, 2013
ERROR: Page validation error while reading SASDATA.EXPT_TABL.DATA.
ERROR: Expecting page 1, got page -1 instead.
ERROR: Page validation error while reading SASDATA.EXPT_TABL.DATA.
NOTE: Compressing data set SASDATA.EXPT_TABL.DATA decreased size by 78.88 percent.
Compressed is 37681 pages; un-compressed would require 178393 pages.
ERROR: File SASDATA.EXPT_TABL.DATA is damaged. I/O processing did not complete.
NOTE: SAS set option OBS=0 and will continue to check statements. This might cause NOTE: No observations in data set.
NOTE: DATA statement used (Total process time):
real time 8:55.98
cpu time 1:39.33
7
ERROR: Errors printed on pages 1,2.
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
real time 8:58.67
cpu time 1:39.40
This is running on a RH Linux server.
Any suggestions will be appreciated.
Thanks and regards,
This sounds like a space issue on your server. How large is the file system in your default directory (from your libname sasdata '.'; statement)? Use the data set option obs=1 on your Oracle table reference to create a new SAS dataset with one row and inspect the variables.
data sasdata.dummy_test;
set myora.expt_tabl(obs=1);
run;
Perhaps there are extremely large VARCHAR or BLOB columns that are consuming too much space. Remember that SAS does not have a VARCHAR type.
Though I am not totally sure, I believe the main issue was that I was initially trying to create/write the dataset in a directory that was restricted in some way. This was indirectly causing trouble, since the dataset created was defective. When I created it elsewhere, it was okay.
Thanks and regards,
Dibyendu