How to fix an error when uploading a custom dataset to Colab? - python-3.x

I have followed all the steps described in most tutorials on how to upload a custom dataset into Google Colab, but I am getting an error that I have tried hard to fix without success.
I am trying to train a CNN model using my custom dataset, and I try to upload it to Colab using the code snippet given in most tutorials.
The following error is displayed when I run the code snippet:
Downloading zip file
---------------------------------------------------------------------------
HttpError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pydrive/files.py in FetchMetadata(self, fields, fetch_all)
236 fields=fields)\
--> 237 .execute(http=self.http)
238 except errors.HttpError as error:
6 frames
HttpError: <HttpError 404 when requesting https://www.googleapis.com/drive/v2/files/https%3A%2F%2Fdrive.google.com%2Fopen%3Fid%3D1RqLx88tx2FCV0Z3CHsqVtx7S3_ffE-UW?alt=json returned "File not found: https://drive.google.com/open?id=1RqLx88tx2FCV0Z3CHsqVtx7S3_ffE-UW">
During handling of the above exception, another exception occurred:
ApiRequestError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/pydrive/files.py in FetchMetadata(self, fields, fetch_all)
237 .execute(http=self.http)
238 except errors.HttpError as error:
--> 239 raise ApiRequestError(error)
240 else:
241 self.uploaded = True
ApiRequestError: <HttpError 404 when requesting https://www.googleapis.com/drive/v2/files/https%3A%2F%2Fdrive.google.com%2Fopen%3Fid%3D1RqLx88tx2FCV0Z3CHsqVtx7S3_ffE-UW?alt=json returned "File not found: https://drive.google.com/open?id=1RqLx88tx2FCV0Z3CHsqVtx7S3_ffE-UW">
# This is the code snippet I have taken from tutorials to upload the dataset to Google Colab.
!pip install -U -q PyDrive
# Insert your file ID
# Get it by generating a share URL for the file
# An example : https://drive.google.com/file/d/1iz5JmTB4YcBvO7amj3Sy2_scSeAsN4gd/view?usp=sharing
zip_id = 'https://drive.google.com/open?id=1RqLx88tx2FCV0Z3CHsqVtx7S3_ffE-UW'
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import zipfile, os
# 1. Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
if not os.path.exists('MODEL'):
    os.makedirs('MODEL')
# 2. Download Zip
print ("Downloading zip file")
myzip = drive.CreateFile({'id': zip_id})
myzip.GetContentFile('model.zip')
# 3. Unzip
print ("Uncompressing zip file")
zip_ref = zipfile.ZipFile('model.zip', 'r')
zip_ref.extractall('MODEL/')
zip_ref.close()

OMG. After a long time (almost 8 hours) of researching on the internet and brainstorming, I found the answer. If anyone who is new to Colab faces a similar error, here is how I solved it. The problem in the above code is the way the file ID is assigned: zip_id = 'https://drive.google.com/open?id=1RqLx88tx2FCV0Z3CHsqVtx7S3_ffE-UW'. Most of the tutorials I have seen say to get the file ID by right-clicking the file in Google Drive and copying the share link, but the file ID is not the whole link you copy. The file ID is only the part after id=, which in my case is 1RqLx88tx2FCV0Z3CHsqVtx7S3_ffE-UW. After giving this as the ID, the error is gone. I hope this response will help other Colab starters.
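For reference, here is a minimal sketch of the corrected assignment (it reuses the drive client and the file from the question; the string-splitting is just one way to pull the ID out of an open?id= share link):
# The share link copied from Drive looks like:
#   https://drive.google.com/open?id=1RqLx88tx2FCV0Z3CHsqVtx7S3_ffE-UW
# PyDrive only wants the part after "id=".
share_url = 'https://drive.google.com/open?id=1RqLx88tx2FCV0Z3CHsqVtx7S3_ffE-UW'
zip_id = share_url.split('id=')[-1]       # -> '1RqLx88tx2FCV0Z3CHsqVtx7S3_ffE-UW'
myzip = drive.CreateFile({'id': zip_id})  # drive is the GoogleDrive client from the snippet above
myzip.GetContentFile('model.zip')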

Related

Why am I getting a KeyError when trying to import the requests module? KeyError: 'requests'

I don't understand why I am getting a KeyError when running my code through the IDLE debugger. The directory with the requests module/folder is added to the PATH. Installing with pip shows that all of the dependencies are installed, and the module shows up when I run pip freeze. I'm not sure what the problem is here... Any help would be greatly appreciated.
import requests

def get_btc_price():
    # Make the API call to retrieve the current price of Bitcoin
    response = requests.get(
        "https://api.kucoin.com/api/v1/market/orderbook/level2_100?symbol=BTC-USDT"
    )
    # Check if the API call was successful
    if response.status_code == 200:
        # Parse the JSON response
        data = response.json()
        # Return the current price of Bitcoin
        return print(float(data["data"]["bids"][0][0]))
    # If the API call was unsuccessful, return 0
    return 0

get_btc_price
I think there was a conflict with an old version of Python. I deleted all PATH variables related to Python, then did an uninstall/reinstall of Python, and the issue was resolved.
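A quick diagnostic sketch (not part of the original answer) that can reveal this kind of conflict before reinstalling is to print which interpreter and which requests package are actually being picked up:
import sys
print(sys.executable)      # the interpreter IDLE is actually running
print(sys.path)            # the directories searched for modules
import requests
print(requests.__file__)   # which installed copy of requests was imported
print(requests.__version__)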

How to catch an exception and get exception type in Python?

[Question] I need the user to provide a Google Drive link to their kaggle.json file so I can set it up. When the script is run for the first time without a kaggle.json file, it throws an error that I am trying to handle without any success. My first question is the same as the title of this post; the second is: does all of this make sense, or is there a better way to do it?
[Background] I am trying to write a script that acts as an interface providing limited access to the functionality of the Kaggle library, so that it runs in my projects while still being shareable on GitHub for others to use in similar projects. I will bundle it along with a configuration management tool or a shell script.
This is the code:
#!/usr/bin/env python3
import os
import sys
import traceback
import gdown
import kaggle
import argparse
"""
A wrapper around the kaggle library that provides limited access to the kaggle library
"""
#*hyperparameters
dataset = 'roopahegde/cryptocurrency-timeseries-2020'
raw_data_folder = './raw'
kaggle_api_file_link = None
#*argument parser
parser = argparse.ArgumentParser(description="download dataset")
parser.add_argument('--kaggle_api_file', type=str, default=kaggle_api_file_link, help="download and set kaggle API file [Gdrive link]")
parser.add_argument("--kaggle_dataset", type=str,
                    default=dataset, help="download kaggle dataset using user/datasets_name")
parser.add_argument("--create_folder", type=str, default=raw_data_folder, help="create folder to store raw datasets")
group = parser.add_mutually_exclusive_group()
group.add_argument('-preprocess_folder', action="store_true", help="create folder to store preprocessed datasets")
group.add_argument('-v', '--verbose', action="store_true", help="print verbose output")
group.add_argument('-q', '--quiet', action="store_true", help="print quiet output")
args = parser.parse_args()
#*setting kaggle_api_file
if args.kaggle_api_file:
    gdown.download(args.kaggle_api_file, os.path.expanduser('~'), fuzzy=True)
#*creating directories if not exists
if not os.path.exists(args.create_folder):
    os.mkdir(args.create_folder)
if not os.path.exists('./preprocessed') and args.preprocess_folder:
    os.mkdir('./preprocessed')
def main():
    try:
        #*downloading datasets using kaggle.api
        kaggle.api.authenticate()
        kaggle.api.dataset_download_files(
            args.kaggle_dataset, path=args.create_folder, unzip=True)
        kaggle.api.competition_download_files
        #*output
        if args.verbose:
            print(
                f"Dataset downloaded from https://www.kaggle.com/{args.kaggle_dataset} in {args.create_folder}")
        elif args.quiet:
            pass
        else:
            print(f"Download Complete")
    except Exception as ex:
        print(f"Error occurred {type(ex)} {ex.args} use flag --kaggle_api_file to download and set kaggle api file")
if __name__ == '__main__':
    sys.exit(main())
I tried to catch IOError and OSError instead of catching the generic Exception, still without success. I want to print a message telling the user to use the --kaggle_api_file flag to set up the kaggle.json file.
This is the error:
python get_data.py
Traceback (most recent call last):
File "get_data.py", line 7, in <module>
import kaggle
File "/home/user/.local/lib/python3.8/site-packages/kaggle/__init__.py", line 23, in <module>
api.authenticate()
File "/home/user/.local/lib/python3.8/site-packages/kaggle/api/kaggle_api_extended.py", line 164, in authenticate
raise IOError('Could not find {}. Make sure it\'s located in'
OSError: Could not find kaggle.json. Make sure it's located in /home/user/.kaggle. Or use the environment method.
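The traceback itself shows that kaggle calls api.authenticate() inside its own __init__.py, so the exception is raised by the import kaggle line, before main() and its try/except ever run. A hedged sketch of one way to handle that, moving the import inside the try block (it assumes args is parsed exactly as in the script above, and the message text is only an example):
import sys

def main():
    try:
        # Importing kaggle triggers api.authenticate(), which raises
        # OSError when ~/.kaggle/kaggle.json is missing.
        import kaggle
        kaggle.api.dataset_download_files(
            args.kaggle_dataset, path=args.create_folder, unzip=True)
    except OSError as ex:
        print(f"Error occurred {type(ex).__name__}: {ex}. "
              "Use the --kaggle_api_file flag to download and set the kaggle API file")
        return 1
    return 0

if __name__ == '__main__':
    sys.exit(main())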

403 permission error when executing from command line client on Bigquery

I have set up gcloud on my local system. I am using Python 3.7 to insert records into a BigQuery dataset in projectA, so I try it from the command-line client with the project set to projectA. The first command I give is to get authenticated:
gcloud auth login
Then I start a Python 3 interpreter and give the following commands:
from googleapiclient.discovery import build
from google.cloud import bigquery
import json
body = {json input}  # pass the JSON string here
bigquery = build('bigquery', 'v2', cache_discovery=False)
bigquery.tabledata().insertAll(projectId="projectA",datasetId="compute_reports",tableId="compute_snapshot",body=body).execute()
I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
return wrapped(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/googleapiclient/http.py", line 915, in execute
raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://bigquery.googleapis.com/bigquery/v2/projects/projectA/datasets/compute_reports/tables/compute_snapshot/insertAll?alt=json returned "Access Denied: Table projectA:compute_reports.compute_snapshot: User does not have bigquery.tables.updateData permission for table projectA:compute_reports.compute_snapshot."
I am executing it as a user with the Owner role and BigQuery Data Owner permissions on the project, and I have also added Data Editor on the dataset, which includes these permissions:
bigquery.tables.update
bigquery.datasets.update
Still I am getting this error.
Why, with my credentials, am I still not able to execute the insert in BigQuery?
The error lay in the permissions: the service account used by the Python runtime (the default service account set in the bash profile) did not have BigQuery Data Editor access for projectA. Once I granted that access, it started working.
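For anyone hitting the same 403, a small diagnostic sketch (not from the original answer; it assumes the google-auth package that the client libraries install) to see which identity the Python runtime actually resolves, which can differ from the account used for gcloud auth login:
import google.auth

# Resolve the Application Default Credentials the client libraries will use.
credentials, project = google.auth.default()
print("Project:", project)
# Service-account credentials expose an email; user credentials may not.
print("Identity:", getattr(credentials, "service_account_email", "(user credentials)"))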

Creating a Spark RDD from a file located in Google Drive using Python on Colab.Research.Google

I have been successful in running a Python 3 / Spark 2.2.1 program on Google's Colab.Research platform:
!apt-get update
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q http://apache.osuosl.org/spark/spark-2.2.1/spark-2.2.1-bin-hadoop2.7.tgz
!tar xf spark-2.2.1-bin-hadoop2.7.tgz
!pip install -q findspark
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.2.1-bin-hadoop2.7"
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
This works perfectly when I upload text files from my local computer to the Unix VM using
from google.colab import files
datafile = files.upload()
and read them as follows :
textRDD = spark.read.text('hobbit.txt').rdd
So far so good.
My problem starts when I try to read a file that is in my Google Drive Colab directory.
Following the instructions, I have authenticated the user and created a Drive service:
from google.colab import auth
auth.authenticate_user()
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')
after which I have been able to access the file in Drive as follows:
file_id = '1RELUMtExjMTSfoWF765Hr8JwNCSL7AgH'
import io
from googleapiclient.http import MediaIoBaseDownload
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()
downloaded.seek(0)
print('Downloaded file contents are: {}'.format(downloaded.read()))
Downloaded file contents are: b'The king beneath the mountain\r\nThe king of ......
Even this works perfectly:
downloaded.seek(0)
print(downloaded.read().decode('utf-8'))
and gets the data
The king beneath the mountain
The king of carven stone
The lord of silver fountain ...
Where things FINALLY GO WRONG is when I try to grab this data and put it into a Spark RDD:
downloaded.seek(0)
tRDD = spark.read.text(downloaded.read().decode('utf-8'))
and I get the error:
AnalysisException: 'Path does not exist: file:/content/The king beneath the mountain\ ....
Evidently, I am not using the correct method / parameters to read the file into Spark. I have tried quite a few of the methods described.
I would be very grateful if someone could help me figure out how to read this file for subsequent processing.
A complete solution to this problem is available in another StackOverflow question at this URL.
Here is the notebook where this solution is demonstrated.
I have tested it and it works!
It seems that spark.read.text expects a file name. But you give it the file content instead. You can try either of these:
save it to a file, then give spark.read.text the file name (a sketch of this option follows below)
use just downloaded instead of downloaded.read().decode('utf-8')
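A minimal sketch of the first option, reusing the downloaded buffer and the spark session from the question:
# Write the downloaded bytes to a local file on the Colab VM ...
downloaded.seek(0)
with open('hobbit.txt', 'wb') as f:
    f.write(downloaded.read())
# ... and hand the path (not the contents) to Spark.
textRDD = spark.read.text('hobbit.txt').rdd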
You can also simplify downloading from Google Drive with pydrive. I gave an example here.
https://gist.github.com/korakot/d56c925ff3eccb86ea5a16726a70b224
Downloading is just
fid = drive.ListFile({'q':"title='hobbit.txt'"}).GetList()[0]['id']
f = drive.CreateFile({'id': fid})
f.GetContentFile('hobbit.txt')

OSError: broken data stream when reading image file

I am trying to read an image file using the image package of Keras.
Here is my code.
from keras.preprocessing import image
img_path = 'test/test_image.jpg' # This is an image I took in my kitchen.
img = image.load_img(img_path, target_size=(224, 224))
When I run the code, I get the following error.
anaconda3/lib/python3.5/site-packages/PIL/ImageFile.py in load(self)
238 if not self.map and not LOAD_TRUNCATED_IMAGES and err_code < 0:
239 # still raised if decoder fails to return anything
--> 240 raise_ioerror(err_code)
241
242 # post processing
anaconda3/lib/python3.5/site-packages/PIL/ImageFile.py in raise_ioerror(error)
57 if not message:
58 message = "decoder error %d" % error
---> 59 raise IOError(message + " when reading image file")
60
61
OSError: broken data stream when reading image file
Please note, if I convert test_image.jpg to test_image.png, the given code works perfectly. But I have several thousand pictures and I can't convert all of them to PNG format. I tried several things after searching for a solution on the web but couldn't get rid of the problem.
Any help would be appreciated!
Use this at the beginning of your code:
from PIL import Image, ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
I found it here, and it is working for me.
According to here, upgrading Pillow with pip install Pillow --upgrade should solve this issue.
If you are still facing the problem, you can use mogrify to batch-convert all your images: mogrify -format png *.jpg
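If mogrify is not available, a rough Python equivalent using Pillow (a sketch; the test folder name is just the one from the question) would be:
import os
from PIL import Image, ImageFile

ImageFile.LOAD_TRUNCATED_IMAGES = True  # tolerate the same truncated streams while converting

src_dir = 'test'  # folder containing the .jpg files
for name in os.listdir(src_dir):
    if name.lower().endswith('.jpg'):
        path = os.path.join(src_dir, name)
        with Image.open(path) as im:
            im.save(os.path.splitext(path)[0] + '.png')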
