How can I load data from an external drive into a Jupyter notebook on MacOs? - python-3.x

I have running code in my home directory in Jupyter notebook. I'd like to import some data from an external drive, as the data is too large to be stored on my local drive. I am unable to import this data into my notebook.

If you can browse to the data using Finder, you should be able to load the data into your Jupyter notebook.
Your code to open a file in your notebook probably looks something like this:
with open("data.txt","r") as dataFile:
You need to replace data.txt with the path to the file you want to open.
One way to get the path name is to navigate to the file in Finder and then drag the file into your Terminal. The path to the file will then be pasted into the command prompt of the Terminal.
The path may look something like this: /Volumes/Remove/Data/bigfile.txt
You can also navigate to the file using the cd command in the Terminal.
You may even be able to drag the file directly into your Jupyter notebook to get the path.
Hope this helps!

Related

How to add a Notebook (.ipynb file) to Launcher?

I would like to add a new specific notebook or a file in the Launcher tab instead of a kernel.
What kind of ways are there?

Getting excel files to run on python

I have the following code in jupyter notebook, using python. I get the error when I run it saying "FileNotFoundError" but all of the files are in a labeled folder.
file_path=os.path.dirname(os.path.abspath("__file__"))
df=pd.read_csv(file_path+ "\\score_NFL.csv",encoding="utf-8")
teams=pd.read_csv(file_path+"\\nfl_teams.csv",encoding='utf-8')
games_elo=pd.read_csv(file_path+"\\nfl_games3.csv",encoding="utf-8")
games_elo18=pd.read_csv(file_path + "\\nfl_games_2019_1.csv",encoding="utf-8")
You don't want to have quotes around __file__. It refers to a special object created when running a script or using an imported module (see the answers here for details). Your first line should be
file_path=os.path.dirname(os.path.abspath(__file__))
However, you state in your question that you're attempting to run this code in a Jupyter notebook. In an instance like this, similar to using the REPL on the command line, __file__ is not defined because you're not running the script from a file - it's interactive.
This method will work if you save your code in a .py file and run it from the directory containing your CSV files. At the same time, if you're running the code from the same directory, you don't need to go through all the hassle of creating a full absolute path to the CSVs, you can simply use
df = pd.read_csv("score_NFL.csv", encoding="utf-8")
for example.

How do I write files to my local drive when using Google Colab?

I have a set of data I want to write to an Excel file. I'm using Google Colab, and while I successfully use the following snippet to import files into my notebook...
from google.colab import files
uploaded = files.upload()
...I can't seem to figure out how to write files to my local drive. I'm using the xlwt library to write the data to an Excel file, as follows:
wb.save('Washington_1791.xls')
It completes the task without throwing an error, but I'm not sure of the save location, and I need to get it to save to my local drive. Google gives me lots of suggestions on how to import from the local drive, but not how to save.
Any files written by your Python code will be on the disk of the virtual machine on which the code is running: if you want to get the files to your local disk, you will have to download them explicitly.
To do this, go to the Files tab in the left pane (click the (>) in the upper left to open the pane). Within that file viewer you can right-click any file and choose "Download".

How do I run a .py script?

I just started learning Python last week to automate some stuff I do (thanks to automatetheboringstuff.com). Assume I know nothing about programming. The only thing I know is HTML and CSS.
I created a simple automation workflow already and I want to improve not the code (maybe in the future because it's not yet finished) but how I can maintain my setup/program on two laptops -- Both Mac OS running on High Sierra.
I have a .py file that contains my automated workflow. I don't know where to place it. It currently resides in my Dropbox so i can use it on laptop1 and laptop2.
I also created a virtualenv for each machine and did the requirements.txt thing as well (just to prep for the future). The directory is on both username/python/project_name.
I read in some posts that these files and other resources can exist anywhere whether inside each virtualenv or not. And that it's just a preference. I also read that the virtualenv itself isn't recommended to be placed inside apps like Dropbox (that's why i separated it on each laptop).
I switch between both laptops frequently. The environment which contains the packages doesn't really concern me that much when switching. It's the other files that is bothering me. For example, there's an image I need, this has to be available on both laptops so my solution to this is to have a Resources folder inside Dropbox as well. It currently looks like this:
Dropbox
Projects
Project 1 files (images, etc.)
Project 2 files (images, etc.)
Workflows (this would contain my completed .py files)
I read some stuff about the virtualenvwrapper, but haven't looked at it yet. Maybe in the future when i do have more projects to manage. Because right now, it's just this one.
Lastly, I noticed that every time i open up Terminal and activate my virtualenv, the file directory is in Users/username
How can i set it to default to Dropbox/Projects/project_name? I always have to set it using the chdir(). That way, when i do have multiple projects (and virtualenv) i don't have to worry about where the files load/ save.
Finally, how do I run the .py script? If i open the IDLE, open the .py file there, and use f5, it runs properly. But as far as I know, that doesn't look into the virtualenv i setup. Is that correct?
I tried right-clicking, then Open With > Python Launcher the .py file. and i'm getting an error saying there are no modules found. It seems it's not loading the right virtualenv. So there must be something wrong with the file i made.
Then I read about the #! you place at the beginning of the .py files but i don't understand it. Can someone explain that further? Is that why my file isn't loading properly?
Thanks for helping out!
You can run .py scripts from the command line using:
python test.py
That tells terminal to run test.py in the python interpreter and send the output to your terminal, just like when you run it in the IDLE. If your .py script is not in your current directory and you don't want to change directories, you can access it using it's absolute path:
python /Users/username/Dropbox/Workflows/test.py
As long as you have already activated your virtualenv, it should run your script using only the libraries you have added to your virtualenv. Also, once your virtualenv is activated, you can move around directories using "cd" and it will bring your virtualenv with you.

PyCharm project path different from interactive session path

When running an interactive session, PyCharm thinks of os.getcwd() as my project's directory. However, when I run my script from the command line, PyCharm thinks of os.getcwd() as the directory of the script.
Is there a good workaround for this? Here is what I tried and did not like:
going to Run/Edit Configurations and changing the working directory manually. I did not like this solution, because I will have to do it for every script that I run.
having one line in my code that "fixes" the path for the purposes of interactive sessions and commenting it out before running from command line. This works, but feels wrong.
Is there a way to do this or is it just the way it is supposed to be? Maybe I shouldn't be trying to run random scripts within my project?
Any insight would be greatly appreciated.
Clarification:
By "interactive session" I mean being able to run each line individually in a Python/IPython Console
By "running from command line" I mean creating a script my_script.py and running python path_to_myscript/my_script.py (I actually press the Run button at PyCharm, but I think it's the same).
Other facts that might prove worth mentioning:
I have created a PyCharm project. This contains (among other things) the package Graphs, which contains the module Graph and some .txt files. When I do something within my Graph module (e.g. read a graph from a file), I like to test that things worked as expected. I do this by running a selection of lines (interactively). To read a .txt file, I have to go (using os.path.join()) from the current working directory (the project directory, ...\\project_name) to the module's directory ...\\project_name\\Graphs, where the file is located. However, when I run the whole script via the command line, the command reading the .txt file raises an Error, complaining that no file was found. By looking on the name of the file that was not found, I see that the full file name is something like this:
...\\project_name\\Graphs\\Graphs\\graph1.txt
It seems that this time the current working directory is ...\\project_name\\Graphs\\, and my os.path.join() command actually spoils it.
I user various methods in my python scripts.
set the working directory as first step of your code using os.chdir(some_existing_path)
This would mean all your other paths should be referenced to this, as you hard set the path. You just need to make sure it works from any location and your specifically in your IDE. Obviously, another os.chdir() would change the working directory and os.getcwd() would return the new working directory
set the working directory to __file__ by using os.chdir(os.path.dirname(__file__))
This is actually what I use most, as it is quite reliable, and then I reference all further paths or file operations to this. Or you can simply refer to as os.path.dirname(__file__) in your code without actually changing the working directory
get the working directory using os.getcwd()
And reference all path and file operations to this, knowing it will change based on how the script is launched. Note: do NOT assume that this returns the location of your script, it returns the working directory of the shell !!
[EDIT based on new information]
By "interactive session" I mean being able to run each line
individually in a Python/IPython Console
By running interactively line-by-line in a Python console, the __file__ is not defined, afterall: you are not executing a file. Hence you cannot use os.path.dirname(__file__) you will have to use something like os.chdir(some_known_existing_dir) to reference a path. As a programmer you need to be very aware of working directory and changes to this, your code should reflect that.
By "running from command line" I mean creating a script my_script.py
and running python path_to_myscript/my_script.py (I actually press the
Run button at PyCharm, but I think it's the same).
This, both executing a .py from command line as well as running in your IDE, will populate the __file__, hence you can use os.path.dirname(__file__)
HTH
I am purposely adding another answer to this post, in regards the following:
Other facts that might prove worth mentioning:
I have created a PyCharm project. This contains (among other things)
the package Graphs, which contains the module Graph and some .txt
files. When I do something within my Graph module (e.g. read a graph
from a file), I like to test that things worked as expected. I do this
by running a selection of lines (interactively). To read a .txt file,
I have to go (using os.path.join()) from the current working directory
(the project directory, ...\project_name) to the module's directory
...\project_name\Graphs, where the file is located. However, when I
run the whole script via the command line, the command reading the
.txt file raises an Error, complaining that no file was found. By
looking on the name of the file that was not found, I see that the
full file name is something like this:
...\project_name\Graphs\Graphs\graph1.txt It seems that this time
the current working directory is ...\project_name\Graphs\, and my
os.path.join() command actually spoils it.
I strongly believe that if a python script takes input from any file, that the author of the script needs to cater for this in the script.
What I mean is you as the author need to make sure you know the following regardless of how your script is executed:
What is the working directory
What is the script directory
These two you have no control over when you hand off your script to others, or run it on other peoples machines. The working directory is dependent on how the script is launched. It seems that you run on Windows, so here is an example:
C:\> c:\python\python your_script.py
The working directory is now C:\ if your_script.py is in C:\
C:\some_dir\another_dir\> c:\python\python.exe c:\your_script_dir\your_script.py
The working directory is now C:\some_dir\another_dir
And the above example may even give different results if the SYSTEM PATH variable is set to the path of the location of your_script.py
You need to ensure that your script works even if the user(s) of your script are placing this in various locations on their machines. Some people (and I don't know why) tend to put everything on the Desktop. You need to ensure your script can cope with this, including any spaces in the path name.
Furthermore, if your script is taking input from a file, the you as the author need to ensure that you can cope with changes in working directory, and changes of script directory. There are a few things you may consider:
Have your script input from a known (static) directory, something like C:\python_input\
Have your script input from a known (configurable) directory, use ConfigParser, you can search here on stackoverflow on many posts
Have your script input from a known directory related to the location of the script (using os.path.dirname(__file__))
any other method you may employ to ensure your script can get to the input
Ultimately this is all in your control, and you need to code to ensure it is working.
HTH,
Edwin.

Resources