How do I open a CSV file using VS Python Jupyter cells - python-3.x

I have been trying to set the Desktop as my working directory so I can load a CSV file:
import os
path="/Users/HOME/Desktop"
os.getcwd()
It returns
/
Using the pandas library, I am failing to load the file with:
DF = pd.read_csv("filename", "mode")

You did not actually change the working directory, you merely assigned a path to a variable. You need to invoke os.chdir() with that path (see https://stackoverflow.com/a/431715/14015737):
import os
path="/Users/HOME/Desktop"
os.chdir(path)
os.getcwd()
This should return the path.
In order to then read your .csv file that is located there (e.g. /Users/HOME/Desktop/test.csv) you can call read_csv() without a path. Full example:
import os
import pandas
path='/Users/HOME/Desktop'
os.chdir(path)
df = pandas.read_csv('test.csv')
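
The difference between assigning a path to a variable and actually changing the working directory can be checked with a small sketch. It is only an illustration: it uses a temporary directory as a stand-in for the Desktop so it runs anywhere, and the function name demo_chdir is hypothetical.

```python
import os
import tempfile


def demo_chdir():
    """Return the working directory before, after assignment, and after os.chdir."""
    original = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for "/Users/HOME/Desktop" so the sketch runs anywhere.
        path = os.path.realpath(tmp)
        after_assignment = os.getcwd()  # assigning a variable changes nothing
        os.chdir(path)
        after_chdir = os.getcwd()       # now the working directory is `path`
        os.chdir(original)              # restore before the temp dir is removed
    return original, after_assignment, after_chdir, path


original, after_assignment, after_chdir, path = demo_chdir()
print(after_assignment == original)
print(os.path.realpath(after_chdir) == path)
```

Once the working directory has been changed this way, a bare file name such as 'test.csv' is resolved against it.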

Related

How do you read csv without the absolute path ( python 3.10 panda)

I put the Python file and the CSV file in the same folder.
I know that it is usually pd.read_csv("C://Users/XXX//XXX//XXX//XXX//Data.csv").
How can I make it pd.read_csv('Data.csv')?
Thanks
Use __file__:
import os.path
import pandas as pd
pd.read_csv(os.path.join(os.path.dirname(__file__), "Data.csv"))
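
One caveat with this approach is that __file__ is not defined in an interactive session or a notebook cell. A minimal sketch of the same idea with pathlib, falling back to the current working directory in that case, could look like this. The helper name data_path is hypothetical, and the fallback is an assumption about what is wanted, not part of the original answer.

```python
from pathlib import Path


def data_path(filename):
    """Build a path to `filename` next to this script.

    Falls back to the current working directory when __file__ is not
    defined, which is the case in an interactive session or notebook cell.
    """
    try:
        base = Path(__file__).resolve().parent
    except NameError:
        base = Path.cwd()
    return base / filename


csv_path = data_path("Data.csv")
print(csv_path)
# With pandas installed, the result can be passed on directly:
# df = pd.read_csv(csv_path)
```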

How does one import the contents of a text or configuration file innately in a project?

I tried the following in an __init__.py file, thinking that it would be evaluated according to its location at the time of import:
# /../../proj_dir_foo/__init__.py
# opens file: .../proj_dir_foo/foo.csv
import pandas as pd
settings = pd.read_csv('foo.csv')
And from a different file:
from foo.bar.proj_dir_foo import settings
Yields: FileNotFoundError
But this is not really convenient. Instead of accumulating configuration files that are much easier to modify, I am accumulating source code in proj_dir_foo which stores configuration info.
The sole reason it is in source code is because having a project module that knows where the root's resources or materials folder full of configs is is not technically a "module". Instead, it is an integrated cog in a machine. Or, rather, a thing I can no longer easily refactor.
How does one modularize any arbitrary configuration file in python project?
Your script's current directory is the directory from which you started it. import os; print(os.getcwd()) will show you that.
If you want to open a file that sits in a place relative to your code, you have several options.
Use sys.argv[0] to get the path to your script, use path.dirname() to extract the directory from it, and use path.join() to make a path to a particular file:
# some script.
import json, sys
from os import path
my_path = path.dirname(sys.argv[0])
cfg_path = path.join(my_path, 'config', 'settings.json')
with open(cfg_path) as cfg_file:
    my_settings = json.load(cfg_file)
Alternatively, if you import a module, you can use its __file__ attribute to learn where it was imported from, and use that to locate a config file:
# some script.
from os import path
import foo.bar.baz
baz_cfg_path = path.join(path.dirname(foo.bar.baz.__file__), 'baz.cfg')
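
The __file__ technique can be wrapped in a small helper so that each module does not repeat the path arithmetic. The following is a sketch only: config_path_for is a hypothetical name, and the standard-library json package stands in for foo.bar.baz so the example runs as written. Note that built-in modules such as sys have no __file__ attribute, so this works only for modules loaded from files.

```python
import json
from pathlib import Path


def config_path_for(module, filename):
    """Return the path of `filename` in the directory that holds `module`."""
    return Path(module.__file__).resolve().parent / filename


# The json package is used here only as a stand-in for foo.bar.baz.
baz_cfg_path = config_path_for(json, "baz.cfg")
print(baz_cfg_path)
```

Because the helper takes the module as an argument, the config file moves with the package when the package is relocated.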

azure pyspark from modules import myfunctions; No module name

I have tried a number of methods to import a local script containing a bunch of shared functions from our shared team directory, with an example in the code below. Based on some Google searches, I also tried "from . import sharedFunctions" with the importing script in the same directory and "from sharedModules import sharedFunctions" from the parent directory. All of these return No module named 'sharedFunctions'. What is the best way to set this up in Azure?
Thanks
import sys, os
dir_path = '/Shared/XXX/sharedModules'
sys.path.insert(0, dir_path)
print(sys.path)
# dir_path = os.path.dirname(os.path.realpath(__file__))
# sys.path.insert(0, dir_path)
import sharedFunctions
sourceTable='dnb_raw'
sourceQuery='select DUNSNumber , GlobalUltimate_Name, BusinessName from'
sourceId = 'DUNSNumber'
sourceNameList=['Tradestyle','BusinessName']
NewTable = 'default.' + sourceTable + '_enhanced'
#dbutils.fs.rm("dbfs:/" + NewTable + "/",recurse=True)
clean_names(sourceTable,sourceQuery,sourceId,sourceNameList)
When you're working with notebooks in Databricks, they are not stored on a file system that Python can treat as a source of modules.
If you want to include another notebook with some other definitions into the current context, you can use %run magic command, passing the name of another notebook as an argument:
%run /Shared/XXX/sharedModules/sharedFunctions
But %run is not a full substitute for imports, as described in the documentation:
You cannot use %run to run a Python file and import the entities defined in that file into a notebook. To import from a Python file you must package the file into a Python library, create a Databricks library from that Python library, and install the library into the cluster you use to run your notebook.
If you want to execute another notebook to get some results from it, you can use a so-called notebook workflow: when executing via dbutils.notebook.run, the notebook is scheduled for execution and you can pass parameters to it, but results are shared mostly via the file system, a managed table, etc.

How to create a folder of current date and time And copy some other folder in that recently made folder in python

I have tried it in Jetbrains Pycharm but it is showing an error.
Here is my code
from datetime import datetime
import os
import shutil
now = datetime.now()
dt=now.strftime("%d%m%Y %H%M%S")
os.mkdir(dt)
shutil.copytree(r"C:\Users\Computer\PycharmProjects\06\08",r"C:\Users\Computer\PycharmProjects\Filecopy\dt")
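If you are trying to use the path of your newly created directory, the literal text dt in the destination string must be replaced by the value of the variable dt, so the line should end with: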
..., r"C:\Users\Computer\PycharmProjects\Filecopy\{}".format(dt))
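
Putting the pieces together, a minimal sketch of the whole task might look like the following. The function name copy_into_timestamped is hypothetical, and the demonstration uses throwaway temporary directories instead of the real project paths. It relies on shutil.copytree creating the destination folder itself, so no separate os.mkdir call is made.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def copy_into_timestamped(src, dest_root):
    """Copy the tree at `src` into a new folder under `dest_root` named after the current time."""
    stamp = datetime.now().strftime("%d%m%Y %H%M%S")
    target = Path(dest_root) / stamp
    # copytree creates `target` (and missing parents) itself and raises
    # FileExistsError if `target` already exists.
    shutil.copytree(src, target)
    return target


# Demonstration with throwaway directories instead of the real project paths.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "08"
    src.mkdir()
    (src / "example.txt").write_text("hello")
    copied = copy_into_timestamped(src, Path(tmp) / "Filecopy")
    copied_ok = (copied / "example.txt").read_text() == "hello"
print(copied_ok)
```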

How to change python script to .exe with user defined input and output paths in python

I have a Python script and want to turn it into an .exe file with user-defined input and output paths.
In the script below, 'csv' is the input folder and contains multiple txt files:
import pandas as pd
import numpy as np
import os
for file in os.listdir('csv/'):
    filename = 'csv/{}'.format(file)
    print(filename)
    df=pd.read_csv(filename)
    df.to_csv(path_out)
A simple way you can do this with cx_freeze is as follows:
conda install -c conda-forge cx_freeze, or pip install cx_freeze to your env with numpy and pandas
Make a folder called dist for your new .exe
Save the code below as csv_thing.py, or whatever you want it to be called.
Run the command: cxfreeze csv_thing.py --target-dir C:\somepath\dist
There is a good chance that without using a cx_freeze setup file (spec file in pyinstaller) that not all of the files will get copied over to the dist dir. Numpy and pandas from Anaconda envs are often tricky.
If file failure occurs, you can manually copy the .dll files over into the dist folder; it's easy if you just grab them all. If you're using an Anaconda env, they likely live here: C:\Users\your_user_account\Anaconda3\envs\panel\Library\bin. Otherwise grab all of them from the numpy location: C:\Users\matth\Anaconda3\envs\panel\Lib\site-packages\numpy and copy to the dist dir.
import numpy as np
import pandas as pd
import os
in_dir = input(' enter a folder path where your .csvs are located: ')
out_dir = input(' enter a folder path where your .csvs will go: ')
csv_list = [os.path.join(in_dir, fn) for fn in next(os.walk(in_dir))[2] if fn.endswith('.csv')]
for csv in csv_list:
    file_name = os.path.basename(csv)
    print(file_name)
    df = pd.read_csv(csv)
    df.to_csv(os.path.join(out_dir, file_name))
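
As an alternative to prompting with input(), the two folders could be passed on the command line, which is often more convenient for a frozen .exe that is started from a script or shortcut. This is a sketch under that assumption and not part of the answer above: parse_paths and list_csv_files are hypothetical names, and only the standard library is used so the path handling can be tried without pandas.

```python
import argparse
from pathlib import Path


def parse_paths(argv=None):
    """Parse the input and output folders from the command line."""
    parser = argparse.ArgumentParser(description="Copy .csv files between folders.")
    parser.add_argument("in_dir", type=Path, help="folder that holds the input .csv files")
    parser.add_argument("out_dir", type=Path, help="folder that receives the output .csv files")
    return parser.parse_args(argv)


def list_csv_files(in_dir):
    """Return the .csv files directly inside `in_dir`, sorted by name."""
    return sorted(p for p in Path(in_dir).iterdir() if p.is_file() and p.suffix == ".csv")


# Passing an explicit list stands in for real command-line arguments.
args = parse_paths(["input_folder", "output_folder"])
print(args.in_dir, args.out_dir)
```

In the real script, parse_paths() would be called without arguments so that it reads sys.argv, and each path from list_csv_files(args.in_dir) would be read and written to args.out_dir as in the loop above.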
