I have Python code in a Jupyter notebook and accompanying data in the same folder. I will be bundling both the code and data into a zip file and submitting it for evaluation. I am trying to read the data inside the notebook using pandas.read_csv with a relative path, and that's not working; the API doesn't seem to work with a relative path. What is the correct way to handle this?
Update:
My findings so far seem to suggest that I should be using os.chdir() to set the current working directory, but I won't know where the zip file will get extracted. The code is supposed to be read-only, so I cannot expect the receiver to update the path as appropriate.
You could join the current working directory with the relative path to avoid problems like this:
import os
import pandas as pd
BASE_DIR = os.getcwd()
csv_path = "csvname.csv"
df = pd.read_csv(os.path.join(BASE_DIR, csv_path))
where csv_path is the relative path.
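If it still does not work, a quick sanity check (a small sketch reusing the names above) is to print the joined path and confirm it exists before reading:
import os

full_path = os.path.join(BASE_DIR, csv_path)
print(full_path)                  # where pandas will actually look
print(os.path.exists(full_path))  # should be True if the CSV sits next to the notebook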
I think first of all you should unzip the file, then you can run it.
You may use the code below to unzip the file:
from zipfile import ZipFile
file_name = "folder_name.zip"
with ZipFile(file_name, 'r') as zip:
    zip.extractall()
    print("Done!")
Related
First of all, I have to say that I'm totally new to Python. I am trying to use it to analyze EEG data with a toolbox. I have 30 EEG data files.
I want to create a loop instead of doing each analysis separately. Below you can see the code I wrote to collect all the files to be analyzed in a folder:
import os
path = 'my/data/directory'
folder = os.fsencode(path)
filenames = []
for file in os.listdir(folder):
    filename = os.fsdecode(file)
    if filename.endswith('.csv') and filename.startswith('p'): # whatever file types you're using
        filenames.append(filename)
filenames.sort()
But after that, I couldn't figure out how to use them in a loop. With this I could list all the files, but I couldn't work out how to refer to each of them in turn within the following code:
file = pd.read_csv("filename", header=None)
fg.fit(freqs, file, [0, 60]) #The rest of the code is like this, but this part is not related
Normally I have to write the whole file path in the part that says "filename". I can list all the file paths with the code I created above, but I don't know how to use each of them in this code in turn.
I would be glad if you help.
First of all, this was a good attempt.
What you need to do is make a list of files. From there you can do whatever you want...
You can do this as follows:
from os import listdir
from os.path import isfile, join
mypath = 'F:/code' # whatever path you want
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
# now do the for loop
for i in onlyfiles:
    # do whatever you want
    print(i)
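Applied to your case, the loop body just joins each name with the directory and hands the full path to pd.read_csv; a sketch reusing the path and the filename filter from your question (the analysis call itself stays whatever your toolbox needs):
import os
import pandas as pd

path = 'my/data/directory'  # same data directory as in the question

# collect and sort the matching files, as in the question
filenames = sorted(
    f for f in os.listdir(path)
    if f.startswith('p') and f.endswith('.csv')
)

for fname in filenames:
    full_path = os.path.join(path, fname)       # full path for this file
    data = pd.read_csv(full_path, header=None)  # read one EEG file
    # ...then run your analysis on `data`, e.g. fg.fit(freqs, data, [0, 60])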
I have a CSV file with UTF-16LE encoding. I tried to open it in a Cloud Function using
import pandas as pd
from io import StringIO as sio
with open("gs://bucket_name/my_file.csv", "r", encoding="utf16") as f:
    read_all_once = f.read()
read_all_once = read_all_once.replace('"', "")
file_like = sio(read_all_once)
df = pd.read_csv(file_like, sep=";", skiprows=5)
I get the error that the file is not found at that location. What is the issue? When I run the same code locally with a local path, it works.
Also when the file is in utf-8 encoding I can read it directly with
df = pd.read_csv("gs://bucket_name/my_file.csv, delimiter=";", encoding="utf-8", skiprows=0,low_memory=False)
I need to know whether I can read the UTF-16 file directly with pd.read_csv(). If not, how do I make open() recognize the gs:// path?
Thanks in advance!
Yes, you can read the UTF-16 CSV file directly with the pd.read_csv() method.
For the method to work, please make sure that the service account attached to your function has access to read the CSV file in the Cloud Storage bucket.
Please check whether the encoding of the CSV file you are using is “utf-16”, “utf-16le”, or “utf-16be”, and use the appropriate one in the method.
I used the Python 3.7 runtime.
My main.py and requirements.txt files look as below. You can modify main.py according to your use case.
main.py
import pandas as pd
def hello_world(request):
    # please change the file's URI
    data = pd.read_csv('gs://bucket_name/file.csv', encoding='utf-16le')
    print(data)
    return 'check the results in the logs'
requirements.txt
pandas==1.1.0
gcsfs==0.6.2
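If you also need the separator, header skipping, and quote handling from your original local snippet, those options can be passed into the same direct read; a sketch, assuming the file still uses ";" as the separator and five rows to skip:
import pandas as pd

# read the UTF-16LE file straight from Cloud Storage (gcsfs handles the gs:// path),
# keeping the parsing options from the original local version
df = pd.read_csv(
    "gs://bucket_name/my_file.csv",
    encoding="utf-16le",
    sep=";",
    skiprows=5,
)
print(df.head())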
I have a .csv file that contains ~100 links to Dropbox files. The current method I have downloads the files without the ?dl=0 suffix, which seems to be critical.
#import packages
import pandas as pd
import wget
#read the .csv file, iterate through each row and download it
data = pd.read_csv("BRAIN_IMAGING_SUMSTATS.csv")
for index, row in data.iterrows():
    print(row['Links'])
    filename = row['Links']
    wget.download(filename)
Output:
https://www.dropbox.com/s/xjtu071g7o6gimg/metal_roi_volume_dec12_2018_pheno1.txt.zip?dl=0
https://www.dropbox.com/s/9oc9j8zhd4mn113/metal_roi_volume_dec12_2018_pheno2.txt.zip?dl=0
https://www.dropbox.com/s/0jkdrb76i7rixa5/metal_roi_volume_dec12_2018_pheno3.txt.zip?dl=0
https://www.dropbox.com/s/gu5p46bakgvozs5/metal_roi_volume_dec12_2018_pheno4.txt.zip?dl=0
https://www.dropbox.com/s/8zfpfscp8kdwu3h/metal_roi_volume_dec12_2018_pheno5.txt.zip?dl=0
These look like the correct links, but the downloaded files are named in the format
metal_roi_volume_dec12_2018_pheno1.txt.zip instead of metal_roi_volume_dec12_2018_pheno1.txt.zip?dl=0, so I cannot unzip them. Any ideas how to download the actual Dropbox files?
By default (without extra URL parameters, or with dl=0 like in your example), Dropbox shared links point to an HTML preview page for the linked file, not the file data itself. Your code as-is will download the HTML, not the actual zip file data.
You can modify these links for direct file access though, as documented in this Dropbox help center article.
So, you should modify the link, e.g., to use raw=1 instead of dl=0, before calling wget.download on it.
Quick fix would be something like:
#import packages
import pandas as pd
import wget
import os
from urllib.parse import urlparse
#read the .csv file, iterate through each row and download it
data = pd.read_csv("BRAIN_IMAGING_SUMSTATS.csv")
for index, row in data.iterrows():
    print(row['Links'])
    # switch the shared link to direct-download mode, as described above
    url = row['Links'].replace('dl=0', 'raw=1')
    # take the file name from the URL path (without the ?dl=0 query string)
    parsed = urlparse(url)
    fname = os.path.basename(parsed.path)
    wget.download(url, fname)
Basically, you switch the link to direct download, extract the file name from the URL path, and then use that file name as the output param in the wget.download fn.
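Once the real zip data is downloaded, each archive can be extracted as usual; a small sketch using one of the file names from your output:
from zipfile import ZipFile

# extract one of the downloaded archives in place
with ZipFile("metal_roi_volume_dec12_2018_pheno1.txt.zip") as archive:
    archive.extractall()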
I am currently working on an automation project for a company, and one of the tasks requires that I loop through a directory and convert all the PDF files into CSV files. I am using the camelot-py library (which has been better than the others I have tried). When I apply the code below to a single file, it works just fine; however, when I make it loop through all the PDF files in the directory, I get the following error:
"OSError: [Errno 22] Invalid argument"
import camelot
import csv
import pandas as pd
import os
directoryPath = r'Z:\testDirectory'
os.chdir(directoryPath)
print(os.listdir())
folderList = os.listdir(directoryPath)
for folders, sub_folders, file in os.walk(directoryPath):
    for name in file:
        if name.endswith(".pdf"):
            filename = os.path.join(folders, name)
            print(filename)
            print(name)
            tables = camelot.read_pdf(filename, flavor='stream', columns=['72,73,150,327,442,520,566,606,683'])
            tables = tables[0].df
            print(tables[0].parsing_report)
            tables.to_csv('foo2.csv')
I expect all files to be converted to '.csv' files but I get the error 'OSError: [Errno 22] Invalid argument'. My error appears to be from line 16.
I don’t know if you have the same problem, but in my case I made the really stupid mistake of not putting the files in the correct directory. I was getting the same error, but once I found that out, the script worked within a regular for loop.
Instead of the to_* methods, I am using bulk export to export the results to SQL, but that should not make a difference here.
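For reference, a regular loop of that kind could look like the sketch below (the directory, flavor, and column positions are taken from the question; writing one CSV per PDF instead of overwriting foo2.csv each time is my own assumption about the intent):
import os
import camelot

directoryPath = r'Z:\testDirectory'  # directory that actually contains the PDFs

for root, _dirs, files in os.walk(directoryPath):
    for name in files:
        if not name.endswith('.pdf'):
            continue
        pdf_path = os.path.join(root, name)
        tables = camelot.read_pdf(
            pdf_path,
            flavor='stream',
            columns=['72,73,150,327,442,520,566,606,683'],
        )
        print(tables[0].parsing_report)
        # one output CSV per input PDF, e.g. report.pdf -> report.csv
        tables[0].df.to_csv(os.path.splitext(pdf_path)[0] + '.csv', index=False)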
Just want to know whether there is a proper way to load multiple config files into Python scripts.
Directory structure as below.
dau
|-APPS
|---kafka
|---brokers
|-ENVS
As per the above, my base directory is dau. I'm planning to hold the scripts in the kafka and brokers directories. All global environment settings are stored in the ENVS directory in ".ini" format. I want to load those ini files into all the scripts without adding them one by one, because we may have to add more environment files in the future, and in that case we shouldn't have to add them manually to each and every script.
Sample env.ini
[DEV]
SERVER_NAME = dev123.abcd.net
I was trying to use the answer from the link below, but we still have to add them manually, and if the parent path of the dau directory changes, we have to edit the code.
Stack-flow-answer
Hi, I came up with the solution below. Thanks for the support.
The code below will gather all the .ini files from the ENVS directory and return them as a list.
import os
def All_env_files():
    try:
        BASE_PATH = os.path.abspath(os.path.join(__file__, "../.."))
        ENV_INI_FILES = [os.path.join(BASE_PATH + '/ENVS/', each) for each in os.listdir(BASE_PATH + '/ENVS') if each.endswith('.ini')]
        return ENV_INI_FILES
    except ValueError:
        raise ValueError('Issue with Gathering Files from ENVS Directory')
The code below will take the list of ini files and provide it to ConfigParser.
import ConfigParser, sys, os
"""
This is for kafka broker status check
"""
#Get Base path
Base_PATH = os.path.abspath(os.path.join(__file__,"../../.."))
sys.path.insert(0, Base_PATH)
#Importing configs python file on ../Configs.py
import Configs, edpCMD
#Taking All the ENVS ini file as list
List_ENVS = Configs.All_env_files()
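For completeness, that list can then be handed to the parser in one call; a minimal sketch, assuming Python 3's configparser module (the import above uses the Python 2 ConfigParser name) and that at least one of the gathered files defines the [DEV] section from the sample env.ini:
import configparser

config = configparser.ConfigParser()
config.read(List_ENVS)               # read every gathered .ini file in one call
print(config['DEV']['SERVER_NAME'])  # e.g. dev123.abcd.net from the sample env.ini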
Feel free to suggest any shorter way to do this.