Open files whose names may contain special characters on Windows - python-3.x

I am trying to read all text files in a directory. The code below works on Linux, but on Windows I have problems reading files whose names contain special characters, like: unsere-fotowand.html_tx_yag_pi1%5Bc142%5D%5BalbumUid%5D=1&tx_yag_pi1%5Bc142%5D%5BgalleryUid%5D=1&tx_yag_pi1%5Baction%5D=index&tx_yag_pi1%5Bcontroller%5D=Gallery&cHash=de647de667336c05d26cce3a7cb3a28a.txt. I have tried things like filename.encode().decode('utf8'), but this does not help.
import os
import sys
path = '.'  # placeholder: the directory to scan
for r, d, f in os.walk(path):
    for file in f:
        if file.endswith('.txt'):
            filename = os.path.join(r, file)
            print(f'process file {filename}')
            # throws a file-not-found exception if the filename contains & or %
            with open(filename, 'r', encoding="utf-8") as txtfile:
                text = txtfile.read()
How can I make this work on both Linux and Windows?

Related

How to print relative path in file

I am trying to output a folder and file structure to a txt file, using the code below. But I want relative paths in the output instead of absolute paths.
import os
absolute_path = os.path.dirname(__file__)
with open("output.txt", "w", newline='') as a:
    for path, subdirs, files in os.walk(absolute_path):
        a.write(path + os.linesep)
        for filename in files:
            a.write('\t%s\n' % filename)
For now this gives me something like this:
C:\Users\User\OneDrive\bla\bla
C:\Users\User\OneDrive\bla\bla\folder1
    file1.xxx
    file2.xxx
    file3.xxx
C:\Users\User\OneDrive\bla\bla\folder2
    test1.txt
But I want to show only paths relative to where the script ran, nothing more:
.\bla
.\bla\folder1
    file1.xxx
    file2.xxx
    file3.xxx
.\bla\folder2
    test1.txt
I have fiddled around a bit, but I have not reached a solution, nor found one here (or maybe I am not searching for the correct thing).
Any help would be appreciated.
If you know the path to your current module and the path of each file you find, and all your files are in subdirectories of the current directory, you can calculate the relative path yourself.
Strings have a .replace(string_to_replace, replacement_value) method that will do this for you; here it swaps the absolute prefix for '.':
import os
absolute_path = os.path.dirname(__file__)
with open("output.txt", "w", newline='') as a:
    for path, subdirs, files in os.walk(absolute_path):
        a.write(path.replace(absolute_path, '.') + os.linesep)
        for filename in files:
            a.write('\t%s\n' % filename)
The beauty of Python is that its os.path module is ready for this kind of problem. You can use the os.path.relpath() function:
import os
absolute_path = os.path.dirname(__file__)
with open("output.txt", "w", newline='') as a:
    for path, subdirs, files in os.walk(absolute_path):
        # append the line separator after computing the relative path, not before
        a.write('.\\' + os.path.relpath(path, start=absolute_path) + os.linesep)
        for filename in files:
            a.write('\t%s\n' % filename)
Here the start parameter specifies the directory from which the relative path should be resolved.
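To make the effect of the start parameter concrete, here is a minimal sketch. The helper name relative_listing is made up for illustration and is not part of either answer; it collects the relative directory names instead of writing them to a file, so the result is easy to inspect.

```python
import os


def relative_listing(top):
    """Return (relative_dir, filenames) pairs for every directory under top."""
    listing = []
    for path, subdirs, files in os.walk(top):
        subdirs.sort()  # in-place sort makes the traversal order deterministic
        # relpath() yields "." for top itself and e.g. "folder1" for a child
        rel = os.path.relpath(path, start=top)
        listing.append((rel, sorted(files)))
    return listing
```

Calling relative_listing(os.path.dirname(os.path.abspath(__file__))) would list everything relative to the script's own folder.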

How to read data from home directory in Python

I am trying to read data from a JSON file. This JSON file is stored in the project > Requests > request1.json. In a script I am trying to read data from the JSON file and failing badly. This is the code I'm trying to use to open the file in read mode.
On Windows, I am trying to replace
f = open('D:\\Test\\projectname\\RequestJson\\request1.json', 'r')
with
f = open(os.path.expanduser('~user') + "Requests/request1.json", 'r')
Any help would be greatly appreciated.
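Use the current working directory (assuming the script is run from inside the project) and append the remaining static file path. Note that a file opened in 'r' mode has to be read, not written:
import json
import os
current_dir = os.path.abspath(os.getcwd())
path = os.path.join(current_dir, "RequestJson", "request1.json")
with open(path, 'r') as f:
    data = json.load(f)  # parse the JSON content of the file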
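If the file really lives under the home directory, as the question's use of os.path.expanduser suggests, the path can be built with os.path.join so that the separator between the home directory and "Requests" is not lost. This is only a sketch: the helper names home_path and load_json are invented here, and it assumes the current user's home ('~') is meant rather than the home of an account literally named "user" ('~user').

```python
import json
import os


def home_path(*parts):
    """Join path components onto the current user's home directory."""
    return os.path.join(os.path.expanduser("~"), *parts)


def load_json(path):
    """Open a JSON file in read mode and return the parsed data."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

For the layout in the question, the call would look like load_json(home_path("Requests", "request1.json")).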

Python3: Index out of range for script that worked before

The script below raises:
IndexError: list index out of range
for the line starting with values = {line.split (...)
values=dict()
with open(csv) as f:
    lines =f.readlines()
values = {line.split(',')[0].strip():line.split(',')[1].strip() for line in lines}
However, I could use it yesterday for exactly the same task:
replacing certain text in a directory of XML files with different texts.
import os
from distutils.dir_util import copy_tree
drc = 'D:/Spielwiese/00100_Arbeitsverzeichnis'
backup = 'D:/Spielwiese/Backup/'
csv = 'D:/persons1.csv'
copy_tree(drc, backup)
values=dict()
with open(csv) as f:
    lines =f.readlines()
values = {line.split(',')[0].strip():line.split(',')[1].strip() for line in lines}
#Getting a list of the full paths of files
for dirpath, dirname, filename in os.walk(drc):
    for fname in filename:
        #Joining dirpath and filenames
        path = os.path.join(dirpath, fname)
        #Opening the files for reading only
        filedata = open(path,encoding="Latin-1").read()
        for k,v in values.items():
            filedata=filedata.replace(k,v)
        f = open(path, 'w',encoding="Latin-1")
        # We are writing the changes to the files
        f.write(filedata)
        f.close() #Closing the files
print("In case something went wrong, you can find a backup in " + backup)
I don't see anything weird, and, as mentioned, I could use it before ... :-o
Any ideas on how to fix it?
Best wishes,
K
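The question does not show the CSV file, so the cause can only be guessed. One plausible explanation, stated here as an assumption, is that the file now contains a blank line or a line without a comma, in which case line.split(',')[1] raises IndexError. The sketch below uses a hypothetical helper, read_replacements, that skips such lines instead of failing.

```python
def read_replacements(csv_path, encoding=None):
    """Build a {search: replacement} dict, skipping lines without two fields."""
    values = {}
    with open(csv_path, encoding=encoding) as f:
        for line_number, line in enumerate(f, 1):
            parts = line.split(",")
            if len(parts) < 2:
                # A blank or comma-less line would raise IndexError on parts[1]
                if line.strip():
                    print(f"skipping line {line_number}: {line.strip()!r}")
                continue
            values[parts[0].strip()] = parts[1].strip()
    return values
```

With this in place, values = read_replacements(csv) could stand in for the dict comprehension, and the printed messages would point at the offending lines.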

FileNotFoundError long file path python - filepath longer than 255 characters

Normally I don't ask questions, because I find answers on this forum. This place is a goldmine.
I am trying to move some files from a legacy storage system (CIFS share) to Box using the Python SDK. It works fine as long as the file path is shorter than 255 characters.
I pass the share name in Unix format to os.walk to list the files in the directory.
Here is one of the file names:
//dalnsphnas1.mydomain.com/c$/fs/hdrive/home/abcvodopivec/ENV Resources/New Regulation Review/Regulation Reviews and Comment Letters/Stormwater General Permits/CT S.W. Gen Permit/PRMT0012_FLPR Comment Letter on Proposed Stormwater Regulations - 06-30-2009.pdf
I also tried escaping the spaces in the file name, but I still get FileNotFoundError, even though the file is there:
//dalnsphnas1.mydomain.com/c$/fs/hdrive/home/abcvodopivec/ENV Resources/New Regulation Review/Regulation Reviews and Comment Letters/Stormwater General Permits/CT S.W. Gen Permit/PRMT0012_FLPR\ Comment\ Letter\ on\ Proposed\ Stormwater\ Regulations\ -\ 06-30-2009.pdf
So I tried to shorten the path using win32api.GetShortPathName, but it throws the same FileNotFoundError. It works fine on files with a path length of less than 255 characters.
I also tried to copy the file to another destination folder using copyfile(src, dst) to work around the issue, and I still get the same error.
import os, sys
import argparse
import win32api
import win32con
import win32security
from os import walk
from shutil import copyfile  # needed for the copyfile() call below
parser = argparse.ArgumentParser(
    description='Migration Script',
)
parser.add_argument('-p', '--home_path', required = True, help='Home Drive Path')
args = vars(parser.parse_args())
if args['home_path']:
    pass
else:
    print("Usage : script.py -p <path>")
    print("-p <directory path>/")
    sys.exit()
dst = (args['home_path'] + '/' + 'long_file_path_dir')
for dirname, dirnames, filenames in os.walk(args['home_path']):
    for filename in filenames:
        file_path = (dirname + '/' + filename)
        path_len = len(file_path)
        if(path_len > 255):
            #short_path = win32api.GetShortPathName(file_path)
            copyfile(file_path, dst, follow_symlinks=True)
After a lot of trial and error, I figured out the solution (thanks to the Stack Overflow forum).
I switched from the Unix format to a UNC path.
Then I prepended r'\\?\UNC' to each file path generated through os.walk, as shown below. A UNC path starts with two backslashes, so I have to remove one of them to make it work:
file_path = (r'\\?\UNC' + file_path[1:])
Thanks again to everyone who responded.
Shynee
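The prefixing step from the answer above can be wrapped in a small helper. This is a sketch under a few assumptions: the name to_extended_length is made up here, the input is an absolute path, and forward slashes are converted first because Windows does not normalize them once the \\?\ prefix is present. The drive-letter branch goes beyond what the answer describes and is included only for completeness.

```python
def to_extended_length(path):
    r"""Prefix an absolute Windows path so it can exceed the MAX_PATH limit."""
    # With the \\?\ prefix Windows no longer converts "/" to "\", so do it here
    path = path.replace("/", "\\")
    if path.startswith("\\\\?\\"):
        return path  # already in extended-length form
    if path.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return "\\\\?\\UNC" + path[1:]
    # Drive path: C:\dir\... becomes \\?\C:\dir\...
    return "\\\\?\\" + path
```

The function only manipulates strings, so it can be tried on any platform; the resulting paths are meaningful to file APIs on Windows only.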

creating corresponding subfolders and writing a portion of the file in new files inside those subfolders using python

I have a folder named "data". It contains the subfolders "data_1", "data_2", and "data_3". These subfolders contain some text files. I want to walk through all these subfolders and generate corresponding subfolders with the same names inside another folder named "processed_data". I also want to generate corresponding files with "processed" as a prefix in the name, and write to them all lines of the original file that contain "1293".
I am using the code below but am not able to get the required result. Neither the subfolders "data_1", "data_2", and "data_3" nor the files are being created.
import os
folder_name=""
def pre_processor():
data_location="D:\data" # folder containing all the data
for root, dirs, files in os.walk(data_location):
for dir in dirs:
#folder_name=""
folder_name=dir
for filename in files:
with open(os.path.join(root, filename),encoding="utf8",mode="r") as f:
processed_file_name = 'D:\\processed_data\\'+folder_name+'\\'+'processed'+filename
processed_file = open(processed_file_name,"w", encoding="utf8")
for line_number, line in enumerate(f, 1):
if "1293" in line:
processed_file.write(str(line))
processed_file.close()
pre_processor()
You might need to elaborate on the issue you are having; e.g., are the files being created, but empty?
A few things I notice:
1) Your indentation is off (not sure if this is just a copy-paste issue though): the pre_processor function is empty, i.e. you are defining the function at the same level as the declaration, not inside of it.
Try this:
import os
folder_name=""
def pre_processor():
    data_location="D:\\data" # folder containing all the data
    for root, dirs, files in os.walk(data_location):
        for dir in dirs:
            #folder_name=""
            folder_name=dir
        for filename in files:
            with open(os.path.join(root, filename), encoding="utf8",mode="r") as f:
                processed_file_name = 'D:\\processed_data\\'+folder_name+'\\'+'processed'+filename
                processed_file = open(processed_file_name,"w", encoding="utf8")
                for line_number, line in enumerate(f, 1):
                    if "1293" in line:
                        processed_file.write(str(line))
                processed_file.close()
pre_processor()
2) Check whether processed_data and its subfolders exist; if not, create them first, as this code will not do so.
Instead of building the path to the new folder by hand, you could just replace the name of the folder in the path.
Furthermore, you are not creating the subfolders.
This code should work, but on Windows you need to replace the Linux-style forward slashes:
import os
folder_name=""
def pre_processor():
    data_location="data" # folder containing all the data
    for root, dirs, files in os.walk(data_location):
        for dir in dirs:
            # folder_name=""
            folder_name = dir
        for filename in files:
            joined_path = os.path.join(root, filename)
            with open(joined_path, encoding="utf8", mode="r") as f:
                processed_folder_name = root.replace("data/", 'processed_data/')
                processed_file_name = processed_folder_name+'/processed'+filename
                if not os.path.exists(processed_folder_name):
                    os.makedirs(processed_folder_name)
                processed_file = open(processed_file_name, "w", encoding="utf8")
                for line in f:
                    if "1293" in line:
                        processed_file.write(str(line))
                processed_file.close()
pre_processor()
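A variant that avoids hard-coded slashes altogether is sketched below. It is not the answerer's code: the function name filter_tree and its parameters are invented for illustration, and it assumes the destination folder lies outside the source folder. The subfolder name is derived with os.path.relpath, so the same code runs on Linux and Windows.

```python
import os


def filter_tree(src_root, dst_root, needle="1293", prefix="processed"):
    """Copy lines containing needle into a mirrored tree under dst_root."""
    written = []
    for root, dirs, files in os.walk(src_root):
        # Same subfolder name as in the source, taken relative to src_root
        rel_dir = os.path.relpath(root, src_root)
        out_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(out_dir, exist_ok=True)  # create missing subfolders
        for filename in files:
            src = os.path.join(root, filename)
            dst = os.path.join(out_dir, prefix + filename)
            with open(src, encoding="utf8") as fin, \
                    open(dst, "w", encoding="utf8") as fout:
                # keep only the lines that contain the search string
                fout.writelines(line for line in fin if needle in line)
            written.append(dst)
    return written
```

For the folders in the question, the call would be filter_tree("D:\\data", "D:\\processed_data").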
