How to open file names from a directory dialog in PyQt - pyqt

path = str(QtGui.QFileDialog.getExistingDirectory(self, "Select Directory "))
How do I read the files from the directory dialog?

file = str(QtGui.QFileDialog.getOpenFileName(self, "Select File", "", "*.png *.jpg"))
print(file)
This will list only png and jpg files; change the filter string to the file extensions you want to list.
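If the goal is instead to read the file names inside the folder returned by getExistingDirectory, one option is to list that folder yourself once the dialog returns. A minimal sketch, assuming the dialog result is available as a plain string; the helper name `list_files` and its `extensions` parameter are made up for illustration:

```python
import os


def list_files(directory, extensions=(".png", ".jpg")):
    """Return the sorted names of files in `directory` with one of `extensions`."""
    wanted = tuple(ext.lower() for ext in extensions)
    names = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        # Skip sub-directories and compare extensions case-insensitively.
        if os.path.isfile(full_path) and name.lower().endswith(wanted):
            names.append(name)
    return sorted(names)
```

Calling `list_files(path)` right after the dialog call would then give the png and jpg names in the selected folder.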

Related

Converting multiple files in a directory into .txt format. But file names become Binary

So I am creating plagiarism software, and for that I need to convert .pdf, .docx, etc. files into .txt format. I successfully found a way to convert all the files in one directory to another. BUT the problem is that this method changes the file names
into binary values. I need to keep the original file names, which I will need in the next phase.
**Code:**
import os
import uuid
import textract
source_directory = os.path.join(os.getcwd(), "C:/Users/syedm/Desktop/Study/FOUNDplag/Plagiarism-checker-Python/mainfolder")
for filename in os.listdir(source_directory):
    file, extension = os.path.splitext(filename)
    unique_filename = str(uuid.uuid4()) + extension
    os.rename(os.path.join(source_directory, filename), os.path.join(source_directory, unique_filename))
training_directory = os.path.join(os.getcwd(), "C:/Users/syedm/Desktop/Study/FOUNDplag/Plagiarism-checker-Python/trainingdata")
for process_file in os.listdir(source_directory):
    file, extension = os.path.splitext(process_file)
    # Create the new text file name by appending the .txt extension to the file UUID
    dest_file_path = file + '.txt'
    # Extract the text from the file
    content = textract.process(os.path.join(source_directory, process_file))
    # Create and open the new file in "wb" (write binary) mode, because the extracted content is bytes
    write_text_file = open(os.path.join(training_directory, dest_file_path), "wb")
    # Write the content and close the newly created file
    write_text_file.write(content)
    write_text_file.close()
Remove the line where you rename the files:
os.rename(os.path.join(source_directory, filename), os.path.join(source_directory, unique_filename))
Also, those names are not binary; they are UUIDs.
Cheers
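With the rename removed, the conversion loop can be sketched as below. This is only an illustration: the function name `convert_directory` and the `extract` parameter are hypothetical, and `extract` stands for any callable that takes a path and returns bytes (textract.process is the one used in the question).

```python
import os


def convert_directory(source_dir, dest_dir, extract):
    """Write one .txt file per source file, keeping the original base name.

    `extract` is any callable that takes a file path and returns bytes.
    """
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    for filename in sorted(os.listdir(source_dir)):
        source_path = os.path.join(source_dir, filename)
        if not os.path.isfile(source_path):
            continue
        base, _extension = os.path.splitext(filename)
        dest_path = os.path.join(dest_dir, base + ".txt")
        # The extractor returns bytes, so open the destination in binary mode.
        with open(dest_path, "wb") as out_file:
            out_file.write(extract(source_path))
        written.append(dest_path)
    return written
```

Note that two sources sharing a base name (say report.pdf and report.docx) would write to the same .txt file in this sketch, so the last one processed wins.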

How to merge multiple text file located in two different folders and create a new column in the combine file in python?

All,
I have two folders, say Folder 1 and Folder 2, that each contain ~1000 txt files. I would like to combine all the files into one txt file and create a new column called "Label", assigning labels such that if 001.txt belongs to Folder 1, the Label column has "Folder 1"; likewise, if the txt file belongs to Folder 2, the label is "Folder 2". So far I have the code below, which merges all the txt files in Folder 1 into a file named Folder1.txt, but that's not what I want.
Folder1=001.txt,002.txt....1000.txt
Folder2=001.txt,002.txt....1000.txt
Reference dataset: polarity dataset v1.0
import fileinput
import glob
file_list = glob.glob("*txt")  # Find the files that have a .txt extension
with open('Folder1.txt', 'w') as file:
    input_lines = fileinput.input(file_list)
    file.writelines(input_lines)
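One possible way to get the Label column is to write a CSV with one row per input file and record which folder each file came from. The sketch below assumes that layout; the function name `combine_with_labels` and the column names "File" and "Text" are made up for illustration and are not part of the question.

```python
import csv
import os


def combine_with_labels(folders, output_path):
    """Write every .txt file under `folders` to one CSV with a Label column.

    `folders` maps a label (e.g. "Folder 1") to a directory path.
    """
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(["File", "Text", "Label"])
        for label, folder in folders.items():
            for name in sorted(os.listdir(folder)):
                if not name.endswith(".txt"):
                    continue
                path = os.path.join(folder, name)
                with open(path, encoding="utf-8", errors="replace") as in_file:
                    text = in_file.read().strip()
                # One row per file, tagged with the folder it came from.
                writer.writerow([name, text, label])
                rows_written += 1
    return rows_written
```

Writing the combined file outside both input folders avoids picking it up again on a later run.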

Create folders dynamically and write csv files to that folders

I would like to read several input files from a folder, perform some transformations, create folders on the fly, and write the CSV files to the corresponding folders. My input path looks like
"Input files\P1_set1\Set1_Folder_1_File_1_Hour09.csv" for a single patient (this file contains the readings of patient P1 at the 9th hour).
Similarly, there are multiple files for each patient, and each patient's files are grouped under their own folder.
To read each file, I am using a wildcard pattern, as shown in the code below.
I have already tried the glob package and can read the files successfully, but I am facing an issue while creating the output folders and saving the files. I am parsing the file string as shown below:
f = "Input files\P1_set1\Set1_Folder_1_File_1_Hour09.csv"
f[12:] = "P1_set1\Set1_Folder_1_File_1_Hour09.csv"
import glob
import pandas as pd
filenames = sorted(glob.glob('Input files\\P*_set1\\*.csv'))
for f in filenames:
    print(f)  # This prints the full path
    print(f[12:])  # This prints the folder structure along with the filename
    df_transform = pd.read_csv(f)
    df_transform = df_transform.drop(['Format 10','Time','Hour'],axis=1)
    df_transform.to_csv("Output\\" + str(f[12:]),index=False)
I expect the output folder to contain the CSV files grouped by patient under their respective folders, so the transformed files should be arranged with the same structure as the input folder. Please note that the "Output" folder already exists.
To read the files in a folder, use the os library, and create the output folder before writing to it:
import os
import pandas as pd
folder_path = "path_to_your_folder"
for x in os.listdir(folder_path):
    df_transform = pd.read_csv(os.path.join(folder_path, x))
    df_transform = df_transform.drop(['Format 10','Time','Hour'],axis=1)
    out_dir = os.path.join("Output", os.path.basename(folder_path))
    # Create the output folder only if it does not exist yet
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    df_transform.to_csv(os.path.join(out_dir, x), index=False)
Now, instead of using f[12:], split the path in the for loop, like
file_name = x.split('/')[-1] #if you want filename.csv
Note that os.listdir already returns bare file names, so the split only matters when x is a full path.
Let me know if this is what you wanted.
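To mirror the input folder structure without hard-coded slicing, the relative path can be computed and the destination folder created on demand. A minimal sketch; the helper name `output_path_for` and its parameters are hypothetical:

```python
import os


def output_path_for(input_path, input_root, output_root):
    """Map a file under `input_root` to the same relative place under `output_root`,
    creating the destination folder if it does not exist yet."""
    relative = os.path.relpath(input_path, input_root)
    destination = os.path.join(output_root, relative)
    # exist_ok=True makes this safe to call once per file.
    os.makedirs(os.path.dirname(destination), exist_ok=True)
    return destination
```

In the question's loop, `df_transform.to_csv(output_path_for(f, "Input files", "Output"), index=False)` would then take the place of the `f[12:]` slicing.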

openfile .nc and .txt and other using wxpython

I use an open-file dialog in my code. I need to open netCDF4 files and txt files; how can I add that to my code?
def onOpen(self, event):
    wildcard = "netCDF4 files (*.nc)|*.nc"  # .txt needs to be added here
    dialog = wx.FileDialog(self, "Open netCDF4", wildcard=wildcard,
                           style=wx.FD_OPEN | wx.FD_FILE_MUST_EXIST)
    if dialog.ShowModal() == wx.ID_CANCEL:
        return
    path = dialog.GetPath()
I use wxPython for Python 3.6.
Thanks for the help.
You can either add a semicolon followed by another pattern within the same group, such as
"Audio|*.mp3;*.wav;*.flac;*.ogg;*.dss;*.aac;*.wma;*.au;*.ra;*.dts;*.aif"
which is useful for creating groups of related files,
or
add another pipe character | followed by a new wildcard description and definition, such as:
wildcard = "netCDF4 files (nc)|*.nc|Text files (txt)|*.txt|All files|*.*"
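When the list of file types grows, the wildcard string can be assembled from (description, patterns) pairs rather than typed by hand. A small sketch; `build_wildcard` is a hypothetical helper, not part of wxPython:

```python
def build_wildcard(groups):
    """Join (description, patterns) pairs into a wx.FileDialog wildcard string."""
    parts = []
    for description, patterns in groups:
        # Patterns within one group are separated by ';', groups by '|'.
        parts.append(description + "|" + ";".join(patterns))
    return "|".join(parts)


wildcard = build_wildcard([
    ("netCDF4 files (*.nc)", ["*.nc"]),
    ("Text files (*.txt)", ["*.txt"]),
    ("All files (*.*)", ["*.*"]),
])
```

The resulting string can be passed as the `wildcard` argument of wx.FileDialog, exactly as in the question's onOpen method.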

Read zip files from hdfs and create a dataframe with file name and file content

I am trying to read zip files located in a wasp location and create a DataFrame with the file name and file content. Each zip file contains multiple .rcv files.
I tried doing it, but it gives the content of all files as an RDD, which is not exactly what I want.
The code below gives the zip file name and its files' contents as a key-value pair. What I expect is the individual file name and its content as a key-value pair,
e.g.:
ACARS 20170507/file1.rcv "file content as string"
ACARS 20170507/file2.rcv "file content as string"
import io
import zipfile
def zip_extract(x):
    in_memory_data = io.BytesIO(x[1])
    file_obj = zipfile.ZipFile(in_memory_data, "r")
    files = [i for i in file_obj.namelist()]
    return dict(zip(files, [file_obj.open(file).read() for file in files]))
zips = sc.binaryFiles("hdfs://ACARS 20170507.zip")
files_data = zips.map(zip_extract).collect()
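One way to get a separate record per inner file is to have the extraction function return a list of (name, content) pairs and apply it with flatMap rather than map. The sketch below shows only the extraction step in plain Python; the "zip name/inner file" key format follows the example in the question, and decoding the content as UTF-8 is an assumption about the .rcv files.

```python
import io
import posixpath
import zipfile


def zip_extract(record):
    """Turn one (zip_path, zip_bytes) record into a list of (file_name, content) pairs."""
    zip_path, zip_bytes = record
    # "hdfs://ACARS 20170507.zip" -> "ACARS 20170507"
    zip_stem = posixpath.splitext(posixpath.basename(zip_path))[0]
    with zipfile.ZipFile(io.BytesIO(zip_bytes), "r") as archive:
        return [
            (
                posixpath.join(zip_stem, name),
                # Assumption: the files are text; undecodable bytes are replaced.
                archive.read(name).decode("utf-8", errors="replace"),
            )
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
        ]
```

With this version, `zips.flatMap(zip_extract)` would yield one (file name, content) pair per .rcv file instead of one dictionary per zip.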
