Create zip files using spark(python)


I'm trying to create a zip file from several files. For example, I have 3 files:
file1
file2
file3
I want to create a zip archive that contains all of these files.
Is there a way to create a zip archive containing multiple files in Spark (Python)?

First of all, Spark is a distributed computing framework that can be used from Python through PySpark. I don't understand how zip files relate to Spark in this question.
If you want to create zip files in Python, check out the zipfile module in the standard library. Give it a try, and if you get stuck, post your code here and we will try to help.

import os
from zipfile import ZipFile

# Directory to archive (assumed name, matching the output below)
dirName = 'sampleDir'

# Create a ZipFile object
with ZipFile('sampleDir.zip', 'w') as zipObj:
    # Iterate over all the files in the directory
    for folderName, subfolders, filenames in os.walk(dirName):
        for filename in filenames:
            # Build the complete path of the file
            filePath = os.path.join(folderName, filename)
            # Add the file to the zip
            zipObj.write(filePath)
Output:
sampleDir/file1.csv 2018-11-30 21:44:46 2829
sampleDir/file2.csv 2018-11-30 21:44:36 3386
sampleDir/file3.csv 2018-11-30 21:44:56 3552
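
The listing above resembles what ZipFile.printdir() prints, although the answer does not say how it was produced. As a minimal sketch, assuming an archive like the one written above, the entries can be read back with infolist(). The helper name list_archive is hypothetical.

```python
from zipfile import ZipFile


def list_archive(zip_path):
    """Return (name, uncompressed size) pairs for every entry in the archive."""
    with ZipFile(zip_path, "r") as zip_obj:
        return [(info.filename, info.file_size) for info in zip_obj.infolist()]
```

Calling list_archive('sampleDir.zip') would return one pair per archived file. ZipFile.printdir() prints a similar table, including modification times, directly to standard output.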

Related

zipfile.ZipFile creating an archive without file paths

I create an empty archive and write files to it:
import zipfile  # import the module
zname = r'D:\bdseoru.zip'  # archive name and location
newzip = zipfile.ZipFile(zname, 'w')  # create the archive
newzip.write(r'D:\1\milion.txt')  # add the first file to the archive
newzip.write(r'D:\1\links.txt')  # add the second file to the archive
newzip.close()  # close the archive
When I unzip this archive, the files are inside a folder named 1. How can I pack only the files (milion.txt and links.txt) so that the archive does not contain folder 1?
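
One way to approach this, sketched here under the assumption that the files have distinct base names, is to pass the arcname argument of ZipFile.write(), which sets the name stored inside the archive. The helper name zip_flat is hypothetical.

```python
import os
import zipfile


def zip_flat(file_paths, zip_path):
    """Write each file into the archive under its bare file name."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        for file_path in file_paths:
            # arcname is the name stored inside the archive;
            # basename drops the directory part of the source path.
            archive.write(file_path, arcname=os.path.basename(file_path))
```

For the example above, this would be called as zip_flat([r'D:\1\milion.txt', r'D:\1\links.txt'], r'D:\bdseoru.zip'). Files that share a base name would end up with the same archive name, so this sketch only suits flat, non-conflicting inputs.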

Renaming Files in Subdirectories using file path

Scenario: I am trying to rename every file named "a.txt" in all subfolders of a directory.
Question: I came up with the following code, but it has an issue: my loops don't work as expected. I was hoping the directory loop would take the last part of each path so that I could use that string to rename the file. Right now, my code renames the file using the last directory name it saw. How can this be fixed?
Code:
import os
import fnmatch
directory = "C:/Users/DGMS/Desktop/Test"
for root, subdirectories, files in os.walk(directory):
    for subdirectory in subdirectories:
        pathtest = os.path.basename(os.path.normpath(os.path.join(root, subdirectory)))
        print(pathtest)
    for file in files:
        if fnmatch.fnmatch(file, 'a.txt'):
            os.rename(os.path.join(root, file), os.path.join(root, pathtest))
            print(os.path.join(root, file))

Here is simpler code for what you want. Every "a.txt" now becomes "b.txt":
import os
rootdir = 'C:/Users/sid/Desktop/test'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if file == "a.txt":
            os.rename(os.path.join(subdir, file), os.path.join(subdir, "b.txt"))
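
The question asked for each file to take the name of its containing folder rather than a fixed name. A minimal sketch of that variant follows. It assumes the new name should keep the .txt extension, which the question does not state, and the helper name rename_to_parent_name is hypothetical.

```python
import os


def rename_to_parent_name(rootdir, target="a.txt"):
    """Rename each `target` file to '<containing folder name>.txt'."""
    renamed = []
    for subdir, dirs, files in os.walk(rootdir):
        if target in files:
            # The folder currently being visited is the file's parent,
            # so its last path component supplies the new name.
            folder_name = os.path.basename(os.path.normpath(subdir))
            new_path = os.path.join(subdir, folder_name + ".txt")
            os.rename(os.path.join(subdir, target), new_path)
            renamed.append(new_path)
    return renamed
```

The key difference from the code in the question is that the name comes from the folder os.walk is currently visiting, not from a variable left over from a loop over the subdirectories.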

Saving docx files names and pdf files names into a txt file and copying those docx and pdf files into a specific folder

I have a directory called found_files and a text file called foundFile, both located in Users/PaCY/Downloads. I want to write a Python script that saves the names of .docx and .pdf files into foundFile and then copies those files into found_files.
This is my code so far:
import glob, os
os.chdir("/Users/PaCY/Downloads")
a = open("foundFile.txt", "w")
for file in glob.glob("*.txt"):
    for path, subdirs, files in os.walk(r'/Users/PaCY/Downloads/'):
        for filename in files:
            a.write(str(filename) + os.linesep)
PS: When I run the program, I get the names of all the files in the current working directory.
Can anyone help with this program?
I appreciate your help.
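
A minimal sketch of one possible approach follows. It assumes only the top level of the source directory should be scanned, which the question does not specify, and the helper name collect_documents and its parameters are hypothetical.

```python
import os
import shutil


def collect_documents(source_dir, list_path, dest_dir, extensions=(".docx", ".pdf")):
    """Record matching file names in list_path and copy the files to dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    found = []
    with open(list_path, "w") as listing:
        for filename in sorted(os.listdir(source_dir)):
            full_path = os.path.join(source_dir, filename)
            # Keep regular files whose extension matches, ignoring case.
            if os.path.isfile(full_path) and filename.lower().endswith(extensions):
                listing.write(filename + "\n")
                shutil.copy2(full_path, os.path.join(dest_dir, filename))
                found.append(filename)
    return found
```

With the paths from the question, the call would look like collect_documents("/Users/PaCY/Downloads", "/Users/PaCY/Downloads/foundFile.txt", "/Users/PaCY/Downloads/found_files"). Filtering by extension is what keeps every other file name out of the list.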

Extract zip file and keeping top folder using python

I have an archive called CW1234.zip that contains various folders and subfolders, as shown below. CW1234.zip contains a CW_All folder, which in turn contains the CW123 and CW234 folders, and so on.
CW1234.zip
    CW_All
        CW123
            xyz.pdf
        CW234
            abc.doc
To extract it I use this code:
from zipfile import ZipFile
with ZipFile(r'CW1234.zip', 'r') as zipObj:
    # Extract all the contents of the zip file into the current directory
    zipObj.extractall()
The only problem is that the extracted tree starts at CW_All, with all its subfolders and files.
What I want is a top-level CW1234 folder, with the rest of the structure underneath it.
Current output:
CW_All
    CW123
        xyz.pdf
    CW234
        abc.doc
Expected output:
CW1234
    CW_All
        CW123
            xyz.pdf
        CW234
            abc.doc
I couldn't find anything about this in the documentation.

Using ZipFile.extractall() we can simply provide a new path to extract the contents of the archive to, which we can base on the filename of the archive.
I have a .zip file with the following structure:
archive1024.zip:.
│
└───Folder_with_script
        stuff.py
Here is the script to extract all of the files inside of the archive into a sub-folder:
from zipfile import ZipFile
file = "archive1024.zip"
with ZipFile(file, "r") as zFile:
    zFile.extractall(path=file.split(".")[0])
I now have a folder-structure like this:
J:.
│   archive1024.zip
│   unzip.py
│
└───archive1024
    └───Folder_with_script
            stuff.py
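
Note that file.split(".")[0] cuts at the first dot, so it would give an unexpected folder name for an archive whose name or path contains other dots. A hedged variant using pathlib, which strips only the final extension, might look like this. The helper name extract_into_named_folder is hypothetical.

```python
from pathlib import Path
from zipfile import ZipFile


def extract_into_named_folder(archive_path):
    """Extract the archive into a sibling folder named after the archive."""
    archive_path = Path(archive_path)
    # with_suffix("") removes only the last extension,
    # so "backups/CW1234.zip" maps to "backups/CW1234".
    target_dir = archive_path.with_suffix("")
    with ZipFile(archive_path, "r") as archive:
        archive.extractall(path=target_dir)
    return target_dir
```

The function returns the folder it extracted into, so the caller can continue working with the extracted tree.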

Python - Find, extract archives in sub folders and delete after extraction

I have a lot of subfolders containing zip files. I want to find all the zips, extract each one where it is, and delete it after extraction. So far I've managed to write this:
import zipfile, fnmatch, os
rootPath = r"."
pattern = '*.zip'
for root, dirs, files in os.walk(rootPath):
    for filename in fnmatch.filter(files, pattern):
        print(os.path.join(root, filename))
        zipfile.ZipFile(os.path.join(root, filename)).extractall(os.path.join(root, os.path.splitext(filename)[0]))
This finds and extracts all the zips, but no matter how I try to delete a zip, I get an access denied error. Is this a proper way of implementing what I'm after? How can I delete the archives after extraction?
Thanks
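
One common cause of such errors on Windows is an archive handle that is still open when deletion is attempted. This is an assumption, since the deletion code is not shown in the question. A minimal sketch that closes each archive with a with block before calling os.remove follows. The helper name extract_and_delete is hypothetical.

```python
import fnmatch
import os
import zipfile


def extract_and_delete(root_path, pattern="*.zip"):
    """Extract every matching archive next to itself, then delete the archive."""
    removed = []
    for root, dirs, files in os.walk(root_path):
        for filename in fnmatch.filter(files, pattern):
            archive_path = os.path.join(root, filename)
            target_dir = os.path.join(root, os.path.splitext(filename)[0])
            # Leaving the with block closes the file handle,
            # so os.remove runs on a file that is no longer open.
            with zipfile.ZipFile(archive_path) as archive:
                archive.extractall(target_dir)
            os.remove(archive_path)
            removed.append(archive_path)
    return removed
```

If the error persists with the handle closed, file permissions or another process holding the file would be the next things to check.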
