Extract a zip file and keep the top folder using Python - python-3.x

I have a zip file, CW1234.zip, that contains nested folders: the archive holds a CW_All folder, which in turn holds CW123, CW234, and so on, as shown below.
CW1234.zip
    CW_All
        CW123
            xyz.pdf
        CW234
            abc.doc
To extract it, I use this code:
from zipfile import ZipFile
with ZipFile(r'CW1234.zip', 'r') as zipObj:
    # Extract all the contents of the zip file into the current directory
    zipObj.extractall()
The only problem is that the extracted tree starts at CW_All, with all of its subfolders and files placed directly in the current directory.
What I want is a top-level folder named CW1234 (after the archive), with the archive's structure below it.
Current Output
CW_All
    CW123
        xyz.pdf
    CW234
        abc.doc
Expected Output
CW1234
    CW_All
        CW123
            xyz.pdf
        CW234
            abc.doc
I couldn't find anything about this in the documentation either.

Using ZipFile.extractall() we can simply provide a new path to extract the contents of the archive to, which we can base on the filename of the archive.
I have a .zip file with the following structure:
archive1024.zip:.
│
└───Folder_with_script
        stuff.py
Here is the script to extract all of the files inside of the archive into a sub-folder:
from zipfile import ZipFile
file = "archive1024.zip"
with ZipFile(file, "r") as zFile:
    zFile.extractall(path=file.split(".")[0])
I now have a folder-structure like this:
J:.
│   archive1024.zip
│   unzip.py
│
└───archive1024
    └───Folder_with_script
            stuff.py
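A slightly more defensive variant of the same idea, as a sketch only: file.split(".")[0] cuts at the first dot, so an archive named something like my.archive.zip, or a path with a dot in one of its folders, would give an unexpected folder name. The helper name extract_into_named_folder and its dest_root parameter are made up for illustration; Path.stem and ZipFile.extractall come from the standard library.

```python
from pathlib import Path
from zipfile import ZipFile


def extract_into_named_folder(zip_path, dest_root="."):
    """Extract zip_path into dest_root/<archive name without its suffix>."""
    zip_path = Path(zip_path)
    # Path.stem drops only the final suffix: "CW1234.zip" -> "CW1234"
    target = Path(dest_root) / zip_path.stem
    with ZipFile(zip_path, "r") as archive:
        # extractall creates the target directory if it does not exist
        archive.extractall(path=target)
    return target
```

Calling extract_into_named_folder("CW1234.zip") from the folder that holds the archive should give CW1234/CW_All/... as in the expected output of the question.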

Related

Renaming Files in Subdirectories using file path

Scenario: I am trying to rename every .txt file named "a.txt" in all subfolders of a directory.
Question: I came up with the following code, but it has an issue: the loops don't work as expected. I was hoping the directory loop would take the last part of each path, so that I could use that string to rename the file. Right now, my code renames the file with the last directory name it saw. How can this be fixed?
Code:
import os
import fnmatch
directory = "C:/Users/DGMS/Desktop/Test"
for root, subdirectories, files in os.walk(directory):
    for subdirectory in subdirectories:
        pathtest = os.path.basename(os.path.normpath(os.path.join(root, subdirectory)))
        print(pathtest)
    for file in files:
        if fnmatch.fnmatch(file, 'a.txt'):
            os.rename(os.path.join(root, file),(os.path.join(root, pathtest)))
            print(os.path.join(root, file))
Here is simpler code for what you want. Every "a.txt" now becomes "b.txt":
import os
rootdir = 'C:/Users/sid/Desktop/test'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        if file == "a.txt":
            os.rename(os.path.join(subdir, file), os.path.join(subdir, "b.txt"))
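If the goal is to name each file after the folder that contains it, as the question describes, the same os.walk loop can derive the new name from the directory currently being visited. This is a minimal sketch under assumptions: the function name rename_after_parent is hypothetical, and keeping the ".txt" extension on the new name is a guess at what the asker wants.

```python
import os


def rename_after_parent(rootdir, old_name="a.txt"):
    """Rename every old_name under rootdir to <containing folder name>.txt."""
    renamed = []
    for subdir, dirs, files in os.walk(rootdir):
        if old_name in files:
            # Last component of the directory currently being visited
            folder_name = os.path.basename(os.path.normpath(subdir))
            src = os.path.join(subdir, old_name)
            # Assumption: the renamed file keeps the .txt extension
            dst = os.path.join(subdir, folder_name + ".txt")
            os.rename(src, dst)
            renamed.append(dst)
    return renamed
```

The key difference from the question's code is that the name comes from subdir, the folder the file is actually in, rather than from a variable left over from a separate loop over subdirectories. The sketch does not check whether the destination name already exists.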

How to unzip archive directly into target folder without creating a subfolder with the archive name (7zip, command line)?

I'm using the 7zip command line interface to extract archives, like so:
7za.exe x -y {path_to_zipfile} -o{path_to_target_folder}
If my zipfile is named my_archive.7z, then I get the following filestructure in the target folder:
🗁 target_folder
└─ 🗁 my_archive
├─ 🗋 foo.png
├─ 🗁 bar
│ ├─ 🗋 baz.txt
│ └─ 🗋 qux.txt
...
However, I don't want the subfolder 🗁 my_archive. I'm looking for flags to apply on the 7zip command such that everything extracts directly in the target folder, without creating the 🗁 my_archive subfolder.
NOTES
I can't replace x with e because the filestructure shouldn't be lost (the e command extracts all files to the top level).
I'm working on a Windows 10 computer, but the solution must also work on Linux.
I'm using the following version: 7-Zip (a) 19.00 (x64)
Some background info: I'm calling 7zip from a Python program, like so:
import subprocess
# Variables:
# 'sevenzip_abspath': absolute path to 7za executable
# 'zipfile_abspath': absolute path to zipped file (`.7z` format)
# 'targetdir_abspath': absolute path to target directory
commandlist = [
    sevenzip_abspath,
    'x',
    '-y',
    zipfile_abspath,
    f'-o{targetdir_abspath}',
]
output = subprocess.Popen(
    commandlist,
    stdout=subprocess.PIPE,
    shell=False,
).communicate()[0]
if output is not None:
    print(output.decode('utf-8'))
I know I could do all kinds of things in Python after the unzipping has finished (move/rename directories, etc etc), but that's for plan B. First I want to check if there is an elegant solution.
I'd like to stick to 7zip for reasons that would lead us too far here.
You can rename the top level folder to match the target folder before extracting the archive.
7za rn {path_to_zipfile} my_archive target_folder
This will permanently change the archive. If you don't want that, take a copy first.
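For reference, the "plan B" mentioned in the question (fixing the layout in Python after 7zip has finished) could look roughly like the sketch below. It leaves the archive untouched. The function name flatten_single_subfolder and its parameters are hypothetical, and it assumes that no entry inside the wrapper folder clashes with a name already present in the target folder.

```python
import os
import shutil


def flatten_single_subfolder(target_dir, subfolder_name):
    """Move everything in target_dir/subfolder_name up into target_dir."""
    inner = os.path.join(target_dir, subfolder_name)
    for entry in os.listdir(inner):
        # shutil.move handles both files and directories
        shutil.move(os.path.join(inner, entry), os.path.join(target_dir, entry))
    # The wrapper folder is empty now, so it can be removed
    os.rmdir(inner)
```

With the names from the question, this would be called as flatten_single_subfolder(targetdir_abspath, "my_archive") once the extraction has completed.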

zipfile : zip only the files present in a directory

I want to zip a single file that sits in a folder, using Python. I am having trouble creating the zip file with the code snippet below. It does create a zip file, but the archive contains the complete folder structure.
import os
import zipfile as zip
# A raw string cannot end with a backslash, so the trailing "\" is dropped
root = r"C:\XXXX\YYYYYY\ZZZZ"
file = "abc.txt"
zipper = zip.ZipFile(file=os.path.join(root, file.replace("txt", "zip")), mode="w", compression=zip.ZIP_DEFLATED)
zipper.write(os.path.join(root, file))
zipper.close()
Actual output:
#################
abc.zip
 |
 XXXX - Folder
  |
  YYYYYY - Folder
   |
   ZZZZ - Folder
    |
    abc.txt
Expected output:
###############
abc.zip
 |
 abc.txt
One approach that I found, and that works, is:
os.chdir(root)
This sets the working directory to the folder that contains the files. I then pass just the filename, instead of the complete path, when creating the zip.
I'm not sure whether this is the correct or best way.
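An alternative that avoids changing the working directory is the arcname argument of ZipFile.write, which sets the name stored inside the archive independently of the path read from disk. The sketch below is only an illustration; the helper name zip_single_file is made up, and it assumes the zip should be created next to the source file.

```python
import os
import zipfile


def zip_single_file(file_path):
    """Create <name>.zip next to file_path containing only the file itself."""
    folder, filename = os.path.split(file_path)
    zip_path = os.path.join(folder, os.path.splitext(filename)[0] + ".zip")
    with zipfile.ZipFile(zip_path, mode="w", compression=zipfile.ZIP_DEFLATED) as zipper:
        # arcname is the name stored inside the archive; without it the
        # folder components of file_path are stored as well
        zipper.write(file_path, arcname=filename)
    return zip_path
```

Because the process-wide working directory is not touched, this also behaves predictably when other code in the same program relies on relative paths.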

Create zip files using spark(python)

I'm trying to create a zip file from several files. For example, I have 3 files
file1
file2
file3
I want to create a zip archive that contains all of these files.
Is there a way to create a zip archive containing multiple files in Spark (Python)?
First of all, Spark is a framework that can also be used from Python. I don't understand how zip files relate to Spark in this question.
If you want to create zip files in Python, check out the zipfile module. Try it, and if you get stuck, post your code here and we will try to help.
import os
from zipfile import ZipFile
# Directory to zip; 'sampleDir' matches the output shown below
dirName = 'sampleDir'
# Create a ZipFile object
with ZipFile('sampleDir.zip', 'w') as zipObj:
    # Iterate over all the files in the directory
    for folderName, subfolders, filenames in os.walk(dirName):
        for filename in filenames:
            # Create the complete file path of the file in the directory
            filePath = os.path.join(folderName, filename)
            # Add the file to the zip
            zipObj.write(filePath)
Output:
sampleDir/file1.csv 2018-11-30 21:44:46 2829
sampleDir/file2.csv 2018-11-30 21:44:36 3386
sampleDir/file3.csv 2018-11-30 21:44:56 3552
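When the files are given as an explicit list (file1, file2, file3 in the question) rather than as a whole directory, the loop can run over that list directly. This is a sketch with a hypothetical helper name, zip_files, and it assumes the files live on the local filesystem of the machine running the script, not on HDFS or another remote store.

```python
import os
from zipfile import ZipFile, ZIP_DEFLATED


def zip_files(file_paths, zip_path):
    """Write each file in file_paths into zip_path under its base name."""
    with ZipFile(zip_path, "w", compression=ZIP_DEFLATED) as archive:
        for path in file_paths:
            # Store only the file name, not the folders leading to it
            archive.write(path, arcname=os.path.basename(path))
    return zip_path
```

Storing base names keeps the archive flat, so two input files with the same name in different folders would collide; that case is not handled here.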

Does the following program access a file in a subfolder of a folder?

Given this code:
import sys
folder = sys.argv[1]
for i in folder:
    for file in i:
        if file == "test.txt":
            print(file)
would it access a file in a subfolder of the folder? For example, I have 1 main folder with 20 subfolders, and each subfolder has 35 files. I want to pass the folder on the command line and access the second file in the first subfolder.
Neither. This doesn't look at files or folders.
sys.argv[1] is just a string. i is each character of that string in turn. for file in i then iterates over a one-character string, which yields that same character again, so file is never "test.txt".
Maybe you want to glob or walk a directory instead?
Here's a short example using the os.walk method.
import os
import sys
input_path = sys.argv[1]
filters = ["test.txt"]
print(f"Searching input path '{input_path}' for matches in {filters}...")
for root, dirs, files in os.walk(input_path):
    for file in files:
        if file in filters:
            print("Found a match!")
            match_path = os.path.join(root, file)
            print(f"The path is: {match_path}")
If the above file was named file_finder.py, and you wanted to search the directory my_folder, you would call python file_finder.py my_folder from the command line. Note that if my_folder is not in the current working directory, you have to provide a relative or full path to it.
No, this won't work: folder is a string, so you'll be iterating over the characters of the string. You could use the os module instead (e.g., os.listdir()). I don't know exactly what you are passing to the script, but passing an absolute path is probably easiest. Look at the other path-manipulation functions in os.path as well.
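For the specific request in the question (the second file in the first subfolder), a sketch based on os.listdir could look like this. The function name is hypothetical, and "first" and "second" are taken to mean alphabetical order, which is an assumption, because os.listdir itself returns entries in arbitrary order.

```python
import os


def second_file_of_first_subfolder(folder):
    """Return the path of the 2nd file in the 1st subfolder of folder."""
    # os.listdir order is arbitrary, so sort to make "first" well defined
    subfolders = sorted(
        name for name in os.listdir(folder)
        if os.path.isdir(os.path.join(folder, name))
    )
    first = os.path.join(folder, subfolders[0])
    files = sorted(
        name for name in os.listdir(first)
        if os.path.isfile(os.path.join(first, name))
    )
    return os.path.join(first, files[1])
```

In a script this would be called as second_file_of_first_subfolder(sys.argv[1]). It raises IndexError if the folder has no subfolders or the first subfolder holds fewer than two files.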
