python3 ZipFile.extractall extracts an empty file - multithreading

Say I compress a .txt file into a .zip archive with the password 123; the .txt file contains a few characters, like abcd. Then I start a new thread that uses the zipfile library in Python 3 to extract the archive. The core code in the thread is:
import zipfile as zf
# in Python 3, pwd must be a bytes object, e.g. b'123'
zipf = zf.ZipFile(target)
zipf.extractall(path='./', pwd=password)
However, the extracted .txt file is empty: it contains no characters at all. When I repeat the experiment with a .jpeg image, the image is extracted perfectly.
I am quite confused by this. Could anyone propose a reasonable explanation?
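One way to narrow this down, offered as a sketch rather than a diagnosis: pass the password as bytes, and capture any exception raised inside the worker thread instead of leaving it to be printed to stderr, where it is easy to miss. The helper name extract_in_thread and the errors list are made up for this illustration.

```python
import threading
import zipfile


def extract_in_thread(target, dest, password=None):
    """Extract `target` into `dest` on a worker thread; return exceptions raised there."""
    errors = []  # filled in by the worker so the caller can inspect failures

    def worker():
        try:
            with zipfile.ZipFile(target) as zipf:
                # pwd must be bytes (e.g. b"123"), not str
                zipf.extractall(path=dest, pwd=password)
        except Exception as exc:
            errors.append(exc)

    t = threading.Thread(target=worker)
    t.start()
    t.join()  # wait for extraction to finish before looking at the output
    return errors
```

If the returned list is not empty, that exception is the first thing to look at. It is also worth checking that the main program does not read the extracted file, or exit, before the thread has been joined.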

Related

Is there a tool to extract a file from a ZIP archive when that file is not present in central directory but has its own LFH?

I'm looking for a tool that can extract files by searching aggressively through a ZIP archive. The compressed files are preceded with LFHs but no CDHs are present. Unzip outputs an empty folder.
I found one called 'binwalk', but even though it finds the hidden files inside ZIP archives, it does not seem to know how to extract them.
Thank you in advance.
You can try sunzip. It reads the zip file as a stream, and will extract files as it encounters the local headers and compressed data.
Use the -r option to retain the files decompressed in the event of an error. You will be left with a temporary directory starting with _z containing the extracted files, but with temporary, random names.
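To illustrate what a streaming extractor does, here is a minimal Python sketch (not sunzip itself) that scans a byte string for local file header signatures and decompresses each entry it finds, without consulting the central directory. It assumes entries are stored or deflated, unencrypted, not ZIP64, and that stored entries carry their size in the local header. The name recover_entries is invented for this example.

```python
import struct
import zlib

LFH_SIGNATURE = b"PK\x03\x04"
# signature, version, flags, method, time, date, crc32,
# compressed size, uncompressed size, name length, extra length
LFH_STRUCT = struct.Struct("<4s5H3L2H")


def recover_entries(data):
    """Yield (name, content) for every local file header found in `data`."""
    pos = data.find(LFH_SIGNATURE)
    while pos != -1:
        if pos + LFH_STRUCT.size > len(data):
            break  # truncated header at the end of the input
        (_sig, _version, flags, method, _time, _date, _crc,
         comp_size, _size, name_len, extra_len) = LFH_STRUCT.unpack_from(data, pos)
        name_start = pos + LFH_STRUCT.size
        raw_name = data[name_start:name_start + name_len]
        # bit 11 of the flags marks a UTF-8 file name; otherwise CP437
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437", "replace")
        body_start = name_start + name_len + extra_len
        next_search = pos + 4  # default: resume just past this signature
        if method == 8:  # deflate: the stream marks its own end
            inflater = zlib.decompressobj(-15)
            try:
                content = inflater.decompress(data[body_start:])
                next_search = len(data) - len(inflater.unused_data)
                yield name, content
            except zlib.error:
                pass  # not a valid entry, keep scanning
        elif method == 0:  # stored: rely on the size in the header
            yield name, data[body_start:body_start + comp_size]
            next_search = body_start + comp_size
        pos = data.find(LFH_SIGNATURE, next_search)
```

A caller would read the damaged archive in binary mode and write each yielded content under a name of its choosing. The sketch does not verify CRCs, so recovered data should be checked by other means.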

Python: Access a zipped XL file without extracting it

Is there a way I can open and process an Excel file inside a zip file without first extracting it? I am not interested in modifying it.
from zipfile import ZipFile
from openpyxl import load_workbook
procFile ="C:\\Temp2\\XLFile-Demo-PW123.zip"
xl_file = "XLFile-Demo.xlsx"
myzip = ZipFile(procFile)
myzip.setpassword(bytes('123', 'utf-8'))
# line below returns an error
with load_workbook(myzip.open(xl_file)) as wb_obj:
    print(wb_obj.sheetnames)
Most of the examples I have found that do this only open text files directly.
I would like to simulate the behaviour of archiving programs such as WinRar and 7zip.
Thanks
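A possible approach, sketched under assumptions: read the member into memory and wrap it in io.BytesIO, which gives a seekable file-like object without writing anything to disk. The helper member_as_buffer is a made-up name. The standard zipfile module only handles the traditional ZIP encryption, so this would not work for an AES-encrypted archive.

```python
import io
import zipfile


def member_as_buffer(zip_path, member, password=None):
    """Return one archive member as a seekable in-memory file object."""
    with zipfile.ZipFile(zip_path) as archive:
        # pwd must be bytes (e.g. b"123"); nothing is written to disk
        data = archive.read(member, pwd=password)
    return io.BytesIO(data)


# Hypothetical usage with the names from the question (requires openpyxl):
# from openpyxl import load_workbook
# buf = member_as_buffer("C:\\Temp2\\XLFile-Demo-PW123.zip", "XLFile-Demo.xlsx", b"123")
# wb_obj = load_workbook(buf, read_only=True)
# print(wb_obj.sheetnames)
```

The whole member is held in memory, which is usually fine for a spreadsheet but worth keeping in mind for very large files.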

read_csv one file from several files in a gzip?

I have several files in a tar.gz archive. I want to read only one of them into a pandas data frame. Is there any way to do that?
Pandas can read a file inside a gz, but there seems to be no way to tell it to read one specific file when the gz contains several.
Would appreciate any thoughts.
Babak
To read a specific file from a compressed archive, we just need to give its name or position. For example, to read a specific CSV file from a zip archive, we can open that member and read its content:
from zipfile import ZipFile
import pandas as pd
# opening the zip file in READ mode
with ZipFile("results.zip") as z:
    read = pd.read_csv(z.open(z.infolist()[2].filename))
print(read)
Here the contents of results.zip look like this, and I want to read test.csv:
$ data_description.txt sample_submission.csv test.csv train.csv
If you use pardata, you can do this in one line:
import pardata
data = pardata.load_dataset_from_location('path-to-zip.zip')['table/csv']
The returned data variable should be a dictionary of all csv files in the zip archive.
Disclaimer: I'm one of the main co-authors of pardata.
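Since the question is about a tar.gz rather than a zip, here is a minimal sketch using the standard tarfile module to pull out one named member as a file object. The function name open_tar_member and the file names in the usage comment are assumptions for illustration; pandas.read_csv accepts a file-like object, so the result can be passed to it directly.

```python
import io
import tarfile


def open_tar_member(tar_path, member_name):
    """Return one named member of a .tar.gz archive as an in-memory file object."""
    with tarfile.open(tar_path, mode="r:gz") as archive:
        # extractfile raises KeyError if the name is not in the archive
        member = archive.extractfile(member_name)
        if member is None:  # directories and other non-file members
            raise ValueError(f"{member_name!r} is not a regular file")
        return io.BytesIO(member.read())


# Hypothetical usage (requires pandas):
# import pandas as pd
# df = pd.read_csv(open_tar_member("data.tar.gz", "test.csv"))
```

To see which names are available, tarfile.open(...).getnames() lists the members of the archive.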

result shows file full of some symbols rather than text when I loop files

I was looping over some files to copy their content into a new file, but after I ran the code, the new file contained lots of symbols rather than the text content of the files I looped over.
First, when I ran the code without passing the 'encoding' argument to open, it showed an error message like:
UnicodeEncodeError: 'charmap' codec can't encode character '\x8b' in position 12: character maps to <undefined>
I tried various encodings such as utf-8 and latin1, but nothing worked, and when I put errors='ignore' in the open call, the result was as described above.
import os
import glob
folder = os.path.join('R:', os.sep, 'Files')
def notes():
    for doc in glob.glob(folder + r'\*'):
        if doc.endswith('.pdf'):
            with open(doc,'r') as f:
                x = f.readlines()
            with open('doc1.text', 'w+') as f1:
                for line in x:
                    f1.write(line)
notes()
If I understand your example correctly and you're trying to read PDF files, your problem is not one of encoding but of file format. PDF files don't just store your text as encoded characters; they use their own format, which you need to be able to parse in order to extract the text. There are a couple of Python libraries that can read PDF files (such as PyPDF2); please refer to this thread for more information: How to extract text from a PDF file?

Tclsh convert base64 dump into zip file

I have written a Tcl script that fetches the content of a zip file in base64 format through an XML-RPC method. I am dumping that base64 data into a file using the following snippet:
#!/usr/bin/tclsh
...
set mybase64Dump [myXmlRpcCallToReturnThisDump]
set zipFilePtr [open "xyz.zip" "w"]
puts $zipFilePtr $mybase64Dump
close $zipFilePtr
The zip file was generated with a size of X KB, but when I try to open it with 7zip, it says it is not an archive. However, when I pasted the same base64 dump into an online converter, it gave me a proper, extractable zip file.
Am I doing something wrong?
You probably need to configure the output file to be binary, not ascii. The default translation for a newly opened file is "auto", which does system-specific translation of the end-of-line characters, which is not what you want for a .zip file. Configure this using fconfigure on the handle after opening it or by adding the BINARY access flag to the open command.
See http://www.tcl.tk/man/tcl8.5/TclCmd/open.htm and http://www.tcl.tk/man/tcl8.5/TclCmd/fconfigure.htm for details on the syntax.
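For comparison only, here is the same idea sketched in Python rather than Tcl: the payload is decoded from base64 and the resulting bytes are written to a file opened in binary mode, so no end-of-line translation can touch them. This assumes the XML-RPC call hands back the archive still base64-encoded, which the online-converter test in the question suggests. The name write_zip_from_base64 is invented for the sketch.

```python
import base64


def write_zip_from_base64(b64_text, out_path):
    """Decode base64 text and write the raw bytes without newline translation."""
    raw = base64.b64decode(b64_text)
    with open(out_path, "wb") as fh:  # "wb": binary mode, bytes written unchanged
        fh.write(raw)
    return len(raw)  # number of bytes written
```

In the Tcl script, the equivalent would be to decode the dump before writing it, in addition to configuring the channel as binary.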

Resources