Custom filetype in Python 3

How do I start creating my own filetype in Python? I have a design in mind, but how do I pack my data into a file with a specific format?
For example, I would like my file format to be a mix of an archive (like zip, apk, jar and so on, which are basically all archives) with some room for packed files, plus a section of the file containing settings and serialized data that will not be accessed by an archive-manager application.
My requirement is to do all this with the default modules that ship with CPython, without external modules.
I know this can be long to explain and do, but I can't see how to start on this in Python 3.x with CPython.

Try this:
from zipfile import ZipFile
import json
data = json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
with ZipFile('foo.filetype', 'w') as myzip:
    myzip.writestr('digest.json', data)
The file is now a zip archive containing a JSON file (which is easy to read back in many languages) for the data. You can add files to the archive with myzip.write or myzip.writestr. You can read the data back with:
with ZipFile('foo.filetype', 'r') as myzip:
    json_data_read = myzip.read('digest.json')
    newdata = json.loads(json_data_read)
Edit: you can append arbitrary data to the file with:
with open('foo.filetype', 'a') as f:
    f.write(data)
This works for WinRAR, but Python can no longer process the zip file.
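One way to get a private settings section with only the standard library is to write it in front of the zip data instead of after it. The standard library's zipapp module relies on the same property: zipfile can read an archive that has other bytes before it. The sketch below is only an illustration under assumptions: the MAGIC signature, the length-prefixed JSON header and the function names are all made up for this example, not part of any existing format.

```python
import io
import json
import struct
import zipfile

MAGIC = b"MYFT"  # hypothetical 4-byte signature for the custom format


def write_container(fileobj, settings, members):
    """Write MAGIC + length-prefixed JSON settings, then a zip payload."""
    blob = json.dumps(settings).encode("utf-8")
    fileobj.write(MAGIC + struct.pack("<I", len(blob)) + blob)
    # ZipFile starts writing at the current position, right after the header.
    with zipfile.ZipFile(fileobj, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, payload in members.items():
            zf.writestr(name, payload)


def read_container(fileobj):
    """Return (settings, {member name: bytes}) from a container file."""
    if fileobj.read(4) != MAGIC:
        raise ValueError("not a container file")
    (length,) = struct.unpack("<I", fileobj.read(4))
    settings = json.loads(fileobj.read(length).decode("utf-8"))
    # zipfile locates the archive from the end of the file, so the
    # header in front of it does not get in the way.
    with zipfile.ZipFile(fileobj, "r") as zf:
        members = {name: zf.read(name) for name in zf.namelist()}
    return settings, members


buffer = io.BytesIO()
write_container(buffer, {"version": 1}, {"notes.txt": b"hello"})
buffer.seek(0)
print(read_container(buffer))
```

The same two functions work on a real file opened with open(path, "wb") or open(path, "rb"). Whether a given archive manager still opens such a file depends on the program, so test with the tools you care about.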

Use this:
import base64
import gzip
import ast
def save(data):
    # repr() keeps the quotes around strings so literal_eval can parse them back
    data = "[{!r}]".format(data).encode()
    data = base64.b64encode(data)
    return gzip.compress(data)
def load(data):
    data = gzip.decompress(data)
    data = base64.b64decode(data)
    return ast.literal_eval(data.decode())[0]
How to use this with a file:
open(filename, "wb").write(save(data)) # save data
data = load(open(filename, "rb").read()) # load data
The result may look like something an archive program could open, since the outer layer is gzip, but after decompressing it the program only gets base64 text, which still has to be decoded before the data is readable.
You can store any value that ast.literal_eval can parse: strings, bytes, numbers, tuples, lists, dicts, sets, booleans and None.
Example:
open(filename, "wb").write(save({"foo": "bar"})) # dict
open(filename, "wb").write(save("foo bar")) # string
open(filename, "wb").write(save(b"foo bar")) # bytes
# other literal types work the same way
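What can be stored this way is limited by ast.literal_eval, which only parses Python literals. A quick way to check a value before saving it is to see whether its repr() survives a round trip. This is a minimal sketch; round_trips is a helper name invented for this example.

```python
import ast
import datetime


def round_trips(value):
    """Return True if repr(value) can be parsed back by ast.literal_eval."""
    try:
        return ast.literal_eval(repr(value)) == value
    except (ValueError, SyntaxError):
        return False


print(round_trips({"foo": "bar"}))             # dict of strings
print(round_trips(("baz", None, 1.0, 2)))      # tuple of mixed literals
print(round_trips(datetime.date(2020, 1, 1)))  # an object, not a literal
```

The dict and the tuple should pass, while the date object should not, because its repr is a constructor call rather than a literal.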

This may not be exactly what you asked for, but I think it may help you.
I faced a similar problem and ended up creating a zip file and then renaming it to use my custom file extension. The drawback is that it can still be opened with WinRAR.

Related

Converting multiple files in a directory into .txt format, but file names become binary

I am creating plagiarism-detection software, and for that I need to convert .pdf, .docx, etc. files into .txt format. I found a way to convert all the files in one directory and write them to another, BUT the problem is that this method changes the file names into binary values. I need to keep the original file names, which I will need in the next phase.
**Code:**
import os
import uuid
import textract
source_directory = os.path.join(os.getcwd(), "C:/Users/syedm/Desktop/Study/FOUNDplag/Plagiarism-checker-Python/mainfolder")
for filename in os.listdir(source_directory):
    file, extension = os.path.splitext(filename)
    unique_filename = str(uuid.uuid4()) + extension
    os.rename(os.path.join(source_directory, filename), os.path.join(source_directory, unique_filename))
training_directory = os.path.join(os.getcwd(), "C:/Users/syedm/Desktop/Study/FOUNDplag/Plagiarism-checker-Python/trainingdata")
for process_file in os.listdir(source_directory):
    file, extension = os.path.splitext(process_file)
    # We create a new text file name by concatenating the .txt extension to file UUID
    dest_file_path = file + '.txt'
    # extract text from the file
    content = textract.process(os.path.join(source_directory, process_file))
    # We create and open the new and we prepare to write the Binary Data which is represented by the wb - Write Binary
    write_text_file = open(os.path.join(training_directory, dest_file_path), "wb")
    # write the content and close the newly created file
    write_text_file.write(content)
    write_text_file.close()
Remove the line where you rename the files:
os.rename(os.path.join(source_directory, filename), os.path.join(source_directory, unique_filename))
Also, those names are not binary: they are UUIDs generated by uuid.uuid4().
Cheers
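With the rename gone, the conversion loop can build each output name from the original file name. The sketch below is one possible shape, not the asker's code: convert_directory and its extract parameter are names invented here. The extractor is passed in as a callable so the function itself only needs the standard library; in the question's setup you would pass textract.process.

```python
import os


def convert_directory(source_dir, dest_dir, extract):
    """Write <original name>.txt into dest_dir for every file in source_dir.

    `extract` is any callable that takes a file path and returns bytes,
    for example textract.process.
    """
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    for filename in sorted(os.listdir(source_dir)):
        source_path = os.path.join(source_dir, filename)
        if not os.path.isfile(source_path):
            continue  # skip sub-directories
        stem, _extension = os.path.splitext(filename)
        dest_path = os.path.join(dest_dir, stem + ".txt")
        with open(dest_path, "wb") as out:
            out.write(extract(source_path))
        written.append(dest_path)
    return written
```

Note that two inputs with the same base name, such as report.pdf and report.docx, would write to the same report.txt. If that can happen in your data, keep the original extension in the output name.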

How to read file as .dat and write it as a .txt

I'm making a tool that reads data from a .dat file and saves it as a list, then takes that list and writes it to a .txt file (basically a .dat to .txt converter). However, whenever I run it and it creates the file, the result is a .txt file that still contains the .dat data. While troubleshooting, I found that the variable being written to the .txt file is normal, legible text, not weird .dat data...
Here is my code (please don't roast me, I'm very new and I know it has lots of mistakes xD):
#import dependencies
import sys
import pickle
import time
#define constants and get file path
data = []
index = 0
path = input("Absolute file path:\n")
#checks if last character is a space (common in copy+pasting) and removes it if there is a space
if path.endswith(' '):
    path = path[:-1]
#load the .dat file into a list named bits
bits = pickle.load(open(path, "rb"))
with open(path, 'rb') as fp:
    bits = pickle.load(fp)
#convert the data from bits into a new list called data
while index < len(bits):
    print("Decoding....\n")
    storage = bits[index]
    print("Decoding....\n")
    str(storage)
    print("Decoding....\n")
    data.append(storage)
    print("Decoding....\n")
    index += 1
    print("Decoding....\n")
    time.sleep(0.1)
#removes the .dat of the file
split = path[:-4]
#creates the new txt file with _convert.txt added to the end
with open(f"{split}_convert.txt", "wb") as fp:
    pickle.dump(data, fp)
#tells the user where the file has been created
close_file = str(split)+"_convert.txt"
print(f"\nA decoded txt file has been created. Run this command to open it: cd {close_file}\n\n")
Quick review: I'm setting a variable named data which contains all of the data from the .dat file, then I want to save the variable to a .txt file. But whenever I save it to a .txt file it has the contents of the .dat file, even though print(data) shows the data as normal, legible text. Thanks for any help.
with open(f"{split}_convert.txt", "wb") as fp:
    pickle.dump(data, fp)
pickle.dump always writes pickle's binary serialization format, whatever extension the file has, and the wb mode opens the file for binary writing. To write plain text to a .txt file, open it in w mode and call write:
with open(f"{split}_convert.txt", "w") as fp:
    fp.write(data)
Since data is a list, this raises a TypeError, because write only accepts a string. You need to write each item in a loop, converting it to a string and adding a newline:
with open(f"{split}_convert.txt", "w") as fp:
    for line in data:
        fp.write(f"{line}\n")
For more details on file writing, check this article as well: https://www.tutorialspoint.com/python3/python_files_io.htm
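Put together, the whole converter can be quite short. This is a minimal sketch that assumes the .dat file holds a pickled list, as in the question; pickle_to_text is a function name chosen for this example.

```python
import pickle


def pickle_to_text(dat_path, txt_path):
    """Load a pickled list from dat_path and write one item per line as text."""
    # Only unpickle files you trust: pickle can run code while loading.
    with open(dat_path, "rb") as fp:
        items = pickle.load(fp)
    with open(txt_path, "w", encoding="utf-8") as fp:
        for item in items:
            fp.write(f"{item}\n")  # str() of each item, one per line
    return len(items)
```

A call such as pickle_to_text(path, f"{split}_convert.txt") would replace both the decoding loop and the final pickle.dump in the original script.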

Two pieces of Python code create a zip archive, one of the two is broken

I want to create a zip file dynamically and return it in an HTTP response. I use the zipfile library from Python 3.7.
I tried both an io buffer and a tmp dir, and neither of them creates a valid zip archive. The archive can only be opened if it is saved to disk.
import zipfile
import io
#==============================================
# V1
file_like_object = io.BytesIO()
myZipFile = zipfile.ZipFile(file_like_object, "w", compression=zipfile.ZIP_DEFLATED)
myZipFile.writestr(u'test.py', b'test')
tmparchive="zip1.zip"
out = open(tmparchive,'wb') ## Open temporary file as bytes
out.write(file_like_object.getvalue())
out.close()
r = open(tmparchive, 'rb')
print (r.read())
r.close()
#==============================================
# V2
tmparchive2 = 'zip2.zip'
myZipFile2 = zipfile.ZipFile(tmparchive2, "w", compression=zipfile.ZIP_DEFLATED)
myZipFile2.writestr(u'test.py', b'test')
r2 = open(tmparchive2, 'rb')
print (r2.read())
r2.close()
#====================================================
It's preferable to use a context manager, because the zip central directory is only written when the ZipFile is closed, and the with block closes it for you:
import zipfile, io
file_like_object = io.BytesIO()
with zipfile.ZipFile(file_like_object, "w", compression=zipfile.ZIP_DEFLATED) as myZipFile:
    myZipFile.writestr(u'test.txt', b'test')
# file_like_object.getvalue() returns the bytes you send in your HTTP response.
I wrote it to file. It's definitely a valid zip file.
If you want to open the archive, you need to save it to disk. Applications like Explorer and 7-Zip have no way to read the BytesIO object that exists in the python process. They can only open archives saved to disk.
Calling print(r.read()) isn't going to open the archive. It's just going to print the bytes that make up the tiny zip file you just created.
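The difference between the question's code and the context-manager version can be checked without any external program by using zipfile.is_zipfile. The sketch below builds the same tiny archive twice and takes the bytes either before or after closing it; build_zip is a helper invented for this demonstration.

```python
import io
import zipfile


def build_zip(close_first):
    """Return the buffer contents, read before or after closing the archive."""
    buffer = io.BytesIO()
    archive = zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED)
    archive.writestr("test.py", b"test")
    if close_first:
        archive.close()  # writes the central directory into the buffer
    data = buffer.getvalue()
    archive.close()  # no effect if the archive is already closed
    return data


print(zipfile.is_zipfile(io.BytesIO(build_zip(close_first=False))))
print(zipfile.is_zipfile(io.BytesIO(build_zip(close_first=True))))
```

Only the bytes taken after close() should be recognised as a zip file, which is why both V1 and V2 in the question produce archives that other tools reject.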

How to register .gz format in shutil.register_archive_format to use same format in shutil.unpack_archive

I have Example.json.gz and I want to unpack (extract) it in Python using shutil.unpack_archive().
However, it raises shutil.ReadError: Unknown archive format, because the '.gz' format is not in the list of default formats.
So it has to be registered first using shutil.register_archive_format. Can somebody please help me register the format and unpack the file?
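You should define a function that knows how to extract a gz file and then register it with shutil.register_unpack_format (shutil.register_archive_format is for creating archives, not for unpacking them). You could use the gzip library, for instance: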
import os
import re
import gzip
import shutil
def gunzip_something(gzipped_file_name, work_dir):
    """gunzip the given gzipped file"""
    # see warning about filename
    filename = os.path.split(gzipped_file_name)[-1]
    filename = re.sub(r"\.gz$", "", filename, flags=re.IGNORECASE)
    with gzip.open(gzipped_file_name, 'rb') as f_in:  # <<========== extraction happens here
        with open(os.path.join(work_dir, filename), 'wb') as f_out:
            shutil.copyfileobj(f_in, f_out)
try:
    shutil.register_unpack_format('gz', ['.gz', ], gunzip_something)
except shutil.RegistryError:
    # the name or extension is already registered, e.g. when this runs twice
    pass
shutil.unpack_archive("Example.json.gz", os.curdir, 'gz')
WARNING: if you extract into the same directory where the gzipped file resides and the file does not have a .gz extension, the output path is the same as the input path, so opening it for writing truncates the file you are reading from.

Python adding files from URL to file on local file

I am trying to combine two files from the internet and save the output on my computer. I have the code below, but no matter what I try I always get the same result: I get the first URL and nothing more.
To be exact, I am trying to combine videoURL and videoURL1 into one file called output.mp4...
from urllib.request import urlopen
videoURL = 'http://file-examples.com/wp-content/uploads/2017/04/file_example_MP4_480_1_5MG.mp4'
videoURL1 = 'http://techslides.com/demos/sample-videos/small.mp4'
# print(str(embeddHTMLString).find('sources: ['))
local_filename = videoURL.split('/')[-1]
response = urlopen(videoURL)
response1 = urlopen(videoURL1)
with open(local_filename, 'wb') as f:
    while True:
        chunk = response.read(1024)
        if not chunk:
            break
        f.write(chunk)
with open(local_filename, 'ab+') as d:
    while True:
        chunk1 = response1.read(1024)
        if not chunk1:
            break
        d.write(chunk1)
You're doing it wrong. The gist of this answer has already been given by @Tempo810: you need to download the files separately and concatenate them into a single file later.
I am assuming you have both video1.mp4 and video2.mp4 downloaded from your URLs separately. To combine them, you cannot simply append one file to the other, since video files contain a format header and metadata. Combining two media files into one means writing a new format header and new metadata and removing the old ones.
Instead, you can use the moviepy library. Here is a small code sample showing how to use moviepy's concatenate_videoclips() to concatenate the files:
from moviepy.editor import VideoFileClip, concatenate_videoclips
# opening the clips
clip1 = VideoFileClip("video1.mp4")
clip2 = VideoFileClip("video2.mp4")
# let's concatenate them
final_clip = concatenate_videoclips([clip1, clip2])
final_clip.write_videofile("output.mp4")
Your resulting combined file is output.mp4. That's it!
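The step of downloading each file separately, which the answer assumes has already happened, can be done with the standard library alone. This is a minimal sketch; download is a helper name chosen here, and the local names video1.mp4 and video2.mp4 simply follow the answer's assumption.

```python
import shutil
from urllib.request import urlopen


def download(url, local_path, chunk_size=64 * 1024):
    """Stream the resource at `url` into `local_path` and return the path."""
    with urlopen(url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return local_path
```

Calling download(videoURL, "video1.mp4") and download(videoURL1, "video2.mp4") would give the two separate files that the moviepy snippet expects.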

Resources