Should I keep the previous format after converting an image to WebP (JPEG)?

My question is not about how to convert other formats to WebP. What I want to ask is: should I keep the original file in its original format and store the newly generated .webp file somewhere else, or should I delete the old file instead?
My code that converts the format is this:
// Strip the extension from the original file name
[$name, ] = explode('.', $this->getAddress());

// Re-encode the raw file as WebP (quality 90) and save it under the dated folder structure
Image::make($this->getRawFile())
    ->encode('webp', 90)
    ->save(self::PATH . '/' . $this->getFolderName() . '/' . $this->getDateYear() . '/' . $this->getDateMonth() . '/' . $name . '.webp');
return $this;

Related

What is the appropriate way to take in files that have a filename with a timestamp in it?

What is the appropriate way to take in files that have a filename with a timestamp in it and read them properly?
One way I'm thinking of so far is to collect these filenames into a single text file and read them all at once.
For example, filenames such as
1573449076_1570501819_file1.txt
1573449076_1570501819_file2.txt
1573449076_1570501819_file3.txt
Go into a file named filenames.txt
Then something like
import os

with open('/Documents/filenames.txt', 'r') as f:
    for item in f:
        item = item.replace('\n', '')
        if os.path.isfile(item):
            file_stat = os.stat(item)
            print("Fetching {}".format(convert_times(file_stat)))
My question is: how would I go about this so that I can properly read the names from the text file, given that they have timestamps in the actual names? Once I figure that out, I can convert them.
If you just want to get the timestamps from the file names, assuming that they all use the same naming convention, you can do so like this:
import glob
import os
from datetime import datetime

# Grab all .txt files in the specified directory
files = glob.glob("<path_to_dir>/*.txt")

for file in files:
    file = os.path.basename(file)
    # Check that it contains an underscore
    if '_' not in file:
        continue
    # Split the file name using the underscore as the delimiter
    stamps = file.split('_')
    # Convert the epoch to a legible string
    start = datetime.fromtimestamp(int(stamps[0])).strftime("%c")
    end = datetime.fromtimestamp(int(stamps[1])).strftime("%c")
    # Consume the data
    print(f"{start} - {end}")
    ...
You'll want to add some error checking and handling; for instance, if the first or second element of stamps isn't a parsable int, this will fail.
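For instance, a minimal sketch of that error handling, dropped in place of the two datetime lines inside the loop above:
try:
    start = datetime.fromtimestamp(int(stamps[0])).strftime("%c")
    end = datetime.fromtimestamp(int(stamps[1])).strftime("%c")
except (IndexError, ValueError):
    # Skip names with fewer than two fields, or fields that aren't integer epochs
    continue
print(f"{start} - {end}")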

How to convert a written file's string elements to integers in Python

I have created a file and written some random numbers to it using file.write(). Since I have to write these numbers in string format, e.g. file.write(str(secrets.randbelow(100)) + "\n"), I now want to read this file again and save those string numbers into a list of integers. How can I do that?
Assuming you write your numbers with
file.write(str(secrets.randbelow(100)) + "\n")
and there's nothing else in that file, this should work to read it back:
numbers = [int(line) for line in file]
This assumes you've already opened the file in a "read mode" with something like this:
with open('yourfile.txt') as file:
numbers = [int(line) for line in file]
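Putting it together, a small end-to-end sketch (numbers.txt is a placeholder file name; assumes the secrets module from the standard library):
import secrets

# Write ten random numbers, one per line, as strings
with open('numbers.txt', 'w') as file:
    for _ in range(10):
        file.write(str(secrets.randbelow(100)) + "\n")

# Read them back into a list of integers
with open('numbers.txt') as file:
    numbers = [int(line) for line in file]

print(numbers)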

How can I decode a .bin into a .pdf

I extracted an embedded object from an Excel spreadsheet that was a PDF, but the Excel zip file saves embedded objects as binary files.
I am trying to read the binary file and return it to its original format as a PDF. I took some code from another question with a similar issue, but when I try opening the PDF, Adobe gives the error "can't open because file is damaged...not decoded correctly.."
Does anyone know of a way to do this?
import base64
import os

with open('oleObject1.bin', 'rb') as f:
    binaryData = f.read()
print(binaryData)

with open(os.path.expanduser('test1.pdf'), 'wb') as fout:
    fout.write(base64.decodebytes(binaryData))
Link to the object file on github
Thanks Ryan, I was able to see what you were talking about. Here is the solution for future reference.
import os

str1 = b'%PDF-'  # Begin PDF
str2 = b'%%EOF'  # End PDF

with open('oleObject1.bin', 'rb') as f:
    binary_data = f.read()
print(binary_data)

# Convert the immutable bytes to a mutable bytearray
binary_byte_array = bytearray(binary_data)

# Find where the PDF begins
result1 = binary_byte_array.find(str1)
print(result1)

# Remove all characters before the PDF begins
del binary_byte_array[:result1]
print(binary_byte_array)

# Find where the PDF ends (the last %%EOF marker)
result2 = binary_byte_array.rfind(str2)
print(result2)

# Subtract the position of where the PDF ends (plus 5 for the %%EOF characters)
# from the length of the array to get the number of trailing bytes
print(len(binary_byte_array))
to_remove = len(binary_byte_array) - (result2 + 5)
print(to_remove)
# Remove the trailing bytes, if any
if to_remove:
    del binary_byte_array[-to_remove:]
print(binary_byte_array)

with open(os.path.expanduser('test1.pdf'), 'wb') as fout:
    fout.write(binary_byte_array)
The bin file contains a valid PDF; no decoding is required. The bin file does, however, have bytes before and after the PDF that need to be trimmed.
To get the first byte, look for the first occurrence of the string %PDF-.
To get the final byte, look for the last %%EOF.
Note, I do not know what "format" the leading/trailing bytes added by Excel are in. The solution above obviously would not work if either of the ASCII strings above could also appear in the leading/trailing data.
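A compact sketch of that trimming, using the markers described above (assumes oleObject1.bin is in the working directory and contains exactly one embedded PDF):
with open('oleObject1.bin', 'rb') as f:
    data = f.read()

start = data.find(b'%PDF-')                  # first occurrence marks the start of the PDF
end = data.rfind(b'%%EOF') + len(b'%%EOF')   # last %%EOF marks the end

with open('test1.pdf', 'wb') as out:
    out.write(data[start:end])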
You should try using a Python library that lets you write PDF files, such as reportlab or pyPDF.

How to extract a PDF's text using pdfrw

Can pdfrw extract the text out of a document?
I was thinking something along the lines of
from pdfrw import PdfReader

doc = PdfReader(pdf_path)
page_texts = []
for page_nr in doc.numPages:
    page_texts.append(doc.getPage(page_nr).parse_page())  # ..or something
In the docs they explain how to extract the text. However, it's just a byte stream. You could iterate over the pages and decode them individually.
from pdfrw import PdfReader

doc = PdfReader(pdf_path)
for page in doc.pages:
    bytestream = page.Contents.stream  # This is a string with bytes, not a bytestring
    string = ...  # somehow decode bytestream, maybe using zlib.decompress
    # do something with that text
Edit:
It may be worth noting that pdfrw does not yet support text decompression due to its complexity, according to the author.
It depends on which filters are applied to page.Contents.stream. If it is only FlateDecode, you can use pdfrw.uncompress.uncompress([page.Contents]) to decode it.
Note: Pass the whole Contents object to the function inside a list.
Note: This is not the same as pdfrw.PdfReader.uncompress().
Then you have to parse the resulting string to find your text. It will be in blocks of lines between BT (begin text) and ET (end text) markers; the text itself sits inside round brackets on lines ending in either 'Tj' or 'TJ'.
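A rough sketch of that parsing, assuming every page uses only FlateDecode and that the text operands are plain (...) Tj / (...) TJ strings rather than TJ arrays (example.pdf is a placeholder path):
import re
from pdfrw import PdfReader
from pdfrw.uncompress import uncompress

doc = PdfReader('example.pdf')  # placeholder path
for page in doc.pages:
    uncompress([page.Contents])      # decode FlateDecode in place, as described above
    stream = page.Contents.stream
    # Grab each BT ... ET block, then pull the bracketed operands of Tj/TJ operators
    for block in re.findall(r'BT(.*?)ET', stream, flags=re.DOTALL):
        for text in re.findall(r'\((.*?)\)\s*T[jJ]', block):
            print(text)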
Here's an example that may be useful (note that it uses a getPage()/extractText() reader API rather than pdfrw):
for pg_num in range(number_of_pages):
    pg_obj = pdfreader.getPage(pg_num)
    print(pg_num)
    if re.search(r'CSE', pg_obj.extractText()):
        cse_count += 1
        pdfwriter.addPage(pg_obj)
Here, extractText() extracts the page's text; every page containing the keyword CSE is counted and added to the writer.
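For context, a minimal hedged setup for the names the snippet leaves undefined, assuming the legacy PyPDF2 reader/writer API (input.pdf is a placeholder):
import re
from PyPDF2 import PdfFileReader, PdfFileWriter  # legacy PyPDF2 names, assumed here

pdfreader = PdfFileReader('input.pdf')   # placeholder input path
pdfwriter = PdfFileWriter()
number_of_pages = pdfreader.getNumPages()
cse_count = 0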

I want to export the names of multiple MATLAB files into a .csv

I want to use a for loop to export the names of a folder of .mat files into one .csv file, one name per row.
I am trying to use xlswrite, but I don't understand how to access the filenames of the .mat files.
This is what I am working with to write them to the CSV:
xlswrite(fullfile(dest_dir,'result.csv'), FILENAME HERE, ['A' num2str(i)]);
xlswrite creates an Excel workbook.
With the dir command you can get a structure array, one of whose fields is the file name.
Using simple file I/O functions, your requirement can be met.
I think either of these will do what you need:
fid = fopen('result.csv','wt');
files = dir('*.mat');
x = {files(:).name};
csvFun = @(str)sprintf('%s,', str);
xchar = cellfun(csvFun, x, 'UniformOutput', false);
xchar = strcat(xchar{:});
xchar = strcat(xchar(1:end-1), '\n');
fprintf(fid, xchar);
fclose(fid);
Or
If you just need .mat names in a column form:
fid = fopen('result.csv','wt');
files = dir('*.mat');
x = {files(:).name};
[rows, cols] = size(x);
for i = 1:rows
    fprintf(fid, '%s,', x{i,1:end-1});
    fprintf(fid, '%s\n', x{i,end});
end
fclose(fid);
To get your filenames, load the file list using the dir command, read them individually in your for loop, and then write them to the file. (I think I already posted something like this?) Here is the prototype:
files = dir('*.mat'); % obtain all files with the .mat extension
fid = fopen('result.csv','a');
for k = 1:length(files)
    filename = files(k).name; % get the filename
    fprintf(fid, '%s,\n', filename);
end
fclose(fid);
Unfortunately, you cannot just collect the file names as strings in a cell array and write them the way you would write a matrix with, say, csvwrite, so a loop like this is the simplest approach.
