MD5 hash in Python 3: how to generate it - python-3.x

I need advice on how to get the MD5 hash of a zip file. I will be constantly downloading files from an FTP server using ftplib. As you know, ftplib cannot tell whether a file has been modified.
I want to use the MD5 hash of each new file to tell whether it has been modified, by comparing the hashes after downloading the new file to a temp directory. If the hashes are identical, I remove the newly downloaded file. If they differ, the newly downloaded file is kept, the old hash is replaced with the new hash, and the script continues.
Please advise on how to achieve this. Are there any standalone modules for MD5 hashing or similar?
Thanks.

Hope this is helpful:
import hashlib
m = hashlib.md5()
# Open in binary mode: update() requires bytes in Python 3
with open('yourzipfile.zip', 'rb') as f:
    m.update(f.read())
a = m.hexdigest()
print(a)
Output:
sh-4.3$ python3 1.py
f5c6a076bd116efbd4b1ce03c96eaf7a

In Python 3.8+, I use the following to keep the code as quick and compact as possible:
import hashlib
file_hash = hashlib.md5(open(old_file_path,'rb').read()).hexdigest()
print(file_hash)
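
For large zip files, reading the whole file into memory at once can be wasteful. Below is a minimal sketch of the workflow described in the question (hash the new download, compare with the stored hash, delete the download if nothing changed), reading the file in chunks. The function names `md5_of_file` and `keep_if_changed`, the chunk size, and the return convention are my own assumptions for illustration, not part of hashlib or ftplib.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b""
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def keep_if_changed(new_path, old_hash):
    """Compare a freshly downloaded file against a stored hash.

    Returns (hash_to_store, kept). If the hashes match, the new file
    is deleted and the old hash is returned unchanged.
    """
    new_hash = md5_of_file(new_path)
    if new_hash == old_hash:
        os.remove(new_path)  # unchanged: discard the download
        return old_hash, False
    return new_hash, True  # changed: keep the file, store the new hash
```

A call might look like `stored_hash, kept = keep_if_changed(tmp_path, stored_hash)`, where `tmp_path` is wherever your ftplib download was written. MD5 is fine for spotting changes like this, but it should not be relied on for security purposes.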

Related

Not able to find number of pages of PDF using Python 3.X: DependencyError: PyCryptodome is required for AES algorithm

I am performing data validation on files that I download from a URL. One of the validation checks involves checking the number of pages of a PDF. Using the PyPDF2 package and its PdfFileReader class, this worked until I encountered a PDF with 256-bit AES encryption that has a permissions password but no document open password. I have no access to any passwords, since these files come from manufacturer websites, so I decided that for now I can just check whether the PDF is encrypted and skip it if it is. But regardless of whether I try to retrieve the page count or check if the PDF is encrypted, I get this error:
DependencyError: PyCryptodome is required for AES algorithm
This error occurs at line 6, the if statement.
This is despite having pycryptodome installed and the AES module imported. Also, I am using Jupyter Notebook. Here is my code:
! pip install PyPDF2
! pip install pycryptodome
from PyPDF2 import PdfFileReader
from Crypto.Cipher import AES
if PdfFileReader('Media Downloaded Files/spk-10-3144 bro.pdf').isEncrypted:
    print('This file is encrypted.')
else:
    print(PdfFileReader('Media Downloaded Files/spk-10-3144-bro.pdf').numPages)
Solution:
! pip install pikepdf
from pikepdf import Pdf
pdf = Pdf.open('Media Downloaded Files/spk-10-3144-bro.pdf')
len(pdf.pages)
I had a problem involving encryption when using PyPDF3 (a fork of PyPDF2). I solved it by replacing it with pikepdf, which implements more encryption algorithms. Try it out!

How to Generate and Verify Checksum of a file without downloading the file in Node JS

Is there a way to generate a checksum of an mp3 file without downloading it?
I am using the crypto library, but it seems that I have to download the file.
Since a "checksum" is a value derived from a collection of data, you'd need a copy of those data to be able to generate the checksum.

Read file metadata along with data in Python

I am using Python 3.7 along with a library for AES CBC on Windows 10 to encrypt files and it works perfectly. Except, after decrypting them, they lose their metadata like the date they were created. Because I want the user to feel like they never 'deleted' or 'lost' the original file, I need to preserve that data.
This is what I'm doing to read the data:
f = open(file_name, "rb")
data = f.read()
f.close()
After I encrypt the data, I write the encrypted bytes into a new file. When I decrypt this new file, I would like the metadata to be preserved so that the file (like an image) is exactly as it was before encryption. (P.S. Overwriting the original file with the new data might help, but I want to avoid that if possible.)
How do I include metadata of the file in a variable WITH the data that I am encrypting, such that when I decrypt, I get the exact same file with the same date created etc.?
EDIT:
I found a way to get the file creation time, but I STILL NEED to get all the metadata, as the file can be in any format: for example an image or a video, or a doc file with an author field. I also want to store this in the decrypted file, which I don't know how to do.
os.path.getctime(file_name)
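
One possible approach, sketched below under some assumptions: read the timestamps with `os.stat`, prepend them as a small JSON header to the plaintext before encrypting, and restore them with `os.utime` after decrypting. The helper names and the header layout (a 4-byte length followed by JSON) are hypothetical choices of mine. The encryption step itself is left out, since the AES library is not named. Note that `os.utime` only sets access and modification times. Restoring the creation time on Windows is not possible with `os.utime`, and format-specific metadata such as a document's author lives inside the file contents, so it survives a byte-exact round trip anyway.

```python
import json
import os
import struct


def pack_with_metadata(path):
    """Return bytes: 4-byte header length, JSON header, then the file data."""
    st = os.stat(path)
    header = json.dumps({
        "name": os.path.basename(path),
        "atime_ns": st.st_atime_ns,  # last access time
        "mtime_ns": st.st_mtime_ns,  # last modification time
    }).encode("utf-8")
    with open(path, "rb") as f:
        data = f.read()
    # Big-endian unsigned int tells the reader where the header ends
    return struct.pack(">I", len(header)) + header + data


def unpack_with_metadata(blob, out_path):
    """Write the file data to out_path and restore its timestamps."""
    (header_len,) = struct.unpack(">I", blob[:4])
    header = json.loads(blob[4:4 + header_len].decode("utf-8"))
    with open(out_path, "wb") as f:
        f.write(blob[4 + header_len:])
    # Restore access and modification times (not creation time)
    os.utime(out_path, ns=(header["atime_ns"], header["mtime_ns"]))
    return header
```

You would encrypt the result of `pack_with_metadata(file_name)` instead of the raw data, and pass the decrypted bytes to `unpack_with_metadata`.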

unable to make script work with context menu or calculate hash

I want to make a Python script that gets the SHA-256 hash of a file and checks it with VirusTotal. I can't figure out how to make it work with file context menus, and I also can't find the right hash command.
Here's what I have done so far:
import hashlib
import webbrowser
hash = "hash" #calculate the hash
webbrowser.open(f"https://www.virustotal.com/gui/file/{hash}/detection") #check with virustotal
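
A rough sketch of how the missing hash step could be filled in with `hashlib.sha256`. It assumes the context-menu entry passes the selected file's path as the first command-line argument (how that entry is registered depends on your OS and is not shown here). The names `sha256_of_file`, `virustotal_url`, and `main` are made up for this example, and the URL pattern is the one from the question.

```python
import hashlib
import sys
import webbrowser


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def virustotal_url(file_hash):
    """Build the VirusTotal page URL using the pattern from the question."""
    return f"https://www.virustotal.com/gui/file/{file_hash}/detection"


def main(argv):
    """Assumes argv[1] is the file path supplied by the context menu."""
    if len(argv) != 2:
        print("usage: script.py <file>")
        return 1
    webbrowser.open(virustotal_url(sha256_of_file(argv[1])))
    return 0
```

To use it as a script, call `main(sys.argv)` at the bottom of the file. I also renamed the variable, since `hash` shadows a Python built-in.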

reading a big xls file into R

I have an Excel file with ~10000 rows and ~250 columns. Currently I am using RODBC to do the importing:
channel <- odbcConnectExcel(xls.file="s:/demo.xls")
demo <- sqlFetch(channel,"Sheet_1")
odbcClose(channel)
But this is a bit slow (it takes a minute or two to import), and the Excel file is originally encrypted, so I need to remove the password before working on it, which I would prefer not to do. Is there a better way, i.e. one that imports faster and can read encrypted Excel files?
Thanks.
I recommend trying the XLConnect package instead of RODBC.
http://cran.r-project.org/web/packages/XLConnect/index.html
