Although I am new to Python, I want to compare two .ico files.
Can anyone with the expertise tell me how to do that?
Is there a package or library readily available in Python to do so?
Thanks for reading the question. Your suggestions will be appreciated.
What I am currently doing is as follows, but it is not giving me what I expect:
import cv2
import numpy as np
Original = cv2.imread("1.ico")
Edited = cv2.imread("chrome.ico")
diff = cv2.subtract(Original, Edited)
cv2.imwrite("diff.jpg", diff)
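As an aside, since `cv2.imread` returns plain NumPy arrays, an exact pixel-by-pixel check can be written without any OpenCV-specific calls (a sketch; `images_equal` is a made-up helper, not part of either library, demonstrated here on synthetic arrays in place of the loaded icons):

```python
import numpy as np

def images_equal(a, b):
    # Made-up helper: two images match iff shapes and all pixels agree.
    # Cast to int so the subtraction cannot wrap around in uint8.
    return a.shape == b.shape and not np.any(a.astype(int) - b.astype(int))

# Demo with tiny synthetic 2x2 BGR images
a = np.zeros((2, 2, 3), dtype=np.uint8)
b = a.copy()
print(images_equal(a, b))  # True
b[0, 0, 0] = 255
print(images_equal(a, b))  # False
```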
If you just want to check whether the files have changed, you can use Python's hashlib module. The code below computes a file's hash:
import hashlib

h = hashlib.md5()
with open('ico_file.ico', 'rb') as f:
    buffer = f.read()
    h.update(buffer)
# print(buffer)  # not needed; would just dump the raw bytes
print(h.hexdigest())
Use the above code for the two files you want to compare and then match their output hashes. If they are the same, the files are very likely identical; if they differ, the files are definitely different.
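Wrapped up as a function, that comparison might look like this (a sketch; `file_md5` is just an illustrative name, and the demo writes two small files on the fly in place of the real icons):

```python
import hashlib

def file_md5(path):
    # Illustrative helper: MD5 digest of a file's raw bytes.
    h = hashlib.md5()
    with open(path, 'rb') as f:
        h.update(f.read())
    return h.hexdigest()

# Demo: two files with identical bytes hash identically
with open('a.ico', 'wb') as f:
    f.write(b'\x00\x01\x02')
with open('b.ico', 'wb') as f:
    f.write(b'\x00\x01\x02')
print(file_md5('a.ico') == file_md5('b.ico'))  # True
```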
I am trying to figure out how to print in the terminal specifically as follows:
and then write it to a .txt file.
My Attempt:
I have been using tabulate for a while, so my natural go-to attempt was to use that library. Here is my code:
from tabulate import tabulate
import numpy as np

lista = np.zeros(4)
print(tabulate([lista, lista], headers=['n', r'$\phi_{n}$', 'a_{n}', 'e_{n}'],
               numalign="center"))
with open('table.txt', 'w') as f:
    f.write(tabulate([lista, lista], headers=['n', r'$\phi_{n}$', 'a_{n}', 'e_{n}'], numalign="center"))
The above code generates the following result:
Which is nice, but not what I want. I tried deleting the 'headers' parameter, but it still gives me a table containing a header-like structure. Furthermore, the output contains neither the '&' separators I need nor the '\\' line endings. I suspect I might need to do it manually somehow.
Thanks in advance, Lucas
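If the goal is LaTeX-style rows joined by '&' and terminated with '\\', a manual string-joining sketch is enough (header names taken from the question above; `latex_rows` is a made-up helper, and tabulate's own `tablefmt="latex"` option may also be worth a look):

```python
import numpy as np

def latex_rows(rows, headers):
    # Made-up helper: join cells with ' & ' and end each row with ' \\'.
    lines = [' & '.join(headers) + r' \\']
    for row in rows:
        lines.append(' & '.join(str(v) for v in row) + r' \\')
    return '\n'.join(lines)

lista = np.zeros(4)
table = latex_rows([lista, lista], ['n', r'$\phi_{n}$', '$a_{n}$', '$e_{n}$'])
print(table)
with open('table.txt', 'w') as f:
    f.write(table)
```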
I have a .gds file. How can I read that file with pandas and do some analysis? What is the best way to do that in Python? The file can be downloaded here.
You need to change the encoding and read the data using latin1:
import pandas as pd
df = pd.read_csv('example.gds',header=27,encoding='latin1')
will get you the data; the `header=27` argument also skips the first 27 rows of metadata to reach the real pandas meat of the file.
The gdspy package comes in handy for such applications. For example:
import numpy
import gdspy
gdsii = gdspy.GdsLibrary(infile="filename.gds")
main_cell = gdsii.top_level()[0] # Assume a single top level cell
points = main_cell.polygons[0].polygons[0]
for p in points:
    print("Points: {}".format(p))
I have a binary file (.man) containing data that I want to read using Python 3.7. The idea is to convert this binary file into a txt or a csv file.
I know the total number of values in the binary file but not the number of bytes per value.
I have read many posts about binary files, but none was helpful...
Thank you in advance,
Simply put, yes.
with open('file.man', 'rb') as f:
    data = f.readlines()
print(data)  # binary values represented as bytes objects
Opening a file with the parameter 'rb' means it is read as raw bytes with no decoding; each line comes back as a bytes object rather than a str.
The solution I found is this:
import struct

n_values = 1000  # total number of values in the file, known in advance
data = []
with open('binary_file', 'rb') as f:
    while len(data) < n_values:
        data.extend(struct.unpack('f', f.read(4)))
Of course, this works because I know that the encoding is single precision.
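If NumPy is available anyway, the same read can be done in one call with `numpy.fromfile` (a sketch under the same single-precision assumption; the file names are placeholders, and a small demo file stands in for the real .man file):

```python
import struct
import numpy as np

# Write a small demo file of float32 values (stands in for the real .man file)
with open('demo.man', 'wb') as f:
    f.write(struct.pack('3f', 1.0, 2.0, 3.0))

# Read every float32 in the file in one call
data = np.fromfile('demo.man', dtype=np.float32)
print(data)

# On to csv, as the question asks
np.savetxt('demo.csv', data, delimiter=',')
```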
I have a large csv file containing some text in Russian language. When I upload it to Azure ML Studio as dataset, it appears like "����". What I can do to fix that problem?
I tried changing the encoding of my text to UTF-8 and KOI8-R.
There is no code, but I can share part of the dataset for you to try.
One workaround may be zipping your csv and reading it from a Python script module. Your Python script in this case should look something like:
# coding: utf-8
# The script MUST contain a function named azureml_main
# which is the entry point for this module.
# imports up here can be used to
import pandas as pd
# The entry point function can contain up to two input arguments:
# Param<dataframe1>: a pandas.DataFrame
# Param<dataframe2>: a pandas.DataFrame
def azureml_main(dataframe1 = None, dataframe2 = None):
    russian_ds = pd.read_csv('./Script Bundle/your_russian_dataset.csv', encoding = 'utf-8')
    # your logic goes here
    return russian_ds
It worked for me with French datasets, so hopefully you will find it useful.
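Creating the zip that goes into the Script Bundle can itself be scripted with the standard library (a sketch; the file names are assumptions, and a stand-in csv with Cyrillic content is written on the fly):

```python
import zipfile

# Stand-in csv with UTF-8 Cyrillic content
with open('your_russian_dataset.csv', 'w', encoding='utf-8') as f:
    f.write('слово,число\nпривет,1\n')

# Bundle it so the Execute Python Script module can read it from ./Script Bundle/
with zipfile.ZipFile('script_bundle.zip', 'w') as z:
    z.write('your_russian_dataset.csv')

print(zipfile.ZipFile('script_bundle.zip').namelist())
```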
I'm having trouble identifying why the output doesn't match the input of the PDF when pulling the text, and whether there are any tricks I could use to fix this, as it's not an isolated issue.
with open(file, 'rb') as f:
    binary = PyPDF2.pdf.PdfFileReader(f)
    text = binary.getPage(x).extractText()
    print(text)
file: "I/O filters, 292–293"
output: "I/O Þlters, 292Ð293"
The Ð seems to stand in for every '–' and Þ for every 'fi'.
I am using Windows CMD as my output for testing, and I know some characters don't show up right there, but that leaves me baffled for something like the 'fi'.
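As a stopgap, the two mis-mappings observed so far can be reversed with a translation table (a sketch that covers only these characters; other ligatures in the document would need their own entries):

```python
# Map the garbled characters back: Þ -> 'fi' ligature, Ð -> en dash
fix_map = str.maketrans({'Þ': 'fi', 'Ð': '–'})

broken = "I/O Þlters, 292Ð293"
fixed = broken.translate(fix_map)
print(fixed)  # I/O filters, 292–293
```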
The text extraction of PyPDF2 was massively improved in the 2.x versions, and the whole project has since moved to pypdf.
I recommend you give it another try: https://pypdf.readthedocs.io/en/latest/user/extract-text.html
from pypdf import PdfReader
reader = PdfReader("example.pdf")
page = reader.pages[0]
print(page.extract_text())