Computing the MD5 hash of an integer in python 3? - python-3.x

I need to compute a hash of an integer using python 3. Is there a cleaner and more efficient solution than the following?
>>> import hashlib
>>> N = 123
>>> hashlib.md5(str(N).encode("ascii")).hexdigest()
'202cb962ac59075b964b07152d234b70'
It seems weird to have to convert to a unicode string, then encode it to a byte array.

A cryptographic hash such as MD5 can only be applied to bytes. There are more efficient ways of encoding a number as bytes, but you still need to follow the contract.
>>> hashlib.md5(int(-123).to_bytes(8, 'big', signed=True)).hexdigest()
'fc1063e1bcb35f0d52cdceae4626c39b'

Ignacio's answer is perfect, but in case you need the code to work with both python 2 and python 3, and if you have NumPy installed, then this works great:
>>> import numpy as np
>>> import hashlib
>>> N = 123
>>> hashlib.md5(np.int64(N)).hexdigest()
'f18b8dbefe02a0efce281deb55a209cd'
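If NumPy is not an option, a stdlib-only sketch of the same idea is to hash the integer's raw 8-byte buffer directly. This assumes np.int64 hashes its native-endian bytes (little-endian on x86), which is an assumption about the platform, not something the answers above state:

```python
import hashlib

# Hash the integer's raw 8-byte representation. On little-endian hardware
# this should match hashing np.int64(N), since both feed the same 8 bytes
# to md5 (assumption: native byte order is little-endian, as on x86).
N = 123
digest = hashlib.md5(N.to_bytes(8, 'little', signed=True)).hexdigest()
print(digest)
```

Using signed=True keeps the same code path working for negative integers as well.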

Related

How to get the UTF-8 bytes of an emoji?

How would I get the actual character points for an emoji in python3?
>>> '😋'
'😋'
>>> ?
\xF0\x9F\x98\x8B
And then, vice-versa, how would I print the emoji from code points?
>>> print ('\xF0\x9F\x98\x8B')
'😋'
This was the behavior in python2.7 but not in 3 so curious how to do it here.
Python 2.7.18 (default, Nov 13 2021, 06:17:34)
>>> '😋'
'\xf0\x9f\x98\x8b'
>>> print('\xf0\x9f\x98\x8b')
😋
You can use the string/bytes default decode / encode methods:
>>> '😀'.encode('utf-8')
b'\xf0\x9f\x98\x80'
>>> b'\xf0\x9f\x98\x80'.decode('utf-8')
'😀'
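It may also help to distinguish the character's Unicode code point from its UTF-8 byte encoding, since the question mixes the two. A short sketch using the same emoji:

```python
s = '😋'
# The code point is what Unicode assigns to the character;
# the UTF-8 bytes are how that code point is serialized for storage.
print(hex(ord(s)))                           # code point U+1F60B
print(s.encode('utf-8'))                     # UTF-8 bytes b'\xf0\x9f\x98\x8b'
print(chr(0x1F60B))                          # back from code point to character
print(b'\xf0\x9f\x98\x8b'.decode('utf-8'))   # back from bytes to character
```

The `\xf0\x9f\x98\x8b` sequence the question shows is the UTF-8 encoding, not the code point, which is why Python 2 (byte strings by default) printed it that way.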

Different precision with and without brackets when multiplying numbers in Python

Multiplying three numbers with and without brackets in Python gives different answers.
>>> from math import sqrt
>>> a = 100000000  # my input
>>> b = sqrt(3)/4
>>> b
0.4330127018922193
>>> c=b*a*a #print with format specifier
>>> print("{:.2f}".format(c))
4330127018922193.50
>>> d=b*(a*a)
>>> print("{:.2f}".format(d))
4330127018922193.00
Can someone please explain why the precision changes with the brackets?
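A likely explanation (a sketch, not from the original thread): floating-point multiplication is not associative, because the result is rounded after every intermediate product. `b*a*a` evaluates left to right as `(b*a)*a`, so it rounds twice; `b*(a*a)` first computes `a*a = 10**16` exactly in integer arithmetic, so only one floating-point rounding occurs:

```python
from math import sqrt

a = 100000000
b = sqrt(3) / 4

# Left-to-right: (b*a) is rounded to the nearest float, then multiplied by a
# and rounded again -- two rounding steps.
c = (b * a) * a

# a*a is an exact integer (10**16, exactly representable as a float), so
# b*(a*a) involves only one floating-point rounding.
d = b * (a * a)

print("{:.2f}".format(c))
print("{:.2f}".format(d))
print(c == d)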

How to get number of rows in sas dataset in Python

Is there any way to get the number of rows in a SAS dataset ("xxxx.sas7bdat") without actually reading the dataset into Python? The reason for not reading it is that it is huge.
You might be able to do this by counting the lines in the file with the wc -l shell command and reading its output into a variable (note that this counts newline characters, so it works on a text export of the data rather than on the binary .sas7bdat file itself):
>>> import os
>>> stream = os.popen('wc -l example.txt')
>>> output = stream.read()
>>> output
' 3 example.txt\n'
You could further tokenize the output to get number of rows as a variable:
>>> output.split()
['3', 'example.txt']
>>> output.split()[0]
'3'
>>> int(output.split()[0])
3
Hope this helps.
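A pure-Python alternative to shelling out to wc is to stream the file and count lines without loading it into memory. This is a sketch with the same caveat as above: it applies to a text export, not to the binary .sas7bdat format (the filename here is a stand-in created for the example):

```python
import os
import tempfile

# Create a small stand-in text file; a real use case would point at a
# (possibly huge) text export of the dataset.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write("row1\nrow2\nrow3\n")
    path = f.name

# Iterating over the file object reads one line at a time, so memory use
# stays constant regardless of file size.
with open(path) as f:
    n_rows = sum(1 for _ in f)

print(n_rows)  # 3
os.remove(path)
```

Unlike os.popen('wc -l ...'), this needs no shell and no output parsing.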

Turn a numeric text string in powers-of-ten notation (e+) into a float in Python pandas

I've got a dataframe with more than 30000 rows and almost 40 columns exported from a csv file.
Most of it mixes str with int features:
- integers are int
- floats and powers of ten are str
It looks like this:
Id  A               B
1   2.5220019e+008  1742087
2   1.7766118e+008  2223964.5
3   3.3750285e+008  2705867.8
4   97782360        2.5220019e+008
I've tried the following code:
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point, LineString, shape
df = pd.read_csv('mycsvfile.csv').astype(float)
This yields the following error message:
ValueError: could not convert string to float: '-1.#IND'
I guess it has to do with the exponential (e+) notation for powers of ten, which the Python libraries aren't able to convert.
Is there a way to fix it?
From my conversation with QuangHoang I should apply the function:
pd.to_numeric(df['column'], errors='coerce')
Since almost the whole DataFrame consists of str objects, I ran the following line:
df2 = df.apply(lambda x : pd.to_numeric(x, errors='coerce'))
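A self-contained sketch of that fix, using a toy frame that mimics the CSV (including the '-1.#IND' marker from the error message, which is a Windows representation of NaN that astype(float) cannot parse):

```python
import pandas as pd

# Toy frame mimicking the CSV: all values stored as strings, including
# one unparseable '-1.#IND' cell that made astype(float) raise ValueError.
df = pd.DataFrame({
    "A": ["2.5220019e+008", "1.7766118e+008", "-1.#IND"],
    "B": ["1742087", "2223964.5", "2705867.8"],
})

# errors='coerce' converts anything unparseable to NaN instead of raising,
# so the rest of the column still becomes numeric.
df2 = df.apply(lambda col: pd.to_numeric(col, errors="coerce"))
print(df2.dtypes)  # both columns become float64
print(df2)         # the '-1.#IND' cell is now NaN
```

The e+008 strings parse fine; it is the non-numeric markers like '-1.#IND' that need the coercion.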

How to identify the error when scipy.stats.chisquare returns negative values?

I am using Spyder 3.1.3 with Python 3.6.8 on Windows 10, with scipy 1.2.1. I want to get the chi-square value but noticed that negative values are returned. Why is that?
from scipy.stats import chisquare
chisquare(f_obs=[2,1], f_exp=[100000,1])
#Power_divergenceResult(statistic=14096.65412, pvalue=0.0)
but
chisquare(f_obs=[2,1], f_exp=[1000000,1])
#Power_divergenceResult(statistic=-731.379964, pvalue=1.0)
Is there an upper bound for expected values in chisquare? Thanks.
On Windows, the default integer type for numpy arrays is 32 bit. I can reproduce the problem by passing numpy arrays with dtype np.int32 to chisquare:
In [5]: chisquare(f_obs=np.array([2,1], dtype=np.int32), f_exp=np.array([1000000,1], dtype=np.int32))
Out[5]: Power_divergenceResult(statistic=-731.379964, pvalue=1.0)
This is a bug. I created an issue for this on the SciPy github site: https://github.com/scipy/scipy/issues/10159
To work around the problem, convert the input arguments to arrays with data type numpy.int64 or numpy.float64:
In [6]: chisquare(f_obs=np.array([2,1], dtype=np.int64), f_exp=np.array([1000000,1], dtype=np.int64))
Out[6]: Power_divergenceResult(statistic=999996.000004, pvalue=0.0)
In [7]: chisquare(f_obs=np.array([2,1], dtype=np.float64), f_exp=np.array([1000000,1], dtype=np.float64))
Out[7]: Power_divergenceResult(statistic=999996.000004, pvalue=0.0)
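The mechanism behind the bug can be reproduced directly with plain NumPy arithmetic (a sketch of what happens inside chisquare, not its actual source): with int32 inputs, the intermediate (f_obs - f_exp)**2 exceeds the 32-bit range and wraps around to a negative number.

```python
import numpy as np

# Reproduce the overflow that produces the negative statistic.
f_obs = np.array([2, 1], dtype=np.int32)
f_exp = np.array([1000000, 1], dtype=np.int32)

diff = f_obs - f_exp  # [-999998, 0], still fits in 32 bits
# 999998**2 = 999996000004, far above 2**31 - 1, so the int32 product wraps.
sq = diff * diff
print(sq[0])  # -731379964
```

Dividing that wrapped value by the expected count of 1000000 gives exactly the -731.379964 statistic seen in the question, which is why promoting the inputs to int64 or float64 fixes it.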
