Python vs BigQuery FarmHash Sometimes Do Not Equal

In BigQuery, when I run
select farm_fingerprint('6823339101') as f
the result is
-889610237538610470
In Python
#pip install pyfarmhash
import farmhash
print(farmhash.hash64('6823339101'))
results in
17557133836170941146
BigQuery and Python do agree on most inputs, but for specific ones, like '6823339101' above, the same input produces different values.
How can I get BigQuery and Python to agree 100% of the time?
Links to the BigQuery and Python hash documentation:
https://pypi.org/project/pyfarmhash/
https://cloud.google.com/bigquery/docs/reference/standard-sql/hash_functions

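As mentioned in the comments, the function returns an unsigned 64-bit integer, while BigQuery's FARM_FINGERPRINT returns a signed INT64. So we need to convert the value as follows:
import farmhash
import numpy as np
# Reinterpret the unsigned 64-bit result as a signed int64, matching BigQuery
np.uint64(farmhash.fingerprint64(x)).astype('int64')
Relevant issue: https://github.com/lovell/farmhash/issues/26#issuecomment-524581600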
Results:
>>> import farmhash
>>> import numpy as np
>>> np.uint64(farmhash.fingerprint64('6823339101')).astype('int64')
-889610237538610470

Quickly scanning the documentation you linked and the pyfarmhash source:
The docs for farm_fingerprint read:
Computes the fingerprint of the STRING or BYTES input using the Fingerprint64 function
But your Python code uses the hash64 function, which, according to the pyfarmhash source, calls a different function from the farmhash library than fingerprint64 does.
Solution:
Use the same function that farm_fingerprint uses:
import farmhash
print(farmhash.fingerprint64('6823339101'))
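The two answers fit together: fingerprint64 is the function BigQuery documents, and its unsigned result still has to be reinterpreted as a signed INT64 to match. A minimal sketch that does the conversion with plain Python integers instead of NumPy; the helper names to_signed64 and to_unsigned64 are hypothetical, and the farmhash call assumes pyfarmhash is installed.

```python
def to_signed64(u: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a signed two's-complement INT64."""
    if not 0 <= u < 1 << 64:
        raise ValueError("expected an unsigned 64-bit value")
    # Values with the top bit set wrap around to negative numbers
    return u - (1 << 64) if u >= 1 << 63 else u


def to_unsigned64(s: int) -> int:
    """Reverse direction: map a BigQuery INT64 back to the unsigned value Python prints."""
    return s & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    try:
        import farmhash  # assumption: provided by `pip install pyfarmhash`
    except ImportError:
        farmhash = None
    if farmhash is not None:
        print(to_signed64(farmhash.fingerprint64('6823339101')))
    # The unsigned value from the question maps to the BigQuery result
    print(to_signed64(17557133836170941146))  # -889610237538610470
```

Plain integer arithmetic avoids the NumPy dependency and keeps arbitrary-precision Python ints throughout.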

Related

Calculating Log base c in python3

Is there a way to calculate log base c in Python?
c is a variable and may change depending on other values.
I am new to programming and to Python 3.

There is already a built-in function in Python's math module that does this: math.log(x, base).
from math import log

def logOc(c, num):
    # Logarithm of num in base c
    return log(num, c)

print(logOc(3, 3**24))
You can read more about log in the documentation for Python's math module.

Yes, you can simply use the math module's log() function:
import math
c = 100
val = math.log(10000, c)  # the first argument is the number, the second is the base
print(val)
Output:
2.0
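Both answers rely on math.log(x, base), which the Python documentation describes as computing log(x)/log(base), i.e. the change-of-base rule log_c(x) = ln(x) / ln(c). A minimal sketch that spells this out with a hypothetical log_base(c, x) helper that also rejects invalid inputs. Because the result is a float, exact powers may not come out as whole numbers (math.log(1000, 10), for instance, is slightly below 3), so compare with math.isclose, or use math.log2/math.log10 when the base is fixed.

```python
import math


def log_base(c: float, x: float) -> float:
    """Logarithm of x in base c via change of base: ln(x) / ln(c)."""
    if c <= 0 or c == 1:
        raise ValueError("base must be positive and not equal to 1")
    if x <= 0:
        raise ValueError("x must be positive")
    return math.log(x) / math.log(c)


if __name__ == "__main__":
    c = 100
    print(log_base(c, 10000))
    # Agrees with the two-argument form of math.log
    print(math.isclose(log_base(c, 10000), math.log(10000, c)))  # True
```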

Reading MATLAB data file (.mat) in python

I have an array of complex numbers in MATLAB and I want to import that data into Python. I have tried every method I know of, including the SciPy module and h5py. Can anyone suggest another way?
My MATLAB version is R2017b and my Python version is 2.7.

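In MATLAB, save your data with the '-v7' option (scipy.io.loadmat reads MAT-files up to version 7.2; files saved with '-v7.3' are HDF5-based and need an HDF5 library such as h5py instead):
myMat = complex(rand(4), rand(4));
save('myfile', 'myMat', '-v7')
In Python, load the .mat file with scipy.io.loadmat. The result is a Python dict:
>>> import scipy.io
>>> d = scipy.io.loadmat('myfile.mat')
>>> m = d['myMat']
>>> m[0,0]
(0.421761282626275+0.27692298496088996j)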
and so on.
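To check the Python side without MATLAB at hand, a complex array can be round-tripped through a .mat file with SciPy alone. A minimal sketch, assuming SciPy and NumPy are installed; the roundtrip_complex helper and the file name are illustrative. scipy.io.savemat writes a MAT-file that loadmat can read, and complex data comes back as a complex128 array rather than as strings.

```python
import os
import tempfile

import numpy as np
import scipy.io


def roundtrip_complex(arr: np.ndarray) -> np.ndarray:
    """Save a complex array to a temporary .mat file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "myfile.mat")
        scipy.io.savemat(path, {"myMat": arr})
        loaded = scipy.io.loadmat(path)  # returns a dict keyed by variable name
    return loaded["myMat"]


if __name__ == "__main__":
    original = np.random.rand(4, 4) + 1j * np.random.rand(4, 4)
    restored = roundtrip_complex(original)
    print(restored.dtype)  # complex128
    print(np.allclose(original, restored))  # True
```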

Python equivalent of Pulse Integration (pulsint) MATLAB function?

I have been searching online for a Python equivalent of MATLAB's pulsint function, and so far I haven't found anything close to it.
Any pointers would be really helpful!

You can easily create your own pulsint function.
The non-coherent pulsint formula, integrating N pulses x_1, ..., x_N sample by sample:
y = \sqrt{\sum_{n=1}^{N} |x_n|^{2}}
This uses the NumPy library to keep things simple:
import numpy as np
import matplotlib.pyplot as plt

# Non-coherent integration: square root of the summed squared magnitudes across pulses (axis 0)
def pulsint(x):
    return np.sqrt(np.sum(np.abs(x) ** 2, axis=0))

npulse = 10
# Random data: npulse pulses (rows) x 100 samples (columns)
x = np.tile(np.sin(2 * np.pi * np.arange(0, 100) / 100), (npulse, 1)) + 0.1 * np.random.randn(npulse, 100)
# Plot the result
plt.plot(pulsint(x))
plt.ylabel('Magnitude')
plt.show()
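The function in this answer integrates over axis 0, i.e. it expects one pulse per row. Data exported from MATLAB may instead hold one pulse per column; that layout is an assumption about your data, not something the answer states. A minimal sketch of a variant with a hypothetical axis parameter so either layout can be integrated; the small arrays make the result easy to check by hand, since sqrt(3**2 + 4**2) = 5.

```python
import numpy as np


def pulsint_noncoherent(x, axis=0):
    """Non-coherent pulse integration: sqrt(sum(|x|**2)) taken along `axis`."""
    x = np.asarray(x)
    # np.abs handles complex (I/Q) samples as well as real ones
    return np.sqrt(np.sum(np.abs(x) ** 2, axis=axis))


if __name__ == "__main__":
    # One pulse per row: integrate over axis 0
    rows = np.array([[3.0, 0.0], [4.0, 1.0]])
    print(pulsint_noncoherent(rows, axis=0))  # [5. 1.]
    # One pulse per column (assumed MATLAB-style layout): integrate over axis 1
    print(pulsint_noncoherent(rows.T, axis=1))  # [5. 1.]
```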

Program using ast.literal_eval is too slow

I tried to convert strings into lists using the ast.literal_eval function for a column in a CSV file. Each string looks like "['abbb','cddd','cdcdc']". For some reason it is stored as a string instead of a list, so I used ast.literal_eval to turn it into a list with the elements 'abbb', 'cddd' and 'cdcdc'. The problem is that execution is too slow (there are 1,326,101 rows to process). The code I use is:
import pandas as pd
import ast

user_dataset = pd.read_csv('user.csv')
for x in range(len(user_dataset['friends'])):
    if user_dataset['friends'][x] != []:
        # Convert string to list
        user_dataset['friends'][x] = ast.literal_eval(user_dataset['friends'][x])
Thanks a lot!
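One common way to avoid the row-by-row loop is to let pandas apply ast.literal_eval while it reads the file, via the converters argument of pandas.read_csv. A minimal sketch, assuming the column is named friends as in the question; parse_friends and load_users are hypothetical helper names. This mainly removes the overhead of indexing and assigning into the DataFrame one row at a time; literal_eval itself still runs once per row, so the actual speedup depends on the data.

```python
import ast
import io

import pandas as pd


def parse_friends(cell) -> list:
    """Parse one "['a','b']"-style cell; treat blank or missing cells as an empty list."""
    if not isinstance(cell, str) or not cell.strip():
        return []
    return ast.literal_eval(cell)


def load_users(source) -> pd.DataFrame:
    # converters= applies parse_friends to each raw 'friends' field during parsing,
    # so no Python loop with DataFrame indexing is needed afterwards
    return pd.read_csv(source, converters={"friends": parse_friends})


if __name__ == "__main__":
    # Small in-memory stand-in for 'user.csv'
    csv_text = "id,friends\n1,\"['abbb','cddd','cdcdc']\"\n2,[]\n"
    df = load_users(io.StringIO(csv_text))
    print(df["friends"].tolist())  # [['abbb', 'cddd', 'cdcdc'], []]
```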

Python: Alternative to binascii b-function for predefined string

I want to use a hashlib function that requires a bytes representation of the string. In this example from the Python documentation, they solve this by putting a b in front of the string literal:
>>> import hashlib, binascii
>>> dk = hashlib.pbkdf2_hmac('sha256', b'password', b'salt', 100000)
This only seems to work when the string is written directly in the function call. I would like to use predefined strings, but I cannot apply the b prefix to them. I would like to do something like:
>>> import hashlib, binascii
>>> mystr = 'password'
>>> dk = hashlib.pbkdf2_hmac('sha256', b(mystr), b'salt', 100000)
Or
>>> dk = hashlib.pbkdf2_hmac('sha256', b mystr, b'salt', 100000)
Obviously, neither of these worked. I researched and found some more complex solutions, but I wonder whether there is a solution for predefined strings that is as smooth as the one for strings written directly in the function call.
Thanks!

You can use bytes(my_string, encoding) or my_string.encode(encoding) to convert a string to bytes; in Python 3, bytes(my_string) without an encoding raises a TypeError. There is no need for the binascii module.
Documentation can be found here: https://docs.python.org/3/library/functions.html#bytes

So what did the trick was
bytes(mystr, 'utf8')
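For a string that already lives in a variable, str.encode does at runtime what the b prefix does for a literal, and bytes(mystr, 'utf-8') is equivalent. A minimal sketch using the parameters from the documentation example; derive_key is a hypothetical helper, and UTF-8 is assumed as the encoding.

```python
import binascii
import hashlib


def derive_key(password: str, salt: str, iterations: int = 100000) -> bytes:
    # .encode('utf-8') converts a predefined str to bytes, like b'...' does for a literal
    return hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt.encode('utf-8'), iterations)


if __name__ == "__main__":
    mystr = 'password'
    dk = derive_key(mystr, 'salt')
    # Same key as passing the literals b'password' and b'salt'
    print(dk == hashlib.pbkdf2_hmac('sha256', b'password', b'salt', 100000))  # True
    print(binascii.hexlify(dk))
```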
