Python: Alternative to binascii b-function for predefined string - python-3.x

I want to use the hashlib function which requires byte-representation of strings. In this example from the Python documentation they solve this by putting a 'b' in front of the string:
>>> import hashlib, binascii
>>> dk = hashlib.pbkdf2_hmac('sha256', b'password', b'salt', 100000)
This only seems to work when the string is defined in the function call. I would like to use predefined strings but I cannot seem to use the b-function. I would like to do something like:
>>> import hashlib, binascii
>>> mystr = 'password'
>>> dk = hashlib.pbkdf2_hmac('sha256', b(mystr), b'salt', 100000)
Or
>>> dk = hashlib.pbkdf2_hmac('sha256', b mystr, b'salt', 100000)
Obviously, none of these worked. I researched and found some more complex solutions, but I wonder whether there is a solution for predefined strings that is as smooth as for strings defined directly in the function call.
Thanks!

You can use bytes(my_string, encoding) or my_string.encode(encoding) to convert a string to bytes. In Python 3, calling bytes(my_string) on a str without an encoding raises a TypeError. No need for the binascii module.
Documentation can be found here: https://docs.python.org/3/library/functions.html#bytes

So what did the trick was
bytes(mystr, 'utf8')
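As a minimal sketch of how this fits the original example, a predefined string can be encoded once and then passed to hashlib.pbkdf2_hmac. The variable names and the choice of UTF-8 are assumptions for illustration; str.encode produces the same result as the bytes(...) call above.

```python
import hashlib

mystr = 'password'  # predefined text string (illustrative value)
salt = 'salt'       # illustrative value; use a random salt in real code

# str.encode('utf-8') and bytes(s, 'utf-8') produce the same bytes object.
password_bytes = mystr.encode('utf-8')
salt_bytes = bytes(salt, 'utf-8')

dk = hashlib.pbkdf2_hmac('sha256', password_bytes, salt_bytes, 100000)
print(dk.hex())
```

The derived key is identical to the one obtained with the b'password' and b'salt' literals, since the literals and the encoded strings hold the same bytes.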

Related

ipython: print numbers with thousands separator

I am using ipython 5.8.0 on Debian 10.
This is what the output looks like:
In [1]: 50*50
Out[1]: 2500
Is it possible to configure ipython to print all numbers with thousands separators? ie:
In [1]: 50*50
Out[1]: 2'500
In [2]: 5000*5000
Out[2]: 25'000'000
And perhaps, is it possible to make ipython also understand thousands separators on input?
In [1]: 5'000*5'000
Out[1]: 25'000'000
UPDATE
The accepted answer from @Chayim Friedman works for integers, but does not work for floats:
In [1]: 500.1*500
Out[1]: 250050.0
Also, when it works, it uses , as the character for thousand separator:
In [1]: 500*500
Out[1]: 250,000
Can I use ' instead?
Using ' as thousands separator in input is quite problematic because Python uses ' to delimit strings, but you can use _ (PEP 515, Underscores in Numeric Literals):
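A short sketch of what PEP 515 allows (Python 3.6 or later is assumed). The underscores are purely visual: the parser ignores them, and the same character can be used as a grouping option when formatting.

```python
# Underscores in numeric literals are ignored by the parser,
# so these are ordinary ints.
product = 5_000 * 5_000
print(product)          # 25000000

# The '_' format option groups digits with underscores on output.
print(f'{product:_}')   # 25_000_000
```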
Regarding output, this is slightly harder, but can be done using IPython extensions.
Put the following Python code in a new file at ~/.ipython/extensions/thousands_separator.py:
default_int_printer = None

def print_int(number, printer, cycle):
    printer.text(f'{number:,}') # You can use `'{:,}'.format(number)` if you're using a Python version older than 3.6

def load_ipython_extension(ipython):
    global default_int_printer
    default_int_printer = ipython.display_formatter.formatters['text/plain'].for_type(int, print_int)

def unload_ipython_extension(ipython):
    ipython.display_formatter.formatters['text/plain'].for_type(int, default_int_printer)
This code tells IPython to replace the default int formatter with one that prints thousand separators when this extension is loaded, and restore the original when it is unloaded.
Edit: If you want a different separator, for instance ', replace the f'{number:,}' with f'{number:,}'.replace(',', "'").
You can load the extension using the magic command %load_ext thousands_separator and unload it using %unload_ext thousands_separator, but if you want it always, you can place it in the default profile.
Run the following code in the terminal:
ipython3 profile create
It will report that a file ~/.ipython/profile_default/ipython_config.py was created. Enter it, and search for the following string:
## A list of dotted module names of IPython extensions to load.
#c.InteractiveShellApp.extensions = []
Replace it with the following:
# A list of dotted module names of IPython extensions to load.
c.InteractiveShellApp.extensions = [
    'thousands_separator'
]
This tells IPython to load this extension by default.
Done!
Edit: I saw that you want to a) use ' as separator, and b) do the same for floats:
Using different separator is quite easy: just str.replace():
def print_int(number, printer, cycle):
    printer.text(f'{number:,}'.replace(',', "'"))
Doing the same for floats is also easy: just register print_int for floats too. I also suggest renaming it to print_number.
Final code:
default_int_printer = None
default_float_printer = None

def print_number(number, printer, cycle):
    printer.text(f'{number:,}'.replace(',', "'"))

def load_ipython_extension(ipython):
    global default_int_printer
    global default_float_printer
    default_int_printer = ipython.display_formatter.formatters['text/plain'].for_type(int, print_number)
    default_float_printer = ipython.display_formatter.formatters['text/plain'].for_type(float, print_number)

def unload_ipython_extension(ipython):
    ipython.display_formatter.formatters['text/plain'].for_type(int, default_int_printer)
    ipython.display_formatter.formatters['text/plain'].for_type(float, default_float_printer)
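The formatting expression used by the extension can be checked on its own, outside IPython. In this sketch the helper name format_with_apostrophes is hypothetical and not part of the extension; it only wraps the same f-string and str.replace call.

```python
def format_with_apostrophes(number):
    """Format an int or float with ' as the thousands separator."""
    return f'{number:,}'.replace(',', "'")

print(format_with_apostrophes(25000000))   # 25'000'000
print(format_with_apostrophes(250050.0))   # 250'050.0
```

The ',' format option only groups the integer part, so the decimal point and the fractional digits of a float are left untouched.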
After update: you can subclass int:
class Int(int):
    def __repr__(self):
        return "{:,}".format(self)

Int(1000)
# 1,000
I don't believe you can achieve everything you are looking for without rewriting the IPython interpreter (which in effect means changing the Python language specification) so that numbers with embedded ' characters can be entered and the separators ignored. But you can achieve some of it. Subclassing the int class is a good start, but you should also overload the various operators you plan on using. For example:
class Integer(int):
    def __str__(self):
        # if you want ' as the separator:
        return "{:,}".format(self).replace(",", "'")

    def __add__(self, x):
        return Integer(int(self) + x)

    def __mul__(self, x):
        return Integer(int(self) * x)

    """
    define other operations: __sub__, __floordiv__, __mod__, __neg__, etc.
    """

i1 = Integer(2)
i2 = Integer(1000) + 4.5 * i1
print(i2)
print(i1 * (3 + i2))
Prints:
1'009
2'024
Update
It seems that for Python 3.7 you need to override the __str__ method rather than the __repr__ method. This works for Python 3.8 and should work for later releases as well.
Update 2
import locale
locale.setlocale(locale.LC_ALL, '') # needed: the default "C" locale does no grouping; this assumes your locale groups with ','
print(locale.format_string("%d", 1255000, grouping=True).replace(",", "'"))
Prints:
1'255'000
An alternative, if you have the Babel package from the PyPI repository:
from babel import Locale
from babel.numbers import format_number
locale = Locale('en', 'US')
locale.number_symbols['group'] = "'"
print(format_number(1255000, locale='en_US'))
Prints:
1'255'000
Alternatively, you can custom-tailor a locale just for this purpose and leave the standard en_US locale unmodified. This also shows how you can parse input values:
from copy import deepcopy
from babel import Locale
from babel.numbers import format_number, parse_number
my_locale = deepcopy(Locale('en', 'US'))
my_locale.number_symbols['group'] = "'"
print(format_number(1255000, locale=my_locale))
print(parse_number("1'125'000", locale=my_locale))
Prints:
1'255'000
1125000
Based on PEP-0378, you can use the following code:
a = 1200
b = 500
c = 10
#res = a
#res = a*b
res = a*b*c
dig = len(str(res)) # to figure out how many digits are required in result
print(format(res, "{},d".format(dig)))
It will produce:
6,000,000

Python vs BigQuery FarmHash Sometimes Do Not Equal

In BigQuery, when I run
select farm_fingerprint('6823339101') as f
it results in
-889610237538610470
In Python
#pip install pyfarmhash
import farmhash
print(farmhash.hash64('6823339101'))
results in
17557133836170941146
BigQuery & Python do agree on most inputs, but there are specific ones like the one above where there is a mismatch for the same input
'6823339101'
How can I get bigquery & python to agree 100% of the time?
Links to bigquery & python hash documentation
https://pypi.org/project/pyfarmhash/
https://cloud.google.com/bigquery/docs/reference/standard-sql/hash_functions
As mentioned in the comments, the Python function returns an unsigned 64-bit integer, whereas BigQuery's farm_fingerprint returns a signed INT64.
So we need to convert the value as follows:
import numpy as np
np.uint64(farmhash.fingerprint64(x)).astype('int64')
Relevant issues: https://github.com/lovell/farmhash/issues/26#issuecomment-524581600
Results:
>>> import farmhash
>>> import numpy as np
>>> np.uint64(farmhash.fingerprint64('6823339101')).astype('int64')
-889610237538610470
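If NumPy is not available, the same reinterpretation can be sketched in plain Python with two's-complement arithmetic. The helper name to_signed_int64 is hypothetical; the input below is the unsigned value reported in the question, so the sketch does not need pyfarmhash to run.

```python
def to_signed_int64(value):
    """Reinterpret an unsigned 64-bit integer as a signed two's-complement one."""
    if not 0 <= value < 2**64:
        raise ValueError('value does not fit in 64 bits')
    # Values with the top bit set represent negative numbers.
    return value - 2**64 if value >= 2**63 else value

print(to_signed_int64(17557133836170941146))  # -889610237538610470
```

Values below 2**63 are returned unchanged, which is consistent with BigQuery and Python already agreeing on most inputs.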
Quickly scanning over the documentation that you have linked and pyfarmhash source:
The docs for farm_fingerprint read:
Computes the fingerprint of the STRING or BYTES input using the Fingerprint64 function
But in your Python code you are using the hash64 function, which according to the pyfarmhash source code uses a different function from the farmhash library than fingerprint64 does.
Solution:
Use the same function farm_fingerprint is using
import farmhash
print(farmhash.fingerprint64('6823339101'))

Calculating Log base c in python3

Is there a way to calculate log base c in python?
c is a variable and may change due to some dependencies.
I am new to programming and also python3.
There is already a built in function in the math module in python that does this.
from math import log
def logOc(c, num):
    return log(num, c)

print(logOc(3, 3**24))
You can read more about log and the python math module here
Yes, you can simply use math's function log():
import math
c = 100
val = math.log(10000,c) #Where the first value is the number and the second the base.
print(val)
Output:
2.0
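According to the math module documentation, the two-argument form is calculated as log(x)/log(base). A small sketch of that change-of-base formula with a variable base follows; the function name log_base is hypothetical. Because the result is a floating-point quotient, it is safer to compare it with math.isclose than with ==.

```python
import math

def log_base(x, base):
    """Logarithm of x to an arbitrary base via the change-of-base formula."""
    return math.log(x) / math.log(base)

c = 3  # the base can be any positive number other than 1
value = log_base(3**24, c)
print(value)
print(math.isclose(value, 24.0))  # True
```

For the common bases 2 and 10, math.log2 and math.log10 exist and are usually more accurate than the general form.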

convert ascii to decimal python

I have a pandas DataFrame where one of the columns is filled with ASCII characters encoded as hex. I'm trying to convert this column from ASCII to decimal; for example, the following string should be converted from hex:
313533313936393239382e323834303638
to:
1531969298.284068
I've tried
outf['data'] = outf['data'].map(lambda x: bytearray.fromhex(x).decode())
as well as
outf['data'] = outf['data'].map(lambda x: ascii.fromhex(x).decode())
The error that I get is as follows:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 8: invalid start byte
I'm not sure where the problem manifests itself. I have a txt file and a sample of its contents are as follows:
data time
313533313936393239382e32373737343800 1531969299.283273000
313533313936393239382e32373838303400 1531969299.284253000
313533313936393239382e32373938353700 1531969299.285359000
When the data was normal integers, the lambda worked fine, where I used:
outf['data'] = outf['data'].astype(str)
outf['data'] = outf['data'].str[:-2:]
outf['data'] = outf['data'].map(lambda x: bytearray.fromhex(x).decode())
outf['data'] = outf['data'].astype(int)
However, it now reports that something is wrong with the encoding.
I've looked on Stack Overflow, but I wasn't able to find anything similar that worked.
If someone were to help me out, I would very much appreciate it.
You can use map with a lambda function calling bytearray.fromhex, followed by astype to convert to float.
outf['data'].map(lambda x: bytearray.fromhex(x).decode()).astype(float)
A lambda like this would do the trick:
>>> f = lambda v: float((bytearray.fromhex(v)))
>>> f('313533313936393239382e323834303638')
1531969298.284068
Note that using astype, as suggested by Scott Boston in the comment section, may be better performance-wise.
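The sample rows in the question end in 00, i.e. a trailing NUL byte, which float() cannot parse. The following sketch strips it before decoding; the helper name hex_ascii_to_float is hypothetical, and it assumes every value is ASCII digits plus optional trailing NUL padding.

```python
def hex_ascii_to_float(hex_string):
    """Decode hex-encoded ASCII digits, dropping trailing NUL padding."""
    raw = bytes.fromhex(hex_string)
    text = raw.rstrip(b'\x00').decode('ascii')
    return float(text)

print(hex_ascii_to_float('313533313936393239382e32373737343800'))  # 1531969298.277748
print(hex_ascii_to_float('313533313936393239382e323834303638'))    # 1531969298.284068
```

With pandas, this function could presumably be applied as outf['data'].map(hex_ascii_to_float). A row that still raises UnicodeDecodeError contains bytes outside ASCII and would need to be inspected separately.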

Program using ast.literal_eval is too slow

I tried to transform strings into lists using the ast.literal_eval function for a column in a CSV file. Each string looks something like "['abbb','cddd','cdcdc']". For some reason this is read as a string instead of a list, so I tried to use ast.literal_eval to transform it into a list with the components 'abbb', 'cddd' and 'cdcdc'. The problem is that the execution is too slow (there are 1326101 rows to process). The code I use is this:
import pandas as pd
import ast
import sys
user_dataset = pd.read_csv('user.csv')
for x in range(len(user_dataset['friends'])):
    if user_dataset['friends'][x]!=[]:
        """Convert string to list"""
        user_dataset['friends'][x] = ast.literal_eval(user_dataset['friends'][x])
Thanks a lot!
