ipython: print numbers with thousands separator

I am using ipython 5.8.0 on Debian 10.
This is what the output looks like:
In [1]: 50*50
Out[1]: 2500
Is it possible to configure ipython to print all numbers with thousands separators? For example:
In [1]: 50*50
Out[1]: 2'500
In [2]: 5000*5000
Out[2]: 25'000'000
And perhaps, is it possible to make ipython also understand thousands separators on input?
In [1]: 5'000*5'000
Out[1]: 25'000'000
UPDATE
The accepted answer from @Chayim Friedman works for integers, but does not work for floats:
In [1]: 500.1*500
Out[1]: 250050.0
Also, when it works, it uses , as the character for thousand separator:
In [1]: 500*500
Out[1]: 250,000
Can I use ' instead?

Using ' as a thousands separator in input is quite problematic because Python uses ' to delimit strings, but you can use _ instead (PEP 515, Underscores in Numeric Literals, available since Python 3.6), for example 5_000 * 5_000.
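As a small illustration of the underscore syntax (a sketch, assuming Python 3.6 or later), underscores are ignored in numeric literals, int() accepts them when parsing strings, and the _ format option produces them on output:

```python
# Underscores in numeric literals (PEP 515) are purely visual.
product = 5_000 * 5_000
print(product)                  # the plain repr has no separators
print(format(product, '_'))     # '_' is also accepted as a grouping option
print(int('25_000_000'))        # int() parses underscores in strings too
```

This covers input only; the default output in IPython is still printed without separators.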
Regarding output, this is slightly harder, but can be done using IPython extensions.
Put the following Python code in a new file at ~/.ipython/extensions/thousands_separator.py:
default_int_printer = None
def print_int(number, printer, cycle):
    printer.text(f'{number:,}') # You can use `'{:,}'.format(number)` if you're using a Python version older than 3.6
def load_ipython_extension(ipython):
    global default_int_printer
    default_int_printer = ipython.display_formatter.formatters['text/plain'].for_type(int, print_int)
def unload_ipython_extension(ipython):
    ipython.display_formatter.formatters['text/plain'].for_type(int, default_int_printer)
This code tells IPython to replace the default int formatter with one that prints thousand separators when this extension is loaded, and restore the original when it is unloaded.
Edit: If you want a different separator, for instance ', replace the f'{number:,}' with f'{number:,}'.replace(',', "'").
You can load the extension using the magic command %load_ext thousands_separator and unload it using %unload_ext thousands_separator, but if you want it always, you can place it in the default profile.
Run the following code in the terminal:
ipython3 profile create
It will report that a file ~/.ipython/profile_default/ipython_config.py was created. Open it and search for the following string:
## A list of dotted module names of IPython extensions to load.
#c.InteractiveShellApp.extensions = []
Replace it with the following:
# A list of dotted module names of IPython extensions to load.
c.InteractiveShellApp.extensions = [
    'thousands_separator'
]
This tells IPython to load this extension by default.
Done!
Edit: I saw that you want to a) use ' as separator, and b) do the same for floats:
Using a different separator is quite easy: just use str.replace():
def print_int(number, printer, cycle):
    printer.text(f'{number:,}'.replace(',', "'"))
Doing the same for floats is also easy: just set up print_int so it prints floats too. I also suggest changing the name to print_number.
Final code:
default_int_printer = None
default_float_printer = None
def print_number(number, printer, cycle):
    printer.text(f'{number:,}'.replace(',', "'"))
def load_ipython_extension(ipython):
    global default_int_printer
    global default_float_printer
    default_int_printer = ipython.display_formatter.formatters['text/plain'].for_type(int, print_number)
    default_float_printer = ipython.display_formatter.formatters['text/plain'].for_type(float, print_number)
def unload_ipython_extension(ipython):
    ipython.display_formatter.formatters['text/plain'].for_type(int, default_int_printer)
    ipython.display_formatter.formatters['text/plain'].for_type(float, default_float_printer)
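The formatting expression itself can be checked outside IPython. The following is a minimal sketch; the helper name group_digits and its sep parameter are illustrative and not part of the extension above:

```python
def group_digits(number, sep="'"):
    """Format an int or float with `sep` between groups of three digits."""
    # The ',' format option groups the integer part; the comma is swapped afterwards.
    return f"{number:,}".replace(",", sep)

print(group_digits(50 * 50))
print(group_digits(5000 * 5000))
print(group_digits(250050.0))
```

Because the replacement only touches commas, the decimal point of a float is left as it is.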

After update: you can subclass int:
class Int(int):
    def __repr__(self):
        return "{:,}".format(self)
Int(1000)
# 1,000

I don't believe you can achieve all that you are looking for without rewriting the IPython interpreter, which means changing the Python language specification, to be able to input numbers with embedded ' characters and have them ignored. But you can achieve some of it. Subclassing the int class is a good start, but you should also overload the various operators you plan on using, because otherwise the result of an operation is a plain int again. For example:
class Integer(int):
    def __str__(self):
        # if you want ' as the separator:
        return "{:,}".format(self).replace(",", "'")
    def __add__(self, x):
        return Integer(int(self) + x)
    def __mul__(self, x):
        return Integer(int(self) * x)
    """
    define other operations: __sub__, __floordiv__, __mod__, __neg__, etc.
    """
i1 = Integer(2)
i2 = Integer(1000) + 4.5 * i1
print(i2)
print(i1 * (3 + i2))
Prints:
1'009
2'024
Update
It seems that for Python 3.7 you need to override the __str__ method rather than the __repr__ method. This works for Python 3.8 and should work for later releases as well.
Update 2
import locale
#locale.setlocale(locale.LC_ALL, '') # probably not required
print(locale.format_string("%d", 1255000, grouping=True).replace(",", "'"))
Prints:
1'255'000
An alternative, if you have the Babel package from PyPI:
from babel import Locale
from babel.numbers import format_number
locale = Locale('en', 'US')
locale.number_symbols['group'] = "'"
print(format_number(1255000, locale='en_US'))
Prints:
1'255'000
Or, if you prefer, you can custom-tailor a locale just for this purpose and leave the standard en_US locale unmodified. This also shows how you can parse input values:
from copy import deepcopy
from babel import Locale
from babel.numbers import format_number, parse_number
my_locale = deepcopy(Locale('en', 'US'))
my_locale.number_symbols['group'] = "'"
print(format_number(1255000, locale=my_locale))
print(parse_number("1'125'000", locale=my_locale))
Prints:
1'255'000
1125000

Based on PEP-0378, you can use the following code:
a = 1200
b = 500
c = 10
#res = a
#res = a*b
res = a*b*c
dig = len(str(res)) # to figure out how many digits are required in result
print(format(res, "{},d".format(dig)))
It will produce:
6,000,000
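The width computed in dig is not needed for the grouping itself; the "," option alone is enough. A short sketch using the same numbers:

```python
a, b, c = 1200, 500, 10
res = a * b * c

print(format(res, ",d"))     # grouping only, no explicit width
print(f"{res:,d}")           # the same format spec inside an f-string
print(format(res, "12,d"))   # a width only pads when it exceeds the text length
```

A width smaller than the formatted text, such as the digit count used above, has no visible effect.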

Related

how to turn a list item (string type) into a module function in python 3.10

I had a weird idea to grab a list item and call it as module function, here is what I'm trying to do:
if you used dir() on random module it would return a list of its attributes, and I want to grab a specific item which is, in this case, randint, then invoke it as a working function using a,b as arguments and become in this form randint(a, b)
that's what I tried:
import random # needed for dir(random) below
from random import randint #to avoid the dot notation and make it simple
a = 10
b = 100
var = dir(random)
print(var)
# here is the result
#randint index is 55
print(var[55])
>>> randint
I couldn't find out the type of the function randint() so I can convert the list item to it and try the following:
something(var[55] + "(a, b)")
is there a way I can achieve what I'm trying to do?
You can use the exec function, which can execute any string as Python code.
I updated your code; the version below should work:
import random # needed for dir(random) below
from random import randint #to avoid the dot notation and make it simple
a = 10
b = 100
var = dir(random)
print(var)
function_string = "random_number = " + var[55] + f"({a},{b})"
#note that index number can be changed based on python version
print(function_string)
exec(function_string)
print(random_number)
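A sketch of an alternative that avoids building and executing source text: getattr looks up an attribute by its name, so the string taken from dir() can be turned into the function object directly. This assumes the name is looked up on the random module, and it does not rely on the index 55, which may differ between Python versions.

```python
import random

a = 10
b = 100
name = "randint"                  # e.g. an item taken from dir(random)
assert name in dir(random)        # the name must exist on the module
func = getattr(random, name)      # the function object itself, not a string
random_number = func(a, b)
print(random_number)
```

Since no string is executed, the arguments are passed as ordinary Python values rather than being formatted into code.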

Trouble using regex patterns any Python to find content in a document

I have a list of regex expressions that I want to find in certain docs.
x = ['\bin\sapp\sdata\b','\bin\sapp\sdata\b','\benough\sdata\b']
The patterns repeat themselves so I converted them to a set (see the first and second values in the list)
y = set(x)
When I try to find them in a specific doc it doesn't find them since it doesn't take them as a repr version:
import pandas as pd
import re
results = list()
doc = 'they wanted in app data and we did not provide it'
for value in y:
results.append(re.findall(pattern = value,string=doc))
results = list(filter(None, results))
results
How do I overcome this?
Thanks
The problem was with Python 3.7. The error I got was "bad escape \l at position 0". Once I switched from the re module to the regex module, it worked perfectly fine, even with the "messed up" coding.
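With the patterns shown in the question, another likely cause is that they are written as ordinary string literals, in which \b is a backspace character rather than a word boundary. A minimal sketch using raw strings with the standard re module:

```python
import re

# Raw strings keep the backslashes, so \b reaches the regex engine as a word boundary.
patterns = {r'\bin\sapp\sdata\b', r'\benough\sdata\b'}
doc = 'they wanted in app data and we did not provide it'

results = []
for pattern in patterns:
    matches = re.findall(pattern, doc)
    if matches:  # skip patterns that found nothing
        results.append(matches)
print(results)
```

Writing the patterns as a set literal also removes the duplicates without a separate conversion step.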

How to load a big numpy array from a text file with Dask?

I have a text file containing data that I read to memory with numpy.genfromtxt enforcing a custom numpy.dtype. Although the text file is smaller than the available RAM, I often get a MemoryError (which I don't understand, but it is not the point of this question). When looking for ways to resolve it, I came across dask. In the API I found methods for data loading but none of them reads from text files, not to mention my need to support converters in genfromtxt().
I see there is a dask.dataframe.read_csv() method, but in my case I don't use pandas, but rather plain numpy.array with custom dtypes and colum names, as mentioned above. The text file I have is not CSV anyway (thus the abovementioned use of converters in genfromtxt()).
Any ideas on how could I handle it will be appreciated.
You should use the function dask.bytes.read_bytes with delimiter="\n" to read your file(s) and split them into blocks at line-endings. You get back a set of dask.delayed objects, which you can pass to numpy. Unfortunately, numpy wants a file-like, so you must pack the bytes again:
import io
import numpy
import dask
import dask.bytes
import dask.array as da
_, blocks = dask.bytes.read_bytes(files, delimiter=b"\n")
@dask.delayed
def parse(block):
    return numpy.genfromtxt(io.BytesIO(block), ...)
arrays = [da.from_delayed(parse(block), ...) for block in blocks]
arr = da.stack(arrays)  # or da.concatenate(arrays)
SO editors rejected my edit to @mdurant's answer, so I am posting the working code (based on that answer) here:
import numpy
import dask
import dask.array as da
import io
fname = 'data.txt'
# data.txt is:
# 1 2
# 3 4
# 5 6
files = [fname]
_, blocks = dask.bytes.read_bytes(files, delimiter=b"\n")
my_type = numpy.dtype([
    ('f1', numpy.float64),
    ('f2', numpy.float64)
])
native_type = float  # numpy.float was only an alias for the builtin float and was removed in NumPy 1.24
used_type = numpy.float64
# If the below line is uncommented, then creating the dask array will work, but it won't
# be possible to perform any operations on it
# used_type = my_type
# Debug
# print('blocks', blocks)
# print('type(blocks)', type(blocks))
# print('blocks[0]', blocks[0])
# print('type(blocks[0])', type(blocks[0]))
@dask.delayed
def parse(block):
    r = numpy.genfromtxt(io.BytesIO(block[0]))
    print('parse() about to return:\n', r, '\n')
    return r
# Below I added shape, which seems compulsory, the reason for which I don't understand
arrays = [da.from_delayed(value=parse(block), shape=(3, ), dtype=used_type) for block in blocks]
# da.concat did not work for me
arr = da.stack(arrays)
# The below will not work if used_type is set to my_type
arr += 1
# The below would not work either; it raises NotImplementedError
# arr['f1'] += 1
arr_np = arr.compute()
print('numpy array incremented by one: \n', arr_np)

Python translate a column with multiple languages to english

I have a dataset where there are multiple comments columns having multiple languages and I want to translate these columns into English and create new columns with all the english translations.
Accountability_COMMENT is the column which has multiple comments in different language in every row. I want to create a new column and translate all such comments to English.
I have tried the following code :
from googletrans import Translator
from textblob import TextBlob
translator = Translator()
data_merge['Accountability_COMMENT'] = data_merge['Accountability_COMMENT'].apply(lambda x: TextBlob(x).translate(to='en'))
The error that I am getting is :
TypeError: The text argument passed to __init__(text) must be a string, not class 'float'
My column has object dtype, which is correct.
You most probably have some comments that consist only of a float (i.e. a decimal number). Even if the column dtype is object according to pandas, those values are still passed to TextBlob as floats. This leads to the error:
TypeError: The text argument passed to __init__(text) must be a string, not <class 'float'>
One solution is to make sure that the input x of TextBlob(x) is a string. You could do this by modifying the apply row like:
data_merge['Accountability_COMMENT'] = data_merge['Accountability_COMMENT'].apply(lambda x: TextBlob(str(x)).translate(to='en'))
Unfortunately, this will probably also raise an error like:
raise NotTranslated('Translation API returned the input string unchanged.')
textblob.exceptions.NotTranslated: Translation API returned the input string unchanged.
This is due to the fact that when translating a number, the translation and the original text will be exactly the same, and apparently TextBlob doesn't like that.
What you can do to avoid this is to catch that exception NotTranslated and just return the untranslated TextBlob, like this:
from textblob import TextBlob
from textblob.exceptions import NotTranslated
def translate_comment(x):
    try:
        # Try to translate the string version of the comment
        return TextBlob(str(x)).translate(to='en')
    except NotTranslated:
        # If the output is the same as the input just return the TextBlob version of the input
        return TextBlob(str(x))
data_merge['Accountability_COMMENT'] = data_merge['Accountability_COMMENT'].apply(translate_comment)
EDIT:
If you get the HTTP error Too Many Requests it's probably because you are being kicked out by the Google Translate API. Instead of using apply, you can make your translation "extra-slow" by using a for loop with some sleep in-between cycles. In this case you should import another package (time) and substitute the last line:
from time import sleep
from textblob import TextBlob
from textblob.exceptions import NotTranslated
def translate_comment(x):
    try:
        # Try to translate the string version of the comment
        return TextBlob(str(x)).translate(to='en')
    except NotTranslated:
        # If the output is the same as the input just return the TextBlob version of the input
        return TextBlob(str(x))
# Column position, so each cell is set with a single .iloc call rather than chained indexing
col = data_merge.columns.get_loc('Accountability_COMMENT')
for i in range(len(data_merge)):
    # Translate one comment at a time
    data_merge.iloc[i, col] = translate_comment(data_merge.iloc[i, col])
    # Sleep for a quarter of a second
    sleep(0.25)
You can then experiment with different values for the sleep function. Of course the longer the sleep the slower the translation! N.B. sleep argument is in seconds.
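In pandas, an object column that has missing values stores them as float NaN, which may be where the float in the error comes from. A possible variation, sketched here with a stand-in translate callable so that it runs without network access (safe_translate and fake_translate are hypothetical names, not part of TextBlob), leaves non-string cells untouched instead of sending them for translation:

```python
def safe_translate(value, translate):
    """Apply `translate` to non-empty strings; return anything else unchanged."""
    if not isinstance(value, str) or not value.strip():
        return value  # e.g. float('nan') for a missing comment
    return translate(value)

def fake_translate(text):
    # Stand-in for a real translation call, for illustration only.
    return text.upper()

comments = ["bonjour", float("nan"), ""]
translated = [safe_translate(c, fake_translate) for c in comments]
print(translated)
```

Skipping such cells also reduces the number of requests sent, which may help with the rate limit mentioned above.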

Calculating Log base c in python3

Is there a way to calculate log base c in python?
c is a variable and may change due to some dependencies.
I am new to programming and also python3.
There is already a built in function in the math module in python that does this.
from math import log
def logOc(c, num):
    return log(num, c)
print(logOc(3, 3**24))
You can read more about log and the python math module here
Yes, you can simply use math's function log():
import math
c = 100
val = math.log(10000,c) #Where the first value is the number and the second the base.
print(val)
Example:
print(val)
2.0
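The two-argument form of math.log is documented as being calculated as log(x)/log(base), which is the change-of-base identity log_c(x) = ln(x) / ln(c). A small sketch (log_base is an illustrative name) that writes the identity out and also rejects bases for which the logarithm is undefined:

```python
import math

def log_base(x, c):
    """Return the logarithm of x in base c via the change-of-base identity."""
    if c <= 0 or c == 1:
        raise ValueError("base must be positive and different from 1")
    return math.log(x) / math.log(c)

print(log_base(8, 2))
print(math.isclose(log_base(10000, 100), math.log(10000, 100)))
```

For the common bases 2 and 10, math.log2 and math.log10 are usually more accurate than the general form.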
