How would I get the actual character points (the UTF-8 byte values) for an emoji in Python 3?
>>> '😋'
'😋'
>>> ?
\xF0\x9F\x98\x8B
And then, vice-versa, how would I print the emoji from code points?
>>> print ('\xF0\x9F\x98\x8B')
'😋'
This was the behavior in Python 2.7 but not in 3, so I'm curious how to do it here.
Python 2.7.18 (default, Nov 13 2021, 06:17:34)
>>> '😋'
'\xf0\x9f\x98\x8b'
>>> print('\xf0\x9f\x98\x8b')
😋
You can use the default str.encode / bytes.decode methods:
>>> '😀'.encode('utf-8')
b'\xf0\x9f\x98\x80'
>>> b'\xf0\x9f\x98\x80'.decode('utf-8')
'😀'
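Side note (not part of the original answer): if by "character points" you mean the Unicode code point rather than the UTF-8 bytes, ord gives it directly, and a \U escape goes the other way:
>>> hex(ord('😋'))
'0x1f60b'
>>> '\U0001f60b'
'😋'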
Multiplying three numbers with and without parentheses in Python gives different answers.
>>> from math import sqrt
>>> a = 100000000  # my input
>>> b = sqrt(3)/4
>>> b
0.4330127018922193
>>> c = b*a*a
>>> print("{:.2f}".format(c))  # print with format specifier
4330127018922193.50
>>> d = b*(a*a)
>>> print("{:.2f}".format(d))
4330127018922193.00
Can someone please explain to me why the result changes with parentheses?
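In short: Python evaluates b*a*a from left to right as (b*a)*a, while b*(a*a) squares a first, and floating-point multiplication is not associative, so the two orderings round differently in the last bits. A quick check:
>>> from math import sqrt
>>> a = 100000000
>>> b = sqrt(3)/4
>>> (b*a)*a == b*(a*a)
False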
Is there any way to get the number of rows in a SAS dataset ("xxxx.sas7bdat") without actually reading the dataset into Python? The reason for not reading the SAS dataset is that it is huge.
You might be able to do this by simply counting the lines in the file with the wc -l shell command and reading the result into a variable (note that wc -l counts newline characters, so this suits a line-based text export; the binary .sas7bdat format itself is not stored one row per line):
>>> import os
>>> stream = os.popen('wc -l example.txt')
>>> output = stream.read()
>>> output
' 3 example.txt\n'
You could further tokenize the output to get the number of rows as a variable:
>>> output.split()
['3', 'example.txt']
>>> output.split()[0]
'3'
>>> int(output.split()[0])
3
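If you need the row count of the binary .sas7bdat file itself, one alternative (assuming you can install the pyreadstat package) is to read only the file's metadata, since the header stores the row count:
>>> import pyreadstat
>>> # metadataonly=True reads just the header, not the data rows
>>> df, meta = pyreadstat.read_sas7bdat('xxxx.sas7bdat', metadataonly=True)
>>> meta.number_rows  # row count taken from the header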
Hope this helps.
I've got a dataframe with more than 30000 rows and almost 40 columns, exported from a csv file.
Most of it mixes str with int features:
- integers are int
- floats and powers of ten are str
It looks like this:
Id  A               B
1   2.5220019e+008  1742087
2   1.7766118e+008  2223964.5
3   3.3750285e+008  2705867.8
4   97782360        2.5220019e+008
I've tried the following code:
import pandas as pd
import numpy as np
import geopandas as gpd
from shapely.geometry import Point, LineString, shape
df = pd.read_csv('mycsvfile.csv').astype(float)
Which yields this error message:
ValueError: could not convert string to float: '-1.#IND'
I guess it has to do with the exponential notation for powers of ten (e+), which the Python libraries aren't able to convert.
Is there a way to fix it?
From my conversation with QuangHoang, I should apply the function:
pd.to_numeric(df['column'], errors='coerce')
Since almost the whole DataFrame consists of str objects, I ran the following line:
df2 = df.apply(lambda x : pd.to_numeric(x, errors='coerce'))
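This works because errors='coerce' turns any string that cannot be parsed into NaN. Note that the offending token is '-1.#IND' (an old Windows spelling of an indeterminate NaN), not the e+ exponent notation, which parses fine. A minimal sketch with made-up values:
>>> import pandas as pd
>>> s = pd.Series(['2.5220019e+008', '97782360', '-1.#IND'])
>>> pd.to_numeric(s, errors='coerce').tolist()
[252200190.0, 97782360.0, nan]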
I am using Spyder 3.1.3 with Python 3.6.8 under Windows 10, with scipy 1.2.1. I want to get the chisquare value but noticed that negative values are returned. Why is that?
from scipy.stats import chisquare
chisquare(f_obs=[2,1], f_exp=[100000,1])
#Power_divergenceResult(statistic=14096.65412, pvalue=0.0)
but
chisquare(f_obs=[2,1], f_exp=[1000000,1])
#Power_divergenceResult(statistic=-731.379964, pvalue=1.0)
Is there an upper bound for expected values in chisquare? Thanks.
On Windows, the default integer type for numpy arrays is 32-bit. I can reproduce the problem by passing numpy arrays with dtype np.int32 to chisquare:
In [5]: chisquare(f_obs=np.array([2,1], dtype=np.int32), f_exp=np.array([1000000,1], dtype=np.int32))
Out[5]: Power_divergenceResult(statistic=-731.379964, pvalue=1.0)
This is a bug: the squared difference (2 - 1000000)**2 = 999996000004 overflows a 32-bit integer and wraps around to a negative value. I created an issue for this on the SciPy github site: https://github.com/scipy/scipy/issues/10159
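To see the wraparound directly (a quick demonstration, not part of the original answer):
>>> import numpy as np
>>> d = np.array([2, 1], dtype=np.int32) - np.array([1000000, 1], dtype=np.int32)
>>> d * d  # (2 - 1000000)**2 = 999996000004 wraps modulo 2**32
array([-731379964,          0], dtype=int32)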
To work around the problem, convert the input arguments to arrays with data type numpy.int64 or numpy.float64:
In [6]: chisquare(f_obs=np.array([2,1], dtype=np.int64), f_exp=np.array([1000000,1], dtype=np.int64))
Out[6]: Power_divergenceResult(statistic=999996.000004, pvalue=0.0)
In [7]: chisquare(f_obs=np.array([2,1], dtype=np.float64), f_exp=np.array([1000000,1], dtype=np.float64))
Out[7]: Power_divergenceResult(statistic=999996.000004, pvalue=0.0)