Numpy number array to strings with trailing zeros removed

Question: is my method of converting a numpy array of numbers to a numpy array of strings, with a specific number of decimal places AND trailing zeros removed, the 'best' way?
import numpy as np
x = np.array([1.12345, 1.2, 0.1, 0, 1.230000])
print(np.core.defchararray.rstrip(np.char.mod('%.4f', x), '0'))
outputs:
['1.1235' '1.2' '0.1' '0.' '1.23']
which is the desired result. (I am OK with the rounding issue)
Both of the functions 'rstrip' and 'mod' are numpy functions, which means this is fast, but is there a way to accomplish this with ONE built-in numpy function? (i.e. does 'mod' have an option that I couldn't find?) It would save the overhead of returning copies twice, which for very large arrays is slow-ish.
thanks!

Thanks to Warren Weckesser for providing valuable comments. Credit to him.
I converted my code to use:
formatter = '%d'
if num_type == 'float':
    formatter = '%%.%df' % decimals
np.savetxt(out, arr, fmt=formatter)
where out is a file handle to which I had already written my headers. Alternatively, I could also use the header= argument of np.savetxt. I have no clue how I didn't see those options in the documentation.
For a 1300 by 1300 numpy array, creating the output line by line as I did before (using np.core.defchararray.rstrip(np.char.mod('%.4f', x), '0')) took ~1.7 seconds, while np.savetxt takes 0.48 seconds.
So np.savetxt is a cleaner, more readable, and faster solution.
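For reference, a minimal sketch of the header approach (the file name and header text here are placeholders, not from the original post):

import numpy as np

arr = np.random.rand(1300, 1300)
with open('data.txt', 'w') as out:
    # header= writes a commented first line; comments='' drops the leading '# '
    np.savetxt(out, arr, fmt='%.4f', header='my header line', comments='')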
Note:
I did try:
np.savetxt(out, arr, fmt='%.4g')
in an effort to not have a switch based on number type but it did not work as I had hoped.

Related

np.where issue above a certain value

I'm facing two issues in the following snippet using np.where (looking for indexes where A[:, 0] is identical to B):
a Numpy error when n is above a certain value (see the warning below)
the code is quite slow
DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
So I'm wondering what I'm missing and/or misunderstanding, how to fix it, and how to speed up the code. This is a basic example I've made to mimic my code, but in fact I'm dealing with arrays having (dozens of) millions of rows.
Thanks for your support
Paul
import numpy as np
import time
n=100_000 # with n=10_000 it works but is quite slow
m=2_000_000
# matrix A
# A = np.random.random((n, 4))
A = np.arange(1, 4*n+1, dtype=np.uint64).reshape((n, 4), order='F')
# matrix B
B = np.random.randint(1, m+1, size=(m), dtype=np.uint64)
B = np.unique(B)  # duplicate values are generally generated, so the real size remains lower than m
# use of np.where
t0=time.time()
ind=np.where(A[:, 0].reshape(-1, 1) == B)
# ind2=np.where(B == A[:, 0].reshape(-1, 1))
t1=time.time()
print(f"duration={t1-t0}")
In your current implementation, A[:, 0] is just
np.arange(1, n + 1, dtype=np.uint64)
And if you are interested only in row indexes where A[:, 0] is in B, then you can get them like this:
row_indices = np.where(np.isin(A[:, 0], B))[0]
If you then want to select the rows of A with these indices, you don't even have to convert the boolean mask to index locations. You can just select the rows with the boolean mask: A[np.isin(A[:, 0], B)]
There are better ways to select random elements from an array. For example, you could use numpy.random.Generator.choice with replace=False. Also, Numpy: Get random set of rows from 2D array.
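A minimal sketch of that idea (sizes borrowed from the question; this would replace the randint + np.unique combination):

import numpy as np

rng = np.random.default_rng()
m = 2_000_000
# 100_000 distinct values drawn from 1..m, so no deduplication pass is needed
B = rng.choice(np.arange(1, m + 1, dtype=np.uint64), size=100_000, replace=False)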
I feel there is almost certainly a better way to do the whole thing that you are trying to do with these index locations.
I recommend you study the Numpy User Guide and the Pandas User Guide to see what cool things are available there.
Honestly, with your current implementation you don't even need the first column of A at all, because each row index is simply the corresponding element of A[:, 0] minus one. Here:
row_indices = B[B <= n] - 1
row_indices.sort()
print(row_indices)
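As a quick sanity check (a sketch assuming the arrays from the question), both approaches give the same indices:

# row indices from isin match the direct computation
assert np.array_equal(np.where(np.isin(A[:, 0], B))[0], B[B <= n] - 1)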

Multiplying functions together in Python

I am using Python at the moment and I have a function that I need to multiply against itself for different constants.
e.g. if I have f(x, y) = x^2*y + a, where 'a' is some constant (possibly a list of constants).
If 'a' is a list (of unknown size, as it depends on the input), then if we say a = [1, 3, 7] the operation I want to do is
(x^2*y + 1)*(x^2*y + 3)*(x^2*y + 7)
but generalised to n elements in 'a'. Is there an easy way to do this in Python? I can't think of a decent way around this problem. If the size of 'a' were fixed it would seem much easier, as I could just define the functions separately and then multiply them together in a new function, but since the size isn't fixed that approach isn't suitable. Does anyone have a way around this?
You can use numpy for this; it's fairly easy to get into.
import numpy as np
a = np.array([1, 3, 7])
x = 10
y = 0.2
print(x**2 * y + a)
print(np.prod(x**2 * y + a))
Output:
[21. 23. 27.]
13041.0
I haven't really got much for it, to be honest; I'm still trying to figure out how to get the functions to not overlap.
from scipy import integrate

a = [1, 3, 7]
# f, g and h get redefined on every pass through the loop,
# so after the loop only the last pair of constants is used
for i in range(0, len(a) - 1):
    def f(x, y):
        return (x**2) * y + a[i]
    def g(x, y):
        return (x**2) * y + a[i + 1]
    def h(x, y):
        return f(x, y) * g(x, y)

f1 = lambda y, x: h(x, y)
integrate.dblquad(f1, 0, 2, lambda x: 1, lambda x: 10)
I should have clarified that the end result can't be in floats as it needs to be integrated afterwards using dblquad.
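Given that constraint, a sketch of one way to generalise to any number of constants (not from the thread; make_product is an illustrative name) is to build the product inside a closure and integrate the result numerically:

from scipy import integrate

a = [1, 3, 7]  # any number of constants

def make_product(constants):
    # returns a single function of (x, y) computing the product of (x^2*y + c)
    def product(x, y):
        result = 1.0
        for c in constants:
            result *= x**2 * y + c
        return result
    return product

h = make_product(a)
# dblquad expects func(y, x), so swap the argument order
value, err = integrate.dblquad(lambda y, x: h(x, y), 0, 2, lambda x: 1, lambda x: 10)
print(value)

Since product() returns plain Python floats, it plugs straight into dblquad.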

Numpy matrix exponentiation gives negative result

I am trying to compute a Fibonacci number with high n by using matrix exponentiation, but it gives me a negative result. I have tried to change the integer types but failed.
import numpy as np
def matrixmul(a, n):
    a = np.array([[1, 1], [1, 0]])
    return ((np.array([1, 1], [1, 0], dtype=np.object))**n)
matrixMul(a, 100)
my output is
array([[-1869596475, -980107325],
[ -980107325, -889489150]])
But it is wrong; there should not be any negative numbers.
It's hard to answer your question, because your code has some bugs:
You haven't initialized a.
The name of the defined function differs from the one you call (Python is case-sensitive).
Inside the function you are not using a at all (it does not appear in the return).
And most importantly, you cannot use **n to get a matrix power: on a numpy array, ** is applied elementwise. The right function here is np.linalg.matrix_power (scipy's expm() computes the matrix exponential e^A, which is not what you want for Fibonacci).
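For completeness, a minimal sketch of the intended computation (not the original poster's code): np.linalg.matrix_power with dtype=object uses Python's arbitrary-precision integers, so nothing overflows:

import numpy as np

def fib(n):
    # [[1, 1], [1, 0]]**n == [[F(n+1), F(n)], [F(n), F(n-1)]]
    m = np.array([[1, 1], [1, 0]], dtype=object)
    return np.linalg.matrix_power(m, n)[0, 1]

print(fib(100))  # 354224848179261915075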

Warning of divide by zero encountered in log2 even after filtering out negative values

We get the warning "divide by zero encountered in log2"
if a value is non-positive. I am facing the warning even when I exclude the non-positive values using an np.where statement.
a = pd.Series([1,0,5,6,8])
np.where(a<=0, 1, np.log2(a))
When np.where evaluates its arguments, np.log2(a) is computed over all of a, including the non-positive values, before the selection happens; the warning comes from that evaluation. This doesn't actually affect your output, because np.where then discards those positions. To suppress the warning, replace the non-positive values with 1 first, and then apply log2.
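For example, a minimal sketch of that replace-first approach (the printed values are what I would expect for this series):

import numpy as np
import pandas as pd

a = pd.Series([1, 0, 5, 6, 8])
# replace non-positive entries with 1 before taking the log, so log2 never sees them
safe = np.where(a <= 0, 1, a)
print(np.log2(safe))  # [0. 0. 2.32192809 2.5849625 3.]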
Try the following:
import pandas as pd
import numpy as np
a = pd.Series([1, 0, 5, 6, 8])
for digit in a:
    np.where(digit <= 0, 1, np.log2(digit))
It seems the issue is that if you just do
np.where(a <= 0, 1, np.log2(a))
log2 is applied to the entire series at once rather than element by element.

TypeError: append() missing 1 required positional argument: 'values'

I have a variable 'x_data' sized 360x190, and I am trying to select particular rows of data.
x_data_train = []
x_data_train = np.append([x_data_train,
x_data[0:20,:],
x_data[46:65,:],
x_data[91:110,:],
x_data[136:155,:],
x_data[181:200,:],
x_data[226:245,:],
x_data[271:290,:],
x_data[316:335,:]],axis = 0)
I get the following error :
TypeError: append() missing 1 required positional argument: 'values'
Where did I go wrong?
If instead I use
x_data_train = []
x_data_train.append(x_data[0:20,:])
x_data_train.append(x_data[46:65,:])
x_data_train.append(x_data[91:110,:])
x_data_train.append(x_data[136:155,:])
x_data_train.append(x_data[181:200,:])
x_data_train.append(x_data[226:245,:])
x_data_train.append(x_data[271:290,:])
x_data_train.append(x_data[316:335,:])
the output has size 8 (a list of 8 arrays) instead of 160 rows.
Update:
In MATLAB, I load the text file and x_data is a variable with 360 rows and 190 columns.
If I want to select rows 1 to 20, 46 to 65, ..., I simply write
x_data_train = xdata([1:20,46:65,91:110,136:155,181:200,226:245,271:290,316:335], :);
and the resulting x_data_train is exactly the array I want.
How can I do that in Python? The append approach results in an array of 8 subsets of 20x190 each, but I want one array of 160x190.
Short version: the most idiomatic and fastest way to do what you want in Python is this (assuming x_data is a numpy array):
x_data_train = np.vstack([x_data[0:20,:],
x_data[46:65,:],
x_data[91:110,:],
x_data[136:155,:],
x_data[181:200,:],
x_data[226:245,:],
x_data[271:290,:],
x_data[316:335,:]])
This can be shortened (but made very slightly slower) by doing:
x_data[np.r_[0:20,46:65,91:110,136:155,181:200,226:245,271:290,316:335], :]
For your case where you have a lot of indices I think it helps readability, but in cases where there are fewer indices I would use the first approach.
Long version:
There are several different issues at play here.
First, in Python, [] makes a list, not an array like in MATLAB. Lists are more like 1D cell arrays. They can hold any data type, including other lists, but they cannot have multiple dimensions. The equivalent of MATLAB matrices in Python are numpy arrays, which are created using np.array.
Second, [x, y] in Python always creates a list where the first element is x and the second element is y. In MATLAB, [x, y] can do one of several completely different things depending on what x and y are. In your case, you want to concatenate, and in Python you need to concatenate explicitly. For two lists, there are several ways to do that. The simplest is using x += y, which modifies x in-place by putting the contents of y at the end. You can combine multiple lists by doing something like x += y + z + w. If you want to keep x unchanged, you can assign to a new variable using something like z = x + y. Finally, you can use x.extend(y), which is roughly equivalent to x += y but works with some data types besides lists.
For numpy arrays, you need a slightly different approach. While Python lists can be modified in-place, strictly speaking neither MATLAB matrices nor numpy arrays can be. MATLAB pretends to allow this, but it is really creating a new matrix behind the scenes (which is why you get a warning if you try to resize a matrix in a loop). Numpy requires you to be more explicit about creating a new array. The simplest approach is to use np.hstack, which concatenates a sequence of arrays horizontally (or np.vstack and np.dstack for vertical and depth concatenation, respectively). So you could do z = np.hstack([v, w, x, y]). There is also an np.append function, but it almost never works well in practice, so don't use it (it requires careful memory management that is more trouble than it is worth). A short sketch of the collect-then-concatenate pattern follows at the end of this list.
Third, what append does is create one new element at the end of the target list and put whatever it was called with into that element. So if you do x.append([1,2,3]), it adds one new element to the end of list x containing the list [1,2,3]. In MATLAB terms it would be more like x = [x, {{1,2,3}}], where x is a cell array.
Fourth, Python makes heavy use of "methods", which are basically functions attached to data (it is a bit more complicated than that in practice, but those complexities aren't really relevant here). Recent versions of MATLAB have added them as well, but they aren't as integrated into MATLAB data types as they are in Python. So where in MATLAB you would usually use sum(x), for numpy arrays you would use x.sum(). In this case, if you were appending to a list (which you aren't), you wouldn't use np.append(x, y); you would use x.append(y).
Finally, in MATLAB x:y creates a matrix of values from x to y. In Python, however, it creates a "slice", which doesn't actually contain all the values and so can be processed much more quickly by lists and numpy arrays. However, you can't really work with multiple slices like you do in your example (nor does it make sense to, because slices in numpy don't make copies like they do in MATLAB, while using multiple indexes does make a copy). You can get something close to what you have in MATLAB using np.r_, which creates a numpy array based on indexes and slices. So to reproduce your example in numpy, where x_data is a numpy array, you can do x_data[np.r_[0:20,46:65,91:110,136:155,181:200,226:245,271:290,316:335], :] (remember that Python indexing starts at 0 and slices exclude the stop value, so MATLAB's 1:20 becomes 0:20).
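Tying this together, a sketch of the collect-then-concatenate pattern (the 360x190 shape is taken from the question; note the resulting row count, since Python slices exclude the stop index):

import numpy as np

x_data = np.arange(360 * 190).reshape(360, 190)
chunks = []
for start, stop in [(0, 20), (46, 65), (91, 110), (136, 155),
                    (181, 200), (226, 245), (271, 290), (316, 335)]:
    chunks.append(x_data[start:stop, :])
# one concatenation at the end instead of growing an array inside the loop
x_data_train = np.vstack(chunks)
print(x_data_train.shape)  # (153, 190): a range like 46:65 gives 19 rows here, not 20 as in MATLAB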
More information on x_data and np might be needed to solve this, but:
First: you're creating x_data_train as an empty list and then immediately overwriting it with the result of np.append.
Second: your indexes on x_data are strange.
Third: you're passing a single list (plus axis) to np.append(), which expects separate arr and values arguments; that is exactly the missing 'values' error.
I'm pretty sure revisiting your indexes on x_data is where to start, but even then you'll hit the arr/values problem with np.append.
And I'm also sure you want
x_data_train.append(object)
not
x_data_train = np.append(object)
and you may actually want
x_data_train.extend([objects])
More on append vs extend here: append vs. extend
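A tiny illustration of the difference (a sketch):
x = [1, 2]
x.append([3, 4])  # x is now [1, 2, [3, 4]]: one new element holding a list
y = [1, 2]
y.extend([3, 4])  # y is now [1, 2, 3, 4]: four elements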
