I would like to save several NumPy arrays to one file with np.savetxt('xx.csv'). Each array has shape (m, n), and I would like to add a column of index labels before the first column and a row of header labels above the data. Both lists consist of strings.
When I call savetxt several times, each call overwrites the previous contents.
I considered merging all the arrays into one huge matrix, but I doubt that is the best approach.
Thanks for your help.
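One possible approach, sketched under assumptions: open the file once and pass the open handle to np.savetxt, so that each call writes after the previous one instead of overwriting it. The helper name write_labeled_array and the sample labels are made up for illustration, and an io.StringIO buffer stands in for the real file.

```python
import io
import numpy as np

def write_labeled_array(handle, array, index, header):
    """Write one (m, n) array with string row labels and a header row.

    Hypothetical helper: the name and argument order are illustrative.
    """
    # Convert values to strings and put the row labels in front as a new column.
    table = np.column_stack([index, np.asarray(array).astype(str)])
    # The leading comma leaves the cell above the index column empty;
    # comments="" stops savetxt from prefixing the header with "# ".
    np.savetxt(handle, table, fmt="%s", delimiter=",",
               header="," + ",".join(header), comments="")

# StringIO stands in for: open("xx.csv", "w")
buffer = io.StringIO()
write_labeled_array(buffer, np.arange(6).reshape(2, 3), ["r1", "r2"], ["a", "b", "c"])
write_labeled_array(buffer, np.ones((1, 2)), ["s1"], ["x", "y"])
csv_text = buffer.getvalue()
```

With a real file, wrapping the calls in a single "with open('xx.csv', 'w') as handle:" block keeps every array in the same file without building one large matrix first.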
I have an Excel sheet with some column data that I would like to use for matrix multiplication with the MMULT function. For that purpose I need to reshape the column data first. I would like to do the reshaping with a dynamic array function, since its result could then feed directly into MMULT without having to display the reshaped matrix in the sheet (i.e. only the column with the input data stays visible to the user). I am aware of ideas such as the one outlined at http://www.cpearson.com/excel/VectorToMatrix.aspx; however, as far as I can see, that requires displaying the reshaped data in the sheet, which I do not want. An alternative could be to enter the arrays directly in the formula using curly brackets, but as far as I can see this notation does not allow cell references, i.e. something like MMULT({A1,A2,A3;A4,A5,A6},{A7,A8;A9,A10;A11,A12}) is not allowed. Any ideas for solving this issue?
An example is shown below. I have the column data in my sheet and do not want to repeat it as reshaped data, but I would still like to be able to display the square of the reshaped matrix.
Reshaped data and matrix multiplication:
For reshaping a 9x1 array into a 3x3 array:
INDEX(B3:B11,SEQUENCE(ROWS(B3:B11)/3,3))
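To check what the formula does outside Excel, the same index arithmetic can be mirrored in NumPy. This is only a sketch: the values 1 to 9 are assumed sample data standing in for B3:B11.

```python
import numpy as np

column = np.arange(1, 10)   # assumed sample values standing in for B3:B11
ncols = 3

# Like SEQUENCE(ROWS(B3:B11)/3, 3): the numbers 1..9 laid out row by row.
positions = np.arange(1, column.size + 1).reshape(-1, ncols)

# Like INDEX(B3:B11, positions); Excel positions are 1-based, NumPy is 0-based.
reshaped = column[positions - 1]

# Like MMULT(reshaped, reshaped): the square of the reshaped matrix.
squared = reshaped @ reshaped
```

The reshaped matrix never has to be stored anywhere; it exists only as an intermediate value, which is what nesting the INDEX expression inside MMULT achieves in the sheet.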
I have two columns in pandas: df.lat and df.lon.
Both have a length of 3897 and 556 NaN values.
My goal is to combine both columns and make a dict out of them.
I use the code:
dict(zip(df.lat,df.lon))
This creates a dict, but with one element fewer than my original columns.
I used len() to confirm this. I cannot figure out why the dict has one element
fewer than my columns when both columns have the same length.
Another problem is that the dict contains only the raw values, without the keys "lat" and "lon".
Maybe someone here has an idea?
The lengths can differ if df.lat contains repeated values: a dictionary cannot hold duplicate keys, so each repeated key keeps only its last value and the earlier pairs are dropped.
A more flexible approach may be to use the df.to_dict() native method in pandas. In this example the orientation you want is probably 'records'. Full code:
df[['lat', 'lon']].to_dict('records')
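A small illustration of both points, using made-up coordinates in which one latitude appears twice:

```python
import pandas as pd

# Made-up sample data: the latitude 1.0 occurs twice.
df = pd.DataFrame({"lat": [1.0, 1.0, 2.0], "lon": [10.0, 20.0, 30.0]})

# The second (1.0, 20.0) pair overwrites the first, so one entry is lost.
pairs = dict(zip(df.lat, df.lon))

# One dict per row, each carrying the "lat" and "lon" keys.
records = df[["lat", "lon"]].to_dict("records")
```

Here pairs has two entries while records has three, one per row, so nothing is lost to key collisions.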
Can someone please explain how to take the inner product of two tensors in Python to get a one-dimensional array? For example, I have two tensors of shape (6,6,6,6,6) and (6,6,6,6). I need a one-dimensional array with 6 elements, i.e. of shape (6,1) or (1,6).
NumPy has a function for this, tensordot:
https://numpy.org/doc/stable/reference/generated/numpy.tensordot.html
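A minimal sketch for the shapes in the question, assuming the last four axes of the first tensor should be contracted with the four axes of the second (the question does not say which axes are meant). The random inputs are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((6, 6, 6, 6, 6))   # placeholder data
b = rng.random((6, 6, 6, 6))      # placeholder data

# axes=4 sums over the last 4 axes of a and the first 4 axes of b.
result = np.tensordot(a, b, axes=4)

# The same contraction written out with explicit index labels.
check = np.einsum("ijklm,jklm->i", a, b)

as_row = result.reshape(1, 6)      # shape (1, 6), if a 2-D row is needed
as_column = result.reshape(6, 1)   # shape (6, 1), if a 2-D column is needed
```

The result has shape (6,); the reshape calls are only needed when a two-dimensional row or column is required.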
I'm very confused. The documentation for Data.List says:
transpose [[1,2,3],[4,5,6]] -->
[[1,4],[2,5],[3,6]]
Which would mean that every list is a row?
Reading other literature it seems that a list is in fact a column. Which one is it?
Neither. A list is a sequence. We can pretend it is a row, or a column, but that's an arbitrary choice.
transpose takes a list of lists as input. We can think of that as a sequence of rows, forming a matrix (or a "jagged" matrix if rows are of unequal size). The result, interpreted in the same way, is the transposed matrix.
If we want, we can also see the input of transpose as a sequence of columns, forming a matrix. The result, if we interpret it in the same way, is again the transposed matrix.
TL;DR: for transpose it does not matter if we see the list of lists as a list/sequence of rows of a matrix, or a list/sequence of columns of a matrix, as long as we interpret the result in the same way.
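The symmetry can be checked with a short Python analogue of transpose. This is a sketch for rectangular input only: unlike Data.List's transpose, zip stops at the shortest inner list, so jagged input behaves differently.

```python
def transpose(lists):
    """Python analogue of transpose for rectangular lists of lists."""
    return [list(group) for group in zip(*lists)]

matrix = [[1, 2, 3], [4, 5, 6]]
flipped = transpose(matrix)        # [[1, 4], [2, 5], [3, 6]]
round_trip = transpose(flipped)    # back to the original list of lists
```

Whether the inner lists of matrix are read as rows or as columns, flipped holds the other kind, and transposing again returns the original.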
I have a pandas DataFrame with multiple columns (some of them non-contiguous) that need to be label encoded. From my understanding of the LabelEncoder class, I need a separate LabelEncoder object for each column. I am using the code below (list_of_string_cols is a list of all the columns that need to be label encoded):
from sklearn.preprocessing import LabelEncoder

for col in list_of_string_cols:
    # Fit a fresh encoder on each training column, then reuse it on the test column.
    labelenc = LabelEncoder()
    train_X[col] = labelenc.fit_transform(train_X[col])
    test_X[col] = labelenc.transform(test_X[col])
Is this the correct way?
Yes that's correct.
LabelEncoder was primarily made to deal with labels rather than features, so it accepts only a single column at a time.
Up to the current version of scikit-learn (0.19.2), what you are using is the correct way of encoding multiple columns. See this question, which does the same thing:
Label encoding across multiple columns in scikit-learn
From the next version (0.20) onwards, OrdinalEncoder can be used to encode all categorical feature columns at once.
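A minimal sketch of the OrdinalEncoder route, assuming scikit-learn 0.20 or later. The frames and column names below are made-up stand-ins for train_X, test_X and list_of_string_cols.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Made-up stand-ins for the frames and column list in the question.
train_X = pd.DataFrame({"color": ["red", "blue", "red"],
                        "size": ["S", "L", "M"],
                        "price": [1.0, 2.0, 3.0]})
test_X = pd.DataFrame({"color": ["blue", "red"],
                       "size": ["M", "S"],
                       "price": [4.0, 5.0]})
list_of_string_cols = ["color", "size"]

# One encoder handles all listed columns; categories are learned from train only.
encoder = OrdinalEncoder()
train_encoded = encoder.fit_transform(train_X[list_of_string_cols])
test_encoded = encoder.transform(test_X[list_of_string_cols])
```

As with the per-column LabelEncoder loop, transform raises an error by default if the test data contains a category that was not seen during fitting.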