I am looking for a single vector with values [(0:400) (-400:-1)]
Can anyone help me on how to write this in python.
Use np.arange to generate each range and np.concatenate to join them into a single 1-D vector:
import numpy as np
# 0..400 followed by -400..-1, as one flat array of shape (801,)
arr = np.concatenate([np.arange(401), np.arange(-400, 0)])
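As a cross-check, the same flat vector can also be written with np.r_, which accepts slice syntax and expands each slice like an np.arange call. This is only a minimal sketch; the names vec and vec_r are illustrative and not from the original post.

```python
import numpy as np

# Two ranges joined end to end: 0..400, then -400..-1.
vec = np.concatenate([np.arange(401), np.arange(-400, 0)])

# Shorthand: np.r_ turns each slice into a range and concatenates them.
vec_r = np.r_[0:401, -400:0]

print(vec.shape)
print(vec[:3], vec[400:403], vec[-3:])
```

Both forms give a one-dimensional integer array rather than a nested object array, which is what a "single vector" normally means.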
I have a dataset input, which is a list of ~40000 letters (that are represented as strings).
With SKLearn, I first used a TfidfVectorizer to create a TF-IDF matrix representation1:
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
import sklearn.pipeline
vectorizer = TfidfVectorizer(lowercase=False)
representation1 = vectorizer.fit_transform(input) # TFIDF representation
Now I want to manually add one feature, representation2, for every letter. This feature should give the ratio of distinct words to total words in a specific letter/string:
count_vectorizer = CountVectorizer()
sum_words = np.sum(count_vectorizer.fit_transform(input).toarray(), axis=-1)
sum_different_words = np.count_nonzero(count_vectorizer.fit_transform(input).toarray(), axis=-1)
representation2 = np.divide(sum_different_words, sum_words) # percentage of different words
The array representation2 is now an array of shape (39077,) (as expected). I now want to combine representation1 and representation2 into one feature vector representation.
I read about using FeatureUnion to combine two kinds of features in SKLearn, but I am not sure how to correctly use the NumPy array representation2 as a feature here. I tried:
union = sklearn.pipeline.make_union([representation1, representation2])
But now I can't use e.g. union.get_feature_names_out(), since it throws: AttributeError: Transformer list (type list) does not provide get_feature_names_out.
What did I understand incorrectly here?
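One possible reading of the error: make_union expects transformer objects passed as separate arguments, not already-computed matrices wrapped in a list, so the list itself gets treated as a transformer. If the two representations are already computed, a common alternative is to stack them column-wise with scipy.sparse.hstack. The sketch below assumes that approach and uses small made-up stand-ins for representation1 and representation2, so the values are hypothetical.

```python
import numpy as np
from scipy import sparse

# Stand-in for the TF-IDF matrix: 3 documents x 4 terms (hypothetical values).
representation1 = sparse.csr_matrix(
    np.array([[0.5, 0.0, 0.5, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.3, 0.3, 0.0, 0.4]])
)
# Stand-in for the per-document ratio feature, shape (3,).
representation2 = np.array([1.0, 0.5, 0.75])

# Reshape the 1-D feature into a column and make it sparse so it can be stacked.
extra_column = sparse.csr_matrix(representation2.reshape(-1, 1))

# Append the column to the right of the TF-IDF matrix.
representation = sparse.hstack([representation1, extra_column]).tocsr()
print(representation.shape)
```

With this approach there is no union object to query, so the feature names would have to be assembled by hand, for example by appending one label to the array returned by vectorizer.get_feature_names_out().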
I have a pandas dataframe of shape (18837349,2000) and a 3D Numpy Array of shape (18837349,6,601). I want to shuffle the rows of my dataframe and the first dimension of my Numpy Array in unison. I know how to shuffle a dataframe:
df_shuffle = df.sample(frac=1).reset_index(drop=True)
But I don't know how to do it together with a 3D Numpy Array. Insights will be appreciated.
You can shuffle an index array once and use it to reorder both objects:
ix = np.arange(18837349)
np.random.shuffle(ix)
df_shuffle, array_shuffle = your_df.iloc[ix].reset_index(drop=True), your_array[ix]
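To see that one shared permutation keeps the rows aligned, here is a small NumPy-only sketch. The arrays table and cube are hypothetical stand-ins for the DataFrame and the 3D array, and the seed is there only to make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable

n_rows = 5
table = np.arange(n_rows * 2).reshape(n_rows, 2)    # stand-in for the DataFrame rows
cube = np.arange(n_rows * 6).reshape(n_rows, 2, 3)  # stand-in for the 3D array

perm = rng.permutation(n_rows)  # one shared permutation of the row indices

# Indexing both objects with the same permutation keeps row i paired with row i.
table_shuffled = table[perm]
cube_shuffled = cube[perm]
```

With a real DataFrame the same perm would be applied as df.iloc[perm].reset_index(drop=True). Note that indexing with an integer array makes a copy, which matters for data of the size mentioned in the question.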
I have a numpy ndarray in this form:
inputs = np.array([[1],[2],[3]])
How can I convert this ndarray to a deque (collections.deque) so that the structure is preserved (array of arrays) and I can apply normal deque methods such as popleft() and append()? For example:
inputs.popleft()
->>> [[2],[3]]
inputs.append([4])
->>> [[2],[3], [4]]
I think you could pass inputs directly to deque
from collections import deque
i = deque(inputs)
In [1050]: i
Out[1050]: deque([array([1]), array([2]), array([3])])
In [1051]: i.popleft()
Out[1051]: array([1])
In [1052]: i
Out[1052]: deque([array([2]), array([3])])
In [1053]: i.append([4])
In [1054]: i
Out[1054]: deque([array([2]), array([3]), [4]])
Later on, when you want a NumPy array back, just pass the deque to np.array:
np.array(i)
Out[1062]:
array([[2],
[3],
[4]])
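The same round trip can be written as a plain script. This sketch assumes you want the deque elements to be ordinary Python lists rather than small arrays, so it converts with tolist() first; the names queue, first and restored are illustrative.

```python
from collections import deque

import numpy as np

inputs = np.array([[1], [2], [3]])

queue = deque(inputs.tolist())    # each row becomes a plain Python list
first = queue.popleft()           # removes and returns the leftmost row
queue.append([4])                 # add a new row on the right

restored = np.array(list(queue))  # back to a 2-D NumPy array
print(first, restored.shape)
```

Using tolist() keeps every element the same type, so rows appended later as lists are indistinguishable from the original ones.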
I think you can do:
import collections
import numpy as np
inputs = np.array([[1],[2],[3]])
# convert each row to a list so the deque holds lists, not arrays
inputs = collections.deque([list(i) for i in inputs])
inputs.append([4])
inputs.popleft()
I am new to Python and am using NumPy to read a CSV file into an array. I tried two methods:
Approach 1
train = np.asarray(np.genfromtxt(open("/Users/mac/train.csv","rb"),delimiter=","))
Approach 2
with open('/Users/mac/train.csv') as csvfile:
    rows = csv.reader(csvfile)
    for row in rows:
        # np.int was removed in NumPy 1.24; the builtin int does the same job
        newrow = np.array(row).astype(int)
        train.append(newrow)
What is the difference between these two approaches, and which one is recommended?
I am not concerned about speed since my data is small; I am more concerned about differences in the resulting data type.
You can also use pandas, which is simple to use:
import pandas as pd
import numpy as np
dataset = pd.read_csv('file.csv')
# get all headers in csv
values = list(dataset.columns.values)
# get the labels, assuming the last column holds the labels in the csv
y = dataset[values[-1:]]
y = np.array(y, dtype='float32')
X = dataset[values[0:-1]]
X = np.array(X, dtype='float32')
So what is the difference in the result?
genfromtxt is the NumPy CSV reader. It returns an array, so there is no need for the extra asarray.
The second snippet is incomplete; it looks like it would produce a list of arrays, one for each line of the file. It uses the generic Python csv reader, which doesn't do much other than read a line and split it into strings.
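The difference in result types can be checked on a tiny in-memory file. This is a sketch under assumptions: csv_text is made-up data with no header row, io.StringIO stands in for the real file, and the second approach is completed with an explicit list and np.vstack, which the original snippet does not show.

```python
import csv
import io

import numpy as np

csv_text = "1,2,3\n4,5,6\n"  # hypothetical file contents, no header row

# Approach 1: genfromtxt parses the whole file into one 2-D float array.
from_genfromtxt = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# Approach 2: csv.reader yields lists of strings; convert each row by hand.
rows = []
for row in csv.reader(io.StringIO(csv_text)):
    rows.append(np.array(row).astype(int))
from_csv_reader = np.vstack(rows)  # stack the per-row arrays into a 2-D array

print(from_genfromtxt.dtype, from_genfromtxt.shape)
print(from_csv_reader.dtype, from_csv_reader.shape)
```

So by default genfromtxt gives floats, while the manual route gives whatever type each row is cast to, and stays a list of separate arrays until they are stacked.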
When I use the Canny function in OpenCV, the output is a NumPy array like [0,0,0,0,255], etc. Can I get a binary array instead, such as True/False or 1/0, with 1 wherever an edge is detected? MATLAB does that by default. Please take a look at the output section:
Find edges in intensity image, Matlab
My Python code looks like this:
import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread('messi5.jpg',0)
edges = cv2.Canny(img,100,200) #numpy array. must be binary array (1/0)
You can convert the output array immediately:
edges_bool = cv2.Canny(img,100,200).astype(bool)
Alternatively, you can convert an existing array later:
edges_bool = np.asarray(edges, dtype=bool)
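If 1/0 integers are wanted rather than booleans, a comparison followed by a cast does it. The sketch below does not call OpenCV; edges is a hand-made stand-in that assumes the Canny output is a uint8 array containing only 0 and 255, as described in the question.

```python
import numpy as np

# Stand-in for the output of cv2.Canny: uint8 with 0 (no edge) or 255 (edge).
edges = np.array([[0, 0, 255],
                  [255, 0, 0]], dtype=np.uint8)

edges_bool = edges.astype(bool)          # True where an edge was detected
edges_01 = (edges > 0).astype(np.uint8)  # 1/0 integers instead of booleans

print(edges_bool)
print(edges_01)
```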