ValueError: could not convert string to float for X_t - python-3.x

I want to rebuild X_train and Y_train from the DataFrame df after shuffling it, but I get ValueError: could not convert string to float.
Thank you in advance.
df = pd.DataFrame()
df['images'] = X_train.tolist()
df['label'] = Y_train.tolist()
df = df.sample(frac=1).reset_index(drop=True)
df.head()
X_train = df['images'].astype('str')
Y_train = df['label'].astype('str')
X_train = np.asarray(X_train).astype(np.float32)
Y_train = np.asarray(Y_train).astype(np.float32)
X_test = np.asarray(X_test).astype(np.float32)
Y_test = np.asarray(Y_test).astype(np.float32)
print (Y_train)
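A likely cause, judging only from the code shown: each cell of df['images'] holds a Python list, so astype('str') turns it into text such as "[0.0, 1.0, ...]", which cannot be parsed as a single float. The sketch below reproduces that on small made-up arrays (the shapes and the names X_shuffled and Y_shuffled are assumptions for illustration) and rebuilds the arrays from the list cells directly.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 3 "images" with 4 values each, plus labels.
X_train = np.arange(12, dtype=np.float32).reshape(3, 4)
Y_train = np.array([0, 1, 0])

df = pd.DataFrame()
df["images"] = X_train.tolist()   # each cell holds a Python list
df["label"] = Y_train.tolist()
df = df.sample(frac=1, random_state=0).reset_index(drop=True)

# astype('str') turns each list into text like "[0.0, 1.0, 2.0, 3.0]",
# so the later float32 cast raises ValueError.
try:
    np.asarray(df["images"].astype("str")).astype(np.float32)
    string_route_failed = False
except ValueError:
    string_route_failed = True

# Rebuild the arrays from the list cells directly instead.
X_shuffled = np.array(df["images"].tolist(), dtype=np.float32)
Y_shuffled = df["label"].to_numpy(dtype=np.float32)
```

If the labels in the real data are text rather than numbers, they would need to be encoded as numbers first; the question does not show them, so that part is left open.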

Related

ValueError: Failed to convert a NumPy array to a Tensor

I used `
x_train = np.array([np.array(val) for val in x_train])
y_train = np.array([np.array(val) for val in y_train])
`
but the conversion from NumPy array to tensor still fails.
My code is `
x_train = np.array([np.array(val) for val in x_train])
y_train = np.array([np.array(val) for val in y_train])
model.fit(x_train,y_train,epochs =5,batch_size = 128,validation_split = 0.2,shuffle =True)
test_loss,test_acc = model.evaluate(x_test,y_test)
print('Test loss',test_loss)
print('Accuracy',test_acc)
`
Error:
ValueError                                Traceback (most recent call last)
<ipython-input-39-43fd775bb14b> in <module>
      1 x_train = np.array([np.array(val) for val in x_train])
      2 y_train = np.array([np.array(val) for val in y_train])
----> 3 model.fit(x_train,y_train,epochs =5,batch_size = 128,validation_split = 0.2,shuffle =True)
      4 test_loss,test_acc = model.evaluate(x_test,y_test)
      5 print('Test loss',test_loss)
1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
    100   dtype = dtypes.as_dtype(dtype).as_datatype_enum
    101   ctx.ensure_initialized()
--> 102   return ops.EagerTensor(value, ctx.device_name, dtype)
    103
    104
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).
My model:
`
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(words, embed_size, input_shape=(x_train.shape[0],)),
    tf.keras.layers.Conv1D(128, 3, activation='relu'),
    tf.keras.layers.MaxPooling1D(),
    tf.keras.layers.LSTM(128, activation='tanh'),
    tf.keras.layers.Dense(10, activation='relu', input_dim=300),
    tf.keras.layers.Dense(1, activation='sigmoid', input_dim=300)])
model.summary()
`
The error occurs because the array has dtype object, which TensorFlow cannot convert to a tensor. Cast the arrays to a numeric dtype; here I cast them to float32.
x_train = np.array([np.array(val) for val in x_train])
y_train = np.array([np.array(val) for val in y_train])
x_train = tf.cast(x_train , dtype=tf.float32)
y_train = tf.cast(y_train , dtype=tf.float32)
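If the cast itself fails, one common reason (an assumption here, since the question does not show the data) is that the rows have different lengths, so NumPy can only store them as an object array. A minimal sketch of padding the rows to one length before casting follows; pad_rows is a hypothetical helper written for this illustration, and Keras also provides a pad_sequences utility for the same purpose.

```python
import numpy as np

def pad_rows(rows, maxlen, value=0):
    """Hypothetical helper: truncate or right-pad each row to `maxlen`."""
    out = np.full((len(rows), maxlen), value, dtype=np.float32)
    for i, row in enumerate(rows):
        trimmed = np.asarray(row, dtype=np.float32)[:maxlen]
        out[i, :len(trimmed)] = trimmed
    return out

# Made-up token sequences of unequal length.
ragged = [[1, 2, 3], [4, 5], [6]]

# Rows of unequal length only fit in an object array, the kind TensorFlow rejects.
as_object = np.array([np.array(r) for r in ragged], dtype=object)

# After padding, the data is a regular 2-D float32 array.
padded = pad_rows(ragged, maxlen=3)
```

The padded array has a fixed shape and a numeric dtype, which is the form model.fit can convert to a tensor.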

Dimension-related problem in training LightGBM for Multiclass Multilabel Classification?

I would like to use LightGBM for multiclass multilabel classification, but training fails because the label input is not accepted as a list or 1-D array. The real data has 10000 rows.
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:,np.r_[0:6, 7:27]].values
y = dataset.iloc[:,np.r_[6]].values
x_train, x_test, y_train, y_test = train_test_split(X, y,test_size = 0.25, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc=StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
import lightgbm as lgb
d_train = lgb.Dataset(x_train, label=y_train)
params = {}
params['learning_rate'] = 0.003
params['boosting_type'] = 'gbdt'
params['objective'] = 'binary'
params['metric'] = 'binary_logloss'
params['sub_feature'] = 0.5
params['num_leaves'] = 10
params['min_data'] = 50
params['max_depth'] = 10
clf = lgb.train(params, d_train, 100)
y_pred=clf.predict(x_test)
# threshold every prediction, not only the first 99
for i in range(len(y_pred)):
    if y_pred[i]>=.5:
        y_pred[i]=1
    else:
        y_pred[i]=0
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
I encounter this problem:
clf = lgb.train(params, d_train, 100)
File "..\lightgbm\engine.py", line 228, in train
...
File "..\lightgbm\basic.py", line 1336, in set_label
label = list_to_1d_numpy(_label_from_pandas(label), name='label')
File "..\lightgbm\basic.py", line 86, in list_to_1d_numpy
"It should be list, numpy 1-D array or pandas Series".format(type(data).__name__, name))
The error is raised in basic.py by the function documented as """Convert data to numpy 1-D array.""" I tried to make my data 1-D with
y_train = np.reshape(y_train, [1,trainsize])
x_train = np.reshape(x_train, [1,trainsize*26])
That did not solve the problem.
Then I used ravel to flatten x_train and y_train:
x_train = np.ravel(x_train)
y_train = np.ravel(y_train)
but then a new error is shown:
\lib\site-packages\lightgbm\basic.py", line 872, in __init_from_np2d
raise ValueError('Input numpy.ndarray must be 2 dimensional')
ValueError: Input numpy.ndarray must be 2 dimensional
What is wrong? How can I solve this?
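Reading the two error messages together suggests that only the label needs to be 1-D, while the feature matrix has to stay 2-D. The sketch below uses made-up arrays (8 rows, 26 features, both assumptions) to show the shapes involved; the LightGBM call is left as a comment and is not run here.

```python
import numpy as np

# Hypothetical stand-in for the question's arrays: 8 rows, 26 features.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(8, 26))
# dataset.iloc[:, np.r_[6]].values yields a column of shape (n, 1), not (n,)
y_train = rng.integers(0, 2, size=(8, 1))

# np.reshape(y_train, [1, n]) is still 2-D, with shape (1, n).
still_2d = np.reshape(y_train, [1, 8])

# ravel() gives the 1-D label array that the first error message asks for.
y_flat = y_train.ravel()

# Leave x_train untouched: flattening it causes the second error.
# d_train = lgb.Dataset(x_train, label=y_flat)
```

Separately, the 'binary' objective in params covers two classes only; LightGBM has a 'multiclass' objective that takes a 'num_class' parameter, which may be closer to what the title describes.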

Always getting accuracy of 1 how to fix it?

I'm trying to apply logistic regression to my dataset, but it always gives an accuracy of 1.
df = pd.read_csv("train.csv", header=0)
df = df[["PassengerId", "Survived", "Sex", "Age", "Embarked"]]
df.dropna(inplace=True)
X = df[["Sex", "Age"]]
X_train = np.array(X)
Y = df["Survived"]
Y_train = np.array(Y)
clf = LogisticRegression()
clf.fit(X_train, Y_train)
df1 = pd.read_csv("test.csv", header=0)
df1 = df1[["PassengerId", "Survived", "Sex", "Age", "Embarked"]]
df1.dropna(inplace=True)
X = df1[["Sex", "Age"]]
X_test = np.array(X)
Y = df1["Survived"]
Y_test = np.array(Y)
# convert string data to float
X_test = X_test.astype(float)
Y_test = Y_test.astype(float)
accuracy = clf.score(X_test, Y_test)
print("Accuracy = ", accuracy)
I expect a value between 0 and 1, but I always get exactly 1.0.

error when exporting predictions of 4 machine learning models

I am training and testing my data with 10-fold cross-validation on 4 different models. For each model, I would like to export the predictions and the correct classes for each split.
This is my code and the result:
for train_index, test_index in kf.split(X, labels):
    print('TRAIN:', train_index,
          'TEST:', test_index)
    X_train, X_val = X[train_index], X[test_index]
    y_train, y_val = labels[train_index], labels[test_index]
    model1 = LinearSVC()
    model2 = MultinomialNB()
    model3 = LogisticRegression()
    model4 = RandomForestClassifier()
    model1.fit(X_train, y_train)
    model2.fit(X_train, y_train)
    model3.fit(X_train, y_train)
    model4.fit(X_train, y_train)
    result1 = model1.predict(X_val)
    result2 = model2.predict(X_val)
    result3 = model3.predict(X_val)
    result4 = model4.predict(X_val)
    df = pd.DataFrame(data = {"id": X_val, "Prediction": y_val})
    df.to_excel('result.xlsx')
So far I have the code above, but the exported file contains only the first rows (1-198), and I do not understand why. Could you help me?
I have approximately 2000 sentences.
When you use KFold with K = 10, the .split() method splits your dataset into 10 portions. In each iteration, test_index holds the indices of the i-th portion and train_index holds the indices of the other 9 portions.
In your original code, df contains the validation set (X_val, y_val) rather than the predictions for each iteration.
I am not sure what you intend to do, but if you would like to see the predictions of each model, the following code will do:
df = pd.DataFrame(data={
    "id": [],
    "ground_true": [],
    "original_sentence": [],
    "pred_model1": [],
    "pred_model2": [],
    "pred_model3": [],
    "pred_model4": []})
for train_index, test_index in kf.split(X, labels):
    print('TRAIN:', train_index, 'TEST:', test_index)
    # use the names X_val / y_val consistently; the rest of the loop refers to them
    X_train, X_val = X[train_index], X[test_index]
    y_train, y_val = labels[train_index], labels[test_index]
    model1 = LinearSVC()
    model2 = MultinomialNB()
    model3 = LogisticRegression()
    model4 = RandomForestClassifier()
    model1.fit(X_train, y_train)
    model2.fit(X_train, y_train)
    model3.fit(X_train, y_train)
    model4.fit(X_train, y_train)
    result1 = model1.predict(X_val)
    result2 = model2.predict(X_val)
    result3 = model3.predict(X_val)
    result4 = model4.predict(X_val)
    temp_df = pd.DataFrame(data={
        # row indices of this fold; a 2-D feature matrix cannot be a single column
        "id": test_index,
        "ground_true": y_val,
        "original_sentence": verbatim_train_remove_stop_words[test_index],
        "pred_model1": result1,
        "pred_model2": result2,
        "pred_model3": result3,
        "pred_model4": result4})
    # append this fold's rows to the results collected so far
    df = pd.concat([df, temp_df])
# export once, after all folds have been collected
df.to_excel('result.xlsx')
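To see why the results have to be collected across folds, here is a small sketch on made-up data (20 rows, an assumption for illustration). It shows that each fold's test_index covers only one tenth of the rows and that the ten folds together cover every row exactly once, which is consistent with roughly 200 of 2000 sentences appearing when a single fold is exported.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical data: 20 rows, 2 features.
X = np.arange(40).reshape(20, 2)
kf = KFold(n_splits=10)

fold_sizes = []   # number of test rows in each fold
seen = []         # every test index, across all folds
for train_index, test_index in kf.split(X):
    fold_sizes.append(len(test_index))
    seen.extend(test_index.tolist())
```

Here fold_sizes holds ten entries of 2, and seen contains each row index once.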

For loop and Linear regression

Good evening,
I would like to repeat both a subsetting step and a linear regression over the same data frame, once per article code.
#I get the unique codes of the articles
codes = np.unique(data["cod_id"])
#Split
X = data['price']
y = data["quantity"]
accuracy = []
for i in np.nditer(codes):
    data = data.loc[df["cod_id"] == i]
    #Arrange an if statement to avoid 0-element arrays, while splitting (80% train, 20% test)
    if int(len(data)) <= 2:
        X_train = X
        y_train = y
        # Test dataset
        X_test = X
        y_test = y
    else:
        t = 0.8
        t = int(t*len(data))
        #Split
        t = int(t*len(data))
        # Train dataset
        X_train = X[:t]
        y_train = y[:t]
        # Test dataset
        X_test = X[t:]
        y_test = y[t:]
    #Run the Algorithm
    lr = linear_model.LinearRegression()
    lr.fit(X_train, y_train)
    predicted_test_tr = lr.predict(X_test)
    pred_cost = (X_test["price"] * predicted_test_tr).sum()
    real_cost = (X_test["price"] * y_test).sum()
    delta = (pred_cost - owner_cost)/owner_cost
    accuracy.append(delta)
But the resulting list "accuracy" is as long as "codes" and holds the same value at every position:
print(accuracy)
5.43234
5.43234
5.43234
...
How can I fix this issue?
Thank you
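Judging from the code shown, a likely cause is that X and y are taken from the whole frame before the loop and that data is overwritten inside it, so every iteration ends up fitting the same rows. A minimal sketch of the per-code version follows, on made-up data; the column values, the name subset, and the choice to record the fitted slope instead of the cost delta are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: two article codes with different price/quantity slopes.
data = pd.DataFrame({
    "cod_id":   [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "price":    [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "quantity": [10.0, 8.0, 6.0, 4.0, 2.0, 5.0, 5.5, 6.0, 6.5, 7.0],
})

slopes = {}
for code, subset in data.groupby("cod_id"):
    # Build X and y from this code's rows only; using a separate name
    # (subset) keeps the full frame intact for the next iteration.
    X = subset[["price"]]        # 2-D, as scikit-learn expects
    y = subset["quantity"]
    t = int(0.8 * len(subset))   # 80% train, 20% test
    if t < 1 or t == len(subset):
        continue                 # too few rows to split; skipped in this sketch
    lr = LinearRegression().fit(X.iloc[:t], y.iloc[:t])
    slopes[code] = float(lr.coef_[0])
```

With the subset rebuilt in each iteration, the two codes give different fits (a slope near -2 for code 1 and near 0.5 for code 2 on this toy data).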
