I am running the survcomp package and wonder about the y and z values. I have several clinical variables:
> colnames(ClinicalDataHep)
[1] "follow_upTime"
[2] "RecurrenceTime"
[3] "Age"
[4] "OS"
[5] "Survival_dead0_alive1"
[6] "Tumour_size"
[7] "HVB_preop"
[8] "HCV_preop"
[9] "HBD_preop"
[10] "Cirrhosis_preop"
[11] "Status:_no_recurrence-0._recurrence-1_"
[12] "Surgery:_resection-1._tx-2;_rfa-3;_resection+rfa-4;tx+rfa-5"
[13] "new_time"
[14] "new_death"
[15] "death_event"
Is it correct to use Overall Survival (OS) as the y variable and dead/alive as the z variable?
cindexall.Hep.serum <- as.data.frame(t(apply(X=matrix_cpm, MARGIN=1, function(x, y, z) {
tt <- concordance.index(x=x, surv.time=y, surv.event=z, method="noether", na.rm=TRUE);
return(c("cindex"=tt$c.index, "cindex.se"=tt$se, "lower"=tt$lower, "upper"=tt$upper,"p.value"=tt$p.value)); },
y=ClinicalData$OS, z=ClinicalData$Survival_dead0_alive1)))
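One thing worth checking is how the event indicator is coded. A concordance index only counts a pair of patients as usable when the one with the shorter time actually had the event, so the indicator is normally 1 for an observed death and 0 for censoring. The column name Survival_dead0_alive1 suggests the opposite coding, in which case it would need to be flipped (1 - z) before being passed as surv.event. The Python sketch below uses a simplified pairwise C-index and made-up numbers (it is not survcomp's implementation and does not reproduce the "noether" method); it only illustrates how much the coding can matter.

```python
def concordance_index(risk, time, event):
    """Simplified pairwise C-index: share of usable pairs in which the
    subject with the shorter time also has the higher risk score."""
    concordant = 0.0
    usable = 0
    n = len(risk)
    for i in range(n):
        for j in range(n):
            # A pair is usable only if subject i had the event before time[j].
            if event[i] == 1 and time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


# Made-up example data (assumption, not taken from the question).
overall_survival = [5, 10, 15, 20]
dead0_alive1 = [0, 0, 1, 1]          # coded like the column name suggests
risk_score = [4, 3, 1, 2]            # e.g. one row of an expression matrix

# Flip the coding so that 1 means "death observed".
death_event = [1 - v for v in dead0_alive1]

c_flipped = concordance_index(risk_score, overall_survival, death_event)
c_as_is = concordance_index(risk_score, overall_survival, dead0_alive1)
print(c_flipped, c_as_is)
```

With these toy values the two codings give opposite answers, so it is worth confirming the expected coding in the survcomp documentation before interpreting the results.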
I'm trying to benchmark clustering across various frameworks, but when I call scikit-learn from Julia instead of Python, I can't even get it to work. Here is the code:
using PyCall
Train = rand(Float64, 1611, 10)
py"""
def Silhouette_py(Train, k):
    from sklearn.metrics import silhouette_score
    from sklearn.cluster import KMeans
    model = KMeans(n_clusters=k)
    return silhouette_score(Train, model.labels_)
"""
function test(Train, k)
    py"Silhouette_py"(Train, k)
end
Calling it leads to an error:
julia> test(Train, 3)
ERROR: PyError ($(Expr(:escape, :(ccall(#= C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:43 =# #pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'AttributeError'>
AttributeError("'KMeans' object has no attribute 'labels_'")
File "C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyeval.jl", line 5, in Silouhette_py
const _namespaces = Dict{Module,PyDict{String,PyObject,true}}()
[1] pyerr_check
# C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\exception.jl:62 [inlined]
[2] pyerr_check
# C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\exception.jl:66 [inlined]
[3] _handle_error(msg::String)
# PyCall C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\exception.jl:83
[4] macro expansion
# C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\exception.jl:97 [inlined]
[5] #107
# C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:43 [inlined]
[6] disable_sigint
# .\c.jl:473 [inlined]
[7] __pycall!
# C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:42 [inlined]
[8] _pycall!(ret::PyObject, o::PyObject, args::Tuple{Matrix{Float64}, Int64}, nargs::Int64, kw::Ptr{Nothing})
# PyCall C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:29
[9] _pycall!(ret::PyObject, o::PyObject, args::Tuple{Matrix{Float64}, Int64}, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
# PyCall C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:11
[10] (::PyObject)(::Matrix{Float64}, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(),
# PyCall C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:86
[11] (::PyObject)(::Matrix{Float64}, ::Vararg{Any})
# PyCall C:\Users\Shayan\.julia\packages\PyCall\ygXW2\src\pyfncall.jl:86
[12] t(Train::Matrix{Float64}, k::Int64)
# Main .\REPL[12]:2
[13] top-level scope
# REPL[20]:1
The libpython and related stuff configuration:
julia> PyCall.libpython
julia> PyCall.pyversion
julia> PyCall.current_python()
Further tests
But if I say:
julia> sk = pyimport("sklearn")
julia> model = sk.cluster.KMeans(3)
PyObject KMeans(n_clusters=3)
julia> model.fit(Train)
sys:1: ConvergenceWarning: Number of distinct clusters (1) found smaller than n_clusters (3). Possibly due to duplicate points in X.
PyObject KMeans(n_clusters=3)
julia> model.labels_
1611-element Vector{Int32}:
But I need it to work in a function. As you can see, it doesn't throw AttributeError("'KMeans' object has no attribute 'labels_'") anymore in this case.
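It seems this would work. Note that the model has to be fitted before labels_ exists:
using PyCall
KMeans = pyimport("sklearn.cluster").KMeans
silhouette_score = pyimport("sklearn.metrics").silhouette_score
Train = rand(Float64, 1611, 10);
function test(Train, k)
    model = KMeans(k)
    model.fit(Train)  # labels_ is only set once the model has been fitted
    return silhouette_score(Train, model.labels_)
end
julia> test(Train, 3)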
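The AttributeError in the question comes from reading labels_ on a KMeans object that was never fitted. A minimal Python sketch of the same function, assuming scikit-learn and NumPy are installed (the function name silhouette_for_k and the random data are illustrative only):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_for_k(data, k):
    """Fit KMeans with k clusters and return (had labels_ before fit, score)."""
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    # An unfitted estimator has no labels_ attribute yet.
    had_labels_before_fit = hasattr(model, "labels_")
    model.fit(data)  # fitting is what creates labels_
    return had_labels_before_fit, silhouette_score(data, model.labels_)


rng = np.random.default_rng(0)
train = rng.random((200, 10))  # small random stand-in for the Train matrix
had_labels_before_fit, score = silhouette_for_k(train, 3)
print(had_labels_before_fit, score)
```

The same body can be pasted into a py"""...""" block in Julia, as long as the fit call stays in front of the labels_ access.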
Situation: I am trying to use the XGBoost classifier, but this error pops up:
"ValueError: Invalid classes inferred from unique values of y. Expected: [0 1 2 ... 1387 1388 1389], got [0 1 2 ... 18609 24127 41850]".
Unlike this solved question (Invalid classes inferred from unique values of `y`. Expected: [0 1 2 3 4 5], got [1 2 3 4 5 6]), my scenario seems different, since my labels do start from 0.
X = data_concat
y = data_concat[['forward_count','comment_count','like_count']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=72)
#Train, test split
print ('Train set:', X_train.shape, y_train.shape) #Check the size after split
print ('Test set:', X_test.shape, y_test.shape)
xgb = XGBClassifier()
clf = xgb.fit(X_train, y_train, eval_metric='auc') #HERE IS WHERE GET THE ERROR
The DataFrame and its info look like this:
DataFrame Info.
I have tried different choices of y: when y has fewer or more columns, the list "[0 1 2 ... 1387 1388 1389]" shrinks or expands accordingly.
If you need further info, please let me know. Appreciate your help :)
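You need to transform the y_train values to fit XGBoost: it expects class labels that start from 0 (not 1) and run consecutively, as the "Expected: [0 1 2 ...]" part of the error shows.
Here is the code: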
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y_train = le.fit_transform(y_train)
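To make clear what the encoding step does, here is a plain-Python sketch of the same idea: every distinct label value is mapped to its rank among the sorted unique values, which gives consecutive integers starting at 0. The helper name fit_label_mapping and the sample values are illustrative assumptions. Keep in mind that LabelEncoder works on a single 1-D label column, so a three-column y like the one in the question would have to be reduced to one target first.

```python
def fit_label_mapping(labels):
    """Map each distinct label to its position among the sorted unique labels."""
    classes = sorted(set(labels))
    return {value: index for index, value in enumerate(classes)}


# Illustrative labels with gaps, similar to the "got [...]" part of the error.
y_raw = [0, 1, 2, 18609, 24127, 41850, 2, 0]

mapping = fit_label_mapping(y_raw)
y_encoded = [mapping[value] for value in y_raw]

# Keep the inverse mapping to translate predictions back to the original labels.
inverse = {index: value for value, index in mapping.items()}
y_decoded = [inverse[index] for index in y_encoded]
print(y_encoded)
```

The mapping has to be learned once on the training labels and then reused for the test labels and for decoding predictions, which is what keeping the fitted le object around achieves.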
I am doing deep learning using Keras in RStudio. I copied and pasted the code from this tutorial: https://tensorflow.rstudio.com/tutorials/beginners/basic-ml/tutorial_basic_regression/
boston_housing <- dataset_boston_housing()
c(train_data, train_labels) %<-% boston_housing$train
c(test_data, test_labels) %<-% boston_housing$test
paste0("Training entries: ", length(train_data), ", labels: ", length(train_labels))
train_data[1, ] # Display sample features, notice the different scales
column_names <- c('CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT')
train_df <- train_data %>%
as_tibble(.name_repair = "minimal") %>%
setNames(column_names) %>%
mutate(label = train_labels)
test_df <- test_data %>%
as_tibble(.name_repair = "minimal") %>%
setNames(column_names) %>%
mutate(label = test_labels)
train_labels[1:10] # Display first 10 entries
spec <- feature_spec(train_df, label ~ . ) %>%
step_numeric_column(all_numeric(), normalizer_fn = scaler_standard())
spec <- fit(spec)
layer <- layer_dense_features(
  feature_columns = dense_features(spec),
  dtype = tf$float32
)
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: ('We expected a dictionary here. Instead we got: ', CRIM ZN INDUS CHAS NOX ... TAX PTRATIO B LSTAT label
0 1.23247 0.0 8.14 0.0 0.5380 ... 307.0 21.0 396.90 18.72 15.2
1 0.02177 82.5 2.03 0.0 0.4150 ... 348.0 14.7 395.38 3.11 42.3
Can you please try the fix mentioned here.
I have also included the solution below in case the link breaks:
To install the fix you should be sure to close all R sessions then open a fresh R session and execute:
The reason you need to close all R sessions is that Windows shared libraries won't be successfully overwritten if they are in use during the installation.
Hope this works and fixes the issue you are facing.
I am getting an unhashable type: 'numpy.ndarray' error, so I cast the 'Views' column of df_subset to int; however, its dtype is still reported as object.
Here is the script:
tsne = TSNE(n_components=2, verbose=1, perplexity=20, n_iter=1000)
tsne_results = tsne.fit_transform(logits_list)
df_subset = pd.DataFrame({'X':tsne_results[:,0], 'Y':tsne_results[:,1], 'Views':targets})
df_subset.astype({'Views': 'int'}).dtypes
colors = {'A2CH':'red', 'A3CH':'green', 'A4CH_LV':'blue', 'A4CH_RV':'cyan', 'A5CH':'magenta', 'Apical_MV_LA_IAS':'yellow',
'PLAX_TV':'black', 'PLAX_full':'white', 'PLAX_valves':'orange', 'PSAX_AV':'purple', 'PSAX_LV':'dodgerblue', 'Subcostal_IVC':'lightgreen', 'Subcostal_heart':'darkcyan', 'Suprasternal':'grey'}
ax = sns.scatterplot(x= "X", y= "Y", hue='Views', legend = 'full',palette = colors, data=df_subset)
Here is a print of df_subset and its dtypes:
X Y Views
0 13.208739 -19.657906 [11]
1 7.932375 -31.547863 [6]
2 -3.896450 -23.075047 [9]
3 -11.836237 -12.138339 [9]
4 -8.077571 17.220371 [11]
5 9.463497 23.756912 [2]
6 8.354083 -47.790867 [10]
7 -2.848731 -0.220144 [9]
8 25.724466 -29.862696 [9]
9 -26.956612 -8.361418 [9]
10 -16.011475 2.309184 [7]
11 16.193329 -0.280985 [8]
12 5.060284 -9.906323 [9]
13 37.827713 -16.174528 [4]
14 -5.971475 -39.845860 [7]
15 6.608039 9.085782 [12]
16 -20.108206 -26.253906 [8]
17 32.851559 0.332044 [2]
18 23.818949 13.762548 [2]
19 23.625357 -12.107020 [3]
X float32
Y float32
Views object
dtype: object
I assume I am getting the unhashable type: 'numpy.ndarray' error because of the object dtype? Any help would be appreciated.
.astype() returns a copy, so it should work if you assign the result back:
df_subset = df_subset.astype({'Views': int})
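The printout in the question shows every Views cell as a one-element container such as [11], and containers like lists or NumPy arrays cannot be hashed, which is what a palette lookup needs. The sketch below uses plain Python lists as a stand-in for the arrays (an assumption made to keep the example dependency-free), with a made-up palette keyed by integer label:

```python
# Hypothetical stand-in for `targets`: each label wrapped in a one-element list.
targets = [[11], [6], [9], [9]]
palette = {11: "red", 6: "green", 9: "blue"}  # illustrative palette keyed by int

try:
    palette[targets[0]]  # a list (like an ndarray) cannot be used as a dict key
    error_message = ""
except TypeError as err:
    error_message = str(err)

# Unwrap each one-element container into a plain int before building the frame.
flat_targets = [int(t[0]) for t in targets]
point_colors = [palette[t] for t in flat_targets]
print(error_message)
print(flat_targets, point_colors)
```

Once the labels are plain integers, the palette keys also have to be those integers (or the integers have to be mapped to view names first), since the colors dict in the question is keyed by strings.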
I'm studying deep learning.
I'm building a shape classifier for five classes: circle, rectangle, triangle, pentagon, and star. The labels are one-hot encoded using label2idx = dict(rectangle=0, circle=1, pentagon=2, star=3, triangle=4).
But the training result is the same at every epoch, and the network does not seem to learn anything from the images.
I built the network with Affine layers, ReLU as the activation function, Softmax for the last layer, and Adam as the optimizer.
I have 234 RGB images in total for training. They were drawn in a Windows 2D paint tool and are 128 * 128 pixels, but the figures do not fill the whole canvas.
And the picture looks like:
The training result. The left [] is the prediction and the right [] is the answer label (I picked random images to print the predicted value and the answer label):
epoch: 0.49572649572649574
[ 0.3149641 -0.01454905 -0.23183 -0.2493432 0.11655246] [0 0 0 0 1]
epoch: 0.6837606837606838
[ 1.67341673 0.27887525 -1.09800398 -1.12649948 -0.39533065] [1 0 0 0 0]
epoch: 0.7094017094017094
[ 0.93106499 1.49599772 -0.98549052 -1.20471573 -0.24997779] [0 1 0 0 0]
epoch: 0.7905982905982906
[ 0.48447043 -0.05460748 -0.23526179 -0.22869489 0.05468969] [1 0 0 0 0]
epoch: 0.9230769230769231
[14.13835867 0.32432293 -5.01623202 -6.62469261 -3.21594355] [1 0 0 0 0]
epoch: 0.9529914529914529
[ 1.61248239 -0.47768294 -0.41580036 -0.71899219 -0.0901478 ] [1 0 0 0 0]
epoch: 0.9572649572649573
[ 5.93142154 -1.16719891 -1.3656573 -2.19785097 -1.31258801] [1 0 0 0 0]
epoch: 0.9700854700854701
[ 7.42198941 -0.85870225 -2.12027192 -2.81081263 -1.83810873] [1 0 0 0 0]
I think the more it learns, the more the prediction should look like [ 0.00143 0.09357 0.352 0.3 0.253 ] [ 1 0 0 0 0 ], that is, the value at the answer index should get close to 0, but it does not.
Even so, the training accuracy sometimes reaches 1.0 (100%).
I'm loading and normalizing the images with the code below.
# data_list = glob('dataset\\training\\*\\*.jpg')
dataset['train_img'] = _load_img()
def _load_img():
    data = [np.array(Image.open(v)) for v in data_list]
    a = np.array(data)
    a = a.reshape(-1, img_size * 3)
    return a
for v in dataset:
    dataset['train_img'] = dataset['train_img'].astype(np.float32)
    dataset['train_img'] /= dataset['train_img'].max()
    dataset['train_img'] -= dataset['train_img'].mean(axis=1).reshape(len(dataset['train_img']), 1)
I converted the images to grayscale with Image.open(v).convert('LA')
and checked my prediction values. For example:
[-3.98576886e-04 3.41216374e-05] [1 0]
[ 0.00698861 -0.01111879] [1 0]
[-0.42003415 0.42222863] [0 1]
It is still not learning from the images. To test it, I removed three of the shapes, so I now have only rectangles and triangles, 252 images in total (I drew more images).
The two prediction values are usually almost exact opposites (e.g. 3.1323, -3.1323 or 3.1323, -3.1303), and I cannot figure out why.
It is not just a matter of numerical accuracy: when I use SGD as the optimizer, the accuracy does not increase either. It just stays the same.
[ 0.02090227 -0.02085848] [1 0]
epoch: 0.5873015873015873
[ 0.03058879 -0.03086193] [0 1]
epoch: 0.5873015873015873
[ 0.04006064 -0.04004988] [1 0]
[ 0.04545139 -0.04547538] [1 0]
epoch: 0.5873015873015873
[ 0.05605123 -0.05595288] [0 1]
epoch: 0.5873015873015873
[ 0.06495255 -0.06500597] [1 0]
epoch: 0.5873015873015873
Yes, your model is performing pretty well. The problem is not related to normalization (it is not even a problem). The model's outputs fall outside the [0, 1] range, which means the model is very confident.
The model will not try to push its outputs towards exactly [1, 0, 0, 0], because when it calculates the loss it first clips the values.
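To see this with the numbers from the question, the sketch below assumes the printed vectors are the raw scores that go into the softmax, and that the loss is a cross-entropy with a small clipping constant (eps is an assumed value, not taken from the asker's code). It converts one printed vector into probabilities and computes the loss:

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, one_hot, eps=1e-7):
    """Cross-entropy with probabilities clipped from below at eps."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, one_hot))


# One prediction/label pair printed in the question.
scores = [14.13835867, 0.32432293, -5.01623202, -6.62469261, -3.21594355]
label = [1, 0, 0, 0, 0]

probs = softmax(scores)
predicted_index = probs.index(max(probs))
loss = cross_entropy(probs, label)
print(predicted_index, probs[0], loss)
```

A large gap between the top score and the rest translates into a probability very close to 1 for that class and a loss very close to 0, so there is little left for the optimizer to improve on such a sample.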
Hope this helps!