Shade text using seaborn and matplotlib? - python-3.x

I have a sentence, for example:
25 August 2003 League of Extraordinary Gentlemen: Sean Connery is one of the all time greats I have been a fan of his since the 1950's. 25 August 2003 League of Extraordinary Gentlemen
I pass it through the OpenAI sentiment code, which gives me neuron weights; the number of weights can be equal to or slightly greater than the number of words.
Neuron weights are
[0.01258736, 0.03544582, 0.05184804, 0.05354257, 0.07339437,
0.07021661, 0.06993681, 0.06021424, 0.0601177 , 0.04100083,
0.03557627, 0.02574683, 0.02565657, 0.03435502, 0.04881989,
0.08868718, 0.06816255, 0.05957553, 0.06767794, 0.06561323,
0.06339648, 0.06271613, 0.06312297, 0.07370538, 0.08369936,
0.09008111, 0.09059132, 0.08732472, 0.08742133, 0.08792272,
0.08504769, 0.08541565, 0.09255819, 0.09240738, 0.09245031,
0.09080137, 0.08733468, 0.08705935, 0.09201239, 0.113047 ,
0.14285286, 0.15205048, 0.15249513, 0.14051639, 0.14070784,
0.14526351, 0.14548902, 0.12730363, 0.11916814, 0.11097522,
0.11390981, 0.12734678, 0.13625301, 0.13386811, 0.13413942,
0.13782364, 0.14033082, 0.14971626, 0.14988877, 0.14171578,
0.13999145, 0.1408006 , 0.1410009 , 0.13423227, 0.16819029,
0.18822579, 0.18462598, 0.18283379, 0.16304792, 0.1634682 ,
0.18733767, 0.22205424, 0.22615907, 0.22679318, 0.2353312 ,
0.24562076, 0.24771859, 0.24478345, 0.25780812, 0.25183237,
0.24660441, 0.2522405 , 0.26310056, 0.26156184, 0.26127928,
0.26154354, 0.2380443 , 0.2447366 , 0.24580643, 0.22959644,
0.23065038, 0.228564 , 0.23980206, 0.23410076, 0.40933537,
0.436683 , 0.5319608 , 0.5273239 , 0.54030097, 0.55781454,
0.5665511 , 0.58764166, 0.58651507, 0.5870301 , 0.5893866 ,
0.58905166, 0.58955604, 0.5872186 , 0.58744675, 0.58569545,
0.58279306, 0.58205146, 0.6251827 , 0.6278348 , 0.63121724,
0.7156403 , 0.715524 , 0.714875 , 0.71317464, 0.7630029 ,
0.75933087, 0.7571995 , 0.7563375 , 0.7583521 , 0.75923103,
0.8155783 , 0.8082132 , 0.8096348 , 0.8114364 , 0.82923543,
0.8229595 , 0.8196689 , 0.8070393 , 0.808637 , 0.82305557,
0.82719535, 0.8210828 , 0.8697561 , 0.8547278 , 0.85224617,
0.8521625 , 0.84694564, 0.8472206 , 0.8432255 , 0.8431826 ,
0.8394848 , 0.83804935, 0.83134645, 0.8234757 , 0.82382894,
0.82562804, 0.80014366, 0.7866942 , 0.78344023, 0.78955245,
0.7862923 , 0.7851586 , 0.7805863 , 0.780684 , 0.79073226,
0.79341674, 0.7970072 , 0.7966449 , 0.79455364, 0.7945448 ,
0.79476243, 0.7928985 , 0.79307675, 0.79677683, 0.79655904,
0.79619783, 0.7947823 , 0.7915144 , 0.7912799 , 0.795091 ,
0.8032384 , 0.810835 , 0.8084989 , 0.8094493 , 0.8045582 ,
0.80466574, 0.8074054 , 0.8075554 , 0.80178404, 0.7978776 ,
0.78742194, 0.8119776 , 0.8119776 , 0.8119776 , 0.8119776 ,
0.8119776 , 0.8119776 ]
The idea is that the text's background color should be shaded according to the neuron weights: green for positive weights, red for negative weights, and yellow where the weight is close to 0.
So for the weights above the shading should be almost entirely shades of green (all of the weights are positive), with red reserved for negatives.
But what the code actually plots is text whose background shows no such variation.
The function that shades the text according to the neuron weights is:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

def plot_neuron_heatmap(text, values, n_limit=80, savename='fig1.png',
                        cell_height=0.325, cell_width=0.15, dpi=100):
    text = text.replace('\n', '\\n')
    text = np.array(list(text + ' ' * (-len(text) % n_limit)))
    if len(values) > text.size:
        values = np.array(values[:text.size])
    else:
        t = values
        values = np.zeros(text.shape, dtype=np.int)
        values[:len(t)] = t
    text = text.reshape(-1, n_limit)
    values = values.reshape(-1, n_limit)
    mask = np.zeros(values.shape, dtype=np.bool)
    mask.ravel()[values.size:] = True
    mask = mask.reshape(-1, n_limit)
    plt.figure(figsize=(cell_width * n_limit, cell_height * len(text)))
    hmap = sns.heatmap(values, annot=text, mask=mask, fmt='', vmin=-5, vmax=5,
                       cmap='RdYlGn', xticklabels=False, yticklabels=False, cbar=False)
    plt.subplots_adjust()
    plt.savefig(savename if savename else 'fig1.png', dpi=dpi)
Where am I going wrong?
The definition above was refined based on @Mad Physicist's linked answer.

When you create your values array with np.zeros, you set dtype=np.int. So even though you then replace the zeros with the actual floating-point data, the data is rounded to integers, because that's the dtype of the array. This effectively sets everything to 0, since all of the values are less than 1.
You really want to keep them as floats, so if you instead change this line:
values = np.zeros(text.shape, dtype=np.int)
to
values = np.zeros(text.shape, dtype=np.float)
everything seems to work fine.
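As a quick illustration of the truncation (a minimal standalone sketch; note that recent NumPy versions have removed the np.int and np.float aliases, so plain int and float are used here):

import numpy as np

weights = [0.0126, 0.0354, 0.0518]      # small positive floats, as in the question

as_int = np.zeros(5, dtype=int)         # integer buffer
as_int[:len(weights)] = weights         # floats are truncated to 0 on assignment
print(as_int)                           # [0 0 0 0 0]

as_float = np.zeros(5, dtype=float)     # float buffer
as_float[:len(weights)] = weights       # values survive intact
print(as_float)                         # [0.0126 0.0354 0.0518 0.     0.    ]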

Related

Migrating XMobar to the new 0.17 standard

Just to make it clear, my setup currently uses UnsafeStdinReader on the xmobar side and spawnPipe on the xmonad side to send workspace information. Here are the relevant portions of the configuration:
main = do
    xmprocleft <- spawnPipe "xmobar -x 0 $HOME/.config/xmobar/xmobarrc0.hs"
    xmonad $ docks $ ewmhFullscreen $ ewmh $ def
        { manageHook = myManageHook <+> manageDocks
        , modMask = myModMask
        , terminal = myTerminal
        , startupHook = myStartupHook
        , layoutHook = showWName' myShowWNameTheme $ myLayoutHook
        , workspaces = myWorkspaces
        , borderWidth = myBorderWidth
        , normalBorderColor = myNormColor
        , focusedBorderColor = myFocusColor
        , logHook = dynamicLogWithPP $ xmobarPP
            { ppOutput = \x -> hPutStrLn xmprocleft x
            , ppCurrent = xmobarColor "#f8f16a" "" . wrap "<fn=1>" "</fn>"                     -- workspace I am viewing now
            , ppVisible = xmobarColor "#98be65" "" . wrap "<fn=1>" "</fn>" . clickable         -- workspace open on any monitor other than this one
            , ppHidden = xmobarColor "#2ac3de" "" . wrap "<fn=1>" "</fn>" . clickable          -- hidden workspaces with open windows, not visible on any monitor
            , ppHiddenNoWindows = xmobarColor "#c0caf5" "" . wrap "<fn=1>" "</fn>" . clickable -- workspaces with no open windows, not visible on any monitor
            , ppTitle = xmobarColor "#c0caf5" "" . shorten 60                                  -- title of active window
            , ppSep = "<fc=#444b6a> | </fc>"                                                   -- separator character
            , ppUrgent = xmobarColor "#EBCB8B" "" . wrap "!<fn=1>" "</fn>!"                    -- urgent workspace
            , ppExtras = [windowCount]                                                         -- number of windows in current workspace
            -- order: workspace names, current layout, title of open software, number of open windows
            , ppOrder = \(ws:_:_:_) -> [ws]  -- stopped showing the current layout and window count
            }
        } `additionalKeysP` myKeys
What I am trying to achieve
According to the XMonad wiki, spawnPipe is deprecated in favor of the newer XMonadLog approach for sending data to XMobar. I am trying to set up a dynamic status bar with dynamicEasySBs, following XMonad.Hooks.StatusBar.PP and XMonad.Hooks.StatusBar.
I made the necessary changes to the XMobar config too, but the configuration is a bit confusing to me. Has anyone made a working config using this new format yet?
While reading the new tutorial I managed to update my setup as follows.
This is how you want to write your xmobarrc:
Config {
-- appearance
font = "xft:Fira Code:size=11:bold:antialias=true"
, bgColor = "#272727"
, fgColor = "#073642"
, position = Top
, border = BottomB
, borderColor = "#646464"
, textOffset = 11
-- layout
, sepChar = "%" -- delineator between plugin names and straight text
, alignSep = "}{" -- separator between left-right alignment
, template = " %XMonadLog% | %coretemp% | %memory% | %dynnetwork% }{%StdinReader% | %dropbox% | %RJTT% | %date% || %kbd% "
-- general behavior
, lowerOnStart = True -- send to bottom of window stack on start
, hideOnStart = False -- start with window unmapped (hidden)
, allDesktops = True -- show on all desktops
, overrideRedirect = True -- set the Override Redirect flag (Xlib)
, pickBroadest = False -- choose widest display (multi-monitor)
, persistent = True -- enable/disable hiding (True = disabled)
-- plugins
-- Numbers can be automatically colored according to their value. xmobar
-- decides color based on a three-tier/two-cutoff system, controlled by
-- command options:
-- --Low sets the low cutoff
-- --High sets the high cutoff
--
-- --low sets the color below --Low cutoff
-- --normal sets the color between --Low and --High cutoffs
-- --high sets the color above --High cutoff
--
-- The --template option controls how the plugin is displayed. Text
-- color can be set by enclosing in <fc></fc> tags. For more details
-- see http://projects.haskell.org/xmobar/#system-monitor-plugins.
, commands =
-- weather monitor
[ Run Weather "RJTT" [ "--template", "<skyCondition> | <fc=#4682B4><tempC></fc>°C | <fc=#4682B4><rh></fc>% | <fc=#4682B4><pressure></fc>hPa"
] 36000
-- network activity monitor (dynamic interface resolution)
, Run DynNetwork [ "--template" , "<dev>: <tx>kB/s|<rx>kB/s"
, "--Low" , "1000" -- units: kB/s
, "--High" , "5000" -- units: kB/s
, "-m" , "4"
, "--low" , "darkgreen"
, "--normal" , "darkorange"
, "--high" , "darkred"
] 10
-- cpu activity monitor
, Run MultiCpu [ "--template" , "Cpu: <total0> <total1> <total2> <total3> <total4> <total5> <total6> <total7>%"
, "--Low" , "50" -- units: %
, "--High" , "85" -- units: %
, "-p" , "3"
, "--low" , "darkgreen"
, "--normal" , "darkorange"
, "--high" , "darkred"
] 10
-- cpu core temperature monitor
, Run CoreTemp [ "--template" , "Temp: <core0> <core1> <core2> <core3>°C"
, "--Low" , "70" -- units: °C
, "--High" , "80" -- units: °C
, "--low" , "darkgreen"
, "--normal" , "darkorange"
, "--high" , "darkred"
] 50
-- memory usage monitor
, Run Memory [ "--template" ,"Mem: <usedratio>%"
, "--Low" , "20" -- units: %
, "--High" , "90" -- units: %
, "--low" , "darkgreen"
, "--normal" , "darkorange"
, "--high" , "darkred"
] 10
-- battery monitor
, Run Battery [ "--template" , "Batt: <left>% - <timeleft>"
, "--Low" , "10" -- units: %
, "--High" , "80" -- units: %
, "--low" , "darkred"
, "--normal" , "darkorange"
, "--high" , "darkgreen"
, "--" -- battery specific options
-- discharging status
, "-o" , "<left>% (<timeleft>)"
-- AC "on" status
, "-O" , "<fc=#dAA520>Charging</fc>"
-- charged status
, "-i" , "<fc=#006000>Charged</fc>"
] 50
-- time and date indicator
-- (%F = y-m-d date, %a = day of week, %T = h:m:s time)
, Run Date "<fc=#ABABAB>%F (%a) %T</fc>" "date" 10
-- Xmonad Xmobar Constructor
, Run XMonadLog
]
}
And this should be your xmonad.hs
-- imports needed for the hooks used below
import XMonad
import XMonad.Hooks.EwmhDesktops (ewmh, ewmhFullscreen)
import XMonad.Hooks.StatusBar (withEasySB, statusBarProp, defToggleStrutsKey)

main :: IO ()
main = xmonad
     . ewmhFullscreen
     . ewmh
     . withEasySB (statusBarProp "xmobar" (pure def)) defToggleStrutsKey
     $ myConfig
According to the documentation about DynamicLog,
the DynamicLog API is frozen and users are encouraged to migrate to these modern replacements.
That replacement is XMonad.Hooks.StatusBar:
This module provides a composable interface for (re)starting these status bars and logging to them, either using pipes or X properties. There's also XMonad.Hooks.StatusBar.PP, which provides an abstraction and some utilities for customizing what is logged to a status bar. Together, these are a modern replacement for XMonad.Hooks.DynamicLog, which is now just a compatibility wrapper.

'numpy.ndarray' object has no attribute 'sqrt'

I am trying to obtain the std of this output using numpy.std()
[[array([0.92473118, 0.94117647]), array([0.98850575, 0.69565217]), array([0.95555556, 0.8 ]), 0.923030303030303], [array([0.85555556, 0.8 ]), array([0.95061728, 0.55172414]), array([0.9005848 , 0.65306122]), 0.8353285811932428]]
To obtain that output I used the following code (it runs inside a loop; in this example it went through two iterations):
precision, recall, fscore, support = precision_recall_fscore_support(np.argmax(y_test_0, axis=-1), np.argmax(probas_, axis=-1))
eval_test_metric = [precision, recall, fscore, avg_fscore]
test_metric1.append(eval_test_metric)
std_matrix1 = np.std(test_metric1, axis=0)
I would like to get an output similar in structure to what I get with np.mean(). Please excuse the 'precision', 'recall' labels; I just added them in my code for clarity.
dr_test_metric = dict(zip(['specificity avg', 'sensitivity avg', 'ppv avg', 'npv avg'], np.mean(test_metric2, axis=0)))
print(dr_test_metric,'\n')
Output (where 0.89014337 in 'precision avg': array([0.89014337, 0.87058824]) is the average precision of class 0 for my model, and 0.8705 is the average precision for class 1):
{'precision avg': array([0.89014337, 0.87058824]), 'recall avg': array([0.96956152, 0.62368816]), 'fscore avg': array([0.92807018, 0.72653061]), 'avg_fscore avg': 0.8791794421117729}
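For context, the error in the title is what NumPy raises when np.std runs over an object-dtype array: each row here mixes 2-element arrays with a plain float, so the outer array becomes ragged and object-typed, and the final np.sqrt step then looks for a .sqrt() method on each element, which ndarrays do not have. A minimal sketch of the failure and one possible field-by-field workaround (values are illustrative, not the real metrics):

import numpy as np

# two "iterations" of [precision, recall, fscore, avg_fscore], as in the question
m1 = [np.array([0.92, 0.94]), np.array([0.98, 0.69]), np.array([0.95, 0.80]), 0.923]
m2 = [np.array([0.85, 0.80]), np.array([0.95, 0.55]), np.array([0.90, 0.65]), 0.835]
metrics = [m1, m2]

ragged = np.array(metrics, dtype=object)   # shape (2, 4), object dtype
# np.std(ragged, axis=0) would raise:
#   AttributeError: 'numpy.ndarray' object has no attribute 'sqrt'

# workaround: take the std field by field, stacking each column into a float array
stds = [np.std(np.stack([row[i] for row in metrics]), axis=0) for i in range(len(m1))]
print(stds)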

Numpy Array value setting issues

I have a data set that spans a certain length of time, with a data point at each of those times. I would like to create a much finer timescale and set the missing data points to zero. I wrote a piece of code to do this, but it isn't doing what I want, even though a simpler test case seems to work. Both pieces of code are below.
This piece of code does not do what I want:
import numpy as np
TD_t = np.array([36000, 36500, 37000, 37500, 38000, 38500, 39000, 39500, 40000, 40500, 41000, 41500, 42000, 42500,
43000, 43500, 44000, 44500, 45000, 45500, 46000, 46500, 47000, 47500, 48000, 48500, 49000, 49500,
50000, 50500, 51000, 51500, 52000, 52500, 53000, 53500, 54000, 54500, 55000, 55500, 56000, 56500,
57000, 57500, 58000, 58500, 59000, 59500, 60000, 60500, 61000, 61500, 62000, 62500, 63000, 63500,
64000, 64500, 65000, 65500, 66000])
TD_d = np.array([-0.05466527, -0.04238242, -0.04477601, -0.02453717, -0.01662798, -0.02548617, -0.02339215,
-0.01186576, -0.0029057 , -0.01094671, -0.0095005 , -0.0190277 , -0.01215644, -0.01997112,
-0.01384497, -0.01610656, -0.01927564, -0.02119056, -0.011634 , -0.00544096, -0.00046568,
-0.0017769 , -0.0007341, 0.00193066, 0.01359107, 0.02054919, 0.01420335, 0.01550565,
0.0132394 , 0.01371563, 0.01959774, 0.0165316 , 0.01881992, 0.01554435, 0.01409003,
0.01898334, 0.02300266, 0.03045158, 0.02869013, 0.0238423 , 0.02902356, 0.02568908,
0.02954539, 0.02537967, 0.02927247, 0.02138605, 0.02815635, 0.02733237, 0.03321588,
0.03063803, 0.03783137, 0.04110955, 0.0451221 , 0.04646263, 0.04472884, 0.04935833,
0.03372911, 0.04031406, 0.04165237, 0.03940343, 0.03805504])
time = np.arange(0, 100001, 1)
data = np.zeros_like(time)

for i in range(0, len(TD_t)):
    t = TD_t[i]
    data[t] = TD_d[i]
    print(i, t, TD_d[i], data[t])
But for some reason this code works.
import numpy
nums = numpy.array([0,1,2,3])
data = numpy.zeros_like(nums)
data[0] = nums[2]
data[0], nums[2]
Any help will be much appreciated!!
It's because the dtype of data is being set to int64, so when you try to assign one of the data elements, the value gets truncated to zero (all of the TD_d values are less than 1 in magnitude).
Try changing the line to:
data = np.zeros_like(time, dtype=float)
and it should work (or use whatever dtype the TD_d array has).
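The subtle part is that np.zeros_like copies the dtype of its argument, and np.arange(0, 100001, 1) produces an integer array. A minimal standalone sketch of the difference:

import numpy as np

time = np.arange(0, 11, 1)                     # integer dtype
data_int = np.zeros_like(time)                 # inherits the integer dtype
data_int[3] = -0.054                           # truncated to 0
print(data_int.dtype, data_int[3])             # e.g. int64 0

data_float = np.zeros_like(time, dtype=float)  # explicit float dtype
data_float[3] = -0.054                         # stored as-is
print(data_float.dtype, data_float[3])         # float64 -0.054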

Python - How to save spectrogram output in a text file?

My code calculates the spectrogram for x, y and z.
I calculate the magnitude of the three axis first, then calculate the spectrogram.
I need to take the spectrogram output and save it as one column in an array to use it as an input for a deep learning model.
This is my code:
import numpy as np
import matplotlib.pyplot as plt

dataset = np.loadtxt("trainingdatasetMAG.txt", delimiter=",")
X = dataset[:,0:6]
Y = dataset[:,6]
fake_size = 1415684
time = np.arange(fake_size)/1000 # 1kHz
base_freq = 2 * np.pi * 100
magnitude = dataset[:,5]
plt.title('xyz_magnitude')
ls=(plt.specgram(magnitude, Fs=1000))
This is my dataset; its columns are (patientno, time in milliseconds, x-axis, y-axis, z-axis, xyz_magnitude, label):
1,15,70,39,-970,947321,0
1,31,70,39,-970,947321,0
1,46,60,49,-960,927601,0
1,62,60,49,-960,927601,0
1,78,50,39,-960,925621,0
1,93,50,39,-960,925621,0
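(As an aside, the xyz_magnitude column in these rows matches the sum of the squared axis values, e.g. 70^2 + 39^2 + (-970)^2 = 947321. A small sketch of recomputing it from the raw columns, in case it ever needs to be rebuilt; the column layout is taken from the header above:)

import numpy as np

# columns: patientno, time (ms), x, y, z, xyz_magnitude, label
row = np.array([1, 15, 70, 39, -970, 947321, 0])
x, y, z = row[2:5]
print(x**2 + y**2 + z**2)   # 947321, matching the stored xyz_magnitude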
And this is the output of the spectrogram, which I need to save in a more usable form:
(array([[ 1.52494154e+11, 1.52811638e+11, 1.52565040e+11, ...,
1.47778892e+11, 1.46781213e+11, 1.46678951e+11],
[ 7.69589176e+10, 7.73638333e+10, 7.76935891e+10, ...,
7.48498747e+10, 7.40088248e+10, 7.40343108e+10],
[ 6.32683585e+04, 1.58170271e+06, 6.11287648e+06, ...,
5.06690834e+05, 3.31360693e+05, 7.04757400e+05],
...,
[ 7.79589127e+05, 8.09843763e+04, 2.52907491e+05, ...,
2.48520301e+05, 2.11734697e+05, 2.50917758e+05],
[ 9.41199946e+05, 4.98371406e+05, 1.29328139e+06, ...,
2.56729806e+05, 3.45253951e+05, 3.51932417e+05],
[ 4.36846676e+05, 1.24123764e+06, 9.20694394e+05, ...,
8.35807658e+04, 8.36986905e+05, 3.57807267e+04]]),
array([ 0. , 3.90625, 7.8125 , 11.71875, 15.625 ,
19.53125, 23.4375 , 27.34375, 31.25 , 35.15625,
39.0625 , 42.96875, 46.875 , 50.78125, 54.6875 ,
58.59375, 62.5 , 66.40625, 70.3125 , 74.21875,
78.125 , 82.03125, 85.9375 , 89.84375, 93.75 ,
97.65625, 101.5625 , 105.46875, 109.375 , 113.28125,
117.1875 , 121.09375, 125. , 128.90625, 132.8125 ,
136.71875, 140.625 , 144.53125, 148.4375 , 152.34375,
156.25 , 160.15625, 164.0625 , 167.96875, 171.875 ,
175.78125, 179.6875 , 183.59375, 187.5 , 191.40625,
195.3125 , 199.21875, 203.125 , 207.03125, 210.9375 ,
214.84375, 218.75 , 222.65625, 226.5625 , 230.46875,
234.375 , 238.28125, 242.1875 , 246.09375, 250. ,
253.90625, 257.8125 , 261.71875, 265.625 , 269.53125,
273.4375 , 277.34375, 281.25 , 285.15625, 289.0625 ,
292.96875, 296.875 , 300.78125, 304.6875 , 308.59375,
312.5 , 316.40625, 320.3125 , 324.21875, 328.125 ,
332.03125, 335.9375 , 339.84375, 343.75 , 347.65625,
351.5625 , 355.46875, 359.375 , 363.28125, 367.1875 ,
371.09375, 375. , 378.90625, 382.8125 , 386.71875,
390.625 , 394.53125, 398.4375 , 402.34375, 406.25 ,
410.15625, 414.0625 , 417.96875, 421.875 , 425.78125,
429.6875 , 433.59375, 437.5 , 441.40625, 445.3125 ,
449.21875, 453.125 , 457.03125, 460.9375 , 464.84375,
468.75 , 472.65625, 476.5625 , 480.46875, 484.375 ,
488.28125, 492.1875 , 496.09375, 500. ]),
array([1.28000000e-01, 2.56000000e-01, 3.84000000e-01, ...,
1.41529600e+03, 1.41542400e+03, 1.41555200e+03]),
<matplotlib.image.AxesImage object at 0x000002161A78F898>)
The matplotlib function specgram returns 4 outputs:
spectrum : 2-D array
    Columns are the periodograms of successive segments.
freqs : 1-D array
    The frequencies corresponding to the rows in spectrum.
t : 1-D array
    The times corresponding to midpoints of segments (i.e., the columns in spectrum).
im : instance of class AxesImage
From your code:
ls = plt.specgram(magnitude, Fs=1000)
ls[0] contains the spectrum that you want to export to txt; you can write it to a file with this piece of code:
with open('spectrogram.txt', 'w') as ffile:
    for spectros in ls[0]:
        for spectro in spectros:
            lline = str(spectro) + ' \t'
            ffile.write(lline)
        # one row written
        ffile.write(' \n')
Note that ls[0] contains the power spectral density computed on NFFT=256-sample segments with 128 samples of overlap (the defaults), so it has NFFT/2 + 1 = 129 rows. Each column contains the PSD at one time T, and each row contains the time series for one frequency. To get the spectrum at instant T, slice it:
T_idx = 10
psd = ls[0]
psd[:, T_idx]
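If a plain text dump is all that's needed, np.savetxt does the same job in one call; a minimal sketch, assuming ls is the tuple returned by plt.specgram above:

import numpy as np

spectrum, freqs, times, im = ls              # unpack the four return values
np.savetxt('spectrogram.txt', spectrum, delimiter='\t')

# for a deep learning model it is often more convenient to have one row per
# time step, i.e. transpose so rows are time slices and columns are frequencies
features = spectrum.T                        # shape: (n_times, n_freqs)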

Random Forest feature importance: how many are actually used?

I use RF twice in a row.
First, I fit it using max_features='auto' and the whole dataset (109 features), in order to perform feature selection.
The following is RandomForestClassifier.feature_importances_; it correctly gives me one score for each of the 109 features:
[0.00118087, 0.01268531, 0.0017589 , 0.01614814, 0.01105567,
0.0146838 , 0.0187875 , 0.0190427 , 0.01429976, 0.01311706,
0.01702717, 0.00901344, 0.01044047, 0.00932331, 0.01211333,
0.01271825, 0.0095337 , 0.00985686, 0.00952823, 0.01165877,
0.00193286, 0.0012602 , 0.00208145, 0.00203459, 0.00229907,
0.00242616, 0.00051358, 0.00071606, 0.00975515, 0.00171034,
0.01134927, 0.00687018, 0.00987706, 0.01507474, 0.01223525,
0.01170495, 0.00928417, 0.01083082, 0.01302036, 0.01002457,
0.00894818, 0.00833564, 0.00930602, 0.01100774, 0.00818604,
0.00675784, 0.00740617, 0.00185461, 0.00119627, 0.00159034,
0.00154336, 0.00478926, 0.00200773, 0.00063574, 0.00065675,
0.01104192, 0.00246746, 0.01663812, 0.01041134, 0.01401842,
0.02038318, 0.0202834 , 0.01290935, 0.01476593, 0.0108275 ,
0.0118773 , 0.01050919, 0.0111477 , 0.00684507, 0.01170021,
0.01291888, 0.00963295, 0.01161876, 0.00756015, 0.00178329,
0.00065709, 0. , 0.00246064, 0.00217982, 0.00305187,
0.00061284, 0.00063431, 0.01963523, 0.00265208, 0.01543552,
0.0176546 , 0.01443356, 0.01834896, 0.01385694, 0.01320648,
0.00966011, 0.0148321 , 0.01574166, 0.0167107 , 0.00791634,
0.01121442, 0.02171706, 0.01855552, 0.0257449 , 0.02925843,
0.01789742, 0. , 0. , 0.00379275, 0.0024365 ,
0.00333905, 0.00238971, 0.00068355, 0.00075399]
Then I transform the dataset using the previous fit, which should reduce its dimensionality, and re-fit RF on the result.
Given max_features='auto' and the 109 features, I would expect to end up with roughly ~10 features; instead, calling rf.feature_importances_ after the re-fit returns more (62):
[ 0.01261971, 0.02003921, 0.00961297, 0.02505467, 0.02038449,
0.02353745, 0.01893777, 0.01932577, 0.01681398, 0.01464485,
0.01672119, 0.00748981, 0.01109461, 0.01116948, 0.0087081 ,
0.01056344, 0.00971319, 0.01532258, 0.0167348 , 0.01601214,
0.01522208, 0.01625487, 0.01653784, 0.01483562, 0.01602748,
0.01522369, 0.01581573, 0.01406688, 0.01269036, 0.00884105,
0.02538574, 0.00637611, 0.01928382, 0.02061512, 0.02566056,
0.02180902, 0.01537295, 0.01796305, 0.01171095, 0.01179759,
0.01371328, 0.00811729, 0.01060708, 0.015717 , 0.01067911,
0.01773623, 0.0169396 , 0.0226369 , 0.01547827, 0.01499467,
0.01356075, 0.01040735, 0.01360752, 0.01754145, 0.01446933,
0.01845195, 0.0190799 , 0.02608652, 0.02095663, 0.02939744,
0.01870901, 0.02512201]
Why? Shouldn't it return just ~10 feature importances?
You misunderstood the meaning of max_features, which is
The number of features to consider when looking for the best split
It is not the number of features kept when transforming the data.
It is the threshold used by the transform method that determines which features are kept as most important:
threshold : string, float or None, optional (default=None)
The threshold value to use for feature selection. Features whose importance is greater or equal are kept while the others are discarded. If “median” (resp. “mean”), then the threshold value is the median (resp. the mean) of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. If None and if available, the object attribute threshold is used. Otherwise, “mean” is used by default.
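To make this concrete: with the modern scikit-learn API the selection step is SelectFromModel, and it is its threshold (not max_features) that decides how many features survive the transform. With the default 'mean' threshold over 109 broadly similar importances, keeping roughly 60 of them is entirely plausible. A minimal sketch with synthetic data (dataset and parameters are illustrative, not the original ones):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=500, n_features=109, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# default threshold is the mean importance: every feature whose importance is
# >= mean is kept, which is usually far more than sqrt(n_features) features
sel_mean = SelectFromModel(rf, prefit=True)
print(sel_mean.transform(X).shape[1])

# a stricter threshold keeps fewer features
sel_strict = SelectFromModel(rf, prefit=True, threshold='1.5*mean')
print(sel_strict.transform(X).shape[1])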
