Code produces a 2D histogram but the results don't match hist2d - python-3.x

I am trying to write a histogram builder to construct a 2D histogram for my assignment work. This is my code:
import numpy as np

def Build2DHistogramClassifier(X1, X2, T, B, x1min, x1max, x2min, x2max):
    HF = np.zeros((B, B), dtype='int')  # initialising an empty array of integer type
    HM = np.zeros((B, B), dtype='int')
    # this logic decides which bin each value goes into;
    # np.round applies the rounding to all the values in the array
    bin_row_indices = (np.round((B - 1) * (X1 - x1min) / (x1max - x1min))).astype('int32')
    bin_column_indices = (np.round((B - 1) * (X2 - x2min) / (x2max - x2min))).astype('int32')
    # enumerate yields each (r, c) pair together with its index i
    for i, (r, c) in enumerate(zip(bin_row_indices, bin_column_indices)):
        if T[i] == 'Female':
            HF[r, c] += 1
        else:
            HM[r, c] += 1
    return [HF, HM]
But the problem is that the results (the count in each bin) I am getting do not match what I get from the hist2d function in numpy (I passed the same bin size).
I am sorry if my code is not in the right format. Please click on the hyperlink to a gist I created with the same code.
What is the mistake in my code?
How do I correct it?
Thanks

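By rounding when assigning to bins, you are treating the bin positions as bin centers. The numpy convention is to treat them as bin edges.
Remove the two calls to np.round() from your code and change B-1 to B. You should now get the same results from your function and from np.histogram2d. One caveat: a value exactly equal to x1max or x2max then maps to index B, which is out of range, whereas np.histogram2d counts it in the last bin, so clip the indices to B-1.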
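The edge-based binning described above can be sketched as follows. This is a minimal illustration, not the asker's full classifier: the function name build_2d_histogram, the random test data and the bin count of 8 are assumptions made for the example, the Female/Male split is left out, and all values are assumed to lie within the given limits.

```python
import numpy as np

def build_2d_histogram(x1, x2, bins, x1min, x1max, x2min, x2max):
    """Count points per cell, treating the limits as bin edges (hypothetical helper)."""
    # Scale to [0, bins] and truncate; no rounding, and B rather than B-1
    rows = (bins * (x1 - x1min) / (x1max - x1min)).astype(int)
    cols = (bins * (x2 - x2min) / (x2max - x2min)).astype(int)
    # A value equal to the upper limit would get index `bins`; put it in the last bin
    rows = np.clip(rows, 0, bins - 1)
    cols = np.clip(cols, 0, bins - 1)
    hist = np.zeros((bins, bins), dtype=int)
    # np.add.at accumulates correctly even when the same cell is hit many times
    np.add.at(hist, (rows, cols), 1)
    return hist

# Made-up data for the comparison
rng = np.random.default_rng(0)
x1 = rng.uniform(50, 80, 1000)
x2 = rng.uniform(10, 20, 1000)

mine = build_2d_histogram(x1, x2, 8, x1.min(), x1.max(), x2.min(), x2.max())
ref, _, _ = np.histogram2d(
    x1, x2, bins=8, range=[[x1.min(), x1.max()], [x2.min(), x2.max()]]
)
print(np.array_equal(mine, ref))
```

Note that np.histogram2d needs the same limits passed through its range argument, otherwise it derives the edges from the data and the two results are not comparable.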

Is there a pandas function that can create a dataframe of the mean, median, and mode of selected columns?

My attempt:
# Compute the mean, median and variance for the variables sph, acous and dur. Compare their level of variability.
sad_mean = dat_songs[['spch', 'acous', 'dur']].mean()
sad_mode = dat_songs[['spch', 'acous', 'dur']].mode()
sad_median = dat_songs[['spch', 'acous', 'dur']].median()
sad_mmm = pd.DataFrame({'mean':sad_mean, 'median':sad_median, 'mode':sad_mode})
sad_mmm
Which outputs this [output image not included]
First of all, the median column is not right at all, and I want to know how to fix that too.
Secondly, I feel like I have seen a quicker or shorter way to do this with a simple pandas function.
My data head for reference [image not included]
Simply try dat_songs.describe(). It reports descriptive statistics for all the numerical columns, including the mean and the median (shown as the 50% row). It does not report the mode.
For selected columns:
dat_songs[['spch', 'acous', 'dur']].describe()
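If the mode is needed alongside the mean and median, one possible approach is sketched below. It relies on the fact that DataFrame.mode() returns a DataFrame (one row per mode, since a column can have several), while mean() and median() return a Series; taking the first row of the mode result gives a Series that lines up with the other two. The data values here are made up for illustration, and taking only the first mode is an assumption about what is wanted.

```python
import pandas as pd

# Made-up stand-in for the asker's dat_songs
dat_songs = pd.DataFrame({
    "spch": [4, 4, 5, 7, 10],
    "acous": [1, 2, 2, 3, 12],
    "dur": [200, 210, 210, 230, 250],
})

subset = dat_songs[["spch", "acous", "dur"]]

# mode() yields one row per mode; .iloc[0] keeps the first mode of each column
sad_mmm = pd.DataFrame({
    "mean": subset.mean(),
    "median": subset.median(),
    "mode": subset.mode().iloc[0],
})
print(sad_mmm)
```

Each row of sad_mmm is one of the selected columns, and each column is one statistic.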

Tensorflow clamp values outside specific range

I have been using TensorFlow to implement a convolutional neural network.
I have a requirement that the output values be less than a given value MAX_VAL.
I tried creating a matrix filled with MAX_VAL and then using tf.select and tf.greater:
filled = tf.fill(output.get_shape(),MAX_VAL)
modoutput = tf.select(tf.greater(output, filled), filled, output)
But this doesn't work because the shape of output is not known statically:
it is [?, 30], and tf.fill requires an explicit shape.
Any idea how I can implement this?
There is an alternative solution that uses tf.fill() like your initial version. Instead of using Tensor.get_shape() to get the static shape of output, use the tf.shape() operator to get the dynamic shape of output when the step runs:
output = ...
filled = tf.fill(tf.shape(output), MAX_VAL)
modoutput = tf.select(tf.greater(output, filled), filled, output)
(Note also that the tf.clip_by_value() operator might be useful for your purposes.)
I figured out a way to do it.
Instead of using tf.fill I used tf.ones_like:
filled = MAX_VAL*tf.ones_like(output)
modoutput = tf.select(tf.greater(output, filled), filled, output)
Please mention if there is a faster or better way to do this.
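As a possible shorter route, an element-wise minimum against the scalar avoids building the filled tensor at all, because the scalar broadcasts against whatever the batch size turns out to be. The sketch below assumes TensorFlow 2.x with eager execution; the value of MAX_VAL and the small output tensor are made up for illustration. (tf.select from the snippets above was renamed tf.where in TensorFlow 1.0.)

```python
import tensorflow as tf

MAX_VAL = 1.0  # hypothetical limit, chosen for illustration

# Stand-in for the network output; in the question its shape is [?, 30]
output = tf.constant([[0.5, 1.5], [2.0, -3.0]])

# Element-wise minimum: every value ends up at most MAX_VAL
clamped = tf.minimum(output, MAX_VAL)

# clip_by_value also needs a lower bound; the dtype's minimum leaves it open
clipped = tf.clip_by_value(output, output.dtype.min, MAX_VAL)

print(clamped.numpy())
print(clipped.numpy())
```

Both calls cap values at MAX_VAL rather than forcing them strictly below it.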

Why am I getting an array of NANs when trying to plot a map with D3.js?

I am trying to plot a map with d3.js using GeoJSON, but the generated paths look like this (the repeated LNaN,NaN segments are abbreviated):
<path d="MNaN,NaNLNaN,NaNLNaN,NaN...LNaN,NaNZ">
the code and data are in this Gist:
https://gist.github.com/4157853
I can load the data just fine on QGIS.
Does anyone know what is causing this?
The way you have specified the offset in the Mercator projection doesn't seem right. The projection.translate method expects a two element array:
https://github.com/mbostock/d3/wiki/Geo-Projections#wiki-mercator_translate
So instead of:
proj.translate(-43.8,-23.2).scale(10);
you would need to say:
proj.translate([-43.8,-23.2]).scale(10);
-- edit --
See source of projection.translate: https://github.com/mbostock/d3/blob/3.0/src/geo/projection.js#L139
projection.translate = function(_) {
  if (!arguments.length) return [x, y];
  x = +_[0];
  y = +_[1];
  return reset();
};
If the argument _ is not an array then +_[0] will return a NaN and therefore the x and y will become NaNs. (This is because trying to get one element from a number (e.g. 213[0]) returns undefined and casting undefined to a number (e.g. +undefined) yields NaN.)
If the code you posted in the gist is everything you're trying to run, then the data you show in data.json is not being loaded anywhere. Your draw function is acting on the data defined by the variable map (line 16), which refers to a simulation variable that isn't set anywhere. And even if it were set, line 34 then refers to a features property of the object passed in as json, which map does not have.
In summary, you need to pass the JSON you posted in the gist to your draw function. Then it might well work. If you don't pass in valid data to the d3 SVG helpers, you'll get a bunch of NaN out.

Loop with matrices created with assign function in R project

I created several matrices with the assign function as follows:
for (i in 2:105) { # Loop for creating and filling matrices
  (assign(paste("m", i, sep=""), Datos[(x[i-1]+1):x[i], 1:14]))
}
This gives me several matrices, from m2 to m105, which is exactly what I wanted, because I can extract and call these matrices with their index, like m2[i,j] or m65[i,j].
My problem is that I want to write a loop that covers all my "m" matrices, but I don't know the right code for it. I need something like:
paste("m",i,"[i,j]",sep="") to return m2[i,j], m3[i,j], ..., m105[i,j] and loop over these, but the paste function returns a string and doesn't recognize m2 ... m105 as matrices; it returns m2[i,j] as text.
What should I do?
Thank you very much!
Regards
You have to use get:
get(paste("m", i, sep=""))[i,j]

Access list element using get()

I'm trying to use get() to access a list element in R, but am getting an error.
example.list <- list()
example.list$attribute <- c("test")
get("example.list") # Works just fine
get("example.list$attribute") # breaks
## Error in get("example.list$attribute") :
## object 'example.list$attribute' not found
Any tips? I am looping over a vector of strings which identify the list names, and this would be really useful.
Here's the incantation that you are probably looking for:
get("attribute", example.list)
# [1] "test"
Or perhaps, for your situation, this:
get("attribute", eval(as.symbol("example.list")))
# [1] "test"
# Applied to your situation, as I understand it...
example.list2 <- example.list
listNames <- c("example.list", "example.list2")
sapply(listNames, function(X) get("attribute", eval(as.symbol(X))))
# example.list example.list2
# "test" "test"
Why not simply:
example.list <- list(attribute="test")
listName <- "example.list"
get(listName)$attribute
# or, if both the list name and the element name are given as arguments:
elementName <- "attribute"
get(listName)[[elementName]]
If your strings contain more than just object names, e.g. operators like here, you can evaluate them as expressions as follows:
> string <- "example.list$attribute"
> eval(parse(text = string))
[1] "test"
If your strings are all of the type "object$attribute", you could also parse them into object/attribute, so you can still get the object, then extract the attribute with [[:
> parsed <- unlist(strsplit(string, "\\$"))
> get(parsed[1])[[parsed[2]]]
[1] "test"
flodel's answer worked for my application, so I'm gonna post what I built on it, even though this is pretty uninspired. You can access each list element with a for loop, like so:
#============== List with five elements of non-uniform length ================#
example.list=
list(letters[1:5], letters[6:10], letters[11:15], letters[16:20], letters[21:26])
#===============================================================================#
#====== for loop that names and concatenates each consecutive element ========#
derp=c(); for(i in 1:length(example.list))
{derp=append(derp,eval(parse(text=example.list[i])))}
derp #Not a particularly useful application here, but it proves the point.
I'm using code like this for a function that calls certain sets of columns from a data frame by the column names. The user enters a list with elements that each represent different sets of column names (each set is a group of items belonging to one measure), and the big data frame containing all those columns. The for loop applies each consecutive list element as the set of column names for an internal function* applied only to the currently named set of columns of the big data frame. It then populates one column per loop of a matrix with the output for the subset of the big data frame that corresponds to the names in the element of the list corresponding to that loop's number. After the for loop, the function ends by outputting that matrix it produced.
Not sure if you're looking to do something similar with your list elements, but I'm happy I picked up this trick. Thanks to everyone for the ideas!
"Second example" / tangential info regarding application in graded response model factor scoring:
Here's the function I described above, just in case anyone wants to calculate graded response model factor scores* in large batches. Each column of the output matrix corresponds to an element of the list (i.e., a latent trait with ordinal indicator items specified by column name in the list element), and the rows correspond to the rows of the data frame used as input. Each row should presumably contain mutually dependent observations, as from a given individual, to whom the factor scores in the same row of the output matrix belong. Also, I feel I should add that if all the items in a given list element use the exact same Likert scale rating options, the graded response model may be less appropriate for factor scoring than a rating scale model (cf. http://www.rasch.org/rmt/rmt143k.htm).
'grmscores'=function(ColumnNameList,DataFrame) {require(ltm) #(Rizopoulos,2006)
x = matrix ( NA , nrow = nrow ( DataFrame ), ncol = length ( ColumnNameList ))
for(i in 1:length(ColumnNameList)) #flodel's magic featured below!#
{x[,i]=factor.scores(grm(DataFrame[, eval(parse(text= ColumnNameList[i]))]),
resp.patterns=DataFrame[,eval(parse(text= ColumnNameList[i]))])$score.dat$z1}; x}
Reference
*Rizopoulos, D. (2006). ltm: An R package for latent variable modelling and item response theory analyses, Journal of Statistical Software, 17(5), 1-25. URL: http://www.jstatsoft.org/v17/i05/
