Python 3 replacing nan values

I am trying to replace some NaN values in a few columns using a calculation from other columns, i.e.
nancolumn = column1.value + column2.value
My first attempt didn't work; there are still NaN values:
indecies = list(list(map(tuple, np.where(np.isnan(df['nancolumn']))))[0])
newValue = df.iloc[indecies]['column1'] + df.iloc[indecies ]['column2']
df.iloc[indecies]['nancolumn'] = newValue
I then picked a specific index that I wanted to replace, 1805, and tried replacing just that data point with 1.0. The result is still NaN:
df.iloc[1805]['nancolumn'] = 1.0
I also tried using fillna() and isnan():
df[np.isnan(df)]=1
I get this error for the isnan() attempt:
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
df.iloc[1805]['nancolumn'].dtype
dtype('float64')
I know I'm missing something simple, but I can't figure it out.
Can someone please help?

I found out that it's best to reference the column first and then the index, like below:
df['nancolumn'].iloc[1805] = 1.0
However, I still don't really understand the difference. If anyone has an explanation, that would be helpful.
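A likely explanation: df.iloc[indecies]['nancolumn'] = ... is chained indexing. The first step, df.iloc[...], builds a new object (a copy), and the assignment then lands on that temporary copy, so df itself never changes. Selecting the column first can return a view in older pandas versions, which is probably why df['nancolumn'].iloc[1805] = 1.0 worked, but that behaviour is not guaranteed (with Copy-on-Write enabled it does not update df either). Doing the row and column selection in a single .loc or .iloc call avoids the problem. A minimal sketch, assuming the frame looks roughly like the toy data below (the values are made up):

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real frame (values are hypothetical).
df = pd.DataFrame({
    "column1": [1.0, 2.0, 3.0],
    "column2": [10.0, 20.0, 30.0],
    "nancolumn": [5.0, np.nan, np.nan],
})

# Boolean mask of the rows to fill. isna() also works on non-numeric columns,
# unlike np.isnan(), which raises a TypeError on object dtype.
mask = df["nancolumn"].isna()

# One indexing call with rows and column together, so the write goes to df itself.
df.loc[mask, "nancolumn"] = df.loc[mask, "column1"] + df.loc[mask, "column2"]

# Setting a single cell by integer position works the same way.
df.iloc[0, df.columns.get_loc("nancolumn")] = 1.0

print(df)
```

The same pattern should apply to the original data by replacing the toy frame with the real df and the position 0 with 1805.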

Related

Python: (partial) matching elements of a list to DataFrame columns, returning entry of a different column

I am a beginner in Python and have encountered the following problem: I have a long list of strings (I took 3 for the example):
ENSEMBL_IDs = ['ENSG00000040608',
'ENSG00000070371',
'ENSG00000070413']
which are partial matches of the data in column 0 of my DataFrame genes_df (first 3 entries shown):
genes_list = (['ENSG00000040608.28', 'RTN4R'],
['ENSG00000070371.91', 'CLTCL1'],
['ENSG00000070413.17', 'DGCR2'])
genes_df = pd.DataFrame(genes_list)
The task I want to perform is conceptually not that difficult: I want to compare each element of ENSEMBL_IDs to genes_df.iloc[:,0]. These are partial matches: each element of ENSEMBL_IDs is contained within an entry of column 0 of genes_df, as outlined above. If the element of ENSEMBL_IDs matches the element in genes_df.iloc[:,0] (which it does, apart from the extra numbers after the period, ".XX"), I want to return the corresponding value that is stored in column 1 of the genes_df DataFrame: the actual gene name, 'RTN4R' as an example.
I want to store these in a list. So, in the end, I would be left with a list like follows:
`genenames = ['RTN4R', 'CLTCL1', 'DGCR2']`
Some info that might be helpful: all of the entries in ENSEMBL_IDs are unique, and all of them are for sure contained in column 0 of genes_df.
I think I am looking for something along the lines of:
`genenames = []
for i in ENSEMBL_IDs:
    if i in genes_df.iloc[:,0]:
        genenames.append(# corresponding value in genes_df.iloc[:,1])`
I am sorry if the question has been asked before; I kept looking and was not able to find a solution that was applicable to my problem.
Thank you for your help!
Thanks also for the edit, English is not my first language, so the improvements were insightful.

You can get rid of the part after the dot (with str.extract or str.replace) before matching the values with isin:
m = genes_df[0].str.extract('([^.]+)', expand=False).isin(ENSEMBL_IDs)
# or
m = genes_df[0].str.replace(r'\..*$', '', regex=True).isin(ENSEMBL_IDs)
out = genes_df.loc[m, 1].tolist()
Or use a regex with str.match:
pattern = '|'.join(ENSEMBL_IDs)
m = genes_df[0].str.match(pattern)
out = genes_df.loc[m, 1].tolist()
Output: ['RTN4R', 'CLTCL1', 'DGCR2']
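Note that a boolean mask built with isin returns the names in the row order of genes_df, not in the order of ENSEMBL_IDs. If the order of the list matters, a dictionary lookup is one option. A minimal sketch using the example data from the question; the list is reordered here on purpose to show the effect, and the names base_ids and lookup are only illustrative:

```python
import pandas as pd

# Example data from the question; the ID list is deliberately reordered.
ENSEMBL_IDs = ['ENSG00000070413', 'ENSG00000040608', 'ENSG00000070371']
genes_df = pd.DataFrame([
    ['ENSG00000040608.28', 'RTN4R'],
    ['ENSG00000070371.91', 'CLTCL1'],
    ['ENSG00000070413.17', 'DGCR2'],
])

# Strip the ".XX" version suffix: keep everything before the first dot.
base_ids = genes_df[0].str.split('.', n=1).str[0]

# Map each unversioned ID to its gene name.
lookup = dict(zip(base_ids, genes_df[1]))

# Look the IDs up in list order; a missing ID would raise a KeyError here.
genenames = [lookup[i] for i in ENSEMBL_IDs]
print(genenames)
```

This relies on the statement in the question that every ID is unique and present in column 0; with lookup.get(i) instead of lookup[i], missing IDs would yield None rather than an error.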

Does 'position()' have to be explicitly included in this Xpath?

This returns all the first 'nd's as expected
select="osm/way/nd[1]"
This returns all the lasts:
select="osm/way/nd[last()]"
This returns both:
select="osm/way/nd[position() = 1 or position() = last()]"
Is there a syntax that avoids the position() function?
Something like this, but that actually works?
select="osm/way/nd[[1] or [last()]]"

There has been some debate about a new syntax for selecting a range (https://github.com/qt4cg/qtspecs/issues/50#issuecomment-799228627); for example, osm/way/nd[#1,last()] might work in a future XPath 4. For now that is all still under discussion, and it is questionable whether a new operator would be more helpful than writing osm/way/nd[position() = (1, last())], which already works in XPath 2.0 and later.
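To illustrate what the combined predicate selects, here is a rough Python sketch using the standard library's xml.etree.ElementTree. That module supports only a small XPath subset ([1] and [last()] work, but `or` and position() = ... do not), so the union is built in plain Python instead. The sample document is invented for the example:

```python
import xml.etree.ElementTree as ET

# Invented sample document; the root element is <osm>.
xml = """
<osm>
  <way id="a"><nd ref="1"/><nd ref="2"/><nd ref="3"/></way>
  <way id="b"><nd ref="7"/></way>
</osm>
"""
root = ET.fromstring(xml)

# The two simple predicates are supported directly.
firsts = root.findall("way/nd[1]")
lasts = root.findall("way/nd[last()]")

# Equivalent of nd[position() = 1 or position() = last()]:
# first and last nd of every way, without duplicating a lone nd.
both = []
for way in root.findall("way"):
    nds = way.findall("nd")
    if not nds:
        continue
    both.append(nds[0])
    if len(nds) > 1:
        both.append(nds[-1])

refs = [nd.get("ref") for nd in both]
print(refs)
```

A way with a single nd contributes that node only once, which matches the XPath behaviour: the predicate is evaluated per node, so a node that is both first and last is still selected a single time.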

which() Function in SystemML

I'm a developer who wants to use SystemML to run R code from our business people on a Spark cluster.
I've studied http://apache.github.io/systemml/dml-language-reference but haven't found an implementation of the R function "which" or any alternative functionality. Does anyone have an idea how I could achieve the following?
Given
v = c(1,4,NA,2, 5, NA)
Expect indexes where value meets condition = int[] 2 5
v2 = which(v>2)
Expect indexes where is.na returns TRUE = int[] 3 6
v3 = which(is.na(v))
I've already considered the functions replace() and removeEmpty(), but they don't exactly meet my needs.
Thanks a lot in advance
Kuno

In case someone else stumbles over the same problem: R's which() can be emulated with the following workaround:
v2 = removeEmpty(target=seq(1,length(v)) * (v>2), margin="rows")
Furthermore, SystemML does not allow NA, so you would need to replace it with 0 or NaN (e.g., 0/0=NaN). The condition would then look like (v==0) or (v!=v), where the latter relies on the fact that any equality comparison with NaN is false, so NaN is the only value that is not equal to itself.
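The workaround combines two tricks: multiplying the 1-based positions by a 0/1 mask and dropping the zeros, and detecting NaN with v != v. A plain Python sketch of the same logic (this is not DML; the helper which below is hypothetical and only mirrors those steps), using the vector from the question with NA replaced by NaN:

```python
import math

# Vector from the question, with NA replaced by NaN.
v = [1.0, 4.0, math.nan, 2.0, 5.0, math.nan]

def which(mask):
    """Return the 1-based positions where mask is true."""
    # seq(1, length(v)) * mask: positions where mask is false become 0.
    scaled = [i * int(m) for i, m in enumerate(mask, start=1)]
    # removeEmpty(...): drop the zero entries.
    return [i for i in scaled if i != 0]

# Comparisons such as NaN > 2 are false, so NaN entries are skipped.
v2 = which([x > 2 for x in v])

# NaN is the only value that is not equal to itself.
v3 = which([x != x for x in v])

print(v2, v3)
```

The results match the indexes expected in the question for which(v>2) and which(is.na(v)).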

How to deal with missing values in Azure Machine Learning Studio

It looks like I have 672 missing values, according to the statistics.
There are NULL values in the QuotedPremium column.
I added a Clean Missing Data module that should substitute missing values with 0, but for some reason I'm still seeing NULL values in QuotedPremium, even though it reports that the number of missing values is 0.
Here you can see it tells me that missing values = 0, but there are still NULLs.
So what really happened after I ran the Clean Missing Data module? Why did it run successfully while there are still NULL values, even though it reports that the number of missing values is 0?

NULL is indeed a value; entries containing NULLs are not missing, hence they are neither cleaned with the 'Clean Missing Data' operator nor reported as missing.

These are not really missing values; the string "NULL" has been stored in all of these cells. To substitute these values with 0, you can use the following:
Use an Execute R Script module and add this code to it:
dataset1 <- maml.mapInputPort(1); # class: data.frame
dataset1[dataset1 == "NULL"] = 0; # Wherever cell's value is "NULL", replace it with 0
maml.mapOutputPort("dataset1"); # return the modified data.frame
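For readers more at home in Python, the same idea can be sketched with pandas. This is only an illustration with made-up values (QuotedPremium is the one name taken from the question), not a description of what the Studio modules do internally. It shows why a missing-value count can be 0 while "NULL" is still displayed, and how replacing the string fixes it:

```python
import pandas as pd

# Hypothetical sample: the column holds text, including the literal string "NULL".
df = pd.DataFrame({"QuotedPremium": ["100.5", "NULL", "250", "NULL"]})

# The string "NULL" is an ordinary value, so nothing counts as missing.
missing_before = int(df["QuotedPremium"].isna().sum())

# Replace the literal string, then convert the column to numbers.
cleaned = pd.to_numeric(df["QuotedPremium"].replace("NULL", "0"))

print(missing_before)
print(cleaned.tolist())
```

If the cells should count as missing rather than as 0, replacing "NULL" with a real missing value (for example float("nan")) before converting would be the alternative.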

Why am I getting an array of NANs when trying to plot a map with D3.js?

I am trying to plot a map with d3.js using GeoJSON, but the paths generated look like this:
<path d="MNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,‌NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,N‌aNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,Na‌NLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaN‌LNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNL‌NaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLN‌aN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNLNaN,NaNZ">
The code and data are in this Gist:
https://gist.github.com/4157853
I can load the data just fine in QGIS.
Does anyone know what is causing this?

The way you have specified the offset in the Mercator projection doesn't seem right. The projection.translate method expects a two element array:
https://github.com/mbostock/d3/wiki/Geo-Projections#wiki-mercator_translate
So instead of:
proj.translate(-43.8,-23.2).scale(10);
you would need to say:
proj.translate([-43.8,-23.2]).scale(10);
-- edit --
See source of projection.translate: https://github.com/mbostock/d3/blob/3.0/src/geo/projection.js#L139
projection.translate = function(_) {
  if (!arguments.length) return [x, y];
  x = +_[0];
  y = +_[1];
  return reset();
};
If the argument _ is not an array then +_[0] will return a NaN and therefore the x and y will become NaNs. (This is because trying to get one element from a number (e.g. 213[0]) returns undefined and casting undefined to a number (e.g. +undefined) yields NaN.)

If the code you posted in the gist is everything you're trying to run, then the data you show in data.json is not being loaded anywhere. Your draw function is acting on the data defined by the variable map (line 16), which refers to a simulation variable that isn't set anywhere. And even if it were set, line 34 then refers to a features property of the object passed in as json, which map does not have.
In summary, you need to pass the JSON you posted in the gist to your draw function. Then it might well work. If you don't pass in valid data to the d3 SVG helpers, you'll get a bunch of NaN out.
