How to identify and remove observations with ValueError in Pandas - python-3.x

While working on a dataset with pandas, I am getting a ValueError:
ValueError: could not convert string to float: <some_string>
I want to identify and remove such observations from my data set, but I am unable to find a way to do it. Please suggest an approach.
The code that I am running:
X_scaled = preprocessing.scale(X_final)
Edit 1: After searching online and looking at some other posts on SO, I understand that this might be happening because one or more columns in my data are supposed to hold numbers/floats but contain strings in some observations. How can I identify such columns?
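One possible way to locate the offending columns and rows is to try a numeric conversion with pd.to_numeric(errors="coerce") and look for values that turn into NaN even though they were not missing to begin with. The sketch below is only an illustration: the helper name find_non_numeric and the sample X_final data are assumptions, not part of the original question.

```python
import pandas as pd


def find_non_numeric(df):
    """Map each column name to the index labels whose values cannot be parsed as numbers."""
    bad = {}
    for col in df.columns:
        # Unparseable values become NaN instead of raising ValueError.
        converted = pd.to_numeric(df[col], errors="coerce")
        # Flag only values that were present originally but failed to convert.
        mask = converted.isna() & df[col].notna()
        if mask.any():
            bad[col] = df.index[mask].tolist()
    return bad


# Hypothetical stand-in for the X_final from the question.
X_final = pd.DataFrame({"a": [1, 2, "oops", 4], "b": [0.5, 1.5, 2.5, 3.5]})

bad = find_non_numeric(X_final)
print(bad)

# Drop every row that holds at least one unparseable value, then cast to float.
bad_rows = sorted({label for labels in bad.values() for label in labels})
X_clean = X_final.drop(index=bad_rows).astype(float)
```

After this step, X_clean contains only numeric values, so a call such as preprocessing.scale(X_clean) should no longer hit the string-to-float conversion error.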

Related

Converting Numpy array to Dataframe using pandas

I am struggling to convert a NumPy array to a data frame.
This has been imported from a CSV file.
I have tried code like
carsdf = pd.DataFrame(newdf=newdf[1:,1:],
index=newdf[1:,0],
columns=newdf[0,1:])
but I get a type error.
I have also tried reshaping
print(mydata.reshape((213, 26)))
and I get value errors.
Now I am flustered. I know how to rectify typos and errors, capitalize, fetch data, and all the other stuff, but my coding is pretty raw and I am stuck on this at the very beginning of the assignment.
I would appreciate any advice.
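A likely cause of the type error is the keyword newdf=: the pd.DataFrame constructor takes the values through its data parameter. A minimal sketch follows, assuming the first row of the array holds the column names and the first column holds the row labels; the small array here is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the array read from the CSV file:
# row 0 holds column names, column 0 holds row labels.
newdf = np.array(
    [
        ["", "mpg", "cyl"],
        ["car_a", "21.0", "6"],
        ["car_b", "22.8", "4"],
    ],
    dtype=object,
)

carsdf = pd.DataFrame(
    data=newdf[1:, 1:],    # the values, passed via the `data` keyword
    index=newdf[1:, 0],    # first column becomes the row labels
    columns=newdf[0, 1:],  # first row becomes the column names
)

# Values read as text stay strings until converted explicitly.
carsdf = carsdf.apply(pd.to_numeric)
print(carsdf)
```

As for the value errors from reshape, NumPy raises one whenever the requested shape does not multiply out to the number of elements in the array, so printing mydata.shape first would show whether (213, 26) can fit.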

Slice copy error while using two DataFrames and updating a column of one from the other in pandas

I'm trying to compare two DataFrames and fill the values of one into the other by creating a column.
I have used the following code:
df['location']=df1['location']
for i in range(0,len(df)):
    for j in range(0,len(df1)):
        if df['Name'][i]==df1['Name'][j]:
            df['location'][i] =(df1['location'][j])
The DataFrames are listed below.
I am getting the following error.
<ipython-input-14-7b3141ebb9f0>:7: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['location'][i] =(df1['location'][j])
I am able to get the desired output despite the warning.
result :
If I use the following command, I can suppress this warning:
pd.options.mode.chained_assignment = None
Is there a way to avoid the warning without using the above command? Thanks in advance.
Data
df=pd.DataFrame({'Name':['A','B','C'], 'location':['South','north','east']})
df1=pd.DataFrame({'Name':['A','B','C','A','B','C'], 'count':[1,2,3,4,5,6]})
df
dict for reference
d=dict(zip(df.Name,df.location))
Map to Transfer
df1['location']=df1.Name.map(d)
Output
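The mapping approach from this answer can be put together as one runnable sketch. It uses the sample data given above and replaces the nested loops with a single Series.map call, so no chained assignment takes place and the warning should not appear.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B", "C"], "location": ["South", "north", "east"]})
df1 = pd.DataFrame({"Name": ["A", "B", "C", "A", "B", "C"], "count": [1, 2, 3, 4, 5, 6]})

# Lookup table from Name to location, built from the first frame.
d = dict(zip(df["Name"], df["location"]))

# One vectorized assignment; names missing from the lookup would become NaN.
df1["location"] = df1["Name"].map(d)
print(df1)
```

When a value really has to be set cell by cell, writing it in a single indexing step such as df.loc[i, 'location'] = value, rather than df['location'][i] = value, is the form the pandas documentation recommends.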

getting a groupby error through a for loop

fashion = [1,1,2,3,3,3,21,1,1,1,5,5,5,5,3,3,2,6]
for key,group in groupby(fashion):
    print(key,':',list(group))
I have written the above code to group by certain numbers and get a list. For example, I want an outcome such as :
1 : [1,1,1,1,1]
2 : [2,2]
Can someone please tell me what's wrong with my code?
fashion is neither a sorted list, as itertools.groupby() would need, nor a pandas DataFrame. Please post fully functioning code and clarify whether you are attempting to use pandas to group a Series or DataFrame.
How to make good reproducible pandas examples
Contains good advice for getting help with pandas features.
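If the code is meant to use itertools.groupby, one reading of the comment above is that the list has to be sorted first, because groupby only merges equal items that sit next to each other. A minimal sketch under that assumption:

```python
from itertools import groupby

fashion = [1, 1, 2, 3, 3, 3, 21, 1, 1, 1, 5, 5, 5, 5, 3, 3, 2, 6]

# Sorting places equal values side by side, so each key shows up exactly once.
grouped = {key: list(group) for key, group in groupby(sorted(fashion))}

for key, group in grouped.items():
    print(key, ":", group)
```

Without the sorted() call, the value 1 would be reported twice: once for the first two items and again for the later run of three.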

pandas method .isin() automatically converts datatypes in element comparisons?

Wondering if someone can help me here. When I take a regular Python list containing strings and check whether a pandas Series (pla.id) has a value that matches a value in that list, it works.
Which is great and all but I wonder how it's able to compare strings to ints... is there documentation somewhere that states that it will convert under the hood before comparing those values??
I wasn't able to find anything on the .isin() page of pandas documentation..
Also super interesting is that when I try pandas indexing it fails due to a type comparison.
So my two questions:
Does the pandas.Series.isin(some_list_of_strings) method automatically convert the values in the Series (which are int values) to strings before doing a value comparison?
If so, why doesn't pandas indexing, i.e. (df[df.series == 'some value']), do the same thing? What is the thinking behind this? If I wanted to accomplish the same thing, I would have to do df[df.series.astype(str) == ''] or df[df.series.astype(str).isin(some_list_of_strings)] to access those values in the df that match.
After some digging, I think this might be due to the pandas object datatype, but I have no understanding of why this works. Also, this doesn't explain why the below works, since it is an int dtype.
Thanks in advance!
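Whether .isin() matches ints against strings implicitly may depend on the pandas and NumPy versions in use, so a safer route is to make both sides the same type before comparing. The sketch below shows both directions; the Series ids and the list wanted are made-up stand-ins for pla.id and the list of strings in the question.

```python
import pandas as pd

ids = pd.Series([1, 2, 3], name="id")  # hypothetical int-valued Series
wanted = ["1", "3"]                    # hypothetical list of strings

# Option 1: cast the Series to str so both sides are strings.
mask = ids.astype(str).isin(wanted)
matches = ids[mask]

# Option 2: cast the list to int so both sides are integers.
wanted_ints = [int(value) for value in wanted]
mask_from_ints = ids.isin(wanted_ints)

print(matches.tolist())
```

Either option gives the same mask without relying on any conversion happening under the hood.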

NetworkXError: Input is not a correct Pandas DataFrame

I'm trying to create a network map in Python using NetworkX, with labels, to find potential connections between people. Currently I have a 1,316 x 1,316 matrix of 1's and 0's that represent whether there is a connection or not. I have been able to import this data into Python as a DataFrame. In the DataFrame, the column and row names are numbers, but at the end of the list they turn into actual names. If I remove the names and make a grid of just 1's and 0's, NetworkX will allow me to turn this into a connection graph, but it is almost useless without the labels to know who is connected to whom. When I include the labels in the DataFrame and run the code listed below, I receive an error.
Error
NetworkXError: Input is not a correct Pandas DataFrame.
Code
Network = pd.DataFrame.from_csv('H:\\Network.csv')
G1 = nx.to_networkx_graph(Network)
I will admit to being quite new at this so any and all help will be appreciated. If you have any advice on why I'm getting this error or a better way to go about this, I am open to suggestions.
Thanks for your help.
I had the same problem. What helped me solve it was simply converting the DataFrame to a NumPy array. (This only works if shape[0] == shape[1].)
Here is what I did:
Network = pd.DataFrame.from_csv('H:\\Network.csv')
G1 = nx.to_networkx_graph(Network.to_numpy())
This should work for you.
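Converting to a NumPy array drops the row and column names, so the nodes end up numbered 0, 1, 2 and so on. A possible alternative that keeps the labels is nx.from_pandas_adjacency, which expects the index and the columns to hold the same labels. The sketch below uses a small made-up matrix in place of Network.csv; casting both axes to str is an assumption about why mixed numeric and text names might fail to line up. Note also that pd.DataFrame.from_csv was removed in pandas 1.0, and pd.read_csv(path, index_col=0) is the usual replacement.

```python
import networkx as nx
import pandas as pd

# Hypothetical 3 x 3 adjacency matrix standing in for the contents of Network.csv.
labels = ["1", "2", "Alice"]
network = pd.DataFrame(
    [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]],
    index=labels,
    columns=labels,
)

# Make sure row and column labels share one type so they can be matched up.
network.index = network.index.astype(str)
network.columns = network.columns.astype(str)

# Nodes take their names from the labels; nonzero cells become edges.
G1 = nx.from_pandas_adjacency(network)
print(sorted(G1.nodes()))
```

With the real file, the frame would come from something like pd.read_csv('H:\\Network.csv', index_col=0) before the same steps are applied.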
