Converting Numpy array to Dataframe using pandas - python-3.x

I am struggling to convert a NumPy array to a data frame.
This has been imported from a CSV file.
I have tried code like:
carsdf = pd.DataFrame(newdf=newdf[1:,1:],
index=newdf[1:,0],
columns=newdf[0,1:])
However, I get a TypeError.
I have also tried reshaping:
print(mydata.reshape((213, 26)))
and get a ValueError.
I am flustered now: I know how to fix typos and errors, capitalize, fetch data and so on, but my coding is still pretty raw, and I am stuck on this at the very beginning of the assignment.
I would appreciate any advice.
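One likely cause of the TypeError is that pd.DataFrame has no newdf keyword; the values are passed as data. The following is a minimal sketch, assuming the array holds the column names in its first row and the row labels in its first column. The sample values are made up for illustration and are not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the array read from the CSV file:
# row 0 holds the column names, column 0 holds the row labels.
newdf = np.array([
    ["model", "mpg", "cyl"],
    ["Mazda RX4", "21.0", "6"],
    ["Datsun 710", "22.8", "4"],
])

carsdf = pd.DataFrame(
    data=newdf[1:, 1:],    # the values, without header row and label column
    index=newdf[1:, 0],    # first column becomes the row index
    columns=newdf[0, 1:],  # first row becomes the column names
)

# A string array produces text columns; convert them to numbers.
carsdf = carsdf.apply(pd.to_numeric)
print(carsdf)
```

If the data starts out as a CSV file, pd.read_csv builds the DataFrame directly and may make the intermediate NumPy array unnecessary.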

Related

Getting a groupby error through a for loop

from itertools import groupby
fashion = [1,1,2,3,3,3,21,1,1,1,5,5,5,5,3,3,2,6]
for key, group in groupby(fashion):
    print(key, ':', list(group))
I have written the above code to group by certain numbers and get a list. For example, I want an outcome such as:
1 : [1,1,1,1,1]
2 : [2,2]
Can someone please tell me what's wrong with my code?
fashion is neither a sorted list, which itertools.groupby() needs because it only groups consecutive equal items, nor a Pandas DataFrame. Please post fully functioning code and clarify whether you are attempting to use Pandas to group a Series or DataFrame.
How to make good reproducible pandas examples
It contains good advice on getting help with Pandas features.
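If the plain itertools route is what was intended, sorting the list first gives one group per distinct number. This is a small sketch under that assumption; the name grouped is introduced here for illustration.

```python
from itertools import groupby

fashion = [1, 1, 2, 3, 3, 3, 21, 1, 1, 1, 5, 5, 5, 5, 3, 3, 2, 6]

# groupby() only merges adjacent equal items, so sort the list first.
grouped = {key: list(group) for key, group in groupby(sorted(fashion))}

for key, group in grouped.items():
    print(key, ":", group)
```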

DataFrame entries get rounded off when converted to txt

This is what the dataframe looks like before exporting
After that it becomes
Rounding down is not what I want here; I want the text in the .txt file to look like what is shown in the console. How can I fix this? Is there a simple solution?
Did you try writing directly from your Pandas dataframe instead of going through Numpy?
Try DF.to_csv('output.txt', sep='\t', float_format='%g')
For more details see pandas.DataFrame.to_csv
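Note that '%g' keeps six significant digits by default, so it can still round longer values. The sketch below uses '%.10g' instead and writes to an in-memory buffer rather than a file so the result is easy to inspect; the sample frame is made up.

```python
import io

import pandas as pd

# Made-up data with more than six significant digits in one entry.
df = pd.DataFrame({"x": [0.000123456, 1234.5678], "y": [1.5, 2.0]})

buffer = io.StringIO()
# float_format controls how each float is rendered in the text output.
df.to_csv(buffer, sep="\t", float_format="%.10g", index=False)

text = buffer.getvalue()
print(text)
```

Passing a file name such as 'output.txt' in place of the buffer writes the same text to disk.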

How to identify and remove observations with ValueError in Pandas

While working on a dataset with pandas, I am getting a ValueError:
ValueError: could not convert string to float: <some_string>
I want to identify and remove such observations from my data set, but I am unable to find a way to do it. Please suggest one.
The code that I am running:
X_scaled = preprocessing.scale(X_final)
Edit 1: After searching online and looking at other posts on SO, I understand that this might be happening because one or more columns in my data are supposed to contain numbers/floats but contain strings in some observations. How can I identify such columns?
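One possible way to find the offending columns and rows is to coerce everything with pd.to_numeric and look at what failed to convert. This is a sketch with a made-up X_final, since the real data is not shown; the names numeric, bad_cells, bad_columns, bad_rows and X_clean are introduced here for illustration.

```python
import pandas as pd

# Made-up stand-in for X_final: two columns contain stray strings.
X_final = pd.DataFrame({
    "height": ["1.80", "1.65", "n/a", "1.72"],
    "weight": [80.0, 55.5, 70.2, 64.0],
    "age": ["34", "unknown", "29", "41"],
})

# Coerce every column to numeric; unparseable entries become NaN.
numeric = X_final.apply(pd.to_numeric, errors="coerce")

# A cell is bad if it held a value originally but failed to convert.
bad_cells = numeric.isna() & X_final.notna()

# Columns and rows that contain at least one bad cell.
col_mask = bad_cells.any(axis=0)
bad_columns = list(col_mask[col_mask].index)
bad_rows = bad_cells.any(axis=1)

# Keep only the observations that converted cleanly.
X_clean = numeric[~bad_rows]
print(bad_columns)
print(X_clean)
```

X_clean is fully numeric, so it can then be passed to preprocessing.scale.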

NetworkXError: Input is not a correct Pandas DataFrame

I'm trying to create a network map in Python using NetworkX, with labels, to find potential connections between people. Currently I have a 1,316 x 1,316 matrix of 1's and 0's that represent whether there is a connection or not. I have been able to import this data into Python using a dataframe. This is a small screenshot of the Python dataframe. You can see the column and row names are numbers, but at the end of the list they turn into actual names. If I remove the names and make a grid of just 1's and 0's, NetworkX will allow me to turn this into a connection graph, but it is almost useless without the labels to know who is connected to whom. When I include the labels in the dataframe and run the code listed below, I receive an error.
Error
NetworkXError: Input is not a correct Pandas DataFrame.
Code
Network = pd.DataFrame.from_csv('H:\\Network.csv')
G1 = nx.to_networkx_graph(Network)
I will admit to being quite new at this so any and all help will be appreciated. If you have any advice on why I'm getting this error or a better way to go about this, I am open to suggestions.
Thanks for your help.
I had the same problem. What solved it for me was converting the dataframe to a NumPy array. (This only works if shape[0] == shape[1].)
Here is what I did:
Network = pd.DataFrame.from_csv('H:\\Network.csv')
G1 = nx.to_networkx_graph(Network.to_numpy())
This should work for you.
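Converting to a NumPy array drops the row and column names, which the question wants to keep as node labels. An alternative sketch uses nx.from_pandas_adjacency, assuming the dataframe's index and columns carry the same set of names; the three names and the small matrix below are hypothetical.

```python
import networkx as nx
import pandas as pd

# Hypothetical labels; the real data would have 1,316 of them.
names = ["Ann", "Bob", "Cara"]

# Square adjacency matrix whose index and columns use the same labels.
Network = pd.DataFrame(
    [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]],
    index=names,
    columns=names,
)

# Nodes are named after the dataframe labels; nonzero cells become edges.
G1 = nx.from_pandas_adjacency(Network)
print(sorted(G1.edges()))
```

On recent pandas versions DataFrame.from_csv is no longer available; pd.read_csv with index_col=0 is the usual replacement for loading the file with its first column as the index.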

Hide wrong values of a graph

I am building graphs using Matplotlib, and I sometimes have wrong values in my CSV files. They create spikes in my graph that I would like to suppress. I also sometimes have lots of zeros (when the sensor is disconnected), and I would prefer the graph to show blank spaces rather than wrong zeros that could be interpreted as real values.
Forgive me, for I'm not familiar with Matplotlib, but I'm presuming that you're reading the CSV file directly into Matplotlib. If so, is there an option to read the CSV file into your app as a list of ints or as a string and then do the data validation before passing that data to the library?
Apologies if my idea is not applicable.
I found a way that works:
I used xlim to set my max and min x values, and then I set all the values that I didn't want to NaN.
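The NaN step can be sketched as follows. The helper mask_bad_values, its thresholds and the sample readings are hypothetical; the accepted range would depend on the sensor.

```python
import numpy as np

def mask_bad_values(values, lower, upper, hide_zeros=True):
    """Return a float copy of values with implausible entries set to NaN."""
    cleaned = np.asarray(values, dtype=float).copy()
    # Anything outside the accepted range is treated as a spike.
    bad = (cleaned < lower) | (cleaned > upper)
    if hide_zeros:
        # Zeros are assumed to mean the sensor was disconnected.
        bad |= cleaned == 0
    cleaned[bad] = np.nan
    return cleaned

# Made-up sensor readings with two zeros and one spike.
readings = [20.1, 20.4, 0.0, 0.0, 20.9, 250.0, 21.3]
cleaned = mask_bad_values(readings, lower=-40.0, upper=60.0)
print(cleaned)
```

Matplotlib's plot leaves a gap wherever the data is NaN, so plotting cleaned instead of the raw readings shows blank spaces in place of the zeros and spikes.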
