I am building graphs with Matplotlib, and I sometimes have wrong values in my CSV files. These create spikes in my graph that I would like to suppress. Also, I sometimes have long runs of zeros (when the sensor is disconnected), but I would prefer the graph to show blank spaces rather than misleading zeros that could be interpreted as real values.
Forgive me, as I'm not familiar with Matplotlib, but I'm presuming you're reading the CSV file directly into Matplotlib. If so, is there an option to read the CSV file into your app as a list of ints or as a string, and then do the data validation before passing it to the library?
Apologies if my idea is not applicable.
I found a way that works:
I used xlim to set my max and min x values, and then I set all the values I didn't want to NaN!
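For anyone landing here later, a minimal sketch of the NaN approach with pandas (the column names, spike threshold, and toy data are assumptions for illustration, not from the original CSV):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# toy stand-in for the sensor CSV
df = pd.DataFrame({"t": range(8),
                   "value": [1.0, 1.1, 50.0, 1.2, 0.0, 0.0, 1.3, 1.2]})

clean = df["value"].copy()
clean[clean > 10] = np.nan   # suppress spikes above a plausible sensor maximum
clean[clean == 0] = np.nan   # treat exact zeros (disconnected sensor) as missing

# Matplotlib leaves a gap wherever the series is NaN, so the bad
# readings show up as blank spaces instead of misleading values
plt.plot(df["t"], clean)
plt.savefig("sensor.png")
```

The key point is that Matplotlib simply skips NaN values, which produces exactly the blank gaps asked for.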
Related
I'm trying to read a csv file with a column of data that has a scrambled ID number that includes the occasional consecutive $$ along with #, numbers, and letters.
SCRAMBLE_ID
AL9LLL677
AL9$AM657
$L9$$4440
#L9$306A1
etc.
I tried the following:
df = pd.read_csv('MASTER~1.CSV',
dtype = {'SCRAMBLE_ID': str})
which rendered the third entry as L9$4440 (the "L9" appears in a serif, italicized font, and the first and second $ vanish).
Faced with an entire column of ID numbers configured in this manner, what is the best way of dealing with such data? I can imagine:
PRIOR TO pd.read_csv: replacing the offending symbols with substitutes that don't create this problem (and what would those be?), OR,
is there a way of preserving the IDs as is but making them into a data type that ignores these symbols while keeping them present?
Thank you. I've attached a screenshot of the .csv side by side with resulting df (Jupyter notebook) below.
[Screenshot: csv column to pandas df with $$]
I cannot replicate this using the same values as you in a mock CSV file.
Are you sure that the formatting based on the $ symbol is not occurring wherever you are rendering your dataframe values? Have you checked whether the data in the dataframe is what you expect, or are you only looking at the rendered output?
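For what it's worth, the symptom described (italic serif text, vanishing $ signs) is exactly what MathJax does to text between dollar signs in notebook output. A quick way to check the underlying data is to print the raw values rather than rely on the notebook's HTML rendering (the in-memory CSV below is a mock of the IDs from the question):

```python
import io
import pandas as pd

# mock CSV standing in for MASTER~1.CSV, using the IDs from the question
csv = io.StringIO("SCRAMBLE_ID\nAL9LLL677\nAL9$AM657\n$L9$$4440\n#L9$306A1\n")
df = pd.read_csv(csv, dtype={"SCRAMBLE_ID": str})

# print the raw values: if the $ signs are intact here, the data is fine
# and only the notebook's HTML/MathJax rendering is at fault
print(df["SCRAMBLE_ID"].tolist())  # ['AL9LLL677', 'AL9$AM657', '$L9$$4440', '#L9$306A1']

# this real pandas option stops MathJax from eating the $ signs in notebooks
pd.set_option("display.html.use_mathjax", False)
```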
Given a dataframe as follows:
date,unit_value,unit_value_cumulative,daily_growth_rate
2019/1/29,1.0139,1.0139,0.22
2019/1/30,1.0057,1.0057,-0.81
2019/1/31,1.0122,1.0122,0.65
2019/2/1,1.0286,1.0286,1.62
2019/2/11,1.0446,1.0446,1.56
2019/2/12,1.0511,1.0511,0.62
2019/2/13,1.0757,1.0757,2.34
2019/2/14,1.0763,1.0763,0.06
2019/2/15,1.0554,1.0554,-1.94
2019/2/18,1.0949,1.0949,3.74
2019/2/19,1.0958,1.0958,0.08
I have used the code below to plot them, but as you can see from the output image, one column doesn't display on the plot.
df.plot(x='date', y=['unit_value', 'unit_value_cumulative', 'daily_growth_rate'], kind="line")
Output:
To plot unit_value only, I use: df.plot(x='date', y=['unit_value'], kind="line")
Out:
Could anyone help figure out why it doesn't work when I plot the three columns on the same plot? Thanks.
I just reproduced your results, and it actually does work fine. In your case the values of the columns "unit_value" and "unit_value_cumulative" are identical, which is why you only see the one drawn in front.
Aside from this, your current data looks like you made a mistake when calculating the cumulative values.
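To illustrate, here is one plausible way the cumulative column could be derived; compounding the percentage growth rates is a guess at the intended definition, not something stated in the question:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2019/1/29", "2019/1/30", "2019/1/31"],
    "unit_value": [1.0139, 1.0057, 1.0122],
    "daily_growth_rate": [0.22, -0.81, 0.65],  # in percent
})

# compound the daily growth rates; once this column differs from
# unit_value, all three lines become visible on the same plot
df["unit_value_cumulative"] = (1 + df["daily_growth_rate"] / 100).cumprod()
print(df["unit_value_cumulative"].tolist())
```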
I have the air quality dataset (link here) that contains missing values. I've imputed them while creating a dummy dataframe (using df.isnull()) to keep track of which values were missing.
My goal is to generate a pairplot using Seaborn (or otherwise, if a simpler method exists) that gives a different color to the imputed values.
This is easy in Matplotlib, where the parameter c of plt.scatter can be assigned a list of values so the points are colored accordingly (but then I can only plot one pair of columns at a time, not a pairplot). A possible solution is to iteratively create subplots for each pair of columns, which can make the code quite complicated!
However, Seaborn's built-in pairplot expects hue='column-name', which is not possible here because the missingness is stored in the separate dummy dataframe, and I'd need to retrieve the corresponding columns for color coding.
Please let me know how I can accomplish this in the simplest manner possible.
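The subplot route is actually only a few lines with plain Matplotlib, and it sidesteps the hue limitation entirely. A sketch, assuming a boolean mask dataframe that is True where a value was imputed (the toy data, column names, and 10% missingness rate are all assumptions for illustration):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
cols = ["a", "b", "c"]
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=cols)
mask = pd.DataFrame(rng.random(df.shape) < 0.1, columns=cols)  # True = imputed

fig, axes = plt.subplots(len(cols), len(cols), figsize=(9, 9))
for i, yc in enumerate(cols):
    for j, xc in enumerate(cols):
        ax = axes[i, j]
        if i == j:
            ax.hist(df[xc], bins=20)  # histograms on the diagonal, like pairplot
        else:
            # color a point if either of its coordinates was imputed
            imputed = mask[xc] | mask[yc]
            ax.scatter(df.loc[~imputed, xc], df.loc[~imputed, yc], s=10)
            ax.scatter(df.loc[imputed, xc], df.loc[imputed, yc], s=10, color="red")
fig.savefig("pairplot.png")
```

This keeps the mask lookup per pair of columns, which a single hue column cannot express (a row-level hue would color the whole row, not the individual imputed coordinates).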
I'm trying to create a network map in Python using NetworkX, with labels, to find potential connections between people. Currently I have a 1,316 x 1,316 matrix of 1's and 0's that represents whether there is a connection or not. I have been able to import this data into Python as a dataframe. [Screenshot of the dataframe] You can see the column and row names are numbers, but at the end of the list they turn into actual names. If I remove the names and make a grid of just 1's and 0's, NetworkX will let me turn this into a connection graph, but it is almost useless without the labels to know who is connected to whom. When I include the labels in the dataframe and run the code listed below, I receive an error.
Error
NetworkXError: Input is not a correct Pandas DataFrame.
Code
Network = pd.DataFrame.from_csv('H:\\Network.csv')
G1 = nx.to_networkx_graph(Network)
I will admit to being quite new at this so any and all help will be appreciated. If you have any advice on why I'm getting this error or a better way to go about this, I am open to suggestions.
Thanks for your help.
I had the same problem. What helped me solve it was simply transforming the dataframe into a NumPy array. (This only works if shape[0] == shape[1].)
here is what I did:
Network = pd.read_csv('H:\\Network.csv', index_col=0)  # DataFrame.from_csv was removed in newer pandas
G1 = nx.to_networkx_graph(Network.to_numpy())
This should work for you.
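If you'd rather keep the labels, nx.from_pandas_adjacency builds the graph directly from a labeled adjacency dataframe, so the node names are the people rather than integer indices (the three names below are stand-ins for the real 1,316 rows):

```python
import networkx as nx
import pandas as pd

# a tiny labeled adjacency matrix standing in for the 1,316 x 1,316 one
names = ["Alice", "Bob", "Carol"]
adj = pd.DataFrame([[0, 1, 0],
                    [1, 0, 1],
                    [0, 1, 0]], index=names, columns=names)

# from_pandas_adjacency keeps the row/column labels as node names,
# so edges read as connections between people
G = nx.from_pandas_adjacency(adj)
print(list(G.edges()))
```

Note that from_pandas_adjacency requires the row and column labels to match, which is exactly the layout of a symmetric connection matrix like yours.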
Thanks for taking a look at my question.
I'm having a peculiar issue importing an xlsx file into MATLAB R2016a (Mac OS X), more specifically importing dates.
I am using the below code to import my bank statement history from the Worksheet 'Past' in the xlsx file 'bank_statements.xlsx'. A snippet of column 1 with the dates in dd/mm/yyyy format is also included.
[ndata, text, data] = xlsread('bank_statements.xlsx','Past');
My understanding is that MATLAB uses filters to distinguish between text and numeric data, with these being represented in the 'text' and 'data' arrays respectively, whilst 'ndata' is a cell array with everything included. Previously, when running the script on MATLAB 2015a (Windows), the dates from column 1 were treated as strings and populated the 'text' array, whilst on MATLAB 2016a (Mac OS X) column 1 of the text array is blank. I assumed this was because of updates to how the xlsread function interprets date information.
Here's the strange part. Whilst inspecting the text array through the Variables window and referencing in the Command Window shows text(2,1) to be empty, performing the datenum function on this "empty" cell successfully gives the date in a numbered format:
Whilst I can solve this issue by using the ndata array (or by ignoring the fact that the above doesn't make sense to me), I'd really like to understand what is happening here, and how a seemingly empty cell can actually hold information that operations can be performed on.
Best regards,
Jim
I was able to replicate your problem, and although I can't explain the intricacies of what is happening, I can offer a suggestion. I was only able to replicate it when converting a string of non-date text, which leads me to believe there might be an issue with the way the data was imported.
Instead of:
[ndata,text,data] = xlsread('bank_statements.xlsx','Past');
maybe try adding the @convertSpreadsheetDates function handle if you have it, along with the range of values you want to import, i.e.
[ndata,text,data] = xlsread('bank_statements.xlsx','Past','A2:A100','',@convertSpreadsheetDates);
Probably not what you are looking for but it might help!