How to plot frequency from a CSV column in Python

I have a CSV file with columns: created_at, hashtags, media, urls, favorite_count.
I would like to plot the frequency of hashtags.
To read the CSV file I used pandas (I would also like to show/list the result):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcdefaults()
df = pd.read_csv('/path/file', delimiter=",")
Then, to plot the frequency of hashtags in the file, I used
plt.plot(df["hashtags"])
plt.show()
but I received the error: "nan is not a string".
Any suggestion on how to plot the column and visualise the results as both plot and pretty table?
Thanks

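You can try this: drop the missing values (which trigger the "nan is not a string" error), count how often each hashtag value occurs, then print and plot the counts:
counts = df["hashtags"].dropna().value_counts()
print(counts)  # frequency table, most common value first
counts.plot(kind="bar")
plt.show()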
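
For the hashtag question, here is a minimal sketch of counting and tabulating the frequencies. It assumes each cell of the hashtags column holds zero or more space-separated tags (the question does not say how they are stored), and it uses a small made-up in-memory CSV in place of the real file:

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file; empty cells become NaN.
csv_text = """created_at,hashtags,favorite_count
2020-01-01,python pandas,3
2020-01-02,,0
2020-01-03,python,5
2020-01-04,matplotlib python,1
"""

df = pd.read_csv(io.StringIO(csv_text))

# Drop missing cells, split each cell into individual tags (assumed to be
# space-separated), flatten to one tag per row, then count occurrences.
counts = (
    df["hashtags"]
    .dropna()
    .str.split()
    .explode()
    .value_counts()
)

# Plain-text table of the frequencies.
print(counts.to_string())
```

Calling counts.plot(kind="bar") followed by plt.show() draws the same counts as a bar chart. If the tags are separated by something other than spaces, pass that separator to str.split().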

Related

How to create a line plot in python, by importing data from excel and using it to create a plot that shares a common X-Axis?

I am trying to create a plot in Python using Spyder. I have sample data in Excel, which I can import into Spyder. I want one column ('Frequency') on the X axis and the remaining columns ('C1', 'C2', 'C3', 'C4') plotted on the Y axis. How do I do this? The data and the way the plot looks in Excel are shown here: https://i.stack.imgur.com/eRug5.png
This is what I have so far. The commands below (also seen in the image) give an empty plot.
data = data.head()
#data.plot(kind='line', x='Frequency', y=['C1','C2','C3','C4'])
df = pd.DataFrame(data, columns=["Frequency","C1", "C2","C3","C4"])
df.plot(x = "Frequency",y=["C1", "C2","C3","C4"])
Here is an example; you can change the column names:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame({'X_Axis':[1,3,5,7,10,20],
                   'col_2':[.4,.5,.4,.5,.5,.4],
                   'col_3':[.7,.8,.9,.4,.2,.3],
                   'col_4':[.1,.3,.5,.7,.1,.0],
                   'col_5':[.5,.3,.6,.9,.2,.4]})
# Reshape from wide to long format: one row per (X_Axis, column, value)
dfm = df.melt('X_Axis', var_name='cols', value_name='vals')
g = sns.catplot(x="X_Axis", y="vals", hue='cols', data=dfm, kind='point')
plt.show()
Alternatively, read the Excel file with pandas and plot each column against Frequency with matplotlib:
import pandas as pd
import matplotlib.pyplot as plt
path = r"C:\Users\Alisha.Walia\Desktop\Alisha\SAMPLE.xlsx"
data = pd.read_excel(path)
# The header in this file is "Frequency " with a trailing space
frequency = data["Frequency "].to_list()
plt.style.use('ggplot')  # set the style before plotting so that it takes effect
# One line per capacitance column, labelled so that plt.legend() has entries to show
for col in ["C1", "C2", "C3", "C4"]:
    plt.plot(frequency, data[col].to_list(), label=col)
plt.title('SAMPLE')
plt.xlabel('Frequency 20Hz-200MHz')
plt.ylabel('Capacitance pF')
plt.xlim(5, 500)
plt.ylim(-20, 20)
plt.legend()
plt.show()
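
The answer above addresses the column as "Frequency " with a trailing space. The following sketch assumes the spreadsheet headers carry stray whitespace like that; it normalises the headers and then lets pandas draw every column against Frequency in one call. The numbers are made up for illustration and only two of the four columns are included:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for the spreadsheet; note the trailing space
# in the "Frequency " header, as in the answer above.
data = pd.DataFrame({
    "Frequency ": [20, 50, 100, 200, 500],
    "C1": [1.2, 1.1, 0.9, 0.7, 0.4],
    "C2": [2.0, 1.8, 1.5, 1.1, 0.6],
})

# Strip whitespace from every header so columns can be addressed by clean names.
data.columns = data.columns.str.strip()

# One line per listed column, all sharing Frequency as the x-axis.
fig, ax = plt.subplots()
data.plot(x="Frequency", y=["C1", "C2"], ax=ax, title="SAMPLE")
ax.set_xlabel("Frequency")
ax.set_ylabel("Capacitance pF")
```

Finish with plt.show() to display the figure. Passing the column names through y= also gives each line a legend entry without extra work.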

plotting time and temperature in xy plot

I want to draw an xy plot where the x axis contains temperature values (first column) and the y axis contains time in hr:min:sec (second column).
8.8900 06:09:95.50
9.4500 06:09:00.56
10.5800 08.06:95.48
11.6500 09:07:73.58
56.3650 00:08:00.47
85.7823 07:01:03.23
I just want to plot a xy plot.
I tried this code:
import numpy as np
import matplotlib.pyplot as plt
data=np.loadtxt("inpdata.txt")
plt.plot(data[:,0],data[:,1])
plt.show()
But it does not produce a plot. I hope the experts can help. Thanks.
The simplest approach would be to use pandas.
Load the file as a whitespace-delimited file into a pandas.DataFrame object:
import pandas as pd
from matplotlib import pyplot as plt
# sep=r'\s+' splits on any run of whitespace (delim_whitespace=True is deprecated in recent pandas)
df = pd.read_csv('inpdata.txt', names=['temp', 'time'], sep=r'\s+')
Then create a line plot with time as the x axis:
df.plot.line(x='time')
and show the plot:
plt.show()
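
The time column in the question is text, and some values do not look like valid clock times (for example 95 seconds), so a numeric axis needs an explicit conversion. A minimal sketch, assuming the intended layout is hr:min:sec and that malformed entries should simply be skipped; to_seconds is a hypothetical helper, not a pandas function:

```python
import io

import pandas as pd

# First rows of the sample from the question.
text = """8.8900 06:09:95.50
9.4500 06:09:00.56
56.3650 00:08:00.47
"""

df = pd.read_csv(io.StringIO(text), sep=r"\s+", names=["temp", "time"])


def to_seconds(stamp):
    """Convert 'hr:min:sec' text to seconds; return None if it is malformed."""
    parts = stamp.split(":")
    if len(parts) != 3:
        return None
    try:
        hours, minutes, seconds = (float(p) for p in parts)
    except ValueError:
        return None
    return hours * 3600 + minutes * 60 + seconds


# Numeric column that matplotlib can place on a continuous axis.
df["seconds"] = df["time"].map(to_seconds)
```

With that column in place, df.plot.line(x="temp", y="seconds") puts temperature on the x axis and elapsed seconds on the y axis, which is the orientation the question asks for.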

How to plot big csv file

I have a dataframe with 2232803 rows and two columns, 'x' and 'y', and I want to draw a line plot with x on the x-axis and y on the y-axis. I tried a few ways, but when I plot, my computer freezes. Maybe the file is too big; my computer has 8 GB of RAM. Is there any way to plot a file that big?
I tried this:
plt.plot(df['x'].values,df['y'].values)
I suggest you plot a random fraction of the dataset:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(2232803, 2), columns=list('xy'))
# Sample 1% of the rows; sort_index() restores the original row order so the line is not scrambled
df_plot = df.sample(frac=0.01).sort_index()
plt.plot(df_plot['x'].values, df_plot['y'].values)
plt.show()
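
As an alternative to random sampling, the rows can be thinned at a fixed stride, which keeps them in their original order. This is only a sketch on synthetic data; thin and max_points are hypothetical names chosen for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the large file: one million ordered points.
x = np.linspace(0, 100, 1_000_000)
df = pd.DataFrame({"x": x, "y": np.sin(x)})


def thin(frame, max_points=10_000):
    """Keep every k-th row so that roughly max_points rows remain, in order."""
    step = max(1, len(frame) // max_points)
    return frame.iloc[::step]


df_plot = thin(df)
```

The thinned frame can then be passed to plt.plot(df_plot['x'], df_plot['y']). A fixed stride can skip over narrow spikes, so it suits smooth data better than data with short outliers.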

How to plot multiple line graph for a column value from a CSV file?

I tried to plot the energies of 4 nodes as a line graph, but I'm not able to identify which line represents which node ID (1, 2, 3 or 4).
My CSV looks something like this:
Time,Source,Destination,Bits,Energy
0,1,2,288,9.9999856
1058,1,2,288,9.9999856
1140,2,1,96,9.9999808
1958,2,3,96,9.9999952
2024,2,1,96,9.9999808
2051,2,3,288,9.999966399
3063,2,3,288,9.9999808
3126,3,2,96,9.999976
3127,2,1,288,9.9999664
3946,3,2,96,9.999961599
8340,1,2,288,9.999952
9418,1,2,288,9.999947199
9479,2,1,96,9.999942399
10299,2,3,96,9.9999712
10365,2,1,96,9.9999472
10758,2,3,288,9.999927999
11770,2,3,288,9.9999568
11832,3,2,96,9.999951999
11842,2,1,288,9.9999328
Code :
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv('DS1.csv')
for Energy,data in df.groupby('Source'):
    plt.plot(data['Time'], data['Energy'])
    plt.legend(data['Source'])
    #print(data)
plt.xlabel('Time')
plt.ylabel('Energy')
plt.legend()
plt.show()
What I actually want is to plot energy vs. time for every source (1 to 4), with each line identified by its source.
You need to set the label on each line, using the group key that groupby returns:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('DS1.csv')
# groupby yields (key, sub-frame) pairs; the key is the Source value
for source, data in df.groupby('Source'):
    plt.plot(data['Time'], data['Energy'], label=source)
plt.xlabel('Time')
plt.ylabel('Energy')
plt.legend()
plt.show()
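
The same idea as a self-contained sketch that runs without DS1.csv. It uses a handful of rows copied from the question and the object-oriented matplotlib interface; the "Node N" label text is an arbitrary choice:

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# A few rows from the question's CSV.
csv_text = """Time,Source,Destination,Bits,Energy
0,1,2,288,9.9999856
1140,2,1,96,9.9999808
3126,3,2,96,9.999976
8340,1,2,288,9.999952
9479,2,1,96,9.999942399
11832,3,2,96,9.999951999
"""
df = pd.read_csv(io.StringIO(csv_text))

fig, ax = plt.subplots()
# groupby yields (key, sub-frame) pairs; the key is the Source node ID.
for source, group in df.groupby("Source"):
    ax.plot(group["Time"], group["Energy"], marker="o", label=f"Node {source}")

ax.set_xlabel("Time")
ax.set_ylabel("Energy")
ax.legend(title="Source")
```

Call plt.show() to display the figure. Only nodes that appear in the Source column get a line, so a node that never sends will not show up in the legend.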

x axis labels (date) slips in Python matplotlib

I'm a beginner in Python and I have the following problem. I would like to plot a dataset where the x-axis shows dates. The dataset looks as follows:
datum, start, end
2017.09.01 38086 37719,8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
2017.09.06 37723.5898 37878.8594
2017.09.07 37878.8594 37783.5117
2017.09.08 37764.7383 37596.75
2017.09.11 37615.5117 37895.8516
2017.09.12 37889.6016 38076.8789
2017.09.13 38089.1406 38119.0898
2017.09.14 38119.2617 38243.1992
2017.09.15 38243.7188 38325.9297
2017.09.18 38325.3086 38387.2188
2017.09.19 38387.2188 38176.0781
2017.09.20 38173.2109 38108.0391
2017.09.21 38107.2617 38109.2109
2017.09.22 38110.4609 38178.6289
2017.09.25 38121.9102 38107.8711
2017.09.26 38127.25 37319.2383
2017.09.27 37360.8398 37244.3008
2017.09.28 37282.1094 37191.6484
2017.09.29 37192.1484 37290.6484
The first column holds the labels for the x-axis (the dates).
When I run the following code, the x-axis labels are shifted:
import pandas as pd
import matplotlib.pyplot as plt
bux = pd.read_csv('C:\\Home\\BUX.txt',
sep='\t',
decimal='.',
header=0)
fig1 = bux.plot(marker='o')
fig1.set_xticklabels(bux.datum, rotation='vertical', fontsize=8)
In the resulting figure the labels are shifted: the second data row in the dataset is '2017.09.04 37707.3906 37465.2617', but the label '2017.09.04' appears at the third data row, where start = 37471.5117.
What should I do to get correct x-axis labels?
Thank you!
Agnes
First, there is a comma instead of a period in the second line of the file (37719,8984). This should be corrected. Then convert the "datum," column to actual dates and simply plot the dataframe with matplotlib. Because the header is split on whitespace, the column names keep their commas ("datum," and "start,").
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data/BUX.txt', sep=r'\s+')
df["datum,"] = pd.to_datetime(df["datum,"], format="%Y.%m.%d")
plt.plot(df["datum,"], df["start,"], marker="o")
plt.plot(df["datum,"], df["end"], marker="o")
plt.gcf().autofmt_xdate()
plt.show()
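
A likely reason the labels slipped in the question's code is that set_xticklabels assigns the given strings, in order, to whatever tick positions the axis already has, and those are not necessarily one per row. If the dates should stay as plain text labels, a sketch of pinning one tick to each row first (using the first rows of the data, with the decimal comma already fixed and a simplified header):

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# First rows of the question's data, with the decimal comma already fixed.
text = """datum start end
2017.09.01 38086 37719.8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
2017.09.06 37723.5898 37878.8594
"""
bux = pd.read_csv(io.StringIO(text), sep=r"\s+")

fig, ax = plt.subplots()
bux[["start", "end"]].plot(ax=ax, marker="o")

# Put exactly one tick at each row position before assigning the labels,
# so that label i is guaranteed to sit under row i.
ax.set_xticks(range(len(bux)))
ax.set_xticklabels(bux["datum"], rotation="vertical", fontsize=8)
```

This spaces the rows evenly regardless of weekend gaps, whereas converting to real dates, as in the answer above, spaces them by calendar time.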
Thank you! It works perfectly. The key moment was to convert the data to date format. Thank you again!
Agnes
Actually you can easily use the df.plot() to fix it:
import pandas as pd
import matplotlib.pyplot as plt
import io
t="""
date start end
2017.09.01 38086 37719.8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
2017.09.06 37723.5898 37878.8594
2017.09.07 37878.8594 37783.5117
2017.09.08 37764.7383 37596.75
2017.09.11 37615.5117 37895.8516
2017.09.12 37889.6016 38076.8789
2017.09.13 38089.1406 38119.0898
2017.09.14 38119.2617 38243.1992
2017.09.15 38243.7188 38325.9297
2017.09.18 38325.3086 38387.2188
2017.09.19 38387.2188 38176.0781
2017.09.20 38173.2109 38108.0391
2017.09.21 38107.2617 38109.2109
2017.09.22 38110.4609 38178.6289
2017.09.25 38121.9102 38107.8711
2017.09.26 38127.25 37319.2383
2017.09.27 37360.8398 37244.3008
2017.09.28 37282.1094 37191.6484
2017.09.29 37192.1484 37290.6484
"""
import numpy as np
data=pd.read_fwf(io.StringIO(t),header=1,parse_dates=['date'])
data.plot(x='date',marker='o')
plt.show()
