How to plot big csv file

I have a DataFrame with 2,232,803 rows and two columns, 'x' and 'y', and I want to draw a line plot with x on the x-axis and y on the y-axis. I tried a few approaches, but my computer freezes when I plot, perhaps because the file is so big; the machine has 8 GB of RAM. Is there any way to plot a file that large?
I tried this:
plt.plot(df['x'].values,df['y'].values)

I suggest you plot a random fraction of the dataset:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.rand(2232803, 2), columns=list('xy'))
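# sample() returns rows in random order; sort_index() restores the original row order so the line is drawn in sequence
df_plot = df.sample(frac=0.01).sort_index()
plt.plot(df_plot['x'].values, df_plot['y'].values)
plt.show()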
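If a random sample is not what you want, another option is to keep every k-th row, which preserves the row order and the overall shape of the line. This is only a minimal sketch: the helper name decimate, the 20,000-point budget, and the output file name are assumptions chosen for illustration, and the random data stands in for the real file.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def decimate(df: pd.DataFrame, max_points: int = 20_000) -> pd.DataFrame:
    """Keep every k-th row so that at most max_points rows remain, in original order."""
    step = max(1, -(-len(df) // max_points))  # ceiling division
    return df.iloc[::step]


# Synthetic stand-in with the same shape as the data in the question.
df = pd.DataFrame(np.random.rand(2232803, 2), columns=list("xy"))
df_small = decimate(df)

fig, ax = plt.subplots()
ax.plot(df_small["x"].to_numpy(), df_small["y"].to_numpy(), linewidth=0.5)
fig.savefig("big_plot.png", dpi=150)  # hypothetical output file name
```

A screen only has a few thousand pixels across, so drawing far more points than that rarely adds visible detail. Note that plain striding can skip narrow spikes; if those matter, aggregating each block with min and max would be a safer choice.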

Related

plotting time and temperature in xy plot

I want to make an xy plot where the x-axis shows temperature values (first column) and the y-axis shows time in hr:min:sec (second column).
8.8900 06:09:95.50
9.4500 06:09:00.56
10.5800 08.06:95.48
11.6500 09:07:73.58
56.3650 00:08:00.47
85.7823 07:01:03.23
I just want a simple xy plot.
I tried this code:
import numpy as np
import matplotlib.pyplot as plt
data=np.loadtxt("inpdata.txt")
plt.plot(data[:,0],data[:,1])
plt.show()
But it does not produce a plot. I hope the experts can help. Thanks.

The simplest approach would be to use pandas.
Load the file as a whitespace-delimited file into a pandas.DataFrame object:
import pandas as pd
from matplotlib import pyplot as plt
# sep=r'\s+' splits on any run of whitespace (delim_whitespace=True is deprecated in recent pandas releases)
df = pd.read_csv('inpdata.txt', names=['temp', 'time'], sep=r'\s+')
Then create a line plot with time as the x-axis:
df.plot.line(x='time')
and show the plot:
plt.show()
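With the approach above the time column stays a string, so it is treated as a label rather than a number. If you want temperature on the x-axis and time on a numeric y-axis, as the question asks, one option is to convert each hr:min:sec value to seconds first. This is a rough sketch: the helper to_seconds and the seconds column are names made up here, and the three rows are well-formed samples copied from the question.

```python
import pandas as pd
import matplotlib.pyplot as plt


def to_seconds(stamp: str) -> float:
    """Convert an 'hh:mm:ss.ss' string to a number of seconds."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Three well-formed rows taken from the sample data in the question.
df = pd.DataFrame(
    {
        "temp": [9.4500, 56.3650, 85.7823],
        "time": ["06:09:00.56", "00:08:00.47", "07:01:03.23"],
    }
)
df["seconds"] = df["time"].map(to_seconds)

ax = df.plot.line(x="temp", y="seconds", marker="o")
ax.set_xlabel("temperature")
ax.set_ylabel("time (s)")
plt.savefig("temp_vs_time.png")  # hypothetical output file name
```

A malformed entry such as 08.06:95.48 in the sample data does not split into three parts, so to_seconds raises a ValueError for it; rows like that would need to be corrected or filtered out first.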

How to plot frequency from a CSV Column in Python

I have a CSV file with the columns created_at, hashtags, media, urls, and favorite_count.
I would like to plot the frequency of hashtags.
To read the CSV file I used pandas (but I would also like to show/list the result):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.rcdefaults()
df = pd.read_csv('/path/file',delimiter=",")
Then, to plot the frequency of hashtags in the file, I used
plt.plot(df["hashtags"])
plt.show()
but I received the error: "nan is not a string".
Any suggestions on how to plot the column and show the results as both a plot and a pretty table?
Thanks

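You can try dropping the missing values first. Note that dropna() and reset_index() return new DataFrames, so the result has to be assigned back:
df = df.dropna(subset=["hashtags"])
df = df.reset_index(drop=True)
plt.plot(df["hashtags"])
plt.show()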
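To get actual frequencies rather than the raw column, value_counts() is probably the most direct route. The sketch below assumes each cell of hashtags holds a single hashtag; the sample values and the output file name are made up for illustration. If a cell can hold several hashtags, it would have to be split first.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Small stand-in for the CSV in the question; the values are invented.
df = pd.DataFrame(
    {"hashtags": ["python", None, "pandas", "python", "plot", None, "python"]}
)

# Drop missing entries, then count how often each hashtag occurs.
counts = df["hashtags"].dropna().value_counts()

# Plain-text table of the result.
print(counts.to_string())

# Bar chart of the same counts.
ax = counts.plot(kind="bar")
ax.set_xlabel("hashtag")
ax.set_ylabel("count")
plt.tight_layout()
plt.savefig("hashtag_counts.png")  # hypothetical output file name
```

value_counts() sorts by count in descending order, so the most frequent hashtag comes first in both the table and the chart.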

how to make scatter plot of two columns and divide x_axis in 3 column f1,f2,and f3

I have a DataFrame and want to draw a scatter plot divided into two regions: region 1 should show only f_x_f1 vs A_x_f1, and region 2 should show f_x_f2 vs A_x_f2.
If someone can suggest a better approach to this problem, please do.
Here is an example of my DataFrame:
df=pd.DataFrame({'f_x_f1':[0.3,0.28,0.34],'A_x_f1':[0.003,0.28,0.034],'f1':[0.4,0.4,0.4],'f_x_f2':[0.91,0.88,0.96],'A_x_f2':[0.003,0.28,0.034],'f2':[1.3,1.3,1.3]})

Here, using matplotlib!
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
### making some sample data
df = pd.DataFrame({"f_x_f1": np.random.randint(1,100,100)
, "A_x_f1": np.random.randint(1,100,100)
, "f_x_f2": np.random.randint(1,100,100)
, "A_x_f2": np.random.randint(1,100,100) })
fig, ax = plt.subplots(nrows=1, ncols=2)
ax[0].scatter(df.f_x_f1,df.A_x_f1)
ax[0].set_title("f_x_f1 vs A_x_f1")
ax[1].scatter(df.f_x_f2,df.A_x_f2)
ax[1].set_title("f_x_f2 vs A_x_f2")
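If you would rather keep both point sets on one shared x-axis, a possible variant is to draw them in a single axes and mark the region boundaries with vertical lines. This sketch uses the example DataFrame from the question and assumes that the constant columns f1 and f2 are meant as boundary positions on the x-axis, which the question does not state explicitly.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Example data copied from the question.
df = pd.DataFrame(
    {
        "f_x_f1": [0.3, 0.28, 0.34],
        "A_x_f1": [0.003, 0.28, 0.034],
        "f1": [0.4, 0.4, 0.4],
        "f_x_f2": [0.91, 0.88, 0.96],
        "A_x_f2": [0.003, 0.28, 0.034],
        "f2": [1.3, 1.3, 1.3],
    }
)

fig, ax = plt.subplots()
ax.scatter(df["f_x_f1"], df["A_x_f1"], label="f_x_f1 vs A_x_f1")
ax.scatter(df["f_x_f2"], df["A_x_f2"], label="f_x_f2 vs A_x_f2")

# Assumption: f1 and f2 are constant columns marking where each region ends.
for col in ("f1", "f2"):
    ax.axvline(df[col].iloc[0], linestyle="--", color="gray")

ax.legend()
fig.savefig("regions.png")  # hypothetical output file name
```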

how can i plot the graph for csv data in matplotlib

Can you please tell me how to plot a graph from CSV data?
The CSV file has x, y, depth, and color values. I want to plot the depth and color against the x and y axes. I googled many times but didn't find anything suitable, so please guide me on how to plot these values.
This is what I tried:
from matplotlib import pyplot as plt
from matplotlib import style
import pandas as pd
data=pd.read_csv("Tunnel.csv",names=['x','y','z','color'])
data1 =data[data.z==0]
print (data1)
# plt.plot(data[data.x],data[data.y])
plt.ylabel('yaxis')
plt.xlabel('xaxis')
plt.title('Tunnel 2d')
plt.show()

I'm assuming that you want the names of the first two columns to be used as the axis labels and columns 3 and 4 as the plotted data.
from matplotlib import pyplot as plt
import pandas as pd
data = pd.read_csv("Tunnel.csv")
x = data[data.columns[2]]  # third column
y = data[data.columns[3]]  # fourth column
xlab = data.columns[0]  # x-axis label
ylab = data.columns[1]  # y-axis label
fig, ax = plt.subplots()
# Assuming it's a line graph that you want to plot
ax.plot(x, y, color='g', linewidth=5, label='depth vs color')
ax.set_xlabel(xlab)
ax.set_ylabel(ylab)
ax.set_title('Tunnel 2d')  # title taken from the code in the question
fig.savefig('./Directory/Graph.png')  # the directory must already exist
plt.show()

I am assuming that you want the color and depth as text annotations.
# import the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# create the DataFrame with random sample data
dep=list(np.random.randint(0,100,10))
col=list(np.random.randint(0,100,10))
y=[int(x/3)+1 for x in range(0,10)]
x=list(range(0,10))
my_df=pd.DataFrame({'x':x,'y':y,'colour':col,'depth':dep})
# create the annotation column
my_df['my_text']='c= '+my_df.colour.astype(str)+','+'d= '+my_df.depth.astype(str)
# plot the points and label each one
plt.figure(figsize=(20,10))
plt.plot(my_df.x,my_df.y,'o')
for i, txt in enumerate(my_df['my_text']):
    plt.annotate(txt, (x[i],y[i]), size=10, xytext=(0,0), ha='left', textcoords='offset points', bbox=dict(facecolor='none', edgecolor='red'))
plt.ylabel('yaxis')
plt.xlabel('xaxis')
plt.title('Tunnel 2d')
plt.show()

x axis labels (date) slips in Python matplotlib

I'm a beginner in Python and I have the following problem. I would like to plot a dataset where the x-axis shows dates. The dataset looks as follows:
datum, start, end
2017.09.01 38086 37719,8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
2017.09.06 37723.5898 37878.8594
2017.09.07 37878.8594 37783.5117
2017.09.08 37764.7383 37596.75
2017.09.11 37615.5117 37895.8516
2017.09.12 37889.6016 38076.8789
2017.09.13 38089.1406 38119.0898
2017.09.14 38119.2617 38243.1992
2017.09.15 38243.7188 38325.9297
2017.09.18 38325.3086 38387.2188
2017.09.19 38387.2188 38176.0781
2017.09.20 38173.2109 38108.0391
2017.09.21 38107.2617 38109.2109
2017.09.22 38110.4609 38178.6289
2017.09.25 38121.9102 38107.8711
2017.09.26 38127.25 37319.2383
2017.09.27 37360.8398 37244.3008
2017.09.28 37282.1094 37191.6484
2017.09.29 37192.1484 37290.6484
The first column holds the x-axis labels (the dates).
When I run the following code, the x-axis labels slip:
import pandas as pd
import matplotlib.pyplot as plt
bux = pd.read_csv('C:\\Home\\BUX.txt',
sep='\t',
decimal='.',
header=0)
fig1 = bux.plot(marker='o')
fig1.set_xticklabels(bux.datum, rotation='vertical', fontsize=8)
In the resulting figure the labels are shifted: the second data row in the dataset is '2017.09.04 37707.3906 37465.2617', but the label '2017.09.04' appears at the third data row, whose start value is 37471.5117.
What shall I do to get correct x-axis labels?
Thank you!
Agnes

First, there is a comma instead of a period in the first data row (37719,8984). This should be corrected. Then convert the "datum," column to actual dates and simply plot the DataFrame with matplotlib.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data/BUX.txt', sep=r'\s+')
df["datum,"] = pd.to_datetime(df["datum,"], format="%Y.%m.%d")
plt.plot(df["datum,"], df["start,"], marker="o")
plt.plot(df["datum,"], df["end"], marker="o")
plt.gcf().autofmt_xdate()
plt.show()

Thank you! It works perfectly. The key moment was to convert the data to date format. Thank you again!
Agnes

Actually, you can simply use df.plot() to fix it:
import pandas as pd
import matplotlib.pyplot as plt
import io
t="""
date start end
2017.09.01 38086 37719.8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
2017.09.06 37723.5898 37878.8594
2017.09.07 37878.8594 37783.5117
2017.09.08 37764.7383 37596.75
2017.09.11 37615.5117 37895.8516
2017.09.12 37889.6016 38076.8789
2017.09.13 38089.1406 38119.0898
2017.09.14 38119.2617 38243.1992
2017.09.15 38243.7188 38325.9297
2017.09.18 38325.3086 38387.2188
2017.09.19 38387.2188 38176.0781
2017.09.20 38173.2109 38108.0391
2017.09.21 38107.2617 38109.2109
2017.09.22 38110.4609 38178.6289
2017.09.25 38121.9102 38107.8711
2017.09.26 38127.25 37319.2383
2017.09.27 37360.8398 37244.3008
2017.09.28 37282.1094 37191.6484
2017.09.29 37192.1484 37290.6484
"""
import numpy as np
data=pd.read_fwf(io.StringIO(t),header=1,parse_dates=['date'])
data.plot(x='date',marker='o')
plt.show()
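As for why the labels slipped in the original code: set_xticklabels only assigns text to whatever tick positions matplotlib has already chosen, and those automatic ticks are generally not placed at 0, 1, 2, ... for every row. If you want to keep the dates as plain strings, a possible fix is to pin the ticks to the row positions first. The sketch below uses the first four rows of the dataset (with the comma corrected and a header without commas) and a made-up output file name.

```python
import io

import pandas as pd
import matplotlib.pyplot as plt

# First four rows of the dataset from the question, with the comma corrected.
text = """datum start end
2017.09.01 38086 37719.8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
2017.09.06 37723.5898 37878.8594
"""
bux = pd.read_csv(io.StringIO(text), sep=r"\s+")

# Only the numeric columns (start, end) are drawn; x is the row position.
ax = bux.plot(marker="o")

# Put one tick on every row position, then label the ticks with the dates.
ax.set_xticks(range(len(bux)))
ax.set_xticklabels(bux["datum"], rotation="vertical", fontsize=8)

plt.tight_layout()
plt.savefig("bux.png")  # hypothetical output file name
```

This keeps the points evenly spaced regardless of weekends, whereas converting to real dates spaces them by calendar time.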
