python pandas dataframe.plot() - python-3.x

Hello I am working on assignment and I am having trouble. When I run the program it compiles and there are no errors. however, nothing is printed out on the terminal screen. I guess I expect a graph on the terminal screen but don't see anything
with open('myfile.csv') as csvfile:
data = pd.read_csv(csvfile, delimiter=',')
d = data.values
dd = pd.DataFrame(data=d)
dd.plot()
Any tips or suggestions is appreciated

import pandas as pd
import matplotlib.pyplot as plt
with open('myfile.csv') as csvfile:
data = pd.read_csv(csvfile, delimiter=',')
d = data.values
dd = pd.DataFrame(data=d)
dd.plot()
plt.show()
use this

The graph is plotting with your current code as is,
Please check the imports once again. I'm using the following imports:
import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline

Related

ax = plt.axes(projection=ccrs.Mercator()) give an error

I have the following program:
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
birddata = pd.read_csv("bird_tracking.csv")
bird_names = pd.unique(birddata.bird_name)
plt.figure(figsize=(10,10)
ax = plt.axis(projection=ccrs.Mercator())
ax.set_extent((-25.0,20.0,52.0,10.0))
ax.add_feature(cfeature.LAND)
ax.add_feature(cfeature.OCEAN)
ax.add_feature(cfeature.COASTLINE)
ax.add_feature(cfeature.BORDERS,linestyle=":")
for name in bird_names:
ix = birddata['bird_name'] == name
x, y = birddata.longitude[ix], birddata.latitude[ix]
ax.plot(x, y,".", transform = ccrs.Geodetic(), label = name)
plt.legend(loc="upper left")
plt.savefig("map.pdf")
I am using Spyder 5 and python 3.9. But I have an invalid syntax for the line ax = plt.axis(projection=ccrs.Mercator())
If I run the program, I receive this message in the console:
error in ax
I am unable to find help on this. Moreover, if you go to this link, you will see that they use it exactly as I do. So what am I missing?
Thank you i advance for your help.

Is there any alternative for pd.notna ( pandas 0.24.2). It is not working in pandas 0.20.1?

"Code was developed in pandas=0.24.2, and I need to make the code work in pandas=0.20.1. What is the alternative for pd.notna as it is not working in pandas version 0.20.1.
df.loc[pd.notna(df["column_name"])].query(....).drop(....)
I need an alternative to pd.notna to fit in this line of code to work in pandas=0.20.1
import os
import subprocess
import pandas as pd
import sys
from StringIO import StringIO
from io import StringIO
cmd = 'NSLOOKUP email.fullcontact.com'
df = pd.DataFrame()
a = subprocess.Popen(cmd, stdout=subprocess.PIPE)
b = StringIO(a.communicate()[0].decode('utf-8'))
df = pd.read_csv(b, sep=",")
column = list(df.columns)
name = list(df.iloc[1])[0].strip('Name:').strip()
name

import data from multiple file and summing column wise

I have n number of txt files each having 99 floating numbers in 99 column. I read each files and append all data by following script.
import glob
import numpy as np
import matplotlib.pyplot as plt
msd_files = (glob.glob('MSD_no_fs*'))
msd_all=[]
for msd_file in msd_files:
# print(msd_file)
msd = numpy.loadtxt(fname=msd_file, delimiter=',')
msd_all.append(msd)
After that I need to make column wise summation of each files. for example file1,column1+file2,column1+...+file(n)column(1) and iterate this for all column. What will be the effective way to perform this? Can I use list comprehension for that?
**edited code and it works fine now.
import glob
import numpy as np
import matplotlib.pyplot as plt
msd_files = (glob.glob('MSD_no_fs*'))
msd_all=[]
for msd_file in msd_files:
with open(msd_file) as f:
for line in f:
# msd_all.append([float(v) for v in line.strip().split(',')])
msd_all.append(float(line.strip()))
msa_array = np.array(msd_all)
x=np.split(msa_array,99)
x=np.array(x)
result=np.mean(x,axis=0)
print(result.shape)
print(len(result))
It depends on efficiency level you want. Using numpy to load many csv files might be a bad choice. Here is my suggestion.
import glob
import numpy as np
msd_files = (glob.glob('MSD_no_fs*'))
msd_all=[]
for msd_file in msd_files:
with open(msd_file) as f:
for line in f:
msd_all.append([float(v) for v in line.strip().split(',')])
msa_array = np.array(msd_all)
result = msa_array.sum(axis=0)

x axis labels (date) slips in Python matplotlib

I'm beginner in Python and I have the following problems. I would like to plot a dataset, where the x-axis shows date data. The Dataset look likes the follows:
datum, start, end
2017.09.01 38086 37719,8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
2017.09.06 37723.5898 37878.8594
2017.09.07 37878.8594 37783.5117
2017.09.08 37764.7383 37596.75
2017.09.11 37615.5117 37895.8516
2017.09.12 37889.6016 38076.8789
2017.09.13 38089.1406 38119.0898
2017.09.14 38119.2617 38243.1992
2017.09.15 38243.7188 38325.9297
2017.09.18 38325.3086 38387.2188
2017.09.19 38387.2188 38176.0781
2017.09.20 38173.2109 38108.0391
2017.09.21 38107.2617 38109.2109
2017.09.22 38110.4609 38178.6289
2017.09.25 38121.9102 38107.8711
2017.09.26 38127.25 37319.2383
2017.09.27 37360.8398 37244.3008
2017.09.28 37282.1094 37191.6484
2017.09.29 37192.1484 37290.6484
In the first column are the labels of the x-axis (this is the date).
When I write the following code the x axis data slips:
import pandas as pd
import matplotlib.pyplot as plt
bux = pd.read_csv('C:\\Home\\BUX.txt',
sep='\t',
decimal='.',
header=0)
fig1 = bux.plot(marker='o')
fig1.set_xticklabels(bux.datum, rotation='vertical', fontsize=8)
The resulted figure look likes as follows:
The second data row in the dataset is '2017.09.04 37707.3906 37465.2617', BUT '2017.09.04' is yield at the third data row with start value=37471.5117
What shell I do to get correct x axis labels?
Thank you!
Agnes
First, there is a comma in the second line instead of a .. This should be adjusted. Then, you convert the "datum," column to actual dates and simply plot the dataframe with matplotlib.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data/BUX.txt', sep='\s+')
df["datum,"] = pd.to_datetime(df["datum,"], format="%Y.%m.%d")
plt.plot(df["datum,"], df["start,"], marker="o")
plt.plot(df["datum,"], df["end"], marker="o")
plt.gcf().autofmt_xdate()
plt.show()
Thank you! It works perfectly. The key moment was to convert the data to date format. Thank you again!
Agnes
Actually you can easily use the df.plot() to fix it:
import pandas as pd
import matplotlib.pyplot as plt
import io
t="""
date start end
2017.09.01 38086 37719.8984
2017.09.04 37707.3906 37465.2617
2017.09.05 37471.5117 37736.1016
2017.09.06 37723.5898 37878.8594
2017.09.07 37878.8594 37783.5117
2017.09.08 37764.7383 37596.75
2017.09.11 37615.5117 37895.8516
2017.09.12 37889.6016 38076.8789
2017.09.13 38089.1406 38119.0898
2017.09.14 38119.2617 38243.1992
2017.09.15 38243.7188 38325.9297
2017.09.18 38325.3086 38387.2188
2017.09.19 38387.2188 38176.0781
2017.09.20 38173.2109 38108.0391
2017.09.21 38107.2617 38109.2109
2017.09.22 38110.4609 38178.6289
2017.09.25 38121.9102 38107.8711
2017.09.26 38127.25 37319.2383
2017.09.27 37360.8398 37244.3008
2017.09.28 37282.1094 37191.6484
2017.09.29 37192.1484 37290.6484
"""
import numpy as np
data=pd.read_fwf(io.StringIO(t),header=1,parse_dates=['date'])
data.plot(x='date',marker='o')
plt.show()

Charting with Candlestick_OHLC

import pandas as pd
import numpy as np
from matplotlib.finance import candlestick_ohlc
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as mticker
import io
import datetime
import urllib
import urllib.request
%matplotlib notebook
urlToVisit = 'http://chartapi.finance.yahoo.com/instrument/1.0/GOOG/chartdata;
type=quote;range=1y/csv'
with urllib.request.urlopen(urlToVisit) as response:
sourcePage = response.read().decode('utf-8')
df = pd.read_csv(io.StringIO(sourcePage), skiprows=18, header=None, sep=",",
names=['date','closeP','highP','lowP','openP','volume'],
index_col= 0, parse_dates= True)
if 'volume' not in df:
df['volume'] = np.zeros(len(df))
DATA = df[['openP', 'highP', 'lowP', 'closeP','volume']].values
f1 = plt.subplot2grid((6,4), (1,0), rowspan=6, colspan=4, axisbg='#07000d')
candlestick_ohlc(f1, DATA, width=.6, colorup='#53c156', colordown='#ff1717')
f1.grid('on')
f1.xaxis.set_major_locator(mticker.MaxNLocator(15))
f1.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.subplots_adjust(left=.09, bottom=.14, right=.94, top=.95, wspace=.20, hspace=0)
plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.show()
So here's the problem, when I try to plot the 'candlestick_ohlc' but it only plots the volume bar chart! (Why is this happening?) I'm thinking that maybe the problem has to do with my dates? I'm using iPython Notebook btw. My source is from - Yahoo Finance. If you notice, I skipped the first 18 lines so that I can get straight to the actual data itself and it looks like:
20150302,569.7757,570.5834,557.2202,558.9953,2129600
20150303,572.0694,573.8146,564.9689,568.8881,1704700
20150304,571.8001,575.5299,566.4548,570.3043,1876800
20150305,573.7548,576.3277,571.8400,573.4456,1389600
20150306,566.1307,575.1011,565.2082,573.3060,1659100
20150309,567.2925,568.7086,561.9921,565.3079,1062100
date,close,high,low,open,volume
Any ideas? Would appreciate any help!!
So with the help of #DSM,
DATA = df[['openP', 'highP', 'lowP', 'closeP','volume']]
DATA = DATA.reset_index()
DATA["date"] = DATA["date"].apply(mdates.date2num)
f1 = plt.subplot2grid((6,4), (1,0), rowspan=6, colspan=4, axisbg='#07000d')
candlestick_ohlc(f1, DATA.values, width=.6, colorup='#53c156', colordown='#ff1717')
fixed the problem! Credits to him.

Resources