KeyError: "['cut'] not found in axis" - python-3.x

I am working with thousands of rows of data, trying to narrow a search for certain grains. I have an 'Asset' column with about 20 different values, and for each asset I need the sum of the adjacent 'Load' column.
I would like to cut the unnecessary rows out of my data set. To start, I relabeled all of the extra assets as 'cut' (as shown in the example below) so that I could handle them with a single .drop command. Here is how it is coded:
df14['Asset'] = df14["Asset"].str.replace('BEANS', 'cut')
df14.drop("cut", axis=0)
set(df14['Asset'])
This is the error I have received:
KeyError Traceback (most recent call last)
<ipython-input-593-40006512df80> in <module>
----> 1 df14.drop("cut", axis=0)
2 set(df14['Asset'])
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in drop(self, labels, axis, index, columns, level, inplace, errors)
4100 level=level,
4101 inplace=inplace,
-> 4102 errors=errors,
4103 )
4104
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
3912 for axis, labels in axes.items():
3913 if labels is not None:
-> 3914 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
3915
3916 if inplace:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\generic.py in _drop_axis(self, labels, axis, level, errors)
3944 new_axis = axis.drop(labels, level=level, errors=errors)
3945 else:
-> 3946 new_axis = axis.drop(labels, errors=errors)
3947 result = self.reindex(**{axis_name: new_axis})
3948
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in drop(self, labels, errors)
5338 if mask.any():
5339 if errors != "ignore":
-> 5340 raise KeyError("{} not found in axis".format(labels[mask]))
5341 indexer = indexer[~mask]
5342 return self.delete(indexer)
KeyError: "['cut'] not found in axis"
I have tried several commands to remove said lines, like:
df14.drop(["cut"], inplace = True)
df14[~df14['Asset'].isin(to_drop)]
df14[df14['Asset'].str.contains('cut', na = True)]
And all of them yield the same result.
When I code
df14 = df14[~df14["Asset"].str.contains('BEANS')]
It does not remove the Load number, which is the next column over, from my final calculations.
Is it possible to remove all rows of data with a certain label so I can trim from 20 assets to 7 assets?
Thank you

DataFrame.drop works column- or row-wise: you give a column name to drop a column, or an index label to drop a row, and axis=0 means index-wise. Since you don't have an index label named "cut", it raises the error.
I recommend doing it by:
df = df.loc[df['Asset'] != 'cut']
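Since the eventual goal is trimming 20 assets down to 7, the same boolean-mask idea extends to a whole list of labels, which makes the relabel-to-'cut' step unnecessary. A minimal sketch, assuming to_drop holds the unwanted asset names (the names here are hypothetical), and noting that the filtered frame must be assigned back, which is why the earlier isin attempt seemed to do nothing:
to_drop = ['BEANS', 'CORN', 'WHEAT']  # hypothetical: list all unwanted asset labels here
df14 = df14[~df14['Asset'].isin(to_drop)]  # keep only rows whose Asset is not in the list
df14.groupby('Asset')['Load'].sum()  # Load totals for the remaining assets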

I believe that df14.drop("cut", axis=0) is failing because it is looking for the value "cut" in the index of df14. You could potentially make the 'Asset' column the index (see the pandas documentation on drop for how), but I think a better solution might be something along the lines of
df14 = df14.query('Asset != "cut"')
I can't say whether this is the fastest solution; since I usually work with small-ish datasets, I've not had to worry about performance too much.
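For completeness, a rough sketch of the index-based route mentioned above, assuming the 'Asset' column from the question; this is an illustration rather than a recommendation:
# make 'Asset' the index so drop() has a row label to find, then restore it as a column
df14 = df14.set_index('Asset').drop('cut').reset_index()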

This should do the job.
Here you are basically selecting all rows other than 'cut':
df14 = df14.loc[df14['Asset'] != 'cut']

Related

OneHotEncoder failing after combining dataframes

I have a model that runs successfully.
When I tried to predict using it, it failed because, after OneHotEncoding, the test set had more columns than the train set.
After some reading I found that I need to concat the two df's first, OneHotEncode, then split them apart.
Added a 'temp' column to the train data set with value 'train'.
Added a 'temp' column to the test data set with value 'test'.
This is so that I can split the df apart later using boolean indexing like this:
X = temp_df[temp_df['temp'] == 'train']
X2 = temp_df[temp_df['temp'] == 'test']
Vertically concat the two df's.
Verify the shape of the new combined df.
Change all columns to type 'category' except 'temp', which is object:
basin category
region category
lga category
extraction_type_class category
management category
quality_group category
quantity category
source category
waterpoint_type category
cluster category
temp object
Now I am simply trying to OneHotEncode like I did before. I choose only categorical columns:
cat_ix = temp_df.select_dtypes(include=['category']).columns
And I try to apply with:
ct = ColumnTransformer([('o', OneHotEncoder(), cat_ix)], remainder='passthrough')
temp_df = ct.fit_transform(temp_df)
It fails on the temp_df = ct.fit_transform(temp_df) line.
These identical steps worked perfectly before I added the temp column and concat'd the two df's.
The exact error:
Traceback (most recent call last):
File "C:\Users\Mark\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\compose\_column_transformer.py", line 778, in _hstack
converted_Xs = [
File "C:\Users\Mark\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\compose\_column_transformer.py", line 779, in <listcomp>
check_array(X, accept_sparse=True, force_all_finite=False)
File "C:\Users\Mark\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\utils\validation.py", line 738, in check_array
array = np.asarray(array, order=order, dtype=dtype)
File "C:\Users\Mark\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\generic.py", line 1993, in __array__
return np.asarray(self._values, dtype=dtype)
ValueError: could not convert string to float: 'train'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\Mark\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\compose\_column_transformer.py", line 783, in _hstack
raise ValueError(
ValueError: For a sparse output, all columns should be a numeric or convertible to a numeric.
Why is it complaining about 'train'? That is in the 'temp' column which is being excluded.
Note that the traceback doesn't reference OneHotEncoder, it's all the ColumnTransformer. You're trying to pass through the temp column, which gets tacked onto the one-hot-encoded sparse matrix in the method _hstack, and the second error message is the more relevant one. It cannot stack a string-type array onto a numeric sparse array (which leads to the first error message).
If the sparse matrix isn't too large, you can just force it to be dense by using sparse_threshold=0 in the ColumnTransformer or sparse=False in the OneHotEncoder. If it is too large for memory (or you'd prefer the sparse matrices), you could use a 0/1 indicator for the train/test split instead of the strings "train", "test".
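A minimal sketch of both workarounds, assuming temp_df and cat_ix are as defined in the question:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# Option 1: force a dense result so the string 'temp' column can be passed through
ct = ColumnTransformer([('o', OneHotEncoder(), cat_ix)],
                       remainder='passthrough', sparse_threshold=0)
dense_result = ct.fit_transform(temp_df)
# Option 2: keep the sparse output by making the passthrough column numeric first
# (the later boolean split then compares against 1 instead of 'train')
temp_df['temp'] = (temp_df['temp'] == 'train').astype(int)  # 1 = train, 0 = test
ct = ColumnTransformer([('o', OneHotEncoder(), cat_ix)], remainder='passthrough')
sparse_result = ct.fit_transform(temp_df)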

KeyError: "None of [Index(['23/01/2020' ......,\n dtype='object', length=9050)] are in the [columns]"

I am learning pandas and matplotlib on my own, using a public dataset via
this api link
I'm using Colab, and below is my code:
import datetime
import io
import json
import pandas as pd
import requests
import matplotlib.pyplot as plt
confirm_resp = requests.get('https://api.data.gov.hk/v2/filterq=%7B%22resource%22%3A%22http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Fenhanced_sur_covid_19_eng.csv%22%2 C%22section%22%3A1%2C%22format%22%3A%22json%22%7D').content
confirm_df = pd.read_json(io.StringIO(confirm_resp.decode('utf-8')))
confirm_df.columns = confirm_df.columns.str.replace(" ", "_")
pd.to_datetime(confirm_df['Report_date'])
confirm_df.columns = ['Case_no', 'Report_date', 'Onset_date', 'Gender', 'Age',
'Name_of_hospital_admitted', 'Status', 'Resident', 'Case_classification', 'Confirmed_probable']
confirm_df = confirm_df.drop('Name_of_hospital_admitted', axis = 1)
confirm_df.head()
and this is what the dataframe looks like:
Case_no  Report_date  Onset_date  Gender  Age  Status      Resident         Case_classification  Confirmed_probable
1        23/01/2020   21/01/2020  M       39   Discharged  Non-HK resident  Imported case        Confirmed
2        23/01/2020   18/01/2020  M       56   Discharged  HK resident      Imported case        Confirmed
3        24/01/2020   20/01/2020  F       62   Discharged  Non-HK resident  Imported case        Confirmed
4        24/01/2020   23/01/2020  F       62   Discharged  Non-HK resident  Imported case        Confirmed
5        24/01/2020   23/01/2020  M       63   Discharged  Non-HK resident  Imported case        Confirmed
When I try to make a simple plot with the below code:
x = confirm_df['Report_date']
y = confirm_df['Case_classification']
confirm_df.plot(x, y)
It gives me the below error:
KeyError Traceback (most recent call last)
<ipython-input-17-e4139a9b5ef1> in <module>()
4 y = confirm_df['Case_classification']
5
----> 6 confirm_df.plot(x, y)
3 frames
/usr/local/lib/python3.6/dist-packages/pandas/plotting/_core.py in __call__(self, *args, **kwargs)
912 if is_integer(x) and not data.columns.holds_integer():
913 x = data_cols[x]
--> 914 elif not isinstance(data[x], ABCSeries):
915 raise ValueError("x must be a label or position")
916 data = data.set_index(x)
/usr/local/lib/python3.6/dist-packages/pandas/core/frame.py in __getitem__(self, key)
2910 if is_iterator(key):
2911 key = list(key)
-> 2912 indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
2913
2914 # take() does not accept boolean indexers
/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
1252 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
1253
-> 1254 self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
1255 return keyarr, indexer
1256
/usr/local/lib/python3.6/dist-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1296 if missing == len(indexer):
1297 axis_name = self.obj._get_axis_name(axis)
-> 1298 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
1299
1300 # We (temporarily) allow for some missing keys with .loc, except in
KeyError: "*None of [Index(['23/01/2020', '23/01/2020', '24/01/2020', '24/01/2020', '24/01/2020',\n '26/01/2020', '26/01/2020', '26/01/2020', '29/01/2020', '29/01/2020',\n ...\n '05/01/2021', '05/01/2021', '05/01/2021', '05/01/2021', '05/01/2021',\n '05/01/2021', '05/01/2021', '05/01/2021', '05/01/2021', '05/01/2021'],\n dtype='object', length=9050)] are in the [column*s]"
I have tried to make the plot with and without converting Report_date to a datetime object, and I tried substituting the x value with every column in the data frame, but all give me the same error.
I'd appreciate it if anyone could help me understand how to handle these issues here and going forward. I've spent hours trying to resolve it but cannot find the answer.
I did not encounter this issue before when I downloaded some notebooks and datasets from Kaggle to follow along.
Thank you and happy new year.
First, you need to assign the converted date back to the column:
confirm_df['Report_date'] = pd.to_datetime(confirm_df['Report_date'])
Second, when the plot method is called from a dataframe object, you need to provide only the column names as arguments (1).
confirm_df.plot(x='Report_date', y='Case_classification')
But the above code still throws an error, because 'Case_classification' is not numeric data.
You are trying to plot datetime vs. categorical data, so a normal plot won't work, but something like this could work (2):
# I used only first 15 examples here, full dataset is kinda messy
confirm_df.iloc[:15, :].groupby(['Report_date', 'Case_classification']).size().unstack().plot.bar()
(1)pandas.DataFrame.plot
(2)How to plot categorical variable against a date column in Python
Several problems here. First, the links were incorrect; I have edited them (probably just a copy/paste error). Second, you have to assign the converted datetime series back to the dataframe; use print(confirm_df.dtypes) to see the difference. Then, the dataset is not ordered by date, but matplotlib expects an ordered x-axis. Well, actually, the problem was that the parser misinterpreted the dates, so I have added dayfirst=True to ensure that they are read correctly. Finally, what do you want to plot here? Just the cases by date? The number of cases per group by date? Your original code implies the former, but this is not really informative, is it?
import io
import pandas as pd
import requests
import matplotlib.pyplot as plt
print("starting download")
confirm_resp = requests.get('https://api.data.gov.hk/v2/filter?q=%7B%22resource%22%3A%22http%3A%2F%2Fwww.chp.gov.hk%2Ffiles%2Fmisc%2Fenhanced_sur_covid_19_eng.csv%22%2C%22section%22%3A1%2C%22format%22%3A%22json%22%7D').content
print("finished download")
confirm_df = pd.read_json(io.StringIO(confirm_resp.decode('utf-8')))
confirm_df.columns = confirm_df.columns.str.replace(" ", "_")
confirm_df['Report_date'] = pd.to_datetime(confirm_df['Report_date'], dayfirst=True)
confirm_df.columns = ['Case_no', 'Report_date', 'Onset_date', 'Gender', 'Age',
'Name_of_hospital_admitted', 'Status', 'Resident', 'Case_classification', 'Confirmed_probable']
confirm_df = confirm_df.drop('Name_of_hospital_admitted', axis = 1)
print(confirm_df.dtypes)
fig, ax = plt.subplots(figsize=(20, 5))
ax.plot(confirm_df['Report_date'], confirm_df['Case_classification'])
plt.tight_layout()
plt.show()
Sample output: (line plot of Report_date against Case_classification, not reproduced here)
Some grouping and data aggregation might be more informative, but you have to decide what you want to display first before writing the code.
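As one concrete example of such aggregation, building on the variables above, a sketch that counts the cases reported per day and plots the daily totals:
daily_counts = confirm_df.groupby('Report_date').size()  # number of cases per report date
fig, ax = plt.subplots(figsize=(20, 5))
ax.plot(daily_counts.index, daily_counts.values)
ax.set_ylabel('Cases reported')
plt.tight_layout()
plt.show()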

ImageDataBunch.from_df positional indexers are out-of-bounds

Scratching my head on this issue. I don't know how to identify the positional indexers. Am I even passing them?
I'm attempting this for my first Kaggle comp; I can pass the CSV into a dataframe and make the needed edits. I'm trying to create the ImageDataBunch so training a CNN can begin. This error pops up no matter which method is tried. Any advice would be appreciated.
data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)
data.classes
Backtrace
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-25-5588812820e8> in <module>
----> 1 data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)
2 data.classes
/opt/conda/lib/python3.7/site-packages/fastai/vision/data.py in from_df(cls, path, df, folder, label_delim, valid_pct, seed, fn_col, label_col, suffix, **kwargs)
117 src = (ImageList.from_df(df, path=path, folder=folder, suffix=suffix, cols=fn_col)
118 .split_by_rand_pct(valid_pct, seed)
--> 119 .label_from_df(label_delim=label_delim, cols=label_col))
120 return cls.create_from_ll(src, **kwargs)
121
/opt/conda/lib/python3.7/site-packages/fastai/data_block.py in _inner(*args, **kwargs)
477 assert isinstance(fv, Callable)
478 def _inner(*args, **kwargs):
--> 479 self.train = ft(*args, from_item_lists=True, **kwargs)
480 assert isinstance(self.train, LabelList)
481 kwargs['label_cls'] = self.train.y.__class__
/opt/conda/lib/python3.7/site-packages/fastai/data_block.py in label_from_df(self, cols, label_cls, **kwargs)
283 def label_from_df(self, cols:IntsOrStrs=1, label_cls:Callable=None, **kwargs):
284 "Label `self.items` from the values in `cols` in `self.inner_df`."
--> 285 labels = self.inner_df.iloc[:,df_names_to_idx(cols, self.inner_df)]
286 assert labels.isna().sum().sum() == 0, f"You have NaN values in column(s) {cols} of your dataframe, please fix it."
287 if is_listy(cols) and len(cols) > 1 and (label_cls is None or label_cls == MultiCategoryList):
/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
1760 except (KeyError, IndexError, AttributeError):
1761 pass
-> 1762 return self._getitem_tuple(key)
1763 else:
1764 # we by definition only have the 0th axis
/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
2065 def _getitem_tuple(self, tup: Tuple):
2066
-> 2067 self._has_valid_tuple(tup)
2068 try:
2069 return self._getitem_lowerdim(tup)
/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
701 raise IndexingError("Too many indexers")
702 try:
--> 703 self._validate_key(k, i)
704 except ValueError:
705 raise ValueError(
/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _validate_key(self, key, axis)
2007 # check that the key does not exceed the maximum size of the index
2008 if len(arr) and (arr.max() >= len_axis or arr.min() < -len_axis):
-> 2009 raise IndexError("positional indexers are out-of-bounds")
2010 else:
2011 raise ValueError(f"Can only index by location with a [{self._valid_types}]")
IndexError: positional indexers are out-of-bounds
I faced this error while creating a DataBunch when my dataframe/CSV did not have a class label explicitly defined.
I created a dummy column which stored 1s for all the rows in the dataframe, and that seemed to work. Also, please be sure to store your independent variable in the second column and the label (the dummy variable in this case) in the first column.
I believe this error happens if there's just one column in the pandas DataFrame.
Thanks.
Code:
df = pd.DataFrame(lines, columns=["dummy_value", "text"])
df.to_csv("./train.csv")
data_lm = TextLMDataBunch.from_csv(path, "train.csv", min_freq=1)
Note: This is my first attempt at answering a StackOverflow question. Hope it helped!
This error also appears when your dataset is not correctly split between test and validation.
In the case of dataframes, it assumes there is a column is_valid that indicates which rows are in validation set.
If all rows have True, then the training set is empty, so fastai cannot index into it to prepare the first example, thus raising this error.
Example:
import numpy as np
import pandas as pd
from fastai.vision.all import *

# every row is marked as validation, so the training set comes out empty
data = pd.DataFrame({
    'fname': [f'{x}.png' for x in range(10)],
    'label': np.arange(10) % 2,
    'is_valid': True
})
blk = DataBlock((ImageBlock, CategoryBlock),
                splitter=ColSplitter(),
                get_x=ColReader('fname'),
                get_y=ColReader('label'),
                item_tfms=Resize(224, method=ResizeMethod.Squish),
                )
blk.summary(data)
Results in the error.
Solution
The solution is to check that your data can be split correctly into train and valid sets. In the above example, it suffices to have one row that is not in validation set:
data.loc[0, 'is_valid'] = False
How to figure it out?
Work in a Jupyter notebook. After the error, type %debug in a cell to enter post-mortem debugging. Go to the frame of the setup function (fastai/data/core.py(273) setup()) by going up 5 frames.
This takes you to the line that is throwing the error.
You can then print(self.splits) and observe that the first one is empty.

How can I plot a categorical vs categorical plot?

I want to compare the count of categories in the first column with the count of categories in the second column. I have two columns:
1. Max_glu_serum, with categories: None, Norm, <200, <300.
2. Readmitted, with categories: No, <30, >30.
I want a plot so that I can check the count of '<300' against '>30', i.e., how many patients had max_glu_serum = '<300' and were readmitted in '>30' days.
I tried the following code:
sns.catplot(y=train_data_wmis['max_glu_serum'],
hue=train_data_wmis['readmitted'],
kind="count",
palette="pastel", edgecolor=".6", dropna=True)
but it throws the following error:
TypeError Traceback (most recent call last)
<ipython-input-384-1be2c9032203> in <module>
----> 1 sns.catplot(y=train_data_wmis['max_glu_serum'], hue=train_data_wmis['readmitted'], kind="count", palette="pastel", edgecolor=".6", dropna=True)
F:\Anaconda3\lib\site-packages\seaborn\categorical.py in catplot(x, y, hue, data, row, col, col_wrap, estimator, ci, n_boot, units, order, hue_order, row_order, col_order, kind, height, aspect, orient, color, palette, legend, legend_out, sharex, sharey, margin_titles, facet_kws, **kwargs)
3750
3751 # Initialize the facets
-> 3752 g = FacetGrid(**facet_kws)
3753
3754 # Draw the plot onto the facets
F:\Anaconda3\lib\site-packages\seaborn\axisgrid.py in __init__(self, data, row, col, hue, col_wrap, sharex, sharey, height, aspect, palette, row_order, col_order, hue_order, hue_kws, dropna, legend_out, despine, margin_titles, xlim, ylim, subplot_kws, gridspec_kws, size)
255 # Make a boolean mask that is True anywhere there is an NA
256 # value in one of the faceting variables, but only if dropna is True
--> 257 none_na = np.zeros(len(data), np.bool)
258 if dropna:
259 row_na = none_na if row is None else data[row].isnull()
TypeError: object of type 'NoneType' has no len()
Can someone help me, please!
I tried a couple of things and finally found a solution to the above problem. I defined the following function:
def plot_stack(column_1, column_2):
    # cross-tabulate the two categorical columns, then draw a stacked bar chart of the counts
    plot_stck = pd.crosstab(index=column_1, columns=column_2)
    plot_stck.plot(kind='bar', figsize=(8, 8), stacked=True)
    return
Then,
plot_stack(train_data_wmis['max_glu_serum'], train_data_wmis['readmitted'])
Output:
Stacked Plot of 'max_glu_serum' and 'readmitted'
Please comment if a better solution is available via Seaborn. Thanks.
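For what it's worth, the original catplot call can likely be fixed directly: seaborn's catplot expects a DataFrame via data= plus column names as strings, not raw Series, which is what led to the 'NoneType' error above. A hedged sketch under that assumption:
import seaborn as sns
# pass the dataframe and refer to columns by name instead of passing Series objects
sns.catplot(data=train_data_wmis, y='max_glu_serum', hue='readmitted',
            kind='count', palette='pastel', edgecolor='.6')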

User Warning: The following kwargs were not used by contour: 'label', 'color'

I'm trying to create a comparison plot using Seaborn's PairGrid function on my dataset. My data set has 6 columns that I am trying to plot using the scatter() function in the .map_upper segment of the PairGrid I'm applying to the entire dataframe. Here is a quick peek at my dataframe object; the 'year' column is set as the dataframe's index.
Here are the data types of my dataframe comp_pct_chg_df:
year object
Amsterdam float64
Barcelona float64
Kingston float64
Milan float64
Philadelphia float64
Global float64
dtype: object
Here is my erroneous code below:
# Creating a comparison plot (.PairGrid()) of all my cities' and global data's average percent change in temperature
# Set up my figure by naming it 'pct_chg_yrly_fig', then call PairGrid on the DataFrame
pct_chg_yrly_fig = sns.PairGrid(comp_pct_chg_df.dropna())
# Using map_upper we can specify what the upper triangle will look like.
pct_chg_yrly_fig.map_upper(plt.scatter,color='purple')
# We can also define the lower triangle in the figure, including the plot type (KDE) or the color map (BluePurple)
pct_chg_yrly_fig.map_lower(sns.kdeplot,cmap='cool_d')
# Finally we'll define the diagonal as a series of histogram plots of the yearly average percent change in temperature
pct_chg_yrly_fig.map_diag(plt.hist,histtype='step',linewidth=3,bins=30)
# Adding a legend
pct_chg_yrly_fig.add_legend()
Some of the visualizations do plot out, like the .map_lower() function I used, which turned out great. However, I'd like to plot each city in a different color in the scatter plot used in .map_upper(). Right now it's monochromatic, and it is hard to tell which data points belong to which city. And lastly, my .map_diag() doesn't plot at all. I don't know what I'm doing wrong. I've assessed the ValueError msg I received (which is below) and tried manipulating dozens of **kwargs, label and color specifically, to no avail. Help would be greatly appreciated.
Here is the ValueError msg I'm receiving:
ValueError Traceback (most recent call last)
<ipython-input-38-3fcf1b69d4ef> in <module>()
11
12 # Finally we'll define the diagonal as a series of histogram plots of the yearly average percent change in temperature
---> 13 pct_chg_yrly_fig.map_diag(plt.hist,histtype='step',linewidth=3,bins=30)
14
15 # Adding a legend
~/anaconda3/lib/python3.6/site-packages/seaborn/axisgrid.py in map_diag(self, func, **kwargs)
1361
1362 if "histtype" in kwargs:
-> 1363 func(vals, color=color, **kwargs)
1364 else:
1365 func(vals, color=color, histtype="barstacked", **kwargs)
~/anaconda3/lib/python3.6/site-packages/matplotlib/pyplot.py in hist(x, bins, range, density, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, normed, hold, data, **kwargs)
3023 histtype=histtype, align=align, orientation=orientation,
3024 rwidth=rwidth, log=log, color=color, label=label,
-> 3025 stacked=stacked, normed=normed, data=data, **kwargs)
3026 finally:
3027 ax._hold = washold
~/anaconda3/lib/python3.6/site-packages/matplotlib/__init__.py in inner(ax, *args, **kwargs)
1715 warnings.warn(msg % (label_namer, func.__name__),
1716 RuntimeWarning, stacklevel=2)
-> 1717 return func(ax, *args, **kwargs)
1718 pre_doc = inner.__doc__
1719 if pre_doc is None:
~/anaconda3/lib/python3.6/site-packages/matplotlib/axes/_axes.py in hist(***failed resolving arguments***)
6137 color = mcolors.to_rgba_array(color)
6138 if len(color) != nx:
-> 6139 raise ValueError("color kwarg must have one color per dataset")
6140
6141 # If bins are not specified either explicitly or via range,
ValueError: color kwarg must have one color per dataset
I also noticed that my index, the year object, is plotting out in the upper left corner of my PairGrid. It looks like a bunch of vertical lines plotted next to one another. I'm not sure why it's plotting, but could it be because the values (years 1743 to 2015) end in '.0'? I noticed this when I put the data frame together (and I don't know how to drop it... Python newb here), so I changed the year column's data type from float64 to string and set it as my index. I thought doing this would make my index 'unworkable', meaning that even though the values are numbers, the data type is set to string so no calculations could be done on them. Am I missing something here?
