I am new to Python.
I am trying to load the columns of a CSV file into a Python script and then display a chart, but I keep getting errors.
I have a CSV file with 2 columns.
All I'm trying to do is read the columns and plot them on a graph. I originally used a DataFrame, and after multiple failed attempts I am asking here.
Code
import matplotlib.pyplot as plt
import csv
import pandas as pd
with open('religion.csv') as file:
    reader = csv.reader(file)
    count = 0
    for row in reader:
        print(row)
        if count > 5:
            break
        count += 1
# use the scatter function
#plt.scatter(x, y, alpha=0.5)
x = reader['religions']
y = reader['students']
plt.scatter(x, y, alpha=0.5)
plt.show()
Sample data
religions schuler
Romisch-Katholisch 371
Moslem 298
Ohne Bekenntnis 182
Serbisch-Orthodox 120
Evangelisch A.B. 26
Rumnisch-Orthodox 15
Sonstige Religion 9
Updated code (Still not working)
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_fwf('religion.csv')
df.columns.tolist()
x = df['religions']
y = df['schuler']
df.columns.tolist()
plt.scatter(x, y, alpha=0.5)
plt.show()
Current error
KeyError
Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3077 try:
-> 3078 return self._engine.get_loc(key)
3079 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'religions'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-6-f2e811496fb9> in <module>()
----> 1 x = df['religions']
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2686 return self._getitem_multilevel(key)
2687 else:
-> 2688 return self._getitem_column(key)
2689
2690 def _getitem_column(self, key):
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
2693 # get column
2694 if self.columns.is_unique:
-> 2695 return self._get_item_cache(key)
2696
2697 # duplicate columns & possible reduce dimensionality
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
2487 res = cache.get(item)
2488 if res is None:
-> 2489 values = self._data.get(item)
2490 res = self._box_item_values(item, values)
2491 cache[item] = res
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
4113
4114 if not isna(item):
-> 4115 loc = self.items.get_loc(item)
4116 else:
4117 indexer = np.arange(len(self.items))[isna(self.items)]
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3078 return self._engine.get_loc(key)
3079 except KeyError:
-> 3080 return self._engine.get_loc(self._maybe_cast_indexer(key))
3081
3082 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'religions'
1. When reading the CSV file, you need to specify sep=';'.
df = pd.read_csv("C:/Test/rel.csv", sep=';')
df
Out[417]:
religions schuler
0 Romisch-Katholisch 371
1 Moslem 298
2 Ohne Bekenntnis 182
3 Serbisch-Orthodox 120
4 Evangelisch A.B. 26
5 Rumnisch-Orthodox 15
6 Sonstige Religion 9
2. You can plot it with DataFrame.plot, the plotting method built into pandas.
It uses matplotlib in the background, and you can specify the x and y columns. (I have used a 'bar' plot, but you can use any other kind that DataFrame.plot supports):
df.plot(x='religions', y= 'schuler', kind='bar')
Out[418]: <matplotlib.axes._subplots.AxesSubplot at 0xae7e518>
[Plot image]
Image link: https://i.stack.imgur.com/8u0xs.png
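To see why sep=';' matters, here is a minimal sketch. It assumes, as the answer above does, that religion.csv is semicolon-delimited. The inline text and io.StringIO are stand-ins for the real file, and only two of the sample rows are included.

```python
import io

import pandas as pd

# Stand-in for religion.csv, assuming it is semicolon-delimited.
raw = "religions;schuler\nMoslem;298\nOhne Bekenntnis;182\n"

# Default separator is a comma, so each line is parsed as one field.
default_sep = pd.read_csv(io.StringIO(raw))

# With sep=';' the two columns are split as intended.
semicolon = pd.read_csv(io.StringIO(raw), sep=";")

print(list(default_sep.columns))
print(list(semicolon.columns))
```

With the default separator the whole header becomes a single column name, so a lookup such as df['religions'] has nothing to match and raises a KeyError.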
Using pandas and matplotlib is fine. Try importing the CSV file like this:
df = pd.read_csv("religion.csv")
If your CSV has no header row with column names, pass them as a list to the names argument. Also, if you don't want the first column to be used as the index, set the index_col parameter to False. See the read_csv documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
Then plot your data with pyplot:
plt.scatter(df['religions'], df['schuler'])
plt.show()
Related
I could be missing something here but I believe that there is something odd going on with pandas datetime slicing. Here is a reproducible example:
import pandas as pd
import pandas_datareader as pdr
testdf = pdr.DataReader('SPY', 'yahoo')
testdf.index = pd.to_datetime(testdf.index)
testdf['2020-11']
Here we can see that slicing to find the month's data returns the expected output.
However, now let's try to find the row corresponding to Nov 9 2020.
testdf['2020-11-09']
And we get the following traceback.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2894 try:
-> 2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: '2020-11-09'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-78-a42a45b5c3a4> in <module>
----> 1 testdf['2020-11-09']
C:\Anaconda\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]
C:\Anaconda\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2895 return self._engine.get_loc(casted_key)
2896 except KeyError as err:
-> 2897 raise KeyError(key) from err
2898
2899 if tolerance is not None:
KeyError: '2020-11-09'
Here we can see that the key is in fact in the index:
testdf['2020-11'].index
DatetimeIndex(['2020-11-02', '2020-11-03', '2020-11-04', '2020-11-05',
'2020-11-06', '2020-11-09'],
dtype='datetime64[ns]', name='Date', freq=None)
Is this a bug or am I a bug?
testdf['2020-11-09'] indexes column-wise, i.e. it looks for '2020-11-09' among the columns. Did you mean:
testdf.loc['2020-11-09']
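A small self-contained sketch of the difference, using a made-up frame instead of the downloaded SPY data (the 'Close' values are invented for illustration):

```python
import pandas as pd

# Made-up prices on a DatetimeIndex; not real SPY data.
idx = pd.to_datetime(["2020-10-30", "2020-11-02", "2020-11-09"])
df = pd.DataFrame({"Close": [326.5, 330.2, 354.6]}, index=idx)

# Plain [] with a string looks the label up among the columns.
label_is_column = "2020-11-09" in df.columns

# .loc looks the label up in the index (the rows) instead.
day = df.loc["2020-11-09"]   # a single row, returned as a Series
month = df.loc["2020-11"]    # every row in November 2020

print(label_is_column)
print(day["Close"])
print(len(month))
```

Using .loc for both the single day and the whole month keeps the lookup on the row index in either case.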
I am a new user to folium, but I am trying to follow another user's code. For some reason, this isn't working and I just want to be sure it doesn't have to do with the fact that I'm using Jupyter notebook.
m = folium.Map(location=[0, 0], tiles='cartodbpositron',
               min_zoom=1, max_zoom=4, zoom_start=1)
for i in range(0, len(full_latest)):
    folium.Circle(
        location=[full_latest.iloc[i]['Lat'], full_latest.iloc[i]['Long']],
        color='crimson',
        tooltip = '<li><bold>Country : '+str(full_latest.iloc[i]['Country'])+
                  '<li><bold>Province : '+str(full_latest.iloc[i]['Province/State'])+
                  '<li><bold>Confirmed : '+str(full_latest.iloc[i]['Confirmed'])+
                  '<li><bold>Deaths : '+str(full_latest.iloc[i]['Deaths'])+
                  '<li><bold>Recovered : '+str(full_latest.iloc[i]['Recovered']),
        radius=int(full_latest.iloc[i]['Confirmed'])**1.1).add_to(m)
m
I am learning Data Science and checked the references to DataFrames in pandas, but don't see any errors in spelling. I've checked with my mentor, and still no luck.
Here is the Error Output:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
4735 try:
-> 4736 return libindex.get_value_box(s, key)
4737 except IndexError:
pandas\_libs\index.pyx in pandas._libs.index.get_value_box()
pandas\_libs\index.pyx in pandas._libs.index.get_value_at()
pandas\_libs\util.pxd in pandas._libs.util.get_value_at()
pandas\_libs\util.pxd in pandas._libs.util.validate_indexer()
TypeError: 'str' object cannot be interpreted as an integer
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-21-4272891a2f02> in <module>
6 for i in range(0, len(full_latest)):
7 folium.Circle(
----> 8 location=[full_latest.iloc[i]['Lat'], full_latest.iloc[i]['Long']],
9 color='crimson',
10 tooltip = '<li><bold>Country : '+str(full_latest.iloc[i]['Country'])+
~\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
1066 key = com.apply_if_callable(key, self)
1067 try:
-> 1068 result = self.index.get_value(self, key)
1069
1070 if not is_scalar(result):
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
4742 raise InvalidIndexError(key)
4743 else:
-> 4744 raise e1
4745 except Exception: # pragma: no cover
4746 raise e1
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
4728 k = self._convert_scalar_indexer(k, kind="getitem")
4729 try:
-> 4730 return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
4731 except KeyError as e1:
4732 if len(self) > 0 and (self.holds_integer() or self.is_boolean()):
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Lat'
The code references a 'Lat' column, which does not exist in the dataset I am using. Apologies for the question spam.
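One way to catch this kind of KeyError before the loop runs is to compare the column names the code expects with the ones the DataFrame actually has. The frame below is a made-up stand-in for full_latest, and its 'Latitude' and 'Longitude' names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for full_latest; the real column names may differ.
full_latest = pd.DataFrame({
    "Country": ["A"],
    "Latitude": [1.0],
    "Longitude": [2.0],
    "Confirmed": [10],
})

# Columns the plotting loop reads from each row.
required = ["Lat", "Long", "Country", "Confirmed"]

# Anything listed here would raise KeyError inside the loop.
missing = [name for name in required if name not in full_latest.columns]
print(missing)
```

If the list is not empty, either rename the columns or change the names used in the loop.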
Getting KeyError: 'text' while trying to read CSV data with BasicClassificationDatasetReader from deeppavlov
from deeppavlov import dataset_readers
dat = dataset_readers.basic_classification_reader.BasicClassificationDatasetReader()
l=dat.read("C:\Users\Anna\Desktop\NLP\test", url=None, format = 'csv', sep=',', header = 1)
TypeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
4380 try:
-> 4381 return libindex.get_value_box(s, key)
4382 except IndexError:
pandas/_libs/index.pyx in pandas._libs.index.get_value_box()
pandas/_libs/index.pyx in pandas._libs.index.get_value_at()
pandas/_libs/util.pxd in pandas._libs.util.get_value_at()
pandas/_libs/util.pxd in pandas._libs.util.validate_indexer()
TypeError: 'str' object cannot be interpreted as an integer
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
in
2
3 dat = dataset_readers.basic_classification_reader.BasicClassificationDatasetReader()
----> 4 l=dat.read("C:\Users\Anna\Desktop\NLP\test", url=None, format = 'csv', sep=',', header = 1, names = ['x','y'])
~\Anaconda3\lib\site-packages\deeppavlov\dataset_readers\basic_classification_reader.py in read(self, data_path, url, format, class_sep, *args, **kwargs)
100 if class_sep is None:
101 # each sample is a tuple ("text", "label")
--> 102 data[data_type] = [(row[x], str(row[y])) for _, row in df.iterrows()]
103 else:
104 # each sample is a tuple ("text", ["label", "label", ...])
~\Anaconda3\lib\site-packages\deeppavlov\dataset_readers\basic_classification_reader.py in (.0)
100 if class_sep is None:
101 # each sample is a tuple ("text", "label")
--> 102 data[data_type] = [(row[x], str(row[y])) for _, row in df.iterrows()]
103 else:
104 # each sample is a tuple ("text", ["label", "label", ...])
~\Anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
866 key = com.apply_if_callable(key, self)
867 try:
--> 868 result = self.index.get_value(self, key)
869
870 if not is_scalar(result):
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
4387 raise InvalidIndexError(key)
4388 else:
-> 4389 raise e1
4390 except Exception: # pragma: no cover
4391 raise e1
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
4373 try:
4374 return self._engine.get_value(s, k,
-> 4375 tz=getattr(series.dtype, 'tz', None))
4376 except KeyError as e1:
4377 if len(self) > 0 and (self.holds_integer() or self.is_boolean()):
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'text'
from deeppavlov import train_model, configs
I want the data to be fed in without errors. The data currently looks like this:
value label
1600 rows
There are undocumented initialization arguments x='text' and y='labels', which are the column headers for the x and y data. The error occurs because pandas could not find the text header in your data.
Remember also that you pass header=1 and row numbers start at 0, so the first line of your CSV file is skipped.
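The effect of header=1 can be checked with plain pandas. This is a sketch with invented rows, and it only shows how pandas.read_csv treats the header argument, not how the deeppavlov reader works internally.

```python
import io

import pandas as pd

# Invented sample with the same two headers as in the question.
csv_text = "value,label\nfirst text,0\nsecond text,1\n"

# header=0: the first line supplies the column names.
with_header0 = pd.read_csv(io.StringIO(csv_text), header=0)

# header=1: the first line is skipped and the second line becomes the header.
with_header1 = pd.read_csv(io.StringIO(csv_text), header=1)

print(list(with_header0.columns))
print(list(with_header1.columns))
```

With header=1 the real header line is dropped and the first data row is consumed as column names, so neither 'value' nor 'label' is available afterwards.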
I am trying to create a data generator using ImageDataGenerator.flow_from_dataframe, but I am getting KeyError: 'class'.
Before calling flow_from_dataframe, I created a pivot of the training dataframe in which the class labels are converted to columns:
train_df = train[['Label', 'filename', 'subtype']].drop_duplicates().pivot(index='filename', columns='subtype', values='Label').reset_index()
Below is the output of dataframe train_df.
subtype filename any epidural intraparenchymal intraventricular subarachnoid subdural
0 ID_000039fa0.dcm 0 0 0 0 0 0
1 ID_00005679d.dcm 0 0 0 0 0 0
2 ID_00008ce3c.dcm 0 0 0 0 0 0
3 ID_0000950d7.dcm 0 0 0 0 0 0
4 ID_0000aee4b.dcm 0 0 0 0 0 0
train_gen = datagen.flow_from_dataframe(train_df,
directory='/kaggle/input/rsna-intracranial-hemorrhage-detection/stage_1_train_images',
xcol='filename',
ycol=['any', 'epidural', 'intraparenchymal','intraventricular', 'subarachnoid', 'subdural'],
class_mode='categorical',
target_size=(300, 300),
batch_size=64,
subset='training')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2896 try:
-> 2897 return self._engine.get_loc(key)
2898 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'class'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-93-0b64db9da6bb> in <module>
6 target_size=(300, 300),
7 batch_size=64,
----> 8 subset='training')
/opt/conda/lib/python3.6/site-packages/keras_preprocessing/image/image_data_generator.py in flow_from_dataframe(self, dataframe, directory, x_col, y_col, weight_col, target_size, color_mode, classes, class_mode, batch_size, shuffle, seed, save_to_dir, save_prefix, save_format, subset, interpolation, validate_filenames, **kwargs)
681 subset=subset,
682 interpolation=interpolation,
--> 683 validate_filenames=validate_filenames
684 )
685
/opt/conda/lib/python3.6/site-packages/keras_preprocessing/image/dataframe_iterator.py in __init__(self, dataframe, directory, image_data_generator, x_col, y_col, weight_col, target_size, color_mode, classes, class_mode, batch_size, shuffle, seed, data_format, save_to_dir, save_prefix, save_format, subset, interpolation, dtype, validate_filenames)
127 self.dtype = dtype
128 # check that inputs match the required class_mode
--> 129 self._check_params(df, x_col, y_col, weight_col, classes)
130 if validate_filenames: # check which image files are valid and keep them
131 df = self._filter_valid_filepaths(df, x_col)
/opt/conda/lib/python3.6/site-packages/keras_preprocessing/image/dataframe_iterator.py in _check_params(self, df, x_col, y_col, weight_col, classes)
202 if self.class_mode == 'categorical':
203 types = (str, list, tuple)
--> 204 if not all(df[y_col].apply(lambda x: isinstance(x, types))):
205 raise TypeError('If class_mode="{}", y_col="{}" column '
206 'values must be type string, list or tuple.'
/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
2978 if self.columns.nlevels > 1:
2979 return self._getitem_multilevel(key)
-> 2980 indexer = self.columns.get_loc(key)
2981 if is_integer(indexer):
2982 indexer = [indexer]
/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897 return self._engine.get_loc(key)
2898 except KeyError:
-> 2899 return self._engine.get_loc(self._maybe_cast_indexer(key))
2900 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2901 if indexer.ndim > 1 or indexer.size > 1:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'class'
Could someone let me know how I can fix this issue?
Any help is appreciated.
Can you try this? It basically sets class_mode to "other":
columns=["any", "epidural", "intraparenchymal","intraventricular", "subarachnoid", "subdural"]
train_generator=datagen.flow_from_dataframe(
    dataframe=train_df,
    directory="/kaggle/input/rsna-intracranial-hemorrhage-detection/stage_1_train_images",
    x_col="filename",
    y_col=columns,
    class_mode="other",
    target_size=(300, 300),
    batch_size=64,
    subset="training")
Do not pivot the table. Just pass the Label field as y_col, and put the list of unique values in the classes parameter.
Set class_mode to categorical.
Also, the parameter names are x_col and y_col, respectively.
Keras automatically performs the one-hot encoding and does the rest.
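You have used xcol and ycol instead of x_col and y_col, which is causing this error. Because ycol is not a recognized parameter, y_col keeps its default value 'class', and train_df has no column with that name.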
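The mechanism behind the misspelled-keyword problem can be reproduced without Keras. In the sketch below, pick_labels is a hypothetical stand-in written for illustration (it is not the Keras function), and a plain dict plays the role of the dataframe.

```python
def pick_labels(table, x_col="filename", y_col="class", **kwargs):
    """Hypothetical stand-in, not the Keras API: look up y_col in a dict of columns."""
    return table[y_col]


# A plain dict standing in for the dataframe.
table = {"filename": ["a.dcm"], "any": [0]}

try:
    # Misspelled keyword: it lands in **kwargs, so y_col stays "class".
    pick_labels(table, ycol="any")
    error = None
except KeyError as exc:
    error = exc.args[0]

# Correct keyword: the lookup uses the column that exists.
labels = pick_labels(table, y_col="any")

print(error)
print(labels)
```

A function that accepts **kwargs will not complain about an unknown keyword, which is why the typo surfaces later as a KeyError on the default name rather than as a TypeError at the call site.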
I have the following data frame my_df:
col_A col_B
---------------
John []
Mary ['A','B','C']
Ann ['B','C']
I want to delete the rows where col_B has an empty list, i.e. I want the new data frame to be:
col_A col_B
---------------
Mary ['A','B','C']
Ann ['B','C']
Below is what I did:
my_df[ len(my_df['col_B']) >0 ]
But I got the following errors:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
2133 try:
-> 2134 return self._engine.get_loc(key)
2135 except KeyError:
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()
KeyError: True
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-27-75da0b0af6a1> in <module>()
----> 1 records_df_pair_count[ len(records_df_pair_count['stable_seq']) >0 ]
/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in __getitem__(self, key)
2057 return self._getitem_multilevel(key)
2058 else:
-> 2059 return self._getitem_column(key)
2060
2061 def _getitem_column(self, key):
/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in _getitem_column(self, key)
2064 # get column
2065 if self.columns.is_unique:
-> 2066 return self._get_item_cache(key)
2067
2068 # duplicate columns & possible reduce dimensionality
/usr/local/lib/python3.4/dist-packages/pandas/core/generic.py in _get_item_cache(self, item)
1384 res = cache.get(item)
1385 if res is None:
-> 1386 values = self._data.get(item)
1387 res = self._box_item_values(item, values)
1388 cache[item] = res
/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in get(self, item, fastpath)
3539
3540 if not isnull(item):
-> 3541 loc = self.items.get_loc(item)
3542 else:
3543 indexer = np.arange(len(self.items))[isnull(self.items)]
/usr/local/lib/python3.4/dist-packages/pandas/indexes/base.py in get_loc(self, key, method, tolerance)
2134 return self._engine.get_loc(key)
2135 except KeyError:
-> 2136 return self._engine.get_loc(self._maybe_cast_indexer(key))
2137
2138 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4164)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4028)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13166)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13120)()
KeyError: True
Any idea what I did wrong here? Thanks!
Another way to do this:
my_df[my_df['col_B'].apply(lambda x: len(x)) > 0]
You can use Series.str.len() method:
my_df[my_df['col_B'].str.len() > 0]
You already got a couple answers that correct the problem. But I thought I'd chime in with an explanation of why yours doesn't work.
This gives a pandas series:
my_df['col_B']
So this gives the length of the series:
len(my_df['col_B'])
Since you have a non-empty series, this evaluates to True:
len(my_df['col_B']) >0
And this:
my_df[ len(my_df['col_B']) >0 ]
evaluates to:
my_df[True]
And clearly my_df is not going to have True as a column index. Hence the KeyError.
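Putting the explanation and the fix side by side, here is a minimal sketch that rebuilds the sample frame from the question and contrasts the length of the whole series with the per-row lengths:

```python
import pandas as pd

# The sample frame from the question.
my_df = pd.DataFrame({
    "col_A": ["John", "Mary", "Ann"],
    "col_B": [[], ["A", "B", "C"], ["B", "C"]],
})

# len() on the Series counts rows; it says nothing about each list.
whole_length = len(my_df["col_B"])

# .str.len() returns one length per row, which gives a usable boolean mask.
per_row = my_df["col_B"].str.len()
filtered = my_df[per_row > 0]

print(whole_length)
print(per_row.tolist())
print(filtered["col_A"].tolist())
```

The mask per_row > 0 has one True/False entry per row, which is what boolean indexing needs, whereas len(my_df['col_B']) > 0 collapses to a single True.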