I am new to Python. I am using Anaconda and trying to write some Python code in it. I have written two lines of code in which I am trying to create a Series from a dictionary.
Hi @ApurvG, there is no function called Series in native Python.
If your question is about pandas series you can do it like this:
import pandas as pd
dictionary={'apurb':400}
series = pd.Series(dictionary)
In Jupyter:
salary = {'John': 5000, 'Rob': 6000, 'Wills':7500, 'Ashu': 5500}
salary
se3 = Series(salary)
NameError Traceback (most recent call last)
C:\Users\ADMINI~1\AppData\Local\Temp/ipykernel_13716/1803553183.py in
----> 1 se3 = Series(salary)
NameError: name 'Series' is not defined
import pandas as pd
se4 = pd.Series(salary)
se4
John 5000
Rob 6000
Wills 7500
Ashu 5500
dtype: int64
I am trying to process my CSV with nlargest and I've run into this error. Any idea why it happens? I'm trying to get my head around it, but it just doesn't seem to go away.
import pandas as pd
from matplotlib import pyplot
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
from pandas import read_csv
from pandas.plotting import scatter_matrix
filename = '/Users/rahulparmeshwar/Documents/Algo Bots/Data/Live Data/Tester.csv'
data = pd.read_csv(filename)
columnname = 'Scores'
bestfeatures = SelectKBest(k='all')
y = data['Vol']
X = data.drop('Open',axis=1)
fit = bestfeatures.fit(X,y)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X.columns)
featurescores = pd.concat([dfscores,dfcolumns],axis=1)
print(featurescores.nlargest(5,[columnname]))
It gives me the error KeyError: 'Scores' ("the above exception was the direct cause of the following exception") on the last line, print(featurescores.nlargest(5,[columnname])). Can someone explain to me why this is happening? I've looked around and can't seem to figure it out.
EDIT: Full Error Stack:
Exception has occurred: KeyError 'Scores'
The above exception was the direct cause of the following exception:
File "C:\Users\mattr\OneDrive\Documents\Python AI\AI.py", line 19, in <module> print(featurescores.nlargest(2,'Scores'))
The exception KeyError means that the concatenated dataframe featurescores does not have a column with name "Scores".
The problem is that the DataFrames dfscores and dfcolumns were created without explicit column names, so each of them gets the "default" column name 0.
That is, after the concatenation you get a DataFrame (featurescores) similar to this:
0 0
0 xxx col1_name
1 xxx col2_name
2 xxx col3_name
...
If you want to refer to the columns by name, you should define the column names explicitly as follows:
>>> dfscores = pd.DataFrame(fit.scores_, columns=["Scores"])
>>> dfcolumns = pd.DataFrame(X.columns, columns=["Features"])
>>> featurescores = pd.concat([dfscores,dfcolumns], axis=1)
>>> print(featurescores.nlargest(5, "Scores"))
Scores Features
0 xxx col_name1
1 xxx col_name2
2 xxx col_name3
...
If you want to use the features as the index, here is a one-liner:
>>> featurescores = pd.DataFrame(data=fit.scores_.transpose(), index=X.columns.transpose(), columns=["Scores"])
>>> print(featurescores)
Scores
col_name1 xxx
col_name2 xxx
col_name3 xxx
...
The error is
Traceback (most recent call last):
File "1.py", line 28, in <module>
buy.append(np.nan)
AttributeError: 'numpy.ndarray' object has no attribute 'nan'
Here is the code (Python 3):
import fxcmpy
import socketio
from pylab import plt
import numpy as np
from finta import TA
TOKEN='xxxx'
con = fxcmpy.fxcmpy(access_token=TOKEN, log_level='error', server='real', log_file='log.txt')
#print(con.get_instruments())
data = con.get_candles('US30', period='D1', number=250)
con.close()
df1=data[['askopen','askhigh', 'asklow', 'askclose']]
plt.style.use('seaborn')
np=df1.to_numpy()
df2=df1.rename(columns={'askopen':'open','askhigh':'high','asklow':'low','askclose':'close'})
dfhma=TA.HMA(df2,14)
pr1=dfhma.shift(1)
pr2=dfhma.shift(2)
buy=[]
sell=[]
i=0
flag=''
for item in dfhma:
    if item > pr1[i] and item > pr2[i] and flag!=1:
        flag=1
        buy.append(item)
    else:
        buy.append(np.nan)
    if item < pr1[i] and item < pr2[i] and flag!=0:
        flag=0
        sell.append(item)
    else:
        sell.append(np.nan)
    i=i+1
print(buy)
print('buy len='+str(len(buy)))
mk=[]
for item in dfhma:
    print(item)
plt.plot(dfhma)
plt.scatter(dfhma.index,buy,marker='^',color='g')
plt.scatter(dfhma.index,sell,marker='v',color='r')
plt.show()
I searched Google/Stack Overflow and found nothing, and changing nan to NaN or NAN still gives the same error; I'm guessing it's a newbie mistake. Help! I'm just trying to append NaN to the list as a buy/sell signal and it doesn't work. What could be wrong here?
On line 14, np=df1.to_numpy() reassigned the name np from the NumPy package to a NumPy array. So when you called np.nan, Python looked for nan on the ndarray instance, not on the package.
Rename the variable to anything else and it will work fine.
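The shadowing can be reproduced with a minimal sketch (no fxcmpy or broker data needed; the array here is made up):

```python
import numpy as np

print(np.nan)  # fine: the name "np" still refers to the numpy module

np = np.array([1.0, 2.0, 3.0])  # same mistake: "np" now names an ndarray

try:
    np.nan  # attribute lookup happens on the array, not the module
except AttributeError as e:
    print(e)  # 'numpy.ndarray' object has no attribute 'nan'
```

Any later line in the script that expects np to be the module will fail the same way, which is why renaming the array variable fixes every occurrence at once.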
Python 3.7
The task: add a new column to the resulting data frame based on two conditions:
if the value in the NET_NAME column is equal to one of a list and the value in the ECELL_TYPE column is LTE, then assign the value from the ENODEB_NAME column to the SHARING column.
import csv
import os
import pandas as pd
import datetime
import numpy as np
from time import gmtime, strftime
WCOUNT=strftime("%V", gmtime())
WCOUNT = int(WCOUNT)
WCOUNT_last = int(WCOUNT)-1
os.environ['NLS_LANG'] = 'Russian.AL32UTF8'
cell_file_list=pd.read_excel('cdt_config.xlsx',sheet_name ='cdt_config',index_col='para_name')
filial_name_list=pd.read_excel('FILIAL_NAME.xlsx')
gcell_file_name1=cell_file_list.para_value.loc['ucell_file_name']
ecell_file_name=cell_file_list.para_value.loc['ecell_file_name']
cols_simple=['RECDATE','REGION_PHOENIX_NAME','NET_NAME','CELL_NAME_IN_BSC','ENODEB_NAME','ECELL_TYPE','NRI_ADDRESS', 'NRI_BS_NUMBER','NRI_SITEID','STOPTIME', ]
cols_export=['GSM', 'UMTS', 'LTE', 'TOTAL', 'NWEEK', 'SHARING' ]
ecell_df=df = pd.read_csv(ecell_file_name, sep=",",encoding='cp1251',
dtype={'NRI_SITEID': str})
ecell_df=ecell_df.rename(columns={"RECDATE.DATE": "RECDATE"})
ecell_df=ecell_df.rename(columns={"ECELL_MNEMONIC": "CELL_NAME_IN_BSC"})
#replace ","
ecell_df.STOPTIME=pd.to_numeric(ecell_df.STOPTIME.replace(',', '', regex=True), errors='coerce')/3600
ecell_df=ecell_df[cols_simple]
#pivot ecell table
ecell_sum_df=pd.pivot_table(ecell_df,values='STOPTIME',index=['RECDATE','NRI_SITEID','REGION_PHOENIX_NAME','NET_NAME','ENODEB_NAME','ECELL_TYPE'],aggfunc='sum')
ecell_sum_df=ecell_sum_df.fillna(0)
#create a empty column with the same index as the pivot table.
ecell_export_df= pd.DataFrame(index=ecell_sum_df.index.copy())
ecell_export_df=ecell_export_df.assign(LTE=0)
ecell_export_df.LTE=ecell_sum_df.STOPTIME
ecell_export_df['SHARING'] = 0
ecell_export_df.SHARING.replace(ecell_export_df.NET_NAME in filial_name_list, ENODEB_NAME,inplace=True)
print(ecell_export_df)
#print (ecell_export_df)
del ecell_df
del ecell_sum_df
export_df=pd.concat([ecell_export_df],join='outer',axis=1)
export_df=export_df.fillna(0)
export_df['TOTAL'] = export_df.sum(axis=1)
export_df['NWEEK'] = WCOUNT_last
del ecell_export_df
#################################################
Below is the error message:
Traceback (most recent call last):
File "C:/Users/PycharmProjects/ReportCDT/CDT 4G_power pivot.py", line 43, in <module>
ecell_export_df.SHARING.replace(ecell_sum_df.NET_NAME in filial_name_list, ENODEB_NAME,inplace=True)
File "C:\Users\vavrumyantsev\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 5067, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'NET_NAME'
Your traceback contains: DataFrame object has no attribute NET_NAME,
meaning actually that this DataFrame has no column of this name.
This message pertains to ecell_sum_df.NET_NAME (also contained in
the traceback), so let's look how you created this DataFrame (slightly
reformatted for readability):
ecell_sum_df=pd.pivot_table(ecell_df, values='STOPTIME',\
index=['RECDATE', 'NRI_SITEID', 'REGION_PHOENIX_NAME', 'NET_NAME',
'ENODEB_NAME', 'ECELL_TYPE'], aggfunc='sum')
Note that NET_NAME is a part of the index list, so in the DataFrame
created it is a part of the MultiIndex, not an "ordinary" column.
So Python is right displaying this message.
Maybe you should move this level of the MultiIndex to a "normal" column?
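To illustrate (a minimal sketch with made-up network data, not the asker's file): after pivot_table, NET_NAME lives in the MultiIndex, and reset_index moves it back to an ordinary column where attribute access works again:

```python
import pandas as pd

df = pd.DataFrame({
    'NET_NAME':   ['net_a', 'net_b'],
    'ECELL_TYPE': ['LTE', 'GSM'],
    'STOPTIME':   [10.0, 20.0],
})
piv = pd.pivot_table(df, values='STOPTIME',
                     index=['NET_NAME', 'ECELL_TYPE'], aggfunc='sum')

print('NET_NAME' in piv.columns)   # False: it is an index level now

flat = piv.reset_index()           # move index levels back to columns
print('NET_NAME' in flat.columns)  # True: flat.NET_NAME works again
```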
I have a column of dates called 'Activity_Period' in the format '200507', which means July 2005, and I want to convert it to a datetime format ('%Y-%m') in Python.
I tried to use datetime.strptime; however, it shows that the input has to be a string, not a Series.
df.Activity_Period=datetime.strptime(df.Activity_Period, '%Y-%m')
The following is the error I get
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-40-ac32eb324a0b> in <module>
----> 1 df.Activity_Period=datetime.strptime(df.Activity_Period, '%Y-%m')
TypeError: strptime() argument 1 must be str, not Series
import datetime as dt
import pandas as pd
#simple example
timestamp = '200507'
result = dt.datetime.strptime(timestamp, '%Y%m')
print(result)
#Example using pandas series
series = pd.Series(['200507', '200508', '200509', '200510'])
series = pd.to_datetime(series, format='%Y%m')
print(series)
#for your DF
df['Activity_Period'] = pd.to_datetime(df['Activity_Period'], format='%Y%m')
Good morning,
I'm using Python 3.6. I'm trying to name my index (see the last line in the code below) because I plan on joining to another DataFrame. The DataFrame should be multi-indexed: the index is the first two columns ('currency' and 'rtdate'), and the data looks like this:
rate
AUD 2010-01-01 0.897274
2010-02-01 0.896608
2010-03-01 0.895943
2010-04-01 0.895277
2010-05-01 0.894612
This is the code that I'm running:
import pandas as pd
import numpy as np
import datetime as dt
df=pd.read_csv('file.csv',index_col=0)
df.index = pd.to_datetime(df.index)
new_index = pd.date_range(df.index.min(),df.index.max(),freq='MS')
df=df.reindex(new_index)
df=df.interpolate().unstack()
rate = pd.DataFrame(df)
rate.columns = ['rate']
rate.set_index(['currency','rtdate'],drop=False)
Running this throws an error message:
KeyError: 'currency'
What am I missing?
Thanks for the assistance.
I think you need to set the names of the levels of the MultiIndex by using rename_axis first, and then reset_index to turn the MultiIndex levels into columns:
So you'd end up with this:
rate = df.interpolate().unstack().rename_axis(('currency','rtdate')).reset_index(name='rate')
instead of this:
df=df.interpolate().unstack()
rate = pd.DataFrame(df)
rate.columns = ['rate']
rate.set_index(['currency','rtdate'],drop=False)
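As a toy illustration (made-up AUD rates, not the asker's CSV): unstacking the frame yields a Series with an unnamed two-level MultiIndex; rename_axis names the levels so reset_index can turn them into the currency/rtdate columns:

```python
import pandas as pd

dates = pd.to_datetime(['2010-01-01', '2010-02-01'])
df = pd.DataFrame({'AUD': [0.897274, 0.896608]}, index=dates)

rate = (df.unstack()                          # Series, 2-level MultiIndex
          .rename_axis(['currency', 'rtdate'])  # name the index levels
          .reset_index(name='rate'))            # levels -> columns
print(rate.columns.tolist())  # ['currency', 'rtdate', 'rate']
```

Without rename_axis the levels have no names, which is exactly why set_index(['currency','rtdate']) raised KeyError: there was nothing called 'currency' to look up.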