AttributeError when writing back from IPython to Excel with pandas

System:
Windows 7
Anaconda -> Spyder with Python 2.7.12
I got this AttributeError:
File "<ipython-input-4-d258b656588d>", line 1, in <module>
runfile('C:/xxx/.spyder/pandas excel.py', wdir='C:/xxx/.spyder')
File "C:\xxx\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)
File "C:\xxx\Anaconda2\lib\site-packages\spyder\utils\site\sitecustomize.py", line 87, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "C:/xxx/.spyder/pandas excel.py", line 33, in <module>
moving_avg.to_excel(writer, sheet_name='Methodentest', startcol=12, startrow=38)
File "C:\xxx\Anaconda2\lib\site-packages\pandas\core\generic.py", line 2672, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'to_excel'
This is my code:
import pandas as pd
#Adjustment of the data for the date - does not work?
#parsen = lambda x: pd.datetime.strptime(x, '%Y-%m')
#Open new file object
xl = pd.ExcelFile('C:\xxx\Desktop\Beisspieldatensatz.xlsx')
#parse_dates={'Zeit': ['Jahr', 'Monat']}, index_col = 0, date_parser=parsen)
#Link to specific sheet
df = xl.parse('Methodentest')
#Narrow the data input
df2 = df[['Jahr', 'Monat', 'Umsatzmenge']]
#Keep only values before the year 2015
df3 = df2[(df2['Jahr']<2015)]
#Compute the moving average over a history of 36 months, i.e. 36 rows
moving_avg = pd.rolling_mean(df3["Umsatzmenge"],36)
print (moving_avg.head())
#Create a pandas excel writer
writer = pd.ExcelWriter(r'C:\xxx\Desktop\Beisspieldatensatz.xlsx', engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
moving_avg.to_excel(writer, sheet_name='Methodentest', startcol=12, startrow=38)
# Close the Pandas Excel writer and output the Excel file.
writer.save()
I want to read a data set into IPython from Excel. In the next step I want to parse my data, but this is not working (that's why I commented that part out). After this I want to apply a mathematical method, here a moving average over the next 18 months, and store the result in moving_avg.
My data set starts monthly from 01.2012. Then the code must write the new figures back into Excel at a specific row and column -> here the error occurred.

I think you need to convert your Series back to a DataFrame before saving; try:
moving_avg.to_frame().to_excel(writer, sheet_name='Methodentest', startcol=12, startrow=38)
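As a hedged side note: pd.rolling_mean was deprecated in pandas 0.18 and later removed, so on a newer pandas the same step would look like the sketch below (column, sheet, and writer names taken from the question):
# Modern-pandas sketch: pd.rolling_mean() is gone; use Series.rolling().
moving_avg = df3["Umsatzmenge"].rolling(window=36).mean()
# to_frame() keeps the write working on versions where Series lacks to_excel.
moving_avg.to_frame().to_excel(writer, sheet_name='Methodentest', startcol=12, startrow=38)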

Related

Error downloading historical stock data using pandas_datareader, Anaconda3, Spyder 5.3.3

I have a watch list of 30 stocks. The list is in a text file called "WatchList". I initialize the list as:
stock = []
and read the symbols line by line. I specify a location to store the data in csv format for each symbol.
I have the latest version of pandas_datareader, which is 0.10.0. I have used a while loop and pandas_datareader before. However, I am now experiencing problems. I receive the following error message:
runfile('E:/Stock_Price_Forecasting/NewStockPriceFetcher.py', wdir='E:/Stock_Price_Forecasting')
Enter the name of file to access WatchList
WatchList.txt
0 AAPL <class 'str'>
Traceback (most recent call last):
File "C:\Users\Om\anaconda3\lib\site-packages\spyder_kernels\py3compat.py", line 356, in compat_exec
exec(code, globals, locals)
File "e:\stock_price_forecasting\newstockpricefetcher.py", line 60, in
df = web.DataReader(stock[i], data_source='yahoo', start=start_date, end=end_date)
File "C:\Users\Om\anaconda3\lib\site-packages\pandas\util_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "C:\Users\Om\anaconda3\lib\site-packages\pandas_datareader\data.py", line 370, in DataReader
return YahooDailyReader(
File "C:\Users\Om\anaconda3\lib\site-packages\pandas_datareader\base.py", line 253, in read
df = self._read_one_data(self.url, params=self._get_params(self.symbols))
File "C:\Users\Om\anaconda3\lib\site-packages\pandas_datareader\yahoo\daily.py", line 153, in _read_one_data
data = j["context"]["dispatcher"]["stores"]["HistoricalPriceStore"]
TypeError: string indices must be integers
The portion of my code that shows the while loop is shown below:
i = 0
while i < len(stock):
    print(i, stock[i], type(stock[i]))
    # Format the filename for each security to use in the full path
    stock_data_file = stock[i] + '.csv'
    # Complete the path definition for stock data storage, including the filename
    full_file_path = (file_path/stock_data_file)
    # Specify the order for the columns
    columnTitles = ('Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close')
    # Pull the data for the stock from the web
    df = web.DataReader(stock[i], data_source='yahoo', start=start_date,
                        end=end_date)  # <-- error in this line!
    # Reorder the columns for plotting candlesticks
    df = df.reindex(columns=columnTitles)
    if i == 0:
        df.to_csv(full_file_path)
        print(i, stock[i], 'has data stored to csv file')
    else:
        df.to_csv(full_file_path, header=True)
        print(i, stock[i], 'has data stored to csv file')
    i += 1
I have looked at the parameter requirements for DataReader and Yahoo. I believe the first parameter is the ticker, as a string value. I have been unable to find where I am making a mistake. Any suggestions for solving this issue would be greatly appreciated. Thank you.
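A minimal sketch of a common workaround, assuming the yfinance package (used in an answer further down this page) is an acceptable substitute for the broken Yahoo reader in pandas_datareader:
import yfinance as yf
# Sketch: replaces the web.DataReader call inside the loop above;
# yf.download returns a DataFrame with Open/High/Low/Close/Adj Close/Volume.
df = yf.download(stock[i], start=start_date, end=end_date)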

How to filter a particular column with Python pandas?

I have an Excel file with 2 columns: 'Name' and 'size'. The 'Name' column contains multiple file types, namely ".apk, .dat, .vdex, .ttc" etc. I only want to keep the rows whose file extension is .apk; I do not want any other file type in the new Excel file.
I have written the below code:
import pandas as pd
import json
def json_to_excel():
    with open('installed-files.json') as jf:
        data = json.load(jf)
    df = pd.DataFrame(data)
    new_df = df[df.columns.difference(['SHA256'])]
    new_xl = new_df.to_excel('abc.xlsx')
    return new_xl

def filter_apk():  # MODIFIED CODE
    old_xl = json_to_excel()
    data = pd.read_excel(old_xl)
    a = data[data["Name"].str.contains("\.apk")]
    a.to_excel('zybg.xlsx')
The above program does the following:
json_to_excel() takes a JSON file, converts it to .xlsx format, and saves it.
filter_apk() is supposed to create multiple Excel files based on the file extensions present in the "Name" column.
The 1st function does what I intend.
The 2nd function does nothing, nor does it throw any error. I have followed this weblink.
Below are a few samples of the "Name" column:
/system/product/<Path_to>/abc.apk
/system/fonts/wwwr.ttc
/system/framework/framework.jar
/system/<Path_to>/icu.dat
/system/<Path_to>/Normal.apk
/system/<Path_to>/Tv.apk
How do I get this working? Or is there a better way to achieve the objective?
Please suggest.
ERROR
raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'NoneType'>
Note:
I have all the files at the same location.
Modified code:
import pandas as pd
import json

def json_to_excel():
    with open('installed-files.json') as jf:
        data = json.load(jf)
    df = pd.DataFrame(data)
    new_df = df[df.columns.difference(['SHA256'])]
    new_df.to_excel('abc.xlsx')

def filter_apk():
    json_to_excel()
    old_xl = pd.read_excel('abc.xlsx')
    data = pd.read_excel(old_xl)
    a = data[data["Name"].str.contains("\.apk")]
    a.to_excel('zybg.xlsx')

t = filter_apk()
print(t)
New error:
Traceback (most recent call last):
File "C:/Users/amitesh.sahay/PycharmProjects/work_allocation/TASKS/Jenkins.py", line 89, in <module>
t = filter_apk()
File "C:/Users/amitesh.sahay/PycharmProjects/work_allocation/TASKS/Jenkins.py", line 84, in filter_apk
data = pd.read_excel(old_xl)
File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
return func(*args, **kwargs)
File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_base.py", line 304, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_base.py", line 867, in __init__
self._reader = self._engines[engine](self._io)
File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_xlrd.py", line 22, in __init__
super().__init__(filepath_or_buffer)
File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel\_base.py", line 344, in __init__
filepath_or_buffer, _, _, _ = get_filepath_or_buffer(filepath_or_buffer)
File "C:\Users\amitesh.sahay\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\common.py", line 243, in get_filepath_or_buffer
raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'pandas.core.frame.DataFrame'>
There is a difference between your use-case and the use-case shown in the weblink. You want to apply a single filter (apk files), whereas the example you saw had multiple filters to be applied one after another (multiple species).
This will do the trick.
def filter_apk():
    old_xl = json_to_excel()
    data = pd.read_excel(old_xl)
    a = data[data["Name"].str.contains("\.apk")]
    a.to_excel("<path_to_new_excel>\\new_excel_name.xlsx")
Regarding your updated question: I guess your first function is not working the way you think it is.
new_xl = new_df.to_excel('abc.xlsx')
This writes an Excel file, as you expect it to, and that part works.
However, assigning the result to new_xl does nothing, since to_excel returns None. So when you return new_xl as the output of your json_to_excel function, you actually return None. Therefore, in your second function, old_xl = json_to_excel() makes old_xl None.
So, your functions should be something like this:
def json_to_excel():
    with open('installed-files.json') as jf:
        data = json.load(jf)
    df = pd.DataFrame(data)
    new_df = df[df.columns.difference(['SHA256'])]
    new_df.to_excel('abc.xlsx')

def filter_apk():
    json_to_excel()
    data = pd.read_excel('abc.xlsx')
    a = data[data["Name"].str.contains("\.apk")]
    a.to_excel('zybg.xlsx')
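As a design note, a minimal sketch, assuming the same installed-files.json and column names: the round-trip through abc.xlsx is unnecessary if you only need the filtered file, and a raw string for the pattern avoids the invalid-escape warning that "\.apk" can trigger on newer Python versions:
import json
import pandas as pd
# Sketch: build the DataFrame once and filter in memory.
with open('installed-files.json') as jf:
    df = pd.DataFrame(json.load(jf))
df = df[df.columns.difference(['SHA256'])]
df[df['Name'].str.contains(r'\.apk', na=False)].to_excel('zybg.xlsx')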

Read int values from a column in an Excel sheet using XLRD

I have a cell in an Excel workbook with comma-separated values.
This cell can have values with the following patterns:
0 or 123 or 123, 345.
I want to extract them as a list of integers using XLRD or pandas.read_excel.
I have tried using xlrd with the following snippet.
book = open_workbook(args.path)
dep_cms = book.sheet_by_index(1)
for row_index in range(1, dep_cms.nrows):
    excelList = []
    excelList.extend([x.strip() for x in dep_cms.cell(row_index, 8).value.split(',')])
I have even tried pandas:
excel_frame = read_excel(args.path, sheet_name=2, skiprows=1, verbose=True, na_filter=False)
data_need = excel_frame['Dependent CMS IDS'].tolist()
print(data_need)
But I got a "list index out of range" error:
Reading sheet 2
Traceback (most recent call last):
File "ExcelCellCSVRead.py", line 25, in <module>
excel_frame = read_excel(args.path, sheet_name=2, skiprows=1, verbose=True, na_filter=False)
File "C:\Users\Kris\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel\_base.py", line 311, in read_excel
return io.parse(
File "C:\Users\Kris\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel\_base.py", line 868, in parse
return self._reader.parse(
File "C:\Users\Kris\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel\_base.py", line 441, in parse
sheet = self.get_sheet_by_index(asheetname)
File "C:\Users\Kris\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel\_xlrd.py", line 46, in get_sheet_by_index
return self.book.sheet_by_index(index)
File "C:\Users\Kris\AppData\Local\Programs\Python\Python38-32\lib\site-packages\xlrd\book.py", line 466, in sheet_by_index
return self._sheet_list[sheetx] or self.get_sheet(sheetx)
IndexError: list index out of range
It does not work with a single value in a cell (for example just 0, or some value like 123); it raises AttributeError: 'float' object has no attribute 'split'.
It only works if I have comma-separated values, and then it converts them into a list of strings like ['123', '345']. I guess the split call is the culprit.
How can I extract the values of this cell into a list of integers using XLRD or pandas?
Regards
A comma-separated values (CSV) file cannot be treated the same as an Excel file during import.
Instead of using read_excel you can use read_csv.
Below is a code snippet showing how your code would look after applying read_csv:
import pandas as pd
df = pd.read_csv("your file name.csv")
data_need = df["Column_name"].tolist()
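An aside the answer above does not cover: the AttributeError on single values happens because xlrd returns numeric cells as floats, and floats have no split method. A minimal sketch, assuming the same sheet and column as in the question, that handles both cases by coercing to str first:
raw = dep_cms.cell(row_index, 8).value
# A float cell 123.0 becomes '123.0'; a text cell stays '123, 345'.
ids = [int(float(part)) for part in str(raw).split(',') if part.strip()]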

What is causing this issue when trying to get yahoo_fin to return prices for a list of tickers?

I have a list of tickers that I want to retrieve the prices for by running the following:
from yahoo_fin import stock_info as si

for x in watchlist:
    print(si.get_live_price(x))
When I run this I get the following error:
File "", line 1, in
runfile('C:/Users/User/OneDrive/Documents/Stuff/fluff 2.py', wdir='C:/Users/User/OneDrive/Documents/Stuff')
File
"D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py",
line 705, in runfile
execfile(filename, namespace)
File
"D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py",
line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/User/OneDrive/Documents/Stuff/fluff 2.py", line 46,
in
print(si.get_live_price(x))
File "D:\Anaconda3\lib\site-packages\yahoo_fin\stock_info.py", line
338, in get_live_price
df = get_data(ticker, end_date = pd.Timestamp.today() + pd.DateOffset(10))
File "D:\Anaconda3\lib\site-packages\yahoo_fin\stock_info.py", line
68, in get_data
temp = loads(needed)
ValueError: Expected object or value
However, when I refer to a ticker directly, it runs normally:
print(si.get_live_price('tsla'))
348.8399963378906
What could be causing this issue? Is it because I am using a different HTML parser than the one used with yahoo_fin in an earlier part of the code?
Try this out; it gives you a complete DataFrame of the last 6 months of data:
import yfinance as yf

for x in ['TSLA', 'AAPL']:
    data = yf.download(tickers=x)
    print(data['Close'][-1])
Output :
348.8399963378906
268.4800109863281
If you want the last 6 months of data, you can store each individual DataFrame. In the case above I printed only the last index, since you wanted the LTP.
This issue should be fixed now in the latest version of yahoo_fin (0.8.4). It was due to a change in Yahoo Finance's structure. See here for news about recent updates: http://theautomatic.net/2019/12/16/updates-to-yahoo_fin-package/

Python pandas merging Excel sheets not working

I'm trying to merge two Excel sheets using the common field Serial, but it throws an error. My program is as below:
(user1_env)root#ubuntu:~/user1/test/compare_files# cat compare.py
import pandas as pd
source1_df = pd.read_excel('a.xlsx', sheetname='source1')
source2_df = pd.read_excel('a.xlsx', sheetname='source2')
joined_df = source1_df.join(source2_df, on='Serial')
joined_df.to_excel('/root/user1/test/compare_files/result.xlsx')
I get the error below:
(user1_env)root#ubuntu:~/user1/test/compare_files# python3.5 compare.py
Traceback (most recent call last):
File "compare.py", line 5, in <module>
joined_df = source1_df.join(source2_df, on='Serial')
File "/home/user1/miniconda3/envs/user1_env/lib/python3.5/site-packages/pandas/core/frame.py", line 4385, in join
rsuffix=rsuffix, sort=sort)
File "/home/user1/miniconda3/envs/user1_env/lib/python3.5/site-packages/pandas/core/frame.py", line 4399, in _join_compat
suffixes=(lsuffix, rsuffix), sort=sort)
File "/home/user1/miniconda3/envs/user1_env/lib/python3.5/site-packages/pandas/tools/merge.py", line 39, in merge
return op.get_result()
File "/home/user1/miniconda3/envs/user1_env/lib/python3.5/site-packages/pandas/tools/merge.py", line 223, in get_result
rdata.items, rsuf)
File "/home/user1/miniconda3/envs/user1_env/lib/python3.5/site-packages/pandas/core/internals.py", line 4445, in items_overlap_with_suffix
to_rename)
ValueError: columns overlap but no suffix specified: Index(['Serial'], dtype='object')
I'm referring to the SO link below for the issue:
python compare two excel sheet and append correct record
A small modification worked for me:
import pandas as pd
source1_df = pd.read_excel('a.xlsx', sheetname='source1')
source2_df = pd.read_excel('a.xlsx', sheetname='source2')
joined_df = pd.merge(source1_df,source2_df,on='Serial',how='outer')
joined_df.to_excel('/home/gk/test/result.xlsx')
It is because of the overlapping column names after the join. You can either set your index to Serial and then join, or specify an rsuffix= or lsuffix= value in your join call so that the suffix is appended to the overlapping column names.
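A minimal sketch of both fixes, with the column name Serial taken from the question. Note that DataFrame.join matches against the other frame's index, so moving Serial into source2_df's index joins on the actual key, while rsuffix= only renames the duplicated column:
# Join on the 'Serial' column properly:
joined_df = source1_df.join(source2_df.set_index('Serial'), on='Serial')
# Or keep the original call and just resolve the name overlap:
joined_df = source1_df.join(source2_df, on='Serial', rsuffix='_src2')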
