How do I write to excel using pandas?

How do I write to excel using pandas? - excel

I have tried two methods and I keep getting an error.
My Current Code:
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
writer = pd.ExcelWriter('/Users/anonymous/Desktop/Version4.xlsx', engine='xlsxwriter').
df.to_excel(writer, sheet_name='Ex1')
writer.save()
I have the following error:
.... import xlsxwriter ModuleNotFoundError: No module named 'xlsxwriter'
I also tried not having an engine and then it had the same error except the module was openpyxl. I checked to make sure in my terminal and both of them are installed.

open command script. go to start, search for cmd, then install pip.
I was having this error, then I solve it by doing this from another solution:
pip install xlsxwriter

bro,
have you tried the simple
pd.read_excel()
pd.to_excel()
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html

Related

VS Code not recognizing variables while using numpy

Numpy is not working on VS Code
I tried using numpy on jupyter notebook. It worked over there.
import numpy s np
speed = [99, 54, 68, 90]
x = np.mean(speed)
print(x)
I am getting error that x is not defined.
I retried downloading numpy

ModuleNotFoundError: No module named 'pandas.lib'

from ggplot import mtcars
While importing mtcars dataset from ggplot on jupyter notebook i got this error
My system is windows 10 and I've already reinstalled and upgraded pandas (also used --user in installation command)but it didn't work out as well. Is there any other way to get rid of this error?
\Anaconda3\lib\site-packages\ggplot\stats\smoothers.py in
2 unicode_literals)
3 import numpy as np
----> 4 from pandas.lib import Timestamp
5 import pandas as pd
6 import statsmodels.api as sm
ModuleNotFoundError: No module named 'pandas.lib'

I Just tried out a way. I hope this works out for others as well. I changed from this
from pandas.lib import Timestamp
to this
from pandas._libs import Timestamp
as the path of the module is saved in path C:\Users\HP\Anaconda3\Lib\site-packages\pandas-libs
is _libs
Also, I changed from
date_types = (
pd.tslib.Timestamp,
pd.DatetimeIndex,
pd.Period,
pd.PeriodIndex,
datetime.datetime,
datetime.time
)
to this
date_types = (
pd._tslib.Timestamp,
pd.DatetimeIndex,
pd.Period,
pd.PeriodIndex,
datetime.datetime,
datetime.time
)
Before that, I went on this path "C:\Users\HP\Anaconda3\Lib\site-packages\ggplot\util.py" to make the same changes in util.py for date_types. This helped me out to get rid of the error I mentioned in my question.

gspread worksheet.update error - Worksheet has no attribute 'update'

I am trying to write a dataframe to an open Google Sheet in Google Colab, but am getting the error:
AttributeError: 'Worksheet' object has no attribute 'update'
I documented and tested the parts up to the error.
# General Imports
# Example at https://colab.research.google.com/notebooks/io.ipynb
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
#Import the library, authenticate, and create the interface to Sheets.
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
import numpy as np
import pandas as pd
# Load the DataFrame
dataframe = [['A', 'B', 'C'], ['1', '2' ,'3'], ['Mary', 'Mai', 'Kat']]
print(dataframe)
# Open the Google Sheet
# This assumes that you have worksheet called "RS Output" with sheet "Data" on your Google Drive,
gc = gspread.authorize(GoogleCredentials.get_application_default())
my_workbook = 'RS Output'
my_sheet = "Data"
worksheet = gc.open(my_workbook).worksheet(my_sheet)
list_of_lists = worksheet.get_all_values()
print(list_of_lists)
# update the Google Sheet with the values from the Dataframe
# per gspread documentation at
# https://gspread.readthedocs.io/en/latest/user-guide.html
worksheet.update([dataframe.columns.values.tolist()] + worksheet.values.tolist())
This is the output:
[['A', 'B', 'C'], ['1', '2', '3'], ['Mary', 'Mai', 'Kat']]
[['Testing'], ['This']]
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-132-e085af26b2ed> in <module>()
21 # https://gspread.readthedocs.io/en/latest/user-guide.html
22
---> 23 worksheet.update([dataframe.columns.values.tolist()] + worksheet.values.tolist())
AttributeError: 'Worksheet' object has no attribute 'update'
I can't seem to find a clear example of how to write the dataframe to a Google Sheet.
Thanks

I had same issue, this is my first time using colab.research.google.com notebook.
it turned out the default gspread module was version 3.0
!pip install --upgrade gspread!
Updated it to version 3.7 and fixed the missing .update() problem.
Found existing installation: gspread 3.0.1
Uninstalling gspread-3.0.1:
Successfully uninstalled gspread-3.0.1
Successfully installed gspread-3.7.0
Big thanks to: Shashank Rautela

AttributeError: 'Worksheet' object has no attribute 'update' means that the variable worksheet has no update attribute in it, you can verify that by adding print(dir(worksheet)) in your code.
If the variable has update attribute, it should print something like this:
Also, I tried to replicate your code and found some issues:
dataframe = [['A', 'B', 'C'], ['1', '2' ,'3'], ['Mary', 'Mai', 'Kat']] is already a list. dataframe.columns.values.tolist() will give you error since the variable dataframe is a list and has no attribute columns. Using only the variable dataframe in the update method is enough since it is already a list of list. Here is an example of a dataframe: {'col1': [1, 2], 'col2': [3, 4]}.
Incorrect worksheet.update() usage. According to this document, the parameter of update() are range and values (list of list if the range contains multiple cells). The parameter of your update() method should look like this: worksheet.update("Range", data in form of list of list).
Here is an example on how to use the update() method:
Using List:
Code:
data = [["It" , "works!"]]
worksheet.update("A1:B1", data)
Before:
After:
Using panda's dataframe.
Code:
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [20, 19, 23]})
values = df.columns.values.tolist()
sh.update("A1:B1", [values])
Before:
After:
Based on how you used the update() method, you want to insert the column names
above the current data of worksheet. Instead of using update, you can use insert_rows()
Code:
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [20, 19, 23]})
values = df.columns.values.tolist()
worksheet.insert_rows([values], row=1, value_input_option='RAW')
Before:
After:
References:
insert_rows
update

I ran into the same issue on a Jupyter notebook running on a server (ubuntu 18.04) while it works fine using Pycharm on my local machine (ubuntu 20.04) instead.
Meanwhile, here's how I push my pandas dataframe to a google spreadsheet:
import string
# create a spreadsheet range that matches the size of the df (including 1 row for the column names). It looks like that: 'A1:AA3'
letters = list(string.ascii_uppercase)
col_names_spreadsheet = letters+list(np.array([[X+x for x in letters] for X in letters]).flat)
range_for_df = col_names_spreadsheet[0]+"1"+":"+col_names_spreadsheet[df.shape[1]-1]+str(df.shape[0]+1)
# retrieve the matching cells
cell_list = worksheet.range(range_for_df)
# flatten the df, add the column names at the beginning
cell_values = list(df.columns)+list(df.values.flat)
# set the value of each cell
for i, val in enumerate(cell_values): #gives us a tuple of an index and value
cell_list[i].value = val #use the index on cell_list and the val from cell_values
# update the cells in bulk
worksheet.update_cells(cell_list)
if the df has dates it may return this error
Object of type date is not JSON serializable
In this case I use this
# turn all datetime columns into strings
import datetime
dt_cols = list(df.columns[[type(df[col].iloc[0]) is datetime.date for col in df.columns]])
for c in dt_cols:
df[c] = df[c].apply(lambda x: x.isoformat())
credit to this guy for the trick: Python/gspread - how can I update multiple cells with DIFFERENT VALUES at once?

!pip install --upgrade gspread
upgrade the gspread lib with above command.
you will be able to call the update method.

If you are getting any kind of Attribute Error ( assuming that you have used correct syntax and correct attributes for gspread )
Then that is because you are using gspread's old version 3.0.1, if you haven't used gspread before in google colab then this is the standard version that comes pre-installed. Just do
!pip install --upgrade gspread
At the time of writing this gspread gets upgraded to version 3.7.1 with above command.
Happy Coding!

module 'seaborn' has no attribute 'distplot'

I've some code like:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.read_csv('StudentsPerformance.csv')
#print(data.isnull().sum()) // checking if there are some missing values or not
#print(data.dtypes)checking datatypes of the dataset
# ANALYSİS VALUES OF THE COLUMN'S
"""print(data['gender'].value_counts())
print(data['parental level of education'].value_counts())
print(data['race/ethnicity'].value_counts())
print(data['lunch'].value_counts())
print(data['test preparation course'].value_counts())"""
# Adding column total and average to the dataset
data['total'] = data['math score'] + data['reading score'] + data['writing score']
data['average'] = data ['total'] / 3
sns.distplot(data['average'])
I would like to see distplot of average for visualization but I run the program that gives me an error like
Traceback (most recent call last): File
"C:/Users/usersample/PycharmProjects/untitled1/sample.py", line 22, in
sns.distplot(data['average']) AttributeError: module 'seaborn' has no attribute 'distplot'
I've tried to reinstall and install seaborn and upgrade the seaborn to 0.9.0 but it doesn't work.
head of my data female,"group B","bachelor's
degree","standard","none","72","72","74" female,"group C","some
college","standard","completed","69","90","88" female,"group
B","master's degree","standard","none","90","95","93" male,"group
A","associate's degree","free/reduced","none","47","57","44"

this might be due to removal of paths in environment variables section. Try considering to add your IDE scripts and python folder. I am using pycharm IDE, and did the same and its working fine.

Exporting a matplotlib chart directly to Excel? Without saving to file first

I am trying to write a matplotlib image directly into an Excel file using xlsx writer - ideally without saving the file to the disk.
I have found this answer: Writing pandas/matplotlib image directly into XLSX file but I cannot get it to work, especially not with Python 3.
This: wks1.insert_image(2,2, imgdata doesn't work, because:
path should be string, bytes, os.PathLike or integer, not _io.BytesIO
This: wks1.insert_image(2,2,"",{'image data': imgdata})
gives a warning
Image file '' not found.
and produces an excel file with no chart.
What does work is saving the file locally - but I'd like to understand if there's a way to avoid it.
fig.savefig('test.png', format='png')
wks1.insert_image(2,2, 'test.png')
Thoughts? The full code is:
import xlsxwriter
import numpy as np
import matplotlib.pyplot as plt
import io
x=np.linspace(-10,10,100)
y=x**2
fig,ax=plt.subplots()
ax.plot(x,y)
workbook = xlsxwriter.Workbook('test chart.xlsx')
wks1=workbook.add_worksheet('Test chart')
wks1.write(0,0,'test')
imgdata=io.BytesIO()
fig.savefig(imgdata, format='png')
wks1.insert_image(2,2,"",{'image data': imgdata})
#wks1.insert_image(2,2, imgdata)
workbook.close()

According to the documentation (and the link to the question you provided), the correct spelling for the optional argument to be passed to insert_image is image_data (noting the underscore _)
import xlsxwriter
import numpy as np
import matplotlib.pyplot as plt
import io
x=np.linspace(-10,10,100)
y=x**2
fig,ax=plt.subplots()
ax.plot(x,y)
workbook = xlsxwriter.Workbook('test chart.xlsx')
wks1=workbook.add_worksheet('Test chart')
wks1.write(0,0,'test')
imgdata=io.BytesIO()
fig.savefig(imgdata, format='png')
wks1.insert_image(2,2, '', {'image_data': imgdata})
workbook.close()

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

How do I write to excel using pandas? - excel

open command script. go to start, search for cmd, then install pip. I was having this error, then I solve it by doing this from another solution: pip install xlsxwriter

bro, have you tried the simple pd.read_excel() pd.to_excel() https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html

Related

VS Code not recognizing variables while using numpy

ModuleNotFoundError: No module named 'pandas.lib'

gspread worksheet.update error - Worksheet has no attribute 'update'

module 'seaborn' has no attribute 'distplot'

Exporting a matplotlib chart directly to Excel? Without saving to file first

Categories

Resources