gspread worksheet.update error - Worksheet has no attribute 'update' - python-3.x

I am trying to write a dataframe to an open Google Sheet in Google Colab, but am getting the error:
AttributeError: 'Worksheet' object has no attribute 'update'
I documented and tested the parts up to the error.
# General Imports
# Example at https://colab.research.google.com/notebooks/io.ipynb
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
#Import the library, authenticate, and create the interface to Sheets.
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
import numpy as np
import pandas as pd
# Load the DataFrame
dataframe = [['A', 'B', 'C'], ['1', '2' ,'3'], ['Mary', 'Mai', 'Kat']]
print(dataframe)
# Open the Google Sheet
# This assumes that you have a workbook called "RS Output" with a sheet "Data" on your Google Drive.
gc = gspread.authorize(GoogleCredentials.get_application_default())
my_workbook = 'RS Output'
my_sheet = "Data"
worksheet = gc.open(my_workbook).worksheet(my_sheet)
list_of_lists = worksheet.get_all_values()
print(list_of_lists)
# update the Google Sheet with the values from the Dataframe
# per gspread documentation at
# https://gspread.readthedocs.io/en/latest/user-guide.html
worksheet.update([dataframe.columns.values.tolist()] + worksheet.values.tolist())
This is the output:
[['A', 'B', 'C'], ['1', '2', '3'], ['Mary', 'Mai', 'Kat']]
[['Testing'], ['This']]
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-132-e085af26b2ed> in <module>()
21 # https://gspread.readthedocs.io/en/latest/user-guide.html
22
---> 23 worksheet.update([dataframe.columns.values.tolist()] + worksheet.values.tolist())
AttributeError: 'Worksheet' object has no attribute 'update'
I can't seem to find a clear example of how to write the dataframe to a Google Sheet.
Thanks

I had the same issue; this was my first time using a colab.research.google.com notebook.
It turned out the default gspread module was version 3.0:
!pip install --upgrade gspread
Updating it to version 3.7 fixed the missing .update() problem.
Found existing installation: gspread 3.0.1
Uninstalling gspread-3.0.1:
Successfully uninstalled gspread-3.0.1
Successfully installed gspread-3.7.0
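After upgrading (you may need to restart the Colab runtime so the new version is picked up), a quick sanity check along these lines confirms the method exists; the workbook and sheet names below are the ones from the question:
import gspread
from oauth2client.client import GoogleCredentials

print(gspread.__version__)  # should now report 3.7.x rather than 3.0.1

gc = gspread.authorize(GoogleCredentials.get_application_default())
worksheet = gc.open('RS Output').worksheet('Data')

# update() takes a range (or a start cell) and a list of lists
worksheet.update('A1', [['A', 'B', 'C'], ['1', '2', '3'], ['Mary', 'Mai', 'Kat']])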
Big thanks to: Shashank Rautela

AttributeError: 'Worksheet' object has no attribute 'update' means that the variable worksheet has no update attribute; you can verify that by adding print(dir(worksheet)) to your code.
If the variable does have an update attribute, update will appear in the printed list of attributes.
Also, I tried to replicate your code and found some issues:
dataframe = [['A', 'B', 'C'], ['1', '2', '3'], ['Mary', 'Mai', 'Kat']] is already a list, not a DataFrame. dataframe.columns.values.tolist() will therefore give you an error, since the variable dataframe is a list and has no columns attribute. Passing the variable dataframe directly to the update method is enough, since it is already a list of lists. (An actual DataFrame would be built from something like {'col1': [1, 2], 'col2': [3, 4]}.)
Incorrect worksheet.update() usage. According to this document, the parameters of update() are a range and the values (a list of lists if the range contains multiple cells), so the call should look like worksheet.update("<range>", <data as a list of lists>).
Here is an example of how to use the update() method:
Using a list:
Code:
data = [["It" , "works!"]]
worksheet.update("A1:B1", data)
Using a pandas DataFrame:
Code:
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [20, 19, 23]})
values = df.columns.values.tolist()
worksheet.update("A1:B1", [values])
Based on how you used the update() method, you want to insert the column names above the current data of the worksheet. Instead of update(), you can use insert_rows():
Code:
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [20, 19, 23]})
values = df.columns.values.tolist()
worksheet.insert_rows([values], row=1, value_input_option='RAW')
References:
insert_rows
update
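Putting it together, if the goal is to write a whole DataFrame (header row plus data rows) in one call, here is a minimal sketch, assuming worksheet is the authorized gspread Worksheet used in the examples above:
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [20, 19, 23]})

# header row followed by the data rows, as one list of lists
payload = [df.columns.values.tolist()] + df.values.tolist()

# start at A1; gspread expands the range to fit the values
worksheet.update('A1', payload)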

I ran into the same issue in a Jupyter notebook running on a server (Ubuntu 18.04), while the same code works fine in PyCharm on my local machine (Ubuntu 20.04).
Meanwhile, here's how I push my pandas dataframe to a Google spreadsheet:
import string
import numpy as np

# create a spreadsheet range that matches the size of the df (including 1 row for the column names). It looks like this: 'A1:AA3'
letters = list(string.ascii_uppercase)
col_names_spreadsheet = letters + list(np.array([[X + x for x in letters] for X in letters]).flat)
range_for_df = col_names_spreadsheet[0] + "1" + ":" + col_names_spreadsheet[df.shape[1] - 1] + str(df.shape[0] + 1)

# retrieve the matching cells
cell_list = worksheet.range(range_for_df)

# flatten the df, add the column names at the beginning
cell_values = list(df.columns) + list(df.values.flat)

# set the value of each cell
for i, val in enumerate(cell_values):  # gives us a tuple of an index and value
    cell_list[i].value = val  # use the index on cell_list and the val from cell_values

# update the cells in bulk
worksheet.update_cells(cell_list)
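For example, for a 2-row, 3-column DataFrame the range built above comes out as 'A1:C3' (one extra row for the header); a quick self-contained check with toy data:
import string
import numpy as np
import pandas as pd

letters = list(string.ascii_uppercase)
col_names_spreadsheet = letters + list(np.array([[X + x for x in letters] for X in letters]).flat)

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})  # 2 rows, 3 columns
range_for_df = col_names_spreadsheet[0] + "1" + ":" + col_names_spreadsheet[df.shape[1] - 1] + str(df.shape[0] + 1)
print(range_for_df)  # A1:C3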
If the df contains dates, update_cells may raise this error:
Object of type date is not JSON serializable
In that case I use this:
# turn all datetime columns into strings
import datetime
dt_cols = list(df.columns[[type(df[col].iloc[0]) is datetime.date for col in df.columns]])
for c in dt_cols:
    df[c] = df[c].apply(lambda x: x.isoformat())
credit to this guy for the trick: Python/gspread - how can I update multiple cells with DIFFERENT VALUES at once?

!pip install --upgrade gspread
Upgrade the gspread library with the command above and you will be able to call the update() method.

If you are getting any kind of AttributeError (assuming that you have used the correct syntax and correct attributes for gspread), it is because you are using gspread's old version 3.0.1; if you haven't used gspread before in Google Colab, this is the version that comes pre-installed. Just do
!pip install --upgrade gspread
At the time of writing, this command upgrades gspread to version 3.7.1.
Happy Coding!

Related

Convert a dataframe column to English text

I am trying to convert a column to English, but I get:
AttributeError: 'NoneType' object has no attribute 'group'
Here is my code:
from googletrans import Translator
translator = Translator()
df['Name'] = df['Name'].apply(translator.translate, dest='en')
Name
สวัสดีจีน
日本国)
日本の会社
It appears that some of the newer versions of this library have known issues. Please run the command below to install a working version and restart your kernel:
pip install googletrans==3.1.0a0
# this newer release may also work:
pip install googletrans==4.0.0-rc1
Then run the code below to confirm it is working. This solved it for me. Credit to this answer (Moritz's answer):
import pandas as pd
from googletrans import Translator
df = pd.DataFrame({'Name': {0: 'สวัสดีจีน', 1: '日本国)', 2: '日本の会社'}})
translator = Translator()
df['Name2'] = df['Name'].apply(lambda x: translator.translate(x, dest='en').text)
df
Out[1]:
Name Name2
0 สวัสดีจีน hello china
1 日本国) Japan)
2 日本の会社 Japanese company

Dask client.persist returns AssertionError when I try to use HashingVectorizer

I am trying to vectorize the dask.dataframe with dask HashingVectorizer. I want the vectorization results to stay in the cluster (distributed system). That's why I am using client.persist when I try to transform the data. But for some reason, I am getting the error below.
Traceback (most recent call last):
File "/home/dodzilla/my_project/components_with_adapter/vectorizers/base_vectorizer.py", line 112, in hybrid_feature_vectorizer
CLUSTERING_FEATURES=self.clustering_features)
File "/home/dodzilla/my_project/components_with_adapter/vectorizers/text_vectorizer.py", line 143, in vectorize
X = self.client.persist(fitted_vectorizer.transform, combined_data)
File "/home/dodzilla/.local/lib/python3.6/site-packages/distributed/client.py", line 2860, in persist
assert all(map(dask.is_dask_collection, collections))
AssertionError
I can't share the data but all of the necessary information about the data is as below:
>>>type(combined_data)
<class 'dask.dataframe.core.Series'>
>>>type(combined_data.compute())
<class 'pandas.core.series.Series'>
>>>combined_data.compute().shape
12
A minimal working example can be found below. In the code snippet, combined_data holds the merged columns, meaning all of the columns are merged into one column. The data has 12 rows, and all of the values in the rows are strings. This is the code where I am getting the error:
from stop_words import get_stop_words
from dask_ml.feature_extraction.text import HashingVectorizer as daskHashingVectorizer
import pandas as pd
import dask
import dask.dataframe as dd
from dask.distributed import Client
def convert_dataframe_to_single_text(documents):
    """
    Combine all of the columns into 1 column.
    """
    if type(documents) is dask.dataframe.core.DataFrame:
        cols = documents.columns
        documents['combined'] = documents[cols].apply(func=(lambda row: ' '.join(row.values.astype(str))), axis=1,
                                                      meta=('str'))
        document_texts = documents.drop(cols, axis=1)
    else:
        raise TypeError('Wrong type of data. Expected Pandas DF or Dask DF but received ', type(documents))
    return document_texts
# Init the client.
client = Client('localhost:8786')
# Get stopwords
stopwords = get_stop_words(language="english")
# Create dask dataframe from pandas dataframe
data = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':["twenty", "twentyone", "nineteen", "eighteen"]}
df = pd.DataFrame(data)
df = dd.from_pandas(df, npartitions=1)
# Init the vectorizer
vectorizer = daskHashingVectorizer(stop_words=stopwords, alternate_sign=False,
                                   norm=None, binary=False,
                                   n_features=10000)
# Combine all of to columns into 1 column.
combined_data = convert_dataframe_to_single_text(df)
# Fit the vectorizer.
fitted_vectorizer = client.persist(vectorizer.fit(combined_data))
# Transform the data.
X = client.persist(fitted_vectorizer.transform, combined_data)
I hope the information is enough.
Important note: I am not getting any kind of error when I use client.compute, but from what I understand that doesn't run on the cluster of machines and instead runs on the local machine, and it returns a csr matrix instead of a lazily evaluated dask.array.
This is not how client.persist is supposed to be used. The functions I was looking for are client.submit and client.map; in my case client.submit solved my issue.
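For reference, a minimal sketch of that client.submit approach, assuming client, fitted_vectorizer, and combined_data as defined in the question (the exact call pattern here is my assumption, not a verified fix):
# submit the transform call to the cluster; this returns a Future
future = client.submit(fitted_vectorizer.transform, combined_data)

# block until the task finishes and fetch the result, or pass the
# future on to further client.submit calls to keep the data on the cluster
X = future.result()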

Pandas column names begin from index column when running in the Visual Studio Code Jupyter environment

I have written a very simple piece of code that creates a pandas data frame. The issue is that when I name my columns ['X','Y'], the column heading X attaches itself to the column containing the index values. This only occurs when the code is run in the Jupyter environment, initiated in Visual Studio Code by #%%. When I run the same code in the terminal, the results are correct, as shown in the image below. Any ideas why?
#%%
import pandas as pd
from matplotlib import pyplot as plt
data_1 = {'X': [1.0,2.0,3.0], 'Y': [1.0,2.5,3.5]}
df_1 = pd.DataFrame(data_1)
print(df_1)
I have also tried other methods, but the result is the same:
data_1 = [(1.0, 1.0), (2.0, 2.5), (3.0, 3.5)]
df_1 = pd.DataFrame(data_1, columns=['X', 'Y'])
It's the default rendering of the plugin and only affects how the frame is displayed; it does not affect the functionality.
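If you want to convince yourself that the frame itself is fine, a quick check of its structure should show 'X' and 'Y' as ordinary columns and an unnamed default index:
import pandas as pd

data_1 = {'X': [1.0, 2.0, 3.0], 'Y': [1.0, 2.5, 3.5]}
df_1 = pd.DataFrame(data_1)

print(df_1.columns.tolist())  # ['X', 'Y'] - the index is not one of the columns
print(df_1.index.name)        # None - the index has no name, so 'X' is not attached to it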

Trying to name multiple indexes, getting KeyError

Good morning,
I'm using Python 3.6. I'm trying to name my index (see the last line in the code below) because I plan on joining to another DataFrame. The DataFrame should be multi-indexed; the index is the first two columns ('currency' and 'rtdate') and the data looks like this:
rate
AUD 2010-01-01 0.897274
2010-02-01 0.896608
2010-03-01 0.895943
2010-04-01 0.895277
2010-05-01 0.894612
This is the code that I'm running:
import pandas as pd
import numpy as np
import datetime as dt
df=pd.read_csv('file.csv',index_col=0)
df.index = pd.to_datetime(df.index)
new_index = pd.date_range(df.index.min(),df.index.max(),freq='MS')
df=df.reindex(new_index)
df=df.interpolate().unstack()
rate = pd.DataFrame(df)
rate.columns = ['rate']
rate.set_index(['currency','rtdate'],drop=False)
Running this throws an error message:
KeyError: 'currency'
What am I missing?
Thanks for the assistance
I think you need to set the names of the levels of the MultiIndex by using rename_axis first and then reset_index to turn the MultiIndex levels into columns.
So you'd end up with this:
rate = df.interpolate().unstack().rename_axis(['currency', 'rtdate']).reset_index(name='rate')
instead of this:
df=df.interpolate().unstack()
rate = pd.DataFrame(df)
rate.columns = ['rate']
rate.set_index(['currency','rtdate'],drop=False)
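A small self-contained illustration of the rename_axis/reset_index pattern (toy data, not the poster's CSV):
import pandas as pd

idx = pd.MultiIndex.from_product([['AUD', 'EUR'], pd.date_range('2010-01-01', periods=2, freq='MS')])
s = pd.Series([0.897274, 0.896608, 1.10, 1.12], index=idx)

# name the two index levels, then move them into columns alongside the 'rate' values
rate = s.rename_axis(['currency', 'rtdate']).reset_index(name='rate')
print(rate)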

xlwings vs pandas native export with multi index dataframes: how to reconcile?

I'll start with an image, as it clearly explains the problem with exporting MultiIndexes via the pandas native export versus xlwings.
Essentially, I'd like the 'pandas native' result (the MultiIndex properly exported to Excel) done with xlwings, because I rely on a bunch of other features that xlwings offers and the alternatives do not (not even ExcelWriter and the like, because I have to clear the sheet, and a non-Python object inserted in the same sheet would be cleared as well when it is initialized).
The code used:
import pandas as pd
import numpy as np
import xlwings as xw
import os
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.DataFrame(np.random.randn(8, 8), index=index).transpose()
print(s)
# EXPORT
filename = 'format_excel_export.xlsx'
s.to_excel(filename)
outpath = os.path.join(os.path.abspath(os.path.dirname(__file__)), filename)
os.path.sep = r'/'
wb = xw.Workbook(outpath)
xw.Range('Sheet1', 'A13').value = s
You could implement a custom converter that formats it the way that Pandas does, see here.
However, as of v0.7.2, xlwings doesn't yet natively support bolding fonts, merging cells and cell borders. You can get around this by falling back to pywin32 (on Windows), see here.
In essence, your custom converter needs to override the write_value method, see here.
It does make sense to build this into the library at some point, so I've opened an issue, see here.
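For what it's worth, here is a rough skeleton of such a converter, following the documented Converter/write_value pattern; how the MultiIndex is flattened (here simply reset_index) and the exact options call are assumptions to adapt, not a finished solution:
import xlwings as xw
from xlwings.conversion import Converter, PandasDataFrameConverter

class MultiIndexDataFrameConverter(Converter):
    # reuse the built-in DataFrame converter for everything else
    base = PandasDataFrameConverter

    @staticmethod
    def write_value(df, options):
        # placeholder: turn the MultiIndex levels into plain cells before
        # handing the frame to xlwings (bolding/merging would be applied separately)
        return df.reset_index()

# assumed usage:
# xw.Range('Sheet1', 'A13').options(MultiIndexDataFrameConverter).value = s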
