Convert a dataframe column to english text - python-3.x

I am trying to convert a column to English, but I get:
AttributeError: 'NoneType' object has no attribute 'group'.
Here is my code:
from googletrans import Translator
translator = Translator()
df['Name'] = df['Name'].apply(translator.translate, dest='en')
The Name column contains:
Name
สวัสดีจีน
日本国)
日本の会社

It appears that some of the newer versions of this library have known issues. Run the command below to install a working version, then restart your kernel:
pip install googletrans==3.1.0a0
# a newer pre-release may also work:
pip install googletrans==4.0.0-rc1
Then run the code below to confirm it is working. This solved it for me; credit to Moritz's answer:
import pandas as pd
from googletrans import Translator
df = pd.DataFrame({'Name': {0: 'สวัสดีจีน', 1: '日本国)', 2: '日本の会社'}})
translator = Translator()
df['Name2'] = df['Name'].apply(lambda x: translator.translate(x, dest='en').text)
df
Out[1]:
Name Name2
0 สวัสดีจีน hello china
1 日本国) Japan)
2 日本の会社 Japanese company
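The fix above boils down to calling translate per cell and keeping only the .text attribute of the result object, instead of applying translator.translate directly to the Series. That .apply pattern can be exercised offline with a stand-in translator (FakeTranslator below is hypothetical, not part of googletrans; a real Translator hits the network):

```python
import pandas as pd

# Hedged sketch: FakeTranslator is a hypothetical stand-in for
# googletrans.Translator so the .apply pattern can run offline.
class FakeTranslator:
    def translate(self, text, dest="en"):
        class Result:
            pass
        r = Result()
        r.text = f"[{dest}] {text}"  # a real Translator puts the translation here
        return r

df = pd.DataFrame({"Name": ["สวัสดีจีน", "日本国)", "日本の会社"]})
translator = FakeTranslator()
# translate per cell and keep only the .text attribute of the result object
df["Name2"] = df["Name"].apply(lambda x: translator.translate(x, dest="en").text)
print(df["Name2"].tolist())
```

Swapping FakeTranslator for the real googletrans Translator leaves the .apply line unchanged.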


gspread worksheet.update error - Worksheet has no attribute 'update'

I am trying to write a dataframe to an open Google Sheet in Google Colab, but am getting the error:
AttributeError: 'Worksheet' object has no attribute 'update'
I documented and tested the parts up to the error.
# General Imports
# Example at https://colab.research.google.com/notebooks/io.ipynb
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
#Import the library, authenticate, and create the interface to Sheets.
from google.colab import auth
auth.authenticate_user()
import gspread
from oauth2client.client import GoogleCredentials
import numpy as np
import pandas as pd
# Load the DataFrame
dataframe = [['A', 'B', 'C'], ['1', '2' ,'3'], ['Mary', 'Mai', 'Kat']]
print(dataframe)
# Open the Google Sheet
# This assumes that you have a workbook called "RS Output" with a sheet "Data" on your Google Drive.
gc = gspread.authorize(GoogleCredentials.get_application_default())
my_workbook = 'RS Output'
my_sheet = "Data"
worksheet = gc.open(my_workbook).worksheet(my_sheet)
list_of_lists = worksheet.get_all_values()
print(list_of_lists)
# update the Google Sheet with the values from the Dataframe
# per gspread documentation at
# https://gspread.readthedocs.io/en/latest/user-guide.html
worksheet.update([dataframe.columns.values.tolist()] + worksheet.values.tolist())
This is the output:
[['A', 'B', 'C'], ['1', '2', '3'], ['Mary', 'Mai', 'Kat']]
[['Testing'], ['This']]
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-132-e085af26b2ed> in <module>()
21 # https://gspread.readthedocs.io/en/latest/user-guide.html
22
---> 23 worksheet.update([dataframe.columns.values.tolist()] + worksheet.values.tolist())
AttributeError: 'Worksheet' object has no attribute 'update'
I can't seem to find a clear example of how to write the dataframe to a Google Sheet.
Thanks
I had the same issue; this was my first time using a colab.research.google.com notebook.
It turned out the preinstalled gspread module was version 3.0:
!pip install --upgrade gspread
Upgrading to version 3.7 fixed the missing .update() problem:
Found existing installation: gspread 3.0.1
Uninstalling gspread-3.0.1:
Successfully uninstalled gspread-3.0.1
Successfully installed gspread-3.7.0
Big thanks to: Shashank Rautela
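A quick way to confirm which gspread version your kernel is actually running, a minimal sketch using the standard library (Worksheet.update() was reportedly added around gspread 3.3, so anything older will lack it):

```python
from importlib.metadata import PackageNotFoundError, version

# Print the installed gspread version before assuming Worksheet.update()
# exists; older Colab images shipped gspread 3.0.1, which lacks it.
try:
    print("gspread", version("gspread"))
except PackageNotFoundError:
    print("gspread is not installed in this environment")
```

Remember to restart the kernel after upgrading, or the old version stays loaded.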
AttributeError: 'Worksheet' object has no attribute 'update' means that the worksheet variable has no update attribute; you can verify that by adding print(dir(worksheet)) to your code.
If the variable does have an update attribute, the printed list of names will include 'update'.
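To illustrate what that dir() check tells you, here is a sketch with a stand-in object (Demo is hypothetical; in the notebook you would inspect the real gspread worksheet instead):

```python
# Demo stands in for a gspread Worksheet so the check can run offline;
# with gspread you would pass the real worksheet object instead.
class Demo:
    def update(self, *args, **kwargs):
        pass

ws = Demo()
print("update" in dir(ws))    # True when the method exists
print(hasattr(ws, "update"))  # equivalent, lighter-weight check
```

If hasattr(worksheet, "update") is False on the real object, the installed gspread is too old.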
Also, I tried to replicate your code and found some issues:
dataframe = [['A', 'B', 'C'], ['1', '2' ,'3'], ['Mary', 'Mai', 'Kat']] is already a list. dataframe.columns.values.tolist() raises an error because the variable dataframe is a plain list and has no columns attribute. Passing the variable dataframe itself to update() is enough, since it is already a list of lists. (An actual pandas DataFrame would be built from something like {'col1': [1, 2], 'col2': [3, 4]}.)
Incorrect worksheet.update() usage. According to this document, the parameters of update() are a range and the values (a list of lists if the range spans multiple cells), so the call should look like worksheet.update("<range>", <list of lists>).
Here is an example on how to use the update() method:
Using List:
Code:
data = [["It" , "works!"]]
worksheet.update("A1:B1", data)
Using a pandas DataFrame:
Code:
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [20, 19, 23]})
values = df.columns.values.tolist()
worksheet.update("A1:B1", [values])
Based on how you used the update() method, it looks like you want to insert the column names above the current data in the worksheet. Instead of update(), you can use insert_rows():
Code:
df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [20, 19, 23]})
values = df.columns.values.tolist()
worksheet.insert_rows([values], row=1, value_input_option='RAW')
References:
insert_rows
update
I ran into the same issue in a Jupyter notebook running on a server (Ubuntu 18.04), while the same code works fine in PyCharm on my local machine (Ubuntu 20.04).
Meanwhile, here's how I push my pandas dataframe to a google spreadsheet:
import string
# create a spreadsheet range that matches the size of the df (including 1 row for the column names). It looks like that: 'A1:AA3'
letters = list(string.ascii_uppercase)
col_names_spreadsheet = letters+list(np.array([[X+x for x in letters] for X in letters]).flat)
range_for_df = col_names_spreadsheet[0]+"1"+":"+col_names_spreadsheet[df.shape[1]-1]+str(df.shape[0]+1)
# retrieve the matching cells
cell_list = worksheet.range(range_for_df)
# flatten the df, add the column names at the beginning
cell_values = list(df.columns)+list(df.values.flat)
# set the value of each cell
for i, val in enumerate(cell_values):  # gives us a tuple of an index and value
    cell_list[i].value = val  # use the index on cell_list and the val from cell_values
# update the cells in bulk
worksheet.update_cells(cell_list)
if the df has dates it may return this error
Object of type date is not JSON serializable
In this case I use this
# turn all datetime columns into strings
import datetime
dt_cols = list(df.columns[[type(df[col].iloc[0]) is datetime.date for col in df.columns]])
for c in dt_cols:
    df[c] = df[c].apply(lambda x: x.isoformat())
credit to this guy for the trick: Python/gspread - how can I update multiple cells with DIFFERENT VALUES at once?
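The range-construction part of this approach can be checked offline without any Google credentials. For a 3-row, 2-column frame it should produce 'A1:B4' (one extra row for the column names):

```python
import string

import numpy as np
import pandas as pd

# A..Z followed by AA..ZZ, matching spreadsheet column naming
letters = list(string.ascii_uppercase)
col_names_spreadsheet = letters + list(
    np.array([[X + x for x in letters] for X in letters]).flat
)

df = pd.DataFrame({"Name": ["A", "B", "C"], "Age": [20, 19, 23]})
# first column, row 1, through the df's last column and last row (+1 header row)
range_for_df = (
    col_names_spreadsheet[0] + "1" + ":"
    + col_names_spreadsheet[df.shape[1] - 1] + str(df.shape[0] + 1)
)
print(range_for_df)  # A1:B4
```

This covers up to 702 columns (Z then ZZ), which is more than Sheets' default width.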
!pip install --upgrade gspread
Upgrade the gspread library with the command above; you will then be able to call the update method.
If you are getting any kind of AttributeError (assuming you have used the correct syntax and correct attributes for gspread), it is because you are using gspread's old version 3.0.1; if you haven't used gspread in Google Colab before, this is the version that comes pre-installed. Just do
!pip install --upgrade gspread
At the time of writing, this command upgrades gspread to version 3.7.1.
Happy Coding!

Why am I unable to use na_values in Pandas?

I don't know why na_values is not changing the values containing "$-" to NaN. I have entered the $- in the file manually and there are no spaces.
import pandas as pd
df=pd.read_csv('discounted_products.csv',na_values = ['$-'])
df.head()
Please help here.
It could be because pandas works with regex by default on string methods (although it is not mentioned in the specific read_csv documentation). Try
na_values = [r'$-']
Update
It worked fine for me
from io import StringIO
df = pd.read_csv(StringIO(
'''a,b
test,$-'''), na_values=[r'$-'])
print(df)
a b
0 test NaN

How to fix "module 'pandas' has no attribute 'Int64Dtype' " error in python?

I am trying to execute the following code, but I am getting an error.
import pandas as pd
import numpy as np
pd.Series([1, 2, np.nan, 4], dtype=pd.Int64Dtype())
I expect the output as:
Out[14]:
0 1
1 2
2 NaN
3 4
dtype: Int64
It might be due to a faulty installation or an older pandas version; I tried the same code and it works for me.
Try using np.int64 instead (importing numpy as np),
or upgrade pandas by typing pip install --upgrade pandas in your terminal or command prompt.
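A quick way to confirm the upgrade worked, as a minimal sketch: on a recent pandas, the nullable integer dtype (introduced in pandas 0.24) can also be requested via the string alias "Int64" (note the capital I), which keeps the missing value while the other entries stay integers:

```python
import numpy as np
import pandas as pd

# If pd.Int64Dtype is missing, the installed pandas predates 0.24.
# After upgrading, the same dtype is available via the "Int64" alias:
s = pd.Series([1, 2, np.nan, 4], dtype="Int64")
print(s.dtype)  # Int64
print(s.isna().sum())  # 1 -- the NaN is preserved, not coerced to float
```

Note that np.int64, by contrast, cannot represent NaN, so the suggestion to use it only applies when the data has no missing values.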

(Python3.6 using cx_Freeze) Exe does not run pandas, numpy application

I wrote a .py script called Expiration_Report.py using the following libraries: pandas, numpy. This code runs perfectly fine when executed in Spyder(python 3.6).
(Using Anaconda for everything)
I then created another .py file called 'setup.py' with the following code in order to convert Expiration_Report.py to Expiration_Report.exe:
import sys
from cx_Freeze import setup, Executable
# Dependencies are automatically detected, but it might need fine tuning.
build_exe_options = {"packages": ["os"],
                     "excludes": ["tkinter"]}
# GUI applications require a different base on Windows (the default is for a
# console application).
base = None
if sys.platform == "win32":
    base = "console"
setup(name="my prog",
      version="1.0",
      description="My application!",
      options={"build_exe": build_exe_options},
      executables=[Executable("Expiration_Report.py", base=base)])
Then in the command prompt I write:
python setup.py build
It builds without any errors. And the build folder is available with the .exe file as well. However, when I run the .exe file from the build folder: nothing happens.
Here is the code from the Expiration_Report.py script:
import pandas as pd
import numpy as np
df = pd.read_excel('C:/Users/Salman/Desktop/WIP Board - 007.xlsx', index_col=None, na_values=['NA'])
df.columns = df.iloc[12]
df.columns
df.shape
df = df.dropna(axis=1, how = 'all')
df
df.columns
df1 = df.copy()
df1 = df1.iloc[13:]
df1
df1 = df1.dropna(axis=1, how = 'all')
df1.shape
from datetime import datetime
print(str(datetime.now()))
df2 = df1.copy()
df2["Now_Time"] = pd.Series([datetime.now()] * (13+len(df1)))
df2["Now_Time"]
df2
df2.fillna(value='NaN')
df2 = df2.dropna(how='any')
df2.shape
df3 = df2.copy()
df3 = df3[df3.Size>0]
df3['Lot Expiration Date'] = pd.to_datetime(df3['Lot Expiration Date'])
df3['Days_Countdown'] = df3[['Lot Expiration Date']].sub(df3['Now_Time'], axis = 0 )
df3.dtypes
df3['Hours_Countdown'] = df3['Days_Countdown'] / np.timedelta64(1, 'h')
df3 = df3.sort_values('Hours_Countdown')
df_expiration = df3[df3.Hours_Countdown<12]
df_expiration['Hours_Countdown'].astype(int)
df_expiration
df_expiration.to_excel('C:/Users/Salman/Desktop/WIP Board - 000.xlsx', sheet_name = 'Sheet1')
The method for creating an exe file with cx_Freeze is correct, because I converted a simple HelloWorld.py script to an exe and it worked fine. The build is just not importing the pandas library, and the exe exits immediately.
Maybe you need to add pandas and numpy to the packages list. cx_freeze can be a bit dodgy when it comes to finding all the necessary packages.
build_exe_options = {"packages": ["os", "numpy", "pandas"],
"excludes": ["tkinter"]}
It seems that this (including the packages in the setup.py file) doesn't work for cx_Freeze 5 and 6 (as I understand it, the latest versions).
I had the same problem no matter which advice I followed here, including adding the packages. It is numpy that seems to cause the trouble.
You can test this by putting import numpy in your very simple testscript, and see it crash when you freeze it.
The following worked for me on Python 3.4, but I doubt that it works under Python 3.6:
I uninstalled cx_Freeze and reinstalled cx_Freeze 4.3.4 via pip install cx_Freeze==4.3.4, and then it worked.
pd.read_csv works but pd.read_excel does not once the application is cx_Freeze'd.
Pandas' read_excel requires a package called xlrd. If you don't have it, install it and pd.read_excel will work afterwards.
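Combining this with the packages advice above, a sketch of the corresponding build_exe_options (assuming the same setup.py structure as the question; not verified against a specific cx_Freeze version):

```python
# Sketch only: bundle pandas' Excel dependency (xlrd) explicitly
# alongside numpy and pandas so the frozen exe can find it.
build_exe_options = {
    "packages": ["os", "numpy", "pandas", "xlrd"],
    "excludes": ["tkinter"],
}
```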

How do I write to excel using pandas?

I have tried two methods and I keep getting an error.
My Current Code:
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
writer = pd.ExcelWriter('/Users/anonymous/Desktop/Version4.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='Ex1')
writer.save()
I have the following error:
.... import xlsxwriter ModuleNotFoundError: No module named 'xlsxwriter'
I also tried not having an engine and then it had the same error except the module was openpyxl. I checked to make sure in my terminal and both of them are installed.
Open a command prompt (go to Start and search for cmd), then install the module with pip.
I was having this error too; I solved it by doing this, from another answer:
pip install xlsxwriter
Have you tried the simple
pd.read_excel()
df.to_excel()
(note that to_excel is a DataFrame method, not a top-level pandas function)?
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_excel.html
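Since the asker says both modules look installed yet the import fails, the usual culprit is a second Python environment. A small sketch to check which engines the running interpreter can actually see:

```python
import importlib.util

# Check engine availability in THIS interpreter -- a mismatch usually
# means the package was pip-installed into a different environment.
available = {
    engine: importlib.util.find_spec(engine) is not None
    for engine in ("xlsxwriter", "openpyxl")
}
print(available)
```

If an engine shows as missing here even though pip reported success, running python -m pip install xlsxwriter ties the install to the same interpreter that executes the script.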
