Convert '0000-00-00' to 'flag' in python - excel

I want to convert date values into strings after checking whether they fall in a specific range. I first convert every value to float so that it can be turned into a date, but when I apply this to the date column I get:
a1 = float(a1)
ValueError: could not convert string to float: '0000-00-00'
My whole code is:
import xlrd
import os.path
from datetime import datetime

date_array = []
wb = xlrd.open_workbook(os.path.join(r'E:\Files', 'SummaryLease.xlsx'))
sh = wb.sheet_by_index(0)
for i in range(1, sh.nrows):
    a1 = sh.cell_value(rowx=i, colx=80)
    if a1 is '0000-00-00':
        date_array.append('flag')
    else:
        a1 = float(a1)
        a1_as_datetime = datetime(*xlrd.xldate_as_tuple(a1, wb.datemode))
        date_array.append(a1_as_datetime.date())
print(date_array)
How should I solve this?

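Don't compare strings with the `is` operator; use `==`. `is` tests whether two names refer to the same object, not whether the strings have equal contents.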
if a1 == '0000-00-00':
    date_array.append('flag')
else:
    a1 = float(a1)
You can read more about the difference here:
Is there a difference between `==` and `is` in Python?
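As a rough illustration of the fix, the check and the conversion can be wrapped in one helper. This is only a sketch using the standard library: the name `to_date_or_flag` is hypothetical, and the date arithmetic assumes the workbook uses Excel's default 1900 date system (datemode 0) and serial numbers from March 1900 onward. With xlrd available, the `xlrd.xldate_as_tuple(a1, wb.datemode)` call from the question does the same job.

```python
from datetime import date, timedelta

# Day zero for Excel's 1900 date system (assumption: datemode 0);
# the offset is valid for serials from 61 (1900-03-01) onward.
EXCEL_EPOCH = date(1899, 12, 30)

def to_date_or_flag(value, placeholder='0000-00-00'):
    """Return 'flag' for the placeholder string, else the date for an Excel serial."""
    # == compares contents; `is` compares object identity and can be
    # False for two strings that look the same.
    if value == placeholder:
        return 'flag'
    return EXCEL_EPOCH + timedelta(days=int(float(value)))

# Sample cell values standing in for sh.cell_value(...)
cells = ['0000-00-00', 43831.0, '43832']
date_array = [to_date_or_flag(v) for v in cells]
print(date_array)
```

The placeholder never reaches float(), so the ValueError from the question cannot occur for that value.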

Related

using Python to retrieve formatted strings from excel cell

I'm trying to pull a string from an Excel cell that will retain its formatting when executed in Python. I'm only a week into learning this (and this is my first post on Stack Overflow), so please forgive any errors of convention in my code or post.
The variable 'name' is global and is defined through input earlier in the program. Everything works fine when the cell contents are defined in the program instead (e.g. question = f"Hello {name}" returns exactly what I expect, with the variable's value swapped in for {name}).
I am pulling the correct workbook, sheet and cell (1,1), and the cell's contents are: Hello {name}
I've also tried: f"Hello {name}"
Input:
import openpyxl
from gtts import gTTS
import os
def speak(question):
    language = 'en'
    # Speak the text passed in as `question`
    myobj = gTTS(text=question, lang=language, slow=False)
    myobj.save("q.mp3")
    os.system("q.mp3")

path = "wb1.xlsx"
wb_obj = openpyxl.load_workbook(path)
sheet_obj = wb_obj.active
question = f"{sheet_obj.cell(row = 1, column = 2).value}"
speak(question)
Output:
Hello {name}
I've tried the above format of question = f"(...)" as well as without the formatting. I've also tried leaving the sheet_obj.cell(row = 1, column = 2).value as is without formatting the string. Nothing has worked for me yet, any insight would be greatly appreciated. This community has been an amazing resource so far! Thanks in advance!
To use a dynamically created format string, use the eval function.
This example may help. It creates an Excel file with a format string as the cell value, then retrieves the format string and builds a final output string using the eval function.
import openpyxl as px
# create workbook
wb = px.Workbook()
ws = wb.active
ws.cell(1,1).value='hello {xxx}' # format string
wb.save("extest.xlsx")
# retrieve workbook
wb = px.load_workbook('extest.xlsx')
ws = wb.worksheets[0]
v = ws.cell(1,1).value # hello {xxx}
xxx = 'python'
print(v) # hello {xxx}
print(eval('f"' + v + '"')) # hello python
print(eval('f"' + ws.cell(1,1).value + '"')) # hello python
print(eval(f'f"{ws.cell(1,1).value}"')) # hello python
Output
hello {xxx}
hello python
hello python
hello python
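Because eval runs whatever expression ends up in the cell, it may be worth considering str.format when the workbook comes from someone else. A minimal sketch, assuming the cell text uses plain `{xxx}`-style placeholders; the `template` string here stands in for the value read from the sheet:

```python
# Stand-in for ws.cell(1, 1).value read from the workbook
template = 'hello {xxx}'

# Fill the placeholder by keyword...
greeting = template.format(xxx='python')

# ...or from a dict of known values, so nothing from the cell is evaluated as code
values = {'xxx': 'python'}
greeting_from_dict = template.format_map(values)

print(greeting)
print(greeting_from_dict)
```

Unlike an f-string, str.format only looks up names, attributes and indexes, so it cannot evaluate arbitrary expressions placed in the cell.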

Pandas read excel and skip cells with strikethrough

I have to process some xlsx files received from an external source. Is there a more straightforward way to load an xlsx file in pandas while also skipping rows with strikethrough?
Currently I have to do something like this:
import pandas as pd, openpyxl
working_file = r"something.xlsx"
working_wb = openpyxl.load_workbook(working_file, data_only=True)
working_sheet = working_wb.active
empty = []
for row in working_sheet.iter_rows("B", row_offset=3):
    for cell in row:
        if cell.font.strike is True:
            p_id = working_sheet.cell(row=cell.row, column=37).value
            empty.append(p_id)
df = pd.read_excel(working_file, skiprows=3)
df = df[~df["ID"].isin(empty)]
...
Which works but only by going through every excel sheet twice.
Ended up subclassing pd.ExcelFile and _OpenpyxlReader. It was easier than I thought :)
import pandas as pd
from pandas.io.excel._openpyxl import _OpenpyxlReader
from pandas._typing import Scalar
from typing import List
from pandas.io.excel._odfreader import _ODFReader
from pandas.io.excel._xlrd import _XlrdReader

class CustomReader(_OpenpyxlReader):
    def get_sheet_data(self, sheet, convert_float: bool) -> List[List[Scalar]]:
        data = []
        for row in sheet.rows:
            first = row[1]  # I need the strikethrough check on this cell only
            if first.value is not None and first.font.strike:
                continue
            else:
                data.append([self._convert_cell(cell, convert_float) for cell in row])
        return data

class CustomExcelFile(pd.ExcelFile):
    _engines = {"xlrd": _XlrdReader, "openpyxl": CustomReader, "odf": _ODFReader}
With the custom classes set, now just pass the files like a normal ExcelFile, specify the engine to openpyxl and voila! Rows with strikethrough cells are gone.
excel = CustomExcelFile(r"excel_file_name.xlsx", engine="openpyxl")
df = excel.parse()
print (df)
In this case I would not use Pandas. Just use openpyxl, work from the end of the worksheet and delete rows accordingly. Working backwards from the end of the worksheet means you don't suffer with side-effects when deleting rows.
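A sketch of that backwards pass might look like the following. The helper name `delete_struck_rows` and its parameters are hypothetical; it assumes an openpyxl worksheet, using `max_row`, `cell()`, `font.strike` and `delete_rows()`, and takes the worksheet as an argument so the logic can be read on its own.

```python
def delete_struck_rows(sheet, check_column=2, first_row=1):
    """Delete every row whose cell in check_column is struck through.

    `sheet` is expected to behave like an openpyxl worksheet.
    Returns the number of rows removed.
    """
    deleted = 0
    # Walk from the last row up, so deleting a row never shifts
    # the rows that still have to be visited.
    for row_idx in range(sheet.max_row, first_row - 1, -1):
        cell = sheet.cell(row=row_idx, column=check_column)
        if cell.value is not None and cell.font.strike:
            sheet.delete_rows(row_idx)
            deleted += 1
    return deleted

# Usage (assumes openpyxl is installed and the file from the question exists):
# import openpyxl
# wb = openpyxl.load_workbook("something.xlsx")
# delete_struck_rows(wb.active, check_column=2, first_row=4)
# wb.save("something_clean.xlsx")
```

The cleaned workbook can then be loaded with pd.read_excel if a DataFrame is still needed.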

From CSV list to XLSX. Numbers recognise as text not as numbers

I am working with a CSV data file.
From this file I extracted some specific data. The data end up in a list that contains words as strings but also numbers (saved as strings, sigh!).
Like this:
data_of_interest = ["string1", "string2", "242", "765", "string3", ...]
I create a new XLSX file (it has to be in this format) into which the data are written.
The script does the work, but in the new XLSX file the numbers (float and int) are written as text.
I could manually convert their format in Excel but it would be time consuming.
Is there a way to do it automatically when writing the new XLSX file?
Here is the extract of the code I used:
## import library and create excel file and the working sheet
import xlsxwriter
workbook = xlsxwriter.Workbook("newfile.xlsx")
sheet = workbook.add_worksheet('Sheet 1')
## take the data from the list (data_of_interest) from csv file
## paste them inside the excel file, in rows and columns
column = 0
row = 0
for value in data_of_interest:
    if type(value) is float:
        sheet.write_number(row, column, value)
    elif type(value) is int:
        sheet.write_number(row, column, value)
    else:
        sheet.write(row, column, value)
    column += 1
row += 1
column = 0
workbook.close()
Is the problem related to the fact that the numbers are already str type in the original list, so the code cannot recognise that they are float or int (and so it doesn't write them as numbers)?
Thank you for your help!
Try int(value) or float(value) before the if block.
Everything you read from the file is a string, so you have to try converting each value to float or int first.
Example:
for value in data_of_interest:
    try:
        # Accept a decimal comma; strings that are not numbers raise ValueError and stay unchanged
        value = float(value.replace(',', '.'))
    except ValueError:
        pass
    if type(value) is float:
        sheet.write_number(row, column, value)
    else:
        sheet.write(row, column, value)
    column += 1
row += 1
column = 0
workbook.close()
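If whole numbers should stay int rather than become float, the conversion could be split into its own helper. This is a sketch under that assumption; `to_number` is a hypothetical name, not part of XlsxWriter, and the sample list mirrors the one in the question.

```python
def to_number(text):
    """Return an int or float parsed from text, or the original text if it is not numeric."""
    candidate = text.strip()
    try:
        return int(candidate)
    except ValueError:
        pass
    try:
        return float(candidate)
    except ValueError:
        return text

data_of_interest = ["string1", "string2", "242", "7.65", "string3"]
converted = [to_number(v) for v in data_of_interest]
print(converted)
```

The converted values can then go through the type checks in the question's loop unchanged. Note that float() also accepts strings such as "nan" and "inf", so those may need filtering if they can occur as ordinary words in the data.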
The best way to do this with XlsxWriter is to use the strings_to_numbers constructor option:
import xlsxwriter
workbook = xlsxwriter.Workbook("newfile.xlsx", {'strings_to_numbers': True})
sheet = workbook.add_worksheet('Sheet 1')
data_of_interest = ["string1", "string2", "242", "765", "string3"]
column = 0
row = 0
for value in data_of_interest:
    sheet.write(row, column, value)
    column += 1
workbook.close()
Output: (note that there aren't any warnings about numbers stored as strings):

python3 formatting SQL response from rows to string

I'm trying to print values from a database and I'm getting this output:
[('CPDK0NHYX9JUSZUYASRVFNOMKH',), ('CPDK0KUEQULOAYXHSGUEZQGNFK',), ('CPDK0MOBWIG0T5Z76BUVXU5Y5N',), ('CPDK0FZE3LDHXEJRREMR0QZ0MH',)]
but I would like to have this format:
'CPDK0NHYX9JUSZUYASRVFNOMKH'|'CPDK0KUEQULOAYXHSGUEZQGNFK'|'CPDK0MOBWIG0T5Z76BUVXU5Y5N'|'CPDK0FZE3LDHXEJRREMR0QZ0MH'
Existing code (Python 3):
from coinpayments import CoinPaymentsAPI
from datetime import datetime
from lib.connect import *
import argparse
import json
sql = 'SELECT txn_id FROM coinpayment_transactions WHERE status = 0 '
mycursor.execute(sql)
result = mycursor.fetchall()
mydb.close()
print(result)
What you are getting is a list of tuples, stored in the result object. If you want the output formatted the way you describe, do this:
# Paste this instead of print(result)
output = ''
for i in result:
    if output != '':
        output = output + '|' + "'" + i[0] + "'"
    else:
        output = output + "'" + i[0] + "'"
print(output)
A better way to do this kind of thing is to use the string join method together with an f-string (or format()).
Here is your solution:
output = '|'.join([f"'{row[0]}'" for row in result])
print(output)

How to keep one column with numbers as string in xlsxwriter while converting csv to xlsx?

I'm using the script from convert-csv-to-xlsx to do the conversion, with a small modification (I added several arguments, including 'strings_to_numbers': True):
import os
import glob
import csv
from xlsxwriter.workbook import Workbook

for csvfile in glob.glob(os.path.join('.', '*.csv')):
    workbook = Workbook(csvfile[:-4] + '.xlsx', {'constant_memory': True, 'strings_to_urls': False, 'strings_to_numbers': True})
    worksheet = workbook.add_worksheet()
    with open(csvfile, 'rt', encoding='utf8') as f:
        reader = csv.reader(f)
        for r, row in enumerate(reader):
            for c, col in enumerate(row):
                worksheet.write(r, c, col)
    print(r)
    workbook.close()
Everything works fine, but since I added the above-mentioned argument all the numbers are written as numbers in my xlsx files. I need to keep one column (the first) as strings, because it contains long numbers which are converted (due to Excel's number length limitation) to something like 1,04713E+18.
Maybe I need to remove the argument and convert the needed columns from strings to numbers at the end. Is that possible?
According to the Workbook documentation, strings_to_numbers converts strings to float. There is no parameter that converts strings to int. Therefore, you need to do the conversion manually.
Instead of calling worksheet.write(r, c, col), call worksheet.write_number(r, c, col), converting col with int(col) first.
Whether a value is a number can be checked with a regexp or any other method.
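One way to put the regexp check together with the "keep the first column as text" requirement is sketched below. The helper `write_csv_row`, its `text_columns` parameter and the `NUMBER_RE` pattern are hypothetical names; the sketch assumes an XlsxWriter worksheet and uses its write_string() and write_number() methods, with write_string() expected to keep the value as text. Here numeric cells are converted with float() rather than int().

```python
import re

# Optional sign, digits with optional decimal part, optional exponent
NUMBER_RE = re.compile(r'[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?')

def write_csv_row(worksheet, r, row, text_columns=(0,)):
    """Write one CSV row, forcing the columns in text_columns to stay text."""
    for c, col in enumerate(row):
        if c not in text_columns and NUMBER_RE.fullmatch(col):
            # Numeric-looking value outside the protected columns
            worksheet.write_number(r, c, float(col))
        else:
            # Long IDs in protected columns and ordinary text stay strings
            worksheet.write_string(r, c, col)

# Usage inside the question's loop (assumption: replaces the inner for loop):
# for r, row in enumerate(reader):
#     write_csv_row(worksheet, r, row)
```

Since every cell is written with an explicit method, the 'strings_to_numbers' option is no longer needed for this to work.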
