Openpyxl Issue: Deleting row not moving merged cells - python-3.x

I am having an issue with openpyxl deleting a row and not moving the merged cells up. For example, in the first picture, I have two merged cells with the values Alex and Bob. When I delete row 2, Alex gets moved up into a single cell, Bob gets deleted, and the merged cells stay in the same position while the rest of the data moves up. When working in Excel outside of Python, the merged cells would simply move up with the rest of the data. It appears that openpyxl moves the data values up but keeps the position of the merged cells the same. What is the workaround for this? Thank you in advance!
Before deleting row 2:
ws.delete_rows(2)
When I delete row 2 the following happens:
How it should look if row 2 were deleted manually in Excel:

I was facing a similar issue when removing columns: merged cells that intersected such a column ended up unchanged.
This works in a more satisfactory manner.
The code below provides a delete_row() function, which can be used to imitate Excel's behavior and move merged cells accordingly:
import openpyxl
from openpyxl.utils import get_column_letter, range_boundaries
from openpyxl.worksheet.cell_range import CellRange

def delete_row(target_row):
    # Assuming that the worksheet from the example is the first worksheet in a file called "in.xlsx"
    workbook = openpyxl.load_workbook('in.xlsx')
    ws = workbook.worksheets[0]
    affected_cells = []  # list for storing merged cells that need to be moved
    row = target_row  # specify the row that we want to delete
    sheet_boundary = [4, 6]  # how far to search for merged cells in the sheet, in the format [max_col, max_row]
    ## Define a range of cells that starts at the deleted row
    # top left corner of the range; will be A2
    tl_corner = "A" + str(row)
    # bottom right corner of the range; will be D6
    br_corner = get_column_letter(sheet_boundary[0]) + str(sheet_boundary[1])
    target_row_range_string = tl_corner + ":" + br_corner
    # express the area from the deleted row down to the sheet boundary as an openpyxl CellRange
    target_range = CellRange(range_string=target_row_range_string)
    # loop over all merged cells in the sheet
    for merged_cell_range in ws.merged_cells.ranges:
        # if the merged cell is fully within target_range, add it to 'affected_cells'
        if merged_cell_range.issubset(target_range):
            print("Cell '" + str(merged_cell_range) + "' is within range of '" + str(target_range) + "'")
            affected_cells.append(merged_cell_range)
    # unmerge all affected cells
    for cell in affected_cells:
        # get a tuple of coordinates, instead of Xlsx notation
        cell_range = range_boundaries(cell.coord)
        print("'" + str(cell) + "' ---> '" + str(cell_range) + "'")
        # unmerge the affected cell
        ws.unmerge_cells(start_column=cell_range[0], start_row=cell_range[1],
                         end_column=cell_range[2], end_row=cell_range[3])
    # perform row deletion as usual
    ws.delete_rows(row)
    # re-merge all affected cells
    for cell in affected_cells:
        # get a tuple of coordinates, instead of Xlsx notation
        cell_range = range_boundaries(cell.coord)
        # merge back the affected cell, while lifting it up by one row
        ws.merge_cells(start_column=cell_range[0], start_row=cell_range[1] - 1,
                       end_column=cell_range[2], end_row=cell_range[3] - 1)
    # save the edited workbook
    workbook.save('out.xlsx')

# call our custom function, specifying the row you want to delete
delete_row(2)
This finds all the merged cells that lie within target_range, a range that starts at the deleted row and extends to the limit defined by sheet_boundary, and first unmerges them.
Only merged cells that are fully within target_range are changed.
If only part of a merged cell is within target_range, no operations are performed on that cell.
Then it deletes the desired row and merges the affected cells back, taking into account that they have moved up one row.
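The bookkeeping described above can be checked without a workbook. The following is only a minimal sketch on plain tuples, not part of the answer's code: the function name shift_merged_ranges and the sample ranges are hypothetical, and each range is assumed to be a (min_col, min_row, max_col, max_row) tuple in the order that range_boundaries() returns.

```python
def shift_merged_ranges(ranges, deleted_row, max_col, max_row):
    """Return the merged-range bounds expected after deleting `deleted_row`.

    Mirrors the rule in the answer: only ranges fully inside the search
    area (deleted row down to the boundary) move up by one row; every
    other range is returned unchanged.
    """
    result = []
    for min_c, min_r, max_c, max_r in ranges:
        fully_inside = (
            min_r >= deleted_row and max_r <= max_row
            and min_c >= 1 and max_c <= max_col
        )
        if fully_inside:
            # lift the whole range up by one row
            result.append((min_c, min_r - 1, max_c, max_r - 1))
        else:
            # partially covered or outside: leave as-is
            result.append((min_c, min_r, max_c, max_r))
    return result


# Hypothetical layout: two merged cells in column A below row 2,
# one merged header in row 1, one range crossing the boundary.
merged = [(1, 3, 1, 4), (1, 5, 1, 6), (1, 1, 2, 1), (1, 6, 1, 7)]
shifted = shift_merged_ranges(merged, deleted_row=2, max_col=4, max_row=6)
print(shifted)
```

Note that a merged range starting on the deleted row itself would also be lifted by this rule, so a caller may want to treat that case separately.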

I found a temporary but unsatisfactory fix.
If I use the following code:
for merged_cell in ws.merged_cells.ranges:
    merged_cell.shift(0, -1)
ws.delete_rows(2)
This fixes my problem and moves the merged cells up. However, the one issue I have with this code is that it moves up ALL merged cells in the file. If I want to move up only the merged cells in column A, I am not sure how to reduce the list of ranges to only those in column A.
For example, the following code doesn't work but highlights what I am trying to accomplish with specificity:
ws['A'].merged_cells.ranges
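One possible way to narrow the list is to filter on each range's column bounds before shifting. The sketch below works on plain named tuples so it can run on its own; Bounds, ranges_in_column and the sample data are hypothetical. With openpyxl, the same test could presumably be written against the min_col, max_col and min_row attributes of each range in ws.merged_cells.ranges, but that is an assumption to verify against your version.

```python
from collections import namedtuple

# Stand-in for a merged range's bounds (hypothetical helper type).
Bounds = namedtuple("Bounds", "min_col min_row max_col max_row")


def ranges_in_column(ranges, column, below_row=0):
    """Keep only ranges that sit entirely in `column` and start below `below_row`."""
    return [
        r for r in ranges
        if r.min_col == column and r.max_col == column and r.min_row > below_row
    ]


merged = [
    Bounds(1, 3, 1, 4),   # A3:A4 -> column A, below row 2
    Bounds(2, 3, 2, 4),   # B3:B4 -> wrong column
    Bounds(1, 1, 1, 1),   # A1    -> above the deleted row
    Bounds(1, 5, 2, 6),   # A5:B6 -> spans two columns
]
to_shift = ranges_in_column(merged, column=1, below_row=2)
print(to_shift)
```

Only the ranges returned by the filter would then be shifted up before the row is deleted.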

Related

Auto fill specific cell range with formulas

I have this formula:
=IF(ROWS($Q$27:Q27)<=$P$25,INDEX(DataTable[[#All],[Time]],$P27),"")
and if I drag it to the right, it should automatically read each column respectively; example:
=IF(ROWS($Q$27:R27)<=$P$25,INDEX(DataTable[[#All],[Name]],$P27),"")
^Notice that the first Q27 is fixed, the second Q27 is variable.
I drag this formula to the right by 15 columns and down by 50 rows. That's 750 formulas in total.
I want to do this in VBA, but written out cell by cell it would take 750 lines of code, one for each row/column combination.
Example: .Range("G17").Formula = "=IF(ROWS($Q$27:R27)<=$P$25,INDEX(DataTable[[#All],[Name]],$P27),"""")"
and if I drag it down, it will automatically pick up what I exactly want, example:
=IF(ROWS($Q$27:Q28)<=$P$25,INDEX(DataTable[[#All],[Time]],$P28),"")
so this formula should be written 750 times in total for the cell range [ A27:N76 ]
Is there a faster, more dynamic approach? And, if possible, can I make the number of rows (beyond 50) depend on a cell value inside the sheet?
Example:
This should do it all in one line:
Range("A27:N76").FormulaR1C1 = "=IF(ROWS(R27C17:RC[16])<=R25C16,INDEX((DataTable[[#All],[Name]],RC16),"""")"
EDIT: It seems more than one line of code is required after all 😊
The code below will do what you want (this time fully tested):
Sub FillFormulas()
    Dim inC%, rgHead As Range
    ''' Assumes the target sheet is Active.
    ''' o If that's not the case, change this With statement to reference the target sheet
    With ActiveSheet
        ''' Set rgHead to the Table's header row
        Set rgHead = .ListObjects("DataTable").Range.Rows(1)
        ''' Add the formulas to the target range, column by column updating the table header on the fly
        With .Range("A27:N76")
            For inC = 1 To .Columns.Count
                .Columns(inC).FormulaR1C1 = _
                    "=IF(ROWS(R27C17:RC[16])<=R25C16,INDEX(DataTable[[#All],[" & rgHead.Cells(inC) & "]],RC16),"""")"
            Next inC
        End With
    End With
End Sub
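To see what the column-by-column fill should produce, the formula text for each cell can be generated outside Excel. This is a rough Python sketch, not a replacement for the VBA above: column_letter and build_formulas are hypothetical helpers, the two headers are the ones named in the question, and the fixed references ($Q$27, $P$25, column P) are taken from the question's formula.

```python
def column_letter(index):
    """Convert a 1-based column index to letters, e.g. 1 -> 'A', 17 -> 'Q'."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def build_formulas(headers, first_row=27, n_rows=50, first_col=1, offset=16):
    """Map each target cell address to its A1-style formula text."""
    formulas = {}
    for c, header in enumerate(headers):
        target_col = column_letter(first_col + c)
        # RC[16] in the R1C1 version: 16 columns right of the target cell
        moving_col = column_letter(first_col + c + offset)
        for row in range(first_row, first_row + n_rows):
            formulas[f"{target_col}{row}"] = (
                f"=IF(ROWS($Q$27:{moving_col}{row})<=$P$25,"
                f'INDEX(DataTable[[#All],[{header}]],$P{row}),"")'
            )
    return formulas


formulas = build_formulas(["Time", "Name"])
print(formulas["A27"])
print(formulas["B27"])
print(formulas["A28"])
```

The three printed formulas correspond to the top-left cell, the cell one column to the right, and the cell one row down in the question.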
so this formula should be written 750 times in total for the cell range [A27:N76]
You don't need to do that. If you specify range.Formula, it will fill the proper formulas all the way across and down. Just give it the formula of the top/left most cell.
So, in your case
Range("A27:N76").Formula = "=IF(ROWS($Q$27:R27)<=$P$25 ... "
EDIT: This response had some obvious errors.
The following line has an obvious error (parts of it were tested separately and then merged into the full formula):
Range(A27:N76).FormulaR1C1 = "=IF(ROWS(R27C17:RC[16])<=R25C16,INDEX((DataTable[[#All],[Name]],$P27),"""")"

With openpyxl, nested list starts writing / appending to cell A2 not A1

I have an excel spreadsheet to calculate data that is added every week for comparison on the first sheet and each additional sheet is the raw data listed as W01 to W52 for each week. I've stripped down the code here to make the issue the only thing not working. In reality, I'm taking multiple CSVs and dumping them into a list, formatting the list and then I am trying to write that list into the first unused worksheet.
It is successfully finding the first empty worksheet, but then when it writes the data from the list into it, it is starting at A2 not A1.
So the worksheet already exists. I have tried deleting all the cells and clearing the contents within excel first, but it always starts at A2 not A1. What am I missing?
Note: If this helps, originally, I had just deleted last year's data from the worksheet, and when I left it like this, it would start adding the list's data on the first row after the deleted info, such as A52 or something. Deleting the contents of the sheet has "fixed" that issue so it is now only starting at A2 but I want to start it at A1.
If I manually add something to the A1 cell, it works, such as:
ws_week.cell(row=1, column=1).value = 'This should be A1'
So I think I can force it to write with a loop for the row number in a loop for the column number, but it seems like the below should be working.
import openpyxl

output_excel = 'KB Videos 2021.xlsx'  # Excel report
# open the Excel report
wb = openpyxl.load_workbook(output_excel)
# find the first blank worksheet
for sheet in wb.sheetnames:
    if sheet == 'Weekly Stats':
        pass  # ignore first worksheet, i.e. calculations
    elif wb[sheet]['A1'].value is None:
        ws_week = wb[sheet]
        break
test = [[1, 2, 3], ['A', 'B', 'C'], [4, 5, 6], ['a', 'b', 'c']]
for x in test:
    ws_week.append(x)
wb.save(output_excel)
print('Populated', ws_week.title)
The 1,2,3 from the "test" list are being put into A2, B2, C2 when I expect them to be in A1, B1, C1.
What did I miss?
Worksheets should never really be considered "blank". When appending to worksheets, openpyxl uses an internal counter that starts at the row below any existing cells. As openpyxl creates cells on demand, wb[sheet]['A1'].value will implicitly create the cell "A1" so that it can check the value. This is why data is subsequently appended from the second row. You can avoid this by deleting the row after the check, but you might also want to make the check a little more robust.
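The effect can be illustrated with a toy model. The class below is not openpyxl's implementation; ToySheet and its attributes are hypothetical and only imitate the two behaviours described above: cells are created when first accessed, and append() writes to the row after the highest row that has any cell.

```python
class ToySheet:
    """Toy stand-in for a worksheet with on-demand cells (not openpyxl code)."""

    def __init__(self):
        self._cells = {}        # (row, column) -> value
        self._current_row = 0   # highest row that has a cell

    def cell(self, row, column):
        key = (row, column)
        if key not in self._cells:
            # merely looking at the cell creates it and advances the counter
            self._cells[key] = None
            self._current_row = max(self._current_row, row)
        return self._cells[key]

    def append(self, values):
        row = self._current_row + 1
        for column, value in enumerate(values, start=1):
            self._cells[(row, column)] = value
        self._current_row = row
        return row


untouched = ToySheet()
first_row_untouched = untouched.append([1, 2, 3])

checked = ToySheet()
checked.cell(1, 1)  # the "is A1 empty?" check creates A1
first_row_checked = checked.append([1, 2, 3])

print(first_row_untouched, first_row_checked)
```

In this model the untouched sheet receives the data in row 1, while the sheet whose A1 was inspected receives it in row 2, which matches the symptom in the question.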
Here's the code I used to make it work:
I replaced:
for x in test:
    ws_week.append(x)
with:
# calculate max rows and columns
maxr = len(test)
maxc = len(test[0])  # I could probably put this into the loop in case a column is different, but not with my data
for this_row in range(1, maxr + 1):
    for this_column in range(1, maxc + 1):
        cellsource = test[this_row - 1][this_column - 1]
        ws_week.cell(row=this_row, column=this_column).value = cellsource

Excel Match Numbers in 2 Columns to a Number in a 3rd

I've run into a bit of a road block. I get a .PDF output from an accounting program and copy/paste the data into excel, then convert text to columns. I am trying to match the GL code with the totals for that specific account. Columns A, B, and C show the state of my data prior to sorting it, and the lines under Intended Output show how I would like the data to output.
I am trying to automate this process, so I can paste data into columns A, B, & C in the raw format and have it automatically spit out the required numbers in the format of the Intended Output. The GL codes remain the same, but the numbers and the number of rows will change. I've color coded them for ease of review.
Thank you very much in advance!
Using a combination of the following functions you can create a list of filtered results. It works on the principle that the Data1 text you want to pull is the only text with a "-" in it, and that the totals you are pulling from Data2 and Data3 are the only numbers in their columns. Any change to that pattern will most likely break the system. Note that the formulas will not copy formatting.
IFERROR
INDEX
AGGREGATE
ROW
ISNUMBER
FIND
Let's assume the output will be placed in a small table with E2 being the upper-left data location.
In E2 use the following formula and copy down as needed:
=IFERROR(INDEX(A:A,AGGREGATE(15,6,ROW($A$1:$A$30)/ISNUMBER(FIND("-",$A$1:$A$30)),ROW(A1))),"")
In F2 use the following formula and copy to the right 1 column and down as needed:
=IFERROR(INDEX(B:B,AGGREGATE(15,6,ROW($A$1:$A$30)/ISNUMBER(B$1:B$30),ROW(A1))),"")
AGGREGATE performs array like calculations. As such, do not use full column references such as A:A in it as it can lead to excess calculations. Be sure to limit it to the range you are looking at.
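The idea behind the AGGREGATE(15,6,...) construct is "give me the k-th smallest row number for which the test is true": rows that fail the test produce a division error, which option 6 tells AGGREGATE to ignore, and function 15 picks the k-th smallest of what remains. A minimal Python sketch of the same selection follows; kth_matching and the sample columns are hypothetical and only stand in for the pasted data.

```python
def kth_matching(values, predicate, k):
    """Return the k-th value (1-based) that satisfies `predicate`, or "" if none.

    Plays the role of IFERROR(INDEX(col, AGGREGATE(15, 6, ROW(rng)/test, k)), "").
    """
    matching_rows = [row for row, value in enumerate(values, start=1) if predicate(value)]
    if k > len(matching_rows):
        return ""  # what IFERROR returns once the matches run out
    return values[matching_rows[k - 1] - 1]


def has_dash(value):
    return isinstance(value, str) and "-" in value


def is_number(value):
    return isinstance(value, (int, float))


# Hypothetical pasted data: account lines contain "-", totals are the only numbers.
column_a = ["Account", "1000-00 Cash", "Total", "2000-00 Payables", "Total"]
column_b = ["", "", 150.0, "", 75.5]

first_code = kth_matching(column_a, has_dash, 1)
second_total = kth_matching(column_b, is_number, 2)
third_code = kth_matching(column_a, has_dash, 3)
print(first_code, second_total, repr(third_code))
```

Copying the formula down corresponds to increasing k, which is what ROW(A1) does in the worksheet version.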
Try this procedure:
Public Sub bruce_wayne()
    'Assumptions
    '1. Data spreadsheet will ALWAYS have the structure shown in the question
    '2. The key word "Total" (or whatever else it might be) is otherwise NOT found
    '   anywhere else in the 1st data column
    '3. output is written to the same sheet as the data
    '4. As written, invoked when data sheet is the active sheet
    '5. set the 1st 3 constants to the appropriate values
    Const sData2ReadTopLeft = "A1" 'Top left cell of data to process
    Const sData2WriteTopLeft = "J2" 'Top left cell of where to write output
    Const sSearchText = "Total" 'Keyword for summary data
    '*******************
    Const sReplaceText = "Wakanda"
    Dim r2Search As Range
    Dim sAccountCode As String
    Dim rSearchText As Range
    Dim iRowsProcessed As Integer
    Set r2Search = Range(sData2ReadTopLeft).EntireColumn
    sAccountCode = Range(sData2ReadTopLeft).Offset(1, 0).Value
    iRowsProcessed = 0
    Do While Application.WorksheetFunction.CountIf(r2Search, sSearchText) > 0
        Set rSearchText = r2Search.Find(sSearchText)
        Range(sData2WriteTopLeft).Offset(iRowsProcessed, 0) = sAccountCode
        Range(sData2WriteTopLeft).Offset(iRowsProcessed, 1) = rSearchText.Offset(0, 1).Value
        Range(sData2WriteTopLeft).Offset(iRowsProcessed, 2) = rSearchText.Offset(0, 2).Value ' add this if there are more summary columns to return
        'last two lines could be collapsed into a single line; at the expense of readability..
        rSearchText.Value = sReplaceText 'so that next search will find the next instance of the trigger text
        iRowsProcessed = iRowsProcessed + 1
        sAccountCode = rSearchText.Offset(1, 0).Value
    Loop
    r2Search.Replace what:=sReplaceText, Replacement:=sSearchText
End Sub

Copy Block of Data with Blanks to Other Location

I'm reusing this snippet from another thread to copy a whole column WITH the blanks (needed for alignment of other information) to a new location. BUT I see that its copy action will stop at the first blank, and in fact it does. What I need it to do is copy the blanks and everything as a block, then put it under the range as below. I considered filling all the blanks first, but that just sends the fill value all the way to infinity. There will be more blanks than data.
Range(Range("P2"), Range("P2").End(xlDown)).Copy '!!!Stops at first blank!!!
For idx = 1 To 1
    Columns("P:P").Cut
    Cells(Range("D2").End(xlDown).Row + 1, "D").Select
    ActiveSheet.Paste
Next
I'm not seeing why it can't do that. I don't need it to be THIS code if there is some other solution. The task is that I'm changing the layout from one where associated information fills a complete row to a "stacked" layout, where associated data (with some blanks) repeats down the column. So the segments are to be stacked. Cut, copy and paste with the whole columns has been mostly working.
It could copy a range based on the non-empty value of another cell prior to moving it. BUT it needs to land on the first empty cell at the bottom of the new range. I'll repeat this for several columns but can do it separately.
I could bypass the first issue if I had some code that would look at the cells in columns BI through BO and fill them with a value ("0" or "-") IF the value in BH is NOT blank.
Your code and question are a little confusing, but I think this is what you're looking for. This should copy all data from column P, including blanks.
Range(Range("P2"), Range("P" & ActiveSheet.Cells(ActiveSheet.Rows.Count, "P").End(xlUp).Row)).Copy
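The difference between the two approaches is where the search for the end of the data starts. The sketch below is a simplified Python model, not VBA semantics in full: first_block_end imitates walking down from a start cell until the next cell is blank (assuming the start cell and the one below it are non-empty), and last_used_row imitates searching upward from the bottom of the column. Both function names and the sample column are hypothetical.

```python
def is_blank(value):
    return value is None or value == ""


def first_block_end(column, start_row):
    """Walk down from `start_row` (1-based) and stop before the first blank."""
    row = start_row
    # column[row] is the cell one below the current 1-based row
    while row < len(column) and not is_blank(column[row]):
        row += 1
    return row


def last_used_row(column):
    """Search upward from the bottom for the last non-blank cell (1-based), 0 if none."""
    for row in range(len(column), 0, -1):
        if not is_blank(column[row - 1]):
            return row
    return 0


# Hypothetical column P: a header, then data with gaps.
column_p = ["header", "a", "b", "", "c", "", "d"]
end_from_top = first_block_end(column_p, start_row=2)
end_from_bottom = last_used_row(column_p)
print(end_from_top, end_from_bottom)
```

Copying from row 2 to end_from_bottom keeps the blank cells inside the block, which is what the stacked layout needs.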

ListRows.Count Returns Inconsistent Results

I have a strange problem with two of my Excel Tables residing on two different worksheets in my project. I am using VSTO but VBA shows the same result: an empty table's row count has 0 rows in one case and 1 row (I presume the insert row) in another case.
The Setup
Two worksheets: Sheet1, Sheet2
Two corresponding named Excel Tables: Sheet1Table, Sheet2Table
Both tables are empty, i.e. they have one empty row, which is the insert row that cannot be deleted.
I run the following code to determine the number of data rows (i.e. excluding the header row):
Microsoft.Office.Tools.Excel.ListObject sheet1Table = Globals.Sheet1.Sheet1Table;
int numberOfListRows1 = sheet1Table.ListRows.Count;
and
Microsoft.Office.Tools.Excel.ListObject sheet2Table = Globals.Sheet2.Sheet2Table;
int numberOfListRows2 = sheet2Table.ListRows.Count;
The result is that numberOfListRows1 is 1 and numberOfListRows2 is 0 although the result (whichever is correct) should be the same. I compared the table and worksheet properties, as well as the source files in Visual Studio, and I could not spot any differences. Any idea what I should be looking for (and which result is the correct one)?
In VBA I used these subs to test your case:
Sub Sheet1TableRowsCount()
    Dim numberOfListRows1 As Integer
    Dim sheet1Table As ListObject
    Set sheet1Table = Sheet1.ListObjects("Sheet1Table")
    numberOfListRows1 = sheet1Table.ListRows.Count
    Set sheet1Table = Nothing
End Sub
Sub Sheet2TableRowsCount()
    Dim numberOfListRows2 As Integer
    Dim sheet2Table As ListObject
    Set sheet2Table = Sheet2.ListObjects("Sheet2Table")
    numberOfListRows2 = sheet2Table.ListRows.Count
    Set sheet2Table = Nothing
End Sub
These are the tables:
When these tables are created (incorrectly) by selecting the headers and one empty row and then formatting the selection as a table, they have one empty row. This row has no content, COUNTA across the row is 0, and it appears like the insert row that cannot be deleted. It is however a valid data row, so ListRows.Count will get the value of 1.
Similarly, if you fill in some data and manually delete those entered values (using Delete) so that your tables would look like at the beginning, the results will still be 1.
If you delete rows manually or programmatically (using something like Sheet1.Range("Sheet1Table").Rows("1").Delete), the ListRows.Count will then yield the value of 0.
To determine the actual count of data rows, check ListRows.Count and, if the result is 1, delete that row after confirming that it does not contain actual values. The whole situation can be avoided if an empty table is created by selecting only the header row before clicking Format as Table. The insert row is then created automatically and is not counted as a data row (ListRows.Count is 0 in this case).

Resources