I usually do my data cleaning in Python, but the issue with Python (pandas) is that when you read a table from Excel and write it back, none of the Excel formatting is retained.
In this case I was given a large table where a lot of the cells are color-coded and/or commented. I need to retain all the coloring, comments, font styles, etc. I don't know how else to do that but to work in Excel.
The issue:
In one sheet I have a large table (400 rows x 45 columns). It is structured like this:
Sheet 1:
|ID|C|D|E|F|
|:--|:--|:--|:--|:--|
|EDMU025|1|2|3|4|
|EDMU026|5|6|7|8|
|EDMU027|9|2|3|4|
|EDMU028|5|6|7|8|
In another sheet I have a series of small tables which look like this:
Sheet 2:
|ID|Date|C|D|E|F|
|:--|:--|:--|:--|:--|:--|
|EDMU025|9/14/22|100|210|300|450|
|EDMU025|9/14/22|100|200|340|400|
|||||||
|Value to be replaced||100|200|300|400|
|||||||
|EDMU028|9/14/22|700|810|900|550|
|EDMU028|9/14/22|700|800|940|500|
|||||||
|Value to be replaced||700|800|900|500|
For each ID in Sheet 2, I need to find that ID in Sheet 1 and replace the values in Sheet 1 columns C-F with the values from the "Value to be replaced" row.
The output would be:
|ID|C|D|E|F|
|:--|:--|:--|:--|:--|
|EDMU025|100|200|300|400|
|EDMU026|5|6|7|8|
|EDMU027|9|2|3|4|
|EDMU028|700|800|900|500|
What is the most efficient way to do that for the entire table, while keeping the original values that don't need to be replaced intact?
Try a nested XLOOKUP(), for example:
=XLOOKUP(A2,Sheet2!$B$1:$B$15,Sheet2!$D$1:$G$15,XLOOKUP(A2,$A$2:$A$15,$B$2:$E$15,"",0),0,-1)
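If you would rather stay in Python, openpyxl may be an option: unlike a pandas round trip, it edits the existing workbook, so assigning to `cell.value` should leave the cell's fill, font and comment alone (the formatting of the comment box itself can be lost on save). Below is a minimal sketch, assuming the layout shown in the question: the ID is in the first column, each block in Sheet 2 ends with a "Value to be replaced" row, and that row belongs to the most recent ID above it. The function names, the default sheet names and the `path` argument are hypothetical.

```python
def collect_replacements(sheet2_rows, marker="Value to be replaced"):
    """Map each ID to the values on the marker row that closes its block.

    sheet2_rows: iterable of row tuples laid out as (ID, Date, C, D, E, F).
    """
    replacements = {}
    current_id = None
    for row in sheet2_rows:
        first = row[0]
        if first == marker:
            if current_id is not None:
                replacements[current_id] = list(row[2:])  # skip ID and Date
        elif first not in (None, "", "ID"):
            current_id = first  # a data row: remember which block we are in
    return replacements


def apply_replacements(sheet1_rows, replacements):
    """Return Sheet 1 rows with the value columns swapped in for matching IDs."""
    return [
        [row[0], *replacements[row[0]]] if row[0] in replacements else list(row)
        for row in sheet1_rows
    ]


def update_workbook(path, sheet1="Sheet1", sheet2="Sheet2"):
    """In-place variant for a real file; needs openpyxl installed."""
    from openpyxl import load_workbook

    wb = load_workbook(path)
    ws1, ws2 = wb[sheet1], wb[sheet2]
    replacements = collect_replacements(ws2.iter_rows(values_only=True))
    for row in ws1.iter_rows(min_row=2):  # skip the header row
        new_values = replacements.get(row[0].value)
        if new_values is None:
            continue  # this ID has no replacement; leave the row alone
        # Only .value is assigned, so the existing style is not touched.
        for cell, value in zip(row[1:], new_values):
            cell.value = value
    wb.save(path)
```

`collect_replacements` and `apply_replacements` work on plain tuples, so the matching logic can be checked without a workbook; `update_workbook` is the part that touches the file, and it is worth running on a copy first.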
I am trying to combine multiple columns that receive filtered results from another workbook that uses checkboxes. When a checkbox is true, it sends a ticker to another sheet, into one of 6 separate columns. My goal is to automatically gather these tickers into a single column once they arrive in the other workbook.
The workbook which receives the tickers can look like this, with each group in its own column.
(The formatting of this post shows 8 columns, but there are only 6, A:F.)
WTRH PRKR GESI REV XPON
#CALC!
SIMP CNTA
ELMS MNSO
CXDO
I tried a TEXTJOIN separated by commas, but it fails whenever a column doesn't have a value. (In the other spreadsheet, column D holds tickers from a portfolio, column E is insider buying, etc., and each is sent by setting a checkbox to true.)
Either way, my goal is to get the tickers into one column, each in its own row, automatically every time I check a box in the other workbook.
For instance:
WTRH
PRKR
GESI
REV
XPON
SIMP
CNTA
ELMS
MNSO
CXDO
I'm still very new to VBA, and after attempting to use Intersect and a row count I only got more confused. Any help is greatly appreciated! Thank you.
If you are open to using a formula, you could try:
=FILTERXML("<t><s>"&TEXTJOIN("</s><s>",TRUE,IF(IFERROR(A1:H3,0)<>0,IFERROR(A1:H3,0),""))&"</s></t>","//s")
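For anyone post-processing the grid outside Excel, the same idea can be sketched in Python: walk the range row by row, skip blanks and error values, and collect what is left into one column. The function name and the sample grid are made up for illustration, and errors are assumed to arrive as strings starting with `#`, such as `#CALC!`.

```python
def flatten_tickers(grid):
    """Return the non-empty, non-error cells of a 2-D grid in row-major order."""
    tickers = []
    for row in grid:
        for cell in row:
            if cell is None:
                continue  # empty cell
            text = str(cell).strip()
            if not text or text.startswith("#"):
                continue  # blank string or an Excel error such as #CALC!
            tickers.append(text)
    return tickers


# Hypothetical 6-column grid loosely based on the sample in the question.
grid = [
    ["WTRH", "PRKR", "GESI", "REV", "XPON", None],
    ["#CALC!", None, None, None, None, None],
    ["SIMP", "CNTA", None, None, None, None],
    ["ELMS", "MNSO", None, None, None, None],
    ["CXDO", None, None, None, None, None],
]

for ticker in flatten_tickers(grid):
    print(ticker)
```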
I am trying to achieve the following using openpyxl: I need to apply a filter to my sheet based on a condition and then delete all visible rows at once.
Initial State:
After I apply the filter:
I now need to delete these 2 rows which are visible after the filter is applied.
Maybe something like:
ws.auto_filter.ref = "A1:C6"
ws.auto_filter.add_filter_column(3, ["1"])
ws.visiblecells.delete.entirerow
Any help is appreciated.
You can't do this in openpyxl: the filter is applied by Excel based on the definition of the filter you create in openpyxl. In fact, when it filters, Excel hides the rows and adjusts the formatting.
If you want to filter the data in Python then you will need to write your own code. The following example should help.
rows_to_delete = []
# Collect the row numbers whose value in column C equals 1 (row 1 is the header)
for row in ws.iter_rows(min_col=3, max_col=3, min_row=2):
    cell = row[0]
    if cell.value == 1:
        rows_to_delete.append(cell.row)

for row in reversed(rows_to_delete):  # always delete from the bottom of the sheet
    ws.delete_rows(row)
For more complicated filtering, you might want to load the cells into a pandas DataFrame, filter there, and write the results back into the worksheet.
I have one sheet that records the raw data.
Now I want to make another sheet that picks out the validated rows.
My raw table looks like this:
==Sheet A==
1--John--1992--Attend
2--Mary--1990--
3--Jam--1920--Attend
4--Mark--4820--
5--Aaron--4710--Attend
6--Chris--6893--Attend
And I expect another sheet that picks out the "Attend" rows and outputs them like this:
==Sheet B ==
1--John--1992
2--Jam--1920
3--Aaron--4710
4--Chris--6893
So I tried this:
=INDEX('Sheet A'!A1:B6,Match("Attend",'Sheet A'!C2:C6))
But the formula I wrote only returned the first row:
==Sheet B ==
1--John--1992
How can I get the rest of the data?
The FILTER() function is the best fit for this case. Try:
=FILTER('Sheet A'!A1:C6,'Sheet A'!C1:C6="Attend")
I have a large table that contains a lot of data. Most of the time, when I apply a filter to it, I can manipulate and edit the filtered data with no problems. However, sometimes (perhaps every 200th time) when I select the filtered range and paste some text into the selection, it seems to have worked, but when I unfilter the table, the range that was edited is the range as if it had not been filtered at all.
Example:
My data is A1:A10.
The filtered range is the cells A1 and A10.
When I select the filtered range and paste some text, occasionally the whole A1:A10 range is changed.
Has anyone faced this issue?
The consequences are disastrous.
How can I avoid it in the future?
Thanks!
OK, I figured it out.
When the data is filtered and I select cells to apply some changes, Excel defines the row range for the operation as "upper row in selection to bottom row in selection".
The problem is that sometimes the row indexes are not consecutive (a common issue when the data is not logically ordered in the first place), and Excel then treats the whole range between the visible selected cells as part of the range to change as well.
It happened to me only occasionally because my data is more or less ordered.
Example: a small table of numbers
**nums**
1
2
3
4
5
3
6
If I filter the nums table to show only 3s, it will show me this:
**nums**
3
3
When I select these two cells by dragging from one 3 to the other, paste the number 0 and unfilter the table, the result will be:
**nums**
1
2
0
0
0
0
6
This is because the cells in between the visible cells were in the selected range too.
To prevent it, the solution is as Lior suggested:
Find & Select --> Go To Special... --> Visible cells only.
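To make the mechanism concrete, here is a small Python simulation of the two behaviours described above, using the same nums table. It is only a model of the explanation, not of Excel's internals, and both function names are invented for this sketch.

```python
def paste_into_span(values, visible_rows, new_value):
    """Model the faulty case: every row from the first to the last
    visible row is overwritten, hidden rows included."""
    result = list(values)
    for i in range(min(visible_rows), max(visible_rows) + 1):
        result[i] = new_value
    return result


def paste_into_visible_only(values, visible_rows, new_value):
    """Model 'Visible cells only': just the filtered rows are overwritten."""
    result = list(values)
    for i in visible_rows:
        result[i] = new_value
    return result


nums = [1, 2, 3, 4, 5, 3, 6]
visible = [i for i, n in enumerate(nums) if n == 3]  # rows left after filtering for 3

print(paste_into_span(nums, visible, 0))          # [1, 2, 0, 0, 0, 0, 6]
print(paste_into_visible_only(nums, visible, 0))  # [1, 2, 0, 4, 5, 0, 6]
```

The first result matches the damaged table shown above; the second is what restricting the selection to visible cells is meant to give you.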
After you select the column you want to edit, use "Select visible cells only".
The link below shows an example for copying; you can use the same approach for pasting.
http://office.microsoft.com/en-001/excel-help/copy-visible-cells-only-HA010244897.aspx
I'm using GemBox to read Excel files. I'm copying the fields to a DataTable, so I have to add the columns to the DataTable first.
Therefore I'm using this code:
For i As Integer = 0 To objWorksheet.Columns.Count - 1
    objDataTable.Columns.Add(i, GetType(ExcelCell))
Next
But objWorksheet.Columns.Count is 0 even though there is data in 4 columns.
Any ideas?
Cells are internally allocated in rows, not in columns. ExcelColumn objects are created only if they have a non-standard width or style, or if they are accessed directly. So, while ExcelRowCollection.Count shows the number of rows occupied with data, ExcelColumnCollection.Count does not tell you which column is the last one occupied with data!
If you want to read all data in a sheet, use the ExcelRow.AllocatedCells property.
If you want to find the last column occupied with data, use the CalculateMaxUsedColumns method.
Version 3.5 added the ExcelWorksheet.CreateDataTable(ColumnTypeResolution) method. It automatically generates DataTable columns of the appropriate type from the Excel file's columns and imports the cells with data into DataTable rows.