Removing Blank Columns from an excel file - python-3.x

I have written a program that downloads data from an API and parses the JSON response into an Excel (xlsx) file with multiple worksheets.
However, depending on which attributes the user chooses to save, there can be empty columns in the workbook. Is there any way to delete empty columns from all worksheets in the Excel file?
I can do this with pandas; however, that is not possible in this case, because the pandas and openpyxl libraries break the GUI built on the Python script.
Is there any other way to implement this?
Edit:
Current Output:
+----------+----------+--+----------+----------+
| Column A | Column B | | Column D | Column E |
+----------+----------+--+----------+----------+
| A1 | B1 | | D1 | |
| A2 | | | D2 | |
| A3 | B3 | | | E3 |
+----------+----------+--+----------+----------+
Desired Output:
+----------+----------+----------+----------+
| Column A | Column B | Column D | Column E |
+----------+----------+----------+----------+
| A1 | B1 | D1 | |
| A2 | | D2 | |
| A3 | B3 | | E3 |
+----------+----------+----------+----------+
Current Code for removing blank columns:
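import pandas as pd

excel_file = pd.ExcelFile(path, engine='openpyxl')
# sheet_name=None loads every worksheet into a dict of DataFrames
df = pd.read_excel(excel_file, header=None, sheet_name=None)
writer = pd.ExcelWriter(finalpath, engine='openpyxl')
for key in df:
    # drop fully empty rows first, then fully empty columns
    sheet = df[key].dropna(how="all").dropna(axis=1, how="all")
    sheet.to_excel(writer, sheet_name=key, index=False, header=False)
writer.close()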
I need to replace this code with something that does not reference the pandas library.
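One library-free possibility is to drop the empty columns before the data is written to the workbook at all. The following is only a minimal sketch under that assumption: it supposes each worksheet's cells are available as a list of rows (lists), and `drop_empty_columns` is a hypothetical helper name, not part of any library. It does not read or write xlsx files itself.

```python
def drop_empty_columns(rows):
    """Return a copy of rows without the columns that are blank in every row."""
    def is_blank(value):
        # Treat None and whitespace-only strings as blank (an assumption).
        return value is None or (isinstance(value, str) and value.strip() == "")

    width = max((len(row) for row in rows), default=0)
    # Pad short rows so every row has the same number of cells.
    padded = [list(row) + [None] * (width - len(row)) for row in rows]
    # Keep a column index if at least one row has a non-blank value there.
    keep = [i for i in range(width)
            if any(not is_blank(row[i]) for row in padded)]
    return [[row[i] for i in keep] for row in padded]


# Sample data mirroring the "Current Output" table above.
rows = [
    ["Column A", "Column B", None, "Column D", "Column E"],
    ["A1", "B1", None, "D1", None],
    ["A2", None, None, "D2", None],
    ["A3", "B3", None, None, "E3"],
]
cleaned = drop_empty_columns(rows)
```

Column E survives because its header and E3 are non-blank; only the column that is blank in every row is removed.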

Related

Is there a way to automate populating cells with the names of the column depending on whether the row has values?

I'm tasked with populating cells in Excel with values that are the names of columns. Here is the type of table I'm trying to produce: I need to populate column A with the names of the columns that have a value (an "x") in that specific row.
+---+-----------+----------+----------+-----------------+
| | A | B | C | D |
+---+-----------+----------+----------+-----------------+
| 1 | | name1 | name2 | name3 |
+---+-----------+----------+----------+-----------------+
| 2 |name1;name3| x | | x |
+---+-----------+----------+----------+-----------------+
| 3 |name2;name3| | x | x |
+---+-----------+----------+----------+-----------------+
| 4 |name1;name2| x | x | |
+---+-----------+----------+----------+-----------------+
| 5 | | x | x | x |
+---+-----------+----------+----------+-----------------+
| 6 | | | | |
+---+-----------+----------+----------+-----------------+
For example, A5 should have the value name1;name2;name3, because columns B, C and D have values. A6 should not have anything.
Is there a way to automate this in Excel? Or do I just have to keep doing it manually?
Thank you!
With Office 365 and later:
=TEXTJOIN(";",TRUE,FILTER($B$1:$D$1,B2:D2="x",""))
With Excel 2019:
=TEXTJOIN(";",TRUE,IF(B2:D2="x",$B$1:$D$1,""))
And use Ctrl-Shift-Enter instead of Enter when exiting edit mode.
If one has an older version than those above, one can use a UDF in VBA that mimics TEXTJOIN. The formula used would be the second one above.
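For readers who want to see the logic of those formulas outside Excel, here is a rough Python sketch of the same idea: keep the header names whose cell in the row equals "x" and join them with semicolons. The function name `names_for_row` and the sample data are illustrative assumptions, not anything from the question's workbook.

```python
def names_for_row(headers, row, marker="x"):
    """Join the header names whose cell in this row equals the marker."""
    return ";".join(name for name, cell in zip(headers, row) if cell == marker)


headers = ["name1", "name2", "name3"]   # row 1, columns B to D
rows = [
    ["x", "", "x"],
    ["", "x", "x"],
    ["x", "x", ""],
    ["x", "x", "x"],
    ["", "", ""],
]
labels = [names_for_row(headers, row) for row in rows]
```

As with TEXTJOIN and its ignore-empty argument, a row without any "x" produces an empty string.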

Excel: get the value of third column on the behalf of second column

I am not very familiar with Excel formulas, and I am trying to get the value of the third column based on the second column.
Example:
|---------------------------------------------------------|
| A B C D E |
|-----|----------|----------|--------------|--------------|
|Sr.No| Bar Code | Cat Id | Org BarCode | Org Category |
|---------------------------------------------------------|
| 1 | 89457898 | | 85214784 | 2 |
| 2 | 87414714 | | 63247458 | 3 |
| 3 | 85214784 | | 89457898 | 4 |
| 4 | 63247458 | | ---- | --- |
-----------------------------------------------------------
I just want to fill column C from column E by matching column B against column D.
Can anyone please tell me the formula to do this?
Use VLOOKUP. Enter the following formula into cell C1 and then copy it down the C column:
=VLOOKUP(B1, D$1:E$4, 2, FALSE)
To cover more than 4 rows, update the range in the formula accordingly. If you want to display a placeholder when a value in column B is not found, wrap the call to VLOOKUP as follows:
=IFNA(VLOOKUP(B1, D$1:E$4, 2, FALSE), "Not found")
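The exact-match lookup with a fallback behaves much like a dictionary lookup with a default. A small sketch of that analogy, using the sample values from the question (the variable names are hypothetical):

```python
# Org BarCode -> Org Category, i.e. columns D and E of the example.
org_category = {"85214784": 2, "63247458": 3, "89457898": 4}

# Column B of the example.
bar_codes = ["89457898", "87414714", "85214784", "63247458"]

# dict.get plays the role of IFNA(VLOOKUP(...), "Not found").
cat_ids = [org_category.get(code, "Not found") for code in bar_codes]
```

The second bar code has no match in column D, so it receives the placeholder.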

excel return the value of a cell based on two other values

What I'm trying to do is a little complex but I think it's doable in Excel.
I have two worksheets in a workbook. On sheet one I have this...
| Code1 | Code2 | Code3 | Code4 |
| BA1 | xxxxx | xxxxx | |
| BA2 | xxxxx | xxxxx | |
| BA3 | xxxxx | xxxxx | |
And on the second sheet...
| CodeA | CodeB | CodeC | CodeD |
| BA1 | 1 | date | text |
| BA3 | 1 | date | text |
| BA1 | 2 | date | text |
| BA2 | 1 | date | text |
| BA1 | 3 | date | text |
| BA3 | 2 | date | text |
| BA2 | 2 | date | text |
What I want to do is look up Code1 from sheet one in CodeA on the second sheet, find the highest CodeB for that CodeA, then concatenate CodeC and CodeD from that row and place the result in Code4 on sheet one.
I hope that makes sense, Thanks for any advice.
I think I understand. Does this look correct?
Sorry for the Swedish formulas, but it's an array formula that you enter with CTRL+SHIFT+ENTER.
The formula in English is:
{=MAX(IF(Data=A2,CodeB,-1))}
The named range Data is columns H and I, and CodeB is column I.
If it does not find the value, it returns -1.
Sorry, I noticed just now that I only did half of the job.
Make another named range called Table that spans columns I to K (CodeB -> CodeD).
In column Code3, add this formula:
=VLOOKUP(B2,Table,2,FALSE)
And in Code4:
=VLOOKUP(B2,Table,3,FALSE)
And you should get:
This should find the results you are looking for.
This is an array formula, so you will need to press CTRL+SHIFT+ENTER once you have entered it into the formula bar; this has to be done for every formula you add to the column.
Because it is an array formula, I have only written it to reference rows 1 to 18; you will need to update all references to include your last row.
Columns titled CODE1(to 4) are on the first sheet (Sheet 1)
Columns titled CODEA(to D) are on the Second sheet (Sheet 2)
=CONCATENATE(VLOOKUP(CONCATENATE(A2,MAX(IF(Sheet2!A:A=A2,Sheet2!B:B,-1))), CHOOSE({1,2},Sheet2!A1:A18 & Sheet2!B1:B18, Sheet2!C1:C18 ),2,0)," ",VLOOKUP(CONCATENATE(A2,MAX(IF(Sheet2!A:A=A2,Sheet2!B:B,-1))), CHOOSE({1,2},Sheet2!A1:A18 & Sheet2!B1:B18, Sheet2!D1:D18 ),2,0))
If you do not require a space between the two values, just remove " ", from the middle of the formula.
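The steps the formula performs (filter on CodeA, take the row with the highest CodeB, join CodeC and CodeD with a space) can also be sketched in Python. This is only an illustration: `latest_details` is a hypothetical name, and the date and text strings are placeholders, since the question shows none.

```python
def latest_details(code, records):
    """Return 'CodeC CodeD' from the record with the highest CodeB for code."""
    matches = [rec for rec in records if rec[0] == code]
    if not matches:
        return ""                      # no row on the second sheet for this code
    best = max(matches, key=lambda rec: rec[1])
    return f"{best[2]} {best[3]}"


# (CodeA, CodeB, CodeC, CodeD) rows in the order shown on the second sheet.
records = [
    ("BA1", 1, "date1", "text1"),
    ("BA3", 1, "date2", "text2"),
    ("BA1", 2, "date3", "text3"),
    ("BA2", 1, "date4", "text4"),
    ("BA1", 3, "date5", "text5"),
    ("BA3", 2, "date6", "text6"),
    ("BA2", 2, "date7", "text7"),
]
code4 = {code: latest_details(code, records) for code in ("BA1", "BA2", "BA3")}
```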

Excel - Skip Blank Table Cells Formula

I have a table being created via an XML map so it has a lot of blank cells in each column. It looks like:
| Name | Stat 1 | Stat 2 | Stat 3|
| Test | | | |
| | Four | | |
| | | 5 | |
| | | | 102 |
Basically, each row has only one value, and I am trying to transpose it onto another worksheet where all the values are on one row, like this:
| Name | Stat 1 | Stat 2 | Stat 3 |
| Test | Four | 5 | 102 |
In my searching I found this formula:
=IFERROR(INDEX(Table9[#name],SMALL(IF(Table9[#name]<>"",ROW(Table9[#name])-ROW(Table9[#name])+1),ROWS(A2))),"")
I put that in A1 of another sheet and dragged it down. It does return the populated cells, but it also returns 0 for all the blank cells instead of skipping them until it has a value to return.
There may be a better way to do this, so I am open to other options, but I would prefer to avoid VBA if possible.
Thanks.
Let's say the input sheet is called Sheet1 and the Name header is in cell A1 on both sheets. Then use the following formula for Name on the output sheet:
=INDEX(Sheet1!A:A,(ROW()-2)*4+2)
and for Stat 1:
=INDEX(Sheet1!B:B,(ROW()-2)*4+3)
and so on ... more generally:
=INDEX(
input_column_range,
(ROW()-first_row_in_output)*number_of_columns + first_row_in_input+column_index
)
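The index arithmetic in the general form may be easier to follow in a short Python sketch with zero-based indices. It assumes, as in the question, that every record occupies `n_cols` consecutive input rows with one value per row in "staircase" order; `collapse_staircase` is a hypothetical name.

```python
def collapse_staircase(rows, n_cols):
    """Collapse blocks of n_cols staircase rows into one record each."""
    records = []
    for start in range(0, len(rows) - n_cols + 1, n_cols):
        # Column `col` of this record sits `col` rows below the block start,
        # the same offset the formula adds as column_index.
        records.append([rows[start + col][col] for col in range(n_cols)])
    return records


rows = [
    ["Test", "", "", ""],
    ["", "Four", "", ""],
    ["", "", 5, ""],
    ["", "", "", 102],
]
records = collapse_staircase(rows, n_cols=4)
```

If the input does not follow that strict pattern (for example a record with a missing row), both the formula and this sketch would pick up the wrong cells.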

Creating an SQL select statement out of excel values

I have a sheet with 2 columns. I need to CONCATENATE the two cells within each row to create a large WHERE statement in the SQL based off every row. For example:
Where A1 = 'B1' and A2 = 'B2' etc etc.
What do you suggest is the best method to do this? I need to do this across many sheets. Originally I was going to do something like this: C1=CONCATENATE(A1," = ","'",B1,"'") across every row, then CONCATENATE those outputs as well (C1,D1 etc) but just wondering if there are any other options? Would using VBA be easier?
No need to use any functions.
You may do like this,
assuming your excel sheet is like:
| A | B | C | D |
1 | a1 | b1 | | |
2 | a2 | b2 | | |
3 | a3 | b3 | | |
4 | a4 | b4 | | |
Insert one new column between A and B, write =' in cell B1, and drag that cell down to copy its value to all of your rows. Similarly, write ' and in cell D1 and do the same, so it will look like this:
| A | B | C | D |
1 | a1 | =' | b1 |' and|
2 | a2 | =' | b2 |' and|
3 | a3 | =' | b3 |' and|
4 | a4 | =' | b4 |' and|
Now copy and paste these cells into Notepad++, remove the TAB characters, and replace each newline (\n) with a space.
You should now get a string like:
a1='b1' and a2='b2' and a3='b3' and a4='b4' and
Finally, remove the trailing "and" and place the string in your query.
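If the two columns can be exported or read as pairs, the same string can be assembled in a few lines of Python, which avoids the trailing "and" altogether. This is a sketch under assumptions: `build_where` is a hypothetical helper, the first item of each pair is taken to be a trusted column name, and single quotes in values are doubled as in standard SQL string literals. Where the database driver supports parameterized queries, those are generally the safer choice.

```python
def build_where(pairs):
    """Build a WHERE clause like: WHERE a1 = 'b1' AND a2 = 'b2'."""
    conditions = []
    for column, value in pairs:
        escaped = str(value).replace("'", "''")   # double embedded quotes
        conditions.append(f"{column} = '{escaped}'")
    if not conditions:
        return ""
    return "WHERE " + " AND ".join(conditions)


pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")]
where_clause = build_where(pairs)
```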

Resources