Perl Excel::Writer::XLSX - set column format/merge cells dynamically - excel

I am using Excel::Writer::XLSX to create an Excel file from an array of arrays. Right now I'm trying to create a formatted table from the data (as much as I can, as opposed to just spitting it back into another file).
First off, when I use set_column() to set the background color, that color is applied to the entire column. Is there a way to make it go only as far as the content in the file goes? Unfortunately, the data is dynamic on each run, so the final row of the table is not known in advance.
Second, is there a way to merge cells based on the content inside of them? This has to do with the dynamic problem again: there is an optimal output if all the data I am gathering is online. If that were the case I could easily set a range of what these merged cells should be. But for example, if I have 10 rows of column 2 saying 'A' and then 10 rows of column 2 saying 'B', I would like to merge the A's and B's together. The issue is that it is unknown whether there will always be 10 rows with that value.
Thanks for your input!

First off, when I use set_column() to set the background color, that color is formatted for the entire column. Is there a way to specify to only go as far as the content in the file goes?
No. You will have to add the format to the cells as you write them.
But for example, if I have 10 rows of column 2 saying 'A' and then 10 rows of column 2 saying 'B', I would like to merge the A's and B's together.
Excel::Writer::XLSX can't merge cells automatically based on their content. (In fact, I don't think that is possible in Excel itself without using macros.)
Since both of your issues come from not knowing the size and values of the data beforehand, perhaps you could first read your data into an array of arrays, process it to find the required format dimensions and merge ranges, and then write them out.
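The pre-processing pass suggested above can be sketched as follows. This is a language-neutral illustration written in Python rather than Perl; `merge_ranges` and the sample `data` are hypothetical names, not part of Excel::Writer::XLSX. It scans one column of an array of arrays and reports each run of identical consecutive values, along with the last populated row.

```python
from itertools import groupby

def merge_ranges(rows, col):
    """Return (first_row, last_row, value) for each run of equal values in `col`.

    Row indices are zero-based positions within `rows`.
    """
    ranges = []
    row = 0
    for value, run in groupby(rows, key=lambda r: r[col]):
        length = sum(1 for _ in run)          # size of this run
        ranges.append((row, row + length - 1, value))
        row += length
    return ranges

# Hypothetical data: column 1 holds the values to be merged.
data = [["x", "A"]] * 3 + [["y", "B"]] * 2 + [["z", "C"]]

runs = merge_ranges(data, 1)
last_row = len(data) - 1                      # where per-cell formatting should stop

print(runs)
print(last_row)
```

Each multi-row run could then be passed to merge_range(), and `last_row` tells you how far down to apply the cell format while writing. A run that covers a single row would presumably be written as an ordinary cell instead of a merge.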

Related

dynamically generate formula in excel

Let's say I have two sets of data and want to compare each row and column to make sure that they are identical.
Both sets of data have the same number of columns and rows. Say the first set is columns A-G, and the second set is on the same tab and goes from H-N (in reality I actually have 50+ columns in each set).
Typically, when I don't have a lot of columns, I do something like:
=if(AND(A2 = h2, B2=i2, c2=j2),"Good","Bad")
Once I have a formula, then I press the little square and drag it down across all rows. This is able to quickly show me whether there is data difference in any of the columns or not.
However in this case I have a lot of columns to compare. Is there a quicker way to do this, or generate dynamically somehow?
Thanks.
You could use SUMPRODUCT:
=IF(SUMPRODUCT(--(A2:G2<>H2:N2))=0,"Good","Bad")
=TEXTJOIN(,,A1:C1)=TEXTJOIN(,,h1:j1)
This will return either TRUE or FALSE.
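To make the logic of the two approaches concrete, here is a minimal Python sketch (not Excel; `mismatches` and `verdict` are hypothetical helper names). A row counts as "Good" only when no pair of corresponding cells differs. The last lines show one caveat of comparing concatenated text: cell boundaries are lost, so different rows can join to the same string. Note that Python compares strings case-sensitively, which Excel's = operator does not.

```python
def mismatches(left, right):
    """Count the positions where the two halves of a row differ."""
    return sum(1 for a, b in zip(left, right) if a != b)

def verdict(left, right):
    """'Good' only when every pair of cells matches."""
    return "Good" if mismatches(left, right) == 0 else "Bad"

row_ok = (["a", 1, "c"], ["a", 1, "c"])
row_bad = (["a", 1, "c"], ["a", 2, "c"])

print(verdict(*row_ok))
print(verdict(*row_bad))

# Joining without a delimiter hides where one cell ends and the next begins.
joined_equal = "".join(["ab", "c"]) == "".join(["a", "bc"])
print(joined_equal, mismatches(["ab", "c"], ["a", "bc"]))
```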

Indirect reference to thisRow in Excel table

The situation: I have an automatic procedure for gathering data from different input sheets and presenting it in a pivot-friendly format. It appears others are in need of the same data, though they want it formatted slightly differently (and they are not friends with Excel). I therefore have a version of my table formatted as they want it (with empty columns where my extract does not contain any data).
The table (both) is one line for each department for each year for each cost/income (from now, cost) category. The raw data contains the cost for each year, though some of the users want it to be cost delta from initial year. I want:
One column for raw cost (X). One column for delta cost (Y). One output column (Z) that contains one of those two values, depending on a dropdown selection. The first two columns are situated to the right of the "select with mouse and copy these" columns, so that I don't need to teach the other users how to select non-adjacent columns :P (just letting you know the level of understanding I have to work with here)
Now the naive approach to this would be to have an if-statement in column Z like this:
=IF(selected_Calc="Use raw cost";[#[X]];[#[Y]])
Alternatively nest more ifs (one for "Use difference to 2019", and potentially add more nesting if more ways to show the value should appear in future)
This works. However, it isn't as elegant as I would like, and if I do end up with more ways to calculate this for other people, it will be a lot of nested ifs.
I was therefore considering something like this:
=INDIRECT("[#["&INDEX(mapTab_out;MATCH(selected_Calc;mapTab_in;0))&"]]")
But this gives a #REF!, and tbh I didn't really expect it to work.
The idea, though, is:
Have a range mapTab_in. This has the different selections for the dropdown box.
Have the adjacent range mapTab_out. This has the name of the column (X, Y, ...) that contains the desired calculation.
Have in column Z a formula for selecting which column's (X, Y, ...) value is to be displayed in Z.
The Google results I have found so far all seem interested in using the INDIRECT function from outside the table, and usually want to sum an entire column. I have used this in the past. The "ThisRow" things like using # don't seem to work with INDIRECT though. Any ideas, or have I simply made some beginner error in my formula?
Assuming it's in the same table, you can take advantage of implicit intersection and simply use:
=INDEX(Tablename,,MATCH(selected_Calc,Tablename[#Headers],0))
where selected_Calc is the name of the column you want back. (You could make that the result of a further INDEX/MATCH if you want to use a lookup table for some reason.)
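The lookup idea (dropdown text, then column name, then the value in the current row) can be sketched outside Excel like this. It is a rough Python illustration: the `headers`, `row` and `map_tab` contents are made-up sample data, with `map_tab` standing in for the mapTab_in/mapTab_out pair from the question.

```python
# Hypothetical table layout and one row of it.
headers = ["Dept", "Year", "X", "Y"]
row = ["Sales", 2020, 120.0, 20.0]

# Stand-in for mapTab_in -> mapTab_out: dropdown text to column name.
map_tab = {
    "Use raw cost": "X",
    "Use difference to 2019": "Y",
}

def select_value(row, headers, selected_calc):
    """Pick the value from the column that the dropdown selection maps to."""
    column_name = map_tab[selected_calc]       # the extra lookup step
    return row[headers.index(column_name)]     # like MATCH against the header row

print(select_value(row, headers, "Use raw cost"))
print(select_value(row, headers, "Use difference to 2019"))
```

Adding another calculation then only means adding a column and one more entry in the mapping, not another nested IF.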

how to auto sorted result from vlookup(google sheet)

I have this formula on google sheet
VLOOKUP(upper(J2:J),colorState!A:B,{2}*sign(row(J2:J)),FALSE)
and I want it to sort the result in ascending order automatically when I add new data or edit (like ARRAYFORMULA).
Is there any way or any formula to do that? (I know there's a SORT formula, but I'm not sure how to use it together with this.)
thanks.
I believe I understand what you need :)
As I understand it, you would like to recreate the "main" sheet but have it automatically ordered by the 'color' column when new data is added. I don't have any idea how to do this to the raw data, but you can mirror the raw data by creating another sheet (named 'mainmirror') and entering this formula in cell A1:
=query(main!$A:$R,"select * order by P ASC",-1)
It will take you 2 seconds to reformat with a filter view, and you'll be left with a mirror of 'main' that is always sorted by column P and should remain current as data is added.
Hopefully this is an acceptable workaround. Other option would be to use a script but this is less tedious if it's suitable.
Side note: this method will turn your values into strings to mirror them on the duplicate sheet, so on the 'main' sheet I would recommend changing the cell format of column P to a custom number format, 00, which will ensure there's a leading 0 if there's only one digit. This will cause the strings in the mirror to sort correctly, instead of 1, 11, 12, 2, 3, 4, etc. If you're expecting column P to have 3-digit values, make the number format 000 accordingly.
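The sorting problem described in the side note is easy to reproduce outside Sheets. A minimal Python sketch with made-up sample values: text sorts character by character, so zero-padding to a fixed width is what brings it back in line with numeric order.

```python
values = [1, 11, 12, 2, 3, 4]

# Sorted as plain text: "11" and "12" land before "2".
as_text = sorted(str(v) for v in values)

# Sorted as zero-padded text (like the 00 number format): numeric order again.
padded = sorted(f"{v:02d}" for v in values)

print(as_text)   # ['1', '11', '12', '2', '3', '4']
print(padded)    # ['01', '02', '03', '04', '11', '12']
```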

Using AverageIf function on large amount of cells does not behave as expected. What am I doing wrong?

I have an excel spreadsheet that I'm fooling around with attempting to analyze data. I could go the pivot table route possibly, but I like being stubborn and building my own formulas/tables to analyze the data sometimes.
Anyways my problem is this:
I'm trying to find the average of a column of cells (just one column) that contain 'x' value somewhere in the same row. Using the function AverageIf, I can easily do this 'manually', but I'd like to be able to change the 'x' value by editing another cell.
Currently said code looks like this:
=AVERAGEIF($D$2:$D$527,L$2,$I$2:$I$527)
It works fine. But it limits what 'x' value I can sort the Average through to only ones within the D column. I attempted to highlight all the data I would potentially be analyzing like so:
=AVERAGEIF($A$2:$H$527, L$2, $I$2:$I$527)
That just gives me all sorts of issues. I'm obviously not using the function correctly here. Is there any way to fix this error, or am I stuck analyzing column by column for different 'x' values?
Side note, 'x' values are all string text, not an actual digit. Not sure if that makes a difference. And I am attempting to make a small table with this data (not an actual excel table), hence the $ in the formulas, as I'm using the fill option. There are just a ton of different comparisons I could potentially make and I don't like limiting myself.
Also, when I used fill to move this formula over one column, I have a completely different error from the above formula. In the first case, the output is a . .
In the second case the output is a #DIV/0! error. The only difference between the two formulas is the criteria portion.
The actual code for the second filled formula:
=AVERAGEIF($A$2:$H$527, M$2, $I$2:$I$527)
Though the output is changing based upon the 'x' value. Some work fine after testing a few, most give me issues.
EDIT: Playing around with this for a bit, weird stuff keeps happening. It seems that where in my analysis 'table' I place the formula affects how the average is calculated. In other words, an 'x' value that doesn't work in the first column of my analysis table will work perfectly fine elsewhere in the table, but only if certain other values are selected in the other cells of the table. If that doesn't make sense as I described it, let me know.
EDIT 2: Just going to throw up the data set, DropBox Excel File
The problem is that when using a multi-column range as the first parameter, the "value" range (Col I) in your example gets shifted to the right, according to the location on each row where the "criteria" value is matched.
Simple example:
Using "A" as the criteria gets the average from column F as expected, but using "B" or "C" gets the average from columns G and H respectively.
I think in some cases you're actually averaging numbers in your "analysis" block (which in your posted file is to the right of your data block).
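The shifting can be imitated with a small Python model. This is a simplified sketch, not Excel's actual implementation: `averageif` is a hypothetical function, the grid is made-up data, and it ignores wildcards and case-insensitive matching. A match found `offset` columns into the criteria range pulls its value from `offset` columns to the right of the start of the average range.

```python
def averageif(grid, crit_cols, criterion, avg_first_col):
    """Model AVERAGEIF with a multi-column criteria range.

    grid          -- list of rows (lists of cell values)
    crit_cols     -- range of column indices forming the criteria range
    avg_first_col -- index of the first column of the average range
    """
    picked = []
    for row in grid:
        for offset, col in enumerate(crit_cols):
            if row[col] == criterion:
                target = avg_first_col + offset   # shifted right by the match position
                if target < len(row) and isinstance(row[target], (int, float)):
                    picked.append(row[target])
    if not picked:
        raise ZeroDivisionError("#DIV/0!")         # nothing numeric to average
    return sum(picked) / len(picked)

# Columns 0-2: text data, column 3: the intended value column,
# columns 4-5: whatever sits to the right (e.g. an analysis block).
grid = [
    ["A", "B", "C", 10, 100, None],
    ["A", "B", "C", 20, 200, None],
]

print(averageif(grid, range(0, 3), "A", 3))   # intended column
print(averageif(grid, range(0, 3), "B", 3))   # one column too far right
```

In this model, searching for "C" lands on an empty column and raises the equivalent of #DIV/0!, which is consistent with the mix of wrong averages and errors described in the question.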

spss count all cells that are not empty

I am fairly new to SPSS, and I am testing a file brought in from Excel into SPSS that has roughly 100 columns (variable names in the first row) with some data in each row. What I would like to check is whether any data from any of the cells was dropped. I am trying to compare my "count" function in Excel to whatever is possible in SPSS. Are there other ways to make sure no data was dropped?
It's perhaps easier to count all empty cells and see if that's zero for all rows. Note that in SPSS, empty cells virtually always indicate system missing values. Now if your first variable is x1 and your last variable is x5, running
count check = x1 to x5 (sysmis).
sort cases by check(d).
computes a new variable, check, holding the number of system missing values per row and sorts your rows according to it, thus moving the rows with most system missing values (if any) to the top of your file.
Alternatively, you could use the nmiss function. This includes user missing values too but these won't be present just after importing from Excel.
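For readers who want to see the same check outside SPSS, here is a rough Python equivalent of the count-and-sort approach. It is a sketch with made-up rows in which `None` stands in for a system missing value; the variable names x1 to x3 are assumptions for illustration.

```python
# Hypothetical imported rows; None plays the role of a system missing value.
rows = [
    {"id": 1, "x1": 3.0, "x2": 4.0, "x3": 1.0},
    {"id": 2, "x1": None, "x2": None, "x3": 2.0},
    {"id": 3, "x1": 5.0, "x2": None, "x3": 7.0},
]
variables = ["x1", "x2", "x3"]

def count_missing(row, variables):
    """Number of missing values among the listed variables in one row."""
    return sum(1 for name in variables if row[name] is None)

# Like COUNT check = x1 TO x3 (SYSMIS).
for row in rows:
    row["check"] = count_missing(row, variables)

# Like SORT CASES BY check (D): rows with the most missing values first.
rows.sort(key=lambda r: r["check"], reverse=True)

# Total number of non-empty cells, comparable to a count done in Excel.
total_filled = sum(len(variables) - r["check"] for r in rows)

print([(r["id"], r["check"]) for r in rows])
print(total_filled)
```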
The best way is to set up two helper columns. Assuming your value column is A, then
this is what goes in column B: =ISBLANK(A2). It returns TRUE or FALSE.
Then the second column, C:
=IF(B2=TRUE,COUNTIF(B:B,B2 ),0)
