I am fairly new to SPSS, and I am testing a file imported from Excel into SPSS that has roughly 100 columns (variable names in the first row) with some data in each row. I would like to check whether any data from any of the cells was dropped. I am trying to compare my "count" function in Excel with whatever is possible in SPSS. Are there other ways to make sure no data was dropped?
It's perhaps easier to count all empty cells and see if that's zero for all rows. Note that in SPSS, empty cells virtually always indicate system missing values. Now if your first variable is x1 and your last variable is x5, running
count check = x1 to x5 (sysmis).
sort cases by check(d).
computes a new variable, check, holding the number of system missing values per row and sorts your rows according to it, thus moving the rows with most system missing values (if any) to the top of your file.
Alternatively, you could use the nmiss function. This includes user missing values too but these won't be present just after importing from Excel.
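For readers who want to see the same check outside SPSS, here is a minimal Python sketch of the logic: count the empty cells per row, then sort so the rows with the most gaps come first. The rows, the variable names and the use of None for an empty cell are all made up for illustration; this is not SPSS syntax.

```python
# Hypothetical rows as they might look after import; None marks an empty cell.
rows = [
    {"id": 1, "x1": 4.0, "x2": 2.5, "x3": 1.0},
    {"id": 2, "x1": None, "x2": 3.0, "x3": None},
    {"id": 3, "x1": 7.0, "x2": None, "x3": 2.0},
]
check_vars = ["x1", "x2", "x3"]


def count_missing(row, names):
    """Number of empty (None) cells among the named variables in one row."""
    return sum(1 for name in names if row[name] is None)


# Equivalent in spirit to: count check = x1 to x3 (sysmis).
for row in rows:
    row["check"] = count_missing(row, check_vars)

# Equivalent in spirit to: sort cases by check(d).
rows_sorted = sorted(rows, key=lambda r: r["check"], reverse=True)

# If this total matches the number of blank cells in the Excel sheet,
# nothing was dropped on import.
total_missing = sum(r["check"] for r in rows)
```

Comparing total_missing with the number of blanks counted in Excel is one way to confirm that the import kept every cell.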
The best way is to set up two helper columns. Assuming your values are in column A, then
this is what goes in column B: =ISBLANK(A2). It will return TRUE or FALSE values.
Then second column C:
=IF(B2=TRUE,COUNTIF(B:B,B2 ),0)
I have a table with a total of 12 columns and 30 rows. The table looks like the one below. Note that the real data are very different from this but follow the same pattern: the value goes up to some number and keeps repeating for all remaining rows.
Data
I want to plot a line chart that looks like this-
But I am getting this-
I am able to get an expected chart by manually deleting repeating values from the table. I am looking for a way to do that automatically.
Assuming your data is arranged like this:
You could create another table referring to the first one with these formulas:
So basically, this formula =IF(B3-B2=0,NA(),B3) in H4 copy-pasted in all cells but the first row.
Which would give:
And plotting this second table would give you the desired result since NAs aren't plotted (as mentioned by Solar Mike).
Caveat
This works only if the values are strictly increasing or decreasing for every row. If there is no change between 2 data points before the end of the series, where it flattens for good, then there would be a missing point in your line.
For example, if 2020.Q2 started with two zeroes in a row, you would have an NA appearing before you want it.
So, you would still need to manually replace those NAs.
But if you want to automate the whole process, you could add another table that checks whether there are non-NA values after an NA and, if there are, changes it back to the previous number.
Something like this:
In this solution, the formula in O3 would have to be : =IF(AND(ISNA(I3),PRODUCT(IF(NOT(ISNA(I4:I$8)),0,1))=0),I2,I3)
Explanation:
I4:I$8 is the range of values after the current cell. We use the $ so that the range is anchored to the last row.
IF(NOT(ISNA(I4:I$8)),0,1) returns an array filled with 0's where there is a non-NA value and 1's when the value is NA.
PRODUCT(IF(NOT(ISNA(I4:I$8)),0,1))=0 checks if the product of the elements in the array is 0. Since only one zero is needed for the value of the product to be zero, this essentially checks if there's at least one non-NA value after the current one.
EDIT: If it's impossible for a series in your dataset to reach its maximum before the end, then the solution you found is way simpler. However, the method I'm suggesting is more general since it works whether the series flattens at its maximum, minimum or anywhere in between.
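The two-table idea above can be sketched in Python to make the logic easier to follow. This is only an illustration under stated assumptions: None stands in for #N/A, the function names are made up, and the second pass reads the restored value from the source series rather than from the cell above (the two are equal, since an NA marks a repeat of the previous value).

```python
NA = None  # stands in for Excel's #N/A, which line charts skip


def blank_repeats(series):
    """First table: keep the first value, then NA wherever a value repeats the one before it."""
    out = [series[0]]
    for prev, cur in zip(series, series[1:]):
        out.append(NA if cur == prev else cur)
    return out


def restore_interior(series, blanked):
    """Second table: undo any NA that still has a non-NA value somewhere after it."""
    out = []
    for i, value in enumerate(blanked):
        has_later_value = any(v is not NA for v in blanked[i + 1:])
        if value is NA and has_later_value:
            out.append(series[i])  # a repeat in the middle of the series, keep it
        else:
            out.append(value)      # real value, or part of the flat tail
    return out


series = [0, 0, 5, 9, 9, 9]          # hypothetical column: starts with two zeroes
blanked = blank_repeats(series)       # [0, NA, 5, 9, NA, NA]
plotted = restore_interior(series, blanked)  # [0, 0, 5, 9, NA, NA]
```

Only the flat tail stays NA, so the line stops where the series stops changing.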
Replace the zeroes.
Use na()
As that is not plotted.
See:
Added benefit: if overlap is set to 100% on the blue series then it looks like the value is "highlighted" in the first series - when discussing data it is a neat way to focus attention.
Edit: this works whether the preceding values increase or decrease:
So to solve my problem, I created one more table and applied the formula =IF(MAX(B$1:B$8)=H1,NA(),B2). This formula computes the maximum value of the source table column and compares it with the upper immediate value. New table looks like this-
Range of this table is from G1:K8
1-Select a column (or columns) to look for duplicated data.
2-Open the Data tab at the top of the ribbon.
3-Find the Data Tools menu, and click Remove Duplicates.
4-Press the OK button
Is this what you want?
I have this formula in Google Sheets:
VLOOKUP(upper(J2:J),colorState!A:B,{2}*sign(row(J2:J)),FALSE)
and I want it to sort the result in ascending order automatically when I add new data or edit (like ARRAYFORMULA).
Is there any way or any formula to do that? (I know that there's a SORT formula but I'm not sure how to use it together with this one.)
Thanks.
I believe I understand what you need :)
Essentially what I understand is that you would like to recreate the "main" sheet but have it automatically ordered by the 'color' column when new data is added. I don't have any idea how to do this to the raw data, but you can mirror the raw data by creating another sheet (named 'mainmirror') and entering this formula in cell A1:
=query(main!$A:$R,"select * order by P ASC",-1)
It will take you 2 seconds to reformat with a filter view, and you'll be left with a mirror of 'main' that is always sorted by column P and should remain current as data is added.
Hopefully this is an acceptable workaround. Other option would be to use a script but this is less tedious if it's suitable.
Side note: this method will turn your values into strings to mirror them on the duplicate sheet, so on the 'main' sheet I would recommend changing the cell format of column P to a custom number format, 00, which will ensure there's a leading 0 if there's only one digit. This will cause the strings in the mirror to sort correctly, instead of 1,11,12,2,3,4,etc. If you're expecting column P to have 3-digit values, make the number format 000 accordingly.
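The sorting issue in the side note is easy to reproduce outside Sheets. A small Python sketch with made-up values shows why text sorts as 1,11,12,2,... and how zero-padding to a fixed width fixes it:

```python
values = [1, 11, 12, 2, 3, 4]  # hypothetical contents of column P

# Sorted as plain text: compared character by character, so "11" comes before "2".
as_text = sorted(str(v) for v in values)

# Sorted as zero-padded text (the effect of the 00 number format):
# every string has the same width, so text order matches numeric order.
padded = sorted(f"{v:02d}" for v in values)
```

Here as_text comes out as 1, 11, 12, 2, 3, 4 while padded comes out as 01, 02, 03, 04, 11, 12.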
EDIT: I have revised the source data to remove the ambiguity of my last screenshots
I am trying to transpose spreadsheet data where there are many rows where the customer name may be duplicated but each row contains a different product.
For instance
revised original data source
to
revised proposed data format
I would like to do it with formulae if possible as I struggle with VB
Thank you for any help
I realise this is a huge answer, apologies but I wanted to be clear. If you need anything from me, drop me a comment and I'll help out.
Here's the output from my formula:
EDITED ANSWER - Named ranges used for ease of understanding:
These are just an example of a few of the named ranges I have used, you can reference the ranges directly or name them yourself (simplest way is to highlight the data then put the name in the drop down next to the formula bar [top left])
Be wary that as we will be using Array formulas for AccNum and AccType, you will not want to select the entire column and instead opt for either the exact data length or overshoot it by 100 or so. Large array formulas tend to slow down calculation and will calculate every cell individually regardless of it being empty.
First formula
=IF(COUNTIF(D2:D11,">""")>0,CONCATENATE("Account Number ",LEFT((COLUMN(A:A)+1)/2,1)),"")
This formula is identical to the one in the original answer apart from the adjusted heading title.
=IF(Condition,True,False) - There are so many uses for the IF logic; it is the best formula in Excel in my opinion. I have used IF with COUNTIF to check whether there are more than 0 cells that are more than BLANK (or ""). This is just a trick to avoid using ISBLANK() or other blank identifiers that get confused when a formula is present.
If the result is TRUE, I use CONCATENATE(Text1,Text2,etc.) to build a text string for the column header. ROW(1:1) or COLUMN(A:A) is commonly used to initiate an automatically increasing integer for formulas to use based on whether the count increase is required horizontally or vertically. I add 1 to this increasing integer and divide it by 2 so that the increase for each column is 0.5 (1 > 1.5 > 2 > 2.5) I then use LEFT formula to just take the first digit to the left of this decimal answer so the number increases only once every 2 columns.
If the result is FALSE then leave the cell blank ,""). Standard stuff here, no explanation needed.
Second Formula
=CONCATENATE(INDEX(Forename,MATCH(Sheet4!$A2,Reference,0)))
=CONCATENATE(INDEX(Surname,MATCH(Sheet4!$A2,Reference,0)))
CONCATENATE has only been used here to force blank cells to remain blank when pulled by INDEX. INDEX will read blank cells as values and therefore 0's whereas CONCATENATE will read them as text and therefore "".
INDEX(Range,Row,Column): This is a lookup formula that is much more advanced than VLOOKUP or HLOOKUP and not limited in the way that they are.
The range I have used is the expected output range - Forename or Surname
The row is then calculated using MATCH(Criteria,Range,Match Type). Match will look through a range and return the position as an integer where a match occurs. For this I have set the criteria to the unique reference number in column A for that row, the range to the named range Reference and the match type as 0 (1 Less than, 0 Exact Match, -1 Greater than).
I did not define a column number for INDEX as it defaults to the first column and I am only giving it one column of data to output from anyway.
Third Formula
Remember these need to be entered as an array (when in the formula bar hit Ctrl+Shift+Enter)
=IFERROR(INDEX(AccNum,SMALL(IF(Reference=Sheet4!$A2,ROW(Reference)-ROW(INDEX(Reference,1,1))+1),ROUNDDOWN((COLUMN(A:A)+1)/2,0))),"")
=IFERROR(INDEX(AccType,SMALL(IF(Reference=Sheet4!$A2,ROW(Reference)-ROW(INDEX(Reference,1,1))+1),ROUNDDOWN((COLUMN(B:B)+1)/2,0))),"")
As you can see, one of these is used for AccNum and the other for AccType.
IFERROR(Value,Value_if_error): This has been used because we are not expecting the formula to always return something. When the formula cannot return something or SMALL has run out of matches to go through, an error will occur (usually #VALUE! or #NUM!), so I use ,"") to force a blank result instead (again standard stuff).
I have already explained the INDEX formula above so let's just dive in to how I have worked out the rows that match what we are looking for:
SMALL(IF(Reference=Sheet4!$A2,ROW(Reference)-ROW(INDEX(Reference,1,1))+1),ROUNDDOWN((COLUMN(B:B)+1)/2,0))
The IF statement here is fairly self explanatory but as we have used it as an array formula, it will perform =Sheet4!$A2 which is the unique reference on every cell in the named range Reference individually. In your mock data this returns a result of: {FALSE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE} for the first entry (I included titles in the range, hence the initial FALSE). IF will do my row calculation* for every true but leave the FALSEs as they are.
This leaves a result of {FALSE;2;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE} that SMALL(array,k) will use. SMALL will only work on numeric values and will return the 'k'th smallest result. Again the column trick has been used but, to cover more ground, I used another method: ROUNDDOWN(Number,digits) as opposed to LEFT(). Digits here means decimal places, so I used 0 to round down to a whole integer for the same result. As this copies across the columns like so: 1, 1, 2, 2, 3, 3, SMALL will alternately (as the formulas alternate) grab the 1st smallest AccNum then the 1st smallest AccType before grabbing the 2nd AccNum and AccType and so forth.
*(Row number of the match minus the first row number of the range then plus 1, again fairly common as a foolproof way to always get the correct row regardless of where the data starts; actually as your data starts on row 1 we could just do ROW(Reference) but I left it as is in case you had data in a different format)
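To make the moving parts easier to see, here is a rough Python sketch of what the three formulas achieve together: one output row per unique reference, with the k-th matching account spread across a pair of columns. The sample rows and the function names are hypothetical; pair_index mirrors ROUNDDOWN((COLUMN()+1)/2,0), and the empty string mirrors IFERROR(...,"").

```python
from collections import defaultdict

# Hypothetical source rows: (reference, forename, surname, acc_num, acc_type)
source = [
    ("R1", "Ann", "Lee", "1001", "Savings"),
    ("R2", "Bob", "Ray", "2001", "Current"),
    ("R1", "Ann", "Lee", "1002", "Current"),
]


def pair_index(column_number):
    """Map 1-based output columns to 1, 1, 2, 2, 3, 3, ... (the 'k' given to SMALL)."""
    return (column_number + 1) // 2


def widen(rows):
    """Return one list per reference: reference, names, then acc_num/acc_type pairs."""
    names = {}
    accounts_by_ref = defaultdict(list)
    for ref, forename, surname, acc_num, acc_type in rows:
        names.setdefault(ref, (forename, surname))        # first match, like INDEX/MATCH
        accounts_by_ref[ref].append((acc_num, acc_type))  # every match, in row order

    widest = max(len(accounts) for accounts in accounts_by_ref.values())
    result = []
    for ref, accounts in accounts_by_ref.items():
        line = [ref, *names[ref]]
        for col in range(1, 2 * widest + 1):
            k = pair_index(col)
            if k <= len(accounts):
                # Odd columns take the account number, even columns the account type.
                line.append(accounts[k - 1][(col - 1) % 2])
            else:
                line.append("")  # ran out of matches: blank instead of an error
        result.append(line)
    return result


wide = widen(source)
```

With the sample rows above, R1 ends up with two account pairs on one line and R2 with one pair followed by two blank cells.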
ORIGINAL ANSWER - Same logic as above
Here's your solution in 3 parts
Part 1 is a trick for the auto completion of the titles so that they will hide when not used (in case you just copy and paste values for the whole lot to speed up use again).
=IF(COUNTIF(C2:C11,">""")>0,CONCATENATE("Product ",LEFT((COLUMN(A:A)+1)/2,1)),"") in C
=IF(COUNTIF(D2:D11,">""")>0,CONCATENATE("Prod code ",LEFT((COLUMN(B:B)+1)/2,1)),"") in D
Highlight both of the cells and drag across to stagger the outputs "Product " and "Prod code "
Part 2 would be inputting the unique IDs to the new sheet, I would suggest copying your entire column A across to a new sheet and using DATA > REMOVE DUPLICATES > Continue with current selection to trim out the multiple occurrences of unique IDs.
In column B use =INDEX(Sheet2!$B$1:$B$7,MATCH(Sheet4!$A2,Sheet2!$A$1:$A$7,0)) to get the names pulled across.
Part 3, the INDEX
Once again, we are doing a staggered input here before copying the formula across the page to cover the entirety of the data.
=IFERROR(INDEX(Sheet2!$C$1:$D$11,SMALL(IF(Sheet2!$A$1:$A$11=Sheet4!$A2,ROW(Sheet2!$A$1:$A$11)-ROW(INDEX(Sheet2!$A$1:$A$11,1,1))+1),ROUNDDOWN((COLUMN(A:A)+1)/2,0)),1),"") in C
=IFERROR(INDEX(Sheet2!$C$1:$D$11,SMALL(IF(Sheet2!$A$1:$A$11=Sheet4!$A2,ROW(Sheet2!$A$1:$A$11)-ROW(INDEX(Sheet2!$A$1:$A$11,1,1))+1),ROUNDDOWN((COLUMN(B:B)+1)/2,0)),2),"") in D
The formulas of Part 3 will need to be entered as an array (when in the formula bar hit Ctrl+Shift+Enter) . This will need to be done before copying the formulas across.
These formulas can now be dragged / copied in all directions and will feed off of the unique ID in column A.
My Answer is already rather long so I haven't gone on to break the formula down. If you have any trouble understanding how this works, let me know and I will be happy to write up a quick guide, breaking it down chunk by chunk for you.
I am looking at cells A2:A201 for a value of 1. I want to create an array list of all items that contain a 1, with the contents of cells E and B of each row.
The way I am currently doing it is not really a clean version, so I'm looking to simplify it:
=INDEX(tbl,SMALL(IF((INDEX(tbl,,12,1)<=1)*(INDEX(tbl,,12,1)>=1),ROW(tbl)-MIN(ROW(tbl))+1,""),ROWS($T$14:T14)),,1)
While that works, I am trying to simplify it and make it look better so I can use it more often without creating multiple tables.
Also, I am using the above to get the positions (cell E) of players (cell B) as well. The output is in scrambled order, so I would really love to put them in order for sports: PG Name, PG Name, SG Name, SG Name, SF Name, SF Name, etc.
If you are looking to visually simplify the formula the single biggest portion that can be eliminated is ROW(tbl)-MIN(ROW(tbl))+1. This produces the position within tbl and not the actual row on the worksheet. By starting tbl in the second row, you have to make this calculation. If you simply started tbl at row 1, then this becomes ROW(tbl) because the rows within tbl will always match the rows on the worksheet and do not have to be adjusted due to a different starting position.
In order to get the second, third, etc. match from the column of ones and blanks, you need to provide cyclic calculation. Whether you use an array formula or a function that provides array-like calculation, you will want to cut down the number of rows processed to the minimum required for processing.
You could do this with a named range (e.g. tbl) and then you have to slice off columns for individual column lookups. You also have to maintain and resize tbl if the data changes. This can be avoided by defining tbl with a formula that reduces the rows to the last number found in column A.
=Sheet1!$A$1:INDEX(Sheet1!$E:$E, MATCH(1e99, Sheet1!$A:$A))
When dealing with columns within tbl, you can 'slice' off a column using the INDEX function. Leave the row_num blank (or 0) to indicate all rows and provide the column_num you want to deal with.
The newer AGGREGATE¹ function can make quick work of retrieving the first, second, third, etc. matches to one or more columns. Use the SMALL sub-function (i.e. 15) and force any non-matches into a #DIV/0! error state while using the 6 option to ignore errors.
Sorting the data 'on-the-fly' during retrieval is marginally possible but the better solutions involve a 'helper' column.
¹ The AGGREGATE function was introduced with Excel 2010. It is not available in earlier versions.
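As a language-neutral illustration of what the retrieval plus a 'helper' ordering is meant to produce, here is a minimal Python sketch. The roster rows are invented, and the position order beyond PG, SG and SF (here PF and C) is an assumption, since the question only says "etc".

```python
# Assumed sort order for positions; only PG, SG, SF appear in the question.
POSITION_ORDER = ["PG", "SG", "SF", "PF", "C"]

# Hypothetical rows: (flag from column A, name from column B, position from column E)
roster = [
    (1, "Player One", "SF"),
    (None, "Player Two", "PG"),
    (1, "Player Three", "PG"),
    (1, "Player Four", "SG"),
    (1, "Player Five", "PG"),
]


def selected_players(rows):
    """Keep rows flagged with 1, then order them by position (ties keep sheet order)."""
    picked = [(position, name) for flag, name, position in rows if flag == 1]
    # The index in POSITION_ORDER plays the role of the helper column.
    return sorted(picked, key=lambda item: POSITION_ORDER.index(item[0]))


lineup = selected_players(roster)
```

In a worksheet, the same effect would come from a helper column holding each position's rank and retrieving matches in order of that rank.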
I am using Excel::Writer::XLSX to create an Excel file from an array of arrays. Right now I'm trying to create a formatted table from the data (as much as I can, as opposed to just spitting it back into another file).
First off, when I use set_column() to set the background color, that color is formatted for the entire column. Is there a way to specify to only go as far as the content in the file goes? Unfortunately, when the program is run it is dynamic each time and unknown what the final row in the table should be.
Second, is there a way to merge cells based on the content inside of them? This has to do with the dynamic problem again; there is an optimal output if all the data I am gathering is online. If that were the case I could easily set a range of what these merged cells should be. But for example, if I have 10 rows of column 2 saying 'A' and then 10 rows of column 2 saying 'B', I would like to merge the A's and B's together. The issue is that it is unknown if it will always have 10 rows with that value inside of it.
Thanks for your input!
First off, when I use set_column() to set the background color, that color is formatted for the entire column. Is there a way to specify to only go as far as the content in the file goes?
No. You will have to add the format to the cells as you write them.
But for example, if I have 10 rows of column 2 saying 'A' and then 10 rows of column 2 saying 'B', I would like to merge the A's and B's together.
This isn't possible with Excel::Writer::XLSX. (In fact I don't think it is possible in Excel without using macros).
Since both of your issues relate to not knowing the size and value of the data beforehand then perhaps you could first read your data into an array of arrays, process it to find the required format dimensions and merge ranges and then write them out.
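The preprocessing step suggested above (scan the data first, work out the merge ranges, then write) can be sketched as follows. This is in Python rather than Perl, purely to show the logic; the function name and sample column are hypothetical. In the Perl script, each multi-row range found this way could then be written with the worksheet's merge_range() method, and the last data row is known before any formatting is applied.

```python
from itertools import groupby


def merge_ranges(column_values, first_row=0):
    """Return (first_row, last_row, value) for each run of identical adjacent values."""
    ranges = []
    row = first_row
    for value, run in groupby(column_values):
        length = len(list(run))
        ranges.append((row, row + length - 1, value))
        row += length
    return ranges


# Hypothetical contents of column 2, already read into memory.
column_2 = ["A", "A", "A", "B", "B", "C"]
ranges = merge_ranges(column_2)

# Runs of a single row need no merge; they can be written as ordinary cells.
to_merge = [r for r in ranges if r[1] > r[0]]
last_data_row = len(column_2) - 1  # where per-cell formatting should stop
```

Because the ranges are computed from the data itself, it does not matter whether a value fills 10 rows or 3.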