Restructure Data in SPSS - error message - statistics

I want to reshape my long-format data into wide format. First I aggregated the data for later analysis, then I used the Restructure command, with Subject_Nr as the identifier variable and Switch_Type as the index. My dependent variables are reaction times and accuracy. I'd like to restructure the data so that every row is one subject (in the long version, one row is one trial), so that I can use a within-subject design. However, at the end of the procedure SPSS keeps showing me an error message saying that sets from the original data will still be in use in the restructured data, and that I should use the "Use sets" dialogue. When I use this dialogue and select only the newly created variables, my dataset is empty.
I am kind of desperate at this point, because I have already spent several hours trying to fix this issue without success. Does anyone have a clue what I am doing wrong?
This is how the data looks after the restructuring. As you can see, it is still in long format, not wide format.
Here is the syntax:
SORT CASES BY Subj_Nr Swicht_Type.
CASESTOVARS
/ID=Subj_Nr
/INDEX=Swicht_Type
/GROUPBY=INDEX.

If Swicht_Type is an index within ID=Subj_Nr, then Swicht_Type needs to be unique within each Subj_Nr (which, going by the screenshot, doesn't seem to be the case with the data you present). If it is not unique, you will receive an error saying so.
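To make the uniqueness rule concrete outside SPSS, here is a minimal Python sketch of what a cases-to-variables reshape has to do. It is not SPSS's implementation: the function name cases_to_vars, the dict-based rows and the root.index column naming are assumptions for illustration. The data are the two small datasets from the demonstrations below.

```python
def cases_to_vars(rows, id_var, index_var, value_vars):
    """Reshape long rows (dicts) into one dict of columns per ID value.

    Raises ValueError if an index value repeats within an ID, because two
    cases would then have to fill the same wide column.
    """
    wide = {}
    seen = set()
    for row in rows:
        key = (row[id_var], row[index_var])
        if key in seen:
            raise ValueError(
                f"duplicate {index_var}={key[1]} within {id_var}={key[0]}"
            )
        seen.add(key)
        columns = wide.setdefault(row[id_var], {})
        for var in value_vars:
            # Assumed naming scheme: root name + "." + index value
            columns[f"{var}.{row[index_var]}"] = row[var]
    return wide


def as_rows(data):
    """Turn (Dim1, Month, Measure) tuples into dict rows."""
    return [{"Dim1": d, "Month": m, "Measure": v} for d, m, v in data]


UNIQUE = [("A", 1, 50), ("A", 2, 40), ("A", 3, 20),
          ("B", 2, 68), ("B", 3, 58), ("B", 1, 57)]
WITH_REPEATS = [("A", 1, 50), ("A", 2, 40), ("A", 3, 20), ("A", 1, 56),
                ("A", 2, 86), ("A", 1, 45), ("B", 2, 68), ("B", 3, 58),
                ("B", 1, 57)]

print(cases_to_vars(as_rows(UNIQUE), "Dim1", "Month", ["Measure"]))
try:
    cases_to_vars(as_rows(WITH_REPEATS), "Dim1", "Month", ["Measure"])
except ValueError as err:
    print("error:", err)  # the repeat of Month 1 for A is reported
```

In the question's terms, Dim1 plays the role of Subj_Nr and Month the role of Swicht_Type.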
Perhaps the two demonstrations (the first set up to fail deliberately) will shed some light on how the CASESTOVARS command works and help you revise your code or setup accordingly.
DATA LIST LIST / Dim1 (A1) Month (F1.0) Measure (F8.0).
BEGIN DATA.
A 1 50
A 2 40
A 3 20
A 1 56
A 2 86
A 1 45
B 2 68
B 3 58
B 1 57
END DATA.
DATASET NAME DSRaw.
SORT CASES BY Dim1 .
COMPUTE Month2=Month.
CASESTOVARS /ID=Dim1 /AUTOFIX=NO /INDEX=Month.
This results in an error because Month 1 is repeated for Dim1 A (case row number 4).
If, however, you do in fact have unique index values within each ID, then CASESTOVARS works as expected:
DATA LIST LIST / Dim1 (A1) Month (F1.0) Measure (F8.0).
BEGIN DATA.
A 1 50
A 2 40
A 3 20
B 2 68
B 3 58
B 1 57
END DATA.
DATASET NAME DSRaw.
SORT CASES BY Dim1 .
COMPUTE Month2=Month.
CASESTOVARS /ID=Dim1 /AUTOFIX=NO /INDEX=Month.
This gives you the following:


Permutations of two lists with each three different outcomes (32,000 combinations)

There are 50 Success Criteria (“requirements”) broken into two levels: Single-A (with 25 requirements) and Double-A (with 12 requirements). Using the philosophy of the distributive property, I need to create a sort of permutated list of all possible combinations from these two levels. The trouble I’m running into, though, is that each level can be made up in many different ways, because every requirement gets one of three conformance results.
A reviewer will go through each of the Success Criteria to fill out a VPAT. A VPAT will have the 50 Success Criteria listed out and my reviewer will look at the product, and based on the given success criteria, give it a result of “Does Not Support”, “Partially Supports”, or “Supports”. Each line can have only one result.
So, a completed review could look like this:
Requirement # | Level | Status
1 | Single-A | SUPPORTS
2 | Single-A | SUPPORTS
3 | Single-A | PARTIALLY SUPPORTS
4 | Single-A | DOES NOT SUPPORT
5 | Single-A | DOES NOT SUPPORT
11 | Double-A | DOES NOT SUPPORT
12 | Double-A | PARTIALLY SUPPORTS
13 | Double-A | SUPPORTS
14 | Double-A | PARTIALLY SUPPORTS
The final tally is what I’m trying to summarize (tab “Intended Result”). Every VPAT should output a pivoted result like this:
Single A Fail | Single A Partial | Single A Pass | Double A Fail | Double A Partial | Double A Pass
0 | 0 | 25 | 0 | 0 | 12
Here’s my problem. There are 351 permutations for the Single A list and 91 for the Double A list. I’ve already mapped those out manually in the given tabs. Now I need to permutate both lists by their three dimensions (nearly 32,000 possibilities), but I can’t figure out how to do it with several dimensions. Here’s the “distributive property”: row #1 of the Single-A list needs to be lined up with all 91 rows of the Double-A list. Then the same again for row #2 with all 91 rows, and so on. See example here.
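The counts 351 and 91 are exactly the number of ways to split 25 (or 12) criteria into fail/partial/pass counts, since C(27, 2) = 351 and C(14, 2) = 91. Assuming that is how the two tabs were built, a short Python sketch can regenerate both lists and line every Single-A row up with every Double-A row, which is a Cartesian product. The row order may differ from the workbook's tabs, and the function name tallies is hypothetical.

```python
from itertools import product


def tallies(n):
    """All (fail, partial, pass) count triples that add up to n."""
    return [(f, p, n - f - p) for f in range(n + 1) for p in range(n + 1 - f)]


single_a = tallies(25)  # 25 Single-A criteria
double_a = tallies(12)  # 12 Double-A criteria

# The "distributive" step: every Single-A row paired with every Double-A row
combined = [a + b for a, b in product(single_a, double_a)]

print(len(single_a), len(double_a), len(combined))  # prints: 351 91 31941
print(combined[0])  # prints: (0, 0, 25, 0, 0, 12)
```

The first combined row, (0, 0, 25, 0, 0, 12), is the same tally as the “Intended Result” example.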
Can anyone help me with a formula that might be able to accomplish this without me copying 91 lines 351 times and doing a bunch of “fill downs”? Attached, as well, is my workbook so far.
I've tried the various permutation formulas, but I'm getting the "number" of combinations that are possible, not the actual results. I'd like to list out every possible scenario for both lists.
I don't know if it's helpful, but I think I solved it in Google Sheets with this formula:
=REDUCE({SINGLE_A!A1:D1,DOUBLE_A!A1:D1},SEQUENCE(COUNTA(SINGLE_A!A2:A)),LAMBDA(a,b,{a;
SCAN(,SEQUENCE(COUNTA(DOUBLE_A!A2:A)),LAMBDA(c,d,{INDEX(SINGLE_A!A2:D,b),INDEX(DOUBLE_A!A2:D,d)}))}))
The problem is that it can't be replicated in Excel... but if you only need the values, you can download it as an .XLSX and it will keep the values without the formula.
As you can see, there are 31,941 combinations (351 × 91). Here is the link
Permutate Rows of Two Tables
Feel free to download the file from my Google Drive.
LAMBDA
=LAMBDA(Table1,Table2,LET(Data1,DROP(Table1,1),Data2,DROP(Table2,1),
rCount1,ROWS(Data1),cCount1,COLUMNS(Data1),rCount2,ROWS(Data2),cCount2,COLUMNS(Data2),rCount,rCount1*rCount2,
Result1,WRAPCOLS(INDEX(TOCOL(Data1,,1),ROUNDUP(SEQUENCE(rCount*cCount1,,1)/rCount2,0)),rCount),
Result2,WRAPROWS(INDEX(TOCOL(Data2),MOD(SEQUENCE(rCount*cCount2,,1)-1,rCount2*cCount2)+1),cCount2),
VSTACK(HSTACK(TAKE(Table1,1),TAKE(Table2,1)),HSTACK(Result1,Result2))))(A1:C4,E1:G5)
PermutRows Function (Formulas->Name Manager->New)
=LAMBDA(Table1,Table2,LET(Data1,DROP(Table1,1),Data2,DROP(Table2,1),
rCount1,ROWS(Data1),cCount1,COLUMNS(Data1),rCount2,ROWS(Data2),cCount2,COLUMNS(Data2),rCount,rCount1*rCount2,
Result1,WRAPCOLS(INDEX(TOCOL(Data1,,1),ROUNDUP(SEQUENCE(rCount*cCount1,,1)/rCount2,0)),rCount),
Result2,WRAPROWS(INDEX(TOCOL(Data2),MOD(SEQUENCE(rCount*cCount2,,1)-1,rCount2*cCount2)+1),cCount2),
VSTACK(HSTACK(TAKE(Table1,1),TAKE(Table2,1)),HSTACK(Result1,Result2))))
=PermutRows(A1:C4,E1:G5)
Using Your Tables
=PermutRows(Table2[[#All];[A_FAIL]:[A_PASS]];Table1[[#All];[AA_FAIL]:[AA_PASS]])
Helper Formulas
J2: =TOCOL(A2:C4,,1)
K2: =ROUNDUP(SEQUENCE(36,,1)/4,0)
L2: =INDEX(J2#,K2#)
M2: =WRAPCOLS(L2#,12)
Q2: =TOCOL(E2:G5,,0)
R2: =MOD(SEQUENCE(36,,1)-1,12)+1
S2: =INDEX(Q2#,R2#)
T2: =WRAPROWS(S2#,3)
X2: =HSTACK(M2#,T2#)
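The index arithmetic in Result1 and Result2 (mirrored by helper columns K2 and R2) can be checked outside Excel. This Python sketch follows the same steps on plain lists of rows without headers, and compares the result with itertools.product, which produces the same pairing directly. The sample values are made up; only the shapes (3 rows and 4 rows, 3 columns each) match A2:C4 and E2:G5, and permut_rows is a hypothetical name.

```python
import itertools
import math


def permut_rows(data1, data2):
    """Pair every row of data1 with every row of data2 (header rows already dropped)."""
    r1, c1 = len(data1), len(data1[0])
    r2, c2 = len(data2), len(data2[0])
    r = r1 * r2
    # TOCOL(Data1,,1): flatten column by column
    flat1 = [data1[i][j] for j in range(c1) for i in range(r1)]
    # TOCOL(Data2): flatten row by row
    flat2 = [cell for row in data2 for cell in row]
    # INDEX(..., ROUNDUP(k/r2, 0)) for k = 1..r*c1: each value repeated r2 times
    seq1 = [flat1[math.ceil(k / r2) - 1] for k in range(1, r * c1 + 1)]
    # WRAPCOLS(seq1, r): consecutive chunks of r values become columns
    cols1 = [seq1[j * r:(j + 1) * r] for j in range(c1)]
    result1 = [[cols1[j][i] for j in range(c1)] for i in range(r)]
    # INDEX(..., MOD(k-1, r2*c2)+1) for k = 1..r*c2: cycle through Data2 r1 times
    seq2 = [flat2[(k - 1) % (r2 * c2)] for k in range(1, r * c2 + 1)]
    # WRAPROWS(seq2, c2): consecutive chunks of c2 values become rows
    result2 = [seq2[i * c2:(i + 1) * c2] for i in range(r)]
    # HSTACK(Result1, Result2)
    return [a + b for a, b in zip(result1, result2)]


table1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # stands in for A2:C4
table2 = [["a", "b", "c"], ["d", "e", "f"],
          ["g", "h", "i"], ["j", "k", "l"]]  # stands in for E2:G5

rows = permut_rows(table1, table2)
expected = [list(a) + list(b) for a, b in itertools.product(table1, table2)]
print(len(rows), rows == expected)  # prints: 12 True
```

The explicit version is only there to show why ROUNDUP repeats each row of the left table and MOD cycles through the right one; for the pairing itself, itertools.product (or the LAMBDA above in Excel) is the simpler expression.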

Excel 2013, multiple search criteria to back fill empty fields, large database

item | countDate | shopID | lineNumber | defectNumber | jobID | jobID2
123A1234-123 | 1/1/2022 | | 1234 | 1 | |
123A1234-123 | 1/2/2022 | | 1234 | 2 | |
123A1234-123-AB123 | 1/1/2022 | AB12C | 1234 | 0 | IP-0000123456 |
IP-000ABC0123 | 1/1/2022 | AB12C | 1234 | 1 | IP-000ABC0123 |
Above is an example of the type of data I usually get, limited to only those fields that are relevant.
As you can see there are three instances of item 123A1234-123, 2 of which have no Job ID. I am trying to find a way to fill those blank spaces with the Job ID from the defect number 0 instance of the item.
The problem I am encountering is that the item code is longer on that instance. There is also the fact that listings in the item field come in several different formats, though they are mostly uniform in format for the ones I am trying to fix.
Previous attempts at building an equation to fill the new column jobID2 result in #VALUE! errors.
[#jobID2] = INDEX([jobID],MATCH(1,[countDate]=[#countDate])*([lineNumber]=[#lineNumber])*([item]=LEFT([#item],SEARCH(CHAR(164), SUBSTITUTE([#item],"-",CHAR(164),2))-1)),0))
This equation was written before I found an example in the data that didn't share the same count date.
I had help writing this but that person never replied to my follow up questions.
From what I can work out it is supposed to replace the second - with a special character and then search for that special character before returning the jobID. I think the problem is partially that not all the data matches this format, but I'm not that versed in using LEFT(SEARCH(
I am running Excel 2013 on a machine with 8 GB RAM, Windows 10 Enterprise 64-bit and a 3.1 GHz i5. I've killed it a couple of times trying to run big equations on some of the data sheets, so I'm unsure how complex the equation or script can be.
Any help I can get is very much appreciated.
Edit:
Current state of the equation, which returns #N/A on all data entries:
[#jobID2] =INDEX([jobID],MATCH(1,(([#[defectNumber]] = 0)*([lineNumber]=[#lineNumber])*(LEFT([#item],12)=LEFT([item],12))),0))
The problem I am seeing now is that:
it doesn't return a row number; MATCH just spits out 1, i.e. TRUE
it doesn't look like each row is searching the whole database, just its own row
Looked up a different formula
[#jobID2] = VLOOKUP(1,CHOOSE({1,2},(LEFT([#item],12)=LEFT([item],12))*([#lineNumber]=[lineNumber])*(""<>[shopID]),[shopID]),2,0)
This returns #N/A for all entries except the ones where shopID <> 0, where it returns 0.
So... progress
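One likely source of the #VALUE! errors in the first formula: for an item with only one hyphen, such as 123A1234-123, SUBSTITUTE(...,2) finds no second hyphen, so SEARCH(CHAR(164), ...) has nothing to find and returns #VALUE!. The Python sketch below shows the matching logic the later formulas are aiming for, with the truncation applied to the candidate (defectNumber 0) rows and made safe for items with fewer than two hyphens. It is not an Excel formula: it swaps the array lookup for a dictionary keyed on (shortened item, lineNumber), which is one way to avoid comparing every row with every other row. The sample rows are my reading of the flattened example table, and key_item and fill_job_ids are hypothetical names.

```python
def key_item(item):
    """Text before the second hyphen; items with fewer hyphens come back unchanged."""
    return "-".join(item.split("-")[:2])


def fill_job_ids(rows):
    """Return jobID2 for every row: its own jobID, or else the jobID of the
    defectNumber 0 row with the same lineNumber and the same shortened item."""
    # Build the lookup once from the defectNumber 0 rows (first one wins)
    source = {}
    for row in rows:
        if row["defectNumber"] == 0 and row["jobID"]:
            key = (key_item(row["item"]), row["lineNumber"])
            source.setdefault(key, row["jobID"])
    return [
        row["jobID"] or source.get((key_item(row["item"]), row["lineNumber"]), "")
        for row in rows
    ]


rows = [
    {"item": "123A1234-123", "lineNumber": 1234, "defectNumber": 1, "jobID": ""},
    {"item": "123A1234-123", "lineNumber": 1234, "defectNumber": 2, "jobID": ""},
    {"item": "123A1234-123-AB123", "lineNumber": 1234, "defectNumber": 0,
     "jobID": "IP-0000123456"},
]
print(fill_job_ids(rows))
```

countDate is deliberately left out of the key, since the question mentions rows that don't share a count date.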

Doing VLOOKUP on "Nth" value using LARGE function in Excel

I am working on an Excel file which compiles client data to group them by region/city, using a COUNTIFS function, as:
60 Ottawa
10 Otterburn Park
14 Outremont
40 Philipsburg
59 Pierrefonds
59 Pincourt
...
I would then like to use a combined VLOOKUP + LARGE function to determine the 10 or 15 cities where most of our clients are. Naturally I tried something along the lines of:
1st value
=VLOOKUP(MAX(Lists!$R:$R),Lists!$R:$S,1,FALSE)
2nd value
=VLOOKUP(MAX(Lists!$R:$R),Lists!$R:$S,2,FALSE)
etc.
However, in this example the first entry with a count of 59 (Pierrefonds) keeps appearing, and I am unable to get "Pincourt" displayed using this method. What am I doing wrong? Should I go about this a different way?
Thank you!
Use this formula, paying attention to the $O$1:O1 reference. The formula needs to be placed in at least the second row, with the $O$1:O1 reference pointing at the cell above.
Put it in the first cell, make sure the references are correct, and copy/drag down.
=IFERROR(INDEX(S:S,AGGREGATE(15,6,ROW($R$1:INDEX(R:R,MATCH(1E+99,R:R)))/(($R$1:INDEX(R:R,MATCH(1E+99,R:R))=AGGREGATE(14,6,$R$1:INDEX(R:R,MATCH(1E+99,R:R)),ROW(1:1)))*(COUNTIF($O$1:O1,$S$1:INDEX(S:S,MATCH(1E+99,R:R)))=0)),1)),"")
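The formula above works in two steps per output row: AGGREGATE(14,...,ROW(1:1)) picks the k-th largest count (duplicates included), and the AGGREGATE(15,...) part with COUNTIF($O$1:O1,...)=0 picks the first city with that count that is not already listed above. A Python sketch of the same logic, using the sample counts from the question (the function name top_n is hypothetical):

```python
def top_n(rows, n):
    """rows: (count, city) pairs in sheet order. Returns up to n cities, largest counts first."""
    counts = sorted((count for count, _ in rows), reverse=True)
    picked = []
    for k in range(min(n, len(rows))):
        kth = counts[k]  # k-th largest count, duplicates kept (AGGREGATE 14)
        for count, city in rows:  # first matching row in sheet order (AGGREGATE 15)
            if count == kth and city not in picked:  # COUNTIF($O$1:O1, city) = 0
                picked.append(city)
                break
    return picked


clients = [(60, "Ottawa"), (10, "Otterburn Park"), (14, "Outremont"),
           (40, "Philipsburg"), (59, "Pierrefonds"), (59, "Pincourt")]
print(top_n(clients, 4))  # ['Ottawa', 'Pierrefonds', 'Pincourt', 'Philipsburg']
```

Because already-listed cities are skipped, the second 59 resolves to Pincourt instead of repeating Pierrefonds.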

Rank the top 5 entries in different criteria

I have a table in which I want to find the top X people in each of the different groups.
Unique Names Number Group
a 30 1
b 4 2
c 19 3
d 40 2
e 1 1
f 9 2
g 15 3
I've ranked the top 5 people by number using =index($A$2:$A$8,match(large($B$2:$B$8,1),$B$2:$B$8,0)). I linked the 1 in the LARGE function to a rank range so that the number changes as I drag the formula down.
What I would like to do next is rank the top X people in each group, e.g. the top 3 in group 1.
I tried =index($A$2:$A$8,match("1"&large($B$2:$B$8,1),$C$2:$C$8&$B$2:$B$8,0)) but it didn't seem to work.
Thanks
EDIT: After looking at the answers below, I have realised why they are not working for me. The actual data I want to use the formula with contains duplicate numbers. I have adjusted the example data to show this. The problem I have is that if there are duplicate numbers, it returns both names, even if one of them is not in the group.
Unique Names Number Group
a 30 1
b 30 2
c 19 3
d 40 2
e 1 1
f 30 2
g 15 3
Proof of Concept
Use the following formula in cell F2 of the example above, and copy it down and to the right as needed.
=IFERROR(INDEX($A$2:$A$8,MATCH(AGGREGATE(14,6,($C$2:$C$8=F$1)*($B$2:$B$8),ROW($A2)-1),$B$2:$B$8,0)),"")
Put the group numbers in the header row, or come up with a formula that increments and resets the group number as you copy down, based on the X in your question.
Explanation:
Unlike LARGE, the AGGREGATE function handles arrays without needing CSE. As such, we can add criteria to what we want to use. In this case only one criterion was used, the group number. In the formula it is the following part:
($C$2:$C$8=F$1)
If there were multiple criteria, we would use the + operator for OR or the * operator for AND.
The 6 option in the AGGREGATE function tells it to ignore errors. This is useful when getting the SMALL (function 15), where rows excluded by dividing by a FALSE condition become #DIV/0! errors. It is also useful for dealing with other information that may cause errors you do not need to worry about.
As this is technically an array operation, avoid using full column/row references, as they can bog down your system.
In broad terms, the formula builds a list of the numbers that match the group you are interested in (numbers in other groups become 0). After filtering your numbers, it determines which is the largest, second largest, etc., based on how far down you have copied the formula. It then determines which row the nth-largest number occurs in through the MATCH function, and finally returns the name corresponding to that row with the INDEX function.
Building on all the other great answers.
Because you can have duplicate values in each group, we need to do this with two formulas.
First we need to get the numbers in order. I used AGGREGATE, but this could also be done with an array LARGE(IF()) formula:
=IFERROR(AGGREGATE(14,6,$B$2:$B$8/($C$2:$C$8=E$1),ROW(1:1)),"")
Then, using that number and its order, we can use a modified version of @ForwardEd's formula, with COUNTIF() to ensure we get the correct name in return.
=IFERROR(INDEX($A$2:$A$8,AGGREGATE(15,6,(ROW($B$2:$B$8)-ROW($B$2)+1)/(($C$2:$C$8=F$1)*($B$2:$B$8=E3)),COUNTIF(E$2:E2,E3)+1)),"")
This counts how many times that number has already appeared in the results above and then brings in the correct name.
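The two-step idea (sorted numbers first, then the occurrence count from COUNTIF to decide which of several equal numbers to use) can be sketched in Python with the adjusted example data. For contrast, naive_name shows what a plain MATCH on the number column does: it returns the first row holding that number in any group, which is the duplicate problem from the question's edit. Function names are hypothetical.

```python
DATA = [("a", 30, 1), ("b", 30, 2), ("c", 19, 3), ("d", 40, 2),
        ("e", 1, 1), ("f", 30, 2), ("g", 15, 3)]  # (name, number, group)


def naive_name(rows, number):
    """Like MATCH(number, B2:B8, 0): first row with that number, in any group."""
    return next(name for name, num, _ in rows if num == number)


def top_in_group(rows, group, n):
    """Top n (number, name) pairs within one group, duplicates handled."""
    members = [(name, num) for name, num, g in rows if g == group]
    # Step 1: the group's numbers, largest first (AGGREGATE(14,6,B/(C=group),k))
    numbers = sorted((num for _, num in members), reverse=True)[:n]
    result = []
    for i, num in enumerate(numbers):
        # How often this number already appeared above (COUNTIF(E$2:E2,E3))
        occurrence = numbers[:i].count(num)
        # Group rows holding this number, in sheet order (AGGREGATE(15,6,...))
        names = [name for name, value in members if value == num]
        result.append((num, names[occurrence]))
    return result


print(naive_name(DATA, 30))      # a, which belongs to group 1
print(top_in_group(DATA, 2, 3))  # [(40, 'd'), (30, 'b'), (30, 'f')]
```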
You could also solve this with array formulas - to filter a group whose name is stored in E1, your code
=INDEX($A$2:$A$8,MATCH(LARGE($B$2:$B$8,1),$B$2:$B$8,0))
would then be adapted to
=INDEX($A$2:$A$8,MATCH(LARGE(IF($C$2:$C$8<>E1,-1,$B$2:$B$8),1),$B$2:$B$8,0))
Note: after entering an array formula, you have to press CTRL+SHIFT+ENTER.
Thank you to everyone who offered help but for some reason none of your methods worked for me, which I am sure was to do with the quality of my data. I used an alternate method in the end which is slightly convoluted but seemed to work.
=IF($C2="1",RANK($B2,$B$2:$B$8,1)+ROW()/10000,-1)
Essentially using the rank function and adding a fraction to separate out duplicate values.
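A Python sketch of the helper-column idea, assuming the data sit in sheet rows 2 to 8 as in the example, so ROW() runs from 2 to 8. RANK(...,1) gives equal numbers the same rank, and adding ROW()/10000 makes each helper value distinct, so sorting (or LARGE) on the helper column no longer collides; among equal numbers, the row further down gets the larger helper value. The answer compares $C2 with the text "1"; the sketch uses plain equality, and helper_values is a hypothetical name.

```python
DATA = [("a", 30, 1), ("b", 30, 2), ("c", 19, 3), ("d", 40, 2),
        ("e", 1, 1), ("f", 30, 2), ("g", 15, 3)]  # (name, number, group), sheet rows 2-8


def helper_values(rows, group, first_row=2):
    """Mimic =IF($C2=group, RANK($B2,$B$2:$B$8,1)+ROW()/10000, -1) down the column."""
    numbers = [num for _, num, _ in rows]
    values = []
    for offset, (_, num, g) in enumerate(rows):
        if g != group:
            values.append(-1)
            continue
        rank_ascending = 1 + sum(other < num for other in numbers)  # RANK(..., 1)
        values.append(rank_ascending + (first_row + offset) / 10000)  # tie-breaker
    return values


helpers = helper_values(DATA, 2)
ranked = [name for h, (name, _, _) in sorted(zip(helpers, DATA), reverse=True)
          if h != -1]
print(ranked)  # ['d', 'f', 'b']
```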

Give warning when duplicate values are placed in different keywords

Let's say I have a sheet with multiple columns, but for this example I only show 2.
What I want to achieve is to give the user a warning when the same person gets placed in different teams. (Duplicate persons are allowed, but not in separate teams.)
At first I thought I'd filter so I can only see the duplicates and then check whether the same person gets placed in 2 different teams, but it turns out it's not possible to filter the duplicates. Then I thought of using conditional formatting and checking whether the cell's Interior.Color has changed, but I noticed it does not change! The solutions provided on Stack Overflow do not suffice for me. Nor do I want to use a pivot table or an extra column, since my sheet is already overcrowded.
Example:
Value A Value B
Tom Team 1
Ben Team 1
Tom Team 1 <- possible
Elle Team 2
Tom Team 2 <- not possible, give warning!
Rick Team 2
And the list goes on.
Does anyone know how I can give the user a warning when the same person is placed in different teams?
Or how to see the duplicate values in the sheet, or get them into a range in VBA?
Thanks!
Okay then, since you can't sort the table, it requires a rather more brutal or complicated approach.
1) First a simple, quick approach
You could still add a temporary column, number it (so the original order can be restored), sort the sheet based on the Value A field, then just iterate through the cells, looking for instances where
Cells(curRow, colNumOfValueA).Value = Cells(curRow - 1, colNumOfValueA).Value
When this happens, check that the value in column Value B is the same for both rows. If so, continue to the next row. If not, mark the row in some way; setting the interior colour is often a good way to do it. You can then sort the table by the added column, delete the column and then take action.
2) Second, a brute-force approach
Starting at the first row, get the values in Value A and Value B; call them, for example, findMeA and findMeB.
a) Starting at the following row, check whether Value A matches findMeA. If it doesn't, move on to the next row. If it does match, ensure that Value B matches findMeB too. If not, mark the row or add its index to an array for reporting when the sanity check is done.
b) Move to the next row. If it's empty, return to (2) with the next starting row. If it's not, return to (a).
As you can probably see, the brute-force approach gets exceedingly nasty CPU-wise as the number of rows goes up. With just 10 rows of data, it's 9 + 8 + 7 + 6 + 5 + 4 + 3 + 2 + 1 = 45 searches, unless you make a mistake of logic and check every row against every row, in which case just 10 rows would be 10 × 10 = 100 checks.
If, on the other hand, you sort the table first, you can get away with roughly N checks for N rows: 10 rows take 10 checks, 1000 rows take 1000 checks. Brute force on 1000 rows needs 999 + 998 + ... + 1 checks, which pairs up as (999 + 1) + (998 + 2) + ... + (501 + 499) + 500 = 499 × 1000 + 500 = 499,500, i.e. n(n - 1)/2 for n rows.
So the brute-force approach takes quadratically longer as the list grows, while the scan after sorting takes linear time: twice as big equals twice as long.
Sorted method: 10 items, 10 checks; 1000 items, 1000 checks.
Brute force: 10 items, 45 checks; 1000 items, 499,500 checks.
In other words, at 1000 rows brute force takes roughly 500 times as long as the sorted approach. If it were only a few items, I'd just suggest brute-forcing it. Nearly half a million checks in VBA doesn't seem so pleasant, though, so I'd sort the sheet, or at the very least make a copy, sort that and operate on it, before reporting the problems in the actual sheet.
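Here is a Python sketch comparing the two approaches on the example rows while counting comparisons. It sorts a list of row indices instead of the sheet, so the original order stays untouched, and it counts only comparisons, not the cost of the sort itself. The scan makes N - 1 comparisons for N rows, in line with the roughly-N figure above. The function names are hypothetical.

```python
def sorted_scan(rows):
    """Sort-first approach: returns (flagged row indices, comparisons made)."""
    order = sorted(range(len(rows)), key=lambda i: rows[i][0])  # stable sort by person
    flagged, comparisons = [], 0
    for prev, cur in zip(order, order[1:]):
        comparisons += 1
        same_person = rows[prev][0] == rows[cur][0]
        if same_person and rows[prev][1] != rows[cur][1]:
            flagged.append(cur)
    return flagged, comparisons


def brute_force(rows):
    """Compare every row with every row below it."""
    flagged, comparisons = set(), 0
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            comparisons += 1
            if rows[i][0] == rows[j][0] and rows[i][1] != rows[j][1]:
                flagged.add(j)
    return sorted(flagged), comparisons


rows = [("Tom", "Team 1"), ("Ben", "Team 1"), ("Tom", "Team 1"),
        ("Elle", "Team 2"), ("Tom", "Team 2"), ("Rick", "Team 2")]
print(sorted_scan(rows))  # ([4], 5): the "Tom, Team 2" row is flagged
print(brute_force(rows))  # ([4], 15)
```

For 10 rows brute_force makes 45 comparisons and for 1000 rows 499,500, matching n(n - 1)/2.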
