Creating a Results Grid from a Pandas DataFrame - python-3.x

I have the following table of results in a Pandas DataFrame. Each player has been assigned an ID number:
+----------------+----------------+-------------+-------------+
| Home Player ID | Away Player ID | Home Points | Away Points |
+----------------+----------------+-------------+-------------+
| 1 | 2 | 3 | 0 |
| 3 | 4 | 1 | 1 |
| 2 | 3 | 3 | 0 |
| 4 | 1 | 3 | 0 |
| 2 | 4 | 1 | 1 |
| 3 | 1 | 1 | 1 |
| 2 | 1 | 0 | 3 |
| 4 | 3 | 1 | 1 |
| 3 | 2 | 0 | 3 |
| 1 | 4 | 0 | 3 |
| 4 | 2 | 1 | 1 |
| 1 | 3 | 1 | 1 |
+----------------+----------------+-------------+-------------+
The aim is to create a 4x4 numpy matrix (dimensions equal to the number of players) and fill the matrix with the points they earned from games between the respective players.
The matrix should end up like this:
+--------+---+---+---+---+
| Matrix | 1 | 2 | 3 | 4 |
+--------+---+---+---+---+
| 1 | 0 | 3 | 1 | 0 |
| 2 | 0 | 0 | 3 | 1 |
| 3 | 1 | 0 | 0 | 1 |
| 4 | 3 | 1 | 1 | 0 |
+--------+---+---+---+---+
The left hand column is the ID number of the home players, with the column headers the IDs of the away players.
For example, when the Home Player ID = 1 and the Away Player ID = 2, Player 1 earned 3 points, so the entry for the Matrix(1,2) (or 0,1 because of the zero indexing) would equal 3.
I can just about manage to do this with two for loops, but it seems quite inefficient. Is there a better way to achieve this?
Would really appreciate any advice!

Use pivot_table with the home player as the index, the away player as the columns and the home points as the values:
In [217]: df.pivot_table(index='Home Player ID', columns='Away Player ID',
     ...:                values='Home Points', fill_value=0)
Out[217]:
Away Player ID  1  2  3  4
Home Player ID
1               0  3  1  0
2               0  0  3  1
3               1  0  0  1
4               3  1  1  0
Or use set_index and unstack:
In [221]: df.set_index(['Home Player ID', 'Away Player ID'])['Home Points'].unstack(fill_value=0)
Out[221]:
Away Player ID  1  2  3  4
Home Player ID
1               0  3  1  0
2               0  0  3  1
3               1  0  0  1
4               3  1  1  0
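A self-contained sketch that rebuilds the question's table and produces the requested NumPy array. It puts home players on the rows and away players on the columns and uses Home Points as the values, which is the orientation the question describes. The names grid and matrix are illustrative, and aggfunc="sum" is an assumption that is safe here only because every home/away pairing occurs exactly once.

```python
import numpy as np
import pandas as pd

# Results table from the question: one row per game.
df = pd.DataFrame({
    "Home Player ID": [1, 3, 2, 4, 2, 3, 2, 4, 3, 1, 4, 1],
    "Away Player ID": [2, 4, 3, 1, 4, 1, 1, 3, 2, 4, 2, 3],
    "Home Points":    [3, 1, 3, 3, 1, 1, 0, 1, 0, 0, 1, 1],
    "Away Points":    [0, 1, 0, 0, 1, 1, 3, 1, 3, 3, 1, 1],
})

# Rows = home player, columns = away player, cell = points the home player earned.
# Pairings that never occur (a player against themselves) are filled with 0.
grid = df.pivot_table(index="Home Player ID", columns="Away Player ID",
                      values="Home Points", aggfunc="sum", fill_value=0)

# Drop the labels to get the plain 4x4 array the question asks for.
matrix = grid.to_numpy(dtype=int)
print(matrix)
```

Because the labels are dropped, matrix[0, 1] is the entry for Home Player ID 1 against Away Player ID 2.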

Related

How to copy data within case-set in case control study using SPSS?

I'm doing a case-control study about ovarian cancer. I want to do stratified analyses for the different histotypes but haven't found a good way of doing it in SPSS. I was thinking about copying the information about the diagnoses from the cases to the controls, but I don't know the proper syntax to do it.
So - what I want to do is to find the diagnosis within the case-control pair, copy it, and paste it into the same variable for all the controls within that pair. Does anyone know a good way to do this?
ID = unique ID for the individual, casecontrol = 1 for case, 0 for control, caseset = stratum, ID for each matched group of individuals.
My dataset looks like this:
ID | casecontrol | caseset | diagnosis
1 | 1 | 1 | 1
2 | 0 | 1 | 0
3 | 0 | 1 | 0
4 | 0 | 1 | 0
5 | 1 | 2 | 3
6 | 0 | 2 | 0
7 | 0 | 2 | 0
8 | 0 | 2 | 0
And I want it to look like this:
ID | casecontrol | caseset | diagnosis
1 | 1 | 1 | 1
2 | 0 | 1 | 1
3 | 0 | 1 | 1
4 | 0 | 1 | 1
5 | 1 | 2 | 3
6 | 0 | 2 | 3
7 | 0 | 2 | 3
8 | 0 | 2 | 3
Thank you very much.
According to your example, each value of caseset has one line where diagnosis equals some positive number, and in the rest of the lines diagnosis equals zero (or is missing?).
If this is true, all you need to do is this:
aggregate out=* mode=add overwrite=yes /break=caseset /diagnosis=max(diagnosis).
The above command will overwrite the original data, so make sure you have that data backed up, or use a different name for the aggregated variable (e.g. /FullDiagnosis=max(diagnosis)).
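For readers who want to check the logic outside SPSS, here is a minimal Python sketch of the same idea: take the maximum diagnosis within each caseset and write it back to every row of that caseset. The data is the example from the question. The names rows, max_by_set and filled are illustrative, and the sketch assumes that controls are coded 0 rather than missing.

```python
# Example data from the question: one dict per individual.
rows = [
    {"ID": 1, "casecontrol": 1, "caseset": 1, "diagnosis": 1},
    {"ID": 2, "casecontrol": 0, "caseset": 1, "diagnosis": 0},
    {"ID": 3, "casecontrol": 0, "caseset": 1, "diagnosis": 0},
    {"ID": 4, "casecontrol": 0, "caseset": 1, "diagnosis": 0},
    {"ID": 5, "casecontrol": 1, "caseset": 2, "diagnosis": 3},
    {"ID": 6, "casecontrol": 0, "caseset": 2, "diagnosis": 0},
    {"ID": 7, "casecontrol": 0, "caseset": 2, "diagnosis": 0},
    {"ID": 8, "casecontrol": 0, "caseset": 2, "diagnosis": 0},
]

# First pass: largest diagnosis code seen in each caseset (the "break" group).
max_by_set = {}
for row in rows:
    key = row["caseset"]
    max_by_set[key] = max(max_by_set.get(key, row["diagnosis"]), row["diagnosis"])

# Second pass: copy that value onto every member of the caseset.
filled = [{**row, "diagnosis": max_by_set[row["caseset"]]} for row in rows]

for row in filled:
    print(row["ID"], row["casecontrol"], row["caseset"], row["diagnosis"])
```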

Graphical representation of a puzzle

Just for fun, I have written a solver for str8ts puzzles. While dealing with the REPL representation of a puzzle is okay for me, e.g.
STR8TS> (solve-puzzle #p"puzzles/2019-02-04-hard")
Initial puzzle:
-----------------------------------------------------
| -7 | -9 | 0 | 0 | 10 | 0 | 0 | 0 | 10 |
| 3 | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 10 |
| 0 | 0 | 10 | 0 | 0 | 10 | 10 | 0 | 0 |
| 0 | 1 | 0 | 10 | 10 | 0 | 0 | 5 | 0 |
| 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 |
| 0 | 0 | 0 | 0 | -6 | 10 | 0 | 9 | 0 |
| 0 | 0 | 10 | 10 | 0 | 0 | -2 | 0 | 0 |
| 10 | 0 | 9 | 0 | 0 | 5 | 0 | 0 | 0 |
| -4 | 0 | 0 | 0 | 10 | 0 | 0 | -1 | -3 |
-----------------------------------------------------
Final state:
-----------------------------------------------------
| -7 | -9 | 5 | 6 | 10 | 2 | 3 | 4 | 10 |
| 3 | 8 | 6 | 5 | 7 | 1 | 4 | 2 | 10 |
| 1 | 2 | 10 | 7 | 8 | 10 | 10 | 6 | 5 |
| 2 | 1 | 3 | 10 | 10 | 7 | 8 | 5 | 6 |
| 10 | 6 | 4 | 3 | 5 | 8 | 9 | 7 | 10 |
| 5 | 3 | 2 | 4 | -6 | 10 | 7 | 9 | 8 |
| 6 | 5 | 10 | 10 | 3 | 4 | -2 | 8 | 9 |
| 10 | 4 | 9 | 8 | 2 | 5 | 6 | 3 | 7 |
| -4 | 7 | 8 | 9 | 10 | 6 | 5 | -1 | -3 |
-----------------------------------------------------
Puzzle solved in 4.168 seconds.
I was wondering what could be a more elegant way to /draw/ the puzzle. The puzzle is stored in a two-dimensional array, and cells containing 10 or a negative number should be black fields.
Is there a library which allows for the generation of a simple png or svg file of the puzzle grid in b/w and the numbers as text?
I use Vecto for things like that. It's fairly low-level (kind of like writing PostScript code), but lets you draw stuff like the Movie Charts, so it's a matter of planning and practice to make what you like.

Assigning ranks to items that vary in order

I am trying to build a dataset from an online questionnaire. In this questionnaire, participants were asked to name 6 items. These items are represented with numbers from 1 to 6 (order of mention does not matter). Afterwards, participants were asked to rank those items from most important to least important (order here matters). Right now I have three columns: "Named items", "Items ranked" and "Rank". The last column represents the position at which each item was ranked. Thus, the idea is to take the number in the first column, "Named items", find its position in the second column, "Items ranked", and return that position in the corresponding row of the third column.
Since the numbers go from 1 to 6, the process has to start again every six rows (rows 1 to 6, then 7 to 12, and so on). I have a total of 186 participants, which means there's a total of 1116 items. What would be the most efficient way of doing this and preventing human error?
Here is an example of what the sheet looks like when done manually:
+----------------------+-----------------------------+------+
| Order of named items | Items ranked (# = Identity) | Rank |
+----------------------+-----------------------------+------+
| 1 | 2 | 4 |
| 2 | 5 | 1 |
| 3 | 6 | 6 |
| 4 | 1 | 5 |
| 5 | 4 | 2 |
| 6 | 3 | 3 |
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
| 6 | 6 | 6 |
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 3 | 3 |
| 4 | 4 | 4 |
| 5 | 5 | 5 |
| 6 | 6 | 6 |
| 1 | 5 | 3 |
| 2 | 6 | 4 |
| 3 | 1 | 5 |
| 4 | 2 | 6 |
| 5 | 3 | 1 |
| 6 | 4 | 2 |
| 1 | 2 | 2 |
| 2 | 1 | 1 |
| 3 | 6 | 4 |
| 4 | 3 | 5 |
| 5 | 4 | 6 |
| 6 | 5 | 3 |
+----------------------+-----------------------------+------+
You can use this non-volatile formula:
=MATCH(A2,INDEX(B:B,INT((ROW(1:1)-1)/6)*6+2):INDEX(B:B,INT((ROW(1:1)-1)/6)*6+7),0)
Alternatively, assuming the first column starts at A2 and the second column at B2, use this formula in C2 and copy it down:
=MATCH(A2,OFFSET(B$2,6*INT((ROWS(C$2:C2)-1)/6),0,6),0)
OFFSET returns the required 6-cell range, and MATCH finds the position of the relevant item within that range.
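The windowing that both formulas rely on can be sketched in Python as a cross-check of the spreadsheet results. This is only an illustration under stated assumptions: rank_positions and its block parameter are hypothetical names, and the sample lists are the first and fourth participants from the question's table.

```python
def rank_positions(named, ranked, block=6):
    """Return, for each named item, its 1-based position inside the
    matching block of `ranked` (the MATCH-within-a-window logic)."""
    if len(named) != len(ranked):
        raise ValueError("both columns must have the same length")
    ranks = []
    for i, item in enumerate(named):
        start = (i // block) * block          # first row of this participant
        window = ranked[start:start + block]  # that participant's ranked items
        # list.index raises ValueError if the item is absent,
        # where MATCH(..., 0) would return #N/A instead.
        ranks.append(window.index(item) + 1)
    return ranks


named = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
ranked = [2, 5, 6, 1, 4, 3, 5, 6, 1, 2, 3, 4]
print(rank_positions(named, ranked))
# [4, 1, 6, 5, 2, 3, 3, 4, 5, 6, 1, 2]
```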

Multiple Lookup Criteria

I have the data below in Excel. What I want is to return the number of inactive months and the inactive months themselves.
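+---------+-------------------------------+-----------+--------------------+-----------------+
|         | ACTIVITY MONTH                |           |                    |                 |
| User ID | Jan17 | Feb17 | Mar17 | Apr17 | Reg Month | No.Inactive months | Months Inactive |
+---------+-------+-------+-------+-------+-----------+--------------------+-----------------+
| 1       | 5     | 38    | 0     | 60    | Jan17     |                    |                 |
| 2       | 0     | 242   | 203   | 20    | Feb17     |                    |                 |
| 3       | 30    | 0     | 0     | 30    | Jan17     |                    |                 |
| 4       | 0     | 0     | 0     | 40    | Apr17     |                    |                 |
| 5       | 0     | 0     | 16    | 0     | Mar17     |                    |                 |
+---------+-------+-------+-------+-------+-----------+--------------------+-----------------+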
To count the inactive months you can use the following.
+---+------+--------+--------+--------+--------+--+-----------------+
| | A | B | C | D | E | F| G |
+---+------+--------+--------+--------+--------+--+-----------------+
| 1 | User | Jan 17 | feb-17 | mar-17 | apr-17 | | Inactive months |
| 2 | 1 | 5 | 38 | 0 | 60 | | 1 |
| 3 | 2 | 0 | 242 | 203 | 20 | | 1 |
| 4 | 3 | 30 | 0 | 0 | 30 | | 2 |
| 5 | 4 | 0 | 0 | 0 | 40 | | 3 |
| 6 | 5 | 0 | 0 | 16 | 0 | | 3 |
+---+------+--------+--------+--------+--------+--+-----------------+
where cell G2 contains the formula =COUNTIF(B2:E2,0), copied down for the other rows.
To show the list of inactive months it's a little bit harder.
The point is that you have to explain how you want to see these results.
The easiest way is to use conditional formatting and color the cells that contain zero (but this is not very useful). Another way would be to transpose the table and filter the columns for zero. Yet another would be to use a VBA macro.
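To make the intended result concrete, here is a small Python sketch (not an Excel solution) that counts the zero months per user, like the COUNTIF above, and also lists which months they are. The data is taken from the question. inactive_months and summary are illustrative names, and, like the COUNTIF, the sketch ignores the Reg Month column, which is an assumption about what the asker wants.

```python
months = ["Jan17", "Feb17", "Mar17", "Apr17"]

# Activity per user ID, one value per month, as in the question's table.
activity = {
    1: [5, 38, 0, 60],
    2: [0, 242, 203, 20],
    3: [30, 0, 0, 30],
    4: [0, 0, 0, 40],
    5: [0, 0, 16, 0],
}


def inactive_months(row, month_names):
    """Return the names of the months whose activity value is zero."""
    return [name for name, value in zip(month_names, row) if value == 0]


summary = {user: inactive_months(row, months) for user, row in activity.items()}

for user, names in summary.items():
    # The count matches COUNTIF(B2:E2,0); the joined names are the month list.
    print(user, len(names), ", ".join(names))
```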

How to get the cell with the highest number, work with it, get the next highest and so on in excel?

I'm trying to get a cell with value BBBBBBBGGGGGJJJJCCCCDDDDAA from these cells:
-----------------------------------------
| 2 | 7 | 4 | 4 | 0 | 0 | 5 | 0 | 0 | 4 |
-----------------------------------------
So it gets the highest value and writes the cell's column letter (which might have an offset) that many times. Then it gets the next highest and does the same thing until it reaches the zeroes. Is that possible in Excel?
Additional samples:
------------------------------------------------------------------------------------
| 2 | 0 | 0 | 3 | 0 | 0 | 5 | 0 | 0 | 0 | GGGGGDDDAA |
------------------------------------------------------------------------------------
| 0 | 0 | 2 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | GGGGGCC |
------------------------------------------------------------------------------------
| 0 | 7 | 2 | 2 | 4 | 3 | 3 | 0 | 0 | 0 | BBBBBBBEEEEFFFGGGCCDD |
------------------------------------------------------------------------------------
| 4 | 7 | 0 | 7 | 7 | 0 | 0 | 0 | 8 | 7 | IIIIIIIIBBBBBBBDDDDDDDEEEEEEEJJJJJJJAAAA |
------------------------------------------------------------------------------------
| 0 | 2 | 0 | 2 | 8 | 0 | 8 | 0 | 7 | 10| JJJJJJJJJJEEEEEEEEGGGGGGGGIIIIIIIBBDD |
------------------------------------------------------------------------------------
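The procedure described above can be written down as a short Python sketch, which may help when checking the result of any Excel formula or macro. The function name column_runs and its offset parameter are hypothetical. The sketch assumes that equal values are emitted left to right, which matches the additional samples (the first example instead lists J before C and D), and that the columns stay within A to Z.

```python
from string import ascii_uppercase


def column_runs(values, offset=0):
    """Repeat each column letter as many times as its value, highest value first."""
    # sorted() is stable, so columns with equal values keep their left-to-right order.
    order = sorted(range(len(values)), key=lambda i: -values[i])
    return "".join(
        ascii_uppercase[i + offset] * values[i]  # column letter, repeated value times
        for i in order
        if values[i] > 0                         # stop contributing at the zeroes
    )


print(column_runs([2, 0, 0, 3, 0, 0, 5, 0, 0, 0]))  # GGGGGDDDAA
print(column_runs([0, 0, 2, 0, 0, 0, 5, 0, 0, 0]))  # GGGGGCC
```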
