Best technique to convert to panel data - excel

I have some returns data on 1000+ firms that I want to convert into panel form.
From my understanding, it is neither truly wide nor long form (at least from the examples I've seen).
I have attached an example of the original data set and what I want it to look like. Is there a way to achieve this? I am intermediate with Excel/VBA and new to SAS/Stata, but I can use them and teach myself.

Consider this example using reshape in Stata:
clear *
input float(date FIRM_A FIRM_B FIRM_C FIRM_D)
1 .14304407 .8583148 .3699433 .7310092
2 .34405795 .9531917 .6376472 .2895169
3 .04766626 .6588161 .6988417 .5564945
4 .21615694 .18380463 .4781089 .3058527
5 .709911 .85116 .14080866 .10687433
6 .3805699 .070911616 .55129284 .8039169
7 .1680727 .7267236 .1779183 .51454383
8 .3610604 .1578059 .15383714 .9001798
9 .7081585 .9755411 .28951603 .20034006
10 .27780765 .8351805 .04982195 .3929535
end
reshape long FIRM_, i(date) j(Firm_ID) string
rename FIRM_ return
replace Firm_ID = "Firm " + Firm_ID
list in 1/8, sepby(date)
+---------------------------+
| date Firm_ID return |
|---------------------------|
1. | 1 Firm A .1430441 |
2. | 1 Firm B .8583148 |
3. | 1 Firm C .3699433 |
4. | 1 Firm D .7310092 |
|---------------------------|
5. | 2 Firm A .3440579 |
6. | 2 Firm B .9531917 |
7. | 2 Firm C .6376472 |
8. | 2 Firm D .2895169 |
+---------------------------+
See help reshape for more on the topic.

This can be done very easily with proc transpose in SAS. All you need to add is a column name for column A. This will be your by variable, so that the remaining variables are transposed within each date. Other than that, just make sure your data is sorted by the date column. The code would look similar to this:
proc sort data=have;
by date;
run;
proc transpose data=have out=want; /* you could add a name= or prefix= option here to rename your variables */
by date;
run;
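For readers who want to check the reshape outside Stata or SAS, the same wide-to-long step can be sketched in plain Python. This is only an illustration under stated assumptions: the function name wide_to_long and the two-firm sample are made up for the example and are not part of either answer.

```python
def wide_to_long(header, rows, id_col="date"):
    """Turn one row per date (one column per firm) into one row per (date, firm)."""
    id_idx = header.index(id_col)
    long_rows = []
    for row in rows:
        for col_idx, firm in enumerate(header):
            if col_idx == id_idx:
                continue  # the date column identifies the row, it is not a firm
            long_rows.append((row[id_idx], firm, row[col_idx]))
    return long_rows


# Hypothetical sample: the first two dates and two firms from the Stata example.
header = ["date", "FIRM_A", "FIRM_B"]
rows = [
    [1, 0.14304407, 0.8583148],
    [2, 0.34405795, 0.9531917],
]
panel = wide_to_long(header, rows)
for record in panel:
    print(record)
```

Each output record holds the date, the firm name and the return, which is the panel layout asked for in the question.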

Related

Excel - Check if Value of Two Columns are new in a List and Passing a new ID to them

Good Morning,
I'm trying to build something in Excel that checks whether the combination of values in two columns is new in a list and, if so, assigns a new ID to it. If it is not new, it should either be left blank or get the same ID that was assigned before (either way would work for me).
I tried something with COUNTIF, but it doesn't fit. After thinking about this for some time, I decided to ask for help.
What I want is a formula that produces the "Formula" column below:
Space|Name|*Formula*
1 | AB | 1
1 | AB | 1
1 | AB | 1
1 | CA | 2
2 | DD | 3
2 | EE | 4
2 | EE | 4
3 | SS | 5
3 | SS | 5
1 | ZZ | 6
1 | AB | 1
Sequential Numbering of Groups of Data
In cell C2 use the following array formula (Ctrl+Shift+Enter):
=IF(COUNTIFS(A$2:A2,A2,B$2:B2,B2)=1,MAX(C$1:C1)+1,
INDEX(C$1:C1,MATCH(1,(A$1:A1=A2)*(B$1:B1=B2),0)))
Then copy C2 and paste it down from C3 to the last cell.
If you're satisfied with just numbering each first occurrence then use the following formula:
=IF(COUNTIFS(A$2:A2,A2,B$2:B2,B2)=1,MAX(C$1:C1)+1,"")
Both solutions reference the header row (row 1), so the headers must not be numbers.
If you don't mind non-sequential numbering, you can just return the index of the first match found as your identifier:
Copy into C2, then fill down as necessary. The end row of the match range may need adjusting based on how much data you have:
=MATCH(A2&"#"&B2, A$2:A$100&"#"&B$2:B$100,0)
Or as an array formula (it only needs to be placed in C2):
=MATCH(A2:A11&"#"&B2:B11, A2:A11&"#"&B2:B11,0)
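The logic behind the sequential-numbering formulas can be sketched in Python: remember the ID given to each (Space, Name) pair the first time it appears and reuse it afterwards. The function name assign_group_ids is an assumption made for this sketch; the data is the sample from the question.

```python
def assign_group_ids(pairs):
    """Give each distinct pair the next sequential ID on first sight, reuse it later."""
    ids = {}
    result = []
    for pair in pairs:
        if pair not in ids:
            ids[pair] = len(ids) + 1  # same role as MAX(C$1:C1)+1 in the formula
        result.append(ids[pair])
    return result


pairs = [
    (1, "AB"), (1, "AB"), (1, "AB"), (1, "CA"), (2, "DD"), (2, "EE"),
    (2, "EE"), (3, "SS"), (3, "SS"), (1, "ZZ"), (1, "AB"),
]
group_ids = assign_group_ids(pairs)
print(group_ids)  # [1, 1, 1, 2, 3, 4, 4, 5, 5, 6, 1]
```

The last row gets ID 1 again because the pair (1, "AB") was already seen, which matches the expected "Formula" column.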

Split columns by groups in Excel

I have a list of groups, subgroups and items in one single column that I would like to split into three columns, as in the example below:
Class, order and family | Quant.
1. Mammals | 10
1.1 Primates | 6
1.1.1 Lemuridae | 4
1.1.2 Lorisidae | 2
1.2 Carnivora | 4
1.2.1 Felidae | 3
1.2.2 Hyaenidae | 1
I would like to split in columns following the number order, like that:
Class | Order | Family | Quant.
1. Mammals | 1.1 Primates | 1.1.1 Lemuridae | 4
1. Mammals | 1.1 Primates | 1.1.2 Lorisidae | 2
1. Mammals | 1.2 Carnivora| 1.2.1 Felidae | 3
1. Mammals | 1.2 Carnivora| 1.2.2 Hyaenidae | 1
I already separated the numbers from the text using the RIGHT function, but I do not know what to do next.
Edit for multi digit indexes.
Assuming the source table is in A1:B15 and the result table in E1:H15, the following formulas work:
for Class (E2)
=IFERROR(INDEX($A$1:$A$15,AGGREGATE(15,6,1/(((LEFT(F2,FIND(".",F2)) & " ") = LEFT($A$1:$A$15,FIND(".",F2)+1)))*ROW($A$1:$A$15),1)),"")
for Order (F2)
=IFERROR(INDEX($A$1:$A$15,AGGREGATE(15,6,1/(((LEFT(G2,FIND(CHAR(1),SUBSTITUTE(G2,".", CHAR(1),2))-1) & " ")=LEFT($A$1:$A$15,FIND(CHAR(1), SUBSTITUTE(G2,".",CHAR(1),2)))))*ROW($A$1:$A$15),1)),"")
for Family (G2)
=IFERROR(INDEX($A$1:$A$15,AGGREGATE(15,6,1/((LEN($A$1:$A$15)-LEN(SUBSTITUTE($A$1:$A$15,".",""))=2)*(COUNTIF($G$1:G1,$A$1:$A$15)=0))*ROW($A$1:$A$15),1)),"")
for Quantity (H2)
=IFERROR(VLOOKUP(G2,$A$1:$B$15,2),"")
The simplest way I can think of: if the data above is in columns A:B, enter the following in columns D:G (headers in the first row):
D: - =IF(SEARCH(" ",A2,1)=3,A2,D1)
E: - =IF(SEARCH(" ",A4,1)=4,A4,E3)
F: =IF(SEARCH(" ",A4,1)=6,A4,F3)
You will then have to delete the first couple of rows for each change in Class/Order.
G: set this equal to the Quantity.
Hope this helps.
Note that this approach only works if no index number exceeds a single digit. If any does, let me know and I will make it a bit more intelligent :)
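The idea shared by both answers (carry the most recent class and order down to each family row) can be sketched in Python. This is an illustrative sketch, not a translation of either formula set: the function name split_hierarchy is hypothetical, and the level is taken from the number of parts in the dotted index, so multi-digit indexes are handled too.

```python
def split_hierarchy(rows):
    """Expand (label, quantity) rows into (class, order, family, quantity) rows."""
    current = {}  # most recent label seen at each depth
    out = []
    for label, quantity in rows:
        index = label.split(" ", 1)[0]                      # e.g. "1.2.1"
        depth = len([part for part in index.split(".") if part])
        current[depth] = label
        if depth == 3:                                      # family rows carry the data
            out.append((current[1], current[2], label, quantity))
    return out


rows = [
    ("1. Mammals", 10), ("1.1 Primates", 6), ("1.1.1 Lemuridae", 4),
    ("1.1.2 Lorisidae", 2), ("1.2 Carnivora", 4), ("1.2.1 Felidae", 3),
    ("1.2.2 Hyaenidae", 1),
]
result = split_hierarchy(rows)
for line in result:
    print(" | ".join(str(field) for field in line))
```

The sketch assumes every family row is preceded by its class and order rows, as in the sample data.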

Need some kind of pivot table without aggregation in Excel

I'm stuck on the problem of getting reports from a table that looks like this:
C1| C2 | C3 | C4
A | 2015-05-15 | 34 | 4
A | 2015-03-12 | -4 | 5
A | 2014-03-12 | 24 | 8
B | 2015-11-10 | -4 | 5
B | 2015-06-12 | 3 | 5
C | 2013-05-12 | 3 | 5
...
600+ rows
...
So I need to make a chart from the different value columns (C3 and C4), grouped by the values in the first column. Usually this is achieved with a separate table that looks like this (e.g. for C3):
A | B | C | ....
34 | -4 | 3 | ....
-4 | 3 | | ....
24 | | | ....
For C4, I need a table with the same layout. In short, I need some kind of pivot table without aggregation of the values. Is it possible to build such a small table from the original table? If you can suggest another layout for the original data that would be more suitable for this task (and easier, ideally using standard Excel functions), feel free to do so. I can re-save it with a Python script.
I found a simple solution: just download the Tableau tool. It lets you create very flexible graphs from raw data and freely place any column as a row and vice versa; grouping is also available.
First, I will assume that C1,C2,C3,C4 are column headings in A1:D1 and the data is in A2:D1000
In F1:H1 place C3,C3,C3
In F2:H2 place A,B,C
In J1:L1 place C4,C4,C4
In J2:L2 place A,B,C
In F3 place the following array formula:
=IFERROR(INDEX(OFFSET($A$2:$A$1000,0,MATCH(F$1,$A$1:$D$1,0)-1),SMALL(IF($A$2:$A$1000=F$2,ROW($A$2:$A$1000)-ROW(A$2)+1),ROWS(F$2:F2))),"")
Don't forget to use Ctrl+Shift+Enter to make it an array formula instead of just using Enter for a normal formula.
Copy F3 to F4:F7, G3:H7 and J3:L7
You end up with something that looks like this:
    | A   B           C   D   E  F   G   H   I  J   K   L
----|-----------------------------------------------------
1   | C1  C2          C3  C4     C3  C3  C3     C4  C4  C4
2   | A   15/05/2015  34  4      A   B   C      A   B   C
3   | A   12/03/2015  -4  5      34  -4  3      4   5   5
4   | A   12/03/2014  24  8      -4  3          5   5
5   | B   10/11/2015  -4  5      24             8
6   | B   12/06/2015  3   5
7   | C   12/05/2013  3   5
Decomposing the formula in F3:
=IFERROR(INDEX(source column,SMALL(filtered row numbers,ranking)),"")
Where:
source column is OFFSET($A$2:$A$1000,0,MATCH(F$1,$A$1:$D$1,0)-1)
filtered row numbers is IF($A$2:$A$1000=F$2,row numbers)
row numbers is ROW($A$2:$A$1000)-ROW(A$2)+1
ranking is ROWS(F$2:F2)
How the formula works is a multipart explanation:
IFERROR(formula,"") If the embedded formula produces an error then we display an empty cell instead. This is useful since we don't know how many results we will get.
INDEX(range,row) Get the value in the row we want to see. The row number here is dynamically generated based on whether the data matches the criteria.
SMALL(array,k) Extract the kth smallest value in the array. The array is the filtered values. k is just a number depending on which row in F we put the formula in.
IF(criteria,value) Based on the criteria either produces the value or FALSE.
ROW(cell) The row number of the cell
ROWS(range) The size of the range
So in the array formula:
ROW($A$2:$A$1000)-ROW(A$2)+1 is a list of all the row numbers in the range starting at 1
IF($A$2:$A$1000=F$2,ROW($A$2:$A$1000)-ROW(A$2)+1) is a list of just those row numbers that match the criteria
SMALL takes the filtered list and returns the kth smallest filtered row number which means that F3 gets the first one, F4 gets the second one, etc
INDEX grabs the data at that row number and displays it
I got the idea for this from http://www.exceltactics.com/make-filtered-list-sub-arrays-excel-using-small/ and on that page is a really good explanation of how it all works.
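As a cross-check of what the array formula produces, here is a rough Python sketch of the same "pivot without aggregation": collect the values of one column per group, then lay the groups side by side and pad the shorter ones with blanks. The function name unstack_by_group is an assumption for this sketch; the rows are the six sample rows from the question.

```python
from itertools import zip_longest


def unstack_by_group(rows, value_idx):
    """List each group's values in its own column, in original row order."""
    columns = {}
    for row in rows:
        columns.setdefault(row[0], []).append(row[value_idx])
    groups = list(columns)
    # Pad shorter groups with "" so every output row has one cell per group.
    table = [list(line) for line in zip_longest(*columns.values(), fillvalue="")]
    return groups, table


rows = [
    ("A", "2015-05-15", 34, 4),
    ("A", "2015-03-12", -4, 5),
    ("A", "2014-03-12", 24, 8),
    ("B", "2015-11-10", -4, 5),
    ("B", "2015-06-12", 3, 5),
    ("C", "2013-05-12", 3, 5),
]
groups, c3_table = unstack_by_group(rows, 2)   # column C3
_, c4_table = unstack_by_group(rows, 3)        # column C4
print(groups)
for line in c3_table:
    print(line)
```

The padding with empty strings plays the same role as IFERROR(...,"") in the formula: groups with fewer rows simply show blank cells.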

Display Value in cell Y based on greater than, less than of cell X

Here's the scenario. I have a large spreadsheet of candidates for NHS at my school that are given a score by several teachers, community members, etc. I average out their score and then based on that number they are given a score/value from a rubric. I am looking for a formula that will read the value of cell X (their average score) and display a specific value in cell Y(their rubric score). The following is the criteria:
value<2.0, display 0
value>2.0 value<3.0, display 1
value>3.0 value<3.5, display 2
value>3.5 value<3.75, display 3
value>3.75, display 4
I tried looking this up and the closest I found was a formula that I modified to look like this:
=IF(I10="AVERAGE_CHARACTER",IF(I10<2,0,IF(AND(I10>2,I11<3),1,IF(AND(I10>3,I11<3.5),2,IF(AND(I10>3.5,I11<3,75),3,IF(I11>3.75,4,0))))))
All it says is FALSE in the cell. Not sure if I'm using the wrong formula or have a typo in the formula. Thoughts? If there is an alternate or easier method, I'm open for suggestions.
Thanks!
source: http://www.excelforum.com/excel-formulas-and-functions/575953-greater-than-x-but-less-than-y.html
It's easy if you keep the thresholds and the rubric in separate arrays:
=LOOKUP(A1,{0,2,3,3.5,3.75},{0,1,2,3,4})
You might use something like this (with the value to look up in A1):
=VLOOKUP(A1,{0,0;2,1;3,2;3.5,3;3.75,4},2)
or use a table like this (with the value to look up in C1):
| A | B |
1 | 0 | 0 |
2 | 2 | 1 |
3 | 3 | 2 |
4 | 3.5 | 3 |
5 | 3.75 | 4 |
=VLOOKUP(C1,A1:B5,2)
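The approximate-match lookups above all return the score of the largest threshold that does not exceed the value. A minimal Python sketch of that behaviour, using the standard bisect module, is shown below. The names THRESHOLDS, SCORES and rubric_score are assumptions for the example. Note that a value exactly on a threshold (e.g. 2.0) gets the higher score, a case the criteria in the question leave open.

```python
from bisect import bisect_right

THRESHOLDS = [0, 2, 3, 3.5, 3.75]   # lower bound of each band, ascending
SCORES = [0, 1, 2, 3, 4]            # rubric score for each band


def rubric_score(average):
    """Return the score of the largest threshold that is <= average."""
    position = bisect_right(THRESHOLDS, average) - 1
    if position < 0:
        # Below the lowest threshold; the Excel lookups return #N/A here.
        raise ValueError("average is below the lowest threshold")
    return SCORES[position]


for value in (1.9, 2.0, 3.2, 3.5, 3.9):
    print(value, rubric_score(value))
```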

Excel formula to get ranking position

I have a table of people with points. The more points, the higher your position. If you have the same points you are equal first, second etc.
| A | B | C
1 | name | position | points
2 | person1 | 1 | 10
3 | person2 | 2 | 9
4 | person3 | 2 | 9
5 | person4 | 2 | 9
6 | person5 | 5 | 8
7 | person6 | 6 | 7
Using an Excel formula, how can I automatically determine the position? I'm currently using an IF statement that works fine for 5 or 6 matching positions, but I can't add 30+ if statements because there's a limit to the formula.
=IF(C7=C2,B2,IF(C7=C3,B2+5,IF(C7=C4,B3+4,....
So if the points column is the same as the position above then it's the same position value. If the points are less than above then it drops a position so the previous row position +1. But if the row above that is the same then it's the previous position +2 and so on.
You could also use the RANK function
=RANK(C2,$C$2:$C$7,0)
It would return data like your example:
| A | B | C
1 | name | position | points
2 | person1 | 1 | 10
3 | person2 | 2 | 9
4 | person3 | 2 | 9
5 | person4 | 2 | 9
6 | person5 | 5 | 8
7 | person6 | 6 | 7
The 'Points' column needs to be sorted in descending order.
Type this into B3 (with B2 set to 1), then fill it down to the rest of the rows:
=IF(C3=C2,B2,B2+COUNTIF($C$1:$C3,C2))
What it does is:
If my points equals the previous points, I have the same position.
Otherwise, count the players with the same score as the previous one and add that count to the previous player's position.
You can use the RANK function in Excel without necessarily sorting the data. Type =RANK(C2,$C$2:$C$7). Excel will find the relative position of the value in C2 and display the answer. Copy the formula down through row 7 by dragging the fill handle at the bottom-right corner of the cell.
Try this in your fourth column
=COUNTIF(B:B; ">" & B2) + 1
Replace B2 with B3 for the next row, and so on.
What this does is count how many records have more points than the current one and then add 1 to get the current record's position.
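The counting rule used by this COUNTIF formula (position = 1 + number of entries with more points) is easy to verify with a short Python sketch. The function name competition_rank is an assumption for the example; the first list is the points column from the question.

```python
def competition_rank(points):
    """Position of each entry: 1 plus the number of entries with more points."""
    return [1 + sum(1 for other in points if other > p) for p in points]


points = [10, 9, 9, 9, 8, 7]
positions = competition_rank(points)
print(positions)  # [1, 2, 2, 2, 5, 6]

# The rule does not depend on the data being sorted.
unsorted_positions = competition_rank([9, 10, 8, 9])
print(unsorted_positions)  # [2, 1, 4, 2]
```

Ties share a position and the next position is skipped accordingly, which is the behaviour shown in the question's table.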
If your C-column is sorted, you can check whether the current row is equal to your last row. If not, use the current row number as the ranking-position, otherwise use the value from above (value for b3):
=IF(C3=C2, B2, ROW()-1)
You can use the LARGE function to get the n-th highest value in case your C-column is not sorted:
=LARGE(C2:C7,3)
The way I've done this, which is a bit convoluted, is as follows:
Sort rows by the points in descending order
Create an additional column (D) starting at D2 with numbers 1,2,3,... total number of positions
In the cell for the actual positions (B2) use the formula =IF(C2=C1, B1, D2). This checks whether the points in this row are the same as the points in the previous row. If they are, it gives you the position of the previous row; otherwise it uses the value from column D, and thus handles people with equal positions.
Copy this formula down the entire column
Copy the positions column (B), then paste special >> values to overwrite the formulas with the positions
Re-sort the rows to their original order
That's worked for me! If there's a better way I'd love to know it!

Resources