I have a dataset that I am restructuring for a time series regression, because time is not currently stored in a usable format. The existing data look like this:
--------------------------------------------------
| id|size |2017price|2016price|2015price|2014price| ...
-------------------------------------------------
| 1 | 3 | 50 | 80 | 21 | 56 | ...
--------------------------------------------------
| 2 | 5 | 78 | 85 | 54 | 67 | ...
--------------------------------------------------
| 3 | 2 | 18 | 22 | 34 | 54 | ...
--------------------------------------------------
...
...
...
I would like to add a time variable that takes one value per year, with the corresponding value stored in a single price variable:
---------------------------
| id | size |t | price|
--------------------------
| 1 | 3 |2017| 50 |
--------------------------
| 1 | 3 |2016| 80 |
--------------------------
| 1 | 3 |2015| 21 |
--------------------------
| 1 | 3 |2014| 56 |
--------------------------
| 2 | 5 |2017| 78 |
--------------------------
| 2 | 5 |2016| 85 |
--------------------------
| 2 | 5 |2015| 54 |
--------------------------
| 2 | 5 |2014| 67 |
--------------------------
| 3 | 2 |2017| 18 |
--------------------------
| 3 | 2 |2016| 22 |
--------------------------
| 3 | 2 |2015| 34 |
--------------------------
| 3 | 2 |2014| 54 |
--------------------------
...
...
...
Is there a function in Stata or Excel that can do this automatically? I have data for 20 years with over 35,000 entries so manually editing won't work.
Your data example as given is not quite suitable as Stata data as variable names cannot begin with numeric characters.
That fixed, this is an exercise for the reshape command (not function).
clear
input id size price2017 price2016 price2015 price2014
1 3 50 80 21 56
2 5 78 85 54 67
3 2 18 22 34 54
end
reshape long price, i(id size) j(year)
sort id size year
list , sepby(id)
+--------------------------+
| id size year price |
|--------------------------|
1. | 1 3 2014 56 |
2. | 1 3 2015 21 |
3. | 1 3 2016 80 |
4. | 1 3 2017 50 |
|--------------------------|
5. | 2 5 2014 67 |
6. | 2 5 2015 54 |
7. | 2 5 2016 85 |
8. | 2 5 2017 78 |
|--------------------------|
9. | 3 2 2014 54 |
10. | 3 2 2015 34 |
11. | 3 2 2016 22 |
12. | 3 2 2017 18 |
+--------------------------+
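For readers who want to see what the reshape does row by row, here is a minimal sketch of the same wide-to-long step in plain Python. It is not the Stata implementation; the function name `reshape_long` and its parameters are made up for illustration, and the data are the three example rows with the columns renamed to price<year> as above.

```python
# Wide rows from the example, with columns renamed to price<year>.
wide = [
    {"id": 1, "size": 3, "price2017": 50, "price2016": 80, "price2015": 21, "price2014": 56},
    {"id": 2, "size": 5, "price2017": 78, "price2016": 85, "price2015": 54, "price2014": 67},
    {"id": 3, "size": 2, "price2017": 18, "price2016": 22, "price2015": 34, "price2014": 54},
]


def reshape_long(rows, stub="price", id_keys=("id", "size")):
    """Hypothetical helper: turn every <stub><year> column into its own row."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            suffix = key[len(stub):]
            if key.startswith(stub) and suffix.isdigit():
                long_row = {k: row[k] for k in id_keys}  # copy the identifiers
                long_row["year"] = int(suffix)           # numeric suffix becomes the time variable
                long_row[stub] = value
                long_rows.append(long_row)
    # Same ordering as `sort id size year` in the Stata code.
    long_rows.sort(key=lambda r: (r["id"], r["size"], r["year"]))
    return long_rows


panel = reshape_long(wide)
for r in panel:
    print(r["id"], r["size"], r["year"], r["price"])
```

Each wide row contributes one long row per year column, so 3 ids times 4 years gives the 12 rows listed above.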
I want to join 2 tables. I know I can do it with power query but as I am on Macbook I can't do it, unfortunately. Does anyone have any suggestions? (I would love to try this in VBA would that be possible?) I've created Pivot Tables before using VBA but never joining 2 tables. My goal is to create a Pivot Table from the resulting table (resulting table being after combining Table 1 and Table 2).
Table 1
Foreign Keys: Division and Location
Division | Year | Week | Location | SchedDept | PlanNetSales | ActNetSales | AreaCategory
----------|------|------|----------|-----------|--------------|-------------|--------------
5 | 2018 | 10 | 520 | 541 | 1943.2 | 2271.115 | Non-Comm
5 | 2018 | 10 | 520 | 608 | 4378.4 | 5117.255 | Non-Comm
5 | 2018 | 10 | 520 | 1059 | 1044.8 | 1221.11 | Comm
5 | 2018 | 10 | 520 | 1126 | 6308 | 7372.475 | Non-Comm
5 | 2018 | 10 | 520 | 1605 | 1119.2 | 1308.065 | Non-Comm
5 | 2018 | 10 | 520 | 151 | 2995.2 | 3500.64 | Non-Comm
5 | 2018 | 10 | 520 | 1637 | 6371.2 | 7446.34 | Non-Comm
5 | 2018 | 10 | 520 | 3081 | 1203.2 | 1406.24 | Non-Comm
5 | 2018 | 10 | 520 | 6645 | 7350.4 | 8590.78 | Vendor Paid
5 | 2018 | 10 | 520 | 452 | 1676.8 | 1959.76 | Non-Comm
5 | 2018 | 10 | 520 | 527 | 7392 | 8639.4 | Non-Comm
5 | 2018 | 10 | 520 | 542 | 6824.8 | 7976.485 | Non-Comm
5 | 2018 | 10 | 520 | 824 | 1872.8 | 2188.835 | Non-Comm
5 | 2018 | 10 | 520 | 1201 | 6397.6 | 7477.195 | Non-Comm
5 | 2018 | 10 | 520 | 1277 | 2517.6 | 2942.445 | Non-Comm
5 | 2018 | 10 | 520 | 1607 | 2196.8 | 2567.51 | Vendor Paid
5 | 2018 | 10 | 520 | 104 | 3276.8 | 3829.76 | Non-Comm
Table 2
Foreign Keys: Division and Location
Division | Location | LocationName | Region | RegionName | District | DistrictName
----------|----------|--------------|--------|------------|----------|--------------
5 | 520 | Location 520 | 1 | Region 1 | 1 | District 1
5 | 584 | Location 584 | 1 | Region 1 | 1 | District 1
5 | 492 | Location 492 | 1 | Region 1 | 2 | District 2
5 | 215 | Location 215 | 1 | Region 1 | 3 | District 3
5 | 649 | Location 649 | 1 | Region 1 | 4 | District 4
5 | 674 | Location 674 | 1 | Region 1 | 1 | District 1
5 | 139 | Location 139 | 1 | Region 1 | 1 | District 1
5 | 539 | Location 539 | 1 | Region 1 | 5 | District 5
5 | 489 | Location 489 | 1 | Region 1 | 5 | District 5
5 | 139 | Location 139 | 1 | Region 1 | 1 | District 1
5 | 161 | Location 161 | 1 | Region 1 | 6 | District 6
5 | 543 | Location 543 | 1 | Region 1 | 4 | District 4
5 | 166 | Location 166 | 1 | Region 1 | 6 | District 6
5 | 71 | Location 71 | 1 | Region 1 | 5 | District 5
5 | 618 | Location 618 | 1 | Region 1 | 5 | District 5
I did it with index match but it is super slow. Here's a screenshot.
I tried it with the above and then again with the Table Name and Column Names.
=INDEX(LocTable[[#Headers],[Region]], MATCH(MetricsTable[[#Headers],[Division]]&MetricsTable[[#Headers],[Location]],LocTable[[#Headers],[Division]]&LocTable[[#Headers],[Location]],0))
However, the above creates an array formula and fails with "multi-cell array formulas are not allowed in tables". Is the only solution to revert to non-tables so I can run my formula and just deal with the slowness, or is there an option in VBA etc.? Thanks in advance!
I have a supermarket shopping dataset with two columns: Id, Products.
Id is the unique ID of the customer; Products contains the items that customer has bought.
Table looks like this:
| S.No | ID | Products |
|----- |-------------|--------------------|
| 1 | 23 | 4,5,6 |
| 2 | 21 | 21,11 |
| 3 | 21 | 11,21,23,18,17 |
| 4 | 125 | 21,22 |
| 5 | 23 | 4,5,8 |
Now I want to identify, for each product, the customer who bought it most often, like this:
| Product | highestshopper |
| 4 | 23 |
| 11 | 21 |
| 21 | 21 |
Using get_dummies with sum before idxmax
df.set_index('ID').Products.str.get_dummies(',').groupby(level=0).sum().idxmax()
Out[145]:
11 21
17 21
18 21
21 21
22 125
23 21
4 23
5 23
6 23
8 23
dtype: int64
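To make the three steps explicit (one 0/1 indicator per product and row, a sum per customer, then the customer with the largest sum), here is a rough standard-library sketch of the same logic. It is an illustration rather than what pandas does internally; the names `rows`, `counts` and `top_shopper` are made up, and ties are broken by first appearance here, which may differ from idxmax.

```python
from collections import Counter, defaultdict

# (ID, Products) pairs from the example table.
rows = [
    (23, "4,5,6"),
    (21, "21,11"),
    (21, "11,21,23,18,17"),
    (125, "21,22"),
    (23, "4,5,8"),
]

# product -> Counter mapping customer ID to the number of rows containing it
counts = defaultdict(Counter)
for customer_id, products in rows:
    # set() mirrors get_dummies: a product counts at most once per row
    for product in set(products.split(",")):
        counts[product][customer_id] += 1

# For each product keep the customer with the highest count.
top_shopper = {product: c.most_common(1)[0][0] for product, c in counts.items()}

for product in sorted(top_shopper):  # string sort, like the pandas column labels
    print(product, top_shopper[product])
```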
I am trying to write a formula in Excel that will take values from a table based on a few criteria. The table looks like this:
+------------+----+----+----+-----+----+----+----+----+
| Month/Hour | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
+------------+----+----+----+-----+----+----+----+----+
| 1 | 93 | 21 | 32 | 42 | 55 | 30 | 98 | 97 |
| 2 | 41 | 73 | 25 | 58 | 48 | 10 | 19 | 20 |
| 3 | 87 | 17 | 92 | 100 | 9 | 31 | 22 | 22 |
| 4 | 92 | 91 | 63 | 28 | 95 | 58 | 55 | 47 |
| 5 | 56 | 34 | 69 | 60 | 87 | 64 | 40 | 53 |
| 6 | 8 | 64 | 92 | 30 | 48 | 27 | 52 | 65 |
| 7 | 77 | 6 | 27 | 45 | 29 | 91 | 24 | 90 |
| 8 | 13 | 14 | 31 | 10 | 40 | 49 | 22 | 57 |
| 9 | 70 | 16 | 55 | 95 | 85 | 41 | 65 | 17 |
| 10 | 72 | 79 | 81 | 48 | 13 | 40 | 99 | 52 |
| 11 | 76 | 88 | 80 | 45 | 97 | 68 | 64 | 62 |
| 12 | 73 | 82 | 97 | 74 | 93 | 7 | 6 | 71 |
+------------+----+----+----+-----+----+----+----+----+
In the first column we have months (1 - January, 2 - February, etc.) In the first row we have hours from 1 to 8.
Now I need a formula that for months <3,8> would sum values from hours <2,6> and for the rest of months (1-2, 9-12) would sum values from hours <4,7>. I've tried using Sumifs formula but it seems like it works only for one column at a time.
If a formula is not viable, I have some VBA knowledge, so such a solution would be welcome too.
Assuming months are in A2:A13 and the hours in B1:I1 you can use this formula
=SUMPRODUCT(ISNUMBER(MATCH(A2:A13,{3,8},0)*MATCH(B1:I1,{2,6},0))+0,B2:I13)
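Vary the array constants as required for any combination of months/hours. Each constant lists individual values, so {3,8} matches only months 3 and 8; to cover months 3 through 8, write {3,4,5,6,7,8}.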
List the desired months in column L and the desired hours in column M, then use MATCH inside a SUMPRODUCT:
=SUMPRODUCT(B2:I13,ISNUMBER(MATCH(B1:I1,M:M,0))*ISNUMBER(MATCH(A2:A13,L:L,0)))
Neither the Months nor the Hours desired needs to be consecutive.
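Both formulas work the same way: a cell is counted only when its month is in the month list and its hour is in the hour list. The sketch below mimics that in plain Python; `masked_sum` is a hypothetical helper, not an Excel function, and the second call assumes the question's <3,8> and <2,6> are meant as inclusive ranges.

```python
# Rows are months 1..12, columns are hours 1..8, as in the question's table.
table = {
    1: [93, 21, 32, 42, 55, 30, 98, 97],
    2: [41, 73, 25, 58, 48, 10, 19, 20],
    3: [87, 17, 92, 100, 9, 31, 22, 22],
    4: [92, 91, 63, 28, 95, 58, 55, 47],
    5: [56, 34, 69, 60, 87, 64, 40, 53],
    6: [8, 64, 92, 30, 48, 27, 52, 65],
    7: [77, 6, 27, 45, 29, 91, 24, 90],
    8: [13, 14, 31, 10, 40, 49, 22, 57],
    9: [70, 16, 55, 95, 85, 41, 65, 17],
    10: [72, 79, 81, 48, 13, 40, 99, 52],
    11: [76, 88, 80, 45, 97, 68, 64, 62],
    12: [73, 82, 97, 74, 93, 7, 6, 71],
}


def masked_sum(data, months, hours):
    """Sum the cells whose month is in `months` and whose hour is in `hours`."""
    return sum(
        value
        for month, row in data.items() if month in months
        for hour, value in enumerate(row, start=1) if hour in hours
    )


# Literal lists, as in the array constants {3,8} and {2,6}.
listed = masked_sum(table, {3, 8}, {2, 6})

# Range reading of the question: months 3-8 use hours 2-6, the rest use hours 4-7.
summer = set(range(3, 9))
ranged = (masked_sum(table, summer, set(range(2, 7)))
          + masked_sum(table, set(table) - summer, set(range(4, 8))))
print(listed, ranged)
```

In Excel the range reading would be two SUMPRODUCT terms added together, one per month group.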
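I have the data below in Excel. I want to return the number of inactive months and the inactive months themselves.
          ACTIVITY MONTH
User ID | Jan17 | Feb17 | Mar17 | Apr17 | Reg Month | No. Inactive months | Months Inactive
--------|-------|-------|-------|-------|-----------|---------------------|----------------
1       | 5     | 38    | 0     | 60    | Jan17     |                     |
2       | 0     | 242   | 203   | 20    | Feb17     |                     |
3       | 30    | 0     | 0     | 30    | Jan17     |                     |
4       | 0     | 0     | 0     | 40    | Apr17     |                     |
5       | 0     | 0     | 16    | 0     | Mar17     |                     |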
To count the inactive months you can use the following.
+---+------+--------+--------+--------+--------+--+-----------------+
| | A | B | C | D | E | F| G |
+---+------+--------+--------+--------+--------+--+-----------------+
| 1 | User | Jan 17 | feb-17 | mar-17 | apr-17 | | Inactive months |
| 2 | 1 | 5 | 38 | 0 | 60 | | 1 |
| 3 | 2 | 0 | 242 | 203 | 20 | | 1 |
| 4 | 3 | 30 | 0 | 0 | 30 | | 2 |
| 5 | 4 | 0 | 0 | 0 | 40 | | 3 |
| 6 | 5 | 0 | 0 | 16 | 0 | | 3 |
+---+------+--------+--------+--------+--------+--+-----------------+
where cell G2 contains the formula =COUNTIF(B2:E2,0)
Showing the list of inactive months is a little harder.
It depends on how you want to see the results.
The easiest way is to use conditional formatting and color the cells containing zero (but this is not very useful). Another way would be to transpose the table and filter on the zeros. Yet another would be to use a VBA macro.
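As an illustration of what such a macro would have to compute, here is a small Python sketch that returns both the count and the names of the zero months per user. The helper name `inactive_months` is made up, and, like the COUNTIF above, it treats every zero as inactive and ignores the Reg Month column.

```python
# Month labels and per-user activity from the question's table.
months = ["Jan17", "Feb17", "Mar17", "Apr17"]
activity = {
    1: [5, 38, 0, 60],
    2: [0, 242, 203, 20],
    3: [30, 0, 0, 30],
    4: [0, 0, 0, 40],
    5: [0, 0, 16, 0],
}


def inactive_months(values, labels):
    """Return the labels of the months whose activity is exactly zero."""
    return [label for label, value in zip(labels, values) if value == 0]


report = {user: inactive_months(values, months) for user, values in activity.items()}
for user, names in report.items():
    # user, number of inactive months, comma-separated month names
    print(user, len(names), ", ".join(names))
```

If months before registration should not count as inactive, the list would need to be trimmed to start at each user's Reg Month first.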
I have a set of data as below.
SHEET 1
+------+-------+
| JANUARY |
+------+-------+
+----+----------+------+-------+
| ID | NAME |COUNT | PRICE |
+----+----------+------+-------+
| 1 | ALFRED | 11 | 150 |
| 2 | ARIS | 22 | 120 |
| 3 | JOHN | 33 | 170 |
| 4 | CHRIS | 22 | 190 |
| 5 | JOE | 55 | 120 |
| 6 | ACE | 11 | 200 |
+----+----------+------+-------+
SHEET2
+----+----------+------+-------+
| ID | NAME |COUNT | PRICE |
+----+----------+------+-------+
| 1 | CHRIS | 13 | 123 |
| 2 | ACE | 26 | 165 |
| 3 | JOE | 39 | 178 |
| 4 | ALFRED | 21 | 198 |
| 5 | JOHN | 58 | 112 |
| 6 | ARIS | 11 | 200 |
+----+----------+------+-------+
The RESULT should look like this in sheet1 :
+------+-------++------+-------+
| JANUARY | FEBRUARY |
+------+-------++------+-------+
+----+----------+------+-------++-------+-------+
| ID | NAME |COUNT | PRICE || COUNT | PRICE |
+----+----------+------+-------++-------+-------+
| 1 | ALFRED | 11 | 150 || 21 | 198 |
| 2 | ARIS | 22 | 120 || 11 | 200 |
| 3 | JOHN | 33 | 170 || 58 | 112 |
| 4 | CHRIS | 22 | 190 || 13 | 123 |
| 5 | JOE | 55 | 120 || 39 | 178 |
| 6 | ACE | 11 | 200 || 26 | 165 |
+----+----------+------+-------++-------+-------+
I need a formula for the columns under "FEBRUARY". The formula should find the matching name in Sheet2.
Assuming the first Count value should go in cell E3 of Sheet1, the following formula would be the usual way of doing it:-
=INDEX(Sheet2!C:C,MATCH($B3,Sheet2!$B:$B,0))
Then the Price (in F3) would be given by
=INDEX(Sheet2!D:D,MATCH($B3,Sheet2!$B:$B,0))
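The INDEX/MATCH pair is a lookup keyed on the name in column B. A minimal Python sketch of the same idea follows, assuming names are unique in Sheet2; the variable names are illustrative only.

```python
# (ID, NAME, COUNT, PRICE) rows from the two sheets in the question.
sheet1 = [
    (1, "ALFRED", 11, 150), (2, "ARIS", 22, 120), (3, "JOHN", 33, 170),
    (4, "CHRIS", 22, 190), (5, "JOE", 55, 120), (6, "ACE", 11, 200),
]
sheet2 = [
    (1, "CHRIS", 13, 123), (2, "ACE", 26, 165), (3, "JOE", 39, 178),
    (4, "ALFRED", 21, 198), (5, "JOHN", 58, 112), (6, "ARIS", 11, 200),
]

# MATCH on the name: build a name -> (count, price) lookup from Sheet2.
feb_by_name = {name: (count, price) for _id, name, count, price in sheet2}

# INDEX: append the February values to each January row.
# A name missing from Sheet2 yields (None, None) instead of Excel's #N/A.
combined = [
    (row_id, name, count, price) + feb_by_name.get(name, (None, None))
    for row_id, name, count, price in sheet1
]
for row in combined:
    print(*row)
```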
I think this query will work fine for your requirement
SELECT `Sheet1$`.ID,`Sheet1$`.NAME, `Sheet1$`.COUNT AS 'Jan-COUNT',`Sheet1$`.PRICE AS 'Jan-PRICE', `Sheet2$`.COUNT AS 'Feb-COUNT',`Sheet2$`.PRICE AS 'Feb-PRICE'
FROM `C:\Users\Nagendra\Desktop\aaaaa.xlsx`.`Sheet1$` `Sheet1$`, `C:\Users\Nagendra\Desktop\aaaaa.xlsx`.`Sheet2$` `Sheet2$`
WHERE (`Sheet1$`.NAME=`Sheet2$`.NAME)
Provide the actual path to your workbook instead of
C:\Users\Nagendra\Desktop\aaaaa.xlsx
First you need to know how to set up the connection; see http://smallbusiness.chron.com/use-sql-statements-ms-excel-41193.html