How to split data and assign it into designated variables? - string

I have data in Stata regarding the feeling of the current situation. There are seven types of feeling. The data is stored in the following format (note that the data type is a string, and one person can respond to more than 1 answer)
feeling
4,7
1,3,4
2,5,6,7
1,2,3,4,5,6,7
Since the data is a string, I tried to separate it by
split feeling, parse (,)
and I got the result
feeling1
feeling2
feeling3
feeling4
feeling5
feeling6
feeling7
4
7
1
3
4
2
5
6
7
1
2
3
4
5
6
7
However, this is not the result I want. which is that the representative number of feelings should go into the correct variable. For instance.
feeling1
feeling2
feeling3
feeling4
feeling5
feeling6
feeling7
4
7
1
3
4
2
5
6
7
1
2
3
4
5
6
7
I am not sure if there is any built-in command or function for this kind of problem. I am thinking about using forval in looping through every value in each variable and try to juggle it around into the correct variable.

A loop over the distinct values would be enough here. I give your example in a form explained in the Stata tag wiki as more helpful and then give code to get the variables you want as numeric variables.
* Example generated by -dataex-. For more info, type help dataex
clear
input str13 feeling
"4,7"
"1,3,4"
"2,5,6,7"
"1,2,3,4,5,6,7"
end
forval j = 1/7 {
gen wanted`j' = `j' if strpos(feeling, "`j'")
gen better`j' = strpos(feeling, "`j'") > 0
}
l feeling wanted1-better3
+---------------------------------------------------------------------------+
| feeling wanted1 better1 wanted2 better2 wanted3 better3 |
|---------------------------------------------------------------------------|
1. | 4,7 . 0 . 0 . 0 |
2. | 1,3,4 1 1 . 0 3 1 |
3. | 2,5,6,7 . 0 2 1 . 0 |
4. | 1,2,3,4,5,6,7 1 1 2 1 3 1 |
+---------------------------------------------------------------------------+
If you wanted a string result that would be yielded by
gen wanted`j' = "`j'" if strpos(feeling, "`j'")
Had the number of feelings been 10 or more you would have needed more careful code as for example a search for "1" would find it within "10".
Indicator (some say dummy) variables with distinct values 1 or 0 are immensely more valuable for most analysis of this kind of data.
Note Stata-related sources such as
this FAQ
this paper
and this paper.

Related

Counting 15's in Cribbage Hand

Background
This is a followup question to my previous finding a straight in a cribbage hand question and Counting Pairs in Cribbage Hand
Objective
Count the number of ways cards can be combined to a total of 15, then score 2 points for each pair. Ace worth 1, and J,Q,K are worth 10.
What I have Tried
So my first poke at a solution required 26 different formulas. Basically I checked each possible way to combine cards to see if the total was 15. 1 way to add 5 cards, 5 ways to add 4 cards, 10 ways to add 3 cards, and 10 ways to add 2 cards. I thought I had this licked until I realized I was only looking at combinations, I had not considered the fact that I had to cap the value of cards 11, 12, and 13 to 10. I initially tried an array formula something along the lines of:
MIN(MOD(B1:F1-1,13)+1,10)
But the problem with this is that MIN takes the minimum value of all results not the individual results compared to 10.
I then tried it with an IF function, which worked, but involved the use of CSE formula even wehen being used with SUMPRODUCT which is something I try to avoid when I can
IF(MOD(B1:F1-1,13)+1<11,MOD(B1:F1-1,13)+1,10)
Then I stumble on an answer to a question in code golf which I modified to lead me to this formula, which I kind of like for some strange reason, but its a bit long in repetitive use:
--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2)
My current working formulas are:
5 card check
=(SUMPRODUCT(--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2))=15)*2
4 card checks
=(SUM(AGGREGATE(15,6,--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2),{1,2,3,4}))=15)*2
=(SUM(AGGREGATE(15,6,--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2),{1,2,3,5}))=15)*2
=(SUM(AGGREGATE(15,6,--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2),{1,2,4,5}))=15)*2
=(SUM(AGGREGATE(15,6,--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2),{1,3,4,5}))=15)*2
=(SUM(AGGREGATE(15,6,--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2),{2,3,4,5}))=15)*2
3 card checks
same as 4 card checks using all combinations for 3 cards in the {1,2,3}.
There are 10 different combinations, so 10 different formulas.
The 2 card check was based on the solution by Tom in Counting Pairs in Cribbage Hand and all two cards are checked with a single formula. (yes it is CSE)
2 card check
{=SUM(--(--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2)+TRANSPOSE(--MID("01020304050607080910101010",1+(MOD(B1:F1-1,13)*2),2))=15))}
Question
Can the 3 and 4 card combination sum check be brought into a single formula similar to the 2 card check?
Is there a better way to convert cards 11,12,13 to a value of 10?
Sample Data
| B | C | D | E | F | POINTS
+----+----+----+----+----+
| 1 | 2 | 3 | 17 | 31 | <= 2 (all 5 add to 15)
| 1 | 2 | 3 | 17 | 32 | <= 2 (Last 4 add to 15)
| 11 | 18 | 31 | 44 | 5 | <= 16 ( 4x(J+5), 4X(5+5+5) )
| 6 | 7 | 8 | 9 | 52 | <= 4 (6+9, 7+8)
| 1 | 3 | 7 | 8 | 52 | <= 2 (7+8)
| 2 | 3 | 7 | 9 | 52 | <= 2 (2+3+K)
| 2 | 4 | 6 | 23 | 52 | <= 0 (nothing add to 15)
Excel Version
Excel 2013
For 5:
=(SUMPRODUCT(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))=15)*2
For 4:
=SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,ROW($1:$5),{1,2,3,4;1,2,3,5;1,2,4,5;1,3,4,5;2,3,4,5}),ROW($1:$4)^0)=15))*2
For 3
=SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,ROW($1:$10),{1,2,3;1,2,4;1,2,5;1,3,4;1,3,5;1,4,5;2,3,4;2,3,5;2,4,5;3,4,5}),ROW($1:$3)^0)=15))*2
For 2:
SUMPRODUCT(--((CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))+(TRANSPOSE(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)))=15))
All together:
=(SUMPRODUCT(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))=15)*2+
SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,ROW($1:$5),{1,2,3,4;1,2,3,5;1,2,4,5;1,3,4,5;2,3,4,5}),ROW($1:$4)^0)=15))*2+
SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,ROW($1:$10),{1,2,3;1,2,4;1,2,5;1,3,4;1,3,5;1,4,5;2,3,4;2,3,5;2,4,5;3,4,5}),ROW($1:$3)^0)=15))*2+
SUMPRODUCT(--((CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))+(TRANSPOSE(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)))=15))
For older versions we need to "trick" INDEX into accepting the arrays as Row and Column References:
We do that by using N(IF({1},[thearray]))
=(SUMPRODUCT(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))=15)*2+
SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,N(IF({1},ROW($1:$5))),N(IF({1},{1,2,3,4;1,2,3,5;1,2,4,5;1,3,4,5;2,3,4,5}))),ROW($1:$4)^0)=15))*2+
SUMPRODUCT(--(MMULT(INDEX(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)*ROW($1:$10)^0,N(IF({1},ROW($1:$10))),N(IF({1},{1,2,3;1,2,4;1,2,5;1,3,4;1,3,5;1,4,5;2,3,4;2,3,5;2,4,5;3,4,5}))),ROW($1:$3)^0)=15))*2+
SUMPRODUCT(--((CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10))+(TRANSPOSE(CHOOSE(MOD(A1:E1-1,13)+1,1,2,3,4,5,6,7,8,9,10,10,10,10)))=15))
This is a CSE That must be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode.

Creation of brace sequence not working in bash

I must create a sequence of numbers using the number of elements that an list has.
arr1=(1 2 3 4 5 6)
I thought about the following expression in order to do so, but it is now working.
echo {0..$(expr ${#arr1[*]} - 1)}
{0..5} # output
The correct output should be:
0 1 2 3 4 5
Could anyone explain me why I do not get the correct values?
You just need to add an eval:
$ a=(1 2 3 4 5 6)
$ eval echo {0..$(expr ${#a[*]} - 1)}
0 1 2 3 4 5

Best technique to convert to panel data

I have some returns data on 1000+ firms that I want to convert into panel form.
From my understanding, it is neither truly wide nor long form (at least from the examples I've seen).
I have attached an example of the original data set and what I want it to look like. Is there a way to achieve this? I am intermediate with Excel/VBA, and new to SAS/Stata but can use them and self-teach myself.
Consider this example using reshape in Stata:
clear *
input float(date FIRM_A FIRM_B FIRM_C FIRM_D)
1 .14304407 .8583148 .3699433 .7310092
2 .34405795 .9531917 .6376472 .2895169
3 .04766626 .6588161 .6988417 .5564945
4 .21615694 .18380463 .4781089 .3058527
5 .709911 .85116 .14080866 .10687433
6 .3805699 .070911616 .55129284 .8039169
7 .1680727 .7267236 .1779183 .51454383
8 .3610604 .1578059 .15383714 .9001798
9 .7081585 .9755411 .28951603 .20034006
10 .27780765 .8351805 .04982195 .3929535
end
reshape long FIRM_, i(date) j(Firm_ID) string
rename FIRM_ return
replace Firm_ID = "Firm " + Firm_ID
list in 1/8, sepby(date)
+---------------------------+
| date Firm_ID return |
|---------------------------|
1. | 1 Firm A .1430441 |
2. | 1 Firm B .8583148 |
3. | 1 Firm C .3699433 |
4. | 1 Firm D .7310092 |
|---------------------------|
5. | 2 Firm A .3440579 |
6. | 2 Firm B .9531917 |
7. | 2 Firm C .6376472 |
8. | 2 Firm D .2895169 |
+---------------------------+
see help reshape for more on the topic.
This can be done very easily with proc transpose in SAS. All you will need to add is a column name for column A. This will be your by variable so that the following variables will be transposed along each specific date. Other than that just make sure your data is sorted by the date column. The code would look similar to this:
proc sort data=have;
by date;
run;
proc transpose data=have out=want; /* you could add a name= or prefix= statement here to rename your variables */
by date;
run;

Concorde TSP Solver - Asymmetric Instances

I am trying to use Concorde to solve some asymmetric instances of the TSP. Although the official website says Concorde does solve this kind of instances, I've seen people saying it doesn't (https://cs.stackexchange.com/a/16336, http://www.math.uwaterloo.ca/tsp/road/austria.html). I'm only doubting the official website because I have the following test instance:
NAME: test
TYPE: ATSP
DIMENSION: 4
EDGE_WEIGHT_TYPE: EXPLICIT
EDGE_WEIGHT_FORMAT: FULL_MATRIX
EDGE_WEIGHT_SECTION
999 | 2 | 2 | 2
2 |999| 2 | 2
2 | 2 |999| 2
2 | 2 | 2 |999
EOF
Concorde gives me, as expected:
Optimal Solution: 8.00
and on the .sol file, the route: 0 1 3 2.
But if I change the matrix to:
EDGE_WEIGHT_SECTION
999 |100| 3 |100
2 |999| 2 | 2
2 | 2 |999| 2
2 | 2 | 2 |999
EOF
The solution given now is 106 with the sequence 0 3 1 2.
No matter what numbers I put in the first row, Concorde never chooses the third city (index 2, value 3).
Does anybody have any idea why that is? Am I reading the input wrong?
--EDIT--
Actually, the instances for the ATSP problem are NOT from the official website. It's from this one:
http://comopt.ifi.uni-heidelberg.de/software/TSPLIB95/
Because the name of this library is TSPLIB (the same as the official set of problems for Concorde) I made this confusion. I'm not sure how this TSPLIB95 is related with Concorde Solver (or if it is related whatsoever).
Concorde is only for symmetric TSP. You'll have to do the standard trick to convert an ATSP into a symmetric TSP (with additional nodes).

Send DNS data: MSB or LSB first?

I'm implementing a DNS(multicast DNS in fact) in c#.
I just want to know if I must encode my uint/int/ushort/... with the LSB on the left or the MSB on the left. And more globally how I could know this? One of this is standard?
Because I didn't found anything in the IETF description. I found a lot of things(each header field length, position), but I didn't found this.
Thank you!
The answer is in RFC 1035 (2.3.2. Data Transmission Order)
Here is the link: http://www.ietf.org/rfc/rfc1035.txt
And the interesting part
2.3.2. Data Transmission Order
The order of transmission of the header and data described in this
document is resolved to the octet level. Whenever a diagram shows a
group of octets, the order of transmission of those octets is the
normal order in which they are read in English. For example, in the
following diagram, the octets are transmitted in the order they are
numbered.
0 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 1 | 2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 3 | 4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 5 | 6 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Whenever an octet represents a numeric quantity, the left most bit in
the diagram is the high order or most significant bit. That is, the
bit labeled 0 is the most significant bit. For example, the following
diagram represents the value 170 (decimal).
0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|1 0 1 0 1 0 1 0|
+-+-+-+-+-+-+-+-+
Similarly, whenever a multi-octet field represents a numeric quantity
the left most bit of the whole field is the most significant bit.
When a multi-octet quantity is transmitted the most significant octet
is transmitted first.

Resources