Not sure why but when using the power operator I get:
-50 ** 0
Result:
-1
Which is expected but:
(50 - 100) ** 0
Result:
1
Using Python 3.5.2
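The two expressions are parsed differently. The power operator ** binds more tightly than unary minus, so -50 ** 0 is evaluated as -(50 ** 0), which is -1. Bracketing the negative number gives the result you expected: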
(-50) ** 0
Result:
1
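A small sketch to confirm the parse, using only the standard library. The variable names are mine, chosen for illustration; the ast module is used just to show which operator ends up outermost.

```python
import ast

# ** binds more tightly than unary minus, so -50 ** 0 is read as -(50 ** 0).
unparenthesized = -50 ** 0           # -(50 ** 0) == -1
parenthesized = (-50) ** 0           # 1
from_subtraction = (50 - 100) ** 0   # 1

# The parse tree agrees: the outermost node is the unary minus,
# and the power operation sits underneath it.
tree = ast.parse("-50 ** 0", mode="eval")
outer_node = type(tree.body).__name__          # "UnaryOp"
inner_node = type(tree.body.operand).__name__  # "BinOp"

print(unparenthesized, parenthesized, from_subtraction)
print(outer_node, inner_node)
```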
Cheers: Noah Christopher
I am working with survey data and need to compare the means of a couple of variables. Since this is survey data, I need to apply survey weights, requiring the use of the svy prefix. This means that I cannot rely on Stata's ttest command. I essentially need to recreate the results of the following two ttest commands:
ttest bcg_vaccinated == chc_bcg_vaccinated_2, unpaired
ttest bcg_vaccinated == chc_bcg_vaccinated_2
bcg_vaccinated is a self-reported variable on BCG vaccination status while chc_bcg_vaccinated_2 is BCG vaccination status verified against a child health card. You will notice that chc_bcg_vaccinated_2 has missing values. These indicate that the child did not have a health card. So missing indicates no health card, 0 means the vaccination was not given, and finally, 1 means the vaccination was given. But this means that the variables have a different number of non-missing observations.
I have found the solution to the second ttest command, by creating a variable which is a difference between the two vaccination variables:
gen test_diff = bcg_vaccinated - chc_bcg_vaccinated_2
regress test_diff
The above code runs only for the observations where both vaccination variables are non-missing, replicating the paired t-test listed above. Unfortunately, I cannot figure out how to do the first version. The first version would compare the means of both variables on the full set of observations.
Here are some example data for the two variables. Each row represents a different child.
clear
input byte bcg_vaccinated float chc_bcg_vaccinated_2
0 .
1 0
1 1
1 1
1 0
0 .
1 1
1 1
1 1
1 0
0 .
1 1
1 1
0 .
1 1
1 1
1 0
0 .
1 0
1 0
1 0
0 .
0 .
1 1
0 .
end
You need to get the data into a suitable form for a regression:
. ttest bcg_vaccinated == chc_bcg_vaccinated_2, unpaired
Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
bcg_va~d | 25 .68 .095219 .4760952 .4834775 .8765225
chc_bc~2 | 17 .5882353 .1230382 .5072997 .3274059 .8490647
---------+--------------------------------------------------------------------
Combined | 42 .6428571 .0748318 .4849656 .4917312 .7939831
---------+--------------------------------------------------------------------
diff | .0917647 .1536653 -.2188044 .4023338
------------------------------------------------------------------------------
diff = mean(bcg_vaccinated) - mean(chc_bcg_vaccin~2) t = 0.5972
H0: diff = 0 Degrees of freedom = 40
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.7231 Pr(|T| > |t|) = 0.5538 Pr(T > t) = 0.2769
. display r(p)
.5537576
. quietly stack bcg_vaccinated chc_bcg_vaccinated_2, into(vax_status) clear
. quietly recode _stack (1 = 1 "SR") (2 = 0 "CHC"), gen(group) label(group)
. regress vax_status i.group
Source | SS df MS Number of obs = 42
-------------+---------------------------------- F(1, 40) = 0.36
Model | .085210084 1 .085210084 Prob > F = 0.5538
Residual | 9.55764706 40 .238941176 R-squared = 0.0088
-------------+---------------------------------- Adj R-squared = -0.0159
Total | 9.64285714 41 .235191638 Root MSE = .48882
------------------------------------------------------------------------------
vax_status | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
group |
SR | .0917647 .1536653 0.60 0.554 -.2188044 .4023338
_cons | .5882353 .1185553 4.96 0.000 .3486261 .8278445
------------------------------------------------------------------------------
. testparm 1.group
( 1) 1.group = 0
F( 1, 40) = 0.36
Prob > F = 0.5538
. display r(p)
.5537576
The testparm and display are not needed; they just show more digits.
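To see why the stacked regression reproduces the unpaired t test, here is a rough Python sketch (not Stata, and without survey weights) that computes the pooled two-sample t statistic and the t statistic of the slope on a 0/1 group dummy. The function names are hypothetical. The data are only the counts implied by the example: 17 ones out of 25 self-reports and 10 ones out of 17 card-verified values.

```python
from math import sqrt

# Counts taken from the example data; order is irrelevant for an unpaired test.
self_report = [1] * 17 + [0] * 8
card = [1] * 10 + [0] * 7

def mean(values):
    return sum(values) / len(values)

def pooled_t(a, b):
    """Equal-variance two-sample t statistic for mean(a) - mean(b)."""
    mean_a, mean_b = mean(a), mean(b)
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    pooled_var = ss / (len(a) + len(b) - 2)
    return (mean_a - mean_b) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

def ols_slope_t(a, b):
    """t statistic of the slope when the stacked values are regressed on a 0/1 dummy."""
    y = list(a) + list(b)
    g = [1] * len(a) + [0] * len(b)
    g_bar, y_bar = mean(g), mean(y)
    sxx = sum((gi - g_bar) ** 2 for gi in g)
    slope = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, y)) / sxx
    intercept = y_bar - slope * g_bar
    rss = sum((yi - intercept - slope * gi) ** 2 for gi, yi in zip(g, y))
    return slope / sqrt(rss / (len(y) - 2) / sxx)

t_ttest = pooled_t(self_report, card)
t_regress = ols_slope_t(self_report, card)
print(round(t_ttest, 4), round(t_regress, 4))
```

The two statistics agree because the slope on a binary regressor is the difference in group means and the residual variance is the pooled variance. With real survey data the point of the regression form is that it can be run under the svy prefix, which this sketch does not attempt to imitate.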
I have data in Stata on how respondents feel about the current situation. There are seven types of feeling. The data are stored in the following format (note that the variable is a string, and one person can give more than one answer):
feeling
4,7
1,3,4
2,5,6,7
1,2,3,4,5,6,7
Since the data is a string, I tried to separate it by
split feeling, parse (,)
and I got the result
feeling1  feeling2  feeling3  feeling4  feeling5  feeling6  feeling7
4         7
1         3         4
2         5         6         7
1         2         3         4         5         6         7
However, this is not the result I want. Each feeling number should go into the variable with the matching suffix, for instance:
feeling1  feeling2  feeling3  feeling4  feeling5  feeling6  feeling7
                              4                             7
1                   3         4
          2                             5         6         7
1         2         3         4         5         6         7
I am not sure whether there is a built-in command or function for this kind of problem. I am thinking about using forval to loop through every value in each variable and move it into the correct variable.
A loop over the distinct values would be enough here. I give your example in a form explained in the Stata tag wiki as more helpful and then give code to get the variables you want as numeric variables.
* Example generated by -dataex-. For more info, type help dataex
clear
input str13 feeling
"4,7"
"1,3,4"
"2,5,6,7"
"1,2,3,4,5,6,7"
end
forval j = 1/7 {
gen wanted`j' = `j' if strpos(feeling, "`j'")
gen better`j' = strpos(feeling, "`j'") > 0
}
l feeling wanted1-better3
+---------------------------------------------------------------------------+
| feeling wanted1 better1 wanted2 better2 wanted3 better3 |
|---------------------------------------------------------------------------|
1. | 4,7 . 0 . 0 . 0 |
2. | 1,3,4 1 1 . 0 3 1 |
3. | 2,5,6,7 . 0 2 1 . 0 |
4. | 1,2,3,4,5,6,7 1 1 2 1 3 1 |
+---------------------------------------------------------------------------+
If you wanted a string result that would be yielded by
gen wanted`j' = "`j'" if strpos(feeling, "`j'")
Had the number of feelings been 10 or more, you would have needed more careful code, as, for example, a search for "1" would also find it within "10".
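One way round the "1" inside "10" problem is to split on the commas and compare whole tokens rather than substrings. The idea is sketched here in Python rather than Stata; the function name feeling_indicators and the ten-level example are assumptions for illustration only.

```python
def feeling_indicators(feeling, n_levels):
    """Return a 0/1 list of length n_levels for a comma-separated answer string."""
    # Compare whole tokens, so "10" is never mistaken for "1".
    chosen = {int(token) for token in feeling.split(",") if token.strip()}
    return [1 if level in chosen else 0 for level in range(1, n_levels + 1)]

print(feeling_indicators("4,7", 7))    # [0, 0, 0, 1, 0, 0, 1]
print(feeling_indicators("2,10", 10))  # [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
```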
Indicator (some say dummy) variables with distinct values 1 or 0 are immensely more valuable for most analysis of this kind of data.
Note Stata-related sources such as
this FAQ
this paper
and this paper.
I must create a sequence of numbers based on the number of elements that a list has.
arr1=(1 2 3 4 5 6)
I thought about the following expression in order to do so, but it is not working.
echo {0..$(expr ${#arr1[*]} - 1)}
{0..5} # output
The correct output should be:
0 1 2 3 4 5
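Could anyone explain to me why I do not get the correct values?
You just need to add an eval. Bash performs brace expansion before command substitution, so when the braces are examined {0..$(...)} is not yet a valid range and is left as it is; eval makes the shell parse the resulting {0..5} a second time: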
$ a=(1 2 3 4 5 6)
$ eval echo {0..$(expr ${#a[*]} - 1)}
0 1 2 3 4 5
I have a file with 4 columns separated by spaces, like this:
1_86500000 50 1_87500000 19
1_87500000 13 1_89500000 42
1_89500000 25 1_90500000 10
1_90500000 3 1_91500000 11
1_91500000 23 1_92500000 29
1_92500000 34 1_93500000 4
1_93500000 39 1_94500000 49
1_94500000 35 1_95500000 26
2_35500000 1 2_31500000 81
2_31500000 12 2_4150000 50
The first and third columns are not in phase, so I cannot simply divide one value by the other on the same line.
Since a given key can appear in column $1, in column $3, or in both, a solution would be to look up each key in both columns and divide its column $4 value by its column $2 value, treating a missing value as 0, as this expected result shows:
P.S. The second field in this expected result is only there to show the division.
1_86500000 0/50 0
1_87500000 19/13 1.46154
1_89500000 42/25 1.68
1_90500000 10/3 3.333
1_91500000 11/23 0.47826
1_92500000 29/34 0.85294
1_93500000 4/39 0.10256
1_94500000 49/35 1.4
2_35500000 0/1 0
2_31500000 81/12 6.75
2_4150000 50/0 50
I have not achieved anything by myself beyond the attempts below, so I do not really have a starting point yet.
I tried separating the fields joined with _ to see whether I could match them by subtracting the coordinates. A result of 0 would mean that the columns were in phase and correct. But I could not get any further.
awk '{if( ($5-$2)==0) print $1,$2,$3,$4,$5,$6}' file
I tried to match both columns but I only got phased results:
awk '{if(($1==$3)) print $1,$4/$2}' file
Can you help me?
awk to the rescue!
$ awk '{d[$1]=$2; n[$3]=$4}
END {for(k in n)
if(k in d) {print k,n[k]"/"d[k],n[k]/d[k]; delete d[k]}
else print k,n[k]"/0",n[k];
for(k in d) print k,"0/"d[k],0}' file | sort
1_86500000 0/50 0
1_87500000 19/13 1.46154
1_89500000 42/25 1.68
1_90500000 10/3 3.33333
1_91500000 11/23 0.478261
1_92500000 29/34 0.852941
1_93500000 4/39 0.102564
1_94500000 49/35 1.4
1_95500000 26/0 26
2_31500000 81/12 6.75
2_35500000 0/1 0
2_4150000 50/0 50
Your division-by-zero convention is a little strange, though!
Explanation: keep two arrays, one for the numerators and one for the denominators. Once the file has been scanned, go over the numerator array, find the corresponding denominator, and do the division. For the denominators that were never used, apply the convention given in the question.
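For readers who find the one-liner dense, the same two-array idea is sketched below in Python. This is only an illustration of the logic, not a replacement for the awk command; the function name ratios is made up, and the rows are the sample data from the question.

```python
def ratios(lines):
    """Divide each key's column-4 value by its column-2 value, keyed by name."""
    denominators, numerators = {}, {}
    for line in lines:
        key1, val1, key3, val3 = line.split()
        denominators[key1] = float(val1)
        numerators[key3] = float(val3)
    result = {}
    for key in sorted(set(denominators) | set(numerators)):
        num = numerators.get(key, 0.0)  # a missing numerator counts as 0
        den = denominators.get(key)
        # Convention from the question: with no denominator, keep the numerator as is.
        result[key] = num if den is None else num / den
    return result

sample = """1_86500000 50 1_87500000 19
1_87500000 13 1_89500000 42
1_89500000 25 1_90500000 10
1_90500000 3 1_91500000 11
1_91500000 23 1_92500000 29
1_92500000 34 1_93500000 4
1_93500000 39 1_94500000 49
1_94500000 35 1_95500000 26
2_35500000 1 2_31500000 81
2_31500000 12 2_4150000 50""".splitlines()

table = ratios(sample)
for key, value in table.items():
    print(key, round(value, 5))
```

The sketch assumes every denominator present in the file is non-zero, as in the sample.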
How would I go about writing a function that keeps x in the range from x=0 to x=19, so that a value that exceeds 19 or falls below zero wraps around?
From:
x=20, x=21, x=22 and x=(-1), x=(-2), x=(-3)
To:
x=0, x=1, x=2 and x=19, x=18, x=17 respectively?
I've heard of modular arithmetic which is apparently the way I should deal with it.
Usually you would use the built-in function mod (rem looks similar, but it keeps the sign of its first argument, so it does not wrap negative values), but I assume these are off-limits for homework. So you can write your own function, e.g.
mod20 x | x < 0 = ...
| x > 19 = ...
| otherwise = x
There are different things you can try to fill in the ...s. One of the easiest is repeated addition or subtraction, but I don't want to spoil all the fun.
Once you have this function, you can "rescale" the values after every "normal" arithmetic operation, e.g. mod20 (12 + 17).
Try using the mod function:
(-5) `mod` 20 ==> 15
5 `mod` 20 ==> 5
20 `mod` 20 ==> 0
25 `mod` 20 ==> 5
See also wikipedia on the topic.
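For comparison outside Haskell, here is a short Python sketch of the same wrap-around. Python's % operator takes the sign of the divisor, just as Haskell's mod does, whereas math.fmod keeps the sign of its first argument, like rem. The helper name wrap is hypothetical.

```python
import math

def wrap(x, size=20):
    """Map any integer onto 0..size-1, wrapping around in both directions."""
    return x % size

wrapped = [wrap(x) for x in (20, 21, 22, -1, -2, -3)]
print(wrapped)                 # [0, 1, 2, 19, 18, 17]

# A remainder that keeps the sign of its first argument does not wrap negatives.
print(int(math.fmod(-1, 20)))  # -1
```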
Use
x `mod` 20