How can I read this stem-leaf plot correctly? - statistics

As the title says, I used an online data set for a stem-and-leaf plot, but I don't know how to read it. For example, in the line with Stem 7. and Leaf .5555, why is Frequency = 18? And what does the line "Each leaf: 4 case(s)" mean?
Every answer is very helpful to me.

Here is an example.
DATA LIST FREE /x1.
BEGIN DATA.
10 22 22 13 14 10 16 17 17 17
END DATA.
EXAMINE VARIABLES=x1 /PLOT STEMLEAF.
x1 Stem-and-Leaf Plot
Frequency Stem & Leaf
4.00 1 . 0034
4.00 1 . 6777
2.00 2 . 22
Stem width: 10.00
Each leaf: 1 case(s)
In these data, the "Stem" is the tens place of each value and the "Leaf" is the ones place. There are four cases in the first line, representing the values 10, 10, 13, and 14 in the data. That's why "Frequency" is 4; there are four cases. There are only 2 in the last line, for the two values of 22 in the original data. "Each leaf: 1 case(s)" means every leaf digit stands for one case; in a larger data set such as yours, "Each leaf: 4 case(s)" means each leaf digit stands for roughly four cases, which is how a line with four leaves can have a Frequency of 18. As the data get larger, stem-and-leaf plots can get a little harder to read, but their other real value is their shape, which gives you an idea of the shape of the distribution. For another view of that shape, ask SPSS to produce a histogram.
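To make the grouping concrete, here is a minimal Python sketch (not SPSS code) that rebuilds the three rows of the plot above from the raw values. The function name `stem_leaf` and its parameters are made up for illustration, and it assumes non-negative integers with two lines per stem (leaves 0-4 and 5-9), which is what the SPSS output above appears to use.

```python
from collections import defaultdict

def stem_leaf(values, stem_width=10, lines_per_stem=2):
    """Return (frequency, stem, leaves) rows for non-negative integers."""
    rows = defaultdict(list)
    half = stem_width // lines_per_stem
    for v in sorted(values):
        stem, leaf = divmod(v, stem_width)   # tens place, ones place
        rows[(stem, leaf // half)].append(leaf)
    return [(len(leaves), stem, "".join(str(d) for d in leaves))
            for (stem, _), leaves in sorted(rows.items())]

data = [10, 22, 22, 13, 14, 10, 16, 17, 17, 17]
rows = stem_leaf(data)
for freq, stem, leaves in rows:
    # Frequency is simply the number of leaves on the line
    print(f"{freq:5.2f} {stem} . {leaves}")
```

Each printed line has the same Frequency, Stem, and Leaf entries as the SPSS output, because here every leaf is exactly one case.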

Related

How to use Excel Solver for piecewise linear fit?

I am trying to use Excel Solver to get fits for a piecewise linear function (here, a three line fit). The Solver explanation here is helpful for a single linear case, but I am not sure how to set the model up "smartly" so that it re-calculates the hinge-points (i.e., x-values of line intersections will change with the input data). I've never used Solver before.
x y
1 0.1552
2 0.1877
3 0.2016
4 0.2094
5 0.2142
6 0.2176
7 0.2201
8 0.2220
9 0.2235
10 0.2247
11 0.2256
12 0.2265
13 0.2272
14 0.2278
15 0.2283
16 0.2288
17 0.2292
18 0.2296
19 0.2299
20 0.2302
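One possible way to set the model up so that the hinge points are themselves free parameters is a continuous "hinge" form, y = a + b*x + c*max(0, x - h1) + d*max(0, x - h2). The sketch below shows that model and the sum of squared errors in Python rather than Excel; the names `piecewise3` and `sse` and the starting guess are illustrative assumptions, not a fitted result. In a spreadsheet the same idea would be a model column built with MAX(0, x - h1) terms and an objective cell holding the sum of squared differences, with all six parameter cells handed to Solver as the changing cells.

```python
def piecewise3(x, a, b, c, d, h1, h2):
    """Continuous three-segment line with hinges at h1 and h2."""
    return a + b * x + c * max(0.0, x - h1) + d * max(0.0, x - h2)

def sse(params, xs, ys):
    """Sum of squared errors: the quantity an optimizer would minimize."""
    return sum((piecewise3(x, *params) - y) ** 2 for x, y in zip(xs, ys))

xs = list(range(1, 21))
ys = [0.1552, 0.1877, 0.2016, 0.2094, 0.2142, 0.2176, 0.2201, 0.2220,
      0.2235, 0.2247, 0.2256, 0.2265, 0.2272, 0.2278, 0.2283, 0.2288,
      0.2292, 0.2296, 0.2299, 0.2302]

# Arbitrary starting values (a, b, c, d, h1, h2); an optimizer would adjust them
guess = (0.13, 0.03, -0.025, -0.004, 3.0, 8.0)
print(sse(guess, xs, ys))
```

Because the segments share the hinge terms, the fitted lines always meet at h1 and h2, so the intersection points move with the data instead of being fixed in advance.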

Get Poisson expectation of preceding values of a time series in Python

I have some time series data (in a Pandas dataframe), d(t):
time 1 2 3 4 ... 99 100
d(t) 5 3 17 6 ... 23 78
I would like to get a time-shifted version of the data, e.g. d(t-1):
time 1 2 3 4 ... 99 100
d(t) 5 3 17 6 ... 23 78
d(t-1) NaN 5 3 17 6 ... 23
But with a complication. Instead of simply time-shifting the data, I need to take the expected value based on a Poisson-distributed shift. So instead of d(t-i), I need E(d(t-j)), where j ~ Poisson(i).
Is there an efficient way to do this in Python?
Ideally, I would be able to dynamically generate the result with i as a parameter (that I can use in an optimization).
numpy's Poisson functions seem to be about generating draws from a Poisson rather than giving a PMF that could be used to calculate expected value. If I could generate a PMF, I could do something like:
for idx in range(len(d)):
    Ed[idx] = np.multiply(d[idx::-1], PMF(Poisson, i)[:idx + 1]).sum()
But I have no idea what actual functions to use for this, or if there is an easier way than iterating over indices. This approach also won't easily let me optimize over i.
You can use scipy.stats.poisson to get the PMF.
Here's a sample:
from scipy.stats import poisson
mu = 10
# Declare 'rv' to be a poisson random variable with λ=mu
rv = poisson(mu)
# poisson.pmf(k) = (e⁻ᵐᵘ * muᵏ) / k!
print(rv.pmf(4))
For more information about scipy.stats.poisson, see its documentation.
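Building on the PMF, the expected value E(d(t-j)) with j ~ Poisson(i) could be computed along the following lines. This is a sketch using only the standard library; the function names are made up for illustration. It makes one loud assumption: lags that would reach before the start of the series are dropped and the remaining Poisson weights are renormalized, since the question does not say how the early time points should be handled.

```python
import math

def poisson_pmf(k, mu):
    """P(J = k) for J ~ Poisson(mu), mu > 0, computed in log space."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def poisson_shift_expectation(d, mu):
    """E[d(t - J)] with J ~ Poisson(mu), for every position t in d.

    Assumption: lags pointing before the start of the series are dropped
    and the remaining weights are renormalized to sum to 1.
    """
    out = []
    for t in range(len(d)):
        weights = [poisson_pmf(j, mu) for j in range(t + 1)]
        total = sum(weights)
        out.append(sum(w * d[t - j] for j, w in enumerate(weights)) / total)
    return out

d = [5, 3, 17, 6, 23, 78]
shifted = poisson_shift_expectation(d, mu=1.0)
print(shifted)
```

Since `mu` is an ordinary function argument, the whole calculation can be wrapped in an objective function and passed to an optimizer. With NumPy and SciPy available, the inner weights could instead come from `poisson.pmf(np.arange(t + 1), mu)`.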

Difference between consecutive maxima and minima in a .csv dataset

I have a dataset which represents tracking data of a mouse's paw moving up and down in the y-axis as it reaches up for and pulls down on a piece of string.
The output of the data is a list of y-coordinates corresponding to a fraction of a second. For example:
1 333.9929833
2 345.4504726
3 355.7046572
4 367.6136684
5 379.7906121
6 390.5470788
7 397.9017118
8 403.677123
9 412.1550843
10 416.516814
11 419.8205706
12 423.7994881
13 429.4874275
14 419.2652898
15 360.1626136
16 298.8212249
17 264.3647809
18 265.0078862
19 268.1828407
20 283.101321
21 294.8219163
22 308.4875135
In this series, there is a max value of 429... and a minimum of 264... - however, as you can see from an example image:
(excuse the gaps), there are multiple consecutive wave-like maxima and minima.
The goal is to find the difference between each maximum and the following minimum, and between each minimum and the following maximum (i.e. max1-min1, min2-max1, max2-min2...). Ideally, this would also provide the timepoints of each max and min (e.g. 13 and 17 for the provided dataset) - there is a column with integer labels (1, 2, 3...) corresponding to each coordinate.
Thanks for your help!
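One possible approach, sketched in plain Python on the sample values above: treat a point as a maximum if it is strictly higher than both neighbours, a minimum if it is strictly lower, and then take the differences between consecutive turning points. The function names are illustrative, and the sketch assumes reasonably smooth data; noisy tracking data would likely need smoothing first, or a dedicated routine such as `scipy.signal.find_peaks`.

```python
def local_extrema(y):
    """Return (label, value, kind) for each strict turning point of y.

    Labels are 1-based to match the integer label column in the data.
    """
    extrema = []
    for i in range(1, len(y) - 1):
        if y[i - 1] < y[i] > y[i + 1]:
            extrema.append((i + 1, y[i], "max"))
        elif y[i - 1] > y[i] < y[i + 1]:
            extrema.append((i + 1, y[i], "min"))
    return extrema

def consecutive_differences(extrema):
    """Absolute difference between each turning point and the next one."""
    return [(a[0], b[0], abs(b[1] - a[1]))
            for a, b in zip(extrema, extrema[1:])]

y = [333.9929833, 345.4504726, 355.7046572, 367.6136684, 379.7906121,
     390.5470788, 397.9017118, 403.677123, 412.1550843, 416.516814,
     419.8205706, 423.7994881, 429.4874275, 419.2652898, 360.1626136,
     298.8212249, 264.3647809, 265.0078862, 268.1828407, 283.101321,
     294.8219163, 308.4875135]

extrema = local_extrema(y)
diffs = consecutive_differences(extrema)
print(extrema)
print(diffs)
```

To run this on the .csv file, the y column would be read into the list `y` first, for example with the standard `csv` module.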

Excel rotate radar chart

I have been trying to create a windrose that displays the occurrence of multiple wind speeds and their respective wind directions. Using other very helpful posts on here I've gotten pretty close to what I want. There is just one thing I can't seem to fix.
As you can see in the figure below, the graph starts at 0 degrees, while I want the "North" wind direction to start at -11.25 (or +348.75) degrees.
Currently the radial axis labels are added using a pie chart while the rest of the data is plotted in a filled radar chart. It is easy to rotate the pie chart but I can't seem to find a similar function for rotating the radar chart. Any help would be much appreciated. The excel file is attached beneath the figure.
EDIT: Locked excel file against editing
Excel file
I haven't fully digested the netiquette of this website and not sure if it is a good idea to try giving you an answer 6+ months after you posted. Also hope that by this time you found an answer.
If not, this link should be of help:
https://superuser.com/questions/687036/how-to-make-a-pie-radar-chart
In the example the creator made one field for each degree and started the first series, which would be equivalent to your north at 0°. However nothing prevents you from starting at 348.
I have not tested it, but I also think that nothing prevents you from adding even more "resolution", e.g. half-degree steps, or finer still at your discretion.
EDIT: following L.Guthardt's feedback.
In order to provide you an answer I opted to simplify your table and chart. Mostly for convenience, but also because I struggle to get a full understanding of the original "architecture". Still, the solution should work at any level and is based on two key elements:
First, you will have to double the number of rows from 16 to 32, so that each direction is repeated twice (e.g. ... NNE - NNE - NE - NE ...).
Second, you have to start and finish with N, as shown here:
Direction Cat6
N 6
NNE 4 4
NNE 6
NE 4 4
NE 6
ENE 4 4
ENE 6
E 4 4
E 6
ESE 4 4
ESE 6
SE 4 4
SE 6
SSE 4 4
SSE 6
S 4 4
S 6
SSW 4 4
SSW 6
SW 4 4
SW 6
WSW 4 4
WSW 6
W 4 4
W 6
WNW 4 4
WNW 6
NW 4 4
NW 6
NNW 4 4
NNW 6
N 4 4
which will generate
for the pie chart I used a separate range with alternate gaps in the labels
Direction Dummy
N 1
1
NNE 1
1
NE 1
1
ENE 1
1
E 1
1
ESE 1
1
SE 1
1
SSE 1
1
S 1
1
SSW 1
1
SW 1
1
WSW 1
1
W 1
1
WNW 1
1
NW 1
1
NNW 1
1
Rotating radar charts in Excel can be achieved by building a separate table for plotting the chart. It would have three columns:
Column A: New categories
Column B: Original categories (calculated from A)
Column C: Original data using VLOOKUP() on B
The chart will be plotted using columns B and C. Column B category numbers are offset by the desired number of categories.
If the chart needs to be rotated by other than multiples of a category degree (e.g., 30 degrees for 12 categories), you would need to add rows in between (corresponding to the amount of rotation in relation to the category degree). For example, to rotate a 12-category radar chart by multiples of 15 degrees, one extra row is needed in-between each original category row (to create 24 new categories). In this case, you would need to calculate the intermediate values by linearly interpolating between actual data points.
The trick is that blank category values are not displayed on the chart and the values for these categories blend in smoothly with the real data (because they are interpolated).
I will post an example if the above is not clear enough.
P.S. I cannot look at your new Excel file (in Answers) because it exceeds 5 MB (see screenshot 1).
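The separate plotting table described in this answer can be sketched in Python to show the logic; this is an illustration, not Excel formulas, and the function name and parameters are hypothetical. Each original category is expanded into several rows, the extra rows get a blank label and a linearly interpolated value (wrapping around from the last category to the first), and the finished table is then shifted by a number of rows to rotate the chart.

```python
def rotated_radar_table(labels, values, offset_rows, rows_per_category=2):
    """Build (label, value) rows for a radar chart rotated by offset_rows.

    Extra rows have a blank label and a value interpolated between the
    neighbouring categories; the table then starts offset_rows rows later.
    """
    n = len(values)
    rows = []
    for i in range(n):
        nxt = values[(i + 1) % n]          # wrap around the circle
        for k in range(rows_per_category):
            frac = k / rows_per_category
            label = labels[i] if k == 0 else ""
            rows.append((label, values[i] + frac * (nxt - values[i])))
    offset_rows %= len(rows)
    return rows[offset_rows:] + rows[:offset_rows]

labels = ["A", "B", "C", "D"]
values = [4, 8, 4, 8]
table = rotated_radar_table(labels, values, offset_rows=1)
for label, value in table:
    print(f"{label:2s} {value}")
```

With 16 directions and two rows per category, each row spans 11.25 degrees, so an offset of one row corresponds to the half-sector rotation asked about in the question.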
So I did keep working on this problem and the best solution I've come up with (while using Microsoft Excel) looks as follows:
Currently, the number of sectors in the plot is fixed at 16. If I want to make this number variable, the table required for the plot data requires a very large amount of lookup functions which make the spreadsheet too slow to work with.
I've uploaded the new Excel file here to take a look at:
Excel file

A problem with connected points and determining geometry figures based on points' location analysis

In school we have a really hard problem, and none of the students has solved it yet. Take a look at the picture below:
http://d.imagehost.org/0422/mreza.gif
That's a kind of network of connected points, which doesn't end, and each point has its own number representing it. Let's say the numbers are like this: 1-23-456-78910-etc. (You can't see the numbers 5 or 8, 9... on the picture, but they are there and their position is obvious: the point in the middle of 4 and 6 is 5, and so on.)
1 is connected to 2 and 3, 2 is connected to 1,3,5 and 4 etc.
The numbers 1-2-3 indicate they represent a triangle on the picture, but the numbers 1-4-6 do not because 4 is not directly connected with 6.
Let's look at 2-3-4-5, that's a parallelogram (you know why), but 4-6-7-9 is NOT a parallelogram because in this problem there's a rule which says all the sides must be equal for all the figures - triangles and parallelograms.
Also there are hexagons, for ex. 4-5-7-9-13-12 is a hexagon - all sides must be equal here too.
12345 - that doesn't represent anything, so we ignore it.
I think I explained the problem well. The actual task given to us is: for an input of numbers like those above, determine whether they form a triangle, parallelogram, or hexagon (according to the rules described).
For ex:
1 2 3 - triangle
11 13 24 26 -parallelogram
1 2 3 4 5 - nothing
11 23 13 25 - nothing
3 2 5 - triangle
I was reading about computational geometry in order to solve this, but I gave up quickly; nothing seems to help here. A friend told me about this site, so I decided to give it a try.
If you have any ideas about how to solve this, please reply, you can use pseudo code or c++ whatever. Thank you very much.
Let's order the points like this:
1
2 3
4 5 6
7 8 9 10
11 12 13 14 15
16 17 18 19 20 21
22 23 24 25 26 27 28
You can store this in a matrix. Now let row[i] = the row number i is on and col[i] = the column number i is on. These can be computed more or less efficiently for each i.
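As a sketch of how row[i] and col[i] might be computed, the helper below walks down the rows until the row's last number reaches n. It also adds a direct-connection check based on the neighbour rule from the question (2 is connected to 1, 3, 4 and 5). Both function names are hypothetical and are written in Python for brevity; the same logic carries over to C++.

```python
def row_col(n):
    """1-based (row, column) of point n in the triangular numbering."""
    row = 1
    while row * (row + 1) // 2 < n:   # last number on each row is row*(row+1)/2
        row += 1
    col = n - row * (row - 1) // 2
    return row, col

def connected(a, b):
    """True if points a and b are directly connected in the network."""
    (r1, c1), (r2, c2) = row_col(a), row_col(b)
    step = (r2 - r1, c2 - c1)
    # Same row left/right, or the two links down and the two links up
    return step in {(0, 1), (0, -1), (1, 0), (1, 1), (-1, 0), (-1, -1)}

print(row_col(5), row_col(26))
print(connected(2, 5), connected(4, 6))
```

With these two helpers, walking from one point to another along a row, a column, or a diagonal is a matter of repeatedly adding one of the six steps to a (row, column) pair.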
First, sort your given numbers ascendingly. You will need exactly 3 points for a triangle, 4 for a parallelogram and 6 for a hexagon - anything else and you can dismiss it as no-figure.
Notice that we can only have right-angled triangles in this matrix, according to your rules. Label the three points A, B, C. You can check if these form a triangle by iterating from row[A] to row[B], then from col[B] to col[C] and then diagonally from row[C] to row[A] and checking to see if the distances are the same and if you get to the right positions. You can terminate this early, for example if B is 8 and A is 1, then you can tell you won't find it once you hit 11 on column 1.
For parallelograms a similar reasoning can be made. Label the 4 points A, B, C, D and remember to sort them ascendingly (remember, your points here are actually numbers). See if you can get from col[A] to col[B] on the same line, then from col[C] to col[D] on the same line and then diagonally or vertically-down from row[A] to row[C] and then (in the same direction you went the previous diagonal!) from row[B] to row[D].
Hexagons also have a specific format you must test for. Here's what hexagons look like in this representation:
1
2 3
4 5 6
7 8 9 10
11 12 13 14 15
16 17 18 19 20 21
22 23 24 25 26 27 28
1
2 3
4 5 6
7 8 9 10
11 12 13 14 15
16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31 32 33 34 35 36
You can notice that every two pairs of points share the same column, and that the horizontal distance between the two middle points is twice the vertical distance between any two points and also twice the horizontal distance between any other two points.
You will also want to consider rotations, so you'll need to do more tests for each case.
You don't even really need the row and col arrays unless you plan on computing them efficiently. Just walk over your matrix until you identify the first point in sorted order and try to get to the others while following each of the rules.
Not exactly a nice way, but you will only need a 256x256 matrix for this, so while this does result in quite a lot of code, it's pretty efficient. I hope I made myself clear, if not please say what isn't clear. Anyway, maybe someone else will post a better solution, so wait a while longer if you can..
