I am trying to combine four graphs in Stata using graph combine.
The result is shown in the following figure:
All four figures should be of equal size but because of the horizontal ytitle, the first two are compressed. Is there a way to control how graph combine re-sizes the figures?
I have tried ysize and xsize, but these seem to be overridden by graph combine.
Below you can find the code that generates the figure:
sysuse auto, clear
graph drop _all
# delimit ;
* First 2 figures;
twoway (line weight mpg if foreign == 1,
sort ytitle("Some longer ytitle", orientation(horizontal))
title("Foreign", box bexpand) yla(, ang(h)) xtitle("")
xlabel(,noticks) name(A1, replace ) graphregion(color(gs16)));
twoway (line weight mpg if foreign == 1, sort
ytitle("short", orientation(horizontal)) yla(, ang(h)) xtitle("")
xlabel(,noticks) name(A2, replace ) graphregion(color(gs16)));
graph combine A1 A2, cols(1) name(A, replace) imargin(b=0 t=0);
* Second 2 figures;
twoway (line weight mpg if foreign == 0, sort ytitle("")
title("Domestic", box bexpand) xtitle("") xlabel(,noticks)
name(B1, replace ) graphregion(color(gs16)) );
twoway (line weight mpg if foreign == 0, sort ytitle("") xtitle("")
xlabel(,noticks) name(B2, replace ) graphregion(color(gs16)));
graph combine B1 B2, cols(1) name(B, replace) imargin(b=0 t=0);
* Combining the two;
graph combine A B ;
You need to change the orientation for each ytitle to vertical and combine the graphs only once in the desired order.
The following will give you figures of equal size as per your request:
sysuse auto, clear
graph drop _all
# delimit ;
* First 2 figures;
twoway (line weight mpg if foreign == 1,
sort ytitle("Some longer ytitle", orientation(vertical))
title("Foreign", box bexpand) yla(, ang(h)) xtitle("")
xlabel(,noticks) name(A1, replace ) graphregion(color(gs16)));
twoway (line weight mpg if foreign == 1, sort
ytitle("short", orientation(vertical)) yla(, ang(h)) xtitle("")
xlabel(,noticks) name(A2, replace ) graphregion(color(gs16)));
* Second 2 figures;
twoway (line weight mpg if foreign == 0, sort ytitle("")
title("Domestic", box bexpand) xtitle("") xlabel(,noticks)
name(B1, replace ) graphregion(color(gs16)) );
twoway (line weight mpg if foreign == 0, sort ytitle("") xtitle("")
xlabel(,noticks) name(B2, replace ) graphregion(color(gs16)));
* Combining the 4 graphs;
graph combine A1 B1 A2 B2;
I would also recommend rotating the y-axis tick labels of the graphs in the first column to a vertical angle, so they match those of the graphs in the second column:
Notice that by decreasing the size of the tick value labels for both axes, you can give more prominence to the ytitle. You may need to adjust the spacing between the ytitle and the yaxis tick labels though.
EDIT:
You can "brute force" Stata to do what you like but you will never get exactly what you want. This is because of the variable ytitle length, which affects the entire graph area.
A quick solution is the following:
sysuse auto, clear
graph drop _all
# delimit ;
* First 2 figures;
twoway (line weight mpg if foreign == 1,
sort ytitle("Some longer ytitle", orientation(h))
title("Foreign", box bexpand) yla(, ang(h)) xtitle("")
xlabel(,noticks) name(A1, replace ) graphregion(color(gs16)));
twoway (line weight mpg if foreign == 1, sort
ytitle(" short", orientation(h)) yla(, ang(h)) xtitle("")
xlabel(,noticks) name(A2, replace ) graphregion(color(gs16)));
* Second 2 figures;
twoway (line weight mpg if foreign == 0, sort ytitle("")
title("Domestic", box bexpand) xtitle("") xlabel(,noticks)
name(B1, replace ) graphregion(color(gs16)) );
twoway (line weight mpg if foreign == 0, sort ytitle("") xtitle("")
xlabel(,noticks) name(B2, replace ) graphregion(color(gs16)));
* Combining the 4 graphs;
graph combine A1 B1 A2 B2, xsize(7);
Notice the changes in the code: the ytitle orientation is horizontal again, a leading space pads the " short" ytitle, and graph combine gets the xsize(7) option.
You can also play around with the values and see if you can improve things a bit:
Specifying a right margin in the graphregion option of the second 2 figures also improves things:
twoway (line weight mpg if foreign == 0, sort ytitle("")
title("Domestic", box bexpand) xtitle("") xlabel(,noticks)
name(B1, replace ) graphregion(color(gs16) margin(r=22)));
twoway (line weight mpg if foreign == 0, sort ytitle("") xtitle("")
xlabel(,noticks) name(B2, replace ) graphregion(color(gs16) margin(r=22)));
I have a huge Order-Master-Table that contains the complete manufacturing routing for each order.
Now I would like to determine the last tracked quantity in this table.
The table has a structure like:
Order Nr
Prio 1 (Number)
Prio 2 (Number)
Prio 3 (Text)
Quantity
The measure should look at Prio 1 at first, then at Prio 2, and at last on Prio 3.
Prio 1 and Prio 2 are numbers; values between 2 and 99 are possible.
Prio 3 can contain different text strings, but only "OX" and "OK" matter for the tracked quantity. "OX" always has higher priority than "OK".
How would you build that as a DAX Measure?
MAXX, CALCULATEDTABLE, RANK, TOPN?
This is my first approach:
VAR Max1 =
MAXX (
ALLSELECTED ( 'Order Master' ),
'Order Master'[Prio 1]
)
VAR Max2 =
MAXX (
ALLSELECTED ( 'Order Master' ),
'Order Master'[Prio 2]
)
RETURN
MAXX(
FILTER(
'Order Master',
'Order Master'[Prio 1] = Max1 && 'Order Master'[Prio 2] = Max2
),
'Order Master'[Quantity]
)
I would translate "OX" and "OK" into numbers, just like the other priorities. This can be done very simply using an IF-formula:
=IF(A1="OX";0;IF(A1="OK";1;-1))
(Just as an example, obviously)
If, however, by "OX" you mean every possible value for "X", like "O1", "O32", ..., then I'd suggest writing a VBA function to perform the mentioned translation.
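Translated into plain Python purely to illustrate the selection logic the measure needs to express (column names and sample rows are invented): look at Prio 1 first, break ties with Prio 2, then break remaining ties with Prio 3, where "OX" outranks "OK".

```python
# Hypothetical rows of the 'Order Master' table (names and values invented).
rows = [
    {"prio1": 10, "prio2": 5, "prio3": "OK", "qty": 100},
    {"prio1": 10, "prio2": 5, "prio3": "OX", "qty": 80},
    {"prio1": 10, "prio2": 3, "prio3": "OX", "qty": 50},
]

def prio3_rank(text):
    # "OX" outranks "OK"; all other strings are irrelevant for tracking.
    return {"OX": 2, "OK": 1}.get(text, 0)

def last_tracked_quantity(rows):
    # Highest Prio 1 wins; ties fall through to Prio 2, then Prio 3.
    best = max(rows, key=lambda r: (r["prio1"], r["prio2"], prio3_rank(r["prio3"])))
    return best["qty"]
```

Here the row with Prio 2 = 5 and Prio 3 = "OX" wins, so the result is 80. The same cascade can be built in DAX by translating Prio 3 to a number, as suggested above, and filtering step by step.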
I have a data structure like this:
red vertices are POSITION
cyan vertices are BIKE
pink vertex is USER
GRAPH TD
((BIKE)) -- LOCATED -->((POSITION))
where each BIKE is associated with many POSITIONs.
Every position has latitude and longitude properties and a timestamp.
I would like to find all bikes whose position is within 40 km of a coordinate pair and not older than 48 hours.
what I'm doing so far is:
FOR pos IN NEAR(positions, 45.5063575, 9.24157653499384, 40, "distance")
FILTER pos.timestamp >= DATE_SUBTRACT(DATE_NOW(), "PT48H")
SORT pos.timestamp DESC, pos.distance DESC
RETURN {'position': pos,'bike':(FOR bike IN OUTBOUND pos located RETURN bike)}
but this query returns all positions with their bikes; I would like only the latest position (closest in time) and the bike it belongs to.
Thanks for your help.
You need something like this:
FOR pos IN WITHIN(positions, 45.5063575, 9.24157653499384, 40, "distance")
FILTER pos.timestamp >= DATE_SUBTRACT(DATE_NOW(), "PT48H")
SORT pos.timestamp DESC
FOR b IN OUTBOUND pos located
COLLECT bike = b INTO bike_positions
RETURN {'bike': bike, 'position': bike_positions[0].pos}
Note: You should use WITHIN instead of NEAR.
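To see why the sort-then-COLLECT works, here is the same grouping idea in plain Python (record layout invented): with positions ordered newest-first, the first record seen per bike is its latest position.

```python
# Positions newest-first, as after SORT pos.timestamp DESC (sample data).
positions = [
    {"bike": "b1", "timestamp": 300},
    {"bike": "b2", "timestamp": 250},
    {"bike": "b1", "timestamp": 100},
]

latest = {}
for pos in positions:
    # setdefault keeps only the first (i.e. newest) position per bike.
    latest.setdefault(pos["bike"], pos)
```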
UPDATE after #Loki comment:
FOR pos IN positions
FILTER pos.timestamp >= DATE_SUBTRACT(DATE_NOW(), "PT200H") AND
GEO_DISTANCE([45.47942614827045, 9.24157653499384], pos.coordinates) <= 40000
SORT pos.timestamp DESC
FOR b IN OUTBOUND pos located
COLLECT bike = b INTO bike_positions
RETURN {'bike': bike, 'position': bike_positions[0].pos}
I have a table with a few rows:
Percent Value
--------- -------
99.95 230
99.92 130
99.05 94
I want to change this so that if there are gaps in the percent column (e.g. 99.94, 99.93, 99.91, ...), I create those rows with the value from the previous row, i.e. the nearest lower percent present in the table. So, for example, 99.94 and 99.93 would have a value of 130, and 99.91 would have a value of 94.
A window function requires knowing a fixed offset, and I also don't think I can use one to populate a new table with more rows.
I think I can make it work by generating a number-sequence table and cross joining it with this table; however, I don't know how to generate a dummy CTE with a number sequence from 0.00 to 100.00 at a 0.01 increment.
Any help would be appreciated.
As you suggested in your question, you can do it with a sequence table (by unnesting the output of the sequence function) and the lag window function like this:
WITH data(p, v) AS (VALUES
(99.95, 230),
(99.92, 130),
(99.05, 94)
),
sequence(p) AS (
SELECT x/100.00 FROM unnest(sequence(1, 10000)) t(x)
)
SELECT
sequence.p,
coalesce(v, lag(v) IGNORE NULLS OVER (ORDER BY sequence.p))
FROM data RIGHT JOIN sequence ON data.p = sequence.p
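The fill logic itself is easy to sanity-check outside the database. This plain-Python sketch scales the percents by 100 to integer keys (an assumption, to avoid a floating-point grid) and carries the last seen value forward over the 0.01 steps:

```python
# Sparse percent -> value data, percents scaled x100 to integers.
data = {9995: 230, 9992: 130, 9905: 94}

filled = {}
last = None
for p in range(1, 10001):  # grid for 0.01 .. 100.00
    if p in data:
        last = data[p]
    filled[p] = last  # gaps inherit the value of the nearest lower percent
```

99.94 and 99.93 come out as 130 and 99.91 as 94, matching the expected result of the query above.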
I am working with three datasets in MATLAB, e.g.,
Dates:
There are D dates that are chars each, but saved in a cell array.
{'01-May-2019','02-May-2019','03-May-2019'....}
Labels:
There are 100 labels that are strings each, but saved in a cell array.
{'A','B','C',...}
Values:
[0, 1, 2,...]
This is one row of the Values matrix of size D×100.
I would like the following output in Excel:
date labels Values
01-May-2019 A 0
01-May-2019 B 1
01-May-2019 C 2
till the same date repeats itself 100 times. Then, the next date is added (+ repeated 100 times) onto the subsequent row along with the 100 labels in the second column and new values from 2nd row of Values matrix transposed in third column. This repeats until the date length D is reached.
For the first date, I used:
c_1 = {datestr(datenum(dates(1))*ones(100,1))}
c_2 = labels
c_3 = num2cell(Values(1,:)')
xlswrite('test.xls',[c_1, c_2, c_3])
but, unfortunately, this seemed to have put everything in one column, i.e., the date, then, labels, then, 1st row of values array. I need these to be in three columns.
Also, I think that the above needs to be in a for loop over each day that I am considering. I tried using the table function, but, didn't have much luck with it.
How to solve this efficiently?
You can use repmat and reshape to build your columns and (optionally) add them to a table for exporting.
For example:
dates = {'01-May-2019','02-May-2019'};
labels = {'A','B', 'C'};
values = [0, 1, 2];
n_dates = numel(dates);
n_labels = numel(labels);
dates_repeated = reshape(repmat(dates, n_labels, 1), [], 1);
labels_repeated = reshape(repmat(labels, n_dates, 1).', [], 1);
values_repeated = reshape(repmat(values, n_dates, 1).', [], 1);
full_table = table(dates_repeated, labels_repeated, values_repeated);
Gives us the following table:
>> full_table
full_table =
6×3 table
dates_repeated labels_repeated values_repeated
______________ _______________ _______________
'01-May-2019' 'A' 0
'01-May-2019' 'B' 1
'01-May-2019' 'C' 2
'02-May-2019' 'A' 0
'02-May-2019' 'B' 1
'02-May-2019' 'C' 2
Which should export to a spreadsheet with writetable as desired.
What we're doing with repmat and reshape is "stacking" the values and then converting them into a single column:
>> repmat(dates, n_labels, 1)
ans =
3×2 cell array
{'01-May-2019'} {'02-May-2019'}
{'01-May-2019'} {'02-May-2019'}
{'01-May-2019'} {'02-May-2019'}
We transpose the labels and values so they get woven together (e.g. [0, 1, 0, 1] vs [0, 0, 1, 1]), since MATLAB's reshape reads elements in column-major order.
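The same weaving can be sketched in plain Python (list comprehensions standing in for repmat/reshape) to see which axis repeats block-wise and which cycles element-wise:

```python
dates = ['01-May-2019', '02-May-2019']
labels = ['A', 'B', 'C']
values = [0, 1, 2]

# Each date repeats once per label (block-wise: d1 d1 d1 d2 d2 d2).
dates_repeated = [d for d in dates for _ in labels]
# Labels and values cycle within each date block (A B C A B C).
labels_repeated = labels * len(dates)
values_repeated = values * len(dates)

rows = list(zip(dates_repeated, labels_repeated, values_repeated))
```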
If you don't want the intermediate table, you can use num2cell to create a cell array from values so you can concatenate all 3 cell arrays together for xlswrite (or writecell, introduced in R2019a, which supersedes xlswrite for cell arrays):
values_repeated = num2cell(reshape(repmat(values, n_dates, 1).', [], 1));
full_array = [dates_repeated, labels_repeated, values_repeated];
Based on this paper:
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE: Computation of Normalized Edit Distance and Applications. In this paper, normalized edit distance is defined as follows:
Given two strings X and Y over a finite alphabet, the normalized edit distance between X and Y, d(X, Y), is defined as the minimum of W(P) / L(P), where P is an editing path between X and Y, W(P) is the sum of the weights of the elementary edit operations of P, and L(P) is the number of these operations (the length of P).
Can I safely translate the normalized edit distance algorithm explained above into this:
normalized edit distance =
levenshtein(query 1, query 2)/max(length(query 1), length(query 2))
You are probably misunderstanding the metric. There are two issues:
The normalization step divides W(P), the weight of the edit path, by L(P), the length of that path, not by the max length of the strings as you did;
Also, the paper shows (Example 3.1) that the normalized edit distance cannot simply be computed from the Levenshtein distance. You probably need to implement their algorithm.
An explanation of Example 3.1 (c):
From aaab to abbb, the paper used the following transformations:
match a with a;
skip a in the first string;
skip a in the first string;
skip b in the second string;
skip b in the second string;
match the final bs.
These are 6 operations, which is why L(P) is 6; from the matrix in (a), matching has cost 0 and skipping has cost 2, so the total cost is 0 + 2 + 2 + 2 + 2 + 0 = 8, which is exactly W(P), and W(P) / L(P) ≈ 1.33. Similar results can be obtained for (b), which I'll leave to you as an exercise :-)
The 3 in figure 2(a) refers to the cost of changing "a" to "b" or the cost of changing "b" to "a". The columns with lambdas in figure 2(a) mean that it costs 2 in order to insert or delete either an "a" or a "b".
In figure 2(b), W(P) = 6 because the algorithm does the following steps:
keep first a (cost 0)
convert first b to a (cost 3)
convert second b to a (cost 3)
keep last b (cost 0)
The sum of the costs of the steps is W(P). The number of steps is 4 which is L(P).
In figure 2(c), the steps are different:
keep first a (cost 0)
delete first b (cost 2)
delete second b (cost 2)
insert a (cost 2)
insert a (cost 2)
keep last b (cost 0)
In this path there are six steps so the L(P) is 6. The sum of the costs of the steps is 8 so W(P) is 8. Therefore the normalized edit distance is 8/6 = 4/3 which is about 1.33.
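Both walk-throughs above can be reproduced by a small dynamic program over path lengths: for each possible length l, compute the minimum weight among edit paths of exactly l operations, then minimize weight/length over l. This is an unoptimized Python sketch (not the paper's efficient algorithm), using the Figure 2(a) weights as defaults: match 0, substitute 3, insert/delete 2.

```python
def normalized_edit_distance(x, y, sub=3, indel=2):
    """Minimum of W(P)/L(P) over all edit paths P from x to y."""
    m, n = len(x), len(y)
    INF = float("inf")
    max_len = m + n  # the longest possible path deletes all of x, inserts all of y
    # dp[l][i][j]: min weight of a path of exactly l operations
    # turning x[:i] into y[:j].
    dp = [[[INF] * (n + 1) for _ in range(m + 1)] for _ in range(max_len + 1)]
    dp[0][0][0] = 0
    for l in range(1, max_len + 1):
        for i in range(m + 1):
            for j in range(n + 1):
                best = INF
                if i > 0 and j > 0:  # match (cost 0) or substitute
                    cost = 0 if x[i - 1] == y[j - 1] else sub
                    best = min(best, dp[l - 1][i - 1][j - 1] + cost)
                if i > 0:            # delete from x
                    best = min(best, dp[l - 1][i - 1][j] + indel)
                if j > 0:            # insert from y
                    best = min(best, dp[l - 1][i][j - 1] + indel)
                dp[l][i][j] = best
    return min(dp[l][m][n] / l
               for l in range(1, max_len + 1) if dp[l][m][n] < INF)
```

For "aaab" and "abbb" this returns 8/6 ≈ 1.33, the Example 3.1 value; a plain minimum-weight path would instead pick the length-4 substitution path with W/L = 6/4 = 1.5, which is exactly why the Levenshtein shortcut fails.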