How do I interpret the includedQuantity in Azure's rate card API?

I am trying to calculate the cost incurred on my Azure Pay-As-You-Go subscription using the usage and rate card APIs. For this, I came across the parameter includedQuantity in the rate card API which, according to the documentation, refers to "The resource quantity that is included in the offer at no cost. Consumption beyond this quantity will be charged."
Consider an example where the usageQuantity is 700 and the rate card (tier start : per-unit rate) is as follows:
0 : 20
101 : 15
501 : 10
and the includedQuantity is 200.
My assumption was that the calculation would be one of the following:
Option 1: subtract the included quantity and price the remainder from the bottom of the original rate card:
Quantity = (700 - 200) = 500
Hence, cost = 100 * 20 + 400 * 15 = 8000
Option 2: treat the included quantity as zero-rated tiers, i.e. a new rate card:
0 : 0
101 : 0
201 : 15
501 : 10
So, cost = 300 * 15 + 200 * 10 = 6500
I have seen this question, but it does not clarify the includedQuantity properly.

I checked with the Azure Billing team on this, and what they told me is that they first take off the included units (200 in your example) and then apply graduated pricing to the remaining units.
Based on this, your cost would be 4500:
Total units consumed: 700
Included units: 200
Tiered pricing: {0-100 = 0; 101-200 = 0; 201-500 = 15; 501-no upper limit = 10}
Cost = 0 x 100 + 0 x 100 + 15 x 300 = 4500
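To make that arithmetic concrete, here is a minimal sketch (Python, purely illustrative; not an official Azure billing algorithm) of graduated pricing applied to the units left after the included quantity is deducted, using the tier boundaries from the answer above:

def graduated_cost(quantity, rate_card):
    """Price `quantity` units against a graduated rate card.

    `rate_card` maps the number of units already consumed at the start of a
    tier to that tier's per-unit rate, e.g. {0: 0, 100: 0, 200: 15, 500: 10}
    means units 1-100 cost 0, 101-200 cost 0, 201-500 cost 15, and every
    unit above 500 costs 10.
    """
    cost = 0
    tiers = sorted(rate_card.items())
    for i, (start, rate) in enumerate(tiers):
        end = tiers[i + 1][0] if i + 1 < len(tiers) else float("inf")
        units_in_tier = max(0, min(quantity, end) - start)
        cost += units_in_tier * rate
    return cost

usage = 700
included = 200                        # includedQuantity from the rate card API
billable = usage - included           # included units are taken off first
# Effective tiers from the answer: 0-100 and 101-200 free, 201-500 at 15, 501+ at 10.
print(graduated_cost(billable, {0: 0, 100: 0, 200: 15, 500: 10}))   # -> 4500

For comparison, graduated_cost(500, {0: 20, 100: 15, 500: 10}) reproduces the 8000 figure from the first interpretation in the question.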

Related

Cassandra READ Where In performance

I have a Cassandra cluster of 6 nodes; each one has 96 CPUs / 800 RAM.
My table for performance tests is:
create table if not exists space.table
(
id bigint primary key,
data frozen<list<float>>,
updated_at timestamp
);
The table contains 150,000,000 rows.
When I was testing it with query:
SELECT * FROM space.table WHERE id = X
I wasn't even able to overload the cluster; the client overloaded itself first, at 350,000 RPS to the cluster.
Now I'm testing a second case:
SELECT * FROM space.table WHERE id in (X1, X2 ... X3000)
I want to get 3000 random rows from Cassandra per request.
The max RPS in this case is 15; beyond that, a lot of pending tasks pile up in the Cassandra thread pool of type Native-Transport-Requests.
Is it simply a bad idea to fetch big result sets from Cassandra? What is the best practice? I can certainly split the 3000 ids into separate requests, for example 30 requests with 100 ids each.
Where can I find information about this? Maybe the WHERE IN operation is just not good from a performance perspective?
Update:
I want to share my measurements for fetching 3000 rows from Cassandra with different chunk sizes:
Test with 3000 ids per request
Latency: 5 seconds
Max RPS to Cassandra: 20
Test with 100 ids per request (total 30 requests, each with 100 ids)
Latency at 350 RPS to the service (350 * 30 = 10500 requests to Cassandra): 170 ms (q99), 95 ms (q90), 75 ms (q50)
Max RPS to Cassandra: 350 * 30 = 10500
Test with 20 ids per request (total 150 requests, each with 20 ids)
Latency at 250 RPS to the service (250 * 150 = 37500 requests to Cassandra): 49 ms (q99), 46 ms (q90), 32 ms (q50)
Latency at 600 RPS to the service (600 * 150 = 90000 requests to Cassandra): 190 ms (q99), 180 ms (q90), 148 ms (q50)
Max RPS to Cassandra: 650 * 150 = 97500
Test with 10 ids per request (total 300 requests, each with 10 ids)
Latency at 250 RPS to the service (250 * 300 = 75000 requests to Cassandra): 48 ms (q99), 31 ms (q90), 11 ms (q50)
Latency at 600 RPS to the service (600 * 300 = 180000 requests to Cassandra): 159 ms (q99), 95 ms (q90), 75 ms (q50)
Max RPS to Cassandra: 650 * 300 = 195000
Test with 5 ids per request (total 600 requests, each with 5 ids)
Latency at 550 RPS to the service (550 * 600 = 330000 requests to Cassandra): 97 ms (q99), 92 ms (q90), 60 ms (q50)
Max RPS to Cassandra: 550 * 600 = 330000
Test with 1 id per request (total 3000 requests, each with 1 id)
Latency at 190 RPS to the service (190 * 3000 = 570000 requests to Cassandra): 49 ms (q99), 43 ms (q90), 30 ms (q50)
Max RPS to Cassandra: 190 * 3000 = 570000
Using IN is really not recommended, especially for so many individual partition keys. The problem is that when you send a query with IN:
the query is sent to any node (the coordinator node), not necessarily a node that owns the data
that coordinator node then identifies which nodes own the data for the specific partition keys
queries are sent to the identified nodes
the coordinator node collects the results from all nodes
the result is consolidated and sent back
This puts a lot of load on the coordinator node and makes the whole query as slow as the slowest node in the cluster.
The better solution is to use prepared statements and send an individual async request for each partition key, then collect the data in your application (see the sketch below). Just take into account that there are limits on how many in-flight queries there can be per connection.
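For illustration, a minimal sketch of that approach with the DataStax Python driver could look like this (the contact point and the id list are placeholders; the table name comes from the question; execute_concurrent_with_args is used to cap the number of in-flight requests):

from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args

cluster = Cluster(["127.0.0.1"])   # placeholder contact point
session = cluster.connect()

# Prepare once and reuse for every id; prepared statements are routed
# token-aware by default, so each execution goes to a replica owning the partition.
prepared = session.prepare(
    "SELECT id, data, updated_at FROM space.table WHERE id = ?"
)

ids = list(range(1, 3001))         # e.g. the 3000 partition keys you need

# Fan out one async request per id, keeping at most `concurrency` requests
# in flight so neither the client nor the cluster is overwhelmed.
results = execute_concurrent_with_args(
    session, prepared, [(i,) for i in ids],
    concurrency=100, raise_on_first_error=False
)

rows = []
for success, result in results:
    if success:
        rows.extend(result)              # result is an iterable of rows
    else:
        print("query failed:", result)   # result is the exception here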
P.S. It should be possible to optimize this further by looking at your values, checking whether different partition keys fall into the same token range, generating an IN query for all keys in the same token range, and sending that query with the routing key set explicitly. But that requires more advanced coding.

SAS Proc IML Simulate from empirical data with limits

This might sound bonkers, but looking to see if there are any ideas on how to do this.
I have N categories (say 7) across which a set number of people (say 1000) have to be allocated. I know from historical data the minimum and maximum for each category. The historical data is limited (say 15 samples), so it looks like the data below; if I had a larger sample, I would try to fit a distribution for each category from all the samples, but there isn't enough.
-Year 1: [78 97 300 358 132 35 0]
-Year 2: [24 74 346 300 148 84 22]
-.
-.
-Year 15:[25 85 382 302 146 52 8]
The min and max for each category over these 15 years of data is:
Min: [25 74 252 278 112 27 0 ]
Max: [132 141 382 360 177 84 22]
I am trying to scale this up using simulation: allocating the 1000 people across the categories within the min and max limits, and repeating it. The only condition is that the allocation across the seven categories in each simulation has to sum to 1000.
Any ideas would be greatly appreciated!
The distribution you want is the multinomial distribution. You can use the RandMultinomial function in SAS/IML to produce random samples from it. To use the multinomial distribution, you need to know the probability that an individual falls in each category. If this probability has not changed over time, the best estimate of it is the average proportion in each category.
Thus, I would recommend using ALL the data to estimate the probability, not just max and min:
proc iml;
X = {...}; /* X is a 15 x 7 matrix of counts, each row is a year */
mean = mean(X);
p = mean / sum(mean);
/* simulate new counts by using the multinomial distribution */
numSamples = 10;
SampleSize = 1000;
Y = randmultinomial(numSamples, SampleSize, p);
print Y;
Now, if you insist on using the max/min, you could use the midrange to estimate the most likely value and use that to estimate the probability, as follows:
Min = {25 74 252 278 112 27 0};
Max = {132 141 382 360 177 84 22};
/* use midrange to estimate probabilities */
midrange = (Min + Max)/2;
p = midrange / sum(midrange);
/* now use RandMultinomial, as before */
If you use the second method, there is no guarantee that the simulated values will stay within the Min/Max limits, although in practice many of the samples will satisfy that criterion.
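If you want to quantify how often that criterion holds, a quick check of the midrange-based method outside SAS (a NumPy sketch, purely illustrative, using the Min/Max vectors above and arbitrary sample counts) could look like this:

import numpy as np

Min = np.array([25, 74, 252, 278, 112, 27, 0])
Max = np.array([132, 141, 382, 360, 177, 84, 22])

midrange = (Min + Max) / 2
p = midrange / midrange.sum()

# Draw many allocations of 1000 people across the 7 categories.
samples = np.random.multinomial(1000, p, size=10000)

# Fraction of simulated allocations that stay within every historical bound.
within = np.all((samples >= Min) & (samples <= Max), axis=1)
print(within.mean())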
Personally, I advocate the first method, which uses the average count. Or you can use a time-weighted count, if you think recent observations are more relevant than observations from 15 years ago.

Using rdrobust to calculate 2sls and getting error message stating "should be set within the range of x"

I am trying to calculate a two stage least squares in Stata. My dataset looks like the following:
income bmi health_index asian black q_o_l age aide
100 19 99 1 0 87 23 1
0 21 87 1 0 76 29 0
1002 23 56 0 1 12 47 1
2200 24 67 1 0 73 43 0
2076 21 78 1 0 12 73 1
I am trying to use rdrobust to estimate the treatment effect. I used the following code:
rdrobust q_o_l aide health_index bmi income asian black age, c(10)
I varied the income variable with multiple polynomial forms and used multiple bandwidths. I keep getting the same error message stating:
c() should be set within the range of aide
I am assuming that this has to do with the bandwidth. How can I correct it?
You have two issues with the syntax. You wrote:
rdrobust q_o_l aide health_index bmi income asian black age, c(10)
This will ignore the variables health_index through age, since you can only have one running variable. It will then try to use a cutoff of 10 for aide (the second variable is always the running variable). Since aide is binary, a cutoff of 10 lies outside its range, and Stata complains.
It's not obvious to me what makes sense in your setting, but here's an example demonstrating the problem and the two remedies:
. use "http://fmwww.bc.edu/repec/bocode/r/rdrobust_senate.dta", clear
. rdrobust vote margin, c(0) covs(state year class termshouse termssenate population)
Covariate-adjusted sharp RD estimates using local polynomial regression.
Cutoff c = 0 | Left of c Right of c Number of obs = 1108
-------------------+---------------------- BW type = mserd
Number of obs | 491 617 Kernel = Triangular
Eff. Number of obs | 309 279 VCE method = NN
Order est. (p) | 1 1
Order bias (q) | 2 2
BW est. (h) | 17.669 17.669
BW bias (b) | 28.587 28.587
rho (h/b) | 0.618 0.618
Outcome: vote. Running variable: margin.
--------------------------------------------------------------------------------
Method | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------------+------------------------------------------------------------
Conventional | 6.8862 1.3971 4.9291 0.000 4.14804 9.62438
Robust | - - 4.2540 0.000 3.78697 10.258
--------------------------------------------------------------------------------
Covariate-adjusted estimates. Additional covariates included: 6
. sum margin
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
margin | 1,390 7.171159 34.32488 -100 100
. rdrobust vote margin state year class termshouse termssenate population, c(7) // margin ranges from -100 to 100
Sharp RD estimates using local polynomial regression.
Cutoff c = 7 | Left of c Right of c Number of obs = 1297
-------------------+---------------------- BW type = mserd
Number of obs | 744 553 Kernel = Triangular
Eff. Number of obs | 334 215 VCE method = NN
Order est. (p) | 1 1
Order bias (q) | 2 2
BW est. (h) | 14.423 14.423
BW bias (b) | 24.252 24.252
rho (h/b) | 0.595 0.595
Outcome: vote. Running variable: margin.
--------------------------------------------------------------------------------
Method | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------------+------------------------------------------------------------
Conventional | .1531 1.7487 0.0875 0.930 -3.27434 3.58053
Robust | - - -0.0718 0.943 -4.25518 3.95464
--------------------------------------------------------------------------------
. rdrobust vote margin state year class termshouse termssenate population, c(-100) // nonsensical cutoff for margin
c() should be set within the range of margin
r(125);
end of do-file
r(125);
You might also find this answer interesting.

Azure SQL Full Text Index initial population slow

I have a table with approximately 4.7 million records. I created a full text index on it.
I am experiencing slow initial population of the full text index.
The initial pricing tier I had was S1; I upgraded it to S3 but did not get better performance.
DTU and CPU usage are not high (usually staying around 0%), and the current rate is about 175,000 records per hour.
What can I do to speed this up?
Thanks in advance.
Later edit:
I tried the same operation on a local installation of SQL Server 2014 and had no problems indexing the data.
Update 14.11.2016
Output of sys.dm_exec_requests for the indexing session (key columns):
session_id: 90, status: running, command: SELECT, database_id: 5, blocking_session_id: 0, wait_type: NULL, wait_time: 0, last_wait_type: MEMORY_ALLOCATION_EXT
As far as I can see, this runs much faster on P1, which is strange because P1 is not much more powerful than S3.
I will mark this as solved, because it seems to be an issue related to service tier levels.
If you bump up the service tier of the Azure database, full-text indexing runs much faster than at the Standard level.
I could not see a difference between S1 and S3, but P1 is much faster than S3.
I do not know the reasoning behind this, even though the difference in DTUs is only 25 (S3: 100 DTUs, P1: 125 DTUs).

Screen resolution 1024px or 960px? [duplicate]

This question already has answers here:
Why width 960px?
(7 answers)
Closed 8 years ago.
Back in the day we used to design websites for the resolution:
1024px (width) by 768px (height)
but to avoid scrollbars appearing on either side we used a slightly smaller size, maybe:
1000px (width) by 620px (height); correct me if I'm wrong on this one.
But my main question is: how did the 960 grid system come about? I know it is good for laying out content within the 960px grid, but if so, why not just use 1000px instead, since that was the most commonly used size at the time?
The reasoning is like this:
1024 x 768 was a common resolution at which designs were aimed.
Subtract 24px for the scroll bar, leaving 1000px.
960px leaves additional breathing room and, importantly, has many more factors (divisors) than 1000px. 960 is therefore the ideal choice for a grid system.
Here are the factors:
960
1 2 3 4 5 6 8 10 12 15 16 20 24 30 32 40 48 60 64
80 96 120 160 192 240 320 480 960
1000
1 2 4 5 8 10 20 25 40 50 100 125 200 250 500 1000
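For reference, here is a quick way to reproduce these factor lists (a Python one-off, purely illustrative):

def divisors(n):
    """Return all positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]

print(len(divisors(960)), divisors(960))     # 28 divisors of 960
print(len(divisors(1000)), divisors(1000))   # 16 divisors of 1000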
Credit to the answers on this post: Why width 960px?
