I have an app (nodejs / express) that needs to find a routing rule to apply based on time of day and day of week.
So for example, I have the following business rules:
on Mondays and Tuesdays between 09:00 GMT and 12:00 GMT, I need route object ABC to "location x".
on Tuesdays between 13:00 and 13:30 I need to route ABC to "location y".
(For the purposes of this discussion, it really doesn't matter what object ABC is.)
I am debating between two options for how I should design the keys in my Redis database.
Option 1
Make the day information a part of the object data, like this:
HMSET routing_rules:objectABC_09:00_12:00 days 'mon tues' location X
HMSET routing_rules:objectABC_13:00_13:30 days 'tues' location Y
Advantage of this method - When it's time to update the days list I can simply do this:
HMSET routing_rules:objectABC_09:00_12:00 days 'mon tues thu'
The downside here is that in order to find the right rule, I have to make two queries: first a SCAN command to find the right time range, and then, if there's a match, another query to read the days value.
Option 2
Include the day of week information as a part of the key
HMSET routing_rules:objectABC_09:00_12:00_mt location X
HMSET routing_rules:objectABC_13:00_13:30_t location Y
I would use a naming convention like
m = monday
t = tuesday
w = wed
r = thursday
etc.
The advantage to option 2 is that in order to find the right routing rule based on current time and day, I only need to run one SCAN command (we can assume that my SCAN command will return all results in one shot)
But the downside to option 2 is that when I need to add a new day to the key, I think I need to delete the key and value and then recreate them. Is this correct?
And as of right now, the only way I know how to delete is by doing an HDEL for each field in the object; once the last field is removed, the key is deleted.
So for example, I've been doing something like this:
127.0.0.1:6379> HDEL routing_rules:objectABC_00:00_00:00 days 'mon tues' location x
where I have to list all the values in the object to delete the entire key / value pair.
In this example, it's not so bad because I only have two values for this key - the location and the days fields. But if there was a lot more data, it'd be a little cumbersome. And I'm not sure if there are other considerations to factor in besides the number of fields associated with this key.
If you have any suggestions on how to design this key for optimal performance and maintenance, I'm all ears. The way I see it, there is no way to avoid running the scan at least once. But this is my first redis database attempt so I apologize in advance for remedial questions / noob mistakes.
EDIT 1
Assuming that I have enough memory and assuming that I only have to save one field / value per key, let's say I create my keys like this:
SET routing_rules:objectABC_09:00_12:00_m X
SET routing_rules:objectABC_09:00_12:00_t X
SET routing_rules:objectABC_13:00_13:30_t Y
And now a request comes in for object ABC and it's Monday at UTC 11. Because my keys represent start times and end times (aka a range ), I don't see how I can find the right key / value pair without doing a scan.
Am I missing something?
I would go with a sorted set solution: one set per object, where the member is the location*, and the score is the minute of the week at which the rule expires.
e.g. (week starts at Monday 00:00)
on Mondays and Tuesdays between 09:00 GMT and 12:00 GMT, I need route
object ABC to "location x".
Monday 12:00 => 720
Tuesday 12:00 => 2160
ZADD ABC_rules 720 x 2160 x
There are two issues here. First, your example includes times for which there are no rules, so this must be taken into account. Second, and more importantly, sorted set members must be unique, so x cannot be stored twice. Both together are the reason for the * above; the way to solve this is to append to the value the minute of the week at which the rule starts:
Monday 9:00 => 540
Tuesday 9:00 => 1980
ZADD ABC_rules 720 x:540 2160 x:1980
To query, all you need is ZRANGEBYSCORE with the current minute of the week, and then a check that the start time appended to the location is not after the minute you queried with.
Query for Monday 10:00 (600):
ZRANGEBYSCORE ABC_rules 600 +inf LIMIT 0 1
The result will be x:540, and since 540 is lower than 600, you know x is a valid answer.
Query for Monday 13:00 (780):
ZRANGEBYSCORE ABC_rules 780 +inf LIMIT 0 1
The result will be x:1980, and since 1980 is larger than your query (780), this result is invalid and you should fall back to your default routing (or whatever your solution is for the unmapped times in your schedule).
To delete a rule, you must remove the member with the location and its appended start time:
ZREM ABC_rules x:540
You can also use ZRANGEBYSCORE to get all rules that apply on a specific day, and you can write a Lua script that clears them.
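The minute-of-week arithmetic and the validity check above can be sketched in plain Node.js. This is a minimal sketch with hypothetical helper names; the actual Redis client call is omitted so the logic stands alone, and `member` stands for the first element returned by `ZRANGEBYSCORE ABC_rules <now> +inf LIMIT 0 1`:

```javascript
// Minute of the week for a given day (0 = Monday, per the convention above).
function minuteOfWeek(day, hour, minute) {
  return day * 24 * 60 + hour * 60 + minute;
}

// Decide whether the returned member actually covers the current minute.
// Members look like "x:540" (location, then the rule's start minute).
function resolveRoute(member, nowMinute) {
  if (!member) return null;                    // no rule ends at or after now
  const idx = member.lastIndexOf(':');
  const location = member.slice(0, idx);
  const start = Number(member.slice(idx + 1));
  return start <= nowMinute ? location : null; // has the rule started yet?
}
```

For example, Monday 10:00 is minute 600, so `resolveRoute('x:540', 600)` yields `'x'`, while Monday 13:00 (780) against `'x:1980'` yields `null` and falls through to the default routing.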
I would not use the SCAN command in this case (or in most cases). You would probably have to invoke it multiple times to scan the whole keyspace, while there are alternatives providing direct access to the data you are looking for, which is what makes a K/V store performant.
For example, with your first solution, put all the values in a single hash and get all routes in one request with HGETALL. Then you will have to iterate over the values in your application to select the right one.
Another solution, which does not require any iteration on the application side, is to create one key per day and per hour range:
SET routing_rules:objectABC_09:00_12:00_m X
SET routing_rules:objectABC_13:00_13:30_t Y
...
Then in one GET request you have the value you are looking for.
Adding a day then just requires a SET.
The drawback compared to your solution is memory usage: it multiplies the number of entries. You didn't give any clue about the number of entries, but if it's very high, it could be a problem. To reduce the memory required, you can start by using shorter key names, like r:objectABC_09001200m instead of routing_rules:objectABC_09:00_12:00_m.
Update
Given that the time ranges do not seem to be constant, and assuming there is no algorithm to deduce the time range from the current time, the first solution, based on a hash, seems better than the second, based on GET/SET. But I would name the fields according to the time ranges:
HSET routing_rules:objectABC 09:00_12:00 X
HSET routing_rules:objectABC 12:00_12:30 Y
Then I would get all fields for a given object using HGETALL routing_rules:objectABC and iterate over the member keys to find the right one.
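The iteration over the hash fields might look like the following sketch, where `fields` stands for the object a Node.js Redis client would return from HGETALL, and the field names follow the `HH:MM_HH:MM` convention above (zero-padded, so plain string comparison orders them correctly):

```javascript
// fields: HGETALL result, e.g. { '09:00_12:00': 'X', '12:00_12:30': 'Y' }
// now: current time as a zero-padded "HH:MM" string
function findLocation(fields, now) {
  for (const [range, location] of Object.entries(fields)) {
    const [start, end] = range.split('_');
    if (start <= now && now < end) return location; // now falls in this range
  }
  return null; // no rule covers this time
}
```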
Related
I have a column that needs to be split into "morning" and "evening" groups, although the morning and evening times move every day (it's based on the sidereal day). Calling them morning and evening is a little deceiving, though, because eventually the time will creep into the next day, and I want to keep the groups distinct. It is more accurate to call them group 1 and group 2. It just so happens that they are around 12 hours apart, so it looks like you can separate them based on time of day, but once the later group creeps into the AM hours, it would start to get counted as "morning", and the earlier group would roll into the afternoon and be counted as "afternoon". See the screenshot below for example data.
I need them split so I can perform operations on the value column and distinguish the values in the first group from the values in the second group. I thought of doing some sort of flip-flop algorithm based on the previous cell, but there may be a more elegant way to do it. Also, it's not shown in the example data, but sometimes a day may be skipped, while the times more or less continue in the same pattern of increasing by 3-5 minutes each day.
A date with a timestamp is stored as a number in Excel: days are stored as whole numbers, and time is stored as a decimal. So, disregarding the date part, look at the decimal part of the number and determine whether it is before or after the time you want.
0.5, for example, is midday, or 12 noon. So if the decimal part of A1 is less than 0.5, the timestamp is in the morning:
=if(A1-int(A1)<0.5,"before noon","after noon")
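The same before/after-noon test can be sketched outside Excel; this assumes a serial value in Excel's convention, where the integer part is the day and the decimal part is the time of day:

```javascript
// serial: Excel-style date-time number; e.g. 45321.25 is 06:00 on day 45321.
// The decimal part alone encodes the time of day, so compare it to 0.5 (noon).
function beforeNoon(serial) {
  return serial - Math.floor(serial) < 0.5;
}
```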
It is not clear from your question how sidereal relates to the data in your sample.
I have data on customer care executives which shows the calls they have attended, from one time to another, continuously. I need to find out whether a particular executive is busy or free in a particular period. Office hours are 10:00 to 17:00, so I have sliced that span into one-hour slices from 10:00 to 17:00.
The data that I have looks like this:
Note:
The data given here is part of the original data; we have 20 executives, and each has 5 to 10 rows of data. For simplification we have used 3 executives with fewer than 5 rows each.
The start timings do not follow any ascending or descending order.
Please suggest formulas without any sorting or filtering of the given data.
Required: the result table should show whether a particular executive is busy or free in every hour. If he is on a call for even one minute, it should show busy for that entire one-hour period.
The result should look like:
The same file is attached here:
Thanks In Advance!!!
You need to put an extra logical test in your OR function that tests for start times less than the interval start and end times greater than the interval end. So in cell G31 your formula should read:
=IF(OR(COUNTIFS($A$3:$A$14,A31,$C$3:$C$14,">0",$D$3:$D$14,">=14:00",$D$3:$D$14,"<15:00"),COUNTIFS($A$3:$A$14,A31,C$3:$C$14,">0",$E$3:$E$14,">=14:00",$E$3:$E$14,"<15:00"),COUNTIFS($A$3:$A$14,A31,C$3:$C$14,">0",$D$3:$D$14,"<14:00",$E$3:$E$14,">=15:00")),"Busy","Free")
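Taken together, the three COUNTIFS branches (call starts inside the slice, call ends inside the slice, call spans the slice) implement the standard interval-overlap test. A minimal sketch of the same idea, with times as minutes since midnight (so the 14:00-15:00 slice is 840-900); note the exact boundary conventions at slice edges differ slightly from the formula above:

```javascript
// A call [callStart, callEnd) keeps an executive busy during the slice
// [sliceStart, sliceEnd) if the two intervals overlap at all.
function isBusy(callStart, callEnd, sliceStart, sliceEnd) {
  return callStart < sliceEnd && callEnd > sliceStart;
}
```

Even a one-minute overlap (e.g. a call from 13:40 to 14:10 against the 14:00-15:00 slice) marks the whole slice busy, which matches the requirement.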
In assessing how many agents can be added to certain times of day without exceeding the number of seats in the call center, I'm trying to discern how many agents are scheduled for each half hour interval on each day of the week.
Using the =SUMPRODUCT(((A$2:A$1000<=D2)+(B$2:B$1000>D2)+(A$2:A$1000>B$2:B$1000)=2)+0) formula I've been able to identify how many total agents work for each interval, however this doesn't take the day of week into account.
I currently have my spreadsheet setup this way:
K is the start time of the shift, L is the end time of the shift, M to S pull data from another sheet showing a 1 if the agent works on that day of the week and 0 if they do not, and then U has all the time intervals listed out. In the example it's cut off, but the columns continue down as needed; U goes to 49, and I've just been using a range from 2 to 500 for the others, as we currently do not have that many shifts and I'm leaving space for the moment.
After some Googling, I tried =SUMPRODUCT(--(M2:M500="1"),(((K$2:K$1000<=U2)+(L$2:L$1000>U2)+(K$2:K$1000>L$2:L$1000)=2)+0)) but it only returns #VALUE! so I'm not sure what I'm doing wrong.
Any suggestions of how I can make this work? Please let me know if more information would be useful. Thanks.
=sumproduct(($K$2:$K$1000<=U2)*($L$2:$L$1000>=U2))
That will count the number of occurrences where the start time is less than or equal to the time in U2 AND the end time is greater than or equal to U2. It checks times from row 2 to row 1000. Whenever a condition is true, the comparison results in TRUE, and FALSE when it is not. The * acts like an AND condition and at the same time converts TRUE and FALSE values to 1 and 0, so both conditions have to be true for a 1 to result. SUMPRODUCT then totals up all the 1s and 0s to get you a count.
In order to consider the days of the week, you will need one thing to be true: your headers in M1:S1 need to be unique (which I believe they are). You will need either to repeat them in adjacent columns, or to have a single cell (say V1) that can be changed to the header of the day of the week you are interested in. I would assume the former, so you can see each day at the same time.
In order to do this, you want to add more conditions and pay attention to your reference locks ($).
In V2 you could use the following formula and copy down and to the right as needed:
=sumproduct(($K$2:$K$1000<=$U2)*($L$2:$L$1000>=$U2)*(M$2:M$1000))
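Element by element, that SUMPRODUCT is just a filtered count, which can be sketched as a plain loop (the shift objects and the `days` flags, mirroring the 1/0 values in columns M:S, are illustrative). As an aside, the #VALUE! in the attempted formula comes from two things: the M2:M500 range is a different size from the K/L ranges (SUMPRODUCT needs equal-sized arrays), and `="1"` compares against the text "1" rather than the number 1.

```javascript
// shifts: [{ start, end, days: { mon: 1, tue: 0, ... } }]
// Times are day fractions (Excel style). Count agents whose shift
// covers time t on the given day of the week.
function agentsAt(shifts, t, dayName) {
  let count = 0;
  for (const s of shifts) {
    if (s.start <= t && s.end >= t && s.days[dayName] === 1) count++;
  }
  return count;
}
```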
UPDATE #1
ISSUE 1: Time ranges ending the following day (after midnight)
It is based on the assumption that the ending time is later than the start time. When a shift starts at 22:00 and ends at 6:30, for example, our mind says that 06:30 is later than 22:00 because it is the following day. Excel, however, has no clue that 06:30 is the following day and, without extra information, assumes it to be the same day. In Excel, the date is stored as an integer and the time as a decimal; when you only supply the time during entry, it uses 0 for the date.
In Excel, the date is the number of days since 1900/01/00 (so day 1 is 1900-01-01). So one way to deal with your time out is to add a day to it; this way Excel will know your out time is after your in time when the hour is actually earlier in the day.
See your sample data below.
Using your sample data, I did a straight copy of the value in L and placed it in M (=L3, copied down). I then changed the cell display format from time to general. This lets you see how Excel sees the time; note how all the times are less than 1.
In column N I added 1 to the value when the out time was less than the in time, to indicate that it was the following day and that we had not actually invented time travel. I also used the trick that a math operation converts a TRUE/FALSE result to 1 or 0, to shorten the equation. I used =M3+(L3<K3), copied down. You will notice in the green bands that the values are now greater than 1.
In the next column I just copied the values from N over using =N3, copied down, and then applied a time-only display format to the cells. Because it is only a time format, the date is ignored, and visually your times in column O look the same as in column L. The big difference is that Excel now knows it is the following day.
You can quickly fix your out times by using the following formula in a blank column and then copying the results and pasting them back over the source column using paste special values:
=M2+(L2<K2)
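The same adjustment in code form, with times as day fractions to match Excel's storage:

```javascript
// If the out time is "earlier" than the in time, the shift actually ends
// the next day, so add one whole day (1.0) to the out serial.
function normalizeOut(inTime, outTime) {
  return outTime + (outTime < inTime ? 1 : 0);
}
```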
The next part is for your time check. When looking at the 12:00 slot you need to look at 12:00 of the current day, in case a shift started at 12:00, and you also need to look at the 12:00 slot of the following day. In order to do that, we need to modify the original formula as follows:
=sumproduct(($K$2:$K$1000<=$U2)*($L$2:$L$1000>=$U2)*(M$2:M$1000)+($K$2:$K$1000<=$U2+1)*($L$2:$L$1000>=$U2+1)*(M$2:M$1000))
Note the + between the two products: it acts like an OR function.
ISSUE 2: Time in/out in 15-minute increments, ranges in 30-minute increments
You may be able to achieve this with the ROUNDDOWN function or MROUND. I would combine this with the TIME function. Basically you want any quarter-hour start time to be treated as 15 minutes sooner.
=ROUNDDOWN(E3/TIME(0,30,0),0)*TIME(0,30,0)
Where E3 is your time to be converted
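The ROUNDDOWN trick translates directly to code, again with time as a fraction of a day (so 30 minutes is 1/48, the same value TIME(0,30,0) produces):

```javascript
const HALF_HOUR = 1 / 48; // TIME(0,30,0) as a day fraction

// Snap a time down to the previous half-hour boundary, mirroring
// ROUNDDOWN(t / TIME(0,30,0), 0) * TIME(0,30,0).
function floorToHalfHour(t) {
  return Math.floor(t / HALF_HOUR) * HALF_HOUR;
}
```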
So your formula may wind up looking something like:
=sumproduct((ROUNDDOWN($K$2:$K$1000/TIME(0,30,0),0)*TIME(0,30,0)<=$U2)*($L$2:$L$1000>=$U2)*(M$2:M$1000)+(ROUNDDOWN($K$2:$K$1000/TIME(0,30,0),0)*TIME(0,30,0)<=$U2+1)*($L$2:$L$1000>=$U2+1)*(M$2:M$1000))
A similar option could be used for the leaving time, rounding it up to the next 30-minute interval; in that case just use the ROUNDUP function, though I am not sure it is required.
I'm trying to get 2 date limits from date restrict to calculate the difference between them.
So, for example, I get January 1 and January 7, and subtract the second result from the first.
Theoretically I should end up with the amount of new references to the given searched words in the week, right?
The thing is, sometimes I get a negative result for this, like -60000. How come 60000 references could disappear from one week to another?
Am I misinterpreting the way dateRestrict works?
My URL so far looks like:
first query:
https://www.googleapis.com/customsearch/v1?q=MYSEARCH&dateRestrict=d209&key=MYKEY&cx=MYID
second query:
https://www.googleapis.com/customsearch/v1?q=MYSEARCH&dateRestrict=d203&key=MYKEY&cx=MYID
The field of the response I'm using is the totalResults.
We are using Couchbase as our NoSQL store and loving it for its capabilities.
There is, however, an issue that we are running into with creating associations
via view collation. This can be thought of as akin to a join operation.
While our data sets are confidential I am illustrating the problem with this model.
The volume of data is considerable, so it cannot be processed in memory. Let's say we have data on ice creams, zip codes, and the average temperature of the day.
One type of document contains a zipcode to icecream mapping
and the other one has transaction data of an ice-cream being sold in a particular zip.
The problem is to be able to determine a set of top ice-creams sold by the temperature of a given day.
We crunch this corpus with a view to emit two outputs: one is a zipcode-to-temperature mapping, while the other
represents an ice cream sale in a zip code:
Key Value
[zip1] temp1
[zip1,ice_cream1] 1
[zip2,ice_cream2] 1
The view collation here is a mechanism to create an association between the ice_cream sale, the zip and the average temperature ie a join.
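A map function producing both key shapes might look like the following hedged sketch; the document field names (`type`, `zip`, `temp`, `icecream`) are assumptions, since the real schema is confidential:

```javascript
// Couchbase view map function: one emit per document, keyed so that
// temperature docs and sale docs collate next to each other by zip.
function map(doc, meta) {
  if (doc.type === 'temperature') {
    emit([doc.zip], doc.temp);          // [zip1] -> temp1
  } else if (doc.type === 'sale') {
    emit([doc.zip, doc.icecream], 1);   // [zip1, ice_cream1] -> 1
  }
}
```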
We have a constraint that the temperature lookup happens only once in 24 hours, when the zip is first seen, and that is the valid
average temperature to use for that day. E.g., if a lookup happened at 12:00 pm on Jan 1st, the next lookup does not happen until 12:00 pm on Jan 2nd.
However, the average temperature accepted in the first lookup is valid only for Jan 1st, and that of the second lookup only for Jan 2nd,
including the first half of that day.
Now things get complicated when I want to do the same query with a time component involved: concretely, associating the average temperature of a
day with the ice creams that were sold on that day in that zip, e.g. x vanilla ice creams were sold when the average temperature for the day was 70 F.
Key Value
[y,m,d,zip1] temp1
[y,m,d,zip2,ice_cream2 ] 1
[y,m,d2,zip1,ice_cream1] 1
This has an interesting impact on the queries. Say I query for the last 1 day: I cannot make any associations between the ice cream and the temperature before the
first lookup happens, since that is when the two keys align. The net effect is that I lose the ice cream counts for the part of the day before that temperature lookup
happens. I was wondering if any of you have faced similar issues, and whether you are aware of a pattern or solution so as not to lose those counts.
First, welcome to StackOverflow, and thank you for the great question.
I understand the specific issue that you are having, but what I don't understand is the scale of your data - so please forgive me if I appear to be leading down the wrong path with what I am about to suggest. We can work back and forth on this answer depending on how it might suit your specific needs.
First, you have discovered that CB does not support joins in its queries. I am going to suggest that this is not really an issue when CB is used properly. The conceptual model for how Couchbase should be used to filter data is as follows:
Create CB view to be as precise as possible
Select records as precisely as possible from CB using the view
Fine-filter records as necessary in the data-access layer (and perform any joins there) before sending them on to the rest of the application.
From your description, it sounds to me as though you are trying to be too clever with your CB view query. I would suggest one of two courses of action:
Manually look up the value that you want, when this happens, with a second view query.
Look up more records than you need, then fine-filter afterward (step 3 above).
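Step 3 of the model above, fine-filtering the over-selected rows in the data-access layer, might look like this sketch (illustrative names; `rows` stands for the view output in the question, with a one-element key for temperature rows and a two-element key for sale rows):

```javascript
// Join view rows in application code: collect the temperature per zip,
// then attach it to each sale row that collates under the same zip.
function joinTempToSales(rows) {
  const temps = new Map();
  const sales = [];
  for (const { key, value } of rows) {
    if (key.length === 1) temps.set(key[0], value);               // [zip] -> temp
    else sales.push({ zip: key[0], icecream: key[1], count: value });
  }
  // Sales in a zip with no temperature lookup yet get temp: null,
  // which is exactly the "lost counts" case the question describes.
  return sales.map(s => ({ ...s, temp: temps.get(s.zip) ?? null }));
}
```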