Amazon CloudSearch - get places by time and date

I am using Amazon CloudSearch to store a large set of places.
Each place has an opening time and a closing time for each day of the week.
I need to retrieve the places that are open at the current time. How would you suggest modeling the index?
I am thinking of solving the problem by creating 7 text indexes, one for each day of the week, in which I specify the hours when the place is open.
For example, if a place is open from 9:00 to 13:00, in the index "monday" I will write the string "9-10-11-12". Then filtering by bq=monday:'10' or bq=monday:'16' will return only the places that are open at the specified time.
Any other ideas? My solution seems to work, but would you suggest another approach?
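As a minimal sketch of the question's approach, the per-day strings can be generated from whole-hour opening and closing times. The helper names `hour_tokens` and `day_fields` are hypothetical; the sketch assumes the closing hour is exclusive (open 9:00 to 13:00 means hours 9 through 12) and that no place stays open past midnight. It reuses the "9-10-11-12" format the question reports as working with bq=monday:'10'.

```python
def hour_tokens(open_hour, close_hour):
    """Hours during which a place is open, joined as in the question ("9-10-11-12").

    close_hour is exclusive: a place open 9:00-13:00 is not open during hour 13.
    Assumes whole hours and no opening past midnight.
    """
    return "-".join(str(h) for h in range(open_hour, close_hour))


def day_fields(schedule):
    """Build one text field per weekday from a {day: (open_hour, close_hour)} dict."""
    return {day: hour_tokens(o, c) for day, (o, c) in schedule.items()}


fields = day_fields({"monday": (9, 13), "tuesday": (14, 18)})
print(fields)  # {'monday': '9-10-11-12', 'tuesday': '14-15-16-17'}
```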

First, I wouldn't use multiple indexes.
You could use your approach, but just make the time in hours from the start of the week. So, Monday would be 0-23, Tuesday 24-47, etc. Or you could just have 7 fields, "monday_hours", "tuesday_hours", …
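A sketch of the hour-of-week variant, where a single field holds every hour (0 to 167) during which the place is open and Monday 00:00 is hour 0. `DAYS`, `hour_of_week` and `week_hour_terms` are illustrative names, and the same whole-hour, no-past-midnight assumptions apply. A query for Tuesday at 10:00 then becomes a single term, 24 + 10 = 34.

```python
DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def hour_of_week(day, hour):
    """Map (day, hour) to 0..167, with Monday 00:00 as 0 and Tuesday 00:00 as 24."""
    return DAYS.index(day) * 24 + hour


def week_hour_terms(schedule):
    """All hour-of-week values during which a place is open (close hour exclusive)."""
    terms = []
    for day, (open_hour, close_hour) in schedule.items():
        terms.extend(hour_of_week(day, h) for h in range(open_hour, close_hour))
    return sorted(terms)


print(week_hour_terms({"monday": (9, 13), "tuesday": (9, 10)}))  # [9, 10, 11, 12, 33]
```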
You could also use uints instead of strings. Not necessarily better, just different; it might be worth benchmarking.
With uints you can use range queries. For example, if the document contains the fields "open" and "close" and you want to know whether the place is open between 10 and 12:
&bq=(and open:..12 close:10..)
One remaining issue is that CloudSearch's range searches are inclusive of their endpoints. So I think this query will return a false positive for a store that opens at twelve (or closes at ten): technically the ranges overlap, but not usefully. To fix that, I'd do two things. First, I wouldn't go by hours; I'd use the minute of the day as the value in the field (0 to 1439). Then I'd add one to the start of the time window (used in the close: range) and subtract one from its end (used in the open: range).
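A minimal sketch of the minute-of-day fix, assuming uint fields named open and close (as in the answer) that hold minutes since midnight. `minute_of_day`, `open_between_query` and `matches` are hypothetical helpers: the first two build the bq string with the one-minute adjustment, and `matches` evaluates the same inclusive-range condition locally so the edge cases can be checked.

```python
def minute_of_day(hour, minute=0):
    """Minutes since midnight, 0..1439."""
    return hour * 60 + minute


def open_between_query(start, end):
    """bq expression for places open at some point in the window [start, end].

    CloudSearch ranges include their endpoints, so the window is shrunk by one
    minute on each side: a place opening exactly at `end`, or closing exactly
    at `start`, should not match.
    """
    return f"(and open:..{end - 1} close:{start + 1}..)"


def matches(open_min, close_min, start, end):
    """Local check mirroring the inclusive ranges used in open_between_query."""
    return open_min <= end - 1 and close_min >= start + 1


print(open_between_query(minute_of_day(10), minute_of_day(12)))
# (and open:..719 close:601..)
```

Without the adjustment, a store open from 12:00 to 15:00 would satisfy open:..720 and be reported as open between 10:00 and 12:00.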
Using uints will perform differently from using text fields. I'd definitely benchmark them to see which one works better for you.

Related

How to identify the smallest number from only specific rows of a table

I am not familiar with Excel, as you can probably guess from my question, so I am sorry if it's a silly question, but I have been googling for a long time and I can't do it.
I managed to do it in Excel 365 with the FILTER function, but I can't in Excel 2019 (which I am required to use).
I want to identify the smallest number for a specific combination of cells using two tables.
Table1 has the names of people and places, as well as a number (the number is the time, in months, between the last time a person went to a place and today).
(In this project the inspector cannot go to the same place twice unless 4 months have passed, which is why I want the smallest number. Using the date of the last visit and the NOW function, I get the number of months that have passed.)
Table2 has only the name of one of these people, but has the names of all places. I want to get the smallest number for every place.
This is my Table1 (I hid other people's names so I can show a more compact example):
And this is my Table2:
I thought I could use the AGGREGATE function with an IF inside it to get only the values I want.
It did not work, though. I had misunderstood the fact that IF only gives me TRUE or FALSE, but I thought AGGREGATE could still work. It did not either:
=AGGREGATE(5;3;A2&B2=Table1[#Place]&Table1[#name];1)
Overall, my question can be summarized as: which function should I use?
Note: In Excel 365 I used CONCAT to build a combined key and thus only used one cell, but I don't see why it wouldn't work if I just selected two cells instead of one (the CONCAT cell).
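For what it's worth, Excel 2019 has MINIFS, which returns the minimum of a range restricted by one or more criteria, e.g. =MINIFS(Table1[Months];Table1[Place];A2;Table1[Name];B2) with the question's semicolon separators. The column names Months, Place and Name are guesses, because the table screenshots are not included. The Python sketch below only illustrates the same conditional-minimum logic on made-up rows; `min_months` and `min_months_per_place` are hypothetical names.

```python
def min_months(rows, person, place):
    """Smallest months-since-last-visit for one person at one place, or None."""
    values = [r["months"] for r in rows if r["name"] == person and r["place"] == place]
    return min(values) if values else None


def min_months_per_place(rows, person, places):
    """What the Table2 column would hold: one minimum per place for one person."""
    return {p: min_months(rows, person, p) for p in places}


rows = [
    {"name": "Ana", "place": "Depot", "months": 7},
    {"name": "Ana", "place": "Depot", "months": 2},
    {"name": "Ana", "place": "Port", "months": 5},
    {"name": "Bob", "place": "Depot", "months": 1},
]
print(min_months_per_place(rows, "Ana", ["Depot", "Port", "Mill"]))
# {'Depot': 2, 'Port': 5, 'Mill': None}
```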

Criteria cutoffs for INDEX

I'm not even sure how to ask this so please excuse the roundabout manner forthcoming.
I have a list of tasks and would like to use =INDEX to create my array. However, there are several different versions of the task code that could show up, and I would like to cover all of them (there are only 4 variants).
The name of the range is TaskCode. I want to return the first seven digits, the period, and then only the digits directly after the period. So in case 1 I would want 0527011.3, in case 2 I would want 0527011.01, in case 3 I would want 0527011.23, and in case 4 I'd want 0527011.3.
I initially used =LEFT(TaskCode,10), but that obviously won't work in case 1 or 4. Basically, I need to cut off at EITHER the second period OR the first blank.
Thanks
=LEFT(A2,FIND("|",SUBSTITUTE(A2&".",".","|",2))-1)
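How the formula works: A2&"." guarantees there is always a second period, SUBSTITUTE(...,".","|",2) replaces only the second period with "|", FIND locates it, and LEFT keeps everything before it. It does not cut at a blank, so a code like "0527011.3 ABC" with no second period would come back unchanged. An untested variant that takes whichever comes first would be =LEFT(A2,MIN(FIND("|",SUBSTITUTE(A2&".",".","|",2)),FIND(" ",A2&" "))-1). The Python sketch below states the intended rule explicitly; the sample codes are made up, since the four original cases are not shown.

```python
def cut_code(code):
    """Keep everything before the second '.' or the first ' ', whichever comes first."""
    cut = len(code)
    first_dot = code.find(".")
    if first_dot != -1:
        second_dot = code.find(".", first_dot + 1)
        if second_dot != -1:
            cut = second_dot
    blank = code.find(" ")
    if blank != -1:
        cut = min(cut, blank)
    return code[:cut]


for sample in ["0527011.3.1", "0527011.01", "0527011.23.4 x", "0527011.3 ABC"]:
    print(cut_code(sample))
# 0527011.3
# 0527011.01
# 0527011.23
# 0527011.3
```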

How to optimize COUNTIFS with very large data

I would like to create a report that looks like the picture below.
My data has around 500,000 cells (and it will keep growing).
Right now I'm using Excel's COUNTIFS function, but it takes a very long time to calculate (I cannot turn off automatic calculation).
The main value is a date, and the dates span about 3 years, so I need a lot of formulas to cover the whole range.
The picture below shows the data source: the top part cannot be changed, while the bottom part is one I created myself (and can change). I use WEEKNUM to convert dates to week numbers.
Are there better formulas or other ways to make this file faster? All kinds of suggestions are welcome!
I was thinking about using a pivot table, but I don't know how to make a pivot table from this kind of data source.
P.S. VBA is a last resort.
You can download example file here: https://www.mediafire.com/?t21s8ngn9mlme2d
I'll post this answer with the disclaimer that the results depend entirely on the size of the data set. Turning automatic calculation off and on is the best approach, but your question rules that out, so keep reading.
Your question made me curious, so I gave it a try and timed it. I set up two columns of over 100,000 random numbers between 1 and 1000 and then did a COUNTIF on the two columns where they were equal. I wrote a macro that turns off automatic calculation, inserts the start time, calculates, and then inserts the finish time. I highlighted the time difference in yellow.
First I tried your way, COUNTIFS with two criteria:
Then I combined (concatenated) the two columns to see if it got faster with only one COUNTIF criterion and one range. It didn't; see the result below:
Finally, realizing what was going on, I made the criterion match only the FIRST character of the number being looked for, which reduces the number of characters to check per cell. This had a positive result. See below:
Therefore my suggestion is to limit the length of the strings you are comparing in any way possible. You are mostly looking at dates, so you might have to get creative, but this seems to be the best option short of switching to manual calculation.
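A likely reason the sheet is slow: each COUNTIFS cell scans the whole data range, so the work grows roughly with (number of formula cells) × (number of data rows). A pivot table, or any single pass that builds all counts at once, reads each row only once. The Python sketch below shows that single pass, assuming each row is a (date, category) pair, since the real columns are only visible in the missing screenshots. It uses ISO week numbers, which can differ from Excel's default WEEKNUM near the start of a year.

```python
from collections import Counter
from datetime import date


def weekly_counts(rows):
    """Count rows per (ISO year, ISO week, category) in one pass over the data."""
    counts = Counter()
    for day, category in rows:
        year, week, _ = day.isocalendar()
        counts[(year, week, category)] += 1
    return counts


rows = [
    (date(2020, 1, 6), "A"),
    (date(2020, 1, 7), "A"),
    (date(2020, 1, 7), "B"),
    (date(2020, 1, 13), "A"),
]
counts = weekly_counts(rows)
print(counts[(2020, 2, "A")])  # 2
```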
I have worked with Excel sheets of a similar size. Especially if you are using the data on a regular basis, I would heartily recommend switching to a proper database (SQL-based, Access, or whatever fits your purpose). It does wonders for speed, and you won't run into Excel's size limits. :-)
You can import the data you have now fairly easily.
I am happy as a clam with my PostgreSQL db.
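To give an idea of what the database route looks like, here is a small sketch using Python's built-in sqlite3 module as a stand-in for PostgreSQL or Access. The table name events and its columns day and category are hypothetical. One GROUP BY query produces every weekly count at once, which is what the grid of COUNTIFS formulas computes cell by cell. SQLite's %W week number (Monday-based, 00 to 53) is not identical to Excel's WEEKNUM.

```python
import sqlite3


def weekly_counts_sql(rows):
    """rows: iterable of (ISO date string, category). Returns (week, category, count)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (day TEXT, category TEXT)")
    con.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = con.execute(
        "SELECT strftime('%Y-%W', day) AS week, category, COUNT(*) "
        "FROM events GROUP BY week, category ORDER BY week, category"
    ).fetchall()
    con.close()
    return result


rows = [("2020-01-06", "A"), ("2020-01-07", "A"), ("2020-01-07", "B"), ("2020-01-13", "A")]
for week, category, n in weekly_counts_sql(rows):
    print(week, category, n)
```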

Fast repeated row counting in vast data - what format?

My Node.js app needs to index several gigabytes of timestamped CSV data in such a way that it can quickly get the row count for any combination of values, either for each minute in a day (1440 queries) or for each hour in a couple of months (also 1440), ideally within half a second.
The column values will not be read, only the row counts per interval for a given permutation. Reducing time to whole minutes is OK. There are rather few possible values per column, between 2 and 10, and some depend on other columns. It's fine to do preprocessing and store the counts in whatever format suitable for this single task - but what format would that be?
Storing actual values is probably a bad idea, with millions of rows and little variation.
It might be feasible to generate a short code for each combination and match with regex, but since these codes would have to be duplicated each minute, I'm not sure it's a good approach.
Or it could use an embedded database like SQLite, NeDB or TingoDB, but I'm not entirely convinced, since they don't have native enum-like types and might or might not be made for this kind of counting. But maybe it would work just fine?
This must be a common problem with an idiomatic solution, but I haven't figured out what it might be called. Knowing what to call this and how to think about it would be very helpful!
I'll answer with my own findings for now, but I'm still interested in learning more of the theory behind this problem.
NeDB was not a good solution here, as it saved my values as plain JSON under the hood, repeating key names for each row and adding unique IDs. It wasted lots of space and would surely have been too slow, if only because of disk I/O.
SQLite might be better at compressing and indexing data, but I have yet to try it. I'll update with my results if I do.
Instead I went with the other approach I mentioned: assign a unique letter to each column value we come across, giving a short string that represents a permutation. Then, for each minute, store these strings as keys only if they occur, with their number of occurrences as values. We can later use our dictionary to build a regex that matches any set of combinations and run it over this small index very quickly.
This was easy enough to implement, but it would of course have been trickier with more possible column values than the roughly 70 I found.
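The answer does not show its code, so here is one possible reading of the approach as a Python sketch (the app itself is Node.js). `ComboIndex` and the column names are hypothetical. Columns are encoded in a fixed order, so each position in a code belongs to one column, and a query becomes one regex character class per column ("." for columns left unconstrained). Codes use one character per distinct value starting at U+0100, which leaves room for far more than the roughly 70 values mentioned.

```python
import re
from collections import Counter, defaultdict


class ComboIndex:
    """Per-minute row counts, keyed by short codes for value combinations."""

    def __init__(self, columns):
        self.columns = columns                 # fixed column order = code positions
        self.codes = {}                        # (column, value) -> one character
        self.by_minute = defaultdict(Counter)  # minute -> Counter(code -> rows)

    def _char(self, column, value):
        key = (column, value)
        if key not in self.codes:
            self.codes[key] = chr(0x100 + len(self.codes))
        return self.codes[key]

    def add(self, minute, row):
        code = "".join(self._char(c, row[c]) for c in self.columns)
        self.by_minute[minute][code] += 1

    def pattern(self, wanted):
        """Regex for rows whose values are in wanted[column]; absent columns match anything."""
        parts = []
        for c in self.columns:
            if c in wanted:
                chars = [self.codes[(c, v)] for v in wanted[c] if (c, v) in self.codes]
                # "(?!)" never matches, so an unknown value yields a count of 0
                parts.append("[" + "".join(map(re.escape, chars)) + "]" if chars else "(?!)")
            else:
                parts.append(".")
        return re.compile("".join(parts))

    def count(self, minute, regex):
        codes = self.by_minute.get(minute, {})
        return sum(n for code, n in codes.items() if regex.fullmatch(code))


idx = ComboIndex(["color", "size"])
idx.add(0, {"color": "red", "size": "S"})
idx.add(0, {"color": "red", "size": "M"})
idx.add(0, {"color": "blue", "size": "S"})
print(idx.count(0, idx.pattern({"color": ["red"]})))  # 2
```

Answering 1440 queries is then 1440 calls to `count` with one precompiled pattern.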

Any solution to the Today Calculated Column problem in SharePoint?

I would like to be able to use today's date in a calculated column in a SharePoint list to, for example, determine whether a task is overdue. There is a well-documented trick that involves creating a dummy column named "Today," using it in a formula, and then deleting it, thereby "tricking" SharePoint into using the Today function.
The problem is that this method does not work reliably: the calculation is not dynamic. It is only made when the item is saved, and therefore the Today "column" effectively becomes the Modified date. (This is probably why SharePoint won't let you use the Today function in a straightforward way.)
Has anyone found a solution that works? I know I can use JavaScript to get the actual date on the client side and display colors, flags, whatever, but I am looking for a "server side" solution.
For reference, the Today column trick and its problems are described fairly well at these two posts and associated comments:
http://blogs.msdn.com/cjohnson/archive/2006/03/16/552314.aspx and http://pathtosharepoint.wordpress.com/2008/08/14/calculated-columns-the-useless-today-trick/
There simply isn't a workaround for this. Because the values for the list are stored in the database and returned "as is" to other features such as the search crawler, a dynamic field cannot be created.
It is possible to create a custom field that displays a value using today's date in its calculation.
In addition to Christophe's (PathToSharePoint) article, this also covers the Today trick and why it doesn't work:
The Truth about using Today in calculated columns
There are a number of fudges; probably the best one is Dessie's console app (mentioned above by MNM):
Dynamically updating a SharePoint calculated column containing a Today reference
It's good but not perfect; for example, you may have to worry about different time zones.
Before going down this route, you should ask yourself whether you really, really need to do this. For example:
If you want a countdown (days overdue/days left to complete a task), you can use SharePoint Designer (SPD) and an XSLT Data View web part.
If you want a view that shows overdue items, items created in the last X days, etc., you can use [Today] in a view's filter.
If you create a Today column, it needs to be updated. You can do that either with a timer job or by placing a jQuery script on a page that users visit. The script could call SPServices.SPUpdateMultipleListItems to do the update. Pass a CAML clause so that you only update the list items whose Today value needs updating, e.g. once per day.
My advice is to create your own field that does this calculation for you and then reference it in your SharePoint list. It's not a simple implementation, but it would work.
I have been looking for a solution too, still no luck. The Today column trick has the limitation of not being dynamic.
I do have one suggestion, though: why not create a timer job that updates a certain column with the current date every day at 12 AM? I know some of you might consider that overhead. Just my suggestion :D
I came up with a very rough but working solution to this problem that doesn't require any coding. I'll explain both how I made the Today column and how I worked it into an overdue column, because that column was also a pain to figure out.
First, I made a column named "today" (gasp!). Next I made a column named "Days Overdue". I then opened SharePoint Designer and created a new workflow. I set it to run every time an item is edited/updated (keep in mind I turned off versioning for this list; otherwise I would have had to resort to coding to avoid a bunch of useless data building up on our server). I set the actions to simply store the modified date in a workflow variable, then set the today column to that variable. Although the Modified column is a date/time and my today column is just a date, it transfers just fine. I then set the workflow to pause for 2 hours. You can obviously set this to whatever amount of time you want; it just changes the latest possible time your today column updates, i.e. 2 AM in my case.
On to the Days Overdue column. This is the formula for it:
=IF([Due Date]>Today,"None",IF([Date Closed]=0,Today-[Due Date],IF([Due Date]>[Date Closed],"None",IF(Today>=[Date Closed],[Date Closed]-[Due Date],IF([Due Date]<Today,Today-[Due Date])))))
This shows the number of days overdue, or "None" if the item is not overdue. You can use either a number or a string format, but NOT A DATE FORMAT. Well, I hope this helps anyone who is running into this problem and doesn't want to delve into coding.
EDIT: I forgot to mention that in the Days Overdue formula above, if today is past the date closed, I use the date closed minus the due date instead of today minus the due date, so that the count stops once an item has been closed. You probably would have noticed that in the formula, but I felt I should point it out just in case.
EDIT 2: The formula I had before my second edit didn't calculate the days overdue properly after an issue had been marked "closed", so I put in the updated formula. The last part of the formula doesn't make sense, as it repeats the logic from the beginning, but it worked, so I didn't want to take any chances! :)
Peace.
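The nested IF formula above is hard to read, so here is a plain Python transcription of each branch, in the same order, as a way to check what it returns. It is only a reading aid, not SharePoint code. Dates are Python `date` values, and an empty [Date Closed] (the formula's [Date Closed]=0 test) is modelled as `None`, which is an assumption. The transcription shows that the final IF is only reached when the closed date is still in the future.

```python
from datetime import date
from typing import Optional, Union


def days_overdue(due: date, today: date, closed: Optional[date]) -> Union[int, str, bool]:
    """Branch-by-branch mirror of the Days Overdue formula."""
    if due > today:
        return "None"                # not due yet
    if closed is None:               # [Date Closed]=0: still open
        return (today - due).days
    if due > closed:
        return "None"                # closed before it was due
    if today >= closed:
        return (closed - due).days   # stop counting once the item is closed
    # Only reached when closed is after today: the "repeated" final IF
    if due < today:
        return (today - due).days
    return False                     # Excel's IF with no else part yields FALSE


print(days_overdue(date(2020, 1, 1), date(2020, 1, 20), date(2020, 1, 5)))  # 4
```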
I've used the following and had no problems.
Field Name: Overdue
Field Type: Calculated
Data Type Returned: Yes/No
Formula:
=AND([Due Date]<NOW(),Status<>"Completed",[Due Date]<>"")
Here is a workaround:
Create a date column called Today.
Use this column in your calculated formula (ignore the fact that the formula returns a wrong value).
After you are done with the formula, delete the Today column from your list.
For some reason it works this way! Now SharePoint treats the Today in your formula as today's date.
Note: If you decide you want to change the formula, you have to create the Today column again. Otherwise, it wouldn't recognize Today as a valid column.
I tried Farzad's approach and it seems to work perfectly. I wanted a custom count of days elapsed, so I added a calculated column. Previously I had used the difference between the Created and Modified columns, which only updated whenever a user edited the item, much to my dismay.
I now have a formula that works the way I want and uses the Today column; here it is for anyone who would like to use it. I also have a Status column, which is checked first for "On Hold"; the remaining conditions are based on the difference Today - Created.
=IF(Status="On Hold","On Hold",IF(AND(Today=Created,(DATEDIF(Created,Today,"D")=0)),"New",IF(AND(Today<>Created,(DATEDIF(Created,Today,"D")=0)),"New (updated)",IF(DATEDIF(Created,Today,"d")>3,"Need Update Immediately",IF(DATEDIF(Created,Today,"d")=1,"One day old",IF(DATEDIF(Created,Today,"d")=2,"Two days old",""))))))
Basically, it's just a set of nested IF conditions that produce labels I can group my view by and use to filter data if needed. Hope this helps anyone looking for an answer!
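A Python transcription of the label formula above can help check its branches. It assumes Created and Today are datetimes and that DATEDIF(...,"d") counts whole calendar days between them, which is an approximation of Excel's behaviour. Read this way, an item exactly 3 days old gets an empty label, because the formula tests >3, =1 and =2 but not =3; that may or may not be intended.

```python
from datetime import datetime


def age_label(status: str, created: datetime, today: datetime) -> str:
    """Branch-by-branch mirror of the nested IF label formula."""
    if status == "On Hold":
        return "On Hold"
    days = (today.date() - created.date()).days  # assumed DATEDIF(Created,Today,"d")
    if today == created and days == 0:
        return "New"
    if today != created and days == 0:
        return "New (updated)"
    if days > 3:
        return "Need Update Immediately"
    if days == 1:
        return "One day old"
    if days == 2:
        return "Two days old"
    return ""  # exactly 3 days falls through to the empty label


created = datetime(2020, 1, 1, 9, 0)
print(repr(age_label("Open", created, datetime(2020, 1, 4))))  # ''
```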
