Availability tracking with Algolia - search

I am working on an Airbnb-like website and I am in the process of rewriting our in-house, SQL-based search system with Algolia. It's been a really pleasant journey so far, as I have managed to remove a lot of legacy code and outsource it, with awesome results. However, there is one critical piece of our search system which I am not sure can be implemented with Algolia.
Internally, we store the availability/unavailability (and price) of each date for each asset as a single row in the database. This means our availabilities table looks like this:
asset_id | date | status | price_cents
-------- | ---------- | ----------- | -----------
1 | 2017-02-09 | available | 15000
1 | 2017-02-10 | available | 15000
1 | 2017-02-11 | unavailable | NULL
1 | 2017-02-12 | available | 20000
When a user searches for available properties, they enter a date range and, optionally, a price range.
What we're doing now is simply querying the availabilities table and making sure that all dates in the date range are available for that asset (i.e. the count of available dates is equal to the number of days in the range). If the user enters a price range, we also make sure that the average price for those dates is within the requested range. The SQL query is fairly complex, but this is what it does at the end of the day.
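For reference, the check described above can be sketched in Python on an in-memory copy of the sample rows. This is only an illustration of the logic, not the actual SQL query: the `AVAILABILITIES` dictionary, the `matches` function and the inclusive end date are assumptions made for the example.

```python
from datetime import date, timedelta

# Hypothetical in-memory copy of the availabilities table shown above,
# keyed by (asset_id, date) with (status, price_cents) as the value.
AVAILABILITIES = {
    (1, date(2017, 2, 9)): ("available", 15000),
    (1, date(2017, 2, 10)): ("available", 15000),
    (1, date(2017, 2, 11)): ("unavailable", None),
    (1, date(2017, 2, 12)): ("available", 20000),
}


def matches(asset_id, start, end, min_price=None, max_price=None):
    """Return True if every day in [start, end] is available and the
    average price lies inside the optional price range (assumed inclusive)."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    prices = []
    for day in days:
        status, price = AVAILABILITIES.get((asset_id, day), ("unavailable", None))
        if status != "available":
            return False  # one unavailable or missing day rules the asset out
        prices.append(price)
    average = sum(prices) / len(prices)
    if min_price is not None and average < min_price:
        return False
    if max_price is not None and average > max_price:
        return False
    return True
```

With the sample rows, a search from 2017-02-09 to 2017-02-10 matches asset 1, while a search that includes 2017-02-11 does not.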
I have been trying to replicate this with Algolia, but couldn't find any documentation about a similar feature. In fact, I am facing two separate issues right now:
I have no way to ensure all dates in the provided date range are available, because Algolia has little to no knowledge about associations, and
I have no way to calculate (and query) the average price for the provided date range, because it depends on user input (i.e. the date range).
Is there a way to achieve this with Algolia? If not, is it feasible to use SQL or another tool in combination with Algolia to achieve the desired result? Of course, I could do all of this with Elasticsearch, but Algolia is so fast and easy that I'd hate to step away from it because of these issues.

This use-case is definitely complex, and Algolia needs precomputed data in order to work.
Edit 2020 (better solution)
In each item, you could simply store the list of days where the location is available, e.g.
{
  name: "2 bedroom apartment",
  location: "Paris",
  availabilities: ['2020-04-27', '2020-04-28', '2020-04-30'],
  price_cents: 30000
}
You could then, at search time, generate the list of all the availabilities you require your items to have, e.g. (available from April 28th to April 30th):
index.search('', {
  filters: '' +
    'availabilities:2020-04-28 AND availabilities:2020-04-29 AND availabilities:2020-04-30 AND ' +
    'price_cents >= ' + lowPriceRange + ' AND price_cents <= ' + highPriceRange
})
In this example, the record wouldn't match as it lacks 2020-04-29.
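Generating that filter string from the user's date range is mechanical. Here is a minimal Python sketch of the idea; `build_availability_filter` and `record_has_all_days` are hypothetical helper names, the end date is assumed to be inclusive, and nothing here calls the Algolia API (the second helper only mimics the matching locally).

```python
from datetime import date, timedelta


def days_in_range(start, end):
    """List every day from start to end, both included."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def build_availability_filter(start, end, low_price, high_price):
    """Build a filter string with one availabilities clause per requested day."""
    clauses = [f"availabilities:{day.isoformat()}" for day in days_in_range(start, end)]
    clauses.append(f"price_cents >= {low_price}")
    clauses.append(f"price_cents <= {high_price}")
    return " AND ".join(clauses)


def record_has_all_days(record, start, end):
    """Local stand-in for the date part of the filter: every day must be listed."""
    available = set(record["availabilities"])
    return all(day.isoformat() in available for day in days_in_range(start, end))


record = {
    "name": "2 bedroom apartment",
    "location": "Paris",
    "availabilities": ["2020-04-27", "2020-04-28", "2020-04-30"],
    "price_cents": 30000,
}
```

For the record above, a request from April 28th to April 30th fails the local check because 2020-04-29 is missing, whereas a request for April 27th to April 28th passes.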
Another solution, which works more generically, but requires way more records:
I'm assuming there is a cap on how many days in advance you can book; I'll assume here that it's 90 days.
You could generate every date range possible inside those 90 days.
This would mean generating 90 + 89 + ... = 90 * 91 / 2 = 4095 date ranges.
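To double-check that count, here is a small Python sketch that enumerates every range inside a booking horizon. The function name `all_date_ranges` is made up for the example, and it assumes a range may start and end on the same day, which is what makes the total 90 + 89 + ... + 1.

```python
from datetime import date, timedelta


def all_date_ranges(first_day, horizon_days):
    """Return every (start, end) pair with start <= end inside the horizon."""
    days = [first_day + timedelta(days=i) for i in range(horizon_days)]
    return [(start, end) for i, start in enumerate(days) for end in days[i:]]


ranges = all_date_ranges(date(2017, 2, 9), 90)
range_count = len(ranges)  # 90 * 91 / 2
```

The first generated range is the single day 2017-02-09, and the list grows quadratically with the horizon, which is why the number of records becomes large so quickly.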
Then for each of those ranges, and each of the flats you're offering on your service, you could generate an object like this:
{
  name: "2 bedroom apartment",
  location: "Paris",
  availability_range: "2017-02-09 -> 2017-02-10",
  availability_start_timestamp: 10001000,
  availability_end_timestamp: 10002000,
  price_cents: 30000
}
With those objects, searching for a date range would be as easy as:
index.search('', {
  filters: '' +
    'availability_range:"' + startDate + ' -> ' + endDate + '" AND ' +
    'price_cents >= ' + lowPriceRange + ' AND price_cents <= ' + highPriceRange
})
You would only be indexing available time ranges, so this should greatly reduce the number of objects, but it would probably still be huge.
Finally, the timestamps in the object would be here to know which ones to delete when a booking is made.
The call would be something like:
index.deleteByQuery('', {
filters: 'availability_start_timestamp < ' + booking_end_timestamp + ' AND availability_end_timestamp > ' + booking_start_timestamp
})
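The filter in that call is the usual interval-overlap test: a stored range overlaps a booking when it starts before the booking ends and ends after the booking starts. A small Python sketch of the same condition follows; `overlaps` and `build_delete_filter` are hypothetical names and the timestamps are the placeholder values from the example object.

```python
def overlaps(range_start, range_end, booking_start, booking_end):
    """True when the stored range and the booking share any time."""
    return range_start < booking_end and range_end > booking_start


def build_delete_filter(booking_start, booking_end):
    """Build the filter string used to find the ranges a booking invalidates."""
    return (
        f"availability_start_timestamp < {booking_end} AND "
        f"availability_end_timestamp > {booking_start}"
    )
```

A range that ends exactly when the booking starts, or starts exactly when it ends, is not treated as overlapping because both comparisons are strict.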

Related

Excel: Spill out all matching rows

For an Excel document I am filtering the data to create a "view". I have several rows containing the following data:
| type | sender | duration | price |
In my view I want the following columns:
| sender | duration | price |
Type = data / call
Sender = phone number (several different)
Duration = time in seconds
Price = is total price for seconds
In the view I want the unique list of phone numbers where the type is data, together with the total duration and total price for each. The totals are calculated using SUMIFS.
I know I could do this by filtering by hand, but I want it done with a formula.
I already tried XLOOKUP, but it only returns a single result as a cell reference. XMATCH isn't the holy grail either.
I ended up using =UNIQUE(FILTER(...)), although I had hoped XLOOKUP or XMATCH would help with this.
If you have a better formula, please let me know!
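To make the intended result concrete, here is the same "view" sketched in Python: keep the rows of one type, list each sender once, and total the duration and price per sender (the role of UNIQUE, FILTER and SUMIFS). The sample rows, the phone numbers and the use of whole cents for the price are invented for the illustration.

```python
# Hypothetical sample rows: (type, sender, duration in seconds, price in cents).
ROWS = [
    ("data", "0611111111", 120, 50),
    ("call", "0611111111", 60, 25),
    ("data", "0622222222", 30, 10),
    ("data", "0611111111", 80, 40),
]


def build_view(rows, wanted_type="data"):
    """Return one (sender, total duration, total price) row per unique sender."""
    totals = {}  # dicts keep insertion order, so senders stay in first-seen order
    for row_type, sender, duration, price in rows:
        if row_type != wanted_type:
            continue
        running = totals.setdefault(sender, [0, 0])
        running[0] += duration
        running[1] += price
    return [(sender, duration, price) for sender, (duration, price) in totals.items()]


view = build_view(ROWS)
```

Here the call row is ignored and the two data rows for the first sender are merged into a single line of the view.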

PowerBI: How to calculate/convert Time into Percentage using DAX/Measure

In Excel I managed to get the desired % of time difference in column E; it was an easy job, I just changed the data type to Percentage. What we are calculating is the % of each of these time differences, one by one (the other columns are irrelevant).
The same approach doesn't work in PowerBI, where I am not able to calculate it properly: I always get "1" before the decimal comma, followed by the result. You can compare both tables/columns to see what I am talking about.
I am looking for a way (DAX/measure) to calculate it properly, whether as a decimal or directly as a percentage, as long as the % is the same as in the Excel column. Any ideas?
P.S. Left is Excel and right is PowerBI!
It seems Excel bases the percentage on 24 hours, so I used that in the calculation (24 hours = 24 * 3600 seconds).
I started by combining the date and the time in Power Query, because the data crosses midnight and the calculation still needs to be correct.
Go to the query editor, select both columns and combine them. Next, change the type to Date/Time. Result:
Save and close editor.
In Power Bi, add a column:
NextDate = LOOKUPVALUE(Explog[Date];Explog[Index];Explog[Index] + 1)
This is picking up the next Date based on Index + 1
Add another column TimeDiffSec, calculating the datediff in seconds:
TimeDiffSec = DATEDIFF(Explog[Date];Explog[NextDate];SECOND)
Last step is adding a column for percentage:
% of time difference =
var perc = Explog[TimeDiffSec]/ (24*3600)
return if(perc >= 1; perc - 1; perc)
End result:
Note: If you do not want to mix the systems (STYRAX - scrubber), you can use the following for the NextDate:
NextDate =
var nextIndex =
    CALCULATE(MIN(Explog[Index]);
        FILTER(Explog;Explog[Index] > EARLIER(Explog[Index]) && Explog[System] = EARLIER(Explog[System])))
return
    LOOKUPVALUE(Explog[Date];Explog[Index];nextIndex; Explog[System];Explog[System])
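The three calculated columns above (next date, difference in seconds, fraction of a 24-hour day) can be sketched in Python to check the arithmetic outside PowerBI. The function name and the sample timestamps are invented for the illustration, and the rows are assumed to be already sorted by index.

```python
from datetime import datetime

SECONDS_PER_DAY = 24 * 3600


def time_difference_fractions(timestamps):
    """For each row, express the gap to the next row as a fraction of 24 hours."""
    fractions = []
    for current, following in zip(timestamps, timestamps[1:]):
        seconds = (following - current).total_seconds()
        fraction = seconds / SECONDS_PER_DAY
        if fraction >= 1:
            fraction -= 1  # same adjustment as the IF in the DAX column
        fractions.append(fraction)
    return fractions


# Hypothetical log entries that cross midnight.
log = [
    datetime(2020, 1, 1, 23, 0, 0),
    datetime(2020, 1, 2, 5, 0, 0),
    datetime(2020, 1, 2, 17, 0, 0),
]
fractions = time_difference_fractions(log)
```

A gap of 6 hours comes out as 0.25 (25%) and a gap of 12 hours as 0.5 (50%), matching the 24-hour basis; the last row has no next date and therefore no value.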

Matching, splitting, converting and summing string in Excel / Numbers

I'm trying to write a match-and-calculate formula in Excel (or in Numbers for Mac; either is fine for me, as the two seem equivalent and even the function names are the same).
This is what I have:
| 1        | 2           | 3        |
|----------|-------------|----------|
| Category | other stuff | duration |
| A        | ...         | 00:01:23 |
| A        | ...         | 00:30:19 |
| B        | ...         | ...      |
| A        | ...         | 00:22:12 |
| ...      | ...         | ...      |
So, in column 3 I have a duration in time in this format "hh:mm:ss" and in column 1 are stored all of my categories.
I want to find all rows in my table that match the category "A" in column 1, take the corresponding value in column 3, split the string and convert the parts to numbers (in particular I'm interested in converting them to seconds, so hh*3600+mm*60+ss), and finally sum up all these values. Is it possible?
I'm new with Excel and Numbers, but I'm pretty familiar with coding in programming languages generally: this is what I'd do in programming:
global_secs=0;
for(row r=top to end){
if(r.get_column(1).content_equals("A")){
cell c=r.get_column(3);
string=split(c.get_content(),":")
global_secs+=int(string[1])*3600+int(string[2])*60+int(string[3])
}
}
Is there a way to achieve this in Excel sheet (or Numbers)?
I'd like to do all of this in one, or more, formula only in Excel or Numbers.
One more thing: I do not want to change cells format because this should be an automatic process without human interaction, so unless there is a function to change a range of cells format dynamically I prefer not to do that (I know I can make "duration" as format and sum up without converting to integer, but originally my data is in hh:mm:ss format)
Thanks so much!
The formula you are looking for is
=SUMIF(A2:A5,"A",C2:C5)
The easiest way to get the result in seconds would be to format the cell as [ss] in the Custom category. But as you don't want to change the formatting, another way could be
=HOUR(result) * 3600 + MINUTE(result) * 60 + SECOND(result)
So formula becomes
=HOUR(SUMIF(A2:A5,"A",C2:C5)) * 3600 + MINUTE(SUMIF(A2:A5,"A",C2:C5)) * 60 + SECOND(SUMIF(A2:A5,"A",C2:C5))
See image for reference
This looks like a job for an array formula:
=SUM(N($A$2:$A$8="A")*$C$2:$C$8)
where column A contains the category and column C the duration. Note that you need to press Ctrl+Shift+Enter to make it work.
To convert the result to seconds, an alternative to @Mrig's solution would be to format the result as text and convert it back to a number, i.e.
=VALUE(TEXT(SUM(N($A$2:$A$8="A")*$C$2:$C$8),"[ss]"))
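As a cross-check on what these formulas should return, here is the calculation from the question sketched in Python: keep the rows whose category matches, convert each hh:mm:ss string to seconds and add them up. The duration for category B is not given in the question, so the value used here is a made-up placeholder.

```python
def to_seconds(hhmmss):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def total_seconds_for(rows, category):
    """Sum the durations, in seconds, of all rows in the given category."""
    return sum(to_seconds(duration) for row_category, duration in rows
               if row_category == category)


ROWS = [
    ("A", "00:01:23"),
    ("A", "00:30:19"),
    ("B", "00:10:00"),  # placeholder: the question does not show this value
    ("A", "00:22:12"),
]
total_a = total_seconds_for(ROWS, "A")
```

For the three A rows the total is 83 + 1819 + 1332 = 3234 seconds, which is the number the spreadsheet formulas should produce for the sample data.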

PowerPivot Calculated Column Circular Dependency

I'm really spinning my wheels on this one. I'm trying to add 2 calculated columns in a Power Pivot table (in Excel 2013) to a loaded single column.
Setup (just first row shown):
Prd | Beg | End
1 | =CALCULATE(SUM([End]),Table[Prd]=EARLIER([Prd])-1) | =[Beg]+[Prd]
I want it to calculate like this:
Prd | Beg | End
1 | 1 | 2
2 | 2 | 4
3 | 4 | 7
But no matter what I do, I get a circular reference error because the [End] calculation is pointing to the [Beg] calculation and vice versa. I'm trying to get it to perform a rolling calculation where the [Beg] amount always equals the [End] amount from the prior [Prd].
I tried various calculations using SUMX and ALLEXCEPT, but I'm not getting this one right. I even tried designating the Row Identifier in the Table Behavior tab based on this but it's not working with that either.
Appreciate any suggestions!
I would suggest basing your formula for the [Beg] column on the previous values of the [Prd] column. Therefore:
Beg=SUMX(
    FILTER(
        ALL(Table[Prd]),
        Table[Prd] < EARLIER(Table[Prd])
    ),
    Table[Prd]
) + 1
Explanation:
It sums up all the previous values for [Prd] column and adds 1 (if you take a look at the generated values, you'll see the pattern).
But the formula for [End] should also be fixed so you won't run into the same error. So you'll have the following (this will sum the values of [Beg] and [Prd] from the current row):
End=SUMX(
    FILTER(
        ALL(Table[Beg], Table[Prd]),
        Table[Prd]=EARLIER(Table[Prd])),Table[Beg]
) + Table[Prd]
Explanation:
In your case, avoiding CALCULATE and using just SUMX and EARLIER for [End] instead will get rid of the circular dependency.
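The pattern behind the two columns can be checked with a short Python sketch. It is only a re-statement of the arithmetic, not DAX: [Beg] is one plus the sum of all earlier [Prd] values, and [End] is [Beg] plus the current [Prd]. The function name `rolling_columns` is made up for the example.

```python
def rolling_columns(periods):
    """Return (Prd, Beg, End) rows computed only from the Prd values."""
    rows = []
    for prd in periods:
        beg = sum(p for p in periods if p < prd) + 1  # all earlier periods, plus 1
        end = beg + prd                               # current-row Beg + Prd
        rows.append((prd, beg, end))
    return rows


rows = rolling_columns([1, 2, 3])
```

For periods 1 to 3 this reproduces the table from the question, and each [Beg] equals the [End] of the previous period without either column referring to the other.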

Summing up a related table's values in PowerPivot/DAX

Say I have two tables. attrsTable:
file | attribute | value
------------------------
A | xdim | 5
A | ydim | 6
B | xdim | 7
B | ydim | 3
B | zdim | 2
C | xdim | 1
C | ydim | 7
sizeTable:
file | size
-----------
A | 17
B | 23
C | 34
I have these tables related via the 'file' field. I want a PowerPivot measure within attrsTable whose calculation uses size. For example, let's say I want (xdim+ydim)/size for each of A, B, C. The calculations would be:
A: (5+6)/17
B: (7+3)/23
C: (1+7)/34
I want the measure to be generic enough so I can use slicers later on to slice by file or attribute. How do I accomplish this?
I tried:
dimPerSize := CALCULATE([value]/SUM(sizeTable[size])) # Calculates 0
dimPerSize := CALCULATE([value]/SUM(RELATED(sizeTable[size]))) # Produces an error
Any idea what I'm doing wrong? I'm probably missing some fundamental concepts here of how to use DAX with relationships.
Hi Redstreet,
taking a step back from your solution and the one proposed by Jacob, I think it might be useful to create another table that would aggregate all the calculations (especially given you probably have more than 2 tables with file-specific attributes).
So I have created one more table that contains (only) unique file names, and thus the relationships could be visualized this way:
It's much simpler to add necessary measures (no need for calculated columns). I have actually tested 2 scenarios:
1) create simple SUM measures for both Attribute Value and File Size. Then divide those two measures and job done :-).
2) use SUMX functions to have a bit more universal solution. Then the final formula for DimPerSize calculation could look like this:
=DIVIDE(
SUMX(DISTINCT(fileTable[file]),[Sum of AttrValue]),
SUMX(DISTINCT(fileTable[file]),[Sum of FileSize]),
BLANK()
)
With [Sum of AttrValue] being:
=SUM(attrsTable[value])
And Sum of FileSize being:
=SUM(sizeTable[size])
This worked perfectly fine, even though SUMX in both cases goes over all instances of a given file name. So for file B it also includes zdim (if you need to filter this out, use a simple CALCULATE / FILTER combination). For the file size I am using SUMX as well, even though it's not really needed since the table contains only 1 record for each file name. If there were 2 instances, you would use SUMX or AVERAGEX depending on the desired outcome.
This is the link to my source file in Excel (2010).
Hope this helps.
You seem to have the concept of relationships right, but you aren't on the right track with CALCULATE(), both in terms of its structure and because you can't simply use 'naked' numerical columns: they need to be wrapped in an aggregation of some kind.
Your desired approach is correct in that once you get a simple version of the thing running, you will be able to slice and dice it over any of your related dimensions.
Best practice is probably to build this up using several measures:
[xdim] = CALCULATE(SUM('attrstable'[value]), 'attrstable'[attribute] = "xdim")
[ydim] = CALCULATE(SUM('attrstable'[value]), 'attrstable'[attribute] = "ydim")
[dimPerSize] = ([xdim] + [ydim]) / VALUES('sizeTable'[size])
But depending on exactly how your pivot is set up, this is likely to also throw an error because it will try and use the whole 'size' column in your totals. There are two main strategies for dealing with this:
Use an 'iterative' formula such as SUMX() or AVERAGEX() to iterate individually over the 'file' field and then add up or average for the total, e.g.
[ItdimPerSize] = AVERAGEX(VALUES('sizeTable'[file]), [dimPerSize])
Depending on the maths you want to use, you might find that this doesn't produce a useful average and that you need to use SUMX but divide by the number of cases, i.e. COUNTROWS(VALUES('sizeTable'[file])).
You might decide that the totals are irrelevant and simply introduce an error handling element that will make them blank e.g.
[NtdimPerSize] = IF(HASONEVALUE('sizeTable'[file]),[dimPerSize],BLANK())
NB, all of this assumes that when you are creating your pivot you are 'dragging in' the file field from the 'sizetable'.
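To see which numbers the measures should produce, the per-file calculation and the averaged total can be sketched in Python using the sample tables from the question. This only illustrates the arithmetic, not how PowerPivot evaluates filter context; the function names are invented, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

# Sample data from the question: (file, attribute, value) and file -> size.
ATTRS = [
    ("A", "xdim", 5), ("A", "ydim", 6),
    ("B", "xdim", 7), ("B", "ydim", 3), ("B", "zdim", 2),
    ("C", "xdim", 1), ("C", "ydim", 7),
]
SIZES = {"A": 17, "B": 23, "C": 34}


def dim_per_size(file, attributes=("xdim", "ydim")):
    """(xdim + ydim) / size for a single file, like [dimPerSize] on one row."""
    total = sum(value for f, attribute, value in ATTRS
                if f == file and attribute in attributes)
    return Fraction(total, SIZES[file])


def average_dim_per_size(files):
    """Average the per-file ratios, like the AVERAGEX version for totals."""
    return sum(dim_per_size(f) for f in files) / len(files)


per_file = {f: dim_per_size(f) for f in SIZES}
```

The per-file values are 11/17, 10/23 and 8/34, as listed in the question; the total row then depends on whether you average those ratios or sum them, which is the choice between the strategies described above.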

Resources