I have the following problem:
I need to make a Single<> request using Retrofit and combine it with a Flowable<> (Room). Once both complete, I display the data in the UI, and I want to keep getting continuous updates from Room.
I've tried using the zip operator since that is a common way to do this, but the issue with zip is that it waits for data from both sources - meaning that each time I get fresh data from Room, zip doesn't propagate any new emissions since it also needs fresh data from Retrofit.
My current solution is this one, using combineLatest:
Flowable<UiModel> getData() {
    return Flowable.combineLatest(
            networkService.getUsers().toFlowable(),
            roomDao.getBooks(),
            UiModel::success)
        .onErrorReturn(UiModel::error)
        .subscribeOn(Schedulers.io())
        .observeOn(AndroidSchedulers.mainThread())
        .startWith(UiModel.loading());
}
This works, but there are a couple of minor issues. First, there are multiple emissions of UiModel. This is to be expected with combineLatest: I get the first model containing data from the database and then a second one when the network request completes. I know I could use the skip(1) operator to skip the first emission, but when I tried replacing my network service with a local cache, combineLatest seemed to emit only a single UiModel when both data sources finish at the same (similar?) time.
Is there a better way to achieve what I want?
EDIT: I've added the .distinctUntilChanged() operator before onErrorReturn. That should actually solve any remaining issues, I think.
What if you split the Room Flowable subscription and the network call?
Get cached data from Room when your view is created; if there is data, you can display it to the user (with some timestamp to indicate that the data is cached).
At the same time, make the network request (this one can stay on a background thread) and have your network service insert the returned data into the table. If the data is different, it will update the entry and the Flowable will emit; if the data is the same, the table doesn't get updated and the Flowable doesn't emit.
The reason for the split is to have better control over the observable streams, where one emits only once and the other emits every time a change occurs.
I am working on an application in which the client (Android/ReactJS) clicks a button and five operations take place, let's say:
add a new field
update the old field
upload a photo
upload some text
delete some old fields.
Now, sometimes, due to a network issue or some other issue, only some of the operations take place and the DB gets corrupted. So my question is: how can I make all of these operations one transaction, i.e. atomic, so that either all of them complete or the ones already done are rolled back? And where should I do this: in the client (ReactJS/Android) or in the backend (Node.js) behind the API? I thought of making an API on the backend (since the chance of the backend going down is rare) and keeping track of the operations done (statelessly, e.g. using arrays); if the transaction gets stopped at any point, roll back all the operations done so far. But I found this expensive, and it doesn't cover the risk of a server error. Can you suggest how I can implement/design this?
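What I had in mind for the backend is roughly this kind of compensation logic, just to illustrate tracking the completed operations and rolling them back in reverse order (everything is stubbed out here; the real run/undo functions would call the database and storage APIs for the five operations):

// Minimal sketch of the "track completed operations and roll back" idea.
async function runAtomically(steps) {
  const done = []; // completed steps, kept so we can roll back in reverse order
  try {
    for (const step of steps) {
      const result = await step.run();
      done.push({ step, result });
    }
  } catch (err) {
    // Something failed: undo everything that already succeeded, newest first.
    for (const { step, result } of done.reverse()) {
      try { await step.undo(result); } catch (e) { /* log and keep undoing */ }
    }
    throw err;
  }
}

// Example usage with stubbed operations (replace with the real DB calls):
runAtomically([
  { run: async () => ({ id: 1 }), undo: async (r) => console.log('remove field', r.id) },
  { run: async () => ({ id: 2 }), undo: async (r) => console.log('revert field', r.id) },
  { run: async () => { throw new Error('photo upload failed'); }, undo: async () => {} },
]).catch((err) => console.error('rolled back because:', err.message));

This still doesn't protect me if the server itself dies in the middle of the rollback, which is part of why I'm asking.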
The stack
Express.js API server for CRUD operations over data.
MongoDB database.
Mongoose interface to MongoDB for the schemas.
The problem
In order to handle duplicates in just one place, I want to do it at the only possible entry point: the API.
Definition: duplicate
A duplicate is an entity which already exists in the database, so the new POST request is either the same entity with exactly the same data, or the same entity with updated data.
The API is designed with the new HTTP/2 protocol in mind.
Bulk importers have been written. These programs get the data from a given source, transform it to our specific format, and make POST requests to save it. These importers are designed to handle every entity in parallel.
The API already has a duplication handler which works great when a given entity already exists in the database. The problem comes when the bulk importers make several POST requests for the same entity at the same time, and the entity doesn't exist in the database yet.
....POST/1 .databaseCheck.......DataBaseResult=false..........DatabaseWrite
......POST/2 .databaseCheck.......DataBaseResult=false..........DatabaseWrite
........POST/3 .databaseCheck.......DataBaseResult=false..........DatabaseWrite
.....................POST/N .databaseCheck.......DataBaseResult=false..........DatabaseWrite
This situation produces the creation of the same entity several times, because the database checks haven't finished when the rest of the POST requests arrive.
Only if the number of POST requests is big enough will the first write operation have finished in time, so that the databaseCheck of the Nth request returns true.
What would be the correct solution for handling this?
If I'm not wrong, what I'm looking for is called a transaction, and I don't know whether this is something the database should offer by default, or something I have to implement myself.
Solutions I have already considered:
1. Limit the requests, just one at a time.
This is the simplest solution, but if the API stays blocked while the bulk importers make their requests, the frontend client would get very slow, and it is meant to be fast and multiplayer. So this, in fact, is not a solution.
2. Special bulk API endpoint for each entity.
If an application needs to make bulk requests, then it makes just one huge POST request with all the data in the request body.
This solution doesn't block the API and can handle duplicates very well, but what I don't like is that I would be going against the HTTP/2 protocol, where many small requests are preferred.
And the problem persists: other future clients may run into it if they don't notice that a bulk endpoint is available. But maybe this is not a problem.
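For reference, I imagine the bulk endpoint would look roughly like this (the Express app, the Entity model, and the externalId natural key are made up for illustration):

const express = require('express');
const mongoose = require('mongoose');

// Hypothetical entity with a natural key used to detect duplicates.
const Entity = mongoose.model('Entity', new mongoose.Schema({
  externalId: { type: String, unique: true },
  payload: mongoose.Schema.Types.Mixed,
}));

const app = express();
app.use(express.json({ limit: '10mb' }));

// One big POST with an array of entities in the body.
app.post('/entities/bulk', async (req, res) => {
  try {
    // ordered: false lets MongoDB keep inserting the remaining documents
    // even if some of them are rejected by the unique index as duplicates.
    const inserted = await Entity.insertMany(req.body, { ordered: false });
    res.status(201).json({ inserted: inserted.length });
  } catch (err) {
    // Simplification: duplicates surface here as a bulk write error,
    // even though the non-duplicates were still written.
    res.status(400).json({ error: err.message });
  }
});

app.listen(3000);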
3. Try to use the possible MongoDB transaction implementation
I've read a little bit about this, but I don't know if it would be possible to handle this problem with the MongoDB and Mongoose tools. I've done some searching, but I haven't found anything, because before trying to insert many documents, I need to generate the data for each document, and that data comes inside each POST request.
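From the little I've read, a multi-document transaction in Mongoose would look roughly like the sketch below (it needs MongoDB 4.0+ running as a replica set, and the Entity model and externalId field here are made up). As far as I understand, though, this alone doesn't stop two concurrent first-inserts of the same entity; a unique index on the natural key is what actually makes them collide.

const mongoose = require('mongoose');

const Entity = mongoose.model('Entity', new mongoose.Schema({
  externalId: { type: String, unique: true },
  payload: mongoose.Schema.Types.Mixed,
}));

async function saveEntity(data) {
  const session = await mongoose.startSession();
  try {
    // withTransaction retries the callback on transient transaction errors.
    await session.withTransaction(async () => {
      const existing = await Entity.findOne({ externalId: data.externalId }).session(session);
      if (existing) {
        await Entity.updateOne({ _id: existing._id }, { payload: data.payload }, { session });
      } else {
        // The array form is required when passing options to create().
        await Entity.create([data], { session });
      }
    });
  } finally {
    session.endSession();
  }
}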
4. Drop MongoDB and use a transaction friendly database.
This would have a big cost at this point because the whole stack is already finished and we are close to launch. We aren't afraid of refactoring, but I think the considerations of the 3rd solution would apply here too.
5. Own transactions implementation at the API level?
I've designed a solution that may work for every case, which I call the pool stream.
This is the design:
When a POST request arrives, a timer of a fixed number of milliseconds starts. That amount of time would be big enough to catch several requests and small enough not to cause a noticeable delay.
Inside each chunk of requests, the data is processed to merge duplicates before writing to the database. So if n requests have been caught inside a chunk, n - m (where m <= n) unique candidates are generated. A hash function is applied to each candidate in order to map the hash result back to each request-response pair. Then the candidates are written to the database in parallel, and the current duplicate handler would deal with collisions at write time.
When the writes for the current chunk finish, the response is sent to each request-response pair of the chunk, and then the next chunk is processed. While a chunk is in the queue waiting for its write operation, the unique-candidate merging for the next chunk could already be running, to speed up the whole pipeline.
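To make it more concrete, here is a rough in-memory sketch of the pool stream (names are invented, error handling is left out, and writeToDatabase stands in for the real write with the existing duplicate handler or an upsert):

const crypto = require('crypto');

const FLUSH_MS = 50;   // the fixed pooling window
let pool = new Map();  // hash -> { candidate, responses }
let timer = null;

function hashOf(candidate) {
  return crypto.createHash('sha1').update(JSON.stringify(candidate)).digest('hex');
}

// Called from the POST handler instead of writing directly.
function enqueue(candidate, res) {
  const key = hashOf(candidate);
  const entry = pool.get(key) || { candidate, responses: [] };
  entry.responses.push(res); // duplicates inside the window collapse into one entry
  pool.set(key, entry);
  if (!timer) timer = setTimeout(flush, FLUSH_MS);
}

async function flush() {
  const chunk = pool;  // take the current chunk and start a fresh pool
  pool = new Map();
  timer = null;
  await Promise.all([...chunk.values()].map(async ({ candidate, responses }) => {
    const saved = await writeToDatabase(candidate);           // write each unique candidate once
    responses.forEach((res) => res.status(201).json(saved));  // answer every pooled request
  }));
}

// Stand-in for the real database write.
async function writeToDatabase(candidate) {
  return candidate;
}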
What do you think?
Thank you.
In my CouchDB database I'd like all documents to have an 'updated_at' timestamp added when they're changed (and have this enforced).
I can't modify the document with validation functions
update functions won't run unless they're called explicitly (so it would be possible to update the document without calling the specific update function)
How should I go about implementing this?
There is no way to do this right now without going through _update handlers. Tracking document modification time is a nice idea, but it runs into problems with replication.
Replication works on top of the public API, and this means that:
If you enforce such a trigger, replication breaks, since it becomes impossible to sync data as-is without modifying the document. Because the document gets modified, it receives a new revision, which can easily lead to an endless loop if you replicate data from database A to B and from B to A in continuous mode.
In the other case, when replication keeps working, there will always be a way to work around your trigger.
I can suggest one workaround: you can create a view which emits the current date as a key (or as part of one):
function (doc) {
  emit(new Date(), null);
}
This will assign the current date to all documents as soon as view generation gets triggered (which happens on the first request to it) and will reassign a new date on each update of a specific document.
Although the above should solve your issue, I would advise against using it for the reasons already explained by Kxepal: if you're on a replicated network, each node will assign its own dates. So, taking this into account, the best I can recommend is to solve the issue on the client side and just post the documents with a date already embedded.
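Something along these lines (the database URL and field name are just for illustration):

// Stamp the document on the client right before saving it.
async function saveWithTimestamp(doc) {
  const stamped = { ...doc, updated_at: new Date().toISOString() };
  const res = await fetch('http://localhost:5984/mydb/' + encodeURIComponent(doc._id), {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    // For updates the body must carry the current _rev, otherwise CouchDB
    // answers with a conflict.
    body: JSON.stringify(stamped),
  });
  return res.json(); // { ok, id, rev } on success
}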
I am trying to write a Node program that takes a stream of data (using xml-stream), consolidates it, and writes it to a database (using mongoose). I am having problems figuring out how to do the consolidation, since the data may not have hit the database by the time I am processing the next record. I am trying to do something like:
on order data being read from stream
look to see if customer exists on mongodb collection
if customer exists
add the order to the document
else
create the customer record with just this order
save the customer
My problem is that two 'nearby' orders for a customer cause duplicate customer records to be written, since the first one hasn't been written before the second one checks to see if it's there.
In theory I think I could get around the problem by pausing the xml-stream, but there is a bug preventing me from doing this.
Not sure that this is the best option, but using an async queue was what I ended up doing.
Around the same time, a pull request that allowed pausing was added to xml-stream (which is what I was using to process the stream).
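For what it's worth, the queue part looked roughly like this (the model definition is illustrative, and this assumes the async v3 API that accepts async worker functions):

const async = require('async');
const mongoose = require('mongoose');

const Customer = mongoose.model('Customer', new mongoose.Schema({
  customerId: { type: String, unique: true },
  orders: [mongoose.Schema.Types.Mixed],
}));

// A concurrency of 1 means the next order isn't processed until the
// previous customer document has been written.
const orderQueue = async.queue(async (order) => {
  let customer = await Customer.findOne({ customerId: order.customerId });
  if (!customer) {
    customer = new Customer({ customerId: order.customerId, orders: [] });
  }
  customer.orders.push(order);
  await customer.save();
}, 1);

// In the xml-stream handler, just push instead of saving directly:
// xml.on('endElement: order', (order) => orderQueue.push(order));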
Is there a unique field on the customer object in the data coming from the stream? You could add a unique restriction to your mongoose schema to prevent duplicates at the database level.
When creating new customers, add some fallback logic to handle the case where you try to create a customer but that same customer is created by another save at the same time. When this happens, retry by first fetching the customer the other save created and then adding the order to that fetched customer document.
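A rough sketch of both ideas together (the unique field here is illustrative; use whatever uniquely identifies your customers):

const mongoose = require('mongoose');

const customerSchema = new mongoose.Schema({
  customerId: { type: String, unique: true }, // duplicates get rejected at the database level
  orders: [mongoose.Schema.Types.Mixed],
});
const Customer = mongoose.model('Customer', customerSchema);

async function addOrder(order) {
  try {
    await Customer.create({ customerId: order.customerId, orders: [order] });
  } catch (err) {
    if (err.code !== 11000) throw err; // 11000 = duplicate key error
    // Another save created this customer first: fetch it and append the order.
    const existing = await Customer.findOne({ customerId: order.customerId });
    existing.orders.push(order);
    await existing.save();
  }
}

With the unique index in place you could probably also collapse the whole thing into a single Customer.findOneAndUpdate({ customerId: order.customerId }, { $push: { orders: order } }, { upsert: true }), which does the check and the write in one operation.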
I'm currently re-factoring an Android project that in a few places loads data on background threads in order to update list views. The API that is being called to collect the data has a callback mechanism, so when a lot of data is returned (which takes a long time) I can handle the results asynchronously.
In the old code, this data was packaged up as an appropriate object and passed into a handle on the UI thread, to be inserted into the list view's adapter. This worked well, but I've decided that presenting the data through a ContentProvider would make the project easier to maintain and expand.
This means I need to provide the data as a Cursor object when requested via the query method.
So far I've been unable to update the data in the Cursor after returning it. Does this mean that all of the data needs to be collected before returning the Cursor? The Android LoaderThrottleSupport sample suggests that it doesn't, but I have yet to get it working for anything other than an SQL backend.
Has anyone else tried to present non-SQL backed asynchronous data in this sort of way?