Is it possible to get all errors that include a Message substring via the Track:js API? - trackjs

We use TrackJs to log JavaScript errors on Stack Overflow Talent. I want to export a csv of all errors that include the substring "%couldn't load id%" within the Message field.
The API documentation doesn't make it clear that this is possible. Is this possible?

Unfortunately we do not offer substring querying capability at this time :( The main use case for the API is bulk export to store your JS error data in other third party systems (though we are certainly amenable to supporting other use cases ;))
To that end though, the API is meant to be quick, and be able to return lots of results per query. It's no problem to retrieve up to 1000 records per page (using the size query parameter) and filter after the fact.

Related

Efficient pagination in Cosmos DB

I need to implement efficient pagination for Cosmos DB with nodejs api. There are many examples about the implementation with .NET and LINQ but I could not find anything good for nodejs. The idea is to send the pageSize and pageIndex and get the relevant result.
I already know we can always use dbClient.queryDocuments and get the queryIterator and perform the pagination but this requires always iterating from the first document in the DB. An example could be find here.
Any idea how to do it in an efficient way?
Unfortunately CosmosDB as an engine doesn’t have skip and take pagination support yet.
It is, however, a planned feature.
The blogs you’ve read provide one of the few viable workarounds for now which of course comes with a cost.
You could write something smarter and instead of iterating though every document from the beginning, you could keep the request’s continuation token and use it with your next request. That way you can have a previous and next button logic.

Why can't ContinuationToken be used for paging in Azure Search API?

Reading the documentation for the Azure Search .NET SDK, I see that the ContinuationToken property is not supposed to used for pagination (this is the same as the #odata.nextLink and #search.nextPageParameter properties in the REST API).
Note that this property is not meant to help you implement paging of search results. You can implement paging using the Top and Skip search parameters.
Source
Why can't I use it for pagination? I have a situation where I want to run a query and then step through a static copy of the results page by page. I don't want those query results to change beneath my feet, however, as I am navigating through them, as new documents are added to the underlying database. In my case, there could be hundreds or thousands of results that get added in the minute or two between submitting the initial query and navigating to another page. How could I accomplish this?
Your question can be addressed in two parts:
Why is it not recommended to use ContinuationToken to implement pagination?
How can pagination be implemented such that results remain completely stable from page to page?
These are actually unrelated questions, since nothing about ContinuationToken guarantees the stability of the search results. Azure Search makes no consistency guarantees around paging, whether you use $top and $skip or ContinuationToken.
For question #1, the reason ContinuationToken is not recommended for paging is that Azure Search controls when the token is returned, not your application code. If you make assumptions about how and when Azure Search decides to return you a token, there's a chance those assumptions may break with a future service update. The intent of ContinuationToken is to prevent requests for too many documents from overwhelming the service, so you should assume that it is entirely at the service's discretion whether it will return a token.
For question #2, since Azure Search doesn't provide consistency guarantees, you can't completely avoid issues like the same document showing up in multiple pages, missing documents, or documents that are deleted by the time they are seen in results. Even if you wanted to build your own snapshot of the results and page over them in your application code, building a consistent snapshot isn't possible in the first place. However, if your only concern is to avoid showing new documents in the results, you can include a created timestamp field in your index and filter on that in every search request.
Frankly, unless you're trying to export the entire contents of your index, I would question the need for such strong consistency guarantees around paging. Google and Bing make no such guarantees, so arguably user expectations are already set around this. If you are trying to export your data, this is unfortunately not easy with Azure Search today. In that case, please vote on this User Voice item to help the team prioritize this scenario.

REST API: Infinite scroll pagination in the GUI, but allow searching through all entries

I have Express running in a Node.js server, which serves as a backen for my React frontend application.
The frontend application fetches data from the backend (which is stored in Mongo) through a REST call, and display this data in a table.
The amount of data is growing by the day, so I though I should look into reducing the abount of data transferred to the frontend application, so avoid unnecessary strain on the backend.
I'm not sure if this is the right way to approach this, but I've been thinking I would look into having the backen fetch a limited amount of entries, so that only these data will be displayed in the frontend table.
The problem arises with searching - when the user wants to search the data in the table, I'll need to be able to search through all entries, not just the data loaded into the table.
I guess one option would be to have the search function actually query the REST API, instead of searching the table itself.
If I'm on the right track, I guess I could implement REST API pagination, somewhere along the example found in https://refactoringfactory.wordpress.com/2012/09/08/pagination-in-node-js-and-express/. Other suggestions on how to implement pagination are welcome.
I'd very much like some input on the approach I described, and suggestions for smarter ways implement this.
EDIT: I changed the title somewhat to include "Infinite scroll pagination". This is what I'm looking to implement. At the moment I have a click on pages pagination setup, but would like to replace this for the infinite scroll pagination.
I've been thinking I would look into having the backen fetch a limited amount of entries, so that only these data will be displayed in the frontend table.
This is common practice in my experience. The term for it is "pagination." Have a look at this SO question regarding best practices for pagination in REST API's: API pagination best practices.
The problem arises with searching - when the user wants to search the data in the table, I'll need to be able to search through all entries, not just the data loaded into the table.
I guess one option would be to have the search function actually query the REST API, instead of searching the table itself.
Again, you got it. Doing small filters/searches on the client is fine for a limited number of entries, but if you need to only retrieve items matching search criteria in the first place, then adding that functionality to your REST API is the right choice.
Right, you should do
pagination: you might implement it by exposing 2 arguments in the rest endpoint for the listing
?p=<number>: page number, defaults to 1
?l=<number>: number of items per page / page length, defaults to a number maybe from 10 to 100
search: implement it by exposing 1 argument in the rest endpoint for the listing
/?q=<string>: you can define to be what you want, maybe a string that matches with one or multiple fields of the data
If you want to minimize the network traffic, you might also add one more parameter to explicitly select the fields you want to be returned, like this
/?f=<string>: string could be something like id,name,age, and so the api should return only those three fields per record.
All this parameters should be accepted by a list endpoint in your RESTful API
Example:
http://example.com/api/cars/?p=2&l=15&q=toyota&f=id,brand,model,color

HBase Intra-row scanning

I am trying to find some information on intra-row scanning through HBase REST API. However, I have a hard time to see how it could be done according to the documentation, and I am wondering if anyone used this feature from the REST interface?
References
http://hadoop-hbase.blogspot.jp/2012/01/hbase-intra-row-scanning.html
http://wiki.apache.org/hadoop/Hbase/Stargate
More specifically, I am looking into how I would do this from a Node.js application.
Edit:
I found this, but I have a hard time to see how intra-row scanning would be possible...
HBase REST Filter ( SingleColumnValueFilter )
I still have to test this further, but I have written a document which gives a few examples of stringified filter as per the HBase REST Filter question.
Basically, from what I see it is possible to use FilterList and stack multiple Filters in a JSON array.
More details: https://gist.github.com/3979381
Still have to try it, but it seems like this should work. Will update once I have tried it through the REST API.

Strategies for search across disparate data sources

I am building a tool that searches people based on a number of attributes. The values for these attributes are scattered across several systems.
As an example, dateOfBirth is stored in a SQL Server database as part of system ABC. That person's sales region assignment is stored in some horrible legacy database. Other attributes are stored in a system only accessible over an XML web service.
To make matters worse, the the legacy database and the web service can be really slow.
What strategies and tips should I consider for implementing a search across all these systems?
Note: Although I posted an answer, I'm not confident its a great answer. I don't intend to accept my own answer unless no one else gives better insight.
You could consider using an indexing mechanism to retrieve and locally index the data across all the systems, and then perform your searches against the index. Searches would be an awful lot faster and more reliable.
Of course, this just shifts the problem from one part of your system to another - now your indexing mechanism has to handle failures and heterogeneous systems, but that may be an easier problem to solve.
Another factor is how often the data changes. If you have to query data in real-time that goes stale very quickly, then indexing may not be practical.
If you can get away with a restrictive search, start by returning a list based on the search criteria corresponding to the fastest data source. Then join up those records with the other systems and remove records which don't match the search criteria.
If you have to implement OR logic, this approach is not going to work.
While not an actual answer, this might at least get you partway to a workable solution. We had a similar situation at a previous employer - lots of data sources, different ways of accessing those data sources, different access permissions, military/government/civilian sources, etc. We used Mule, which is built around the Enterprise Service Bus concept, to connect these data sources to our application. My details are a bit sketchy, as I wasn't the actual implementor, just an integrator, but what we did was define a channel in Mule. Then you write a simple integration piece to go between the channel and the data source, and the application and the channel. The integration piece does the work of making the actual query, and formatting the results, so we had a generic SQL integration piece for accessing a database, and for things like web services, we had some base classes that implemented common functionality, so the actual customization of the integration piecess was a lot less work than it sounds like. The application could then query the channel, which would handle accessing the various data sources, transforming them into a normalized bit of XML, and return the results to the application.
This had a lot of advantages for our situation. We could include new data sources for existing queries by simply connecting them to the channel - the application didn't have to know or care what data sources where there, as it only looked at the data from the channel. Since data can be pushed or pulled from the channel, we could have a data source update the application when, for example, it was updated.
It took a while to get it configured and working, but once we got it going, we were pretty successful with it. In our demo setup, we ended up with 4 or 5 applications acting as both producers and consumers of data, and connecting to maybe 10 data sources.
Have you thought of moving the data into a separate structure?
For example, Lucene stores data to be searched in a schema-less inverted indexed. You could have a separate program that retrieves data from all your different sources and puts them in a Lucene index. Your search could work against this index and the search results could contain a unique identifier and the system it came from.
http://lucene.apache.org/java/docs/
(There are implementations in other languages as well)
Have you taken a look at YQL? It may not be the perfect solution but I might give you starting point to work from.
Well, for starters I'd parallelize the queries to the different systems. That way we can minimize the query time.
You might also want to think about caching and aggregating the search attributes for subsequent queries in order to speed things up.
You have the option of creating an aggregation service or middleware that aggregates all the different systems so that you can provide a single interface for querying. If you do that, this is where I'd do the previously mentioned cache and parallize optimizations.
However, with all of that it you will need weighing up the development time/deployment time /long term benefits of the effort against migrating the old legacy database to a faster more modern one. You haven't said how tied into other systems those databases are so it may not be a very viable option in the short term.
EDIT: in response to data going out of date. You can consider caching if your data if you don't need the data to always match the database in real time. Also, if some data doesn't change very often (e.g. dates of birth) then you should cache them. If you employ caching then you could make your system configurable as to what tables/columns to include or exclude from the cache and you could give each table/column a personalizable cache timeout with an overall default.
Use Pentaho/Kettle to copy all of the data fields that you can search on and display into a local MySQL database
http://www.pentaho.com/products/data_integration/
Create a batch script to run nightly and update your local copy. Maybe even every hour. Then, write your query against your local MySQL database and display the results.

Resources