Limit results on an LDAP request with Zend_Ldap - pagination

I'm currently working with Zend Framework and I would like to get information from an LDAP directory. For that, I use this code:
$options = array('host' => '...', 'port' => '...', ...);
$ldap = new Zend_Ldap($options);
$query = '(username=' . $_GET['search'] . ')';
$attributes = array('id', 'username', ...);
$searchResults = $ldap->search($query, $ldap->getBaseDn(), Zend_Ldap::SEARCH_SCOPE_SUB, $attributes);
$ldap->disconnect();
There may be many results, so I would like to implement pagination by limiting the number of results returned. I looked through the parameters of Zend_Ldap's search() function, which has a sort parameter, but found nothing to specify an interval.
Do you have a solution to limit the number of results (as in SQL with LIMIT 0, 200, for example)?
Thank you

A client-requested size limit can be used to limit the number of entries the directory server will return. The client-requested size limit cannot override any server-imposed size limit, however. The same applies to the time limit. All searches should include a non-zero size limit and time limit; failing to include them is very bad form. See "LDAP: Programming Practices" and "LDAP: Search Practices" for more information.
"Paging" is accomplished using the simple paged results control extension. described in my blog entry: "LDAP: Simple Paged Results".
Alternatively, a search result listener, should your API support it, could be used to handle results as they arrive which would reduce memory requirements of your application.
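In PHP terms, the client-requested limits map directly onto the native ldap extension rather than the Zend_Ldap wrapper. A minimal sketch (host, credentials, base DN, filter and attribute names are placeholders, not taken from the question):

$connection = ldap_connect('ldap.example.com');
ldap_set_option($connection, LDAP_OPT_PROTOCOL_VERSION, 3);
ldap_bind($connection, 'cn=reader,dc=example,dc=com', 'secret');

// ldap_search() accepts a client-requested size limit and time limit
// as its 6th and 7th arguments (here: at most 200 entries, 30 seconds).
$result = ldap_search(
    $connection,
    'dc=example,dc=com',       // base DN
    '(username=jdoe)',         // filter
    array('id', 'username'),   // attributes to return
    0,                         // attrsonly: 0 = return attribute values too
    200,                       // sizelimit
    30                         // timelimit (seconds)
);
$entries = ldap_get_entries($connection, $result);
ldap_unbind($connection);

The server may still enforce a stricter limit of its own; when the size limit is reached, PHP raises a warning and the result set simply contains the partial results.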

Unfortunately, current releases of PHP don't support the LDAP pagination functions out of the box - see http://sgehrig.wordpress.com/2009/11/06/reading-paged-ldap-results-with-php-is-a-show-stopper/
If you have control of your server environment, there's a patch you can install with PHP 5.3.2 (and possibly others) that will allow you to do this: https://bugs.php.net/bug.php?id=42060.
... or you can wait until 5.4.0 is released for production, which should be in the next few weeks and which includes this feature.
ldap_control_paged_results() and ldap_control_paged_results_response() are the functions you'll want to use if you're going with the patch. I think they have been renamed to the singular ldap_control_paged_result() and ldap_control_paged_result_response() in 5.4.
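For reference, here is roughly what the paged-results loop looks like with the 5.4-style (singular) function names mentioned above. This is a sketch only: it assumes $connection is an already-bound ldap link, the base DN and filter are placeholders, and error handling is omitted.

$pageSize = 200;
$cookie   = '';
do {
    // ask the server to return at most $pageSize entries for the next page
    ldap_control_paged_result($connection, $pageSize, true, $cookie);

    $result  = ldap_search($connection, 'dc=example,dc=com', '(username=*)', array('id', 'username'));
    $entries = ldap_get_entries($connection, $result);
    // ... handle this page of $entries ...

    // the server hands back a cookie pointing at the next page ('' when there are no more)
    ldap_control_paged_result_response($connection, $result, $cookie);
} while ($cookie !== null && $cookie != '');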
Good luck!

Related

Searching huge number of keywords in terms filter using bool query crashes the terminal

I was able to search for a couple of keywords in 2 different fields using the code below:
curl -XGET 'localhost:9200/INDEXED_REPO/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "should": [
            { "terms": { "description": ["heart", "cancer"] } },
            { "terms": { "title": ["heart", "cancer"] } }
          ]
        }
      }
    }
  }
}'
However, when I put in 15,000 keywords, the server suddenly closed my terminal. I am using MobaXterm. What is the best solution for including this many keywords?
There is a limit to the maximum number of clauses you can use in a bool query. You can change it, but doing so affects the server's CPU usage and might cause it to crash. In fact, if you didn't get a "maximum clause count exceeded" error, you may have already crashed the server.
I would find an optimal number of clauses that your server can handle, or, if it's absolutely necessary to search for all of them at once, upgrade the server: set up extra nodes and configure shards / replicas properly.
To allow more bool clauses, add the following to your elasticsearch.yml file:
indices.query.bool.max_clause_count: n (where n is the new maximum number of clauses)
Refer to these for more details:
Max limit on the number of values I can specify in the ids filter or generally query clause?
Elasticsearch - set max_clause_count
Also, Cerebro is a better alternative to MobaXterm for this; you can download it from https://github.com/lmenezes/cerebro. It gives you a nice interface to play with your queries before finalizing them in your code.
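Independent of the clause limit, pasting a 15,000-keyword body straight into the terminal is likely to run into the shell's command-length limits, which may be what killed the session rather than Elasticsearch itself. One workaround (a sketch, assuming you save the JSON body to a file you name query.json) is to let curl read the body from that file:

curl -XGET 'localhost:9200/INDEXED_REPO/_search?pretty' \
     -H 'Content-Type: application/json' \
     --data-binary @query.json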

DocumentDB performance when using pagination

I have working pagination code that performs great with Azure Search and SQL, but when using it with DocumentDB it takes up to 60 seconds to load.
We believe it's a latency issue, but I can't find a workaround to speed it up.
Any documentation or ideas on where to start looking?
public PagedList(IQueryable<T> superset, int pageNumber, int pageSize, string sortExpression = null)
{
    if (pageNumber < 1)
        throw new ArgumentOutOfRangeException("pageNumber", pageNumber, "PageNumber cannot be below 1.");
    if (pageSize < 1)
        throw new ArgumentOutOfRangeException("pageSize", pageSize, "PageSize cannot be less than 1.");

    // set source to blank list if superset is null to prevent exceptions
    TotalItemCount = superset == null ? 0 : superset.Count();

    if (superset != null && TotalItemCount > 0)
    {
        Subset.AddRange(pageNumber == 1
            ? superset.Skip(0).Take(pageSize).ToList()
            : superset.Skip((pageNumber - 1) * pageSize).Take(pageSize).ToList()
        );
    }
}
While the LINQ provider for DocumentDB translates .Take() into a "TOP" SQL clause under certain circumstances, DocumentDB has no equivalent for Skip. So I'm a little surprised it works at all, but I suspect that the provider is rerunning the query from scratch to simulate Skip. In the comments here is a discussion led by a DocumentDB product manager on why they chose not to implement SKIP. tl;dr: it doesn't scale for NoSQL databases. I can confirm this with MongoDB (which does have skip functionality): later pages simply scan and throw away earlier documents, and the later in the list you go, the slower it gets. I suspect that the LINQ implementation is doing something similar, except client-side.
DocumentDB does have a mechanism for getting documents in chunks, but it works a bit differently from SKIP: it uses a continuation token. You can even set a maxPageSize; however, there is no guarantee that you'll get that number back.
I recommend that you implement a client-side cache of your own and use a fairly large maxPageSize. Let's say each page in your UI is 10 rows and your cache currently has 27 rows in it. If the user selects page 1 or page 2, you have enough rows to render the result from the data already cached. If the user selects page 7, then you know that you need at least 70 rows in your cache. Use the last continuation token to get more until you have at least 70 rows in your cache and then render rows 61-70. On the plus side, continuation tokens are long-lived, so you can use them later based upon user input.
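To make that concrete, here is a rough sketch of filling such a cache with the DocumentDB .NET SDK; the FillCacheAsync helper, the 100-row MaxItemCount and the rowsNeeded parameter are illustrative, not part of the original code:

// assumes: using System; using System.Collections.Generic; using System.Threading.Tasks;
//          using Microsoft.Azure.Documents.Client; using Microsoft.Azure.Documents.Linq;

// Reads pages of up to 100 documents until at least `rowsNeeded` rows are cached.
static async Task<List<T>> FillCacheAsync<T>(DocumentClient client, Uri collectionUri, int rowsNeeded)
{
    var query = client.CreateDocumentQuery<T>(collectionUri, new FeedOptions { MaxItemCount = 100 })
                      .AsDocumentQuery();

    var cache = new List<T>();
    while (query.HasMoreResults && cache.Count < rowsNeeded)
    {
        FeedResponse<T> page = await query.ExecuteNextAsync<T>();
        cache.AddRange(page);

        // page.ResponseContinuation is the continuation token; it is long-lived,
        // so it can be persisted and used later to resume where this loop stopped.
    }
    return cache;
}

Rendering page 7 of a 10-row UI then becomes a matter of slicing rows 61-70 out of the cache once it holds at least 70 rows.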

Incremental loading in Azure Mobile Services

Given the following code:
listView.ItemsSource =
App.azureClient.GetTable<SomeTable>().ToIncrementalLoadingCollection();
We get incremental loading without further changes.
But what if we modify the read.js server-side script to, for example, use mssql to query another table instead? What happens to the incremental loading? I'm assuming it breaks; if so, what's needed to support it again?
And what if the query used the untyped version instead, e.g.
App.azureClient.GetTable("SomeTable").ReadAsync(...)
Could incremental loading be somehow supported in this case, or must it be done "by hand" somehow?
Bonus points for insights on how Azure Mobile Services implements incremental loading between the server and the client.
The incremental loading collection works by sending the $top and $skip query parameters (those are also sent when you do a query by using the .Take and .Skip methods in the table). So if you want to modify the read script to do something other than the default behavior, while still maintaining the ability to use that table with an incremental loading collection, you need to take those values into account.
To do that, you can ask for the query components, which will contain the values, as shown below:
function read(query, user, request) {
    var queryComponents = query.getComponents();
    console.log('query components: ', queryComponents); // useful to see all information
    var top = queryComponents.take;
    var skip = queryComponents.skip;
    // do whatever you want with those values, then call request.respond(...)
}
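For example, if the script were rewritten to query another table with mssql, a rough sketch of keeping incremental loading working might look like the following. OtherTable and the 50-row default are hypothetical; mssql and statusCodes are the globals available in classic Azure Mobile Services table scripts, and the OFFSET/FETCH syntax requires an ORDER BY and Azure SQL (2012+ syntax).

function read(query, user, request) {
    var components = query.getComponents();
    var top = components.take || 50;   // $top sent by the incremental loading collection
    var skip = components.skip || 0;   // $skip

    var sql = "SELECT * FROM OtherTable ORDER BY id " +
              "OFFSET ? ROWS FETCH NEXT ? ROWS ONLY";

    mssql.query(sql, [skip, top], {
        success: function (results) {
            request.respond(statusCodes.OK, results);
        },
        error: function (err) {
            request.respond(statusCodes.INTERNAL_SERVER_ERROR, err);
        }
    });
}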
The way it's implemented at the client is by using a class which implements the ISupportIncrementalLoading interface. You can see it (and the full source code for the client SDKs) in the GitHub repository, or more specifically the MobileServiceIncrementalLoadingCollection class (the method is added as an extension in the MobileServiceIncrementalLoadingCollectionExtensions class).
And the untyped table does not have that method - as you can see in the extension class, it's only added to the typed version of the table.

Alternative to skip and limit for mongoose pagination with arbitrary sorting

Let me start by saying that I've read (MongoDB - paging) that using skip and limit for pagination is bad for performance and that it's better to sort by something like dateCreated and modify the query for each page.
In my case, I'm letting the user specify the parameter to sort by. Some may be alphabetical. Specifying a query for this type of arbitrary sorting seems rather difficult.
Is there a performance-friendly way to do pagination with arbitrary sorting?
Example
mongoose.model('myModel').find({...})
.sort(req.sort)
...
Secondary question: At what scale do I need to worry about this?
I don't think you can do this.
But in my opinion the best way is to build your query depending on your req.sort variable.
For example (written in CoffeeScript):
userSort = {name: 1}   if req.sort? and req.sort is "name"
userSort = {date: 1}   if req.sort? and req.sort is "date"
userSort = {number: 1} if req.sort? and req.sort is "number"
find {}, null, {skip: 0, limit: 0, sort: userSort}, (err, results) ->
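If skip really does become a bottleneck, the approach the question alludes to (modifying the query for each page) can be extended to an arbitrary ascending sort field by remembering the sort value and _id of the last document on the current page and using them as a range filter, with _id as a tie-breaker. A hedged sketch in plain JavaScript; sortField, pageSize, lastValue and lastId are illustrative names, not from the answer above:

// sortField comes from the user (e.g. 'name'); lastValue / lastId come from
// the last document of the previous page.
var sortField = req.sort || 'name';

var afterClause = {};
afterClause[sortField] = { $gt: lastValue };   // strictly past the previous page

var tieBreaker = { _id: { $gt: lastId } };
tieBreaker[sortField] = lastValue;             // same sort value, later _id

var sortSpec = {};
sortSpec[sortField] = 1;
sortSpec._id = 1;

mongoose.model('myModel')
    .find({ $or: [afterClause, tieBreaker] })
    .sort(sortSpec)
    .limit(pageSize)
    .exec(function (err, results) {
        // the next request starts after the last document in `results`
    });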

Helping Kohana 3 ORM to speed up a little

I noticed that Kohana 3 ORM runs a "SHOW FULL COLUMNS" for each of my models when I start using them:
SHOW FULL COLUMNS FROM `mytable`
This query might take a few clock cycles to execute (in the Kohana profiler it's actually the slowest of all the queries run in my current app).
Is there a way to help Kohana 3 ORM to speed up by disabling this behaviour and explicitly define the columns in my models instead?
biakaveron answered my question with a comment, so I can't accept it as the correct answer.
Taken from Wouter's answer on the official Kohana forums (which biakaveron pointed to), this is the correct answer:
It's very easy, $table_columns is a big array with a lot of info, but actually only very little of this info is used in ORM.
This will do:
protected $_table_columns = array(
    'id'        => array('type' => 'int'),
    'name'      => array('type' => 'string'),
    'allowNull' => array('type' => 'string', 'null' => TRUE),
    'created'   => array('type' => 'int'),
);
There isn't too much overhead when that query gets executed, though you can cache the columns or skip the process entirely by defining them manually (if that is really what you want, override $_table_columns in your models, though I don't see how much time you will save by doing it; it's still worth trying).
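As a concrete (hypothetical) example of overriding it in a model, where the class, table and column names below are illustrative only:

class Model_Article extends ORM {

    protected $_table_name = 'articles';

    // Declaring the columns up front means ORM never has to run
    // SHOW FULL COLUMNS for this model.
    protected $_table_columns = array(
        'id'      => array('type' => 'int'),
        'title'   => array('type' => 'string'),
        'body'    => array('type' => 'string', 'null' => TRUE),
        'created' => array('type' => 'int'),
    );
}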
I proposed a caching alternative for list_columns() but it got denied as it really isn't that much of a bottleneck: http://dev.kohanaframework.org/issues/2848
Do not forget the underscore:
protected $_table_columns = array(
    'id'        => array('type' => 'int'),
    'name'      => array('type' => 'string'),
    'allowNull' => array('type' => 'string', 'null' => TRUE),
    'created'   => array('type' => 'int'),
);
This will give you the full column info as an array:
var_export($ORM->list_columns());
Not sure how the Kohana team concluded that 'SHOW FULL COLUMNS' runs as fast as reading from the cache in all cases. The query cache is a bottleneck on MySQL for our workload, so we had to turn it off.
https://blogs.oracle.com/dlutz/entry/mysql_query_cache_sizing
Proof that SHOW FULL COLUMNS is the most-run query:
https://www.dropbox.com/s/zn0pbiogt774ne4/Screenshot%202015-02-17%2018.56.21.png?dl=0
Proof of the temp tables on disk, from the New Relic MySQL plugin:
https://www.dropbox.com/s/cwo09sy9qxboeds/Screenshot%202015-02-17%2019.00.19.png?dl=0
And the top offending queries (> 100 ms), sorted by query count:
https://www.dropbox.com/s/a1kpmkef4jd8uvt/Screenshot%202015-02-17%2018.55.38.png?dl=0
