How to search Lucene.NET without indicating "top n" hits limit? - search

There are several overloads of the IndexSearcher.Search method in Lucene. Some of them require a "top n hits" argument, some don't (these are obsolete and will be removed in Lucene.NET 3.0).
Those that require a "top n" argument actually preallocate memory for the entire possible range of results. So when you can't even approximately estimate the number of results, the only option is to pass an arbitrarily large number to ensure that all query results are returned. This causes severe memory pressure and leaks due to large object heap (LOH) fragmentation.
Is there an official, non-obsolete way to search without passing a "top n" argument?
Thanks in advance, guys.

I'm using Lucene.NET 2.9.2 as the reference point for this answer.
You could build a custom collector and pass it to one of the search overloads.
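using System;
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Search;
// Collects every matching document ID into a growable list, so no "top n" is needed.
public class AwesomeCollector : Collector {
    private readonly List<Int32> _docIds = new List<Int32>();
    private readonly Single _lowerInclusiveScore;
    private Scorer _scorer;
    private Int32 _docBase;
    // lowerInclusiveScore: hits scoring below this value are skipped.
    public AwesomeCollector(Single lowerInclusiveScore) {
        _lowerInclusiveScore = lowerInclusiveScore;
    }
    public IEnumerable<Int32> DocumentIds {
        get { return _docIds; }
    }
    public override void SetScorer(Scorer scorer) {
        _scorer = scorer;
    }
    public override void Collect(Int32 doc) {
        // doc is relative to the current segment reader; adding _docBase gives the index-wide ID.
        var score = _scorer.Score();
        if (_lowerInclusiveScore <= score)
            _docIds.Add(_docBase + doc);
    }
    public override void SetNextReader(IndexReader reader, Int32 docBase) {
        _docBase = docBase;
    }
    public override bool AcceptsDocsOutOfOrder() {
        return true;
    }
}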
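The collector idea above does not depend on Lucene itself, so here is a minimal, library-free sketch in Java of the two things it relies on: a list that grows on demand instead of a preallocated "top n" buffer, and rebasing segment-relative document numbers with the segment's doc base. The class and method names (AllHitsSketch, setNextSegment, collect) are hypothetical stand-ins, not the Lucene API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the collector callbacks; this is not the Lucene API.
final class AllHitsSketch {
    // Grows as hits arrive, so nothing is preallocated for a "top n" bound.
    private final List<Integer> docIds = new ArrayList<>();
    private final float lowerInclusiveScore;
    private int docBase;

    AllHitsSketch(float lowerInclusiveScore) {
        this.lowerInclusiveScore = lowerInclusiveScore;
    }

    // Called once per index segment, with the offset of that segment's first document.
    void setNextSegment(int docBase) {
        this.docBase = docBase;
    }

    // Called once per hit; segmentDoc is relative to the current segment.
    void collect(int segmentDoc, float score) {
        if (lowerInclusiveScore <= score) {
            docIds.add(docBase + segmentDoc);
        }
    }

    List<Integer> documentIds() {
        return Collections.unmodifiableList(docIds);
    }
}
```

Feeding it hit 3 (score 0.9) and hit 7 (score 0.1) from a segment with base 0, then hit 2 (score 0.5) from a segment with base 100, with a minimum score of 0.5, leaves the IDs 3 and 102 in the list.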

Related

JsfCaptcha : show correct captcha value

I am using JsfCaptcha in an attempt to process offline captcha validation. While there is a method to validate "what the user entered matches what the captcha image has shown", I am having a hard time actually printing out what the server states is the right solution. I anticipated this being fairly easy to complete, but for the life of me, cannot figure it out. Here is how I am using the library:
import com.captcha.botdetect.web.jsf.JsfCaptcha;
[...]
@ManagedBean
@RequestScoped
public class MySampleBean implements Serializable {
    private JsfCaptcha captcha;
    private String captchaCode;
    // getters for the above two fields
    [...]
    // setters for the above two fields
    [...]
    // Returns true when the entered code matches the captcha.
    public boolean checkInputMatches() {
        return this.captcha.validate(captchaCode);
    }
}
The method checkInputMatches() demonstrates how the library validates that the user has entered the right captcha solution. What I want to do now, for debugging purposes, is log what the solution was (in the event that the user entered the wrong value). Potentially, something like this:
final String solution = captcha.getCorrectSolutionToCaptcha();
At first, I took a look through all of the public getters, but none of them obviously provides the data I need. After trying all of them, I went down the JD-GUI route: I decompiled the libraries and tried to hunt my way to a method that would give me this data.
Sadly, the JsfCaptcha class sits on top of 5-6 levels of base classes, with a multitude of protected / private methods. Obviously, a very tedious and unnecessary hunt for something very simple.
Is it possible to print out the actual JsfCaptcha value that is being validated against?
I finally managed to solve the problem with javassist, by modifying the generated bytecode of the Botdetect library. I did this because I was unable to find any getter method for accessing the actual captcha solution. Obviously, this is not a clean solution, but it is a solution given that you just want to debug your code to determine why the code you entered does not match the code that the backend server has. For now, I'll consider this as a solution until there is a cleaner alternative requiring no bytecode manipulation. Here are the details on the version that I played with and got this to work:
botdetect-4.0.beta3.5.jar
botdetect-jsf20-4.0.beta3.5.jar
botdetect-servlet-4.0.beta3.5.jar
When the checkInputMatches() method executes to validate the captcha, this structure is executed on the backend with respect to the mentioned jars:
Step 1: ( botdetect-jsf20-4.0.beta3.5.jar )
com.captcha.botdetect.web.jsf.JsfCaptcha ->
public boolean validate(String paramString)
Step 2: ( botdetect-servlet-4.0.beta3.5.jar )
com.captcha.botdetect.web.servlet.Captcha ->
public boolean validate(String paramString)
Step 3: ( botdetect-jsf20-4.0.beta3.5.jar )
com.captcha.botdetect.internal.core.CaptchaBase ->
public boolean validate(String paramString1, String paramString2, ValidationAttemptOrigin paramValidationAttemptOrigin, boolean paramBoolean)
Step 4: ( botdetect-jsf20-4.0.beta3.5.jar )
com.captcha.botdetect.internal.core.captchacode.CodeCollection ->
public final boolean a(String paramString1, String paramString2, Integer paramInteger, boolean paramBoolean, ValidationAttemptOrigin paramValidationAttemptOrigin)
Step 5: Observe $3 ( third argument from Step 4 ) to show the actual code.
I came to this conclusion by stepping through the decompiled classes in JD-GUI.
With that in mind, here is how you can print that value when that code is executed, using javassist (I am using javassist-3.18.1-GA.jar, on Tomcat):
import java.io.Serializable;
import javax.annotation.PostConstruct;
import javax.faces.bean.ApplicationScoped;
import javax.faces.bean.ManagedBean;
import javassist.ClassClassPath;
import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import javassist.LoaderClassPath;
// Logger and LogManager come from the logging library in use (e.g. log4j).
@ManagedBean(eager = true)
@ApplicationScoped
public class CustomBean implements Serializable {
    private static final long serialVersionUID = 3121378662264771535L;
    private static Logger LOG = LogManager.getLogger(CustomBean.class.getName());
    @PostConstruct
    public void initialize() {
        try {
            final ClassPool classPool = new ClassPool(ClassPool.getDefault());
            classPool.insertClassPath(new ClassClassPath(this.getClass()));
            classPool.insertClassPath(new LoaderClassPath(Thread.currentThread().getContextClassLoader()));
            final CtClass codeCollectionClass = classPool
                    .get("com.captcha.botdetect.internal.core.captchacode.CodeCollection");
            // The class can only be rewritten before it has been loaded (frozen).
            if (!codeCollectionClass.isFrozen()) {
                final CtMethod aMethod = codeCollectionClass.getDeclaredMethod("a",
                        new CtClass[] { classPool.get("java.lang.String"), classPool.get("java.lang.String"),
                                classPool.get("java.lang.Integer"), classPool.get("boolean"),
                                classPool.get("com.captcha.botdetect.internal.core."
                                        + "captchacode.validation.ValidationAttemptOrigin") });
                // $1 and $3 refer to the first and third parameters of the patched method.
                aMethod.insertAfter("System.out.println(\"Botdetect-DEBUG: entered-captcha: \" + "
                        + "$1 + \"; expected-captcha: \" + $3 + \";\" );");
                codeCollectionClass.toClass();
            } else {
                LOG.error("Frozen class : Unable to re-compile BotDetect for debugging.");
            }
        } catch (final Exception e) {
            LOG.error("unable to modify the bot detect java code", e);
        }
    }
}
When the entered code does not match the challenge, you get a message like this in your logs:
Botdetect-DEBUG: entered-captcha: U33aZ; expected-captcha: U49a6;

How to read result of HYPERLINK() function in POI

Apache POI can evaluate and return the results of functions in formulas. However, for the special function HYPERLINK(), it only returns the "display value", not the actual calculated hyperlink value.
I have an Excel file containing complex computed hyperlinks that combine results from a number of different fields in the workbook, so it would be nice to be able to read the resulting URL. With default formula evaluation, however, I only get the "display value", not the actual URL.
Is there a way I can compute the formula in a way so I can read the actual URL?
Found a way, but I would probably call it "ugly workaround":
If you try to replace the "Hyperlink" function implementation in Apache POI with WorkbookEvaluator.registerFunction("HYPERLINK", func), you get an error saying that the built-in function cannot be overwritten.
After digging into POI a bit more, I found that I can access the list of built-in functions by putting a class into the "org.apache.poi.ss.formula.eval" package:
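package org.apache.poi.ss.formula.eval;
import org.apache.poi.ss.formula.functions.Function;
// Lives in POI's own package so that it can reach the FunctionEval.functions array.
public class BuiltinFunctionsOverloader {
    public static void replaceBuiltinFunction(int index, Function function) {
        FunctionEval.functions[index] = function;
    }
}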
Then I can use this to override a function, e.g. Hyperlink has index 359:
BuiltinFunctionsOverloader.replaceBuiltinFunction(359, func);
With a function implementation as follows, I now get the URL-value instead of the display-value:
Function func = new Function2Arg() {
    @Override
    public final ValueEval evaluate(ValueEval[] largs, int srcRowIndex, int srcColumnIndex) {
        // HYPERLINK takes one or two arguments: the link location and an optional display value.
        switch (largs.length) {
            case 1:
                return evaluate(srcRowIndex, srcColumnIndex, largs[0]);
            case 2:
                return evaluate(srcRowIndex, srcColumnIndex, largs[0], largs[1]);
        }
        return ErrorEval.VALUE_INVALID;
    }
    public ValueEval evaluate(int srcRowIndex, int srcColumnIndex, ValueEval arg0) {
        return arg0;
    }
    @Override
    public ValueEval evaluate(int srcRowIndex, int srcColumnIndex, ValueEval arg0, ValueEval arg1) {
        // Return the link location (arg0) instead of the display value (arg1).
        return arg0;
    }
};
Ugly, but at least it does not require me to patch POI.
Does anybody know of a more "official" way of doing this?

Is this Object Casting pattern acceptable in SharePoint?

I'm creating a SharePoint application, and am trying some new things to create what amounts to an API for Data Access to maintain consistency and conventions.
I haven't seen this before, and that makes me think it might be bad :)
I've overloaded the constructor for class Post to only take an SPListItem as a parameter. I then have an embedded Generic List of Post that takes an SPListItemCollection in the method signature.
I loop through the items in a more efficient for statement, and this means if I ever need to add or modify how the Post object is cast, I can do it in the Class definition for a single source.
class Post
{
    public int ID { get; set; }
    public string Title { get; set; }
    public Post(SPListItem item)
    {
        ID = item.ID;
        Title = (string)item["Title"];
    }
    public static List<Post> Posts(SPListItemCollection _items)
    {
        var returnlist = new List<Post>();
        for (int i = 0; i < _items.Count; i++)
        {
            returnlist.Add(new Post(_items[i]));
        }
        return returnlist;
    }
}
This enables me to do the following:
public static List<Post> GetPostsByCommunity(string communityName)
{
    var targetList = CoreLists.SystemAccount.Posts(); // CAML omitted for brevity
    return Post.Posts(targetList.GetItems(query)); // Builds a Post from each item via the constructor
}
Is this a bad idea?
This approach might be suitable, but that FOR loop causes some concern. _items.Count will force the SPListItemCollection to retrieve ALL those items in the list from the database. With large lists, this could either a) cause a throttling exception, or b) use up a lot of resources. Why not use a FOREACH loop? With that, I think the SPListItems are retrieved and disposed one at a time.
If I were writing this, I would have a 'Posts' class as well as 'Post', and give it the constructor accepting the SPListItemCollection.
To be honest, though, the few times I've seen people try and wrap SharePoint SPListItems, it's always ended up seeming more effort than it's worth.
Also, if you're using SharePoint 2010, have you considered using SPMetal?

How to force the order of Installer Execution

I have been building a new .NET solution with Castle performing my DI.
It's now at the stage where I would like to control the order in which my installers run. I have built individual classes which implement IWindsorInstaller to handle my core types, e.g. IRepository, IMapper and IService to name a few.
I see that it's suggested I implement my own InstallerFactory (guessing I just override Select in this class).
Then use this new factory in my call to:
FromAssembly.InDirectory(new AssemblyFilter("bin location"));
My question: when overriding the Select method, what is the best way to force the order of my installers?
I know it's already solved, but I couldn't find any example of how to actually implement the InstallerFactory, so here's a solution if anyone is googling for it.
How to use:
[InstallerPriority(0)]
public class ImportantInstallerToRunFirst : IWindsorInstaller
{
    public void Install(IWindsorContainer container, Castle.MicroKernel.SubSystems.Configuration.IConfigurationStore store)
    {
        // do registrations
    }
}
Just add the InstallerPriority attribute with a priority to your "install-order-sensitive" classes. Installers are sorted by ascending priority, so lower values run first. Installers without the attribute default to priority 100.
How to implement:
using System;
using System.Collections.Generic;
using System.Linq;
using Castle.Windsor.Installer;
public class WindsorBootstrap : InstallerFactory
{
    // Windsor passes in the discovered installer types; return them in the order they should run.
    public override IEnumerable<Type> Select(IEnumerable<Type> installerTypes)
    {
        var retval = installerTypes.OrderBy(x => this.GetPriority(x));
        return retval;
    }
    private int GetPriority(Type type)
    {
        var attribute = type.GetCustomAttributes(typeof(InstallerPriorityAttribute), false).FirstOrDefault() as InstallerPriorityAttribute;
        return attribute != null ? attribute.Priority : InstallerPriorityAttribute.DefaultPriority;
    }
}
[AttributeUsage(AttributeTargets.Class)]
public sealed class InstallerPriorityAttribute : Attribute
{
    public const int DefaultPriority = 100;
    public int Priority { get; private set; }
    public InstallerPriorityAttribute(int priority)
    {
        this.Priority = priority;
    }
}
When starting the application (Global.asax etc.):
container.Install(FromAssembly.This(new WindsorBootstrap()));
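To try the ordering rule in isolation, without Castle Windsor, here is a rough Java analogue of the same idea: read an optional priority annotation from each type, fall back to 100 when it is missing, and sort ascending with a stable sort. Everything in it (the InstallerPriority annotation, PrioritySketch and the three placeholder installer classes) is hypothetical and only mirrors the C# code above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical counterpart of the C# InstallerPriorityAttribute.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface InstallerPriority {
    int DEFAULT_PRIORITY = 100;
    int value();
}

final class PrioritySketch {
    // Placeholder installer types used only for illustration.
    @InstallerPriority(0) static final class CoreInstaller {}
    @InstallerPriority(50) static final class ServiceInstaller {}
    static final class MiscInstaller {} // no annotation, so it gets the default

    // Reads the annotation value, or the default when the type is not annotated.
    static int priorityOf(Class<?> type) {
        InstallerPriority priority = type.getAnnotation(InstallerPriority.class);
        return priority != null ? priority.value() : InstallerPriority.DEFAULT_PRIORITY;
    }

    // Returns a new list sorted by ascending priority; ties keep their input order.
    static List<Class<?>> select(List<Class<?>> installerTypes) {
        List<Class<?>> sorted = new ArrayList<>(installerTypes);
        sorted.sort(Comparator.comparingInt(PrioritySketch::priorityOf));
        return sorted;
    }
}
```

Passing MiscInstaller, ServiceInstaller and CoreInstaller in that order returns them as CoreInstaller, ServiceInstaller, MiscInstaller. Because the sort is stable, installers sharing a priority keep their discovery order.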
You can call your installers in the order they need to be instantiated in Global.asax.cs or e.g. in a Bootstrapper class, which is called from Global.asax.cs.
IWindsorContainer container = new WindsorContainer()
    .Install(
        new LoggerInstaller()         // No dependencies
        , new PersistenceInstaller()  // --""--
        , new RepositoriesInstaller() // Depends on Persistence
        , new ServicesInstaller()     // Depends on Repositories
        , new ControllersInstaller()  // Depends on Services
    );
They are instantiated in this order, and you can add a breakpoint afterwards and check the container for "Potentially misconfigured components".
If there are any, check their Status->details; if not, it's the correct order.
This solution is quick and easy. The documentation mentions using an InstallerFactory class for tighter control over your installers, so if you have a ton of installers the other solution may fit better. (Using code as convention should not require tons of installers?)
http://docs.castleproject.org/Windsor.Installers.ashx#codeInstallerFactorycode_class_4
In the end I had to use InstallerFactory and implement the ordering rules as suggested previously, by returning the IEnumerable<Type> in my specific order.

Locking to modify static value-type member. Is it necessary?

I have a CacheHelper class to facilitate interaction with the cache. I want to use a static int field to specify my cache timeout. The field is initially set to a const default value but I want to provide a way for the application to change the default timeout value.
Do you need to lock when modifying a static value type? Is the lock in the setter necessary? Are there any other problems you can see here? Sorry, I'm still pretty dumb when it comes to multithreading.
Thanks.
using System;
using System.Web;
using System.Web.Caching;
public static class CacheHelper
{
    // Must be initialized: lock (null) throws an ArgumentNullException.
    private static readonly object _SyncRoot = new object();
    private static int _TimeoutInMinutes = CacheDefaults.TimeoutInMinutes;
    public static int TimeoutInMinutes
    {
        get
        {
            return _TimeoutInMinutes;
        }
        set
        {
            lock (_SyncRoot)
            {
                if (_TimeoutInMinutes != value)
                {
                    _TimeoutInMinutes = value;
                }
            }
        }
    }
    public static void Insert(string key, Object data)
    {
        if (HttpContext.Current != null && data != null)
        {
            HttpContext.Current.Cache.Insert(key, data, null, Cache.NoAbsoluteExpiration, TimeSpan.FromMinutes(CacheHelper.TimeoutInMinutes));
        }
    }
}
You could use a volatile variable instead... but you need something, otherwise it's possible that a value written by one thread would never be seen by another.
Note that for "larger" types such as double or long you really should use a lock or the Interlocked class, as modifications to those values may not be atomic.
You don't need to lock here if the client of CacheHelper does something like
CacheHelper.TimeoutInMinutes = input.Value;
Since it doesn't rely on the previous value.
If your client does something like
CacheHelper.TimeoutInMinutes += input.Value;
Then you'll need to do some locking.
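The question is about C#, but the distinction between a plain overwrite and a read-modify-write is the same on the JVM, so here is a small Java sketch of it. AtomicInteger plays roughly the role that the Interlocked class plays in .NET; the class name TimeoutSketch and its methods are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical Java analogue of the timeout field; not the CacheHelper from the question.
final class TimeoutSketch {
    private final AtomicInteger timeoutInMinutes;

    TimeoutSketch(int initial) {
        timeoutInMinutes = new AtomicInteger(initial);
    }

    int get() {
        return timeoutInMinutes.get();
    }

    // Plain overwrite: does not depend on the previous value, so no lock is needed.
    void set(int value) {
        timeoutInMinutes.set(value);
    }

    // Read-modify-write (the "+=" case): must happen as one atomic step.
    int add(int delta) {
        return timeoutInMinutes.addAndGet(delta);
    }

    // Runs several threads that each add 1 repeatedly and returns the final value.
    static int concurrentAdds(int threads, int addsPerThread) {
        TimeoutSketch timeout = new TimeoutSketch(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < addsPerThread; j++) {
                    timeout.add(1);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return timeout.get();
    }
}
```

With four threads adding 1 a thousand times each, concurrentAdds(4, 1000) returns 4000, because every increment is applied atomically. Replacing add with an unsynchronized read followed by a write could lose updates.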

Resources