Integrate solr with NLP

I am working with Solr 4.8.1 and want to integrate it with NLP to improve search relevancy. I have not been able to
find a good tutorial that explains the configuration, its output, and its benefit for Solr.
I tried this configuration:
<fieldType name="text_opennlp" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory"
               sentenceModel="opennlp/en-sent.bin"
               tokenizerModel="opennlp/en-token.bin"
    />
  </analyzer>
</fieldType>
But it reports the following error:
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType "text_opennlp": Plugin init failure for [schema.xml] analyzer/tokenizer: Error loading class 'solr.OpenNLPTokenizerFactory

Did you apply the https://issues.apache.org/jira/browse/LUCENE-2899 patch mentioned in https://wiki.apache.org/solr/OpenNLP ?
It gives you the ability to keep only nouns and verbs.
Tip: check the analysis output for the values in the payload, as they differ from the Treebank example.

Related

How to correctly design application architectures on nestjs?

I have just started studying my first serious framework. I have no problems with the code itself, but I can't find examples of how to structure the application architecture.
For example, I have worked out the related tables in the database, but how do I structure the code correctly?
A hypothetical example, with these directories:
<src>
<categories>
<dto>
create-category.dto.ts
create-subcategory.dto.ts
<models>
category.model.ts
subcategory.model.ts
<services>
category.service.ts
subcategory.service.ts
category.controller.ts
category.module.ts
OR
<src>
<categories>
<dto>
create-category.dto.ts
category.model.ts
category.service.ts
category.controller.ts
category.module.ts
<subcategory>
<dto>
create-subcategory.dto.ts
subcategory.model.ts
subcategory.service.ts
If possible, please send me a link where I can read more about this.

It's a long story. Basically (except for plain CRUD), an entity does not equal a module. You can lay out the directories however you prefer, but you have to focus on defining modules and their boundaries. https://twitter.com/_MaciejSikorski/status/1505613059221594113

Solr Question about Loading Changes to Schema

I'm new to Solr and received the following error when adding a document through pysolr:
pysolr.SolrError: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=bc4aa768-6f35-4888-80e0-1578d9971b3c] Error adding field 'periodical_nlm'='2984692R' msg=For input string: "2984692R"]
I ended up finding out that the first periodical_nlm value added was 404536.0, so I assumed it was a type issue. In Python I then cast every periodical_nlm explicitly to string before adding 2984692R. However, the error persisted.
I Googled a bit and found that I should probably explicitly tell Solr that I want that field to be a string. I've not gotten very "hands on" with the schema yet, so I just had some questions:
(1) There appear to be two schema files: managed-schema in the directory for the core and managed-schema in the conf folder of the core. I'm assuming that the initialized schema which is in use is the one in the conf folder?
(2) Which do I update in order for things to proceed smoothly? I attempted adding the following to the schema file in the core directory but the error persisted:
<field name="periodical_nlm" type="string" indexed="true" stored="true" required="false" multiValued="false" />
Do I need to rerun some initialization process or add something to the conf file separately?
Thank you so much and please let me know if you need more info. I'm running on a Windows 10 Home x64 platform (not sure if that's important if there are any command-line things I need to run...).

As long as you reload the core after changing the managed-schema file under conf, you should be fine. Be aware that you should do this before indexing content - so you might need to clean out the index by deleting everything, then changing the schema and re-indexing your content. Changing the schema does not change content that has already been indexed.
Otherwise your assumption is correct. In schemaless mode the field type is determined by the format of the first value submitted, not by its type: type information usually isn't included at all, since every value is submitted as a string, so Solr guesses the type by applying a hierarchy of pattern matches. That is useful for prototyping, but when you move to production you should always define the schema explicitly to avoid issues like the one you've seen here.
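As a rough sketch of the clean-out, redefine, and reload sequence described above, the requests could be built like this. It only constructs the URLs and JSON bodies, so it does not need a running Solr. The base URL and the core name `mycore` are assumptions, and `replace-field` assumes the field already exists because schemaless mode created it (the Schema API command for a brand-new field is `add-field`).

```python
import json
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr"  # assumption: default local Solr address
CORE = "mycore"                          # hypothetical core name


def delete_all_request(core):
    """Return (url, body) for an update that deletes every document and commits."""
    url = f"{SOLR_URL}/{core}/update?" + urlencode({"commit": "true"})
    return url, json.dumps({"delete": {"query": "*:*"}})


def replace_field_request(core, name, field_type="string"):
    """Return (url, body) for a Schema API call that redefines an existing field."""
    body = {
        "replace-field": {
            "name": name,
            "type": field_type,
            "indexed": True,
            "stored": True,
            "required": False,
            "multiValued": False,
        }
    }
    return f"{SOLR_URL}/{core}/schema", json.dumps(body)


def reload_core_url(core):
    """Return the Core Admin URL that reloads the core."""
    return f"{SOLR_URL}/admin/cores?" + urlencode({"action": "RELOAD", "core": core})


if __name__ == "__main__":
    # Order matters: empty the index, fix the field type, reload, then re-index.
    print("POST", *delete_all_request(CORE))
    print("POST", *replace_field_request(CORE, "periodical_nlm"))
    print("GET ", reload_core_url(CORE))
```

The two POST bodies would be sent with `Content-Type: application/json`; after the reload, the documents can be added again through pysolr as before.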

groovy RESTClient is not working on Windows 7 machine

I am trying to use the lines below to get an API response, but it's not working for me.
Please help me.
Groovy ver = 2.4.15
OS = windows 7
@Grab(group='org.codehaus.groovy.modules.http-builder',module='http-builder',version='0.7.1')
import groovyx.net.http.HTTPBuilder

As mentioned in the comments, I think the best place to start would be to run your script with the following flag turned on:
~> groovy -Dgroovy.grape.report.downloads=true <yourscript>
that should give you some logging indicating what the grape resolution is doing and hopefully where it tried to download the file from when it failed.
For an overview of the grape resolution mechanics, you can refer to the groovy documentation on grapes.
My guess is that groovy is trying multiple resolvers (e.g. maven central, jcenter, etc.) and one of them is failing early even though a later one has the artifact. In a situation like this the resolution engine should keep trying until it finds a working artifact, but I have seen things fail this way before.
To modify the resolution order and behavior, you should look at the file:
<your user home dir>/.groovy/grapeConfig.xml
where, if the file does not exist, groovy uses the following default data for the file:
<ivysettings>
  <settings defaultResolver="downloadGrapes"/>
  <resolvers>
    <chain name="downloadGrapes" returnFirst="true">
      <filesystem name="cachedGrapes">
        <ivy pattern="${user.home}/.groovy/grapes/[organisation]/[module]/ivy-[revision].xml"/>
        <artifact pattern="${user.home}/.groovy/grapes/[organisation]/[module]/[type]s/[artifact]-[revision](-[classifier]).[ext]"/>
      </filesystem>
      <ibiblio name="localm2" root="file:${user.home}/.m2/repository/" checkmodified="true" changingPattern=".*" changingMatcher="regexp" m2compatible="true"/>
      <!-- todo add 'endorsed groovy extensions' resolver here -->
      <ibiblio name="jcenter" root="https://jcenter.bintray.com/" m2compatible="true"/>
      <ibiblio name="ibiblio" m2compatible="true"/>
    </chain>
  </resolvers>
</ivysettings>
(from the groovy github repo)
Two things to note here:
The returnFirst attribute. The resolution engine will try the resolvers one by one and return the first hit for this specific artifact. If my hunch is correct, this is not working correctly and an early resolver is failing and not giving a later resolver a chance to resolve the artifact.
The list of resolvers is ordered so changing this order will affect the result.
So, long story short: turn on debugging and see if that gives anything.
Then either modify or create the grapeConfig.xml file and either:
change the order of the ibiblio elements to change the resolution order
add another maven resolver (i.e. add another ibiblio node) for a target you have verified has the artifact (and add it first in the chain to make sure one of the others does not fail first).
or play with the returnFirst flag to see if setting it to false resolves your issue
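If you would rather script the second option than edit the file by hand, something along these lines could work. It is only a sketch: it is written in Python rather than Groovy, the sample config is a trimmed copy of the default above, and the resolver name `central` and its URL are my assumptions, so point it at a repository you have verified has the artifact.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the default grapeConfig.xml shown above (patterns omitted).
SAMPLE_CONFIG = """<ivysettings>
  <settings defaultResolver="downloadGrapes"/>
  <resolvers>
    <chain name="downloadGrapes" returnFirst="true">
      <filesystem name="cachedGrapes"/>
      <ibiblio name="localm2" root="file:${user.home}/.m2/repository/" m2compatible="true"/>
      <ibiblio name="jcenter" root="https://jcenter.bintray.com/" m2compatible="true"/>
      <ibiblio name="ibiblio" m2compatible="true"/>
    </chain>
  </resolvers>
</ivysettings>"""


def prepend_resolver(config_xml, name, root_url):
    """Return the config with a new ibiblio resolver placed first in the chain."""
    settings = ET.fromstring(config_xml)
    chain = settings.find("./resolvers/chain")
    if chain is None:
        raise ValueError("no <chain> element found under <resolvers>")
    resolver = ET.Element(
        "ibiblio", {"name": name, "root": root_url, "m2compatible": "true"}
    )
    chain.insert(0, resolver)  # index 0: tried before every other resolver
    return ET.tostring(settings, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical resolver name and URL; replace with a verified repository.
    print(prepend_resolver(SAMPLE_CONFIG, "central", "https://repo1.maven.org/maven2/"))
```

The output would be saved as grapeConfig.xml under the .groovy directory in your user home. Note that ElementTree drops XML comments when parsing, so the todo comment from the default file would not survive a round trip.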

Xamarin.iOS versioning during build

I've been trying to get an automatic versioning system going for builds (mainly due to external crash analytics picking up each build as the same until I change the version manually). The format is simple, I take the CFBundleShortVersionString from the Info.plist, and append the current date and time (in yyyyMMddmmss format) as subversion.
The task I've put together for this:
<Project>
  <Target Name="BeforeBuild">
    <XmlPeek XmlInputPath="$(ProjectDir)Info.plist" Query="//dict/key[. = 'CFBundleShortVersionString']/following-sibling::string[1]">
      <Output TaskParameter="Result" ItemName="VersionNumber" />
    </XmlPeek>
    <PropertyGroup>
      <BuildNumber>$([System.DateTime]::Now.ToString(yyyyMMddmmss))</BuildNumber>
    </PropertyGroup>
    <XmlPoke XmlInputPath="$(ProjectDir)Info.plist" Query="//dict/key[. = 'CFBundleVersion']/following-sibling::string[1]" Value="$(VersionNumber).$(BuildNumber)" />
  </Target>
</Project>
However it fails with the following error:
Target BeforeBuild:
[...]/[...].csproj(1069,5): error MSB3733: Input file "[...]/Info.plist" cannot be opened. For security reasons DTD is prohibited in this XML document. To enable DTD processing set the DtdProcessing property on XmlReaderSettings to Parse and pass the settings into XmlReader.Create method.
Done building target "BeforeBuild" in project "[...].csproj" -- FAILED.
What am I doing wrong? There's not much information about this error, at least not much that I could find that would help fix it.
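For reference, the version stamping described above can be sketched outside MSBuild with a plist-aware parser, which accepts the DOCTYPE line that Info.plist files normally carry. This is not the XmlPeek/XmlPoke approach from the task and not a confirmed fix, just an illustration in Python; the function name is hypothetical, and the timestamp format deliberately mirrors the `yyyyMMddmmss` pattern from the question, which has no hour field.

```python
import plistlib
from datetime import datetime


def stamp_build_version(plist_bytes, now=None):
    """Set CFBundleVersion to '<CFBundleShortVersionString>.<timestamp>'."""
    info = plistlib.loads(plist_bytes)
    now = now or datetime.now()
    # %M%S = minutes and seconds, matching 'mmss' in the question (no hours).
    build_number = now.strftime("%Y%m%d%M%S")
    info["CFBundleVersion"] = f"{info['CFBundleShortVersionString']}.{build_number}"
    return plistlib.dumps(info)


if __name__ == "__main__":
    # Minimal stand-in for a real Info.plist.
    sample = plistlib.dumps(
        {"CFBundleShortVersionString": "1.2", "CFBundleVersion": "1"}
    )
    print(stamp_build_version(sample).decode("utf-8"))
```

A script like this could be invoked from the build with an Exec task, reading Info.plist as bytes and writing the result back.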

Disable StyleCop Rule with VS2012 Express

I am using Visual Studio 2012 Express and therefore don't have access to the integration available for the regular version of VS. I have used MSBuild to integrate StyleCop, and the rules show up as warnings. I want to be able to disable rules. The Disabling StyleCop rules post shows that this is possible, but I can't make sense of the answer, which suggests editing the Settings.StyleCop file. I don't understand what needs to be added to this file to disable a rule.
If I wanted to disable rule SA1649 for instance how would I update the following file?
<StyleCopSettings Version="4.3">
  <GlobalSettings>
    <CollectionProperty Name="DeprecatedWords">
      <Value>preprocessor,pre-processor</Value>
      <Value>shortlived,short-lived</Value>
    </CollectionProperty>
  </GlobalSettings>
  <Parsers>
    <Parser ParserId="StyleCop.CSharp.CsParser">
      <ParserSettings>
        <CollectionProperty Name="GeneratedFileFilters">
          <Value>\.g\.cs$</Value>
          <Value>\.generated\.cs$</Value>
          <Value>\.g\.i\.cs$</Value>
        </CollectionProperty>
      </ParserSettings>
    </Parser>
  </Parsers>
  <Analyzers>
    <Analyzer AnalyzerId="StyleCop.CSharp.NamingRules">
      <AnalyzerSettings>
        <CollectionProperty Name="Hungarian">
          <Value>as</Value>
          <Value>do</Value>
          <Value>id</Value>
          <Value>if</Value>
          <Value>in</Value>
          <Value>is</Value>
          <Value>my</Value>
          <Value>no</Value>
          <Value>on</Value>
          <Value>to</Value>
          <Value>ui</Value>
        </CollectionProperty>
      </AnalyzerSettings>
    </Analyzer>
  </Analyzers>
</StyleCopSettings>
Note: I am using version 4.7 even though the default settings file shows 4.3

I found documentation on how to edit a StyleCop rule in the XML settings file.
The XML snippet is below.
<StyleCopSettings Version="4.3">
  <Analyzers>
    <Analyzer AnalyzerId="Microsoft.StyleCop.CSharp.LayoutRules">
      <Rules>
        <Rule Name="StatementMustNotBeOnSingleLine">
          <RuleSettings>
            <BooleanProperty Name="Enabled">False</BooleanProperty>
          </RuleSettings>
        </Rule>
        <Rule Name="ElementMustNotBeOnSingleLine">
          <RuleSettings>
            <BooleanProperty Name="Enabled">False</BooleanProperty>
          </RuleSettings>
        </Rule>
      </Rules>
      <AnalyzerSettings />
    </Analyzer>
  </Analyzers>
</StyleCopSettings>
In addition, I found out that you can drag the Settings.StyleCop file onto StyleCopSettingsEditor.exe, which presents a GUI for enabling and disabling the rules.
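If you need to do this for several rules, the same edit can be scripted. Below is a minimal Python sketch that adds the `Enabled = False` entry shown in the snippet above to a settings file; the helper names are my own. The analyzer ID and rule name in the example come from that snippet. For SA1649 you would have to look up the matching rule name and analyzer (the settings editor shows them), and the analyzer ID should follow the form your installed version uses, since the file in the question has IDs without the `Microsoft.` prefix.

```python
import xml.etree.ElementTree as ET


def get_or_create(parent, tag, **attrs):
    """Return the first child with this tag and attributes, creating it if absent."""
    for child in parent.findall(tag):
        if all(child.get(key) == value for key, value in attrs.items()):
            return child
    return ET.SubElement(parent, tag, attrs)


def disable_rule(settings_xml, analyzer_id, rule_name):
    """Return the settings XML with the given rule's Enabled property set to False."""
    root = ET.fromstring(settings_xml)
    analyzers = get_or_create(root, "Analyzers")
    analyzer = get_or_create(analyzers, "Analyzer", AnalyzerId=analyzer_id)
    rules = get_or_create(analyzer, "Rules")
    rule = get_or_create(rules, "Rule", Name=rule_name)
    rule_settings = get_or_create(rule, "RuleSettings")
    enabled = get_or_create(rule_settings, "BooleanProperty", Name="Enabled")
    enabled.text = "False"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    empty_settings = '<StyleCopSettings Version="4.3"></StyleCopSettings>'
    print(
        disable_rule(
            empty_settings,
            "Microsoft.StyleCop.CSharp.LayoutRules",
            "StatementMustNotBeOnSingleLine",
        )
    )
```

Running it twice on the same input does not duplicate the rule, because existing elements are reused.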

Paste the code below into Notepad, save it as Settings.StyleCop in the solution folder, and build. Note that this turns every rule off by default rather than disabling a single one.
<StyleCopSettings Version="105">
  <GlobalSettings>
    <BooleanProperty Name="RulesEnabledByDefault">False</BooleanProperty>
  </GlobalSettings>
</StyleCopSettings>
