Solr Question about Loading Changes to Schema - python-3.x

I'm new to Solr and received the following error when adding a document through pysolr:
pysolr.SolrError: Solr responded with an error (HTTP 400): [Reason: ERROR: [doc=bc4aa768-6f35-4888-80e0-1578d9971b3c] Error adding field 'periodical_nlm'='2984692R' msg=For input string: "2984692R"]
I ended up finding out that the first periodical_nlm value added was 404536.0, so I assumed it was a type issue. In Python I then cast every periodical_nlm explicitly to string before adding 2984692R. However, the error persisted.
I Googled a bit and found that I should probably explicitly tell Solr that I want that field to be a string. I've not gotten very "hands on" with the schema yet, so I just had some questions:
(1) There appear to be two schema files: managed-schema in the directory for the core and managed-schema in the conf folder of the core. I'm assuming that the initialized schema which is in use is the one in the conf folder?
(2) Which do I update in order for things to proceed smoothly? I attempted adding the following to the schema file in the core directory but the error persisted:
<field name="periodical_nlm" type="string" indexed="true" stored="true" required="false" multiValued="false" />
Do I need to rerun some initialization process or add something to the conf file separately?
Thank you so much and please let me know if you need more info. I'm running on a Windows 10 Home x64 platform (not sure if that's important if there are any command-line things I need to run...).

As long as you reload the core after changing the managed-schema file under conf, you should be fine. Be aware that you should do this before indexing content - so you might need to clean out the index by deleting everything, then changing the schema and re-indexing your content. Changing the schema does not change content that has already been indexed.
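The clean-out and the reload described above can both be done over HTTP. Here is a minimal sketch using only the Python standard library. It assumes Solr is listening on localhost:8983 and that the core is called `mycore`; both are placeholders, and the helper names are made up for illustration. It relies on the CoreAdmin `RELOAD` action and a delete-by-query sent to the core's update handler.

```python
import json
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr"  # assumed base URL, adjust to your install
CORE = "mycore"                      # hypothetical core name


def reload_url(base, core):
    """URL for the CoreAdmin RELOAD action, which re-reads the core's config."""
    query = urllib.parse.urlencode({"action": "RELOAD", "core": core})
    return f"{base}/admin/cores?{query}"


def delete_all_request(base, core):
    """Build (but do not send) a request that deletes every document and commits."""
    body = json.dumps({"delete": {"query": "*:*"}}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/{core}/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def apply_schema_change(base=SOLR, core=CORE):
    """Empty the index, then reload the core so the edited schema is picked up."""
    urllib.request.urlopen(delete_all_request(base, core)).close()
    urllib.request.urlopen(reload_url(base, core)).close()
```

Calling `apply_schema_change()` after editing conf/managed-schema would leave an empty index with the new field definition in place, ready for re-indexing through pysolr.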
Otherwise your assumption is correct. Schemaless mode determines the field type from the format of the first value submitted, not from its type: type information is usually not included, since all values are submitted as strings, so Solr guesses the type by applying a hierarchy of pattern matching. That is useful for prototyping, but when you move to production you should always define the schema explicitly to avoid issues like the one you've seen here.
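To see why the first value ends up deciding the field type, here is a deliberately simplified sketch of that kind of pattern hierarchy. It is not Solr's actual implementation: the patterns, the type names, and the functions `guess_type` and `accepts` are illustrative assumptions only.

```python
import re

# Ordered from most to least specific; the first matching pattern wins.
# These patterns are illustrative assumptions, not Solr's real rules.
PATTERNS = [
    ("boolean", re.compile(r"^(true|false)$", re.IGNORECASE)),
    ("long", re.compile(r"^[+-]?\d+$")),
    ("double", re.compile(r"^[+-]?\d+\.\d+$")),
]


def guess_type(value):
    """Guess a field type from the text of a submitted value."""
    for name, pattern in PATTERNS:
        if pattern.match(value):
            return name
    return "string"  # fallback when nothing more specific matches


def accepts(field_type, value):
    """Check whether a later value fits the type guessed from the first one."""
    if field_type == "string":
        return True
    if field_type == "double":
        try:
            float(value)
            return True
        except ValueError:
            return False
    return guess_type(value) == field_type


first = guess_type("404536.0")       # the first value indexed in the question
print(first)                         # double
print(accepts(first, "2984692R"))    # False
```

Casting the value to `str` in Python does not help here, because the guess was already made from the text of the first value; only an explicit string field in the schema does.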

Related

I want to add the raw content which is stored in the segment folder (Nutch version 1.17)

While running this command below:
bin/nutch solrindex http://localhost:8983/solr/nutch/ testingnewline/crawldb -linkdb testingnewline/linkdb -dir testingnewline/segments/ -deleteGone -addBinaryContent
It is throwing below exception.
Error: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/nutch: ERROR: [doc=https://www.saintlukeskc.org/] Error adding field 'binaryContent'
May I know what changes I need to make to schema.xml? Please help me.
The Solr schema must contain the field "binaryContent"; see Nutch's default Solr schema.xml, which contains all the fields that may be added by Nutch or one of its plugins.

Logstash save/ modify configuration in environment

In my system, I use logstash, filebeat and elasticsearch
Filebeat reads the logs, required fields in the logs are filtered with logstash and saved in elasticsearch.
I have a customer requirement to switch on/off saving some fields in the log by a single config change by the customer.
My planned approach is to keep the switch variable as an environment variable in "/etc/default/logstash" location and let the customer change the variables with a file operation.
But I have found out that the Logstash config is not reloaded when we change that file, even if we set "config.reload.automatic: true". So I cannot continue with my planned approach.
Letting the customer edit the Logstash ".conf" files is not a good approach either, because the code is so complex.
Please advise on this issue.
Thanks,
I have found that it is not possible to reload the value of a variable in the environment without restarting logstash. So I have used a file read solution. The config block is as below.
ruby {
code => "event.set( 'variable1',IO.readlines('/etc/logstash/input.txt')[0])"
}
This has fixed my problem, but I would like to know: is there a performance impact in executing a file operation for each event?

"$ ./propellor --list-fields" yields "propellor: Prelude.read: no parse"

I am trying to specify a private field using Haskell's Propellor deployment library.
As context: the field in question is a file whose content I want to encrypt and have propellor place on the destination server during deployment. However, I haven't gotten nearly that far; before even attempting to set the field, I have run into an error while attempting to simply view propellor's current private fields.
Specifically, when I run the command to view fields, $ ./propellor --list-fields, it asks for my gpg key, prints some gpg key information, and then the following:
Currently set data:
Field Context Used by
----- ------- -------
propellor: Prelude.read: no parse
There should be some fields present which were set previously, but somehow they are not displayed here and instead I get only the propellor: Prelude.read: no parse error message. I have not yet attempted to add my own field.
It seems that propellor is having an issue trying to parse something, but I do not know what that could be. I realize this is not a lot to go on but am not sure what else to do. Has anyone run into a similar error with Haskell's propellor before or know what the issue could be?
Your self-answer is correct; here I will just look at the issue in a different light.
The error you got points to the read function in Prelude. read is an example of a partial function: its type...
read :: Read a => String -> a
... says that it can convert Strings into a value of any type a with a Read instance; however, we know that this does not work for all Strings, as the parsing might fail. To put it more dramatically, the type of read is a lie.
It is generally a good idea to avoid partial functions, not only because more often than not they are bugs waiting to happen (e.g. you assume the parse will never fail due to some precondition in your business logic, and then the precondition changes), but also because they tend to give extremely uninformative error messages (as you just noticed). In the case of read, for instance, a nicer alternative is readMaybe, which returns Nothing if the parsing fails. That gives an opportunity to react to the failure. In different situations you might, for instance, find it appropriate to ask the user to retry, supply a default value or, if there is no other recourse, terminate the program with an error message that explains what went wrong in terms of what you are trying to do.
Sorry this question was so vague, but there was very little to go on from the error message. The issue is now resolved and here is an explanation in case it is helpful to anyone who comes across it while facing a similar error.
The code contained an instance of a configuration data type defined not in a module, but in a text file being read in via the Read class. In short, the issue was that I had altered the data type without comprehensively updating the text-defined configuration instance to accommodate the type change.
In the long-form version of the explanation the issue is sneakier, involving merging the data type change over a change to the text-configuration which was not recognized as in conflict due to no line conflicts.
But essentially the error was failure to read in a data type instance defined in text-form.
I have plans to define the configuration data instance in a module rather than reading it in from text, which should be caught by the compiler and give a more meaningful error message should a similar error arise.

Sphinx4 figuring out correct models

I am trying to use the Sphinx4 library for speech recognition, but I cannot seem to figure out the correct combination of acoustic model-dictionary-language model. I have tried out various combinations and I get a different error every time.
I am trying to follow the tutorial on http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4. I do not have a config.xml as I would if I was using ConfigurationManager instead of Configuration, because there is no perceivable way of passing the location of the config file to the Configuration itself (ConfigMgr takes it as an argument to the constructor); and that might be my problem right there. I just do not know how to point to one, and since the tutorial says "It is possible to configure low-level components of the application through XML file although you should do that ONLY IF you understand what is going on.", I assume having a config.xml file is not compulsory.
Combining the latest dictionary (7b - obtained from Sourceforge) with the latest acoustic model (cmusphinx-en-us-5.2.tar.gz - from SF again) and the language model (cmusphinx-5.0-en-us.lm.gz - from SF again) results in NullPointerException in startRecognition. The issue is similar to the problem here: sphinx-4 NullPointerException at startRecognition, but the link given in the answer no longer works. I obtained 0.7a from SF (since that is the dict the link seems to point at), but with that one I get an error even earlier in the execution: Error loading word: ;;;. I also tried downloading the latest models and dict from the GitHub repo, but that results in java.lang.IndexOutOfBoundsException: Index: 16128, Size: 16128.
Any help is much appreciated!
You need to use the latest code from GitHub
http://github.com/cmusphinx/sphinx4
as described by the tutorial
http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4
The correct models (en-us) are already included, so you should not replace anything. You should not configure any XML files either; use the samples as provided in the sources.

gora-mongodb.mapping.XML properties File

I'm new to Nutch (2.2.1) and trying to run it on Cygwin/Windows 7 with the latest version of Gora (0.5) so I can persist data to a MongoDB (2.6) datastore. I changed the Nutch-Site.XML File to include my Mongo property but I'm a little confused about the gora-mongodb.mapping.XML properties file here that's needed. Just wondering do I need to:
1) create a Java class within the Nutch/Gora project which I specify in class-name property in the gora-mongodb.mapping File or will Gora create this for me? The documentation doesn't appear to be very clear.
2) I created a sample File in my apache-nutch-2.2.1\runtime\local\conf folder and added the name of my MongoDB collection. When I run Nutch I get the following error:
$ ./nutch crawl urls -dir testCrawl -depth 3 -topN 5
cygpath: can't convert empty path
Exception in thread "main" org.apache.gora.util.GoraException: java.lang.IllegalStateException: A collection is not specified
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:221)
at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
Caused by: java.lang.IllegalStateException: A collection is not specified
at org.apache.gora.mongodb.store.MongoMappingBuilder.build(MongoMappingBuilder.java:77)
at org.apache.gora.mongodb.store.MongoStore.initialize(MongoStore.java:168)
at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
... 8 more
Any help or clarification around this file would be appreciated.
You need 2 files in nutch/conf:
gora.properties: where you declare you are going to use mongodb backend.
gora-mongodb-mapping.xml (notice the dash, not the dot you wrote): where you create a mapping between names in Gora entities and the fields in the datastore.
I really don't think the version you are using is prepared to work with Gora 0.5, but give it a shot. Copy gora-mongodb-mapping.xml from Nutch-2.3-SNAPSHOT to nutch/conf/
If it does not work, try using Nutch-2.3-SNAPSHOT instead of 2.2.1.

Resources