Solr - modeling multiple values on a 1:n connection - search

I am trying to model my DB using this example from the Solr wiki.
I have a table called item and a table called features with id, featureName, description.
Here is the updated XML (I added featureName):
<dataConfig>
<dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/temp/example/ex" user="sa" />
<document>
<entity name="item" query="select * from item">
<entity name="feature" query="select description, featureName as features from feature where item_id='${item.ID}'"/>
</entity>
</document>
</dataConfig>
Now I get two lists in the XML element:
<doc>
<arr name="featureName">
<str>number of miles in every direction the universal cataclysm was gathering</str>
<str>All around the Restaurant people and things relaxed and chatted. The</str>
<str>- Do we have... - he put up a hand to hold back the cheers, - Do we</str>
</arr>
<arr name="description">
<str>to a stupefying climax. Glancing at his watch, Max returned to the stage</str>
<str>air was filled with talk of this and that, and with the mingled scents of</str>
<str>have a party here from the Zansellquasure Flamarion Bridge Club from</str>
</arr>
</doc>
But I would like to see the lists together (e.g. using XML attributes) so that I don't have to join the values myself.
Is that possible?

I wanted to suggest the ScriptTransformer, which gives you the flexibility to alter the data as needed, but it will not work in your case since it operates at the row level.
You can always define an aggregation function for string concatenation in SQL (example), but you will potentially have performance issues.
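For example (a sketch only, using MySQL-style GROUP_CONCAT; the function name and separator syntax vary by database), the child entity could return a single pre-joined value:
<entity name="feature"
        query="select group_concat(concat(featureName, ': ', description) separator '; ') as features
               from feature where item_id='${item.ID}'"/>
That gives you one features value per item instead of two parallel multi-valued fields.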
If you were using an HTTP/XML data source, the solution would have been the flatten attribute.
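For reference, with an XPathEntityProcessor the flatten attribute on a field collapses the text of an element and all of its children into one value. A sketch (the URL and XPaths are purely illustrative):
<entity name="item"
        processor="XPathEntityProcessor"
        url="items.xml"
        forEach="/items/item">
  <!-- flatten="true" concatenates the text of this node and all child nodes into a single value -->
  <field column="features" xpath="/items/item/features" flatten="true"/>
</entity>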
Nevertheless, the search functionality will work as expected even if you end up with multi-valued fields. The downside is on the client side, where you will have to concatenate them before the presentation layer, which is not really a problem if you use some sort of pagination.

Related

BDC model/search connector and multi-value field with refinement

BDC model:
My BDC model's entity has a property named Color.
The TypeName is specified as System.String[].
<TypeDescriptor Name="Color" TypeName="System.String[]">
<Properties>
<Property Name="RequiredInForms" Type="System.Boolean">false</Property>
</Properties>
</TypeDescriptor>
Database:
In my database (my BDC content source) I added column values like this one:
;#Blue;#Green;#Yellow;#
Search Schema
I created a new managed property and enabled multiple values (and also refinable - active, queryable, retrievable, safe).
Search Results
Filtering on a specific color via search works.
Example: RsExpAdvWorksProductColor:"blue"
Search Refinement
However I cannot refine on colors.
Adding a refiner on my managed property shows up like this:
Color
;#Blue;#Green;#Yellow;#
;#Green;#Yellow;#
;#Red;#Green;#Yellow;#Blue;#Black;#Cyan;#
Obviously the single values are not treated as such - the whole "string" of "special-delimiter"-separated values is being shown as a single refinement criterion.
Any hints?
Update 2015-03-20: I took a closer look at the built-in multi-choice columns. In search results they are returned as "Value1;#Value2;#" and so on. Basically there is a trailing separator ("Red;#Blue;#") but no leading one (";#Red;#Blue;#"). Much to my regret, that didn't solve my problem.
Update 2015-03-20: Surprise, surprise. It is in fact "working as designed" (like so many things in SharePoint :P). What I am looking for has to be dealt with separately. It behaves exactly the same with built-in multi-choice fields, so there is nothing wrong with my BDC/Search integration.
Regarding the refiner, have a look at the following links...
http://www.eliostruyf.com/part-6-create-multi-value-search-refiner-control/
https://hyankov.wordpress.com/2014/12/15/sharepoint-2013-refiner-multi-value-contains-instead-of-an-equals/

Solr - Is there a way to speed up my import

I have a relational database model.
These are the basics of my data-config.xml:
<entity name="MyMainEntity" pk="pID" query="select ... from [dbo].[TableA] inner join TableB on ...">
<entity name="Entity1" pk="Id1" query="SELECT [Text] Tag from [Table2] where ResourceId = '${MyMainEntity.pId}'"></entity>
<entity name="Entity1" pk="Id2" query="SELECT [Text] Tag from [Table2] where ResourceId2 = '${MyMainEntity.pId}'"></entity>
<entity name="LibraryItem" pk="ResourceId"
query="select SKU
FROM [TableB]
INNER JOIN ...
ON ...
INNER JOIN ...
ON ...
WHERE ... AND ...'">
</entity>
</entity>
Now, this takes a lot of time.
There are 10,000 rows in the first query, and then the inner entities are fetched for each of them (around 10 rows each).
If I use a DB profiler I see the three inner-entity queries running over and over (3 select statements, then another 3 select statements, and so on).
This is really not efficient.
And the import can run for over 40 hours.
Now, what are my options to make it run faster?
Obviously there is the option of flattening the tables into one big table, but that would create a lot of other side effects. I would really like to avoid that extra effort and run Solr against my production relational tables.
So far it works great out of the box, and I am asking here whether there is a configuration tweak.
If I do flatten the rows, does schema.xml need to change too, or will the fields that are currently multi-valued stay multi-valued?
Thanks.
Without changing the schema of the DB, the first thing to try is caching. If the inner entities cache well, the gains will be substantial.
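For example, switching an inner entity to CachedSqlEntityProcessor makes DIH run the child query once and then look rows up from an in-memory cache instead of issuing a separate query per parent row. A sketch based on your config (the cache key column must be selected by the child query; adjust names as needed):
<entity name="Entity1"
        processor="CachedSqlEntityProcessor"
        query="SELECT [Text] Tag, ResourceId FROM [Table2]"
        cacheKey="ResourceId"
        cacheLookup="MyMainEntity.pID"/>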
The wiki may not be up to date, so you should check the JIRA issues, namely SOLR-2382, and maybe have a look at SOLR-2948 too.
A second path could be multithreading DIH, but it is trickier. At one point this was an option, but it was later removed because it was buggy; I think there is a JIRA issue about reimplementing it, so try looking it up. Either way, I recommend trying caching first.

Solr - index JSON query string from database?

I would like to know if it is possible to index data that contains a JSON string, decode it, and index each JSON value as a separate field.
I am using the DIH to connect to a MySQL database and am able to index the individual columns.
The result currently looks like the following:
<response name="response" numFound="1" start="0" maxScore="2.7143538">
...
<result name="response" numFound="1" start="0" maxScore="2.7143538">
<doc>
<float name="score">2.7143538</float>
<str name="id">82</str>
<str name="name">jorge</str>
<str name="otherinfo">{"day":15,"year":1989,"month":"January"}</str>
</doc>
</result>
</response>
The problem is that "otherinfo" is a JSON string that I would like to decode and have something like the following in my index:
<response name="response" numFound="1" start="0" maxScore="2.7143538">
...
<result name="response" numFound="1" start="0" maxScore="2.7143538">
<doc>
<float name="score">2.7143538</float>
<str name="id">82</str>
<str name="name">jorge</str>
<str name="day">15</str>
<str name="year">1989</str>
<str name="month">January</str>
</doc>
</result>
</response>
Would this be possible to do at all with Solr?
Thanks in advance
I commented on this. I decided that I should answer instead.
The fix for your issue isn't at the Solr level. You shouldn't be storing your data this way in the DB to begin with. In the long run, it would be better to fix this problem there, as opposed to trying to hack this at the Solr indexing level.
Your question suggests that someone, probably an end user, is interested in searching by this data. This implies that it should probably be stored in the database as an actual Date or Timestamp field so that it can be properly selected or sorted on.
I'm sure people won't like that this doesn't exactly answer your question, but someone needs to tell you this.
If you know your way around Java you could write your own, custom transformer that would handle your specific case.
Have you tried using the DIH RegexTransformer to parse the JSON?
I think that should be doable, especially if you have a fixed JSON format (i.e. no documents nested within documents within documents...).
I've just noticed the ScriptTransformer, which lets you write your own parser. I think this is the way to go...
Is the otherinfo field in the DB a JSON string to start with?
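If it is, a ScriptTransformer along these lines might do it. This is only a sketch: the entity and column names are taken from your sample output, and it assumes the JVM's JavaScript engine provides JSON.parse (Nashorn on Java 8 does; very old Rhino versions may not):
<dataConfig>
  <!-- dataSource definition omitted -->
  <script><![CDATA[
    function parseOtherInfo(row) {
      var raw = row.get('otherinfo');
      if (raw != null) {
        var obj = JSON.parse(raw);   // decode the JSON column
        row.put('day', obj.day);     // promote each value to its own field
        row.put('year', obj.year);
        row.put('month', obj.month);
        row.remove('otherinfo');     // optionally drop the raw JSON
      }
      return row;
    }
  ]]></script>
  <document>
    <entity name="person"
            transformer="script:parseOtherInfo"
            query="select id, name, otherinfo from person"/>
  </document>
</dataConfig>
The day, year and month fields still need to exist in schema.xml (or match a dynamic field pattern).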
You would need dynamic fields (docs, explanation) and client-side code to let Solr store data with an arbitrary schema.
You would need to define dynamic fields in your schema like:
dyn_string_*: store text as it is
dyn_text_*: store text and index it for search
etc
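In schema.xml that could look roughly like this (the type names depend on the field types your schema actually defines):
<dynamicField name="dyn_string_*" type="string"       indexed="true" stored="true"/>
<dynamicField name="dyn_text_*"   type="text_general" indexed="true" stored="true"/>
<dynamicField name="dyn_number_*" type="int"          indexed="true" stored="true"/>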
Then you will need to tell DIH to map the DB fields to the Solr dynamic fields (pseudocode warning; sorry, but I am not familiar with DIH):
SELECT
  day  AS dyn_number_day,
  name AS dyn_text_name
FROM tablename
Edit
You do have a requirement to query into the data structure. That calls for a schema-less datastore.
Document DBs like MongoDB offer exactly this functionality: store data in arbitrary fields determined at insert time, and run any kind of ad-hoc query on your data.
I am not aware of a request handler that can index your data that way. You can write code that periodically fetches updated (or added or removed) rows, decodes the JSON field, and indexes it to Solr.
I recommend a skinny data model for storing attributes as properties independently of the current DB schema. I asked a question 'Set intersection in MySQL: a clean way' a while back.
Recap: MongoDB and friends offer exactly the functionality you need. If you want relations and referential integrity, you can keep using an RDBMS. If you still want the JSON approach, build an active process that parses it and indexes it to Solr. But I recommend moving to a skinny data model, since it gives you (conditions apply!) the same kind of query capabilities in SQL that Solr gives you.
Exotic technology: graph databases like Neo4j combine document-database functionality (ad-hoc queries) with relations: a relation directly links one node to another, no joins involved. So it's just one step short of referential integrity.

Solr conditional adds/updates?

I have a fairly simple need to do a conditional update in Solr, which is easily accomplished in MySQL.
For example,
I have 100 documents with a unique field called <id>
I am POSTing 10 documents, some of which may be duplicate <id>s, in which case Solr would update the existing records with the same <id>s
I have a field called <dateCreated> and I would like to update a <doc> only if the new <dateCreated> is greater than the old <dateCreated> (this applies to duplicate <id>s only, of course)
How would I be able to accomplish such a thing?
The context is combating race conditions that result in multiple adds for the same ID executing in the wrong order.
Thanks.
I can think of two ways:
Write your own UpdateHandler and override addDoc to implement that checking.
Put the appropriate locks (critical sections) in your client code in order to fetch the stored document, compare the dates, and conditionally add the new document in a thread-safe manner.
Remember that Solr is not a database, comparing it to MySQL is comparing apples and oranges.
As of Solr 4.0, optimistic concurrency is enabled via the _version_ field.
http://yonik.com/solr/optimistic-concurrency/
To enable it, you need to make sure your schema.xml contains:
<field name="_version_" type="long" indexed="true" stored="true"/>
and that your solrconfig.xml contains:
<updateHandler class="solr.DirectUpdateHandler2">
<updateLog>
<str name="dir">${solr.data.dir:}</str>
</updateLog>
</updateHandler>
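The behaviour depends on the _version_ value you send with a document: a value greater than 1 must match the stored version exactly (otherwise Solr returns HTTP 409 Conflict), 1 means the document must already exist, a negative value means it must not exist, and 0 disables the check. A minimal update could look like this (the version number is purely illustrative; you would send back the value you previously read from Solr):
<add>
  <doc>
    <field name="id">42</field>
    <!-- illustrative value: must equal the currently stored _version_ or the update is rejected -->
    <field name="_version_">1234567890123456789</field>
    <field name="dateCreated">2012-06-01T00:00:00Z</field>
  </doc>
</add>
Note that this alone does not express "only update if the new dateCreated is greater"; the client still has to read the current document, compare the dates, and retry on a 409 conflict.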
With really custom addition logic like this, I find that writing your own client-side updater works better. It keeps you from mucking around in Solr internals, which makes it easier to upgrade in the future. You can definitely do this in SolrJ, but if you aren't a Java dev, there is probably a client-side library in your preferred language... PHP, Python, Ruby, C#, etc.
The rsolr Ruby gem (http://github.com/mwmitchell/rsolr/tree/master) makes it VERY easy to hack together a custom load script.

Looking for an XSD representing an 'Order' for a shopping cart

I am trying to create an XML schema representing an 'order' for a shopping cart.
I want this to completely abstract away my shopping cart's implementation, and eventually support partners sending us orders using this schema. (Yes, I'm planning on integrating this schema into our existing cart.)
It will have original order items, repeat shipping items, and domain-specific things.
I'm quite capable of building this, but I was wondering if there is anything out there like this that I could at least base mine on.
I know there are standards out there for certain schema elements like this, but I've lost track of which are the best/most standard and how you might extend them, etc.
Obviously, if I want a partner to send me an 'order', I'd like to use a standard if one exists.
UBL (Universal Business Language) defines schemas for business documents (purchase orders, invoices, etc.). It is an OASIS standard, see:
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ubl
Do you want the XML, or the XSD? For the XSD, you can generate one using Microsoft's XSD generator based on an XML document.
If you want a generic XML document that could represent an 'order', here's one.
<?xml version="1.0"?>
<Order>
<Date>2003/07/04</Date>
<CustomerId>123</CustomerId>
<CustomerName>Acme Alpha</CustomerName>
<Item>
<ItemId>987</ItemId>
<ItemName>Coupler</ItemName>
<Quantity>5</Quantity>
</Item>
<Item>
<ItemId>654</ItemId>
<ItemName>Connector</ItemName>
<Quantity unit="12">3</Quantity>
</Item>
<Item>
<ItemId>579</ItemId>
<ItemName>Clasp</ItemName>
<Quantity>1</Quantity>
</Item>
</Order>
From here.
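If you do want to start from a schema rather than an instance document, a minimal XSD matching the sample above might look like this (a sketch only; Date is left as a plain string because the sample uses 2003/07/04 rather than the xs:date format):
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="Order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Date" type="xs:string"/>
        <xs:element name="CustomerId" type="xs:string"/>
        <xs:element name="CustomerName" type="xs:string"/>
        <xs:element name="Item" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="ItemId" type="xs:string"/>
              <xs:element name="ItemName" type="xs:string"/>
              <xs:element name="Quantity">
                <xs:complexType>
                  <xs:simpleContent>
                    <xs:extension base="xs:integer">
                      <xs:attribute name="unit" type="xs:string"/>
                    </xs:extension>
                  </xs:simpleContent>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>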
If you are looking for ideas about how to structure the shopping cart:
Database Table Structure for Shopping Cart
