How can I set up Solr to tokenize on whitespace and punctuation?

I have been trying to get my Solr schema (using Solr 1.3.0) to create terms that are tokenized by whitespace and punctuation. Here are some examples of what I would like to see happen:
terms given -> terms tokenized
foo-bar -> foo,bar
one2three4 -> one2three4
multiple words/and some-punctuation -> multiple,words,and,some,punctuation
I thought that this combination would work:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"/>
  </analyzer>
</fieldType>
The problem is that this results in the following for letter to number transitions:
one2three4 -> one,2,three,4
I have tried various combinations of WordDelimiterFilterFactory settings, but none have proven useful. Is there a filter or tokenizer that can handle what I require?

How about:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" splitOnNumerics="0" />
That should prevent one2three4 from being split.
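For a quick sanity check of the target behaviour outside Solr, here is a rough Python equivalent of "split on whitespace and punctuation, but keep letter/digit runs together". This mirrors the intent of the examples above, not Solr's actual analyzer chain; `tokenize` is a hypothetical helper:

```python
import re

def tokenize(text):
    # Every maximal run of letters and digits becomes one token, so
    # punctuation and whitespace act as delimiters while letter/digit
    # transitions (one2three4) stay intact.
    return re.findall(r"[A-Za-z0-9]+", text)

print(tokenize("foo-bar"))       # ['foo', 'bar']
print(tokenize("one2three4"))    # ['one2three4']
print(tokenize("multiple words/and some-punctuation"))
# ['multiple', 'words', 'and', 'some', 'punctuation']
```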

Related

Modern SharePoint CAML query with multiple criteria

I am new to CAML. CAML seems like a great way to filter lists, but I am struggling to write nested and/or statements.
I am trying to start out small and write one CAML query with only two conditions so I can get the hang of it. Below is my failed attempt.
<View>
<Query>
<Where>
<and>
<Contains>
<FieldRef Name='PracticeArea_x0028_s_x0029_' />
<Value Type='Text'>Lean</Value>
</Contains>
<NotInclude>
<FieldRef Name='PracticeArea_x0028_s_x0029_' />
<Value Type='Text'>,</Value>
</NotInclude>
</and>
</Where>
</Query>
</View>
Any help, support, and/or insight this community can provide is greatly appreciated.
NotIncludes is used for a Lookup field that allows multiple values, so it won't work if you filter against a text field.
https://learn.microsoft.com/en-us/sharepoint/dev/schema/notincludes-element-query
You could filter by Contains first and then filter by LINQ or a similar approach.
SPList list = web.Lists.TryGetList("TestFilter");
SPQuery spQuery = new SPQuery();
spQuery.Query = @"<Where>
    <Contains>
        <FieldRef Name='PracticeArea' />
        <Value Type='Text'>Lean</Value>
    </Contains>
</Where>";
var items = list.GetItems(spQuery);
var filterItems = items.Cast<SPListItem>().Where(item => string.Format("{0}", item["PracticeArea"]).IndexOf(',') < 0);
Console.WriteLine(filterItems.Count());
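The same two-stage idea can be sketched in plain Python with made-up items (the item shape, titles, and field name are assumptions for illustration): the server-side Contains match has already run, and the client side drops multi-value rows by looking for a comma:

```python
# Hypothetical items as returned by the Contains query: each item maps
# field names to values, mirroring item["PracticeArea"] above.
items = [
    {"Title": "A", "PracticeArea": "Lean"},
    {"Title": "B", "PracticeArea": "Lean,Agile"},
    {"Title": "C", "PracticeArea": "Lean Manufacturing"},
]

# Server side: Contains 'Lean' already matched all three rows.
# Client side: keep only rows whose value has no comma, i.e. a single value.
filtered = [item for item in items if "," not in str(item["PracticeArea"])]
print([item["Title"] for item in filtered])  # ['A', 'C']
```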

Selecting columns not present in the dataframe

So, I am creating a dataframe from an XML file. It has some information on a dealer, and then a dealer has multiple cars; each car is a sub-element of the cars element and is represented by a value element, and each cars.value element has various car attributes. So I use an explode function to create one row for each car for a dealer, as follows:
exploded_dealer = df.select('dealer_id',explode('cars.value').alias('a_car'))
And now I want to get various attributes of cars.value
I do it like this:
car_details_df = exploded_dealer.select('dealer_id','a_car.attribute1','a_car.attribute2')
And that works fine. But sometimes the cars.value elements don't have all the attributes I specify in my query. For example, some cars.value elements might have only attribute1, and then I get the following error when running the above code:
pyspark.sql.utils.AnalysisException: u"cannot resolve 'attribute2'
given input columns: [dealer_id,attribute1];"
How do I ask Spark to execute the same query anyway, but just return None for attribute2 if it is not present?
UPDATE I read my data as follows:
initial_file_df = sqlContext.read.format('com.databricks.spark.xml').options(rowTag='dealer').load('<xml file location>')
exploded_dealer = df.select('financial_data',explode('cars.value').alias('a_car'))
Since you already make specific assumptions about the schema the best thing you can do is to define it explicitly with nullable optional fields and use it when importing data.
Let's say you expect documents similar to:
<rows>
  <row>
    <id>1</id>
    <objects>
      <object>
        <attribute1>...</attribute1>
        ...
        <attributeN>...</attributeN>
      </object>
    </objects>
  </row>
</rows>
where attribute1, attribute2, ..., attributeN may not be present in a given batch, but you can define a finite set of choices and corresponding types. For simplicity, let's say there are only two options:
{("attribute1", StringType), ("attribute2", LongType)}
You can define schema as:
schema = StructType([
    StructField("objects", StructType([
        StructField("object", StructType([
            StructField("attribute1", StringType(), True),
            StructField("attribute2", LongType(), True)
        ]), True)
    ]), True),
    StructField("id", LongType(), True)
])
and use it with reader:
spark.read.schema(schema).option("rowTag", "row").format("xml").load(...)
It will be valid for any subset of attributes ({∅, {attribute1}, {attribute2}, {attribute1, attribute2}}). At the same time, it is more efficient than depending on schema inference.
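As a plain-Python illustration of what the explicit schema buys you (no Spark required; `project` and `SCHEMA` are hypothetical names): every declared field resolves, and absent ones come back as None, just as the nullable StructFields do:

```python
# Declared, nullable fields -- the pure-Python stand-in for the schema.
SCHEMA = ("attribute1", "attribute2")

def project(record, fields=SCHEMA):
    # Return every declared field, filling missing ones with None
    # instead of raising an error.
    return {field: record.get(field) for field in fields}

print(project({"attribute1": "red"}))
# {'attribute1': 'red', 'attribute2': None}
print(project({"attribute1": "blue", "attribute2": 42}))
# {'attribute1': 'blue', 'attribute2': 42}
```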

CAML filter to get the current month from the list?

In SharePoint 2010, I need to get items from a list based on a condition. Considering one of the fields to be 'Date' of type DateTime, the condition is:
Get Current Month Data.
How do I filter the list items based on this condition using CAML query?
By,
Raji
Use SPUtility.CreateISO8601DateTimeFromSystemDateTime to create the relevant dateTime strings:
DateTime firstDay = new DateTime(DateTime.Now.Year, DateTime.Now.Month, 1);
String stringQuery =
    String.Format(@"<Where>
        <And>
            <Geq>
                <FieldRef Name='Date' />
                <Value Type='DateTime'>{0}</Value>
            </Geq>
            <Leq>
                <FieldRef Name='Date' />
                <Value Type='DateTime'>{1}</Value>
            </Leq>
        </And>
    </Where>",
    SPUtility.CreateISO8601DateTimeFromSystemDateTime(firstDay),
    SPUtility.CreateISO8601DateTimeFromSystemDateTime(firstDay.AddMonths(1)));
SPQuery query = new SPQuery(stringQuery);
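The date arithmetic itself is the whole trick: take the first day of the current month as the lower bound and the first day of the next month as the upper bound. A small Python sketch of the same calculation (`month_bounds` is a hypothetical helper; the ISO 8601 strings are the format CAML DateTime values expect):

```python
from datetime import datetime

def month_bounds(now):
    # First day of the current month at midnight...
    first = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # ...and the first day of the next month (handling the December rollover).
    nxt = first.replace(year=first.year + 1, month=1) if first.month == 12 \
        else first.replace(month=first.month + 1)
    return first.isoformat(), nxt.isoformat()

print(month_bounds(datetime(2023, 12, 15, 10, 30)))
# ('2023-12-01T00:00:00', '2024-01-01T00:00:00')
```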

Create a FetchXML query that uses ISNULL

I want to make a FetchXML query that is using ISNULL like in SQL query.
In SQL
SELECT * FROM Contact WHERE ISNULL(FirstName, '') = ''
Do they have any operators for it in FetchXML?
Not exactly the same but the below query should give you something to work with.
<fetch mapping="logical">
  <entity name="contact">
    <all-attributes />
    <filter>
      <condition attribute="firstname" operator="null" />
    </filter>
  </entity>
</fetch>
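If you need to assemble the filter dynamically, the same query can be built with a standard XML library. The Python sketch below simply reproduces the fetch element programmatically, with element and attribute names taken from the query above:

```python
import xml.etree.ElementTree as ET

# Rebuild the FetchXML query element by element.
fetch = ET.Element("fetch", mapping="logical")
entity = ET.SubElement(fetch, "entity", name="contact")
ET.SubElement(entity, "all-attributes")
filt = ET.SubElement(entity, "filter")
ET.SubElement(filt, "condition", attribute="firstname", operator="null")

xml_str = ET.tostring(fetch, encoding="unicode")
print(xml_str)
```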

When configuring a relational store join, can I do a one-to-many join from the ActivePivot store?

Using Relational Stores, is it possible to do a one-to-many join from the ActivePivot store to a joining store? Suppose my ActivePivot store joins to another store on SOME_ID, but the key for the other store is SOME_ID,SOME_TYPE. Then it is possible to have:
AP_STORE SOME_ID | JOIN_STORE SOME_ID | JOIN_STORE SOME_TYPE
------------------------------------------------------------
       1         |         1          |        TYPE1
       1         |         1          |        TYPE2
However, when the join is attempted, the following error is raised, because there is not a unique entry in the joining store:
Caused by: com.quartetfs.fwk.QuartetRuntimeException: Impossible to find exactly 1 entry from store with key: Key
I can see why there is a problem: the single record in the AP store really needs to become two separate records that join to each of the records in the join store, respectively, but I guess that can't happen unless JOIN_STORE:SOME_TYPE is also a field in the AP store.
Is there a way to make such a one-to-many join from the AP store happen?
Thanks
Edit: To be clear, SOME_TYPE does not exist in the AP store (even under a different name). I have joined on all the common fields, but there is more than one matching entry in the joining store. The matching entries differ on a field that is not common and does not exist in the AP store.
If I try to add a foreign key that does not exist in the AP store (even under a different name), I get:
Caused by: com.quartetfs.fwk.QuartetRuntimeException: com.quartetfs.fwk.AgentException: On join 'AP_STORE=>JOIN_STORE' the store 'AP_STORE' does not contain the foreign key 'FIELD_ONLY_IN_JOIN_STORE' in its fields:
A relational store join does not duplicate the data. You cannot, using the join of the relational stores, join one entry to multiple ones. Nor can you use a multiple-producing calculator with the relational stores.
Depending on your project architecture and workflow, you can consider adding logic in the transaction handler used to feed your AP_Store. In this transaction handler, you could retrieve the entries of your Join_Store in order to duplicate the entries of your AP_Store.
You'll first need to change your AP_Store keys by adding a new field used to differentiate your duplicates.
AP_STORE SOME_ID | AP_STORE SOME_DUPLICATE_ID | JOIN_STORE SOME_ID | JOIN_STORE SOME_TYPE
-----------------------------------------------------------------------------------------
       1         |             1              |         1          |        TYPE1
       1         |             2              |         1          |        TYPE2
For your transaction handler, you can inject the StoresUniverse in order to retrieve your Join_Store, then search on the SOME_ID value in the Join_Store to find the number of duplicates you'll need to create:
IRelationalStore joinStore = storeUniverse.get("Join_Store");
List<IRelationalEntry> joinEntries = joinStore.search("SOME_ID", apStoreObject.get("SOME_ID"));
for (int i = 0; i < joinEntries.size(); i++) {
    // Clone apStoreObject, add a SOME_DUPLICATE_ID value, and add it to the list of objects to add to your AP_Store
}
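The duplication step can be sketched in plain Python with made-up rows (field names follow the tables above; the list-of-dicts stand-in for the stores and the `duplicate` helper are assumptions for illustration):

```python
# Stand-in for the Join_Store: two entries sharing SOME_ID = 1.
join_store = [
    {"SOME_ID": 1, "SOME_TYPE": "TYPE1"},
    {"SOME_ID": 1, "SOME_TYPE": "TYPE2"},
]

def duplicate(ap_row, join_store):
    # One clone of the incoming AP_Store row per matching Join_Store
    # entry, with SOME_DUPLICATE_ID distinguishing the clones.
    matches = [e for e in join_store if e["SOME_ID"] == ap_row["SOME_ID"]]
    return [dict(ap_row, SOME_DUPLICATE_ID=i + 1) for i in range(len(matches))]

rows = duplicate({"SOME_ID": 1, "VALUE": 10.0}, join_store)
print(rows)
# [{'SOME_ID': 1, 'VALUE': 10.0, 'SOME_DUPLICATE_ID': 1},
#  {'SOME_ID': 1, 'VALUE': 10.0, 'SOME_DUPLICATE_ID': 2}]
```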
To join your AP store to the joining store, you need to give a set of fields common to the two stores. There is no constraint that these fields be key fields of either store.
Then, if you have a field representing SOME_TYPE in your AP store, just add it as a foreign key:
<property name="joins">
  <list>
    <bean class="com.quartetfs.tech.store.description.impl.JoinDescription">
      <property name="targetStoreName" value="JoiningStore" />
      <property name="foreignKeys" value="SOME_TYPE" />
    </bean>
  </list>
</property>
If the field has different names in the joined store and the joining store, you can use a map to describe the relationship between the joined store's foreign key and the associated field in the joining store:
<property name="joins">
  <list>
    <bean class="com.quartetfs.tech.store.description.impl.JoinDescription">
      <property name="targetStoreName" value="JoiningStore" />
      <property name="foreignKeyMap">
        <map>
          <entry key="AP_SOME_TYPE" value="SOME_TYPE" />
        </map>
      </property>
    </bean>
  </list>
</property>
