Is it possible in java-lucene to map multifield values?

I have a single document in a Lucene index with multiple fields, some of which are multivalued. For example:
Document{ field1: field2: field3: .... field9: }
and the fields look like:
field1: some string value
field2: some int value
...
field8: (space-separated string values, each indexed as a token) // UIDs for some items
field9: (values for the items whose UIDs are in field8; field8 and field9 map one-to-one)
With this I am able to index and search a multivalued field in a flat document structure. Now I have another field, say field10:, which holds multiple values against a single UID in field8. How can I index and search this new field in this structure using Lucene?
I want to index the field10 values and map them against field8, for example:
field8: { uid1 | uid2 | uid3 }
field10: { id1,id2,id3 | id1,id7 | id1 }

Here's what I think you're asking. Suppose you have a document like:
field8: { 1 | 2 | 3 }
field10: { foo,bar | baz,foo | bar,baz }
You want to have the query +field8:1 +field10:foo return this document, but +field8:3 +field10:foo shouldn't return the document.
This is a relational data model, and it's not something Lucene tries to be good at. Your best bet is to use a relational database.
If you want to stick with Lucene, you should split each of these values into its own document. So one doc would be field8:1, field10:foo,bar, etc. Alternatively, you could write your own tokenizer that uses payloads or term positions to handle this; I don't know that it would be particularly easy or fast.
There are many questions on this site regarding your problem, e.g. Storing relational data in a Lucene.NET index

Related

Best way to retrieve an item from dynamoDB using attribute which is not partition key

I am new to DynamoDB and need some suggestions from experienced people here. There is a table created with the below model:
orderId - PartitionKey
stockId
orderDetails
and there is a new requirement to fetch all the orderIds that include a particular stockId. An item in the table looks like:
{
    "orderId": "ord_12234",
    "stockId": [
        123221,
        234556,
        123231
    ],
    "orderDetails": {
        "createdDate": "",
        "dateOfDel": ""
    }
}
Given that stockId is an array of IDs, it can't be made a GSI. Performing a scan would be heavy, as the table has a large number of records and keeps growing. What would be the best option here? How can the existing table be modified to achieve this efficiently?
You definitely want to avoid scanning the table. One option is to modify your schema to a Single Table Design where you have order items and order/stock items.
For example:
pk              | sk              | orderDetails                     | stockId | ...
----------------+-----------------+----------------------------------+---------+-----
order#ord_12234 | order#ord_12234 | {createdDate:xxx, dateOfDel:yyy} |         | ...
order#ord_12234 | stock#123221    |                                  | 123221  | ...
order#ord_12234 | stock#234556    |                                  | 234556  | ...
order#ord_12234 | stock#123231    |                                  | 123231  | ...
You can then issue the following queries, as needed:
get the order details with a query on pk=order#ord_12234, sk=order#ord_12234
get the stocks for a given order with a query on pk=order#ord_12234, sk=stock#
get everything associated with the order with a query on pk=order#ord_12234
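As a sketch, the second of those queries with the AWS SDK for JavaScript v3 (the table name is an assumption):
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Fetch all stock items for one order: exact match on pk, prefix match on sk
async function stocksForOrder(orderId: string) {
    const { Items } = await ddb.send(new QueryCommand({
        TableName: "orders", // assumed table name
        KeyConditionExpression: "pk = :pk AND begins_with(sk, :sk)",
        ExpressionAttributeValues: {
            ":pk": `order#${orderId}`,
            ":sk": "stock#",
        },
    }));
    return Items ?? [];
}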

FlexibleSearch - search products & categories

I'm trying to write a FlexibleSearch query to retrieve products and their leaf categories. The leaf categories are the last categories that have no subcategories, and the categories must be of type "category" and "productTypeCategory". I tried to make some joins but failed. I appreciate any help!
I would suggest something like this:
SELECT {c.code}, {c.itemtype}, {ccr.source} FROM
{Category as c LEFT JOIN CategoryCategoryRelation as ccr ON {c.pk}={ccr.source}
JOIN ComposedType as ct ON {c.itemtype} = {ct.pk}}
WHERE {ccr.source} IS NULL AND {ct.code} = 'Category'
At least it should be a start. It will return the leaf categories of type Category.
The rest is joining with Products and probably considering catalog versions. Depending on the concrete use case you haven't described, it may be better to ask Solr for the final result, passing the category results of my query into the Solr query instead of implementing the additional JOINs; but that depends on whether you're in the storefront or need the info for a backoffice/cronjob etc.

How to insert an array of strings in javascript into PostgreSQL

I am building an API server which accepts file uploads using multer.
I need to store an array of all the paths to all files uploaded for each request to a column in the PostgreSQL database which I have connected to the server.
Say I have a table created with the following query
CREATE TABLE IF NOT EXISTS records
(
    id         SERIAL PRIMARY KEY,
    created_on TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by INTEGER,
    title      VARCHAR NOT NULL,
    type       VARCHAR NOT NULL
)
How do I define a new column filepaths on the above table into which I can insert a JavaScript string array (e.g. ['path-to-file-1', 'path-to-file-2', 'path-to-file-3'])?
Also, how do I retrieve and update/edit the list in JavaScript using node-postgres?
You have two options:
Use the json or jsonb type. In that case the string to insert will look like:
'["path-to-file-1", "path-to-file-2", "path-to-file-3"]'
I would prefer jsonb: it allows good indexing, whereas json is really just text with some additional built-in functions.
Use an array of text, i.e. filepaths text[]. To insert you can use:
ARRAY ['path-to-file-1', 'path-to-file-2', 'path-to-file-3']
or
'{path-to-file-1,path-to-file-2,path-to-file-3,"path to file 4"}'
You need quotes here only for elements that contain spaces and the like, but feel free to use them for all elements too.
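With the text[] option, node-postgres converts in both directions: a JavaScript string array passed as a query parameter is serialized to a Postgres array, and a text[] column comes back as a JavaScript array of strings. A minimal sketch, assuming a filepaths text[] column has been added to records:
import { Pool } from "pg";

const pool = new Pool(); // connection settings come from the PG* environment variables

async function demo() {
    // Insert: the JS array is serialized into a Postgres text[] value
    await pool.query(
        "INSERT INTO records (title, type, filepaths) VALUES ($1, $2, $3)",
        ["My record", "upload", ["path-to-file-1", "path-to-file-2"]]
    );

    // Read: filepaths arrives as a plain JS array of strings
    const { rows } = await pool.query(
        "SELECT filepaths FROM records WHERE id = $1",
        [1]
    );
    console.log(rows[0].filepaths);

    // Update: append one more path in place
    await pool.query(
        "UPDATE records SET filepaths = array_append(filepaths, $1) WHERE id = $2",
        ["path-to-file-3", 1]
    );
}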
You can create a file table that has a path column and a foreign key reference to the record it belongs to. This way you store each path as a plain text column instead of storing an array in a column, which is better practice for relational databases. You'll also be able to store additional information about a file if you need to later. And it'll be simpler to interact with the file paths: you add one by inserting a new row into the file table (with the appropriate foreign key) and remove one by deleting a row.
For example:
CREATE TABLE IF NOT EXISTS file (
    record_id integer NOT NULL REFERENCES records(id) ON DELETE CASCADE,
    path      text    NOT NULL
);
Then to get all the files for a record you can join the two tables together and convert to an array if you want.
For example:
SELECT
    records.*,
    ARRAY (
        SELECT file.path
        FROM file
        WHERE records.id = file.record_id
    ) AS file_paths
FROM records;
Sample input (using only the title field of records):
INSERT INTO records (title) VALUES ('A'), ('B'), ('C');
INSERT INTO file (record_id, path) VALUES (1, 'patha1'), (1, 'patha2'), (1, 'patha3'), (2, 'pathb1');
Sample output:
id | title | file_paths
----+-------+------------------------
1 | A | {patha1,patha2,patha3}
2 | B | {pathb1}
3 | C | {}

Azure CosmosDB: how to ORDER BY id?

Using a vanilla CosmosDB collection (all default), adding documents like this:
{
    "id": "3",
    "name": "Hannah"
}
I would like to retrieve records ordered by id, like this:
SELECT c.id FROM c
ORDER BY c.id
This gives me the error Order-by item requires a range index to be defined on the corresponding index path.
I expect this is because /id is hash indexed and not range indexed. I've tried to change the Indexing Policy in various ways, but any change I make which would touch / or /id gets wiped when I save.
How can I retrieve documents ordered by ID?
The best way to do this is to store a duplicate property, e.g. id2, that has the same value as id and is indexed with a range index, then use that for sorting, i.e. query for SELECT * FROM c ORDER BY c.id2.
PS: The reason this is not supported is that id is part of a composite index (on partition key and row key; id is the row key part). The Cosmos DB team is working on a change that will allow sorting by id.
EDIT: new collections now support ORDER BY c.id as of 7/12/19
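For an older collection where the workaround is still needed, a minimal sketch with the @azure/cosmos JavaScript SDK (the connection string variable and the database/container names are placeholders):
import { CosmosClient } from "@azure/cosmos";

const client = new CosmosClient(process.env.COSMOS_CONNECTION!); // assumed env var
const container = client.database("db").container("people");     // placeholder names

async function listOrderedById() {
    // Sort on the duplicated, range-indexed property instead of id itself
    const { resources } = await container.items
        .query("SELECT c.id FROM c ORDER BY c.id2")
        .fetchAll();
    return resources;
}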
I found the page CosmosDB Indexing Policies, which has the below note that may be helpful:
Azure Cosmos DB returns an error when a query uses ORDER BY but doesn't have a Range index against the queried path with the maximum precision.
Some other information from elsewhere in the document:
Range supports efficient equality queries, range queries (using >, <, >=, <=, !=), and ORDER BY queries. ORDER BY queries by default also require maximum index precision (-1). The data type can be String or Number.
Some guidance on the types of queries served by Range indexes:
A Range index over /prop/? (or /) can be used to serve the following queries efficiently:
SELECT FROM collection c WHERE c.prop = "value"
SELECT FROM collection c WHERE c.prop > 5
SELECT FROM collection c ORDER BY c.prop
And a code example from the docs also:
var rangeDefault = new DocumentCollection { Id = "rangeCollection" };
// Override the default policy for strings to Range indexing and "max" (-1) precision
rangeDefault.IndexingPolicy = new IndexingPolicy(new RangeIndex(DataType.String) { Precision = -1 });
await client.CreateDocumentCollectionAsync(UriFactory.CreateDatabaseUri("db"), rangeDefault);

Index multiple MongoDB fields, make only one unique

I've got a MongoDB database of metadata for about 300,000 photos. Each has a native unique ID that needs to stay unique to protect against duplicate insertions. It also has a timestamp.
I frequently need to run aggregate queries to see how many photos I have for each day, so I also have a date field in the format YYYY-MM-DD. This is obviously not unique.
Right now I only have an index on the id property, like so (using the Node driver):
collection.ensureIndex(
    { id: 1 },
    { unique: true, dropDups: true },
    function (err, indexName) { /* etc etc */ }
);
The group query for getting the photos by date takes quite a long time, as one can imagine:
collection.group(
    { date: 1 },
    {},
    { count: 0 },
    function (curr, result) {
        result.count++;
    },
    function (err, grouped) { /* etc etc */ }
);
I've read through the indexing strategy documentation, and I think I need to also index the date property. But I don't want to make it unique, of course (though I suppose it's fine to make it unique in combination with the unique id). Should I create a regular compound index, or can I chain .ensureIndex() calls and specify uniqueness only for the id field?
MongoDB does not have "mixed" indexes that can be partially unique. On the other hand, why not use _id instead of your id field, if possible? It's already indexed and unique by definition, so it will prevent you from inserting duplicates.
Mongo can use only a single index per query clause, which is important to consider when creating indexes. For this particular query and these requirements, I would suggest a separate unique index on the id field, which you get for free if you use _id. Additionally, you can create a non-unique index on the date field only. If you run a query like this:
db.collection.find({"date": "01/02/2013"}).count();
Mongo will be able to answer the query from the index alone (a covered query), which is the best performance you can get.
Note that Mongo won't be able to use a compound index on (id, date) if you are searching by date only. Your query has to match the index prefix first, i.e. if you search by id then the (id, date) index can be used.
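As a sketch with the Node driver (database and collection names are placeholders): the two separate indexes, plus the per-day count written with the aggregation framework, which has since superseded group():
import { MongoClient } from "mongodb";

const client = new MongoClient("mongodb://localhost:27017"); // assumed connection string

async function photosPerDay() {
    const photos = client.db("app").collection("photos"); // placeholder names

    await photos.createIndex({ id: 1 }, { unique: true });
    await photos.createIndex({ date: 1 }); // non-unique, for date queries

    // Count photos per day; this replaces the slow group() call
    return photos
        .aggregate([{ $group: { _id: "$date", count: { $sum: 1 } } }])
        .toArray();
}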
Another option is to pre-aggregate in the schema itself: keep a counter document per day, and whenever you insert a photo, increment that day's counter. This way you don't need to run any aggregation jobs. You can also run some tests to determine whether this approach is more performant than aggregation.
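For the pre-aggregation route, a sketch with a hypothetical daily_counts collection that is bumped on every insert:
import { Db } from "mongodb";

async function recordPhoto(db: Db, photo: { id: string; date: string }) {
    await db.collection("photos").insertOne(photo);

    // Upsert the counter document for that day and bump it
    await db.collection("daily_counts").updateOne(
        { day: photo.date },
        { $inc: { count: 1 } },
        { upsert: true }
    );
}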
