Distinct values in Azure Search Suggestions? - azure

I am offloading my search feature from a relational database to Azure Search. My Products table contains columns like serialNumber, partNumber, etc. (there can be multiple serialNumbers with the same partNumber).
I want to create a suggestor that can autocomplete partNumbers. But in my scenario I am getting a lot of duplicates in the suggestions because the partNumber match was found in multiple entries.
How can I solve this problem ?

The Suggest API suggests documents, not queries. If you repeat the partNumber information for each serialNumber in your index and then suggest based on partNumber, you will get a result for each matching document. You can see this more clearly by including the key field in the $select parameter. Azure Search will eliminate duplicates within the same document, but not across documents. You will have to do that on the client side, or build a secondary index of partNumbers just for suggestions.
See this forum thread for a more in-depth discussion.
Also, feel free to vote on this UserVoice item to help us prioritize improvements to Suggestions.
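The client-side de-duplication mentioned in this answer can be sketched as follows. This is a minimal illustration in JavaScript, assuming the suggestions have already been fetched into an array of objects with a partNumber property. The function name distinctBy and the sample data are hypothetical and not part of any Azure Search SDK.

```javascript
// Keep the first item seen for each key, preserving the order returned by the service.
function distinctBy(items, keyOf, limit = Infinity) {
  const seen = new Set();
  const out = [];
  for (const item of items) {
    if (out.length >= limit) break; // stop once enough distinct items are collected
    const key = keyOf(item);
    if (seen.has(key)) continue;    // duplicate partNumber from another document
    seen.add(key);
    out.push(item);
  }
  return out;
}

// Hypothetical suggestion results: one entry per matching document.
const suggestions = [
  { serialNumber: "S-1", partNumber: "PN-100" },
  { serialNumber: "S-2", partNumber: "PN-100" },
  { serialNumber: "S-3", partNumber: "PN-200" },
];
const partNumbers = distinctBy(suggestions, s => s.partNumber, 10).map(s => s.partNumber);
console.log(partNumbers);
```

Because duplicates are dropped after the service has applied its own limit, it usually makes sense to request more suggestions than you intend to display and trim the list after de-duplication.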

I'm facing this problem myself. My solution does not involve a new index (that would only get messy and cost us money).
My take on this is a while-loop that adds 'UserIdentity' (in your case, 'partNumber') to a filter and searches again until my take/top limit is met or no more suggestions exist:
public async Task<List<MachineSuggestionDTO>> SuggestMachineUser(string searchText, int take, string[] searchFields)
{
var indexClientMachine = _searchServiceClient.Indexes.GetClient(INDEX_MACHINE);
var suggestions = new List<MachineSuggestionDTO>();
var sp = new SuggestParameters
{
UseFuzzyMatching = true,
Top = 100 // Get maximum result for a chance to reduce search calls.
};
// Add searchfields if set
if (searchFields != null && searchFields.Count() != 0)
{
sp.SearchFields = searchFields;
}
// Loop until we have the desired amount of suggestions, or until no more suggestions are returned.
while (suggestions.Count < take)
{
if (!await DistinctSuggestMachineUser(searchText, take, searchFields, suggestions, indexClientMachine, sp))
{
// If no more suggestions are found, we break the while-loop
break;
}
}
// Since the list might be bigger than the take, we return a narrowed list
return suggestions.Take(take).ToList();
}
private async Task<bool> DistinctSuggestMachineUser(string searchText, int take, string[] searchFields, List<MachineSuggestionDTO> suggestions, ISearchIndexClient indexClientMachine, SuggestParameters sp)
{
var response = await indexClientMachine.Documents.SuggestAsync<MachineSearchDocument>(searchText, SUGGESTION_MACHINE, sp);
if(response.Results.Count > 0){
// Fix filter if search is triggered once more
if (!string.IsNullOrEmpty(sp.Filter))
{
sp.Filter += " and ";
}
foreach (var result in response.Results.DistinctBy(r => new { r.Document.UserIdentity, r.Document.UserName, r.Document.UserCode}).Take(take))
{
var d = result.Document;
suggestions.Add(new MachineSuggestionDTO { Id = d.UserIdentity, Namn = d.UserNamn, Hkod = d.UserHkod, Intnr = d.UserIntnr });
// Add found UserIdentity to filter
sp.Filter += $"UserIdentity ne '{d.UserIdentity}' and ";
}
// Remove end of filter if it is run once more
if (sp.Filter.EndsWith(" and "))
{
sp.Filter = sp.Filter.Substring(0, sp.Filter.LastIndexOf(" and ", StringComparison.Ordinal));
}
}
// Returns false if no more suggestions are found
return response.Results.Count > 0;
}

public async Task<List<string>> SuggestionsAsync(bool highlights, bool fuzzy, string term)
{
SuggestParameters sp = new SuggestParameters()
{
UseFuzzyMatching = fuzzy,
Top = 100
};
if (highlights)
{
sp.HighlightPreTag = "<em>";
sp.HighlightPostTag = "</em>";
}
var suggestResult = await searchConfig.IndexClient.Documents.SuggestAsync(term, "mysuggestion", sp);
// Convert the suggest query results to a list that can be displayed in the client.
return suggestResult.Results.Select(x => x.Text).Distinct().Take(10).ToList();
}
After getting top 100 and using distinct it works for me.

You can use the Autocomplete API for that, which does the grouping by default. However, if you need more fields together with the result, such as the partNo plus a description, it doesn't support that. The partNo will be distinct though.


What would be the reason that I can't make the ElementIDs of these objects in Revit match ones in a Revit file?

I am creating a plugin that makes use of the code available from BCFier to select elements from an external server version of the file and highlight them in a Revit view, except the elements are clearly not found in Revit as all elements appear and none are highlighted. The specific pieces of code I am using are:
private void SelectElements(Viewpoint v)
{
var elementsToSelect = new List<ElementId>();
var elementsToHide = new List<ElementId>();
var elementsToShow = new List<ElementId>();
var visibleElems = new FilteredElementCollector(OpenPlugin.doc, OpenPlugin.doc.ActiveView.Id)
.WhereElementIsNotElementType()
.WhereElementIsViewIndependent()
.ToElementIds()
.Where(e => OpenPlugin.doc.GetElement(e).CanBeHidden(OpenPlugin.doc.ActiveView)); //might affect performance, but it's necessary
bool canSetVisibility = (v.Components.Visibility != null &&
v.Components.Visibility.DefaultVisibility &&
v.Components.Visibility.Exceptions.Any());
bool canSetSelection = (v.Components.Selection != null && v.Components.Selection.Any());
//loop elements
foreach (var e in visibleElems)
{
//string guid = ExportUtils.GetExportId(OpenPlugin.doc, e).ToString();
var guid = IfcGuid.ToIfcGuid(ExportUtils.GetExportId(OpenPlugin.doc, e));
Trace.WriteLine(guid.ToString());
if (canSetVisibility)
{
if (v.Components.Visibility.DefaultVisibility)
{
if (v.Components.Visibility.Exceptions.Any(x => x.IfcGuid == guid))
elementsToHide.Add(e);
}
else
{
if (v.Components.Visibility.Exceptions.Any(x => x.IfcGuid == guid))
elementsToShow.Add(e);
}
}
if (canSetSelection)
{
if (v.Components.Selection.Any(x => x.IfcGuid == guid))
elementsToSelect.Add(e);
}
}
try
{
OpenPlugin.HandlerSelect.elementsToSelect = elementsToSelect;
OpenPlugin.HandlerSelect.elementsToHide = elementsToHide;
OpenPlugin.HandlerSelect.elementsToShow = elementsToShow;
OpenPlugin.selectEvent.Raise();
} catch (System.Exception ex)
{
TaskDialog.Show("Exception", ex.Message);
}
}
This is the section that should filter the lists, and it does run, as it produces IDs that look like this:
3GB5RcUGnAzQe9amE4i4IN
3GB5RcUGnAzQe9amE4i4Ib
3GB5RcUGnAzQe9amE4i4J6
3GB5RcUGnAzQe9amE4i4JH
3GB5RcUGnAzQe9amE4i4Ji
3GB5RcUGnAzQe9amE4i4J$
3GB5RcUGnAzQe9amE4i4GD
3GB5RcUGnAzQe9amE4i4Gy
3GB5RcUGnAzQe9amE4i4HM
3GB5RcUGnAzQe9amE4i4HX
3GB5RcUGnAzQe9amE4i4Hf
068MKId$X7hf9uMEB2S_no
The trouble is that comparing these against the list of IDs in the IFC file we imported from reveals that none of them appear in the IFC file, and looking in Revit I found that none of the GUIDs shown in Revit were in this list either. Almost all of the objects also share the same leading part of the ID, and I'm not experienced enough to know how likely that is.
So my question is, is it something in this code that is an issue?
The IFC GUID is based on the Revit UniqueId but not identical. Please read about the Element Identifiers in RVT, IFC, NW and Forge to learn how they are connected.
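To illustrate why the 22-character IDs in the question do not look like the GUIDs shown in Revit, here is a minimal sketch of the IFC GUID compression in JavaScript. It assumes the usual IFC scheme (a 128-bit GUID written with a 64-character alphabet, with the hex digits taken in their textual order). The function names are illustrative, and this is not the BCFier or Revit API implementation.

```javascript
// 64-character alphabet assumed for IFC compressed GUIDs.
const IFC_ALPHABET =
  "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_$";

// Compress a standard GUID string into 22 characters (6 bits per character).
function toIfcGuid(guid) {
  const hex = guid.replace(/[-{}]/g, "");
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) throw new Error("not a GUID: " + guid);
  let n = BigInt("0x" + hex);
  let out = "";
  for (let i = 0; i < 22; i++) {
    out = IFC_ALPHABET[Number(n & 63n)] + out; // lowest 6 bits first, prepended
    n >>= 6n;
  }
  return out;
}

// Expand a 22-character IFC GUID back into the dashed hex form.
function fromIfcGuid(ifc) {
  if (ifc.length !== 22) throw new Error("expected 22 characters");
  let n = 0n;
  for (const ch of ifc) {
    const v = IFC_ALPHABET.indexOf(ch);
    if (v < 0) throw new Error("invalid character: " + ch);
    n = (n << 6n) | BigInt(v);
  }
  if (n >> 128n !== 0n) throw new Error("value does not fit in 128 bits");
  const hex = n.toString(16).padStart(32, "0");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
          hex.slice(16, 20), hex.slice(20)].join("-");
}

console.log(fromIfcGuid("3GB5RcUGnAzQe9amE4i4IN"));
console.log(fromIfcGuid("3GB5RcUGnAzQe9amE4i4Ib"));
```

Under this scheme, IDs that share their first 20 characters decode to GUIDs that differ only in the last few hex digits, which is what one would expect if the export GUIDs are derived from UniqueIds that share the same leading part.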

Security - The view and edit id is visible in the address bar

CakePHP Version 3.5.5
The id is visible in the address bar for view and edit, which creates a security risk for my application. Any logged-in user at the same company can change the id in the address bar and view or edit the details of users they are not allowed to.
IE: https://localhost/crm/users/edit/1378 can be manually changed in the address bar to https://localhost/crm/users/edit/1215 and entered. This would display the details of user 1215 which is not allowed.
To overcome this I am selecting the ids which the user is allowed to edit and checking that the id from the url is one of these ids with the following code:
public function view($id = null)
{
if ($this->request->is('get')) {
// Select the permitted ids.
if (superuser) { // example to explain only
$query = $this->Users->find()
->where(['companyid' => $cid])
->andWhere(['status' => 1])
->toArray();
}
elseif (manager) { // example to explain only
$query = $this->Users->find()
->where(['areaid' => $areaid])
->andWhere(['status' => 1])
->toArray();
}
elseif (team leader) { // example to explain only
$query = $this->Users->find()
->where(['teamid' => $teamid])
->andWhere(['status' => 1])
->toArray();
}
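// Check if the requested id is in the array of permitted ids.
// in_array() is used instead of array_search() + empty(), because a match at index 0 would be treated as "not found".
$ids = array_map('strval', array_column($query, 'id'));
// If the requested id is not in the array of permitted ids redirect to blank.
if (!in_array((string)$id, $ids, true)) {
// Handle error.
}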
$user = $this->Users->get($id);
$this->set('user', $user);
$this->set('_serialize', ['user']);
}
else {
// Handle error.
}
}
My question: Is the above code the best cake way of achieving this or is there a better way to do it?
This code does work, but because it's to do with security I'd appreciate any input which would improve it or point out its weaknesses.
/////////////////////////////////////////////////////////////////////////////
As requested by cgTag please see below.
My app has superusers, managers, team leaders and users.
Managers manage one area which can contain many teams.
Team Leaders lead one team and must belong to an area.
Users are assigned to an area or a team.
For example:
Area is UK
Team is England
Team is Scotland
Team is Wales
Area is USA
Team is Florida
Team is California
Team is Texas
On index - superusers see all the superusers, managers, team leaders and users in the company.
On index - managers see themself and users in their area, team leaders in their area and users in the teams.
On index - team leaders see themself and users in their team
My problem is say the manager of area UK clicks edit on one of the records and that record is displayed with a url of https://localhost/crm/users/edit/1378
Then say this disgruntled manager makes a guess, changes the url to https://localhost/crm/users/edit/1215 and submits it; this record is then displayed. (This record could be anyone: a superuser, another manager, a team leader who is not in their area or a user not in their area.)
This manager could then change say the email address and submit this and it's this type of situation that I need to protect against.
My fix is to reiterate the find for the superuser, manager and team leader I've done on index in the view and edit class. This ensures that say a manager can only view or edit someone in their area.
Hopefully I've explained it well enough but if not just let me know and I'll have another go.
Thanks. Z.
/////////////////////////////////////////////////////////////////////////////
Thanks cgTag, I feel a lot more confident with this approach, but I cannot use this code as-is: you have assumed that I am using an id to select all of the company's results, but I'm actually using a 40-character string. I do this so I can make my SQL queries more robust.
It's impossible for you to help me unless you have all the info required so I have posted an accurate representation below:
public function view($id = null)
{
if(!$this->request->is('get') || !$id) {
//throw new ForbiddenException();
echo 'in request is NOT get or id NOT set ' . '<hr />';
}
$user_id = $this->Auth->user('id');
// regular users can never view other users.
if($user_id !== $id) {
//throw new ForbiddenException();
echo 'in $user_id !== $id ' . '<hr />';
}
// Declare client id 1.
if ($this->cid1() === false) {
echo 'in throw exception ' . '<hr />';
}
else {
$c1 = null;
$c1 = $this->cid1();
}
$company_ids = $this->getCompanyIds($c1);
$area_ids = $this->getAreaIds($user_id, $c1);
$team_ids = $this->getTeamIds($user_id, $c1);
// company_id does not exist which will cause an unknown column error.
// The column I select by is cid_1 so I have changed this column to cid_1 as shown below.
$user = $this->Users->find()
->where([
'id' => $id,
'cid_1 IN' => $company_ids,
'area_id IN' => $area_ids,
'team_id IN' => $team_ids,
'status' => 1
])
->firstOrFail();
$this->set(compact('user'));
}
The functions:
public function cid1()
{
$session = $this->request->session();
if ($session->check('Cid.one')) {
$c1 = null;
$c1 = $session->read('Cid.one');
if (!is_string($c1) || is_numeric($c1) || (strlen($c1) !== 40)) {
return false;
}
return $c1;
}
return false;
}
public function getCompanyIds($c1 = null)
{
$query = $this->Users->find()
->where(['status' => 1])
->andWhere(['cid_1' => $c1]);
return $query;
}
public function getAreaIds($c1 = null, $user_id = null)
{
$query = $this->Users->find()
->where(['status' => 1])
->andWhere(['cid_1' => $c1])
->andWhere(['area_id' => $user_id]);
return $query;
}
public function getTeamIds($c1 = null, $user_id = null)
{
$query = $this->Users->find()
->where(['status' => 1])
->andWhere(['cid_1' => $c1])
->andWhere(['team_id' => $user_id]);
return $query;
}
With this code I get the following error:
Error: SQLSTATE[21000]: Cardinality violation: 1241 Operand should contain 1 column(s)
I don't know if your example will work with this new information but at least you have all the information now.
If it can be amended, great, but if not I really don't mind. And I do appreciate the time you've put aside to try to help.
Thanks Z
/////////////////////////////////////////////////////////////////////////////
#tarikul05 - Thanks for the input.
Your suggestion is very similar to my first effort at addressing this security issue, but I went for security through obscurity and hid the id in an 80-character string, example below.
// In a cell
public function display($id = null)
{
// Encrypt the id to pass with view and edit links.
$idArray = str_split($id);
foreach($idArray as $arrkey => $arrVal) {
$id0 = "$idArray[0]";
$id1 = "$idArray[1]";
$id2 = "$idArray[2]";
$id3 = "$idArray[3]";
}
// Generate string for the id to be obscured in.
$enc1 = null;
$enc1 = sha1(uniqid(mt_rand(), true));
$enc2 = null;
$enc2 = sha1(uniqid(mt_rand(), true));
$encIdStr = $enc1 . $enc2;
// Split the string.
$encIdArray = null;
$encIdArray = str_split($encIdStr);
// Generate the coded sequence.
$codedSequence = null;
$codedSequence = array(9 => "$id0", 23 => "$id1", 54 => "$id2", 76 => "$id3");
// Replace the id in the random string.
$idTemp = null;
$idTemp = array_replace($encIdArray, $codedSequence);
// Implode the array.
$encryptedId = null;
$encryptedId = implode("",$idTemp);
// Send the encrypted id to the view.
$this->set('encryptedId', $encryptedId);
}
And then decrypted with
// In function in the app controller
public function decryptTheId($encryptedId = null)
{
$idArray = str_split($encryptedId);
foreach($idArray as $arrkey => $arrVal) {
$id0 = "$idArray[9]";
$id1 = "$idArray[23]";
$id2 = "$idArray[54]";
$id3 = "$idArray[76]";
}
$id = null;
$id = $id0.$id1.$id2.$id3;
return $id;
}
The problem with this was that when testing I managed to get the script to error, which revealed the array positions. That would have undermined the security-by-obscurity principle and made it a lot easier for a hacker.
Your suggestion is neater than my obscurity method but I believe md5 has been cracked therefore it should not be used.
I'm no security expert but in my opinion checking the view and edit id against an array of permitted ids is the most secure way to address this.
Maybe I'm wrong, but if I do it this way there is no way a hacker, no matter what they try in the address bar, can see or edit data they are not meant to, and it keeps the url cleaner.
What I was originally looking/hoping for was a Cake method/function which addressed this but I couldn't find anything in the cookbook.
Thanks anyway. Z.
I would simplify your code so that the SQL that fetches the user record only finds that record if the current user has permission to see it. Follow this approach even when those conditions depend on associated data and you have to use joins.
You create the SQL conditions and then call firstOrFail() on the query. This throws a NotFoundException if there is no match for the record.
public function view($id = null) {
if(!$this->request->is('get') || !$id) {
throw new ForbiddenException();
}
$user_id = $this->Auth->user('id');
// regular users can never view other users.
if($user_id !== $id) {
throw new ForbiddenException();
}
$company_ids = $this->getCompanyIds($user_id);
$area_ids = $this->getAreaIds($user_id);
$team_ids = $this->getTeamIds($user_id);
$user = $this->Users->find()
->where([
'id' => $id,
'company_id IN' => $company_ids,
'area_id IN' => $area_ids,
'team_id IN' => $team_ids,
'status' => 1
])
->firstOrFail();
$this->set(compact('user'));
}
The above logic should be sound when a user belongs to a hierarchical structure of data, whereby they can view many users, but only if those users belong to one of the upper associations they have access to.
It works because of the IN clause of the where conditions.
Note: The IN operator throws an error if the array is empty. When you have users who can see all "teams" just exclude that where condition instead of using an empty array.
The key here is to have functions which return an array of allowed parent associations. For example, getCompanyIds($user_id) would return just the company IDs the current user is allowed access to.
I think if you implement it this way then the logic is easy to understand, the security is solid and a simple firstOrFail() prevents access.
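The idea of "only find the record if it falls inside the caller's allowed scopes, otherwise fail" can be shown without any framework. The following is a rough JavaScript sketch over an in-memory array, not CakePHP code. The names findScopedOrFail and NotFoundError, the scope object and the sample records are all hypothetical. A scope list that is null or undefined is treated as "no restriction", mirroring the note above about leaving out the IN condition rather than passing an empty array.

```javascript
class NotFoundError extends Error {}

// Return the record with the given id only if it is active and inside every
// scope list that was provided; otherwise throw, like firstOrFail() would.
function findScopedOrFail(users, id, scope) {
  // A missing list means the caller may see all values for that column.
  const inScope = (value, allowed) => allowed == null || allowed.includes(value);
  const user = users.find(u =>
    String(u.id) === String(id) &&
    u.status === 1 &&
    inScope(u.company_id, scope.companyIds) &&
    inScope(u.area_id, scope.areaIds) &&
    inScope(u.team_id, scope.teamIds));
  if (!user) throw new NotFoundError("user " + id + " not found");
  return user;
}

// Hypothetical data: a manager of area 1 may see every team in that area.
const users = [
  { id: 1378, company_id: 7, area_id: 1, team_id: 10, status: 1 },
  { id: 1215, company_id: 7, area_id: 2, team_id: 20, status: 1 },
];
const managerScope = { companyIds: [7], areaIds: [1], teamIds: null };

console.log(findScopedOrFail(users, "1378", managerScope).id);
try {
  findScopedOrFail(users, "1215", managerScope);
} catch (e) {
  console.log(e instanceof NotFoundError);
}
```

Changing the id in the address bar then makes no difference: a record outside the caller's scope is simply never found.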

Sitecore HOWTO: Search item bucket for items with specific values

I have an item bucket with more than 30 000 items inside. What I need is to quickly search for items that have a particular field set to a particular value, or even better, to make something like a SELECT WHERE fieldValue IN (1,2,3,4) statement. Are there any ready solutions?
I searched the web and the only thing I found is "Developer's Guide to Item Buckets and Search", but there are no code examples.
You need something like this. The Bucket item is an IIndexable so it can be searched using Sitecore 7 search API.
The code snippet below can easily be adapted to meet your needs; it's just a question of modifying the where clause. If you need any further help with the Sitecore 7 syntax, just write a comment on the QuickStart blog post below and I'll get back to you.
var bucketItem = Sitecore.Context.Database.GetItem(bucketPath);
if (bucketItem != null && BucketManager.IsBucket(bucketItem))
{
using (var searchContext = ContentSearchManager.GetIndex(bucketItem as IIndexable).CreateSearchContext())
{
var result = searchContext.GetQueryable<SearchResultItem>().Where(x => x.Name == itemName).FirstOrDefault();
if(result != null)
Context.Item = result.GetItem();
}
}
Further reading on my blog post here:
http://coreblimey.azurewebsites.net/sitecore-7-search-quick-start-guide/
Using Sitecore Content Editor:
Go to the bucket item, then in the search tab start typing the following (replace fieldname and value with the actual field name and value):
custom:fieldname|value
Then hit enter and you will see the result of the query. You can run multiple queries at once if you want.
Using Sitecore Content Search API:
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Linq;
using Sitecore.ContentSearch.SearchTypes;
using Sitecore.ContentSearch.Linq.Utilities;
using Sitecore.Data;
using Sitecore.Data.Items;
using System.Collections.Generic;
using System.Linq;
ID bucketItemID = new ID("GUID of your bucket item");
ID templateID = new ID("Guid of your item's template under bucket");
string values = "1,2,3,4,5";
using (var context = ContentSearchManager.GetIndex("sitecore_web_index").CreateSearchContext())
{
// Restrict the search to items of the given template that live under the bucket.
var predicate = PredicateBuilder.True<SearchResultItem>();
predicate = predicate.And(item => item.TemplateId == templateID
&& item.Paths.Contains(bucketItemID));
// Build the OR group once, outside the loop, so that every value is kept.
var innerPredicate = PredicateBuilder.False<SearchResultItem>();
foreach(string val in values.Split(','))
{
innerPredicate = innerPredicate.Or(item => item["FIELDNAME"] == val);
}
predicate = predicate.And(innerPredicate);
// Run the combined predicate against the index.
var result = context.GetQueryable<SearchResultItem>().Where(predicate).GetResults();
List<Item> ResultsItems = new List<Item>();
foreach (var hit in result.Hits)
{
Item item = hit.Document.GetItem();
if(item !=null)
{
ResultsItems.Add(item);
}
}
}
The following links can give good start with the Search API:
http://www.fusionworkshop.co.uk/news-and-insight/tech-lab/sitecore-7-search-a-quickstart-guide#.VPw8AC4kWnI
https://www.sitecore.net/learn/blogs/technical-blogs/sitecore-7-development-team/posts/2013/06/sitecore-7-poco-explained.aspx
https://www.sitecore.net/learn/blogs/technical-blogs/sitecore-7-development-team/posts/2013/05/sitecore-7-predicate-builder.aspx
Hope this helps!

Which algorithm to find the only one duplicate word in a string?

This is very common interview question:
There's an all-English sentence which contains exactly one duplicated word, for example:
input string: today is a good day is true
output: is
I have an idea:
1. Read every character from the string, using some hash function to compute the hash value until reaching a space (' '), then put that hash value in a hash table.
2. Repeat Step 1 until the end of the string. If there's a duplicate hash value, return that word, else return null.
Is that practical?
Your approach is reasonable (actually the best I can think of). Still, take into account the fact that a collision may occur: even if the hashes are the same, compare the words.
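A minimal sketch of this idea in JavaScript, assuming words are separated by whitespace and compared case-sensitively. The built-in Set is a hash-based collection that compares the actual strings, so hash collisions are handled for you. The function name firstDuplicateWord is just illustrative.

```javascript
// Return the first word that occurs a second time, or null if there is none.
function firstDuplicateWord(sentence) {
  const seen = new Set();
  for (const word of sentence.split(/\s+/).filter(Boolean)) {
    if (seen.has(word)) return word; // second occurrence found
    seen.add(word);
  }
  return null;
}

console.log(firstDuplicateWord("today is a good day is true")); // prints "is"
```

This makes a single pass over the words, and it stops as soon as the duplicate is found.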
It would work, but you can make your life a lot easier.
Are you bound to a specific programming language?
If you code in C#, for example, I would suggest you use the String.Split function (and split by " ") to transform your sentence into a list of words. Then you can easily find duplicates by using LINQ (see How to get duplicate items from a list using LINQ?) or by iterating through your list.
You can use a Map to count the words, and also report how many times the duplicate word is found in the string.
var a = 'sometimes I feel clever and sometimes not';
var findDuplicateWord = a => {
var map = new Map();
a = a.split(' ');
a.forEach(e => {
if (map.has(e)) {
let count = map.get(e);
map.set(e, count + 1);
} else {
map.set(e, 1);
}
});
let dupe = [];
let hasDupe = false;
map.forEach((value, key) => {
if (value > 1) {
hasDupe = true;
dupe.push(key, value);
}
});
console.log(dupe);
return hasDupe;
};
findDuplicateWord(a);
//output
/* Native Browser JavaScript
[ 'sometimes', 2 ]
=> true */

ServiceStack.Text: Use Linq and the ConvertAll

I am using the ServiceStack.Text JsonObject parser to map into my domain model. I basically have everything working, except when using LINQ to filter on ArrayObjects and then trying to convert the result using ConvertAll. After using LINQ, I cannot get around adding element by element to a JsonArrayObjects list and then passing it on.
var tmpList = x.Object("references").ArrayObjects("image").Where(y => y.Get<int>("type") != 1).ToList();
JsonArrayObjects tmpStorage = new JsonArrayObjects();
foreach (var pic in tmpList) {
tmpStorage.Add(pic);
}
if (tmpStorage.Count > 0) {
GalleryPictures = tmpStorage.ConvertAll(RestJsonToModelMapper.jsonToImage);
}
Question:
Is there a more elegant way to get from IEnumerable back to JsonArrayObjects?
Casting will not work, since Where copies elements into a new list instead of manipulating the old one; therefore the result is not a downcast JsonArrayObjects, but rather a new List object.
Best
Considering this more elegant is arguable, but I would probably do:
var tmpStorage = new JsonArrayObjects();
tmpList.ForEach(pic => tmpStorage.Add(RestJsonToModelMapper.jsonToImage(pic)));
And if this kind of conversion is used frequently, you may create an extension method:
public static JsonArrayObjects ToJsonArrayObjects(this IEnumerable<JsonObject> pics)
{
var tmpStorage = new JsonArrayObjects();
foreach(var pic in pics)
{
tmpStorage.Add(RestJsonToModelMapper.jsonToImage(pic));
}
return tmpStorage;
}
This way you would end up with simpler consumer code:
var tmpStorage = x.Object("references")
.ArrayObjects("image")
.Where(y => y.Get<int>("type") != 1)
.ToJsonArrayObjects();
Like this?
var pictures = x.Object("references")
.ArrayObjects("image")
.Where(y => y.Get<int>("type") != 1)
.Select(RestJsonToModelMapper.jsonToImage)
.ToList();
