SphinxQL - best search mode for 1- to 4-word searches

I'm working on a project with the Sphinx search engine using SphinxQL. My problem is the following.
This is my SphinxQL query:
"SELECT *, country FROM all_gebrauchte_products WHERE MATCH('#searchtext (".$searchQuery.")') AND country='".$where."' ORDER BY WEIGHT() DESC LIMIT ".$page.", ".$limit." OPTION ranker=expr('sum(lcs)')"
The number of results varies strongly with the number of search words:
Honda => 50 results
Honda CBR => 9 results
Honda CBR 1000 => 2 results
This is my MySQL query:
SELECT COUNT(*) FROM all_gebrauchte_products WHERE MATCH(gebr_id, gebr_hersteller, gebr_modell, gebr_ukat, gebr_kat, gebr_bemerkung) AGAINST ('".$searchQuery."' IN BOOLEAN MODE);
The results are:
Honda => 67 results
Honda CBR => 67 results
Honda CBR 1000 => 84 results
The MySQL query works in Boolean Mode - so a query for Honda CBR 1000 also finds Honda VTR 1000, or so I think...
So, what would be the best search mode to get close to the second result set?
Can anybody explain which mode would be right and show an example of how to write the SphinxQL query correctly?
Thanks in advance...

The main difference is that the implicit operator in MySQL's BOOLEAN mode is OR. A multi-word query only requires one of the words to match (unless a word is prefixed with +).
But in Sphinx's extended match mode (which is what SphinxQL uses), the implicit operator is AND, so it requires ALL the words.
You could use the quorum operator to get the default 'OR' behaviour:
... MATCH('#searchtext (\"".$searchQuery."\"/1)') ...
i.e. only one of the words is required.
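Dropped into the original query, the full statement would look something like this (just a sketch; $searchQuery, $where, $page and $limit are the same PHP variables as in the question):
"SELECT *, country FROM all_gebrauchte_products WHERE MATCH('#searchtext (\"".$searchQuery."\"/1)') AND country='".$where."' ORDER BY WEIGHT() DESC LIMIT ".$page.", ".$limit." OPTION ranker=expr('sum(lcs)')"
With /1 only one of the query words has to be present, and the ORDER BY WEIGHT() ranking then decides which matches come first.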
--
The MySQL query works in Boolean Mode - so a query for Honda CBR 1000 also finds Honda VTR 1000, or so I think...
Well, yes. But because just one word is required, it also finds all the documents with, say, '1000' in them, even if they are not Honda. That is why the three-word query returns more documents.

Related

DAX measure that returns the number of distinct values that have duplicates within the table

Fairly new and self-taught with DAX. I run an accuracy log that tracks incoming applications (Application[Application_ID]) and errors committed in processing that application (Error_Log[Application_ID]).
I want to find the number of applications that contain more than one error. For example, if 10 applications have errors, 6 of those applications have 1 error and the rest have 2 or more errors, I want to return the value of 4.
I'm trying to avoid a calculated column (like a "Multiple_Errors" TRUE/FALSE column) as its refresh times are already longer than I'd like, but if it's unavoidable, it could be accommodated.
We were able to build an Excel formula with SUMPRODUCT for a very high level summary of the information, but I want more granularity than that formula can give me.
Online searching has only turned up articles on how to count duplicates, flag duplicates, remove duplicates, or do some other task, whereas I need to count the distinct number of values that are duplicated within a table.
I have tried a few different DAX measures, but all of them have yielded incorrect results. For example...
=
CALCULATE (
    DISTINCTCOUNT ( Error_Log[Application_ID] ),
    FILTER ( Error_Log, COUNTA ( Error_Log[Application_ID] ) > 1 )
)
Drilling down into this result shows that all of the applications with errors are being pulled over, rather than only those with greater than one error.
After playing with a few options, I haven't been able to find the solution. Any help/pointers/direction would be greatly appreciated!
I think you are looking for something like this:
Measure =
COUNTROWS (
    FILTER (
        SUMMARIZE (
            Error_Log,
            Error_Log[Application_ID],
            "count", COUNTROWS ( Error_Log )
        ),
        [count] > 1
    )
)
The SUMMARIZE function returns a virtual summarized table, with the row count for each Application_ID in a column called "count". The outer COUNTROWS function then returns the number of rows in that virtual table where [count] is greater than 1.
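An equivalent formulation (just a sketch, using the same Error_Log[Application_ID] column; the measure name is arbitrary) iterates the distinct application IDs directly:
Multiple Error Apps =
COUNTROWS (
    FILTER (
        VALUES ( Error_Log[Application_ID] ),
        CALCULATE ( COUNTROWS ( Error_Log ) ) > 1
    )
)
Here CALCULATE forces context transition, so COUNTROWS only counts the error rows belonging to the application ID currently being iterated.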
Your measure is fine and works as defined. Please see the attached screen.
App ID 100 has 4 Type 1 errors; 101 has 2 Type 2 errors and 1 Type 3 error, but because of the distinct count each of them shows 1.
102 has a single Type 3 error, but since we are using Error Type to group the log, Type 3 shows two counts (one each for 102 and 101).
Note that the COUNTA ( Error_Log[Application_ID] ) > 1 condition is satisfied for 102 as well because of the grouping column.
We do not see Type 6 in the pivot table on the right because of COUNTA ( Error_Log[Application_ID] ) > 1.
So, although the measure works, we may be misinterpreting the result, or not using the right DAX for the requirement.

How to do a calculation by evaluating several conditions?

I have this table in Access called invoices.
It has a field that performs a calculation based on two fields, namely field_price*field_quantity. I have now added two new columns containing the units of measure for field_weight and field_quantity, named field_priceUnit and field_quantityUnit.
Now, instead of just multiplying, I want it to check whether the units of measure match; if they don't match, it should convert field_quantity into the unit of measure of field_priceUnit.
example:
Row1:
ID:1|Field_Quantity:23|field_quantityUnit:LB|Field_weight:256|field_priceunit:KG|field_price:24| Calculated_Column:
the calculated_column should do the calculation this way.
1. if field_quantityunit=LB and field_priceunit=LB then field_quantity*field_price
else
if field_quantityUnit=LB and field_priceUnit=KG
THEN ((field_quantity/0.453592)*field_price)
Please help me.
I have to do this for multiple conditions.
Field_priceUnit may have the values LB, KG, and MT.
Field_quantityUnit may also have the values LB, KG, and MT.
If the two units don't match, then I want to do the conversion and calculate based on the converted value, as seen in the example.
Thank you
The following formula should get you running if your units are only lb and kg and you only have to check one direction:
IIf(field_quantityunit='LB' AND field_priceunit='LB', field_quantity*field_price, (field_quantity/0.453592)*field_price)
This doesn't scale well, though, as you may have to convert field_price or you may add other units. This IIf formula will grow WAY out of hand quickly.
Instead create a new table called unit_conversion or whatever you like:
unit | conversion
lb | .453592
kg | 1
g | 1000
mg | 1000000
Now in your query join:
LEFT OUTER JOIN unit_conversion AS qty_conversion
    ON field_quantityunit = qty_conversion.unit
LEFT OUTER JOIN unit_conversion AS price_conversion
    ON field_priceUnit = price_conversion.unit
Up in your SELECT portion of the query you can now just do:
(field_quantity * qty_conversion.conversion) * (field_price * price_conversion.conversion)
And you don't have to worry what the units are. They will all convert over to a kilogram and get multiplied.
You could convert everything to pounds, or really any unit of weight, if you want. The nice thing is that you only need to add new units to the conversion table to handle them in any SQL you write this way, so it's very scalable.
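For reference, here is a sketch of how those pieces could fit into one Access query. The invoices table name and field names are taken from the question, the conversion expression is copied verbatim from above, and the alias inv plus the parentheses around the first join (which Access wants when chaining two joins) are my own additions:
SELECT inv.*,
       (inv.field_quantity * qty_conversion.conversion) * (inv.field_price * price_conversion.conversion) AS calculated_column
FROM (invoices AS inv
      LEFT JOIN unit_conversion AS qty_conversion
      ON inv.field_quantityUnit = qty_conversion.unit)
     LEFT JOIN unit_conversion AS price_conversion
     ON inv.field_priceUnit = price_conversion.unit;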

Vim multiple filtering of a file, with 2 filters based upon number values

I do not know if that title will sound adequate …
Let us say I have a file (> 1000 lines) with a homogeneous structure throughout, consisting of three "fields" separated by a space:
1. an integer (negative or positive)
<space>
2. another integer (negative or positive)
<space>
3. some text (description)
The integers are >-10000 and < 10000
My problem is: how can I
a) filter this file with criteria such as "1st integer <= 1000" AND "2nd integer >= 250" AND "text contains: Boston OR New-York",
b) and put the subset in a new buffer, allowing me to read the results, and only the results, of the filter(s)?
I wish to do this with Vim only, though I do not know whether it is feasible or reasonable (in any case it is above my skill level).
Thanks
@FDinoff: sorry, I should have done what you suggest, of course:
It could be a chronology with a StartDate, an EndDate, and a Description:
1 -200 -50 Period one in Italy
2 -150 250 Period one in Greece
3 -50 40 Period two in Italy
4 10 10 Some event in Italy
5 20 20 Event two in Greece
The filter could be: filter the items where (to mimic SQL) StartDate <= -50 AND EndDate >= 0 AND Description contains Greece, with the resulting filter => line 2.
The following generic form will match the numeric parts of your format:
^\s*-\?\d\+\s\+-\?\d\+
To implement restrictions on the numbers, replace each -\?\d\+ with a more specific pattern. For example, for <= -50:
-\([5-9][0-9]\|[1-9][0-9]\{2,}\)
That is, - followed by either a 2-digit number whose first digit is >= 5, or a number with 3 or more digits.
Similarly, for >= 250 - that is, 250-299, 300-999, or anything with 4 or more digits:
\(2[5-9][0-9]\|[3-9][0-9]\{2}\|[1-9][0-9]\{3,}\)
Combining the two:
^\s*-\([5-9][0-9]\|[1-9][0-9]\{2,}\)\s\+\(2[5-9][0-9]\|[3-9][0-9]\{2}\|[1-9][0-9]\{3,}\)
If you also need to filter by some pattern in the description, append that:
^\s*-\([5-9][0-9]\|[1-9][0-9]\{2,}\)\s\+\(2[5-9][0-9]\|[3-9][0-9]\{2}\|[1-9][0-9]\{3,}\)\s\+.\{-}Greece
.\{-} is the lazy version of .*.
To filter by this pattern and append every matching line to a file, use the following (>> appends each matching line instead of overwriting the file, and ! lets the first write create it):
:g/pattern/.w! >> filename
Thus, to filter by "first number <= -50 AND second number >= 250 AND 'Greece' in description" and write the output to greece.out:
:g/^\s*-\([5-9][0-9]\|[1-9][0-9]\{2,}\)\s\+\(2[5-9][0-9]\|[3-9][0-9]\{2}\|[1-9][0-9]\{3,}\)\s\+.\{-}Greece/.w! >> greece.out
More complex ranges quickly make this even more ridiculous; you're probably better off parsing the file and filtering with something other than regex.
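To get the filtered lines into a new buffer inside Vim itself (part b of the question), one option is simply to open the file written above in a split, e.g. :split greece.out. Another option, which skips the intermediate file, is to collect the matches in a register and paste them into a scratch buffer (a sketch; register a is an arbitrary choice and pattern stands for the full pattern above):
qaq
:g/pattern/y A
:new
:put a
Here qaq clears register a, :g/pattern/y A appends every matching line to it, :new opens an empty window, and :put a pastes the collected lines there.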

Rounding in Google Query Language

I'd like to implement a series of queries with rounding in Google Query Language, such as:
select round(age,-1), count(id) group by round(age,-1)
or any combination of int/floor/etc.
select int(age/10)*10, count(id) group by int(age/10)*10
Is there any way to do that? I suspect not, as the list of scalar functions in GQL is very limited, but I wonder if there's a workaround.
http://code.google.com/apis/chart/interactive/docs/querylanguage.html#scalar_functions
No, I don't think we can do rounding in GQL...
The link that you have shown is not for Google App Engine...
Add a format at the end of the query, for example:
"select age format age '0'"
See Google Query Language Reference - Format.
Although there are no explicit round or floor functions, one can implement them using the modulus operator % and the fact that value % 1 returns the decimal part of value. So value - value % 1 is equivalent to floor(value), at least for positive values. More generally, value - value % k should be equivalent to floor(value / k) * k; for example, 37 - 37 % 10 = 30.
In your case, your query should look like:
select age - age % 10, count(id) group by age - age % 10

Twitter api - search too complex?

Any idea why Twitter is throwing this error?
GET https://search.twitter.com/search.json?q=Middle%20Tennessee%20State%20Blue%20Raiders%20Florida%20International%20Golden%20Panthers%20win%20OR%20lose%20-rt%20-from%3Aespn&&lang=en&since=2011-02-09: 403: Sorry, your query is too complex. Please reduce complexity and try again.
As of August 2011, at least, it appears the maximum query string length is quite high. It worked fine for me with a max length of 300; when I moved to 500 characters I ended up with three queries, the 2nd of which failed:
q1: OK: 491 characters, 23 OR clauses
q2: FAILED: 493 characters, 39 OR clauses
q3: OK: 203 characters, 17 OR clauses
So, it might turn out to be 492+ characters, or it might be 24+ clauses. I suspect some combination.
I've not spent more time narrowing it down further, since if the limits were fixed, Twitter would presumably publish them. I'm going to split my queries at 300 characters or 20 clauses, whichever comes first, and hope that is reasonably future-proof.
UPDATE: Under "Best Practices" on https://dev.twitter.com/docs/using-search they suggest limiting a query to 10 clauses. Obviously from my above results it is not the current hard limit, but 10 should be the future-proof limit I was after.
P.S. Back to the actual question, I notice you have a "&&" typo in your query. I wonder if this could have triggered the complaint? (untested idea)
As of 8 February 2018:
Twitter search is limited to 50 OR clauses (note that 50 clauses means only 49 ORs, because, for example, word1 OR word2 contains two clauses and one OR).
Twitter search is limited to 500 characters.
My test for the OR clause is:
This search works: h1 OR h2 OR h3 OR h4 OR h5 OR h6 OR h7 OR h8 OR h9 OR h10 OR h11 OR h12 OR h13 OR h14 OR h15 OR h16 OR h17 OR h18 OR h19 OR h20 OR h21 OR h22 OR h23 OR h24 OR h25 OR h26 OR h27 OR h28 OR h29 OR h30 OR h31 OR h32 OR h33 OR h34 OR h35 OR h36 OR h37 OR h38 OR h39 OR h40 OR h41 OR h42 OR h43 OR h44 OR h45 OR h46 OR h47 OR h48 OR h49 OR h50
However, if we add: OR h51, it fails
My test for the characters count is:
This search works (500 characters): hell1 OR hell2 OR hell3 OR hell4 OR hell5 OR hell6 OR hell7 OR hell8 OR hell9 OR hell10 OR hell11 OR hell12 OR hell13 OR hell14 OR hell15 OR hell16 OR hell17 OR hell18 OR hell19 OR hell20 OR hell21 OR hell22 OR hell23 OR hell24 OR hell25 OR hell26 OR hell27 OR hell28 OR hell29 OR hell30 OR hell31 OR hell32 OR hell33 OR hell34 OR hell35 OR hell36 OR hell37 OR hell38 OR hell39 OR hell40 OR hell41 OR hell42 OR hell43 OR hell44 OR hell45 OR hell46 OR hell47 OR hell48 OR hell49 OR hell50abcdefghijklm
However, if we add an n (501 characters) to the last word (hell50abcdefghijklmn), it fails.
Note that I have just realised that if we remove the character o from the word hello, the result is hell (is the world sending me a message? xD)
Read the documentation. "Queries are limited 140 URL encoded characters." Your query string is 156 characters.
