How to get all combinations from Sphinx search? - search

I search via Sphinx SQL as
SELECT * FROM sphinx.articles WHERE query='something everything;mode=all';
+----------+--------+-------------------------------+
| id | weight | query |
+----------+--------+-------------------------------+
| 2324266 | 2 | something everything;mode=all |
| 6997338 | 2 | something everything;mode=all |
| 12002597 | 2 | something everything;mode=all |
| 12543040 | 2 | something everything;mode=all |
| 16314547 | 2 | something everything;mode=all |
| 19094425 | 2 | something everything;mode=all |
| 21398510 | 2 | something everything;mode=all |
| 23020445 | 2 | something everything;mode=all |
| 23040584 | 2 | something everything;mode=all |
| 24059424 | 2 | something everything;mode=all |
| 26009287 | 2 | something everything;mode=all |
| 27476187 | 2 | something everything;mode=all |
| 30488694 | 2 | something everything;mode=all |
| 30698992 | 2 | something everything;mode=all |
| 33191618 | 2 | something everything;mode=all |
| 33900227 | 2 | something everything;mode=all |
| 35671048 | 2 | something everything;mode=all |
| 39324937 | 2 | something everything;mode=all |
| 40373341 | 2 | something everything;mode=all |
| 40391221 | 2 | something everything;mode=all |
+----------+--------+-------------------------------+
20 rows in set (0.233 sec)
and get the total results by
SHOW STATUS LIKE 'Sphinx_total_found';
+--------------------+--------+
| Variable_name | Value |
+--------------------+--------+
| Sphinx_total_found | 356179 |
+--------------------+--------+
1 row in set (0.004 sec)
I wonder if it is possible to get all possible combinations indexed by Sphinx?
For example, to get the number of results for all combinations of two keywords as
+-------------------+-------------------------------+
| query | total_results |
+-------------------+-------------------------------+
| something everything;mode=all | 58844 |
| word1 word2;mode=all | 11 |
| word1 word3;mode=all | 234 |
| word2 word3;mode=all | 663 |
| word2 word4;mode=all | 9115 |
+-------------------+-------------------------------+
I understand that Sphinx dynamically finds the results by the given keywords in the query, but theoretically, all indexed keywords are known, and we can make the combinations. However, it is too slow if we do the queries individually.

Related

Calculating the values and appending them as a column in the dataframe [duplicate]

This question already has answers here:
Groupby value counts on the dataframe pandas
(6 answers)
Closed 2 years ago.
I'm working on an airline dataset. I've to calculate the number of adults, children's and infants per airline_pnr number and then append those values as a column in a data frame.
Pax Type: Passenger type(Adult(ADT), Children(CHD), Infant(INF))
+-------------+----------+
| airline_pnr |Pax_Type |
+-------------+----------+
| EIPBGB | ADT |
| EIPBGB | ADT |
| EIPBGB | CHD |
| EIPBGB | INF |
| UH7EQV | ADT |
| UH7EQV | ADT |
| YVEEW | ADT |
| YVEEW | ADT |
| DR6YWR | ADT |
| DR6YWR | ADT |
| DR6YWR | ADT |
| DR6YWR | CHD |
| DR6YWR | INF |
| QJ2ESP | ADT |
| QJ2ESP | CHD |
| JL6E9T | ADT |
| VGYD5V | ADT |
| YVEG1 | ADT |
| YVEG1 | ADT |
+-------------+----------+
Expected output:
+--------+----------+--------------+-----------------+---------------+
|air_pnr | Pax Type | no_of_adults | no_of_childrens | no_of_infants |
+--------+----------+--------------+-----------------+---------------+
| EIPBGB | ADT | 2 | 1 | 1 |
| UH7EQV | ADT | 2 | 0 | 0 |
| YVEEW | ADT | 2 | 0 | 0 |
| DR6YWR | ADT | 3 | 1 | 1 |
| QJ2ESP | ADT | 1 | 1 | 0 |
| JL6E9T | ADT | 1 | 0 | 0 |
| VGYD5V | ADT | 1 | 0 | 0 |
| YVEG1 | ADT | 2 | 0 | 0 |
+--------+----------+--------------+-----------------+---------------+
My Efforts:
df= df.value_counts(['airline_pnr', 'Pax Type'])
df = df.to_frame()
df= df.rename(columns = {0: "freq"})
But not getting the desired results
you can use groupby on the 'air_pnr' variable, and them use the size()
which counts the number of occurrences of each value.
df.groupby(['air_pnr','Pax_Type']).size()

Pandas: How to merge cells in the dataframe from a specific column using pandas?

I want to remove the duplicated names from the cells and merge them. This dataframe is generated after concatenating multiple dataframes.
My dataframe as under:
| | Customer ID | Category | VALUE |
| -:|:----------- |:------------- | -------:|
| 0 | GETO90 | Baby Sets | 1090.0 |
| 1 | GETO90 | Girls Dresses | 5357.0 |
| 2 | GETO90 | Girls Jumpers | 2823.0 |
| 3 | SETO90 | Girls Top | 3398.0 |
| 4 | SETO90 | Shorts | 7590.0 |
| 5 | SETO90 | Shorts | 7590.0 |
| 6 | RETO90 | Pants | 6590.0 |
| 7 | RETO90 | Pants | 6590.0 |
| 8 | RETO90 | Jeans | 8590.0 |
| 9 | YETO90 | Jeans | 9590.0 |
| 10| YETO90 | Jeans | 2590.0 |
I want to merge the first column and the expected dataframe is mentioned below:
| | Customer ID | Category | VALUE |
| -:|:----------- |:------------- | -------:|
| 0 | GETO90 | Baby Sets | 1090.0 |
| 1 | | Girls Dresses | 5357.0 |
| 2 | | Girls Jumpers | 2823.0 |
| 3 | SETO90 | Girls Top | 3398.0 |
| 4 | | Shorts | 7590.0 |
| 5 | | Shorts | 7590.0 |
| 6 | RETO90 | Pants | 6590.0 |
| 7 | | Pants | 6590.0 |
| 8 | | Jeans | 8590.0 |
| 9 | YETO90 | Jeans | 9590.0 |
| 10| | Jeans | 2590.0 |
Use duplicated with loc:
df.loc[df.duplicated('Customer ID'), 'Customer ID'] = ''

Remove groups from pandas where {condition}

I have dataframe like this:
+---+--------------------------------------+-----------+
| | envelopeid | message |
+---+--------------------------------------+-----------+
| 1 | d55edb65-dc77-41d0-bb53-43cf01376a04 | CMN.00002 |
| 2 | d55edb65-dc77-41d0-bb53-43cf01376a04 | CMN.00004 |
| 3 | d55edb65-dc77-41d0-bb53-43cf01376a04 | CMN.11001 |
| 4 | 5cb72b9c-adb8-4e1c-9296-db2080cb3b6d | CMN.00002 |
| 5 | 5cb72b9c-adb8-4e1c-9296-db2080cb3b6d | CMN.00001 |
| 6 | f4260b99-6579-4607-bfae-f601cc13ff0c | CMN.00202 |
| 7 | 8f673ae3-0293-4aca-ad6b-572f138515e6 | CMN.00002 |
| 8 | fee98470-aa8f-4ec5-8bcd-1683f85727c2 | TKP.00001 |
| 9 | 88926399-3697-4e15-8d25-6cb37a1d250e | CMN.00002 |
| 10| 88926399-3697-4e15-8d25-6cb37a1d250e | CMN.00004 |
+---+--------------------------------------+-----------+
I've grouped it with grouped = df.groupby('envelopeid')
And I need to remove all groups from the dataframe and stay only that groups that have messages (CMN.00002) or (CMN.00002 and CMN.00004) only.
Desired dataframe:
+---+--------------------------------------+-----------+
| | envelopeid | message |
+---+--------------------------------------+-----------+
| 7 | 8f673ae3-0293-4aca-ad6b-572f138515e6 | CMN.00002 |
| 9 | 88926399-3697-4e15-8d25-6cb37a1d250e | CMN.00002 |
| 10| 88926399-3697-4e15-8d25-6cb37a1d250e | CMN.00004 |
+---+--------------------------------------+-----------+
tried
(grouped.message.transform(lambda x: x.eq('CMN.00001').any() or (x.eq('CMN.00002').any() and x.ne('CMN.00002' or 'CMN.00004').any()) or x.ne('CMN.00002').all()))
but it is not working properly
Try:
grouped = df.loc[df['message'].isin(['CMN.00002', 'CMN.00002', 'CMN.00004'])].groupby('envelopeid')
Try this: df[df.message== 'CMN.00002']
outdf = df.groupby('envelopeid').filter(lambda x: tuple(x.message)== ('CMN.00002',) or tuple(x.message)== ('CMN.00002','CMN.00004'))
So i figured it up.
resulting dataframe will got only groups that have only CMN.00002 message or CMN.00002 and CMN.00004. This is what I need.
I used filter instead of transform.

Excel Formula to count all items in a group to see if the status is an Open status

I am working in Excel 2016. I am trying to figure out how many projects I have that have not had any part of it started. For instance if my project id is 203784 and it has 3 parts to it where 2 are Complete and 1 was Not Started. I would not want to count that. If the project had 3 parts and 2 were Not Started 1 was assigned. I would want to count that as 1. Thank you in advance you your assistance.
+----+------------+------------------+-------------+
| | A | B | C |
+----+------------+------------------+-------------+
| 1 | Project ID | Position | Status |
| 2 | 203784 | Staff | Complete |
| 3 | 203784 | Staff | Complete |
| 4 | 203784 | Staff | Not Started |
| 5 | 203785 | Maintenance | Complete |
| 6 | 203785 | Maintenance | In Progress |
| 7 | 203786 | Grounds | Complete |
| 8 | 203787 | Nurse | Complete |
| 9 | 203788 | Teacher | Complete |
| 10 | 203788 | Teacher | Complete |
| 11 | 203788 | Teacher | Complete |
| 12 | 203789 | Transportation | Complete |
| 13 | 203789 | Transportation | Complete |
| 14 | 203789 | Transportation | Complete |
| 15 | 203790 | Evacuation | Complete |
| 16 | 203790 | Evacuation | Complete |
| 17 | 203791 | Implementation | Complete |
| 18 | 203792 | Knowledge Base | Not Started |
| 19 | 203792 | Knowledge Base | Not Started |
| 20 | 203793 | Janitor | Not Started |
| 21 | 203794 | Public Relations | In Progress |
| 22 | 203795 | HR | Complete |
| 23 | 203796 | Admin | Complete |
+----+------------+------------------+-------------+
In this example. I would only want the count to show a total of 2. For project numbers 203792 and 203793.
One way would be to add a column (say Count) populated as:
=COUNTIFS(A:A,A2,C:C,"Complete")+COUNTIFS(A:A,A2,C:C,"In Progress")
and then create a PivotTable with Count as Filters and Project ID for Rows. Select 0 for the filter.

Blending Model: Oil Production

Oil Blending
An oil company produces three brands of oil: Regular, Multigrade, and
Supreme. Each brand of oil is composed of one or more of four crude stocks, each having a different lubrication index. The relevant data concerning the crude stocks are as follows.
+-------------+-------------------+------------------+--------------------------+
| Crude Stock | Lubrication Index | Cost (€/barrell) | Supply per day (barrels) |
+-------------+-------------------+------------------+--------------------------+
| 1 | 20 | 7,10 | 1000 |
+-------------+-------------------+------------------+--------------------------+
| 2 | 40 | 8,50 | 1100 |
+-------------+-------------------+------------------+--------------------------+
| 3 | 30 | 7,70 | 1200 |
+-------------+-------------------+------------------+--------------------------+
| 4 | 55 | 9,00 | 1100 |
+-------------+-------------------+------------------+--------------------------+
Each brand of oil must meet a minimum standard for a lubrication index, and each brand
thus sells at a different price. The relevant data concerning the three brands of oil are as
follows.
+------------+---------------------------+---------------+--------------+
| Brand | Minimum Lubrication index | Selling price | Daily demand |
+------------+---------------------------+---------------+--------------+
| Regular | 25 | 8,50 | 2000 |
+------------+---------------------------+---------------+--------------+
| Multigrade | 35 | 9,00 | 1500 |
+------------+---------------------------+---------------+--------------+
| Supreme | 50 | 10,00 | 750 |
+------------+---------------------------+---------------+--------------+
Determine an optimal output plan for a single day, assuming that production can be either
sold or else stored at negligible cost.
The daily demand figures are subject to alternative interpretations. Investigate the
following:
(a) The daily demands represent potential sales. In other words, the model should contain demand ceilings (upper limits). What is the optimal profit?
(b) The daily demands are strict obligations. In other words, the model should contain demand constraints that are met precisely. What is the optimal profit?
(c) The daily demands represent minimum sales commitments, but all output can be sold. In other words, the model should permit production to exceed the daily commitments. What is the optimal profit?
QUESTION
I've been able to construct the following model in Excel and solve it via OpenSolver, but I'm only able to integrate the mix for the Regular Oil.
I'm trying to work my way through the book Optimization Modeling with Spreadsheets by Kenneth R. Baker but I'm stuck with this exercise. While I could transfer the logic from another blending problem I'm not sure how to construct the model for multiple blendings at once.
I modeled the problem as a minimization problem on the cost of the different crude stocks. Using the Lubrication Index data I built the constraint for the R-Lub Index as a linear constraint. So far the answer seems to be right for the Regular Oil. However using this approach I've no idea how to include even the second Multigrade Oil.
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Decision Variables | | | | | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| | C1 | C2 | C3 | C4 | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Inputs | 1000 | 0 | 1000 | 0 | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| | | | | | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Objective Function | | | | | | Total | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Cost | 7,10 € | 8,50 € | 7,70 € | 9,00 € | | 14.800,00 € | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| | | | | | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Constraints | | | | | | LHS | | RHS |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| C1 supply | 1 | | | | | 1000 | <= | 1000 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| C2 supply | | 1 | | | | 0 | <= | 1100 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| C3 supply | | | 1 | | | 1000 | <= | 1200 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| C4 supply | | | | 1 | | 0 | <= | 1100 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| R- Lub Index | -5 | 15 | 5 | 30 | | 0 | >= | 0 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| R- Output | 1 | 1 | 1 | 1 | | 2000 | = | 2000 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| | | | | | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Blending Data | | | | | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| R- Lub | 20 | 40 | 30 | 55 | | 25 | >= | 25 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
Here is the model with Excel formulars:
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Decision Variables | | | | | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| | C1 | C2 | C3 | C4 | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Inputs | 1000 | 0 | 1000 | 0 | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| | | | | | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Objective Function | | | | | | Total | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Cost | 7,1 | 8,5 | 7,7 | 9 | | =SUMMENPRODUKT(B5:E5;B8:E8) | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| | | | | | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Constraints | | | | | | LHS | | RHS |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| C1 supply | 1 | | | | | =SUMMENPRODUKT($B$5:$E$5;B11:E11) | <= | 1000 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| C2 supply | | 1 | | | | =SUMMENPRODUKT($B$5:$E$5;B12:E12) | <= | 1100 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| C3 supply | | | 1 | | | =SUMMENPRODUKT($B$5:$E$5;B13:E13) | <= | 1200 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| C4 supply | | | | 1 | | =SUMMENPRODUKT($B$5:$E$5;B14:E14) | <= | 1100 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| R- Lub Index | -5 | 15 | 5 | 30 | | =SUMMENPRODUKT($B$5:$E$5;B15:E15) | >= | 0 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| R- Output | 1 | 1 | 1 | 1 | | =SUMMENPRODUKT($B$5:$E$5;B16:E16) | = | 2000 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| | | | | | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Blending Data | | | | | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| R- Lub | 20 | 40 | 30 | 55 | | =SUMMENPRODUKT($B$5:$E$5;B19:E19)/SUMME($B$5:$E$5) | >= | 25 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
A nudge in the right direction would be a tremendous help.
I think you want your objective to be Profit, which I would define as the sum of sales value - sum of cost.
To include all blends, develop calculations for Volume produced, Lube Index, Cost, and Value for each blend. Apply constraints for volume of stock used, volume produced, and lube index, and optimize for Profit.
I put together the model as follows ...
Columns A through D is the information you provided.
The 10's in G2:J5 are seed values for the stock volumes used in each blend. Solver will manipulate these.
Column K contains the total product volume produced. These will be constrained in different ways, as per your investigation (a), (b), and (c). It is =SUM(G3:J3) filled down.
Column L is the Lube Index for the product. As you noted, it is a linear blend - this is typically not true for blending problems. These values will be constrained in Solver. It is {=SUMPRODUCT(G3:J3,TRANSPOSE($B$2:$B$5))/$K3} filled down. Note that it is a Control-Shift-Enter (CSE) formula, required because of the TRANSPOSE.
Column M is the cost of the stock used to create the product. This is used in the Profit calculation. It is {=SUMPRODUCT(G3:J3,TRANSPOSE($C$2:$C$5))}, filled down. This is also a CSE formula.
Column N is the value of the product produced. This is used in the Profit calculation. It is =K3*C8 filled down.
Row 7 is the total stock volume used to generate all blends. These values will be constrained in Solver. It is =SUM(G3:G5), filled to the right.
The profit calculation is =SUM(N3:N5)-SUM(M3:M5).
Below is a snap of the Solver dialog box ...
It does the following ...
The objective is to maximize profit.
It will do this by manipulating the amount of stock that goes into each blend.
The first four constraints ($G$7 through $J$7) ensure the amount of stock available is not violated.
The next three constraints ($K$3 through $K$5) are for case (a) - make no more than product than there is demand.
The last three constraints ($L$3 through $L$5) make sure the lube index meets the minimum specification.
Not shown - I selected options for GRG Nonlinear and selected "Use Multistart" and deselected "Require Bounds on Variables".
Below is the result for case (a) ...
For case (b), change the constraints on Column K to be "=" instead of "<=". Below is the result ...
For case (c), change the constraints on Column K to be ">=". Below is the result ...
I think I came up with a solution, but I'm unsure if this is correct.
| Decision Variables | | | | | | | | | | | | | | | | |
|--------------------|---------|--------|--------|--------|-------------|--------|--------|--------|--------|--------|--------|--------|---|--------------------------------|----|------|
| | C1R | C1M | C1S | C2R | C2M | C2S | C3R | C3M | C3S | C4R | C4M | C4S | | | | |
| Inputs | 1000 | 0 | 0 | 800 | 0 | 300 | 0 | 1200 | 0 | 200 | 300 | 600 | | | | |
| | | | | | | | | | | | | | | | | |
| Objective Function | | | | | | | | | | | | | | Total Profit (Selling - Cost) | | |
| Cost | 7,10 € | 7,10 € | 7,10 € | 8,50 € | 8,50 € | 8,50 € | 7,70 € | 7,70 € | 7,70 € | 9,00 € | 9,00 € | 9,00 € | | 3.910,00 € | | |
| | | | | | | | | | | | | | | | | |
| Constraints | | | | | | | | | | | | | | LHS | | RHS |
| Regular | -5 | | | 15 | | | 5 | | | 30 | | | | 13000 | >= | 0 |
| Multi | | -15 | | | 5 | | | -5 | | | 20 | | | 0 | >= | 0 |
| Supreme | | | -30 | | | -10 | | | -20 | | | 5 | | 0 | >= | 0 |
| C1 Supply | 1 | 1 | 1 | | | | | | | | | | | 1000 | <= | 1000 |
| C2 Supply | | | | 1 | 1 | 1 | | | | | | | | 1100 | <= | 1100 |
| C3 Supply | | | | | | | 1 | 1 | 1 | | | | | 1200 | <= | 1200 |
| C4 Supply | | | | | | | | | | 1 | 1 | 1 | | 1100 | <= | 1100 |
| Regular Demand | 1 | | | 1 | | | 1 | | | 1 | | | | 2000 | >= | 2000 |
| Multi Demand | | 1 | | | 1 | | | 1 | | | 1 | | | 1500 | >= | 1500 |
| Supreme Demand | | | 1 | | | 1 | | | 1 | | | 1 | | 900 | >= | 750 |
| | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | |
| Selling | | | | | | | | | | | | | | | | |
| Regular | 8,50 € | x | 2000 | = | 17.000,00 € | | | | | | | | | | | |
| Multi | 9,00 € | x | 1500 | = | 13.500,00 € | | | | | | | | | | | |
| Supreme | 10,00 € | x | 900 | = | 9.000,00 € | | | | | | | | | | | |
| | | | | | 39.500,00 € | | | | | | | | | | | |

Resources