How does efficient_apriori package code itemsets? - python-3.x

Doing association rules mining using efficient_apriori package in python.Found a very useful answer on how to convert the output to a dataframe
However, struggling with the Itemset output and hoping someone can help me parse that correctly. Guessing LHS is an index value, but struggling with the decimal values in RHS. Does anyone know how the encoding is done? I have tried the same with SKU descriptions, and get the same output.
Input dataframe looks like this:
| SKU | Count | Percent |
|----------------------------------------------------------------------|-------|-------------|
| "('000000009100000749',)" | 110 | 0.029633621 |
| "('000000009100000749', '000000009100000776')" | 1 | 0.000269397 |
| "('000000009100000749', '000000009100000776', '000000009100002260')" | 1 | 0.000269397 |
| "('000000009100000749', '000000009100000777', '000000009100002260')" | 1 | 0.000269397 |
| "('000000009100000749', '000000009100000777', '000000009100002530')" | 1 | 0.000269397 |
Output looks like this:
| | lhs | rhs | count_full | count_lhs | count_rhs | num_transactions | confidence | support |
|---|-----------------------------|-----------------------------|------------|-----------|-----------|------------------|------------|-------------|
| 0 | "(1,)" | "(0.00026939655172413793,)" | 168 | 168 | 168 | 297 | 1 | 0.565656566 |
| 1 | "(0.00026939655172413793,)" | "(1,)" | 168 | 168 | 168 | 297 | 1 | 0.565656566 |
| 2 | "(2,)" | "(0.0005387931034482759,)" | 36 | 36 | 36 | 297 | 1 | 0.121212121 |
| 3 | "(0.0005387931034482759,)" | "(2,)" | 36 | 36 | 36 | 297 | 1 | 0.121212121 |
| 4 | "(3,)" | "(0.0008081896551724138,)" | 21 | 21 | 21 | 297 | 1 | 0.070707071 |
Could someone help me understand what's outputting in the LHS and RHS columns, and how to join that back to the 'SKU'? Ideally would like the output to have the 'SKU' instead of whatever is showing up.
I have looked at the documentation and it is quite sparse.

Related

Replace accounting notation for negative number with minus value

I have a dataframe which contains negative numbers, with accountancy notation i.e.:
df.select('sales').distinct().show()
+------------+
| sales |
+------------+
| 18 |
| 3 |
| 10 |
| (5)|
| 4 |
| 40 |
| 0 |
| 8 |
| 16 |
| (2)|
| 2 |
| (1)|
| 14 |
| (3)|
| 9 |
| 19 |
| (6)|
| 1 |
| (9)|
| (4)|
+------------+
only showing top 20 rows
The numbers wrapped in () are negative. How can I replace them to have minus values instead i.e. (5) becomes -5 and so on.
Here is what I have tried:
sales = (
df
.select('sales')
.withColumn('sales_new',
sf.when(sf.col('sales').substr(1,1) == '(',
sf.concat(sf.lit('-'), sf.col('sales').substr(2,3)))
.otherwise(sf.col('sales')))
)
sales.show(20,False)
+---------+---------+
|salees |sales_new|
+---------+---------+
| 151 | 151 |
| 134 | 134 |
| 151 | 151 |
|(151) |-151 |
|(134) |-134 |
|(151) |-151 |
| 151 | 151 |
| 50 | 50 |
| 101 | 101 |
| 134 | 134 |
|(134) |-134 |
| 46 | 46 |
| 151 | 151 |
| 134 | 134 |
| 185 | 185 |
| 84 | 84 |
| 188 | 188 |
|(94) |-94) |
| 38 | 38 |
| 21 | 21 |
+---------+---------+
The issue is that the length of sales can vary so hardcoding a value into the substring() won't work in some cases.
I have tried using regexp_replace but get an error that:
PatternSyntaxException: Unclosed group near index 1
sales = (
df
.select('sales')
.withColumn('sales_new', regexp_replace(sf.col('sales'), '(', ''))
)
This can be solved with a case statement and regular expression together:
from pyspark.sql.functions import regexp_replace, col
sales = (
df
.select('sales')
.withColumn('sales_new', sf.when(sf.col('sales').substr(1,1) == '(',
sf.concat(sf.lit('-'), regexp_replace(sf.col('sales'), '\(|\)', '')))
.otherwise(sf.col('sales')))
)
sales.show(20,False)
+---------+---------+
|sales |sales_new|
+---------+---------+
|151 |151 |
|134 |134 |
|151 |151 |
|(151) |-151 |
|(134) |-134 |
|(151) |-151 |
|151 |151 |
|50 |50 |
|101 |101 |
|134 |134 |
|(134) |-134 |
|46 |46 |
|151 |151 |
|134 |134 |
|185 |185 |
|84 |84 |
|188 |188 |
|(94) |-94 |
|38 |38 |
|21 |21 |
+---------+---------+
You can slice the string from the second character to the second last character, and then convert it to float, for example:
def convert(number):
try:
number = float(number)
except:
number = number[1:-1]
number = float(number)
return number
You can iterate through all the elements and apply this function.

How to create a calculated column in access 2013 to detect duplicates

I'm recreating a tool I made in Excel as it's getting bigger and performance is getting out of hand.
The issue is that I only have MS Access 2013 on my work laptop and I'm fairly new to the Expression Builder in Access 2013, which has a very limited function base to be honest.
My data has duplicates on the [Location] column, meaning that, I have multiple SKUs on that warehouse location. However, some of my calculations need to be done only once per [Location]. My solution to that, in Excel, was to make a formula (see below) putting 1 only on the first appearance of that location, putting 0 on next appearances. Doing that works like a charm because summing over that [Duplicate] column while imposing multiple criteria returns the number of occurrences of the multiple criteria counting locations only once.
Now, MS Access 2013 Expression Builder has no SUM nor COUNT functions to create a calculated column emulating my [Duplicate] column from Excel. Preferably, I would just input the raw data and let Access populate the calculated fields vs also inputting the calculated fields as well, since that would defeat my original purpose of reducing the computational cost of creating my dashboard.
The question is, how would you create a calculated column, in MS Access 2013 Expression Builder to recreate the below Excel function:
= IF($D$2:$D3=$D4,0,1)
In the sake of reducing the file size (over 100K rows) I even replace the 0 by a blank character "".
Thanks in advance for your help
Y
First and foremost, understand MS Access' Expression Builder is a convenience tool to build an SQL expression. Everything in Query Design ultimately is to build an SQL query. For this reason, you have to use a set-based mentality to see data in whole sets of related tables and not cell-by-cell mindset.
Specifically, to achieve:
putting 1 only on the first appearance of that location, putting 0 on next appearances
Consider a whole set-based approach by joining on a separate, aggregate query to identify the first value of your needed grouping, then calculate needed IIF expression. Below assumes you have an autonumber or primary key field in table (a standard in relational databases):
Aggregate Query (save as a separate query, adjust columns as needed)
SELECT ColumnD, MIN(AutoNumberID) As MinID
FROM myTable
GROUP BY ColumnD
Final Query (join to original table and build final IIF expression)
SELECT m.*, IIF(agg.MinID = AutoNumberID, 1, 0) As Dup_Indicator
FROM myTable m
INNER JOIN myAggregateQuery agg
ON m.[ColumnD] = agg.ColumnD
To demonstrate with random data:
Original
| ID | GROUP | INT | NUM | CHAR | BOOL | DATE |
|----|--------|-----|--------------|------|-------|------------|
| 1 | r | 9 | 1.424490258 | B6z | TRUE | 7/4/1994 |
| 2 | stata | 10 | 2.591235683 | h7J | FALSE | 10/5/1971 |
| 3 | spss | 6 | 0.560461966 | Hrn | TRUE | 11/27/1990 |
| 4 | stata | 10 | -1.499272175 | eXL | FALSE | 4/17/2010 |
| 5 | stata | 15 | 1.470269177 | Vas | TRUE | 6/13/2010 |
| 6 | r | 14 | -0.072238898 | puP | TRUE | 4/1/1994 |
| 7 | julia | 2 | -1.370405263 | S2l | FALSE | 12/11/1999 |
| 8 | spss | 6 | -0.153684675 | mAw | FALSE | 7/28/1977 |
| 9 | spss | 10 | -0.861482674 | cxC | FALSE | 7/17/1994 |
| 10 | spss | 2 | -0.817222582 | GRn | FALSE | 10/19/2012 |
| 11 | stata | 2 | 0.949287754 | xgc | TRUE | 1/18/2003 |
| 12 | stata | 5 | -1.580841322 | Y1D | TRUE | 6/3/2011 |
| 13 | r | 14 | -1.671303816 | JCP | FALSE | 5/15/1981 |
| 14 | r | 7 | 0.904181025 | Rct | TRUE | 7/24/1977 |
| 15 | stata | 10 | -1.198211174 | qJY | FALSE | 5/6/1982 |
| 16 | julia | 10 | -0.265808162 | 10s | FALSE | 3/18/1975 |
| 17 | r | 13 | -0.264955027 | 8Md | TRUE | 6/11/1974 |
| 18 | r | 4 | 0.518302149 | 4KW | FALSE | 9/12/1980 |
| 19 | r | 5 | -0.053620183 | 8An | FALSE | 4/17/2004 |
| 20 | r | 14 | -0.359197116 | F8Q | TRUE | 6/14/2005 |
| 21 | spss | 11 | -2.211875193 | AgS | TRUE | 4/11/1973 |
| 22 | stata | 4 | -1.718749471 | Zqr | FALSE | 2/20/1999 |
| 23 | python | 10 | 1.207878576 | tcC | FALSE | 4/18/2008 |
| 24 | stata | 11 | 0.548902226 | PFJ | TRUE | 9/20/1994 |
| 25 | stata | 6 | 1.479125922 | 7a7 | FALSE | 3/2/1989 |
| 26 | python | 10 | -0.437245299 | r32 | TRUE | 6/7/1997 |
| 27 | sas | 14 | 0.404746106 | 6NJ | TRUE | 9/23/2013 |
| 28 | stata | 8 | 2.206741458 | Ive | TRUE | 5/26/2008 |
| 29 | spss | 12 | -0.470694096 | dPS | TRUE | 5/4/1983 |
| 30 | sas | 15 | -0.57169507 | yle | TRUE | 6/20/1979 |
SQL (uses aggregate in subquery but can be a stored query)
SELECT r.*, IIF(sub.MinID = r.ID,1, 0) AS Dup
FROM Random_Data r
LEFT JOIN
(
SELECT r.GROUP, MIN(r.ID) As MinID
FROM Random_Data r
GROUP BY r.Group
) sub
ON r.[Group] = sub.[GROUP]
Output (notice the first GROUP value is tagged 1, all else 0)
| ID | GROUP | INT | NUM | CHAR | BOOL | DATE | Dup |
|----|--------|-----|--------------|------|-------|------------|-----|
| 1 | r | 9 | 1.424490258 | B6z | TRUE | 7/4/1994 | 1 |
| 2 | stata | 10 | 2.591235683 | h7J | FALSE | 10/5/1971 | 1 |
| 3 | spss | 6 | 0.560461966 | Hrn | TRUE | 11/27/1990 | 1 |
| 4 | stata | 10 | -1.499272175 | eXL | FALSE | 4/17/2010 | 0 |
| 5 | stata | 15 | 1.470269177 | Vas | TRUE | 6/13/2010 | 0 |
| 6 | r | 14 | -0.072238898 | puP | TRUE | 4/1/1994 | 0 |
| 7 | julia | 2 | -1.370405263 | S2l | FALSE | 12/11/1999 | 1 |
| 8 | spss | 6 | -0.153684675 | mAw | FALSE | 7/28/1977 | 0 |
| 9 | spss | 10 | -0.861482674 | cxC | FALSE | 7/17/1994 | 0 |
| 10 | spss | 2 | -0.817222582 | GRn | FALSE | 10/19/2012 | 0 |
| 11 | stata | 2 | 0.949287754 | xgc | TRUE | 1/18/2003 | 0 |
| 12 | stata | 5 | -1.580841322 | Y1D | TRUE | 6/3/2011 | 0 |
| 13 | r | 14 | -1.671303816 | JCP | FALSE | 5/15/1981 | 0 |
| 14 | r | 7 | 0.904181025 | Rct | TRUE | 7/24/1977 | 0 |
| 15 | stata | 10 | -1.198211174 | qJY | FALSE | 5/6/1982 | 0 |
| 16 | julia | 10 | -0.265808162 | 10s | FALSE | 3/18/1975 | 0 |
| 17 | r | 13 | -0.264955027 | 8Md | TRUE | 6/11/1974 | 0 |
| 18 | r | 4 | 0.518302149 | 4KW | FALSE | 9/12/1980 | 0 |
| 19 | r | 5 | -0.053620183 | 8An | FALSE | 4/17/2004 | 0 |
| 20 | r | 14 | -0.359197116 | F8Q | TRUE | 6/14/2005 | 0 |
| 21 | spss | 11 | -2.211875193 | AgS | TRUE | 4/11/1973 | 0 |
| 22 | stata | 4 | -1.718749471 | Zqr | FALSE | 2/20/1999 | 0 |
| 23 | python | 10 | 1.207878576 | tcC | FALSE | 4/18/2008 | 1 |
| 24 | stata | 11 | 0.548902226 | PFJ | TRUE | 9/20/1994 | 0 |
| 25 | stata | 6 | 1.479125922 | 7a7 | FALSE | 3/2/1989 | 0 |
| 26 | python | 10 | -0.437245299 | r32 | TRUE | 6/7/1997 | 0 |
| 27 | sas | 14 | 0.404746106 | 6NJ | TRUE | 9/23/2013 | 1 |
| 28 | stata | 8 | 2.206741458 | Ive | TRUE | 5/26/2008 | 0 |
| 29 | spss | 12 | -0.470694096 | dPS | TRUE | 5/4/1983 | 0 |
| 30 | sas | 15 | -0.57169507 | yle | TRUE | 6/20/1979 | 0 |

Blending Model: Oil Production

Oil Blending
An oil company produces three brands of oil: Regular, Multigrade, and
Supreme. Each brand of oil is composed of one or more of four crude stocks, each having a different lubrication index. The relevant data concerning the crude stocks are as follows.
+-------------+-------------------+------------------+--------------------------+
| Crude Stock | Lubrication Index | Cost (€/barrell) | Supply per day (barrels) |
+-------------+-------------------+------------------+--------------------------+
| 1 | 20 | 7,10 | 1000 |
+-------------+-------------------+------------------+--------------------------+
| 2 | 40 | 8,50 | 1100 |
+-------------+-------------------+------------------+--------------------------+
| 3 | 30 | 7,70 | 1200 |
+-------------+-------------------+------------------+--------------------------+
| 4 | 55 | 9,00 | 1100 |
+-------------+-------------------+------------------+--------------------------+
Each brand of oil must meet a minimum standard for a lubrication index, and each brand
thus sells at a different price. The relevant data concerning the three brands of oil are as
follows.
+------------+---------------------------+---------------+--------------+
| Brand | Minimum Lubrication index | Selling price | Daily demand |
+------------+---------------------------+---------------+--------------+
| Regular | 25 | 8,50 | 2000 |
+------------+---------------------------+---------------+--------------+
| Multigrade | 35 | 9,00 | 1500 |
+------------+---------------------------+---------------+--------------+
| Supreme | 50 | 10,00 | 750 |
+------------+---------------------------+---------------+--------------+
Determine an optimal output plan for a single day, assuming that production can be either
sold or else stored at negligible cost.
The daily demand figures are subject to alternative interpretations. Investigate the
following:
(a) The daily demands represent potential sales. In other words, the model should contain demand ceilings (upper limits). What is the optimal profit?
(b) The daily demands are strict obligations. In other words, the model should contain demand constraints that are met precisely. What is the optimal profit?
(c) The daily demands represent minimum sales commitments, but all output can be sold. In other words, the model should permit production to exceed the daily commitments. What is the optimal profit?
QUESTION
I've been able to construct the following model in Excel and solve it via OpenSolver, but I'm only able to integrate the mix for the Regular Oil.
I'm trying to work my way through the book Optimization Modeling with Spreadsheets by Kenneth R. Baker but I'm stuck with this exercise. While I could transfer the logic from another blending problem I'm not sure how to construct the model for multiple blendings at once.
I modeled the problem as a minimization problem on the cost of the different crude stocks. Using the Lubrication Index data I built the constraint for the R-Lub Index as a linear constraint. So far the answer seems to be right for the Regular Oil. However using this approach I've no idea how to include even the second Multigrade Oil.
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Decision Variables | | | | | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| | C1 | C2 | C3 | C4 | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Inputs | 1000 | 0 | 1000 | 0 | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| | | | | | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Objective Function | | | | | | Total | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Cost | 7,10 € | 8,50 € | 7,70 € | 9,00 € | | 14.800,00 € | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| | | | | | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Constraints | | | | | | LHS | | RHS |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| C1 supply | 1 | | | | | 1000 | <= | 1000 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| C2 supply | | 1 | | | | 0 | <= | 1100 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| C3 supply | | | 1 | | | 1000 | <= | 1200 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| C4 supply | | | | 1 | | 0 | <= | 1100 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| R- Lub Index | -5 | 15 | 5 | 30 | | 0 | >= | 0 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| R- Output | 1 | 1 | 1 | 1 | | 2000 | = | 2000 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| | | | | | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| Blending Data | | | | | | | | |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
| R- Lub | 20 | 40 | 30 | 55 | | 25 | >= | 25 |
+--------------------+--------+--------+--------+--------+--+-------------+----+------+
Here is the model with Excel formulars:
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Decision Variables | | | | | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| | C1 | C2 | C3 | C4 | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Inputs | 1000 | 0 | 1000 | 0 | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| | | | | | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Objective Function | | | | | | Total | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Cost | 7,1 | 8,5 | 7,7 | 9 | | =SUMMENPRODUKT(B5:E5;B8:E8) | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| | | | | | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Constraints | | | | | | LHS | | RHS |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| C1 supply | 1 | | | | | =SUMMENPRODUKT($B$5:$E$5;B11:E11) | <= | 1000 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| C2 supply | | 1 | | | | =SUMMENPRODUKT($B$5:$E$5;B12:E12) | <= | 1100 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| C3 supply | | | 1 | | | =SUMMENPRODUKT($B$5:$E$5;B13:E13) | <= | 1200 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| C4 supply | | | | 1 | | =SUMMENPRODUKT($B$5:$E$5;B14:E14) | <= | 1100 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| R- Lub Index | -5 | 15 | 5 | 30 | | =SUMMENPRODUKT($B$5:$E$5;B15:E15) | >= | 0 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| R- Output | 1 | 1 | 1 | 1 | | =SUMMENPRODUKT($B$5:$E$5;B16:E16) | = | 2000 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| | | | | | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| Blending Data | | | | | | | | |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
| R- Lub | 20 | 40 | 30 | 55 | | =SUMMENPRODUKT($B$5:$E$5;B19:E19)/SUMME($B$5:$E$5) | >= | 25 |
+--------------------+------+-----+------+----+--+----------------------------------------------------+----+------+
A nudge in the right direction would be a tremendous help.
I think you want your objective to be Profit, which I would define as the sum of sales value - sum of cost.
To include all blends, develop calculations for Volume produced, Lube Index, Cost, and Value for each blend. Apply constraints for volume of stock used, volume produced, and lube index, and optimize for Profit.
I put together the model as follows ...
Columns A through D is the information you provided.
The 10's in G2:J5 are seed values for the stock volumes used in each blend. Solver will manipulate these.
Column K contains the total product volume produced. These will be constrained in different ways, as per your investigation (a), (b), and (c). It is =SUM(G3:J3) filled down.
Column L is the Lube Index for the product. As you noted, it is a linear blend - this is typically not true for blending problems. These values will be constrained in Solver. It is {=SUMPRODUCT(G3:J3,TRANSPOSE($B$2:$B$5))/$K3} filled down. Note that it is a Control-Shift-Enter (CSE) formula, required because of the TRANSPOSE.
Column M is the cost of the stock used to create the product. This is used in the Profit calculation. It is {=SUMPRODUCT(G3:J3,TRANSPOSE($C$2:$C$5))}, filled down. This is also a CSE formula.
Column N is the value of the product produced. This is used in the Profit calculation. It is =K3*C8 filled down.
Row 7 is the total stock volume used to generate all blends. These values will be constrained in Solver. It is =SUM(G3:G5), filled to the right.
The profit calculation is =SUM(N3:N5)-SUM(M3:M5).
Below is a snap of the Solver dialog box ...
It does the following ...
The objective is to maximize profit.
It will do this by manipulating the amount of stock that goes into each blend.
The first four constraints ($G$7 through $J$7) ensure the amount of stock available is not violated.
The next three constraints ($K$3 through $K$5) are for case (a) - make no more than product than there is demand.
The last three constraints ($L$3 through $L$5) make sure the lube index meets the minimum specification.
Not shown - I selected options for GRG Nonlinear and selected "Use Multistart" and deselected "Require Bounds on Variables".
Below is the result for case (a) ...
For case (b), change the constraints on Column K to be "=" instead of "<=". Below is the result ...
For case (c), change the constraints on Column K to be ">=". Below is the result ...
I think I came up with a solution, but I'm unsure if this is correct.
| Decision Variables | | | | | | | | | | | | | | | | |
|--------------------|---------|--------|--------|--------|-------------|--------|--------|--------|--------|--------|--------|--------|---|--------------------------------|----|------|
| | C1R | C1M | C1S | C2R | C2M | C2S | C3R | C3M | C3S | C4R | C4M | C4S | | | | |
| Inputs | 1000 | 0 | 0 | 800 | 0 | 300 | 0 | 1200 | 0 | 200 | 300 | 600 | | | | |
| | | | | | | | | | | | | | | | | |
| Objective Function | | | | | | | | | | | | | | Total Profit (Selling - Cost) | | |
| Cost | 7,10 € | 7,10 € | 7,10 € | 8,50 € | 8,50 € | 8,50 € | 7,70 € | 7,70 € | 7,70 € | 9,00 € | 9,00 € | 9,00 € | | 3.910,00 € | | |
| | | | | | | | | | | | | | | | | |
| Constraints | | | | | | | | | | | | | | LHS | | RHS |
| Regular | -5 | | | 15 | | | 5 | | | 30 | | | | 13000 | >= | 0 |
| Multi | | -15 | | | 5 | | | -5 | | | 20 | | | 0 | >= | 0 |
| Supreme | | | -30 | | | -10 | | | -20 | | | 5 | | 0 | >= | 0 |
| C1 Supply | 1 | 1 | 1 | | | | | | | | | | | 1000 | <= | 1000 |
| C2 Supply | | | | 1 | 1 | 1 | | | | | | | | 1100 | <= | 1100 |
| C3 Supply | | | | | | | 1 | 1 | 1 | | | | | 1200 | <= | 1200 |
| C4 Supply | | | | | | | | | | 1 | 1 | 1 | | 1100 | <= | 1100 |
| Regular Demand | 1 | | | 1 | | | 1 | | | 1 | | | | 2000 | >= | 2000 |
| Multi Demand | | 1 | | | 1 | | | 1 | | | 1 | | | 1500 | >= | 1500 |
| Supreme Demand | | | 1 | | | 1 | | | 1 | | | 1 | | 900 | >= | 750 |
| | | | | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | | | |
| Selling | | | | | | | | | | | | | | | | |
| Regular | 8,50 € | x | 2000 | = | 17.000,00 € | | | | | | | | | | | |
| Multi | 9,00 € | x | 1500 | = | 13.500,00 € | | | | | | | | | | | |
| Supreme | 10,00 € | x | 900 | = | 9.000,00 € | | | | | | | | | | | |
| | | | | | 39.500,00 € | | | | | | | | | | | |

How use grep for that complicated expressions?

+----+-------+-----+
| ID | STORE | QTY |
+----+-------+-----+
| | | |
| 9 | 101 | 18 |
| | | |
| 8 | 154 | 19 |
| | | |
| 7 | 111 | 13 |
| | | |
| 9 | 154 | 18 |
| | | |
| 8 | 101 | 19 |
| | | |
| 7 | 101 | 13 |
| | | |
| 9 | 111 | 18 |
| | | |
| 8 | 111 | 19 |
| | | |
| 7 | 154 | 14 |
+----+-------+-----+
Suppose that I have 3 stores, and I'd like to take STORE for every id which qty is the same for every store.
e.g id 9 is in 3 stores, in every store has 18 qty,
but id 7 is in stores but in only two store has equal qty (in store 111 and 101 - in 154 - id has 14 qty); how can I get that result using grep?
Do you think that is impossible to get that one in one expressions? I thought about regex but I don't know in which way I get Qty and compare to another row. In my file it looks like:
Extract the first and last columns by cut, count the number of uniq combinations, and output only those whose count is 3 (i.e. the value is the same for all three stores):
$ cut -d\| -f2,4 | sort | uniq -c | grep '^ *3 '
3 8 | 19
3 9 | 18

How to add space between rows and sum up automatically in Excel

let's say that I have a table like the below:
| | Value 1 | Value 2 | Value 3 | |
|---|---------|---------|---------|---|
| A | 22 | 12 | 3 | |
| A | 5 | 6 | 12 | |
| A | 19 | 9 | 13 | |
| A | 22 | 43 | 31 | |
| B | 7 | 12 | 23 | |
| B | 5 | 5 | 8 | |
| B | 35 | 78 | 9 | |
| B | 45 | 1 | 8 | |
| C | 34 | 56 | 0 | |
| C | 22 | 1 | 14 | |
| C | 13 | 46 | 45 | |
and that I'd need to transform it into the below:
| | Value 1 | Value 2 | Value 3 | |
|---|---------|---------|---------|---|
| A | 22 | 12 | 3 | |
| A | 5 | 6 | 12 | |
| A | 19 | 9 | 13 | |
| A | 22 | 43 | 31 | |
| | 68 | 70 | 59 | |
| | | | | |
| B | 7 | 12 | 23 | |
| B | 5 | 5 | 8 | |
| B | 35 | 78 | 9 | |
| B | 45 | 1 | 8 | |
| | 92 | 96 | 48 | |
| | | | | |
| C | 34 | 56 | 0 | |
| C | 22 | 1 | 14 | |
| C | 13 | 46 | 45 | |
| | 69 | 103 | 59 | |
How could I obtain the desired effect automatically?
There would be n empty rows after each group and the sums of each column within the group.
You can use the Subtotal feature of Excel. Subtotal is in the "Data" tab of the ribbon. To automatically add the totals between groupings. I don't think it adds the blank row. If you absolutely need the blank row, then I can generate some VBA that will work.

Resources