The dataframe has the following features:
+--------+--------+--------+------+-------+--------+-----+-------+
| | id | weight | type | value | export | tax | total |
+--------+--------+--------+------+-------+--------+-----+-------+
| 0 | 1 | 4 | 1 | 10 | 1 | 5 | 15 |
+--------+--------+--------+------+-------+--------+-----+-------+
| 1 | 2 | 3 | 1 | 12 | 1 | 6 | 18 |
+--------+--------+--------+------+-------+--------+-----+-------+
| 2 | 3 | 8 | 2 | 15 | 0 | 0 | 15 |
+--------+--------+--------+------+-------+--------+-----+-------+
| ... | ... | ... | ... | ... | ... | ... | ... |
+--------+--------+--------+------+-------+--------+-----+-------+
| 123004 | 123005 | 5 | 2 | 12 | 0 | 0 | 12 |
+--------+--------+--------+------+-------+--------+-----+-------+
The tax column should be predicted. It is important to consider the relationship between tax and export:
tax is only charged when export == 1 (in the sample rows it is 0 when export == 0).
The following code (random forest as an example) predicts the tax without taking this rule into account.
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

y = df['tax']
X = df.drop(columns=['tax'])
# Split the data into training and testing sets
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=42)
rf = RandomForestRegressor(max_depth=10, random_state=101, n_estimators=42)
rf.fit(train_X, train_y)
predictions = rf.predict(test_X)
Questions:
1- How can I tell the algorithm to take the above rule into account?
2- The tax cannot be more than the value. How can I set limits or a range for the prediction?
3- If there is another method that predicts the same result, please mention it. (Random forest is not a must.)
4- I am a beginner in this field, so good ideas for this sample are very welcome.
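One way to respect both rules, whatever regressor is used, is to post-process the raw predictions: clip them to the range [0, value] and force them to 0 where export is 0. The sketch below is only an illustration under those assumptions. The helper name apply_tax_rules and the example numbers are hypothetical, not output from the model above.

```python
import numpy as np


def apply_tax_rules(raw_pred, export, value):
    """Post-process raw regressor output so it respects the two rules."""
    raw_pred = np.asarray(raw_pred, dtype=float)
    export = np.asarray(export)
    value = np.asarray(value, dtype=float)
    # Rule from question 2: keep each prediction within [0, value] for its row.
    clipped = np.clip(raw_pred, 0.0, value)
    # Rule from question 1: rows with export == 0 carry no tax.
    return np.where(export == 1, clipped, 0.0)


# Hypothetical raw predictions for four rows (made up for illustration).
raw = [5.2, 14.0, 0.7, -1.0]
export = [1, 1, 0, 1]
value = [10, 12, 15, 12]
adjusted = apply_tax_rules(raw, export, value)
```

With the question's variables this would be called as apply_tax_rules(rf.predict(test_X), test_X['export'], test_X['value']). Separately, in the four sample rows total equals value + tax, so keeping total in X may let the model read the answer off directly; that is an observation from the sample only, not a confirmed rule.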
I have the following dataframe:
------------------------------------------------------------
| Month | low_temp | high_temp | Wages | Extreme |
------------------------------------------------------------
| Jan | 0 | 3 | -0.42 | 1000 |
------------------------------------------------------------
| Jan | 1 | 3 | 0.56 | 3000 |
------------------------------------------------------------
| Feb | -1 | 2 | -0.61 | 2000 |
------------------------------------------------------------
| Feb | 0 | 1 | 0.36 | 3500 |
-------------------------------------------------------------
| Mar | 1.5 | 4 | -0.25 | 3000 |
-------------------------------------------------------------
| Mar | 2 | 5 | 0.75 | 4000 |
-------------------------------------------------------------
| Apr | 3 | 5 | -0.55 | 3000 |
------------------------------------------------------------
| Apr | 3.25 | 4 | 0.24 | 6000 |
-------------------------------------------------------------
What I'm trying to do is create one continuous plot with two x axes.
So, it would have the following features:
one y-axis for low_temp and high_temp.
two lines: low_temp, high_temp
x_axis1 (the important one): Extreme
x_axis2 (below or above the x_axis1, lines up with it but isn't used to plot anything): Wages
Then I want to create this chart for each Month and string the charts together horizontally,
so it ends up like this:
y --------------------------
| | | | |
| jan | feb | mar | apr |
-------------------------
x axis 1
x axis 2
This is my code attempt but it causes the x-axis to not be in line at all!
for index, month in enumerate(Months):
    a = df[df['Month']==month].sort_values(by='Extreme')
    x = a['Extreme']
    y_1 = a['low_temp']
    y_2 = a['high_temp']
    plt.subplot(1,4,index+1)
    plt.plot(x,y_1,'bo-')
    plt.plot(x,y_2, 'ro-')
    plt.xticks(a.Wages)
    plt.title(month)
plt.show()
The charts also appear in a vertical line so they aren't horizontally contiguous.
Any help is very much appreciated! thanks!
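A possible direction, sketched under assumptions rather than offered as a confirmed fix: create all panels at once with plt.subplots so they share one row and one y axis, and use a twin x axis per panel to show Wages at the same tick positions as Extreme. The DataFrame is rebuilt from the table above; the figure size and the zero spacing between panels are my own choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Month": ["Jan", "Jan", "Feb", "Feb", "Mar", "Mar", "Apr", "Apr"],
    "low_temp": [0, 1, -1, 0, 1.5, 2, 3, 3.25],
    "high_temp": [3, 3, 2, 1, 4, 5, 5, 4],
    "Wages": [-0.42, 0.56, -0.61, 0.36, -0.25, 0.75, -0.55, 0.24],
    "Extreme": [1000, 3000, 2000, 3500, 3000, 4000, 3000, 6000],
})
months = list(df["Month"].unique())

# One row of panels sharing the y axis; wspace=0 makes neighbouring panels touch.
fig, axes = plt.subplots(1, len(months), sharey=True, figsize=(12, 3),
                         gridspec_kw={"wspace": 0})
for ax, month in zip(axes, months):
    a = df[df["Month"] == month].sort_values(by="Extreme")
    ticks = list(a["Extreme"])
    ax.plot(ticks, list(a["low_temp"]), "bo-")
    ax.plot(ticks, list(a["high_temp"]), "ro-")
    ax.set_xticks(ticks)                 # x axis 1: ticks at the plotted Extreme values
    ax.set_title(month)
    # x axis 2: a twin axis with the same limits and tick positions,
    # labelled with Wages instead of Extreme. Nothing is plotted on it.
    top = ax.twiny()
    top.set_xlim(ax.get_xlim())
    top.set_xticks(ticks)
    top.set_xticklabels([f"{w:.2f}" for w in a["Wages"]])
```

In the original attempt, plt.xticks(a.Wages) places ticks at the Wages values on an axis whose data range is Extreme, which may be why the x axis looks out of line. twiny puts the second axis on top of each panel; moving it below the first one would need extra spine positioning that is not shown here.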
Is there a way to use a DAX measure to create a column that contains text values instead of the numeric sum/count that it gives automatically?
In the example below the first name will appear as a value (in the first table) instead of their name as in the second.
Data table:
+----+------------+------------+---------------+-------+-------+
| id | first_name | last_name | currency | Sales | Stock |
+----+------------+------------+---------------+-------+-------+
| 1 | Giovanna | Christon | Peso | 10 | 12 |
| 2 | Roderich | MacMorland | Peso | 8 | 10 |
| 3 | Bond | Arkcoll | Yuan Renminbi | 4 | 6 |
| 1 | Giovanna | Christon | Peso | 11 | 13 |
| 2 | Roderich | MacMorland | Peso | 9 | 11 |
| 3 | Bond | Arkcoll | Yuan Renminbi | 5 | 7 |
| 1 | Giovanna | Christon | Peso | 15 | 17 |
| 2 | Roderich | MacMorland | Peso | 10 | 12 |
| 3 | Bond | Arkcoll | Yuan Renminbi | 6 | 8 |
| 1 | Giovanna | Christon | Peso | 17 | 19 |
| 2 | Roderich | MacMorland | Peso | 11 | 13 |
| 3 | Bond | Arkcoll | Yuan Renminbi | 7 | 9 |
+----+------------+------------+---------------+-------+-------+
No DAX needed. You should put the first_name field on Rows and not on Values. Select Tabular View for the Report Layout. Like this:
After some searching I found 4 ways.
measure 1 (will return blank if values differ):
=IF(COUNTROWS(VALUES(Table1[first_name])) > 1, BLANK(), VALUES(Table1[first_name]))
measure 2 (will return blank if values differ):
=CALCULATE(
VALUES(Table1[first_name]),
FILTER(Table1,
COUNTROWS(VALUES(Table1[first_name]))=1))
measure 3 (will show every single text value), thanks @Rory:
=CONCATENATEX(Table1,[first_name]," ")
For very large datasets this variant of the concatenation seems to work better:
=CALCULATE(CONCATENATEX(VALUES(Table1[first_name]),Table1[first_name]," "))
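To make the difference between these measures concrete, here is a rough Python analogue of what each one computes for the rows in the current filter context. It is an illustration only, not DAX: the function names are hypothetical, and keeping values in first-seen order is my assumption, since the measures above specify no ordering.

```python
def single_or_blank(values):
    """Like measures 1 and 2: the value if all rows agree, otherwise blank (None)."""
    distinct = set(values)
    return next(iter(distinct)) if len(distinct) == 1 else None


def concat_all(values, sep=" "):
    """Like measure 3: one entry per row, so repeated names are repeated."""
    return sep.join(values)


def concat_distinct(values, sep=" "):
    """Like the VALUES-based variant: each distinct name appears once."""
    return sep.join(dict.fromkeys(values))  # dict keeps first-seen order


# first_name values for id 1 (four rows) and for the first six rows of the table.
id_1 = ["Giovanna"] * 4
first_six = ["Giovanna", "Roderich", "Bond", "Giovanna", "Roderich", "Bond"]
```

For id_1 the first function returns the name and the distinct concatenation returns it once, while concat_all repeats it four times; for first_six the first function returns None because the names differ.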
Results:
I want to build a sales table of purchased and sold items to see the total profit. That is easy when items are purchased and sold individually or as a lot. But how do I handle the situation where one buys a collection of items and sells them one by one? For example, I buy a collection (C) consisting of a hammer and a screwdriver and sell the tools separately. If I entered the data into a simple table as in the image, I would get the wrong profit.
When there are only two items, I could split their purchase price arbitrarily, but when there are many items and not all of them have been sold yet, I can't easily see whether the collection has already made a profit or not.
I expect the correct profit as output. In this case the collection cost 10 and the selling price of all its items was 13, so it should show a profit of 3, not a loss of -7. I was thinking of adding 2 new columns, like IsCollection and CollectionID, and then deriving a formula that either uses simple subtraction or looks up the price of the whole collection and subtracts it from the sum of the items that belong to that collection. Deriving such a formula is another question... But maybe there is an easier way of accomplishing the same thing.
I added a column COLLECTION to identify the items that belong to a collection.
Then I used SUMIF to sum the sell prices of the items that belong to the same collection.
Then I used IF in the Profit column to choose between the summed sell price and the single sell price.
Some of the formulas need a cell range to be defined (see below).
Problem: you can't add up the Profit values to obtain the total profit, because the profit of a collection is repeated on each of its rows.
I used opencalc (but it should be almost the same in Excel).
Content of
SUM_COLL (row2):
=SUMIF($A$1:$A$22;"="&A2;$D$1:$D$22)
SUM_COLL (row3):
=SUMIF($A$1:$A$22;"="&A3;$D$1:$D$22)
and so on.
Profit (row2):
=IF(A2<>"";E2-C2;D2-C2)
Profit (row3):
=IF(A3<>"";E3-C3;D3-C3)
+------------+-----------+-------------+------------+----------+--------+
| COLLECTION | Item name | Purch Price | Sell Price | SUM_COLL | Profit |
+------------+-----------+-------------+------------+----------+--------+
| | A | 1 | 1.5 | 0 | 0.5 |
+------------+-----------+-------------+------------+----------+--------+
| | B | 2 | 2.1 | 0 | 0.1 |
+------------+-----------+-------------+------------+----------+--------+
| C | C1 | 10 | 7 | 27 | 17 |
+------------+-----------+-------------+------------+----------+--------+
| C | C2 | 10 | 6 | 27 | 17 |
+------------+-----------+-------------+------------+----------+--------+
| D | D1 | 7 | 15 | 23 | 16 |
+------------+-----------+-------------+------------+----------+--------+
| | E | 8 | 12 | 0 | 4 |
+------------+-----------+-------------+------------+----------+--------+
| C | C3 | 10 | 14 | 27 | 17 |
+------------+-----------+-------------+------------+----------+--------+
| D | D2 | 7 | 8 | 23 | 16 |
+------------+-----------+-------------+------------+----------+--------+
| | | | | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+
| | | | | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+
| | | | | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+
| | | | | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+
Update:
I added two more columns to make Profit summable:
COUNT_COLL (row2):
=COUNTIF($A$1:$A$22;"="&A2)
COUNT_COLL (row3):
=COUNTIF($A$1:$A$22;"="&A3)
Profit_SUMMABLE (row2)
=IF(A2<>"";(E2-C2)/G2;D2-C2)
Profit_SUMMABLE (row3)
=IF(A3<>"";(E3-C3)/G3;D3-C3)
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| COLLECTION | Item name | Purch Price | Sell Price | SUM_COLL | Profit | COUNT_COLL | Profit_SUMMABLE |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | A | 1 | 1.5 | 0 | 0.5 | 0 | 0.5 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | B | 2 | 2.1 | 0 | 0.1 | 0 | 0.1 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| C | C1 | 10 | 7 | 27 | 17 | 3 | 5.6666666667 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| C | C2 | 10 | 6 | 27 | 17 | 3 | 5.6666666667 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| D | D1 | 7 | 15 | 23 | 16 | 2 | 8 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | E | 8 | 12 | 0 | 4 | 0 | 4 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| C | C3 | 10 | 14 | 27 | 17 | 3 | 5.6666666667 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| D | D2 | 7 | 8 | 23 | 16 | 2 | 8 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | | | | 0 | 0 | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | | | | 0 | 0 | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
| | | | | 0 | 0 | 0 | 0 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
...
...
| TOTAL | | | | | 87.6 | | 37.6 |
+------------+-----------+-------------+------------+----------+--------+------------+-----------------+
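As a cross-check of the totals above, the same logic can be written in a few lines of Python: count each collection's purchase price once, subtract it from the summed sell prices of its items, and add the profits of the single items. This is only a sketch of the idea behind the spreadsheet, using the rows from the table; the function name total_profit is hypothetical.

```python
from collections import defaultdict

# Rows copied from the table: (collection, item, purchase price, sell price).
# For collection rows the purchase price is the price of the whole collection.
rows = [
    ("", "A", 1, 1.5), ("", "B", 2, 2.1),
    ("C", "C1", 10, 7), ("C", "C2", 10, 6),
    ("D", "D1", 7, 15), ("", "E", 8, 12),
    ("C", "C3", 10, 14), ("D", "D2", 7, 8),
]


def total_profit(rows):
    """Return (overall profit, profit per collection)."""
    single_profit = 0.0
    coll_cost = {}                    # purchase price of each collection, counted once
    coll_sales = defaultdict(float)   # summed sell prices per collection (SUM_COLL)
    for coll, _item, purchase, sell in rows:
        if coll:
            coll_cost[coll] = purchase
            coll_sales[coll] += sell
        else:
            single_profit += sell - purchase
    coll_profit = {c: coll_sales[c] - coll_cost[c] for c in coll_cost}
    return single_profit + sum(coll_profit.values()), coll_profit


total, per_collection = total_profit(rows)
```

The per-collection profits come out as 17 for C and 16 for D, and the overall total agrees with the 37.6 in the Profit_SUMMABLE column rather than the 87.6 obtained by summing the plain Profit column.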
I have the following problem:
I want to score a range of numbers on a scale from 1 to 10, for example:
| | A | B |
|---|------|----|
| 1 | 1209 | 1 |
| 2 | 401 | 7 |
| 3 | 123 | 9 |
| 4 | 49 | 10 |
| 5 | 30 | 10 |
(Not sure if B is 100% correct but roughly)
I got the B values with
=ABS(CEILING(A1;MAX($A$1:$A$32)/10)*10/MAX($A$1:$A$32)-11)
It seems to work, but not if I take numbers like these, for example:
| | A | B |
|---|------|----|
| 1 | 100 | 1 |
| 2 | 90 | 2 |
| 3 | 80 | 3 |
| 4 | 70 | 4 |
| 5 | 50 | 6 |
Here I want 50 to get a score of 10.
I would like it to be scalable, so that it works with a scale of 1-10, 1-100, 5-27 or whatever, with any number of values in the list and any values to score.
Thanks!
Use this formula:
=$E$1 + ROUND((MIN($A:$A)-A1)/((MAX($A:$A)-MIN($A:$A))/($E$1-$E$2)),0)
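It is scalable: put the maximum score in E1 and the minimum score in E2. The smallest value in column A then gets the score in E1 and the largest gets the score in E2.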
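For anyone who wants to check the formula outside a spreadsheet, here is a small Python sketch of the same calculation. The names scores, best and worst are mine (best plays the role of E1, worst of E2), and the guard for a list of identical values is an added assumption, since the spreadsheet formula would divide by zero there.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, halves away from zero, as spreadsheet ROUND does."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def scores(values, best, worst):
    """Map the smallest value to `best` and the largest to `worst`, linearly in between."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: identical values all receive the best score.
        return [best] * len(values)
    step = (hi - lo) / (best - worst)  # value range covered by one score point
    return [best + round_half_away((lo - v) / step) for v in values]


second_example = scores([100, 90, 80, 70, 50], best=10, worst=1)   # [1, 3, 5, 6, 10]
first_example = scores([1209, 401, 123, 49, 30], best=10, worst=1)
```

The second example from the question now gives 50 a score of 10, and the first example reproduces the B column shown there (1, 7, 9, 10, 10). Other scales such as 5-27 only need different best and worst arguments.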
I am trying to find a ratio-like formula that takes into account both the difference between the number of items in stock and the number on sale, and the actual number of items in stock and on sale.
The difference between 2 and 1 is not important and throws off the reports.
The difference between 20,000 and 10,000 is far more important.
However, the difference between 2,000 and 10 is even more important.
A simple ratio would usually work, but not in this case, since the actual quantity of items (in stock or on sale) is not accounted for: I get a ratio of 0.4 when only 5 items are in stock (Item 9 below).
I looked at using the percentage difference first, but it would not work even if the number of items in stock were taken into account, since the numbers can be identical (Items 12 and 13 below).
Let me know if I need to clarify.
+-----------+-----------+---------+--------+-----------+
| Item Name | In stock | On Sale | Ratio | Perc Diff |
+-----------+-----------+---------+--------+-----------+
| Item 12 | 1 | 1 | 1.0000 | 0.0000 |
| Item 13 | 1 | 1 | 1.0000 | 0.0000 |
| Item 1 | 1,000 | 500 | 0.5000 | -0.5000 |
| Item 2 | 900 | 400 | 0.4444 | -0.5556 |
| Item 9 | 5 | 2 | 0.4000 | -0.6000 |
| Item 3 | 800 | 300 | 0.3750 | -0.6250 |
| Item 8 | 300 | 100 | 0.3333 | -0.6667 |
| Item 11 | 3 | 1 | 0.3333 | -0.6667 |
| Item 4 | 700 | 200 | 0.2857 | -0.7143 |
| Item 7 | 400 | 100 | 0.2500 | -0.7500 |
| Item 10 | 4 | 1 | 0.2500 | -0.7500 |
| Item 6 | 500 | 100 | 0.2000 | -0.8000 |
| Item 5 | 600 | 100 | 0.1667 | -0.8333 |
+-----------+-----------+---------+--------+-----------+
Weighting the percentage difference by the absolute or in-stock value should improve your results. I needed to add an image to explain properly what I was doing.
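Since the image is not available here, the following is only one possible reading of that idea, not necessarily the weighting the answer used: multiply the percentage difference by the base-10 logarithm of the in-stock quantity, so that tiny stocks contribute almost nothing. The choice of log10 as the weight and the function name weighted_diff are assumptions made for illustration.

```python
import math


def weighted_diff(in_stock, on_sale):
    """Percentage difference scaled by log10 of the stock size (assumed weight)."""
    perc_diff = (on_sale - in_stock) / in_stock  # same Perc Diff as in the table
    # Assumed weight: log10(in_stock) is 0 for a stock of 1 and grows slowly after that.
    return perc_diff * math.log10(in_stock)


item_1 = weighted_diff(1000, 500)   # large stock, Perc Diff -0.5
item_9 = weighted_diff(5, 2)        # tiny stock, Perc Diff -0.6
item_12 = weighted_diff(1, 1)       # nothing to report
```

Under this weighting Item 1 ranks as more significant than Item 9 even though its percentage difference is smaller, which matches the ordering the question asks for. Other weights (square root, a minimum-stock threshold) would work along the same lines.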