Extracting indexes from a max pool over uniform data - pytorch

I'm trying to find max points in a 2D tensor for a given kernel size, but I'm having issues with a special case where all the values are uniform. For example, given the following input, I would like to mark every point as a max point:
+---+---+---+---+
| 5 | 5 | 5 | 5 |
+---+---+---+---+
| 5 | 5 | 5 | 5 |
+---+---+---+---+
| 5 | 5 | 5 | 5 |
+---+---+---+---+
| 5 | 5 | 5 | 5 |
+---+---+---+---+
If I run torch.nn.functional.max_pool2d with kernel_size=3, stride=1, and padding=1, I get the following indices:
+---+---+---+----+
| 0 | 0 | 1 | 2 |
+---+---+---+----+
| 0 | 0 | 1 | 2 |
+---+---+---+----+
| 4 | 4 | 5 | 6 |
+---+---+---+----+
| 8 | 8 | 9 | 10 |
+---+---+---+----+
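For reference, a minimal sketch of the call that produces these indices (assuming the input is given the (N, C, H, W) shape that max_pool2d expects, and that the indices come from return_indices=True):
import torch
import torch.nn.functional as F

# Uniform 4x4 input, reshaped to (N, C, H, W) for max_pool2d.
x = torch.full((1, 1, 4, 4), 5.0)

# With return_indices=True, each output position reports the flat index
# (row * width + col) of the element chosen as that window's max.
_, idx = F.max_pool2d(x, kernel_size=3, stride=1, padding=1, return_indices=True)
print(idx.view(4, 4))   # prints the index table shown above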
What changes do I need to make to instead obtain the following indices?
+----+----+----+----+
| 1 | 2 | 3 | 4 |
+----+----+----+----+
| 5 | 6 | 7 | 8 |
+----+----+----+----+
| 9 | 10 | 11 | 12 |
+----+----+----+----+
| 13 | 14 | 15 | 16 |
+----+----+----+----+

You can do the following:
import torch

a = torch.ones(4, 4)
indices = (a == torch.max(a).item()).nonzero()  # 2D coordinates of every max element
This returns a [16, 2] tensor with the 2D coordinates of the max value(s), i.e. [0, 0], [0, 1], ..., [3, 3]. The torch.max part should be easy to understand; nonzero() takes the boolean tensor given by (a == torch.max(a).item()), treats False as 0, and returns the indices of the non-zero entries. Hope this helps!

If you want the indices in 2D shape, @ccl has already given you the answer, but for 1D indices you can first flatten x with torch.flatten, then get the indices with torch.nonzero, and finally reshape back to the original shape:
x = torch.ones(4,4) * 5
(x.flatten() == x.flatten().max()).nonzero().reshape(x.shape) + 1
tensor([[ 1,  2,  3,  4],
        [ 5,  6,  7,  8],
        [ 9, 10, 11, 12],
        [13, 14, 15, 16]])
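If the goal is specifically to mark every element that ties the maximum of its own pooling window (rather than to pick a single index per window), one option is to compare the input against the pooled output. A minimal sketch, assuming the same kernel_size=3, stride=1, padding=1 setup as in the question:
import torch
import torch.nn.functional as F

x = torch.full((1, 1, 4, 4), 5.0)   # the uniform example from the question
pooled = F.max_pool2d(x, kernel_size=3, stride=1, padding=1)
is_max = x == pooled                 # True wherever a value equals its window's max
print(is_max.view(4, 4))             # all True for uniform input
For uniform input every position is marked, which is the behaviour asked for in the question.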

Related

Passing range of number like 0 -100 in an example section of cucumber feature file

I am trying to pass a range of numbers in the index column, from 0 to 100, or more generally from 0 to n. How do I do that? Could you please help me with sample code in Cucumber / Karate?
Examples:
| index | number | em_number |
| 0 | 1 | 10 |
| 1 | 1 | 10 |
| 2 | 1 | 10 |
| 3 | 1 | 10 |
| 4 | 1 | 10 |
I think you need to spend some time on fundamentals before trying to over-complicate your tests.
That said, karate has a built-in function. Try this:
* def nums = karate.range(5, 9)
* match nums == [5, 6, 7, 8, 9]
And then please read the docs on JSON transforms: https://github.com/karatelabs/karate#json-transforms

How to predict with conditions and limitations?

The dataframe has the following features:
+--------+--------+--------+------+-------+--------+-----+-------+
| | id | weight | type | value | export | tax | total |
+--------+--------+--------+------+-------+--------+-----+-------+
| 0 | 1 | 4 | 1 | 10 | 1 | 5 | 15 |
+--------+--------+--------+------+-------+--------+-----+-------+
| 1 | 2 | 3 | 1 | 12 | 1 | 6 | 18 |
+--------+--------+--------+------+-------+--------+-----+-------+
| 2 | 3 | 8 | 2 | 15 | 0 | 0 | 15 |
+--------+--------+--------+------+-------+--------+-----+-------+
| ... | ... | ... | ... | ... | ... | ... | ... |
+--------+--------+--------+------+-------+--------+-----+-------+
| 123004 | 123005 | 5 | 2 | 12 | 0 | 0 | 12 |
+--------+--------+--------+------+-------+--------+-----+-------+
The tax column should be predicted. It is important to consider the relationship between tax and export: when export == 1, there is a tax; when export == 0, the tax is 0.
The following code (Random Forest as an example) predicts the tax without considering this rule.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

y = df['tax']
X = df.drop(columns=['tax'])

# Split the data into training and testing sets
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=42)

rf = RandomForestRegressor(max_depth=10, random_state=101, n_estimators=42)
rf.fit(train_X, train_y)
predictions = rf.predict(test_X)
Questions:
1- How do I tell the algorithm to consider the above rule?
2- The tax cannot be more than the value. How can I set limitations or a range for the prediction? (A minimal post-processing sketch follows below.)
3- If there is another method to predict the same result, please mention it (Random Forest is not a must).
4- I am a beginner in this field, so good ideas for this sample are very welcome.
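Regarding questions 1 and 2 above, one common approach (only a sketch, reusing the test_X and predictions names from the snippet above) is to let the model predict freely and then post-process the output so it respects the known rules:
import numpy as np

# Start from the raw model output and enforce the stated business rules.
pred = predictions.copy()

# Rule 1: no export (export == 0) means no tax.
pred[test_X['export'].to_numpy() == 0] = 0

# Rule 2: the tax cannot exceed the value (and cannot be negative).
pred = np.clip(pred, 0, test_X['value'].to_numpy())
Another option is to train the regressor only on rows where export == 1 and hard-code tax = 0 for the rest; as far as I know, RandomForestRegressor has no built-in option for this kind of output constraint.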

What is the most efficient way to randomly change values into null values in pyspark?

Trying to figure out how to randomly replace values in a specific column of a PySpark DataFrame with nulls. So, changing a dataframe such as this:
| A | B |
|----|----|
| 1 | 2 |
| 3 | 4 |
| 5 | 6 |
| 7 | 8 |
| 9 | 10 |
| 11 | 12 |
and randomly change 25% of the values in column 'B' to null values:
| A | B |
|----|------|
| 1 | 2 |
| 3 | NULL |
| 5 | 6 |
| 7 | NULL |
| 9 | NULL |
| 11 | 12 |
Thanks to @pault I was able to answer my own question using the question he posted, which you can find here.
Essentially I ran something like this:
import pyspark.sql.functions as f
df1 = df.withColumn('Val', f.when(f.rand() > 0.25, df['Val']).otherwise(f.lit(None)))
This randomly keeps each value in column 'Val' with probability 0.75 and replaces the rest (roughly 25%) with None (null).
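Applied to the example columns above, a fuller sketch (assuming a SparkSession is available and that the target column is 'B' rather than 'Val'):
from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12)], ["A", "B"]
)

# Keep each value of B with probability ~0.75, otherwise replace it with null.
df_nulled = df.withColumn(
    "B", f.when(f.rand() > 0.25, f.col("B")).otherwise(f.lit(None))
)
df_nulled.show()
Note that f.rand() draws independently per row, so the fraction of nulls is only approximately 25% rather than exactly 25%.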

Complex conditional aggregation in Pandas

In this table, I want to find the average number of days between actions for each user.
What I mean is: group by User_ID, subtract each date from the date directly before it (per user, in days), and then take the average of those gaps per user (the average number of no-action days per user).
+---------+-----------+----------------------+
| User_ID | Action_ID | Action_At |
+---------+-----------+----------------------+
| 1 | 11 | 2019-01-31T23:00:37Z |
+---------+-----------+----------------------+
| 2 | 12 | 2019-01-31T23:11:12Z |
+---------+-----------+----------------------+
| 3 | 13 | 2019-01-31T23:14:53Z |
+---------+-----------+----------------------+
| 1 | 14 | 2019-02-01T00:00:30Z |
+---------+-----------+----------------------+
| 2 | 15 | 2019-02-01T00:01:03Z |
+---------+-----------+----------------------+
| 3 | 16 | 2019-02-01T00:02:32Z |
+---------+-----------+----------------------+
| 1 | 17 | 2019-02-06T11:30:28Z |
+---------+-----------+----------------------+
| 2 | 18 | 2019-02-06T11:30:28Z |
+---------+-----------+----------------------+
| 3 | 19 | 2019-02-07T09:09:16Z |
+---------+-----------+----------------------+
| 1 | 20 | 2019-02-11T15:37:24Z |
+---------+-----------+----------------------+
| 2 | 21 | 2019-02-18T10:02:07Z |
+---------+-----------+----------------------+
| 3 | 22 | 2019-02-26T12:01:31Z |
+---------+-----------+----------------------+
You can do it like this (and next time, please provide the data so that it is easy to help you; it took me much longer to enter the data than to get to the solution):
import pandas as pd
df = pd.DataFrame({
    'User_ID': [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    'Action_ID': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
    'Action_At': ['2019-01-31T23:00:37Z', '2019-01-31T23:11:12Z', '2019-01-31T23:14:53Z',
                  '2019-02-01T00:00:30Z', '2019-02-01T00:01:03Z', '2019-02-01T00:02:32Z',
                  '2019-02-06T11:30:28Z', '2019-02-06T11:30:28Z', '2019-02-07T09:09:16Z',
                  '2019-02-11T15:37:24Z', '2019-02-18T10:02:07Z', '2019-02-26T12:01:31Z']})
df.Action_At = pd.to_datetime(df.Action_At)
df.groupby('User_ID').apply(lambda x: (x.Action_At - x.Action_At.shift()).mean())
## User_ID
## 1 3 days 13:32:15.666666
## 2 5 days 19:36:58.333333
## 3 8 days 12:15:32.666666
## dtype: timedelta64[ns]
Or, if you want the solution in days:
df.groupby('User_ID').apply(lambda x: (x.Action_At - x.Action_At.shift()).dt.days.mean())
## User_ID
## 1 3.333333
## 2 5.333333
## 3 8.333333
## dtype: float64
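An equivalent, slightly shorter variant of the same computation (a sketch using Series.diff instead of an explicit shift):
# Per-user mean gap in whole days, via diff() on the datetime column.
df.groupby('User_ID')['Action_At'].apply(lambda s: s.diff().dt.days.mean())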

Pandas read_csv randomly skip rows with specific entries

I have a csv file where I want to skip a random percentage of rows but only for rows where one of the columns has a specific entry. For example I might have a csv with contents below and I want to skip a certain percentage of all the apple entries:
|   | a | b  | c  | d  | e      |
|---|---|----|----|----|--------|
| 0 | 9 | 1  | 2  | 3  | apple  |
| 1 | 8 | 4  | 5  | 6  | apple  |
| 2 | 7 | 7  | 8  | 9  | apple  |
| 3 | 6 | 10 | 11 | 12 | orange |
| 4 | 5 | 13 | 14 | 15 | orange |
| 5 | 4 | 16 | 17 | 18 | orange |
| 6 | 3 | 19 | 20 | 21 | orange |
| 7 | 2 | 22 | 23 | 24 | banana |
| 8 | 1 | 25 | 26 | 27 | banana |
| 9 | 0 | 28 | 29 | 30 | banana |
I know I could skip rows across the entire file with something like
df = pd.read_csv('fruit.csv', skiprows = lambda i: i>0 and random.random() > probability_value)
I know I can also select just the apple entries from the dataframe with
df2 = df.loc[df['e'] == 'apple']
But is there a simple way to apply the random row-skipping only to the 'apple' entries when importing the csv, so that the non-'apple' entries aren't affected?
You can do it as follows, but I would prefer doing it at a later stage, after loading:
df = pd.read_csv('fruit.csv').query("e != 'apple'")
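A minimal sketch of that later-stage approach, assuming you want to drop roughly a fraction p_drop of only the 'apple' rows and keep everything else (p_drop is an illustrative value):
import numpy as np
import pandas as pd

df = pd.read_csv('fruit.csv')

p_drop = 0.3  # fraction of 'apple' rows to discard (illustrative)
is_apple = df['e'] == 'apple'
drop_mask = is_apple & (np.random.rand(len(df)) < p_drop)
df = df[~drop_mask]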
