Split columns based on a specific column


I have a number of files with two or more columns, and I need to split the rows of each file into separate files according to the value in the first column.
Example:
1 15 90
4 20 89
1 38 129
4 56 150
4 43 171
1 45 210
So I need, in file1:
1 15 90
1 38 129
1 45 210
And in file2:
4 20 89
4 56 150
4 43 171
Can anyone help?
Thanks a lot,
Pedro.

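awk '{print > ("file" $1)}' file
This writes each line to a file whose name is "file" followed by the line's first field, so the example input produces file1 and file4.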
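If awk isn't available, the same grouping can be sketched in Python. This is a minimal sketch, not part of the original answer: the helper names split_by_first_column and write_groups are hypothetical, and it assumes whitespace-separated fields and input small enough to hold in memory.

```python
from collections import defaultdict
from pathlib import Path


def split_by_first_column(lines):
    """Group lines by their first whitespace-separated field, keeping input order."""
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            groups[fields[0]].append(line.rstrip("\n"))
    return dict(groups)


def write_groups(groups, out_dir, prefix="file"):
    """Write each group to <out_dir>/<prefix><key>, mirroring the awk one-liner."""
    out_dir = Path(out_dir)
    for key, group in groups.items():
        (out_dir / f"{prefix}{key}").write_text("\n".join(group) + "\n")
```

For example, opening the input as f and calling write_groups(split_by_first_column(f), ".") would create file1 and file4 in the current directory, like the awk command.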

Related

How do I do a post hoc Tukey test in Excel after a two-way ANOVA with replication?

In Excel I ran a two-way ANOVA with replication (is this the same as a two-way repeated-measures ANOVA?) and now need to do a post hoc Tukey test. How do I do this in Excel 2016?
Each day column holds the score measured on that day:
treatment day6 day7 day10 day11
1 20 30 500 490
1 2 400 900 500
1 3 32 1000 145
2 67 56 45 89
2 54 67 67 23
2 78 77 68 90
3 32 32 34 99
3 56 58 103 23
3 17 45 115 1043

Choosing values in a column based on the maximum values of other columns

I am selecting values in a pandas DataFrame.
I would like to choose values from the columns 'One_T', 'Two_T', 'Three_T' (the total counts) based on the ratio columns 'One_R', 'Two_R', 'Three_R': the comparison is done on the _R columns, and the value is taken from the corresponding _T column.
For each row, I would like to find the highest value among 'One_R', 'Two_R', 'Three_R' and put the value from the matching _T column into a new column 'Highest'.
For example, in the first row One_R is higher than Two_R and Three_R, so the value from One_T goes into Highest.
The initial DataFrame is test in the code below, and the desired output is result.
import pandas as pd
test = pd.DataFrame([[150,30,140,20,120,19],[170,31,130,30,180,22],[230,45,100,50,140,40],
[140,28,80,10,60,10],[100,25,80,27,50,23]], index=['2019-01-01','2019-02-01','2019-03-01','2019-04-01','2019-05-01'],
columns=['One_T','One_R','Two_T','Two_R','Three_T','Three_R'])
One_T One_R Two_T Two_R Three_T Three_R
2019-01-01 150 30 140 20 120 19
2019-02-01 170 31 130 30 180 22
2019-03-01 230 45 100 50 140 40
2019-04-01 140 28 80 10 60 10
2019-05-01 100 25 80 27 50 23
result = pd.DataFrame([[150,30,140,20,120,19,150],[170,31,130,30,180,22,170],[230,45,100,50,140,40,100],
[140,28,80,10,60,10,140],[100,25,80,27,50,23,80]], index=['2019-01-01','2019-02-01','2019-03-01','2019-04-01','2019-05-01'],
columns=['One_T','One_R','Two_T','Two_R','Three_T','Three_R','Highest'])
One_T One_R Two_T Two_R Three_T Three_R Highest
2019-01-01 150 30 140 20 120 19 150
2019-02-01 170 31 130 30 180 22 170
2019-03-01 230 45 100 50 140 40 100
2019-04-01 140 28 80 10 60 10 140
2019-05-01 100 25 80 27 50 23 80
Is there any way to do this?
Thank you for time and considerations.

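You can solve this using df.filter to select the columns with the _R suffix, then idxmax to find the column with the highest ratio in each row. Then replace _R with _T and pick the matching total from each row. DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, so rather than test.lookup(s.index, s) this indexes the underlying NumPy array:
import numpy as np
s = test.filter(like='_R').idxmax(axis=1).str.replace('_R', '_T')
# DataFrame.lookup is gone in pandas 2.0: map each row's column label to a position
idx, cols = pd.factorize(s)
test['Highest'] = test.reindex(cols, axis=1).to_numpy()[np.arange(len(test)), idx]
print(test)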
One_T One_R Two_T Two_R Three_T Three_R Highest
2019-01-01 150 30 140 20 120 19 150
2019-02-01 170 31 130 30 180 22 170
2019-03-01 230 45 100 50 140 40 100
2019-04-01 140 28 80 10 60 10 140
2019-05-01 100 25 80 27 50 23 80
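To check what the pandas chain computes for a single row, here is a plain-Python sketch of the same selection rule. The function name highest_total and the dict-per-row layout are illustrative assumptions; only the One/Two/Three prefixes and the _R/_T suffixes come from the question.

```python
def highest_total(row, prefixes=("One", "Two", "Three")):
    """Return row[p + '_T'] for the prefix p whose row[p + '_R'] is largest."""
    # max() returns the first prefix among equal maxima, like idxmax
    best = max(prefixes, key=lambda p: row[p + "_R"])
    return row[best + "_T"]


# First and last rows of the question's data
rows = [
    {"One_T": 150, "One_R": 30, "Two_T": 140, "Two_R": 20, "Three_T": 120, "Three_R": 19},
    {"One_T": 100, "One_R": 25, "Two_T": 80, "Two_R": 27, "Three_T": 50, "Three_R": 23},
]
print([highest_total(r) for r in rows])
```

Using max with a key keeps the first prefix when two ratios tie, which is the same first-occurrence rule idxmax follows, so the two approaches should agree on ties.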

How to extract lines from a file when the second column matches the values in another file

I have two files.
file1:
4
14
18
45
53
60
64
102
106
158
162
file2:
28 1 2
54 1 2
90 1 1
103 1 1
155 1 17
191 1 1
235 1 1
245 4 1
275 4 1
362 4 1
377 18 1
391 18 1
413 18 2
466 18 2
492 18 2
494 18 41
498 45 1
522 45 1
529 57 3
542 53 1
560 58 6
562 164 25
568 164 5
I want to extract the lines from file2 whose second column matches a value in file1.
So the expected output is:
245 4 1
275 4 1
362 4 1
377 18 1
391 18 1
413 18 2
466 18 2
492 18 2
494 18 41
498 45 1
522 45 1
542 53 1
Many of the solutions I found online use Python or Perl; however, I want to do this with Linux commands. Any ideas?

This should do it:
awk 'FNR==NR{a[$0]++};FNR!=NR{if($2 in a){print}}' file1 file2
245 4 1
275 4 1
362 4 1
377 18 1
391 18 1
413 18 2
466 18 2
492 18 2
494 18 41
498 45 1
522 45 1
542 53 1
Explanation:
We hand awk both files (the order matters here!).
While reading the first file (FNR==NR), we store each line as a key in the array a (a[$0]++).
When we reach the second file, we check whether its second column ($2) is a key in a; if so, we print the line.
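The same set-membership idea can be sketched in Python for comparison. This is a hedged sketch: the function name lines_matching_keys is hypothetical, fields are assumed to be whitespace-separated, and unlike awk's a[$0] it strips surrounding whitespace from the keys.

```python
def lines_matching_keys(key_lines, data_lines, column=1):
    """Return data lines whose field at `column` (0-based; 1 is awk's $2) is a key."""
    # Build the lookup set from the first input, as the awk array does
    keys = {line.strip() for line in key_lines if line.strip()}
    matches = []
    for line in data_lines:
        fields = line.split()
        if len(fields) > column and fields[column] in keys:
            matches.append(line.rstrip("\n"))
    return matches
```

To run it on the question's files, open file1 and file2 and pass the two file objects in that order; each returned line can then be printed.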

Excel - calculate average of values in one column based on another grouping column. The number of rows is not constant per group

I have two columns, one with IDs and one with values, and I want to calculate the average per ID. The number of rows per ID is not constant. What I have:
ID Value
1 22
1 31
1 34
1 23
1 31
34 67
34 65
34 55
12 44
12 46
12 43
12 35
I want a formula that calculates a third column:
ID Value Average per ID
1 22 28.2
1 31 28.2
1 34 28.2
1 23 28.2
1 31 28.2
34 67 62.3
34 65 62.3
34 55 62.3
12 44 42.0
12 46 42.0
12 43 42.0
12 35 42.0
I have tried the AVERAGEIF function, but I can't figure it out.

Enter either of these formulas in C2 and fill down (column A holds the IDs, column B the values):
=AVERAGEIF(A:A,A2,B:B)
or
=SUMIF(A:A,A2,B:B)/COUNTIF(A:A,A2)
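To make the SUMIF/COUNTIF logic explicit, here is a small Python sketch that computes the same per-ID average for every row. It is an illustration only: the function name average_per_id and the (id, value) tuple input are assumptions, not anything Excel provides.

```python
from collections import defaultdict


def average_per_id(rows):
    """Return (id, value, mean of all values with that id) for each input row."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    # First pass: per-ID totals and counts (the SUMIF and COUNTIF parts)
    for key, value in rows:
        sums[key] += value
        counts[key] += 1
    # Second pass: attach the group mean to every row, so group sizes can differ
    return [(key, value, sums[key] / counts[key]) for key, value in rows]


data = [(1, 22), (1, 31), (1, 34), (1, 23), (1, 31),
        (34, 67), (34, 65), (34, 55),
        (12, 44), (12, 46), (12, 43), (12, 35)]
for key, value, avg in average_per_id(data):
    print(key, value, round(avg, 1))
```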

Finding the median of a range of values selected using vlookup

Column A holds the dates, and columns B and C hold measurements.
Dates Measurements
1 56 15
2 45 25
3 62 76
4 15 42
5 165 56
6 16 79
7 45 46
8 47 79
9 24 47
10 12 14
11 147 47
12 195 19
13 443 79
14 642 43
15 462 75
16 156 87
17 794 49
Start date (cell B21): 2
Measurement: 45, found with
=VLOOKUP(B21,A2:C18,2,FALSE)
End date (cell B22): 14
Measurement: 642, found with
=VLOOKUP(B22,A2:C18,2,FALSE)
I used VLOOKUP to find the values I want, but now I want the median of each measurement column over the whole range from the start date to the end date.
How can I write a formula that selects that range and returns its median?

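Since your column A values are sorted in ascending order, we can use this very efficient formula:
=MEDIAN(INDEX(B2:B18,MATCH(B21,A2:A18)):INDEX(B2:B18,MATCH(B22,A2:A18,0)))
The two INDEX calls return the cells for the start and end dates, and joining them with ":" builds the range between them. The first MATCH omits the match type, so it uses approximate matching, which relies on that sort order. For column C, change both B2:B18 references to C2:C18.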
Regards
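The INDEX/MATCH approach amounts to finding the first and last rows of the date range in a sorted column and taking the median of the values between them. A rough Python equivalent is sketched below; median_between is a hypothetical name, and it assumes the dates are sorted ascending, just as the formula does.

```python
from bisect import bisect_left, bisect_right
from statistics import median


def median_between(dates, values, start, end):
    """Median of the values whose date lies in [start, end]; dates must be sorted."""
    lo = bisect_left(dates, start)   # first position with date >= start
    hi = bisect_right(dates, end)    # one past the last position with date <= end
    if lo >= hi:
        raise ValueError("no dates in the requested range")
    return median(values[lo:hi])


# Data from the question: dates 1..17 with measurement columns B and C
dates = list(range(1, 18))
col_b = [56, 45, 62, 15, 165, 16, 45, 47, 24, 12, 147, 195, 443, 642, 462, 156, 794]
col_c = [15, 25, 76, 42, 56, 79, 46, 79, 47, 14, 47, 19, 79, 43, 75, 87, 49]
print(median_between(dates, col_b, 2, 14), median_between(dates, col_c, 2, 14))
```

With the question's data, both column B and column C give 47 for dates 2 through 14.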
