I have a dataset similar to below:
37 151 36 34 40 56 59 42 28 38 60
1 11 0 0 2 2 3 4 0 0 4
35 158 35 37 40 56 58 48 31 40 72
1 2 1 1 0 0 0 0 1 0 0
32 132 32 30 36 57 53 35 25 34 54
8 36 4 8 8 7 13 13 3 6 14
40 162 36 38 41 66 64 46 27 35 60
0 2 0 0 1 0 0 0 1 1 2
32 151 31 34 41 58 66 45 33 40 66
0 5 3 2 1 2 0 1 1 4 6
I want to move the even rows into the even columns, like this:
37 1 151 11 36 0 34 0 40 2 56 2 59 3 42 4 28 0 38 0 60 4
35 1 158 2 35 1 37 1 40 0 56 0 58 0 48 0 31 1 40 0 72 0
32 8 132 36 32 4 30 8 36 8 57 7 53 13 35 13 25 3 34 6 54 14
40 0 162 2 36 0 38 0 41 1 66 0 64 0 46 0 27 1 35 1 60 2
32 0 151 5 31 3 34 2 41 1 58 2 66 0 45 1 33 1 40 4 66 6
Basically, I would like to take the even rows and interleave them as even columns, while keeping the odd rows and odd columns as they are.
You can do it in a straightforward manner with awk by saving each field of the odd-numbered rows in an array and then printing each array element in between the fields of the following even-numbered row, e.g.
awk '
FNR%2{for(i=1;i<=NF;i++)a[i]=$i; next}
{for(i=1;i<=NF;i++) printf " %3d %3d", a[i],$i; print ""}
' file
The record (line) number FNR modulo 2 determines whether the line is odd or even. On odd lines, the fields are stored in the a[] array and awk skips to the next record; on even lines, each field is printed after the corresponding saved field in a[].
Example Use/Output
You can adjust the output spacing as desired. With your data in file you would get:
$ awk '
> FNR%2{for(i=1;i<=NF;i++)a[i]=$i; next}
> {for(i=1;i<=NF;i++) printf " %3d %3d", a[i],$i; print ""}
> ' file
37 1 151 11 36 0 34 0 40 2 56 2 59 3 42 4 28 0 38 0 60 4
35 1 158 2 35 1 37 1 40 0 56 0 58 0 48 0 31 1 40 0 72 0
32 8 132 36 32 4 30 8 36 8 57 7 53 13 35 13 25 3 34 6 54 14
40 0 162 2 36 0 38 0 41 1 66 0 64 0 46 0 27 1 35 1 60 2
32 0 151 5 31 3 34 2 41 1 58 2 66 0 45 1 33 1 40 4 66 6
Without aligned output spacing:
$ awk '
FNR%2{for(i=1;i<=NF;i++)a[i]=$i; next}
{for(i=1;i<=NF;i++) printf " %d %d", a[i],$i; print ""}
' file
37 1 151 11 36 0 34 0 40 2 56 2 59 3 42 4 28 0 38 0 60 4
35 1 158 2 35 1 37 1 40 0 56 0 58 0 48 0 31 1 40 0 72 0
32 8 132 36 32 4 30 8 36 8 57 7 53 13 35 13 25 3 34 6 54 14
40 0 162 2 36 0 38 0 41 1 66 0 64 0 46 0 27 1 35 1 60 2
32 0 151 5 31 3 34 2 41 1 58 2 66 0 45 1 33 1 40 4 66 6
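If your file strictly alternates value/count lines as above, an equivalent sketch (not part of the answer) first joins each pair of lines with paste and then interleaves the two halves in awk:
paste -d' ' - - < file |
awk '{ n = NF / 2                  # first half: odd row, second half: even row
       for (i = 1; i <= n; i++)    # print field i next to its counterpart i+n
           printf "%d %d%s", $i, $(i+n), (i < n ? " " : "\n")
     }'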
Here is my data and the pic after plotting with gnuplot.
I used this command to produce the pic:
plot "aaa.txt" with linespoints pt 6 ps 2
I'm just wondering how I can get the pic I want.
Request:
give different symbols to the points in the first and second rows of each block.
give different names to the points in the first and second rows and display them in the upper right corner
60 18
59 20
60 18
67 25
60 18
77 11
60 18
67 10
60 18
53 18
60 18
83 6
60 18
80 16
60 18
71 14
60 18
68 1
60 18
76 3
92 95
97 94
92 95
95 98
92 95
98 76
92 95
96 96
92 95
80 97
10 71
33 80
10 71
39 95
10 71
26 78
10 71
14 61
10 71
29 80
10 71
39 98
10 71
0 46
10 71
12 71
10 71
10 96
10 71
18 66
10 71
20 89
10 71
30 97
10 71
39 84
10 71
1 45
10 71
4 62
10 71
21 65
34 40
25 53
34 40
31 60
34 40
5 31
34 40
14 31
34 40
40 52
34 40
14 32
34 40
41 30
34 40
41 53
The pic I already got:
The pic I want:
How can I achieve this?
Something like this is a good start:
plot 'data' w l lc rgb "black" notitle, '' every ::::0 w p pt 7 lc rgb "black" ps 2 title "data a", '' every ::1 w p pt 2 lc rgb "black" ps 2 title "data b"
Here every ::::0 selects only the first point of each blank-line-separated data block, and every ::1 selects from the second point onward, so the two row types get their own point style and key entry. Which gives:
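A slightly fuller sketch of the same command as a script, with the key pinned to the upper right corner as requested (top right is also gnuplot's default key position):
set key top right
plot 'aaa.txt' w l lc rgb "black" notitle, \
     '' every ::::0 w p pt 7 ps 2 lc rgb "black" title "data a", \
     '' every ::1 w p pt 2 ps 2 lc rgb "black" title "data b"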
I need to convert one large column of numbers in Excel 2016 on Windows into multiple columns of 10 rows each.
I am currently doing this manually. Please help me, Stack Overflow! :)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
The results should be this:
1 11
2 12
3 13
4 14
5 15
6 16
7 17
8 18
9 19
10 20
and so on....
Try:
Formula in C1:
=INDEX($A:$A,ROW()+((COLUMN(A1)-1)*10))
Drag right and 10 rows down. ROW() walks 1 through 10 down each output column, while (COLUMN(A1)-1)*10 adds an offset of 10 for each column dragged right, so column C returns A1:A10, column D returns A11:A20, and so on. (If the formula does not start in row 1, use ROW(A1) in place of ROW() so the counter still starts at 1.)
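For what it's worth, newer Excel (Microsoft 365) can do the whole reshape with a single spill formula; it is not available in 2016, so this is only an option after an upgrade:
=WRAPCOLS(A1:A100,10)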
I'm currently stuck trying to get the Hodrick-Prescott trend from different groups within a monthly dataset. Here's a replica of the dataset:
import pandas as pd
import numpy as np
import statsmodels.api as sm

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)),
                  columns=list('abcd'))
df['date'] = pd.date_range(start='2018-01-01', periods=100, freq='M')
df['id'] = ['Group 1', 'Group 2', 'Group 3', 'Group 4'] * 25
df.rename({'a': 'target'}, axis=1, inplace=True)

final_df = df.groupby('id', group_keys=False).apply(
    lambda x: x.sort_values('date'))
The dataset looks like this:
target b c d date id
0 28 45 17 46 2018-01-31 Group 1
4 58 23 34 76 2018-05-31 Group 1
8 30 98 91 79 2018-09-30 Group 1
12 15 23 25 96 2019-01-31 Group 1
16 67 45 41 38 2019-05-31 Group 1
20 28 40 36 38 2019-09-30 Group 1
24 8 95 28 86 2020-01-31 Group 1
28 14 53 58 75 2020-05-31 Group 1
32 46 3 26 61 2020-09-30 Group 1
36 50 71 80 34 2021-01-31 Group 1
40 78 38 97 75 2021-05-31 Group 1
44 15 74 83 25 2021-09-30 Group 1
48 27 43 18 84 2022-01-31 Group 1
52 84 38 11 24 2022-05-31 Group 1
56 23 29 81 22 2022-09-30 Group 1
60 87 56 92 65 2023-01-31 Group 1
64 24 99 55 86 2023-05-31 Group 1
68 16 68 36 63 2023-09-30 Group 1
72 43 29 80 44 2024-01-31 Group 1
76 0 48 35 49 2024-05-31 Group 1
80 17 50 51 51 2024-09-30 Group 1
84 17 16 40 87 2025-01-31 Group 1
88 98 13 70 27 2025-05-31 Group 1
92 21 30 96 87 2025-09-30 Group 1
96 19 35 32 47 2026-01-31 Group 1
1 21 45 34 61 2018-02-28 Group 2
5 35 15 95 11 2018-06-30 Group 2
9 3 31 94 25 2018-10-31 Group 2
13 65 89 1 7 2019-02-28 Group 2
17 77 41 12 58 2019-06-30 Group 2
... ... ... ... ... ... ...
82 32 99 54 27 2024-11-30 Group 3
86 67 5 71 44 2025-03-31 Group 3
90 79 94 34 53 2025-07-31 Group 3
94 4 60 37 85 2025-11-30 Group 3
98 20 16 32 97 2026-03-31 Group 3
3 70 63 94 98 2018-04-30 Group 4
7 2 13 14 5 2018-08-31 Group 4
11 49 44 20 27 2018-12-31 Group 4
15 11 60 39 10 2019-04-30 Group 4
19 22 96 48 5 2019-08-31 Group 4
23 23 22 30 8 2019-12-31 Group 4
27 39 11 58 89 2020-04-30 Group 4
31 61 72 68 78 2020-08-31 Group 4
35 29 20 7 30 2020-12-31 Group 4
39 53 20 32 98 2021-04-30 Group 4
43 97 31 60 74 2021-08-31 Group 4
47 46 65 15 93 2021-12-31 Group 4
51 31 24 5 75 2022-04-30 Group 4
55 42 59 87 68 2022-08-31 Group 4
59 75 50 62 60 2022-12-31 Group 4
63 5 24 15 83 2023-04-30 Group 4
67 77 12 81 44 2023-08-31 Group 4
71 74 15 11 90 2023-12-31 Group 4
75 34 0 19 81 2024-04-30 Group 4
79 2 26 36 98 2024-08-31 Group 4
83 45 66 9 23 2024-12-31 Group 4
87 74 67 35 98 2025-04-30 Group 4
91 69 78 46 7 2025-08-31 Group 4
95 66 77 91 41 2025-12-31 Group 4
99 66 11 96 91 2026-04-30 Group 4
Here's my current approach:
groups = final_df.groupby('id')
group_keys = list(groups.groups.keys())
bs = pd.DataFrame()

for key in group_keys:
    g = groups.get_group(key)
    target = g['target']
    cycle, trend = sm.tsa.filters.hpfilter(target, lamb=129600)
    g['hp_trend'] = trend
    bs.append(g)
My goal is simply to generate the HP-filter trend for each group and append it to that group as a column, so that each group has its own trend based on the specified target field.
Currently, the bs dataframe is still the empty dataframe it started as. How can I get the result I need?
Thanks for reading.
Two fixes are needed: pd.DataFrame.append is not in-place, it returns a new frame, so the result must be assigned back; and taking a .copy() of each group avoids the SettingWithCopyWarning when the new column is assigned:

groups = final_df.groupby('id')
group_keys = list(groups.groups.keys())
bs = pd.DataFrame()

for key in group_keys:
    g = groups.get_group(key).copy()
    target = g['target']
    cycle, trend = sm.tsa.filters.hpfilter(target, lamb=129600)
    g['hp_trend'] = trend
    bs = bs.append(g)

bs
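On pandas 2.0 and later, DataFrame.append has been removed (deprecated since 1.4); a minimal sketch of the same loop that collects the per-group frames in a list and concatenates once at the end:

frames = []
for key in group_keys:
    g = groups.get_group(key).copy()
    # hpfilter returns (cycle, trend); lamb=129600 is the common monthly setting
    cycle, trend = sm.tsa.filters.hpfilter(g['target'], lamb=129600)
    g['hp_trend'] = trend
    frames.append(g)
bs = pd.concat(frames)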
How can I numerically count only lines that have words in them? In the example below, I have four lines with words in them:
100314:Status name one: 15
24 1 7 5 43 13 24 64 10 47 31 100 22 20 38 63 49 24 18 82 66 22 21 77 52 8 6 11 50 20 5 1 0
101245:Status name two: 14
2 10 2 2 25 53 3 31 30 1 21 41 9 14 18 40 6 10 18 72 20 16 33 29 19 18 12 60 48 12 8 50 43 13
103765:Yet another name here: 29
45 29 29 475 63 69 47 94 65 65 69 55 53 905 117 57 42 92 90 59 91 52 79 101 192 87 144 74 115 82 78 109 12 96 64 78 111 106 84 19 0 7
102983:Blah blah yada yada: 82
41 37 40 60 82 72 17 41 17 19 43 3
I've tried using different pipe combinations of wc -l and grep/uniq. I also tried counting only the odd lines (which works in the MWE above), but I'm looking for something more general-purpose for a large unstructured dataset.
It depends on how you define a word. If, for example, it's any two consecutive letters, you can just use something like:
grep -E '[a-zA-Z]{2}' fileName | wc -l
You can adjust the regular expression depending on how you define a word (the one provided won't pick up "A" or "I" or "I'm", for example), but the concept remains the same.
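Since grep can count matching lines itself, the pipe to wc -l can also be dropped:
grep -cE '[a-zA-Z]{2}' fileName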
I am trying to remove this:
2012-04-04 07:51:04 (2012-04-04 11:51:04.399000000Z): subject=PROD.
from the following
2012-04-04 07:51:04 (2012-04-04 11:51:04.399000000Z): subject=PROD.sdmp.o.t.0.0.0.0.NewOrderExecutionOE.?.4366.0.2.3.1.TNP.FIDESSA.IBAPPBAL504.EQUITY, message={PL=[10 3 49 46 51 18 3 84 78 80 26 7 70 73 68 69 83 83 65 34 37 79 69 45 50 48 49 50 48 52 48 52 45 48 48 48 48 52 54 55 55 52 48 50 84 82 83 70 49 46 49 46 49 45 48 48 50 49 42 19 78 101 119 79 114 100 101 114 69 120 101 99 117 116 105 111 110 79 69 50 128 8 10 189 4 8 240 46 16 0 24 0 48 0 56 0 64 0 72 2 80 2 88 2 96 0 104 216 29 112 0 120 174 11 128 1 2 136 1 136 165 215 226 231 38 144 1 223 3 152 1 1 160 1 246 181 215 226 231 38 168 1 223 3 176 1 0 194 1 7 70 73 68 69 83 83 65 200 1 1 210 1 37 79 69 45 50 48 49 50 48 52 48 52 45 48 48 48 48 52 54 55 55 52 48 50 84 82 83 70 49 46 49 46 49 45 48 48 50 49 218 1 78 83 85 70 73 32 83 83 32 49 57 48 48 32 83 84 68 32 64 32 55 46 50 55 32 85 83 68 32 65 67 67 79 85 78 84 32 67 83 87 69 45 73 78 86 45 78 77 77 32 67 79 78 84 82 65 32 65 82 67 65 32 83 70 71 45 67 79 82 82 69 83 80 79 78 68 69 78 84 224 1 203 85 242 1 22 67 65 82 77 69 76 79 46 82 85 66 65 78 79 64 83 85 70 73 46 85 83 248 1 30 128 2 213 11 146 2 4 83 85 70 73 152 2 5 168 2 5 176 2 1 186 2 7 16 11 26 3 85 83 68 192 2 142 34 202 2 7 16 3 26 3 83 84 68 146 3 7 70 73 68 69 83 83 65 162 3 46 79 114 100 101 114 73 110 115 116 114 117 99 116 105 111 110 45 50 48 49 50 48 52 48 52 45 48 48 48 48 48 48 48 48 48 50 49 78 79 83 70 49 45 49 45 49 170 3 7 70 73 68 69 83 83 65 186 3 24 79 114 100 101 114 45 50 48 49 50 48 52 48 52 45 50 49 78 79 83 70 49 45 49 192 3 14 202 3 15 65 114 99 97 69 120 69 120 101 99 86 101 110 117 101 208 3 8 218 3 8 78 89 83 69 65 114 99 97 232 3 213 11 250 3 4 83 85 70 73 128 4 5 136 4 203 85 154 4 22 67 65 82 77 69 76 79 46 82 85 66 65 78 79 64 83 85 70 73 46 85 83 160 4 30 168 4 213 11 186 4 4 83 85 70 73 192 4 5 208 4 210 13 226 4 7 85 78 75 78 79 87 78 240 4 85 130 5 4 65 82 67 65 136 5 5 144 5 85 162 5 4 65 82 67 65 168 5 5 184 5 137 244 17 194 5 12 67 83 87 69 45 73 78 86 45 78 77 77 202 5 4 73 66 82 75 208 5 214 188 12 218 5 17 83 70 71 45 67 79 82 82 69 83 80 79 78 68 69 78 84 226 5 4 73 66 82 75 216 6 0 226 6 2 16 0 128 7 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 58 11 73 66 65 80 80 66 65 76 53 48 52 64 207 247 148 134 237 137 184 231 8] ET[1]=2012-04-04 11:51:01Z}
but I'm not sure how to use awk or cut to do this; no matter what I do, I can't get it right.
I know you have to use $i in awk, but I can't get the split right to get what I want.
Any help would be appreciated.
OK, here is how I tried using cut:
cut -d "." -f3- data.log > dataCut.log
This worked for most of the file, but at some point it cut off the sdmp., which is something I need to keep.
If you have GNU grep then simply:
grep -Po 'PROD[.]\K.*' file
Here \K discards everything matched so far, so only the text after PROD. is printed.
How about a simple cut:
cut -c68- file
outputs:
sdmp.o.t.0.0.0.0.NewOrderExecutionOE.?.[cut for brevity]
The same approach using sed:
sed 's/^.\{67\}//' file
Using colrm:
$ colrm 1 67 < file
Perl:
perl -pe 's/^.{67}//' file
Finally, awk (can't beat @kent for brevity here); note that FIELDWIDTHS is a gawk extension:
awk 'BEGIN{FIELDWIDTHS="67 9999"}{print $2}' file
If there is only one PROD., you don't have to split on the period "."; you could:
awk -F'PROD.' '$0=$2' input
or
awk -F'PROD\\.' '$0=$2' input
However, I guess your PROD.foo.bar could be dynamic, e.g. TEST.foo.bar or DEV.foo.bar; in that case, you could key off the subject= instead:
awk -F'subject=[A-Z]+\\.' '$0=$2' input
You could adjust [A-Z] if there could be other possibilities. grep with PCRE support (e.g. GNU grep) would work too.
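For example, a sketch in the spirit of the GNU grep answer above, with the environment prefix made generic:
grep -Po 'subject=[A-Z]+\.\K.*' file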
One way would be to use sub:
awk '{sub(/^[0-9][-0-9:. :)(Z]* subject=PROD./, ""); print;}'
This matches everything from the start of the line up to "subject=PROD." and replaces it with an empty string.
While you asked for awk, sed may well be THE tool for the job:
sed -r 's/^.+ subject=PROD\.(.*)/\1/' file