linux grep/sed certain lines - space removal
I have a file.dat I need to sed through. I've tried this code, but it puts a space between every character and then I can't perform any statistics using xsl (e.g. 3.5 looks like 3 . 5). Is there a way to modify the code to remove the spaces?
sed -ne 's#^\(2012Sep[^ ]*\).*\(FD12P.*\)PARS.*#\1,\2#p' file.dat | sed -e 's# *# #g'
Below is an example of the original file:
2012Sep212357 23:56:03.06250, AAA_YMDHMS, 2012, 9, 21, 23, 56, 4, POSS_71, OK , 15.0, 73.2, 0.0, 0.0, C, 0.0, 0.3, 0.6,PS711 78:218:17:41 189.9 205.5 112 1.7 7.3 60 15 51 73.2 0.0 2080,PS712,PS713,PS714,PS715,PS716 F# 51 2.03 7.39 54.98 2.06 0.89 681.1 0.3 11.0 112 14.6 C 376 0.00 0.00 0 0 2 1 112 1.7 189.8 205.4 157.1 192.0 78.5 32.0 928.0 2.0 0.0 -99.0 0.0 10.7 10.7 10000000.0 376 T 4 I 0 ,dia 2 6, 28 22 4,dia 3 5, 26 34 4,dia 4 3 18 17, 3,dia 5 4 25 13 4 1,,dia 6 5 12 7 1,dia 7 6 10 6,dia 8 5 12 4,dia 9 1 3 1,dia 10 2 3 1,dia 11 2 4,dia 12 0 0 1,dia 13 1 1,dia 14 1,dia 15 1 1,dia 16 1 1 1,dia 17 1,dia 18 1 3,dia 22 0 1,dia 23 1,dia 24 1 1,dia 25 1,dia 26 1,dia 29 1,dia 30 1,dia 31 2 2,dia 32 0 3,dia 33 1 2,dia 34 0 1 1,dia 35 1,dia 36 0 3,dia 37 0 2,dia 38 1 2,dia 39 1 2,dia 40 0 2,dia 41 2,dia 42 1 4 1,dia 43 2,dia 44 1 2,dia 45 1 2,dia 47 0 3,dia 49 2,dia 52 2,dia 59 1,dia 60 1,dia 61 1,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia , YANKEE_S/N_0706011, OK , 46.9, 48.2, -1.3, 17.0, 2.5, 1.000, -0.8, 0.0, -0.1, -0.1, -1.5, -3.5, 0.0, 0.0, 0.0, 0.0, CT25K_S/N_A42101, OK ,CT0,20,2,3,00, /////, /////, /////,00000300,100,N,100,25,82,206,-4,7,LF7HN1,6,-2,6,5,5,4,4,4,3,4,3,2,2,3,3,2,2,2,3,2,2,2,3,3,2,2,3,3,3,3,3,3,3,3,4,4,4,3,4,4,5,3,4,5,4,3,5,3,6,4,1,5,3,3,4,5,3,3,0,1,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, FD12P_S/N , OK , FD, 102, 26578, 16081, C, 0, 0, 0, 0.00, 41.18, 0.0, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 15.20, 4, BEL_RAIN, OK , 0.00, -0.20, -1.40, -1.80, -1.40, -1.40, -1.40, 311.0, , , 0,0.0000000,0.0000000, Snow11_Icing , OK , 33, 0909, 190, 17.03, 11111, FD, WS425_SN_, OK , 208, 210, 5, 5, 11.4, 11.4, 11.7, 5.9, 6.0, 4.9, 196.3, 6.4, 172.0, 3.5, 194.0, 6.4, 172.0, 1.8, 178.0, 4.6, 6.0, 0, 0.0, 1.6, CR3000_SN_, OK , "2012-09-21 20:01:30", "SN_1838", 12.90, 16.47, 17.06, 50.22,1001.1, 0.00, 0.00,2419.6, 432.1, -4.6, 3.1, 188.5, S78D_SN_, OK ,64403.0,64841.0, 10.0, 211.5, 5.4, 206.5, 4.3, 5.4, 209.8, 3.2, 190.3, 5.4, 214.3, 1.6, 191.1, 3.8, 23.2, 0, 0.0, 1.1, PARS_SN , OK , 0.00, 147.7, C, -10.0, 9999, 19.0, 10057, 0, 0.0, 0.0, 0.0, 0.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, TB_SN_ , OK , 0.0, 0.0, 0.0, 0.0, GEONOR_SN_, OK , 1.1, -0.1, 0.0, -0.5, MP3000_SN_3031, OK , 13611, 09/21/12 23:52:03, 201, 290.8410, 54.5700, 992.6000, 224.9700, 0.0000, 13616, 09/21/12 23:52:52, 301, 2.006, 0.000, -1.000, 13629, 09/21/12 23:53:40, 301, 2.153, 0.000, -1.000, 13612, 09/21/12 23:52:49, 401, Zenith33, 290.841, 290.619, 290.324, 289.965, 289.594, 289.315, 289.019, 288.657, 288.252, 287.874, 287.473, 286.550, 285.484, 284.489, 283.557, 282.448, 281.549, 280.729, 279.510, 278.444, 277.386, 276.560, 275.714, 275.025, 274.243, 273.660, 272.269, 270.575, 269.615, 268.371, 267.229, 265.628, 264.201, 262.967, 261.651, 259.787, 258.405, 256.936, 255.301, 253.573, 251.715, 249.948, 248.309, 246.431, 244.513, 242.739, 240.876, 238.979, 237.113, 235.236, 233.310, 231.360, 229.586, 227.806, 226.248, 224.444, 222.997, 221.686, 13613, 09/21/12 23:52:50, 402, Zenith33, 8.112, 7.620, 7.356, 7.167, 7.001, 6.926, 6.905, 6.821, 6.749, 6.753, 6.634, 6.261, 5.949, 5.797, 5.612, 5.539, 5.345, 5.309, 5.308, 5.128, 4.854, 4.559, 4.412, 4.302, 4.119, 3.899, 3.280, 2.939, 2.634, 2.426, 2.251, 2.077, 1.768, 1.456, 1.191, 1.016, 0.836, 0.688, 0.572, 0.498, 0.420, 0.342, 0.315, 0.268, 0.247, 0.216, 0.169, 0.151, 0.117, 0.086, 0.079, 0.057, 0.043, 0.034, 0.027, 0.029, 0.025, 0.023, 13614, 09/21/12 23:52:50, 403, Zenith33, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.001, 0.001, 0.002, 0.003, 0.004, 0.004, 0.005, 0.006, 0.006, 0.006, 0.005, 0.005, 0.006, 0.006, 0.006, 0.005, 0.004, 0.005, 0.007, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001, 0.000, 0.001, 0.003, 0.001, 0.001, 0.002, 0.004, 0.001, 0.001, 0.004, 0.002, 0.002, 0.001, 0.000, 0.002, 0.000, 0.000, 0.002, 0.001, 13615, 09/21/12 23:52:51, 404, Zenith33, 56.270, 53.729, 53.026, 
52.810, 52.940, 53.694, 54.346, 54.964, 55.335, 56.838, 57.384, 58.904, 60.251, 62.294, 63.354, 65.906, 66.386, 69.111, 73.093, 75.221, 75.959, 76.108, 77.720, 79.306, 80.117, 79.051, 73.398, 70.405, 73.031, 72.626, 69.891, 68.496, 69.016, 64.496, 60.284, 56.164, 50.403, 46.814, 43.598, 40.230, 38.220, 36.595, 37.270, 35.931, 35.872, 34.573, 34.382, 31.848, 30.184, 28.592, 27.367, 26.572, 25.385, 24.167, 22.947, 22.196, 20.102, 17.273, 13617, 09/21/12 23:53:37, 401, Angle Scan32(N), 290.841, 290.341, 290.135, 289.888, 289.601, 289.359, 289.063, 288.816, 288.596, 288.292, 287.907, 286.989, 286.151, 285.159, 283.948, 282.604, 281.604, 280.663, 279.693, 278.789, 278.079, 277.346, 276.519, 275.626, 274.793, 274.033, 272.555, 271.680, 270.758, 269.641, 268.433, 266.921, 265.443, 264.229, 263.036, 261.562, 260.179, 258.611, 257.035, 255.240, 253.437, 251.850, 250.007, 248.111, 246.297, 244.386, 242.365, 240.776, 239.021, 237.268, 235.613, 233.999, 232.464, 230.806, 229.090, 227.562, 226.043, 224.730, 13620, 09/21/12 23:53:38, 402, Angle Scan32(N), 8.160, 7.857, 7.577, 7.363, 7.180, 7.123, 7.047, 7.133, 7.134, 7.075, 7.083, 6.916, 6.715, 6.682, 6.530, 6.323, 5.999, 5.897, 5.845, 5.772, 5.604, 5.312, 5.007, 4.764, 4.590, 4.224, 3.621, 3.104, 2.680, 2.432, 2.241, 2.012, 1.820, 1.490, 1.207, 1.003, 0.830, 0.694, 0.557, 0.489, 0.417, 0.340, 0.290, 0.240, 0.203, 0.165, 0.148, 0.128, 0.105, 0.104, 0.077, 0.066, 0.051, 0.031, 0.026, 0.032, 0.027, 0.023, 13623, 09/21/12 23:53:39, 403, Angle Scan32(N), 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.001, 0.001, 0.002, 0.003, 0.003, 0.003, 0.004, 0.005, 0.007, 0.006, 0.005, 0.005, 0.008, 0.007, 0.004, 0.003, 0.002, 0.004, 0.003, 0.001, 0.001, 0.001, 0.000, 0.001, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.001, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.001, 0.000, 0.001, 0.002, 0.003, 0.000, 13626, 09/21/12 23:53:40, 404, Angle Scan32(N), 57.019, 55.193, 53.564, 53.315, 53.339, 53.281, 53.637, 55.162, 56.061, 56.732, 57.804, 59.747, 61.663, 65.110, 66.650, 67.988, 67.915, 70.569, 73.641, 75.918, 76.393, 76.266, 74.408, 74.010, 74.288, 73.384, 70.577, 66.022, 65.699, 65.979, 65.527, 66.482, 68.845, 60.866, 56.617, 53.351, 47.036, 41.541, 38.110, 35.401, 34.460, 33.999, 33.480, 31.319, 31.345, 31.507, 30.179, 28.661, 26.417, 26.580, 26.578, 27.422, 26.442, 25.451, 24.835, 25.084, 22.074, 20.811, 13618, 09/21/12 23:53:37, 401, Angle Scan32(S), 290.841, 290.626, 290.240, 289.821, 289.469, 289.065, 288.771, 288.341, 288.121, 288.010, 287.614, 286.667, 285.864, 284.613, 283.457, 282.245, 281.322, 280.393, 279.502, 278.463, 277.581, 276.853, 275.805, 274.735, 273.891, 273.092, 271.414, 270.019, 269.101, 267.951, 266.361, 264.487, 262.818, 261.826, 260.810, 259.275, 257.892, 256.162, 254.619, 252.395, 250.385, 248.765, 246.572, 244.208, 241.912, 239.723, 237.677, 235.897, 233.725, 231.819, 229.917, 228.090, 226.747, 225.073, 223.535, 222.356, 221.359, 220.467, 13621, 09/21/12 23:53:38, 402, Angle Scan32(S), 7.814, 6.994, 6.357, 5.945, 5.634, 5.442, 5.438, 5.371, 5.350, 5.230, 5.403, 4.970, 4.875, 4.949, 5.079, 5.089, 4.678, 4.723, 4.734, 4.826, 4.625, 4.731, 5.251, 5.544, 5.911, 6.282, 7.083, 7.490, 7.369, 6.637, 5.359, 3.880, 2.902, 1.757, 1.279, 1.035, 0.725, 0.560, 0.455, 0.361, 0.275, 0.255, 0.190, 0.162, 0.130, 0.122, 0.103, 0.103, 0.080, 0.080, 0.062, 0.061, 0.056, 0.033, 0.026, 0.032, 0.025, 0.020, 13624, 09/21/12 23:53:39, 403, Angle Scan32(S), 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 
0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.003, 0.002, 0.003, 0.003, 0.004, 0.006, 0.006, 0.005, 0.005, 0.007, 0.006, 0.003, 0.003, 0.002, 0.004, 0.003, 0.001, 0.001, 0.001, 0.000, 0.001, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.001, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.001, 0.000, 0.001, 0.002, 0.003, 0.000, 13627, 09/21/12 23:53:40, 404, Angle Scan32(S), 56.426, 52.621, 49.854, 47.596, 47.538, 48.077, 50.507, 49.331, 49.623, 49.122, 51.626, 50.226, 51.898, 55.446, 57.410, 61.732, 58.361, 60.002, 61.793, 65.138, 68.036, 74.196, 82.373, 85.787, 100.000, 100.000, 100.000, 100.000, 100.000, 100.000, 100.000, 100.000, 100.000, 85.190, 66.155, 53.728, 37.351, 23.545, 17.741, 14.367, 12.172, 13.330, 10.572, 10.516, 9.346, 8.075, 6.828, 5.445, 5.651, 5.252, 5.502, 6.274, 6.017, 6.092, 6.767, 6.991, 5.608, 4.837, 13619, 09/21/12 23:53:37, 401, Angle Scan32(A), 290.841, 290.480, 290.186, 289.854, 289.535, 289.214, 288.917, 288.581, 288.361, 288.152, 287.764, 286.828, 286.006, 284.883, 283.694, 282.416, 281.453, 280.518, 279.586, 278.614, 277.817, 277.085, 276.148, 275.167, 274.327, 273.550, 271.972, 270.837, 269.920, 268.786, 267.391, 265.698, 264.125, 263.022, 261.916, 260.413, 259.030, 257.379, 255.820, 253.810, 251.899, 250.295, 248.273, 246.140, 244.078, 242.025, 239.986, 238.294, 236.323, 234.485, 232.698, 230.971, 229.531, 227.863, 226.234, 224.886, 223.635, 222.540, 13622, 09/21/12 23:53:38, 402, Angle Scan32(A), 7.987, 7.424, 6.955, 6.634, 6.380, 6.247, 6.209, 6.208, 6.193, 6.097, 6.195, 5.877, 5.729, 5.754, 5.761, 5.682, 5.312, 5.292, 5.284, 5.308, 5.132, 5.058, 5.182, 5.206, 5.294, 5.250, 5.203, 4.958, 4.590, 4.145, 3.543, 2.837, 2.317, 1.620, 1.235, 1.011, 0.768, 0.615, 0.497, 0.416, 0.336, 0.293, 0.233, 0.196, 0.162, 0.141, 0.123, 0.115, 0.092, 0.091, 0.069, 0.064, 0.054, 0.033, 0.026, 0.032, 0.026, 0.021, 13625, 09/21/12 23:53:39, 403, Angle Scan32(A), 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.000, 0.000, 0.002, 0.003, 0.003, 0.003, 0.004, 0.005, 0.007, 0.006, 0.005, 0.005, 0.008, 0.006, 0.003, 0.003, 0.002, 0.004, 0.003, 0.001, 0.001, 0.001, 0.000, 0.001, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.001, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.001, 0.000, 0.001, 0.002, 0.003, 0.000, 13628, 09/21/12 23:53:40, 404, Angle Scan32(A), 56.682, 53.972, 51.892, 50.578, 50.563, 50.727, 52.154, 52.163, 52.709, 52.844, 54.714, 54.856, 56.689, 60.229, 62.028, 64.870, 63.262, 65.752, 68.310, 71.277, 73.323, 76.555, 79.852, 81.290, 88.961, 92.860, 100.000, 100.000, 100.000, 100.000, 100.000, 99.377, 92.961, 72.472, 62.030, 54.508, 43.265, 32.818, 27.337, 23.694, 21.524, 22.368, 19.811, 19.031, 18.017, 16.873, 15.321, 13.403, 13.020, 12.580, 12.825, 13.851, 13.301, 13.059, 13.511, 13.757, 11.592, 10.430, 290.841, 54.570, 992.600, 224.970, 0.000, BACIC_, No ,belraw3110.0,belsum -1.8,fdsum 0.0,fdsnow 0.0,yksum -3.5,ykcsum 0.0,S78D- OK ,PARtotal 147.7,ParCnt 0.0,Counter 594.0,Max_ws 6.4,Dir 172.0,Delta_ws 4.6,gust 0.0
2012Sep212358 23:57:03.06250, AAA_YMDHMS, 2012, 9, 21, 23, 57, 4, POSS_71, OK , 15.0, 73.2, 0.0, 0.0, C, 0.0, 0.3, 0.6,PS711 78:218:17:42 390.3 297.2 64 .1 2.1 60 15 49 73.2 0.0 2080,PS712,PS713,PS714,PS715,PS716 F# 49 2.03 2.14 1.03 2.01 1.07 19.2 0.2 12.4 64 14.7 C 376 0.00 0.00 0 0 3 1 64 0.1 390.3 297.2 149.0 176.3 83.8 32.0 944.0 2.3 0.0 -99.0 0.0 10.7 10.7 10000000.0 376 T 1 I 0 ,dia 2 5, 26 20 3,dia 3 4, 26 20 3,dia 4 6 22 18, 1 0 0 1 0 1,dia 5 4 15 20 3 0, 1,dia 6 4 15 8 4,dia 7 8 14 7,dia 8 4 7 6 0 0 1,dia 9 4 9 4,dia 10 3 7,dia 11 3 1 1,dia 12 1 3,dia 14 2 2,dia 15 1 3,dia 16 1 2 1,dia 17 4 2,dia 21 1,dia 22 0 1,dia 23 2 1,dia 25 0 1,dia 26 1,dia 27 1,dia 29 1,dia 30 1 1,dia 31 1,dia 32 2,dia 33 1 1,dia 34 0 2,dia 36 0 1,dia 37 1 2,dia 38 1 1,dia 39 0 2,dia 40 0 0 1,dia 41 3 3 1,dia 42 1,dia 43 2 0 1,dia 45 1 1,dia 47 0 1,dia 48 1 1,dia 55 1,dia 59 1,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia ,dia , YANKEE_S/N_0706011, OK , 55.7, 56.8, -1.2, 17.0, 3.2, 1.000, -0.9, 0.0, 0.0, -0.1, -1.5, -3.5, 0.0, 0.0, 0.0, 0.0, CT25K_S/N_A42101, OK ,CT0,20,2,3,00, /////, /////, /////,00000300,100,N,100,25,82,206,-4,7,LF7HN1,6,-2,6,5,5,4,4,4,3,4,3,2,2,3,3,2,2,2,3,2,2,2,2,3,2,2,3,3,3,3,3,3,3,3,4,4,4,3,4,5,6,2,5,4,5,3,3,4,5,4,2,4,3,2,4,5,3,4,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,-1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, FD12P_S/N , OK , FD, 102, 20517, 16565, C, 0, 0, 0, 0.00, 41.18, 0.0, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 15.20, 4, BEL_RAIN, OK , 0.00, -0.20, -1.40, -1.80, -1.40, -1.40, -1.40, 311.0, , , 0,0.0000000,0.0000000, Snow11_Icing , OK , 33, 0909, 182, 17.07, 11111, F8, WS425_SN_, OK , 202, 197, 5, 5, 8.5, 8.5, 8.9, 4.4, 4.6, 5.2, 196.6, 6.4, 172.0, 3.6, 198.0, 6.4, 172.0, 2.0, 178.0, 4.4, 6.0, 0, 0.0, 1.2, CR3000_SN_, OK , "2012-09-21 20:02:30", "SN_1838", 12.91, 16.45, 17.06, 49.96,1001.1, 0.00, 0.00,2419.4, 432.0, -4.4, 3.2, 206.6, S78D_SN_, OK ,64628.0,64977.0, 8.0, 211.6, 4.3, 207.1, 4.6, 5.4, 209.8, 3.2, 190.3, 5.4, 214.3, 1.6, 191.1, 3.8, 23.2, 0, 0.0, 0.8, PARS_SN , OK , 0.00, 147.7, C, -10.0, 9999, 19.0, 10059, 0, 0.0, 0.0, 0.0, 0.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, TB_SN_ , OK , 0.0, 0.0, 0.0, 0.0, GEONOR_SN_, OK , -0.2, -0.2, 0.0, -0.5, MP3000_SN_3031, OK , 13611, 09/21/12 23:52:03, 201, 290.8410, 54.5700, 992.6000, 224.9700, 0.0000, 13616, 09/21/12 23:52:52, 301, 2.006, 0.000, -1.000, 13629, 09/21/12 23:53:40, 301, 2.153, 0.000, -1.000, 13612, 09/21/12 23:52:49, 401, Zenith33, 290.841, 290.619, 290.324, 289.965, 289.594, 289.315, 289.019, 288.657, 288.252, 287.874, 287.473, 286.550, 285.484, 284.489, 283.557, 282.448, 281.549, 280.729, 279.510, 278.444, 277.386, 276.560, 275.714, 275.025, 274.243, 273.660, 272.269, 270.575, 269.615, 268.371, 267.229, 265.628, 264.201, 262.967, 261.651, 259.787, 258.405, 256.936, 255.301, 253.573, 251.715, 249.948, 248.309, 246.431, 244.513, 242.739, 240.876, 238.979, 237.113, 235.236, 233.310, 231.360, 229.586, 227.806, 226.248, 224.444, 222.997, 221.686, 13613, 09/21/12 23:52:50, 402, Zenith33, 8.112, 7.620, 7.356, 7.167, 7.001, 6.926, 6.905, 6.821, 6.749, 6.753, 6.634, 6.261, 5.949, 5.797, 5.612, 5.539, 5.345, 5.309, 5.308, 5.128, 4.854, 4.559, 4.412, 4.302, 4.119, 3.899, 3.280, 2.939, 2.634, 2.426, 2.251, 2.077, 1.768, 1.456, 1.191, 1.016, 0.836, 0.688, 0.572, 0.498, 0.420, 0.342, 0.315, 0.268, 0.247, 0.216, 0.169, 0.151, 0.117, 0.086, 0.079, 0.057, 0.043, 0.034, 0.027, 0.029, 0.025, 0.023, 13614, 09/21/12 23:52:50, 403, Zenith33, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.001, 0.001, 0.002, 0.003, 0.004, 0.004, 0.005, 0.006, 0.006, 0.006, 0.005, 0.005, 0.006, 0.006, 0.006, 0.005, 0.004, 0.005, 0.007, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001, 0.000, 0.001, 0.003, 0.001, 0.001, 0.002, 0.004, 0.001, 0.001, 0.004, 0.002, 0.002, 0.001, 0.000, 0.002, 0.000, 0.000, 0.002, 0.001, 13615, 09/21/12 23:52:51, 404, Zenith33, 56.270, 53.729, 53.026, 52.810, 52.940, 53.694, 54.346, 54.964, 55.335, 56.838, 57.384, 58.904, 60.251, 62.294, 63.354, 65.906, 66.386, 69.111, 73.093, 75.221, 75.959, 76.108, 77.720, 79.306, 80.117, 79.051, 73.398, 70.405, 73.031, 72.626, 69.891, 68.496, 69.016, 64.496, 60.284, 56.164, 50.403, 46.814, 43.598, 40.230, 38.220, 36.595, 37.270, 35.931, 35.872, 34.573, 34.382, 31.848, 30.184, 28.592, 27.367, 26.572, 25.385, 24.167, 22.947, 22.196, 20.102, 17.273, 13617, 09/21/12 23:53:37, 401, Angle Scan32(N), 290.841, 290.341, 290.135, 289.888, 289.601, 289.359, 289.063, 288.816, 288.596, 288.292, 287.907, 286.989, 286.151, 285.159, 283.948, 282.604, 281.604, 280.663, 279.693, 278.789, 278.079, 277.346, 276.519, 275.626, 274.793, 274.033, 272.555, 271.680, 270.758, 269.641, 268.433, 266.921, 265.443, 264.229, 263.036, 261.562, 260.179, 258.611, 257.035, 255.240, 253.437, 251.850, 250.007, 248.111, 246.297, 244.386, 242.365, 240.776, 239.021, 237.268, 235.613, 233.999, 232.464, 230.806, 229.090, 227.562, 226.043, 224.730, 13620, 09/21/12 23:53:38, 402, Angle Scan32(N), 8.160, 7.857, 7.577, 7.363, 7.180, 7.123, 7.047, 7.133, 7.134, 7.075, 7.083, 6.916, 6.715, 6.682, 6.530, 6.323, 5.999, 5.897, 5.845, 5.772, 5.604, 5.312, 5.007, 4.764, 4.590, 4.224, 3.621, 3.104, 2.680, 2.432, 2.241, 2.012, 1.820, 1.490, 1.207, 1.003, 0.830, 0.694, 0.557, 0.489, 0.417, 0.340, 0.290, 0.240, 
0.203, 0.165, 0.148, 0.128, 0.105, 0.104, 0.077, 0.066, 0.051, 0.031, 0.026, 0.032, 0.027, 0.023, 13623, 09/21/12 23:53:39, 403, Angle Scan32(N), 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.001, 0.001, 0.002, 0.003, 0.003, 0.003, 0.004, 0.005, 0.007, 0.006, 0.005, 0.005, 0.008, 0.007, 0.004, 0.003, 0.002, 0.004, 0.003, 0.001, 0.001, 0.001, 0.000, 0.001, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.001, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.001, 0.000, 0.001, 0.002, 0.003, 0.000, 13626, 09/21/12 23:53:40, 404, Angle Scan32(N), 57.019, 55.193, 53.564, 53.315, 53.339, 53.281, 53.637, 55.162, 56.061, 56.732, 57.804, 59.747, 61.663, 65.110, 66.650, 67.988, 67.915, 70.569, 73.641, 75.918, 76.393, 76.266, 74.408, 74.010, 74.288, 73.384, 70.577, 66.022, 65.699, 65.979, 65.527, 66.482, 68.845, 60.866, 56.617, 53.351, 47.036, 41.541, 38.110, 35.401, 34.460, 33.999, 33.480, 31.319, 31.345, 31.507, 30.179, 28.661, 26.417, 26.580, 26.578, 27.422, 26.442, 25.451, 24.835, 25.084, 22.074, 20.811, 13618, 09/21/12 23:53:37, 401, Angle Scan32(S), 290.841, 290.626, 290.240, 289.821, 289.469, 289.065, 288.771, 288.341, 288.121, 288.010, 287.614, 286.667, 285.864, 284.613, 283.457, 282.245, 281.322, 280.393, 279.502, 278.463, 277.581, 276.853, 275.805, 274.735, 273.891, 273.092, 271.414, 270.019, 269.101, 267.951, 266.361, 264.487, 262.818, 261.826, 260.810, 259.275, 257.892, 256.162, 254.619, 252.395, 250.385, 248.765, 246.572, 244.208, 241.912, 239.723, 237.677, 235.897, 233.725, 231.819, 229.917, 228.090, 226.747, 225.073, 223.535, 222.356, 221.359, 220.467, 13621, 09/21/12 23:53:38, 402, Angle Scan32(S), 7.814, 6.994, 6.357, 5.945, 5.634, 5.442, 5.438, 5.371, 5.350, 5.230, 5.403, 4.970, 4.875, 4.949, 5.079, 5.089, 4.678, 4.723, 4.734, 4.826, 4.625, 4.731, 5.251, 5.544, 5.911, 6.282, 7.083, 7.490, 7.369, 6.637, 5.359, 3.880, 2.902, 1.757, 1.279, 1.035, 0.725, 0.560, 0.455, 0.361, 0.275, 0.255, 0.190, 0.162, 0.130, 0.122, 0.103, 0.103, 0.080, 0.080, 0.062, 0.061, 0.056, 0.033, 0.026, 0.032, 0.025, 0.020, 13624, 09/21/12 23:53:39, 403, Angle Scan32(S), 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.003, 0.002, 0.003, 0.003, 0.004, 0.006, 0.006, 0.005, 0.005, 0.007, 0.006, 0.003, 0.003, 0.002, 0.004, 0.003, 0.001, 0.001, 0.001, 0.000, 0.001, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.001, 0.000, 0.000, 0.000, 0.000, 0.000, 0.001, 0.001, 0.000, 0.001, 0.002, 0.003, 0.000, 13627, 09/21/12 23:53:40, 404, Angle Scan32(S), 56.426, 52.621, 49.854, 47.596, 47.538, 48.077, 50.507, 49.331, 49.623, 49.122, 51.626, 50.226, 51.898, 55.446, 57.410, 61.732, 58.361, 60.002, 61.793, 65.138, 68.036, 74.196, 82.373, 85.787, 100.000, 100.000, 100.000, 100.000, 100.000, 100.000, 100.000, 100.000, 100.000, 85.190, 66.155, 53.728, 37.351, 23.545, 17.741, 14.367, 12.172, 13.330, 10.572, 10.516, 9.346, 8.075, 6.828, 5.445, 5.651, 5.252, 5.502, 6.274, 6.017, 6.092, 6.767, 6.991, 5.608, 4.837, 13619, 09/21/12 23:53:37, 401, Angle Scan32(A), 290.841, 290.480, 290.186, 289.854, 289.535, 289.214, 288.917, 288.581, 288.361, 288.152, 287.764, 286.828, 286.006, 284.883, 283.694, 282.416, 281.453, 280.518,
THIS is the output (from a different file, June instead of Sept); I've included only one row as it is too long.
2 0 1 2 J u n 2 0 2 3 5 8 , F D 1 2 P _ S / N , O K , F D , 1 0 2 , 2 4 1 1 6 , 3 2 6 3 0 , C , 0 , 0 , 0 , 0 . 0 0 , 7 2 . 1 9 , 1 9 . 0 , 0 . 0 0 0 , 0 . 0 0 0 , 0 . 0 0 0 , 0 . 0 0 0 , 0 . 0 0 0 , 0 . 0 0 0 , 3 1 . 2 0 , 5 0 1 4 , B E L _ R A I N , O K , - 0 . 7 0 , - 0 . 1 9 , - 0 . 1 9 , - 0 . 0 8 , - 0 . 1 9 , - 0 . 1 9 , - 0 . 1 9 , 1 2 6 . 2 , , , 0 , 0 . 0 0 0 0 0 0 0 , 0 . 0 0 0 0 0 0 0 , S n o w 1 1 _ I c i n g , O K , 3 3 , 0 8 3 0 , 2 6 9 , 3 0 . 5 6 , 1 1 1 1 1 , F A , W S 4 2 5 _ S N _ , O K , 2 3 4 , 2 3 4 , 5 , 5 , 1 5 . 2 , 1 5 . 2 , 1 5 . 4 , 7 . 8 , 7 . 9 , 7 . 7 , 2 3 8 . 0 , 9 . 6 , 2 4 2 . 0 , 6 . 1 , 2 3 7 . 0 , 1 0 . 5 , 2 3 4 . 0 , 4 . 9 , 2 1 3 . 0 , 5 . 6 , 2 1 . 0 , 1 , 1 0 . 5 , 2 . 8 , C R 3 0 0 0 _ S N _ , O K , " 2 0 1 2 - 0 6 - 2 0 2 0 : 0 1 : 3 0 " , " S N _ 1 8 3 8 " , 1 2 . 7 7 , 3 2 . 7 7 , 3 0 . 7 2 , 4 3 . 4 6 , 1 0 0 5 . 2 , 0 . 0 0 , 0 . 0 0 , 1 6 0 2 . 5 , 1 1 7 . 6 , 9 5 . 1 , 5 . 3 , 2 4 9 . 1 , S 7 8 D _ S N _ , O K , 6 4 6 4 6 . 0 , 6 4 0 6 3 . 0 , 1 3 . 0 , 2 3 8 . 9 , 7 . 0 , 2 4 4 . 8 , 7 . 5 , 9 . 1 , 2 4 9 . 5 , 5 . 9 , 2 4 4 . 9 , 9 . 7 , 2 2 2 . 9 , 4 . 3 , 2 3 6 . 2 , 5 . 4 , 1 3 . 4 , 0 , 0 . 0 , 2 . 2 ,
How can I do away with the spaces?
I think your problem lies with an incorrect sed string. As per the following transcript, this command works fine:
pax> cat file.dat
2012Sep212357 23:56:03.06250, AAA_YMDHMS, FD12P_S/N , blah blah PARS_SN , BLAH
pax> sed -ne 's#^\(2012Sep[^ ]*\).*\(FD12P.*\)PARS.*#\1,\2#p' file.dat
...>     | sed -e 's#  *# #g'
2012Sep212357,FD12P_S/N , blah blah
However, if you then pass that through:
sed 's# *# #g'
#      ^
#      +- Note ONE space
you get the spacing you seem to be experiencing:
pax> sed -n 's#^\(2012Sep[^ ]*\).*\(FD12P.*\)PARS.*#\1,\2#p' file.dat
...> | sed 's# *# #g'
2 0 1 2 S e p 2 1 2 3 5 7 , F D 1 2 P _ S / N , b l a h b l a h
You'll notice that the sed string above has only one space before the asterisk, which means "replace any occurrence of zero or more spaces with a single space". Zero or more spaces matches the empty string between every pair of characters, which is why the spaces are showing up.
If you use the more correct two-space version:
sed 's#  *# #g'
#      ^^
#      ++- Note TWO spaces
it will correctly collapse multiple spaces into one without putting a space between every character.
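Putting it together with the command from the question, the full pipeline would look like this (a sketch assuming GNU sed; tr -s is an equivalent way to squeeze runs of spaces):

# extract the two fields, then collapse each run of spaces to a single space
sed -ne 's#^\(2012Sep[^ ]*\).*\(FD12P.*\)PARS.*#\1,\2#p' file.dat | sed -e 's#  *# #g'

# same effect, squeezing repeated spaces with tr instead of a second sed
sed -ne 's#^\(2012Sep[^ ]*\).*\(FD12P.*\)PARS.*#\1,\2#p' file.dat | tr -s ' '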
Related
Why do I get "unsupervised_wiener() got an unexpected keyword argument 'max_num_iter'" when using skimage.restoration.unsupervised_wiener?
I am playing around with the scikit-image restoration package and successfully ran the unsupervised_wiener algorithm on some made-up data. In this simple example it does what I expect, but on my more complicated dataset it returns a striped pattern with extreme values of -1 and 1. I would like to fiddle with the parameters to better understand what is going on, but I get the error stated in the question. I tried scikit-image version 0.19.3 and downgraded to version 0.19.2, but the error remains. The same goes for the "other parameters": https://scikit-image.org/docs/0.19.x/api/skimage.restoration.html#skimage.restoration.unsupervised_wiener Can someone explain why I can't input parameters? The example below contains a "scan" and a "point-spread function". I convolve the scan with the point-spread function and then reverse the process using the unsupervised Wiener deconvolution.

import numpy as np
import matplotlib.pyplot as plt
from skimage import color, data, restoration
import pickle
rng = np.random.default_rng()
from scipy.signal import convolve2d as conv2

scan = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0.5, 1, 0.5, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0.5, 1, 0.5, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0.5, 1, 0.5, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
print(scan.shape)

psf = np.array([
    [1, 1, 1, 1, 1],  # 1
    [1, 0, 0, 0, 1],  # 2
    [1, 0, 0, 0, 1],  # 3
    [1, 0, 0, 0, 1],  # 4
    [1, 1, 1, 1, 1]   # 5
])
psf = psf/(np.sum(psf))
print(psf)

scan_conv = conv2(scan, psf, 'same')

deconvolved1, _ = restoration.unsupervised_wiener(scan_conv, psf, max_num_iter=10)

fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(8, 5), sharex=True, sharey=True)
ax[0].imshow(scan, vmin=scan.min(), vmax=1)
ax[0].axis('off')
ax[0].set_title('Data')
ax[1].imshow(scan_conv)
ax[1].axis('off')
ax[1].set_title('Data_distorted')
ax[2].imshow(deconvolved1)
ax[2].axis('off')
ax[2].set_title('restoration1')
fig.tight_layout()
plt.show()
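A generic way to see which keyword arguments the installed scikit-image build actually accepts is to inspect the function's signature (a debugging sketch, not part of the original question):

import inspect
import skimage
from skimage import restoration

print(skimage.__version__)                                 # e.g. 0.19.3
print(inspect.signature(restoration.unsupervised_wiener))  # lists the accepted parameters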
Improve time efficiency in double loops
I have working code that makes some calculations and creates a dataframe; however, it takes a considerable amount of time when the number of ids considered grows (in fact, time grows quadratically, since every pair of ids is compared). So, here is the situation: I have a dataframe consisting of vectors, one for each id:

id vector
0 A4070270297516241 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,...
1 A4060461064716279 [0, 2, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
2 A4050500015016271 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, ...
3 A4050494283416274 [15, 13, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
4 A4050500876316279 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
5 A4050494111016270 [6, 10, 1, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
6 A4050470673516272 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
7 A4060461035616276 [0, 0, 0, 11, 0, 15, 13, 0, 5, 3, 0, 0, 0, 0, ...
8 A4050500809916271 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
9 A4050500822216279 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, ...
10 A4050494817416277 [0, 0, 0, 0, 0, 4, 9, 0, 5, 8, 0, 15, 0, 0, 8,...
11 A4060462005116279 [15, 12, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
12 A4050500802316278 [0, 0, 0, 0, 0, 1, 2, 0, 2, 2, 0, 15, 12, 0, 8...
13 A4050500841416272 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 5, ...
14 A4050494856516271 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 3, ...
15 A4060462230216270 [0, 0, 2, 2, 15, 15, 10, 0, 0, 0, 0, 0, 0, 0, ...
16 A4090150867216273 [0, 0, 0, 0, 0, 0, 0, 13, 6, 3, 0, 2, 0, 15, 4...
17 A4060464010916275 [0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
18 A4139311891213540 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
19 A4050500938416279 [0, 10, 11, 6, 6, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0...
20 A4620451871516274 [0, 0, 0, 0, 15, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
21 A4060460331116279 [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 5, 15, 0, 2,...

I provide a dict at the end of the question to avoid clutter. Now, what I do is determine, for each id, which other id is the closest by calculating a weighted distance between each pair of vectors, and I create a dataframe storing the information:

ids = list(set(df1.id))
Closest_identifier = pd.DataFrame(columns = ['id','Identifier','Closest identifier','distance'])

and my code goes like this:

import time
t = time.process_time()

for idnr in ids:
    df_identifier = df1[df1['id'] == idnr]
    identifier = df_identifier['vector'].to_list()
    base_identifier = np.array(df_identifier['vector'].to_numpy().tolist())
    Number_of_devices = len(np.nonzero(identifier)[1])
    df_other_identifier = df1[df1['id'] != idnr]
    other_id = list(set(df_other_identifier.id))
    for id_other in other_id:
        gf_identifier = df_other_identifier[df_other_identifier['id'] == id_other]
        identifier_other = np.array(gf_identifier['vector'].to_numpy().tolist())
        dist = np.sqrt(np.sum((base_identifier - identifier_other)**2 / Number_of_devices))
        Closest_identifier = Closest_identifier.append({'id': id_other, 'Identifier': base_identifier, 'Closest identifier': identifier_other, 'distance': dist}, ignore_index=True)

elapsed_time = time.process_time() - t
print(elapsed_time)
6.0625

To explain what is happening: in the first part of the code, I choose an id and set up all the information I need. The number of devices is the number of non-zero values of the vector associated with that id (i.e., the number of devices that detected the object with that id). In the second part I compute the distance from that id to all the others. So, for each id, I have n-1 rows, where n is the size of my id set.
So, for 50 ids, I have 50*50-50 = 2450 rows The time given here is for 50 ids. For 200, the time for the loops to finish is 120 s, for 400 the time is 871 s. As you can see, time grows exponentially here. I have 1700 ids and it'll take days for this to complete. My questions is thus: Is there a more efficient way to do this? Grateful for insights. Test data {'id': {0: 'A4070270297516241', 1: 'A4060461064716279', 2: 'A4050500015016271', 3: 'A4050494283416274', 4: 'A4050500876316279', 5: 'A4050494111016270', 6: 'A4050470673516272', 7: 'A4060461035616276', 8: 'A4050500809916271', 9: 'A4050500822216279', 10: 'A4050494817416277', 11: 'A4060462005116279', 12: 'A4050500802316278', 13: 'A4050500841416272', 14: 'A4050494856516271', 15: 'A4060462230216270', 16: 'A4090150867216273', 17: 'A4060464010916275', 18: 'A4139311891213540', 19: 'A4050500938416279', 20: 'A4620451871516274', 21: 'A4060460331116279', 22: 'A4060454590916277', 23: 'A4060454778016276', 24: 'A4060462019716270', 25: 'A4050500945416277', 26: 'A4050494267716279', 27: 'A4090281644816244', 28: 'A4050500929516270', 29: 'N4010442537213363', 30: 'A4050500938216277', 31: 'A4060454598916275', 32: 'A4050494086216273', 33: 'A4060462859616271', 34: 'A4060454600116271', 35: 'A4050494551816276', 36: 'A4610490015816279', 37: 'A4060454605416279', 38: 'A4060454665916270', 39: 'A4060454579316278', 40: 'A4060464023516275', 41: 'A4050500588616272', 42: 'A4050500905516274', 43: 'A4070262442416243', 44: 'A4050500946716271', 45: 'A4070271195016244', 46: 'A4060454663216271', 47: 'A4060454590416272', 48: 'A4060461993616279', 49: 'N4010442139713366'}, 'vector': {0: [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 1: [0, 2, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 9, 14], 2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, 0, 5, 12, 15, 2, 0, 0, 0, 0, 0, 0], 3: [15, 13, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 5], 4: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 15, 0, 0, 0, 0, 0, 0], 5: [6, 10, 1, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 10, 13, 15], 6: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 15, 7, 2, 0, 0, 0], 7: [0, 0, 0, 11, 0, 15, 13, 0, 5, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 8: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 10, 2, 0, 0], 9: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 15, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0], 10: [0, 0, 0, 0, 0, 4, 9, 0, 5, 8, 0, 15, 0, 0, 8, 2, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0], 11: [15, 12, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 6, 2], 12: [0, 0, 0, 0, 0, 1, 2, 0, 2, 2, 0, 15, 12, 0, 8, 1, 9, 2, 0, 0, 0, 0, 0, 0, 0, 0], 13: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 0, 5, 3, 11, 15, 11, 1, 0, 0, 0, 0, 0, 0], 14: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 3, 0, 7, 12, 14, 1, 0, 0, 0, 0, 0, 0], 15: [0, 0, 2, 2, 15, 15, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 2, 15], 16: [0, 0, 0, 0, 0, 0, 0, 13, 6, 3, 0, 2, 0, 15, 4, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 17: [0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 15, 8, 2], 18: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 12, 9, 2, 0, 0], 19: [0, 10, 11, 6, 6, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4], 20: [0, 0, 0, 0, 15, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 4, 14, 13, 11], 21: [0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 5, 15, 0, 2, 3, 10, 9, 0, 0, 0, 0, 0, 0, 0, 0], 22: [0, 0, 0, 2, 7, 15, 15, 0, 2, 3, 0, 15, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 4], 23: [2, 15, 15, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 11], 24: [0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 15, 14, 2], 25: [0, 9, 13, 15, 6, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4], 26: [0, 0, 0, 1, 2, 8, 15, 0, 1, 4, 0, 15, 1, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0], 27: [0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 0, 0, 5, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 28: [0, 7, 9, 6, 6, 4, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5], 29: [8, 6, 2, 2, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 11], 30: [0, 10, 11, 15, 6, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4], 31: [6, 15, 6, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 8], 32: [0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 11, 8, 9, 2], 33: [0, 0, 0, 0, 0, 11, 15, 0, 2, 4, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 34: [4, 15, 15, 5, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 8], 35: [2, 1, 1, 0, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 15, 14, 12], 36: [0, 0, 0, 0, 0, 0, 5, 15, 4, 2, 0, 0, 0, 1, 2, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0], 37: [0, 0, 0, 0, 0, 11, 15, 0, 15, 15, 0, 14, 0, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4], 38: [0, 0, 0, 0, 0, 3, 14, 0, 10, 15, 0, 14, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 39: [0, 0, 2, 0, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 15, 15, 5], 40: [0, 0, 0, 3, 0, 4, 10, 5, 15, 14, 0, 2, 2, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 41: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 15, 10, 0, 0, 0, 0, 0, 0], 42: [0, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 2, 0, 10, 15, 14, 6, 2, 0, 0, 0, 0, 0, 0, 0, 0], 43: [0, 0, 0, 0, 0, 0, 3, 2, 7, 8, 0, 2, 0, 15, 8, 2, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0], 44: [0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 11, 15, 0, 3, 0, 13, 12, 0, 0, 0, 0, 0, 0, 0, 0], 45: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 11, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0], 46: [0, 0, 0, 0, 0, 3, 11, 0, 15, 15, 0, 15, 2, 9, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 47: [0, 2, 3, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 15, 15, 6], 48: [0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 11, 7, 9, 3], 49: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 15, 9, 0, 0, 0]}}
Try:

# id1: [0, 0, 0, 1, 1, 1]
m = np.repeat(np.vstack(df1['vector']), df1.shape[0], axis=0)

# id2: [0, 1, 0, 1, 0, 1]
n = np.tile(np.vstack(df1['vector']), (df1.shape[0], 1))

# number of devices for each vector of m
d = np.count_nonzero(m, axis=1, keepdims=True)

# compute the distance
dist = np.sqrt(np.sum((m - n)**2/d, axis=-1))

# create the final dataframe
mi = pd.MultiIndex.from_product([df1['id']] * 2, names=['id1', 'id2'])
out = pd.DataFrame({'vector1': m.tolist(), 'vector2': n.tolist(), 'distance': dist}, index=mi).reset_index()

Output:

>>> out
id1 id2 vector1 vector2 distance
0 A4070270297516241 A4070270297516241 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... 0.000000
1 A4070270297516241 A4060461064716279 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... [0, 2, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 13.747727
2 A4070270297516241 A4050500015016271 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, ... 14.628739
3 A4070270297516241 A4050494283416274 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... [15, 13, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0... 15.033296
4 A4070270297516241 A4050500876316279 [0, 0, 0, 0, 0, 0, 0, 0, 7, 4, 0, 0, 0, 13, 0,... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 15.099669
... ... ... ... ... ...
2495 N4010442139713366 A4070271195016244 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 11, 0, 0,... 13.916417
2496 N4010442139713366 A4060454663216271 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [0, 0, 0, 0, 0, 3, 11, 0, 15, 15, 0, 15, 2, 9,... 21.330729
2497 N4010442139713366 A4060454590416272 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [0, 2, 3, 0, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,... 19.304576
2498 N4010442139713366 A4060461993616279 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 12.288206
2499 N4010442139713366 N4010442139713366 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 0.000000

[2500 rows x 5 columns]

Performance:

%timeit loop_op()
3.75 s ± 88.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit vect_corralien()
4.16 ms ± 12.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
You can calculate the squared distances with broadcasting. Once you have those, you just have to find num_devices for every row and use it to calculate your custom distance. After filling the diagonal with infinite values you can take the argmin of every row, which gives you the closest device.

arr = np.array([value for value in data['vector'].values()])
# square the element-wise differences first, then sum over the vector axis
squared_distances = np.power(arr[:, None, :] - arr[None, :, :], 2).sum(axis=-1)
num_devices = (arr != 0).sum(axis=1)
distances = np.sqrt(squared_distances / num_devices)
np.fill_diagonal(distances, np.inf)
closest_identifiers = distances.argmin(axis=0)

You can format the output of the program as you desire.
Optimising FOR LOOP or alternative to it
I am using a for loop to calculate a simple probability on a dataset with approximately 500K rows of data.

For loop:

class_ = 4
class_freq = Counter(list_[-1] for list_ in train_list)
# Counter({5: 1476, 1: 1531, 4: 1562, 3: 1430, 2: 1498, 7: 1517, 6: 1486})

def cp(x, class_, freq_):  # x is a column index passed from another function
    pos = 0
    neg = 0
    for row in train_list:
        if row[x] == 1 and row[54] == class_:
            pos += 1
        else:
            neg += 1
    prob_0 = (neg + 0.1) / (class_freq[class_] + 0.2)
    prob_1 = (pos + 0.1) / (class_freq[class_] + 0.2)
    if prob_1 > prob_0:
        return prob_1
    else:
        return prob_0

train_list sample:

[3050, 180, 4, 277, -3, 5782, 221, 242, 156, 2721, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[2818, 119, 19, 30, 10, 5213, 248, 220, 92, 4497, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
[3182, 115, 10, 553, 10, 4605, 237, 231, 124, 1768, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5]
[3024, 312, 18, 474, 177, 5785, 169, 224, 194, 4961, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
[3067, 32, 4, 30, -2, 6679, 219, 230, 147, 2947, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4]
[2716, 1, 10, 234, 27, 2100, 206, 222, 153, 5581, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4]
...

The for loop works well on a small dataset (a few hundred rows), as expected. Unfortunately, when I try to use it on 20K rows of data, the processing takes ages. I cannot imagine how long it will take to run on 500K rows of data. A plain for loop performs very poorly on a large dataset. What is an alternative to this? Will a lambda improve processing speed? I appreciate advice and assistance here, thanks.

Edited: thanks to everyone's comments, I have tried to work out another algorithm to replace the for loop.

def cp(col, class_):
    filtered_list = [t for t in train_list if t[54] == class_]
    count_binary = Counter(binary[col] for binary in filtered_list)
    binary_1 = count_binary[1]
    binary_0 = count_binary[0]
    prob_0 = (binary_0 + 0.1) / (class_freq[class_] + 0.2)
    prob_1 = (binary_1 + 0.1) / (class_freq[class_] + 0.2)
    if prob_1 > prob_0:
        return prob_1
    else:
        return prob_0

I am still running the above code in my program and the process is not done yet, so I can't tell whether it is much more efficient. I would appreciate it if someone could give their opinion on this new block of code. FYI, if this is indeed a better and more efficient version, then the processing-speed issue is most likely in other parts of my code.
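A vectorized sketch of the same per-column calculation, assuming train_list is a list of equal-length numeric rows with the class label at index 54 (cp_vectorized and the train array are illustrative names, not from the question):

import numpy as np
from collections import Counter

train = np.asarray(train_list)   # shape: (n_rows, 55)
labels = train[:, 54]            # class label column
class_freq = Counter(labels.tolist())

def cp_vectorized(x, class_):
    # pos: rows where column x equals 1 AND the label equals class_
    pos = int(np.count_nonzero((train[:, x] == 1) & (labels == class_)))
    # neg: every other row, mirroring the loop's else branch
    neg = train.shape[0] - pos
    prob_0 = (neg + 0.1) / (class_freq[class_] + 0.2)
    prob_1 = (pos + 0.1) / (class_freq[class_] + 0.2)
    return max(prob_0, prob_1)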
Variational autoencoder - comparative analysis of code
Could you comment on these two versions of a variational autoencoder loss and show me why they give me different results?

Dataset:

data1 = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')
data2 = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')
data3 = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype='int32')

100 samples each, so I have 300 samples.

Code 1:

def vae_loss(x, x_decoded_mean):
    xent_loss = objectives.binary_crossentropy(x, x_decoded_mean)
    kl_loss = -0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var))
    loss = xent_loss + kl_loss
    return loss

vae.compile(optimizer='rmsprop', loss=vae_loss)

Code 2:

def zero_loss(y_true, y_pred):
    return K.zeros_like(y_pred)

class CustomVariationalLayer(Layer):
    def __init__(self, **kwargs):
        self.is_placeholder = True
        super(CustomVariationalLayer, self).__init__(**kwargs)

    def vae_loss(self, x, x_decoded_mean):
        xent_loss = original_dim * metrics.binary_crossentropy(x, x_decoded_mean)
        kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
        return K.mean(xent_loss + kl_loss)

    def call(self, inputs):
        x = inputs[0]
        x_decoded_mean = inputs[1]
        loss = self.vae_loss(x, x_decoded_mean)
        self.add_loss(loss, inputs=inputs)
        return K.ones_like(x)

loss_layer = CustomVariationalLayer()([x, x_decoded_mean])
vae = Model(x, [loss_layer])
vae.compile(optimizer='rmsprop', loss=[zero_loss])

The results are very different and I don't see why. The latent dimensions look different: Code 2 shows the separation between groups and Code 1 does not. With Code 1, vae.predict... is not accurate, and Code 2 gives me 1 on all features. Code 2 gives me accurate feedback from the code:

sent_encoded = encoder.predict(np.array(test), batch_size = batch_size)
sent_decoded = generator.predict(sent_encoded)

and Code 1 is not accurate at all. Both experiments have the same layers. So, once again, where is the difference, and what is the best solution for a dataset like the one described above?
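The practical difference between the two losses is one of scaling: objectives.binary_crossentropy in Code 1 returns the mean over the features, while Code 2 multiplies by original_dim to get a per-sample sum, and Code 1's K.mean collapses the KL term over both the batch and the latent axis instead of summing it per sample. The KL term therefore carries far more relative weight in Code 1, which can push the encoder toward the prior and wash out the group separation. A sketch (assuming original_dim is the input width, 54 here) of Code 1 re-weighted to match Code 2:

def vae_loss(x, x_decoded_mean):
    # per-sample SUM of binary cross-entropy over the features, as in Code 2
    xent_loss = original_dim * objectives.binary_crossentropy(x, x_decoded_mean)
    # per-sample KL divergence, summed over the latent axis
    kl_loss = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
    # Keras averages the per-sample loss over the batch itself
    return xent_loss + kl_loss

vae.compile(optimizer='rmsprop', loss=vae_loss)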
Keras CNN predict error
I have a simple Convolution1D model that I have trained successfully:

model = Sequential()
model.add(Embedding(input_dim=vocabsize, output_dim=32, input_length=STR_MAX_LEN, dropout=0.2))
model.add(Dropout(0.2))
model.add(Convolution1D(64, 5, activation='relu', border_mode='same'))
model.add(Dropout(0.2))
model.add(MaxPooling1D())
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dropout(0.7))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss="binary_crossentropy", optimizer=Adam(), metrics=['accuracy'])
model.summary()

Model summary as below:

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
embedding_1 (Embedding)          (None, 500, 32)       160000      embedding_input_1[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 500, 32)       0           embedding_1[0][0]
____________________________________________________________________________________________________
convolution1d_1 (Convolution1D)  (None, 500, 64)       10304       dropout_1[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout)              (None, 500, 64)       0           convolution1d_1[0][0]
____________________________________________________________________________________________________
maxpooling1d_1 (MaxPooling1D)    (None, 250, 64)       0           dropout_2[0][0]
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 16000)         0           maxpooling1d_1[0][0]
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 100)           1600100     flatten_1[0][0]
____________________________________________________________________________________________________
dropout_3 (Dropout)              (None, 100)           0           dense_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 1)             101         dropout_3[0][0]
====================================================================================================
Total params: 1770505
____________________________________________________________________________________________________

And I have a text that I need to run prediction on.
text = "dont know what could have saved limp dispiriting yam but it definitely wasnt a lukewarm mushroom as murky and appealing as bong water" textWordsArray = np.array(text.split()) textIdxArrayPadded = sequence.pad_sequences(textWordsIdxArray,maxlen=STR_MAX_LEN, value=0) textIdxArrayPadded structure of the text input array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5363, 121, 48, 97, 25, 1891, 8849, 51645, 19831, 18, 9, 404, 15422, 3, 15610, 27479, 14, 7217, 2, 2273, 14, 36597, 1090]], dtype=int32) However, I am getting the below error when i run the prediction. 
prediction = model.predict(textIdxArrayPadded, batch_size=1, verbose=1)

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-70-818365da75ca> in <module>()
----> 1 prediction = model.predict(textIdxArrayPadded, batch_size=1,verbose=1)

/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/models.pyc in predict(self, x, batch_size, verbose)
    669         if self.model is None:
    670             self.build()
--> 671         return self.model.predict(x, batch_size=batch_size, verbose=verbose)
    672
    673     def predict_on_batch(self, x):

/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/engine/training.pyc in predict(self, x, batch_size, verbose)
   1177         f = self.predict_function
   1178         return self._predict_loop(f, ins,
-> 1179                                   batch_size=batch_size, verbose=verbose)
   1180
   1181     def train_on_batch(self, x, y,

/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/engine/training.pyc in _predict_loop(self, f, ins, batch_size, verbose)
    876             ins_batch = slice_X(ins, batch_ids)
    877
--> 878             batch_outs = f(ins_batch)
    879             if type(batch_outs) != list:
    880                 batch_outs = [batch_outs]

/home/ubuntu/anaconda2/lib/python2.7/site-packages/keras/backend/theano_backend.pyc in __call__(self, inputs)
    715     def __call__(self, inputs):
    716         assert type(inputs) in {list, tuple}
--> 717         return self.function(*inputs)
    718
    719

/home/ubuntu/anaconda2/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    869                     node=self.fn.nodes[self.fn.position_of_error],
    870                     thunk=thunk,
--> 871                     storage_map=getattr(self.fn, 'storage_map', None))
    872             else:
    873                 # old-style linkers raise their own exceptions

/home/ubuntu/anaconda2/lib/python2.7/site-packages/theano/gof/link.pyc in raise_with_op(node, thunk, exc_info, storage_map)
    312         # extra long error message in that case.
    313         pass
--> 314     reraise(exc_type, exc_value, exc_trace)
    315
    316

/home/ubuntu/anaconda2/lib/python2.7/site-packages/theano/compile/function_module.pyc in __call__(self, *args, **kwargs)
    857         t0_fn = time.time()
    858         try:
--> 859             outputs = self.fn()
    860         except Exception:
    861             if hasattr(self.fn, 'position_of_error'):

IndexError: One of the index value is out of bound. Error code: 65535.\n
Apply node that caused the error: GpuAdvancedSubtensor1(GpuElemwise{Composite{Switch(i0, (i1 * i2 * i3), i2)},no_inplace}.0, Elemwise{Cast{int64}}.0)
Toposort index: 38
Inputs types: [CudaNdarrayType(float32, matrix), TensorType(int64, vector)]
Inputs shapes: [(5000, 32), (500,)]
Inputs strides: [(32, 1), (8,)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuReshape{3}(GpuAdvancedSubtensor1.0, MakeVector{dtype='int64'}.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
I had the embeddings limited to vocabsize; however, I forgot to limit the word ids to vocabsize as well. This was answered for me in a different forum; I'm posting the solution here from its author, #niazangels (Niyas Mohammed):

"Looks like you forgot to limit the vocabulary to 5000 when encoding your test input!"

Limit the vocabulary size to 5000:

textWordsIdxArray = [np.array([i if i < vocabsize - 1 else vocabsize - 1 for i in s]) for s in textWordsIdxArray]
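Equivalently, NumPy's clip does the same capping in one call (a sketch, not from the original answer):

# Cap every word id at vocabsize - 1 so lookups stay inside the (vocabsize, 32) embedding table.
textWordsIdxArray = [np.clip(np.asarray(s), 0, vocabsize - 1) for s in textWordsIdxArray]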