Flattening event data in BigQuery with one query

We have over 100M rows of analytics data in BigQuery. Each record is an event attached to an ID.
A simplification:
ID EventId Timestamp
Is it possible to flatten this to one table holding rows like:
ID timestamp-period event1 event2 event3 event4
Where the event columns hold the counts of the number of events for that id in that time period?
So far, I've managed to do it on small data sets with two queries: one to create rows that hold the counts for an individual event ID, and another to flatten these into one row afterwards. The reason I haven't yet been able to do this across the whole data set is that BigQuery runs out of resources - I'm not entirely sure why.
These two queries look something like this:
SELECT
VideoId,
date_1,
IF(EventId = 1, INTEGER(count), 0) AS user_play,
IF(EventId = 2, INTEGER(count), 0) AS auto_play,
IF(EventId = 3, INTEGER(count), 0) AS pause,
IF(EventId = 4, INTEGER(count), 0) AS replay,
IF(EventId = 5, INTEGER(count), 0) AS stop,
IF(EventId = 6, INTEGER(count), 0) AS seek,
IF(EventId = 7, INTEGER(count), 0) AS resume,
IF(EventId = 11, INTEGER(count), 0) AS progress_25,
IF(EventId = 12, INTEGER(count), 0) AS progress_50,
IF(EventId = 13, INTEGER(count), 0) AS progress_75,
IF(EventId = 14, INTEGER(count), 0) AS progress_90,
IF(EventId = 15, INTEGER(count), 0) AS data_loaded,
IF(EventId = 16, INTEGER(count), 0) AS playback_complete,
IF(EventId = 30, INTEGER(count), 0) AS object_click,
IF(EventId = 31, INTEGER(count), 0) AS object_rollover,
IF(EventId = 32, INTEGER(count), 0) AS object_clickthrough,
IF(EventId = 33, INTEGER(count), 0) AS object_shown,
IF(EventId = 34, INTEGER(count), 0) AS object_close,
IF(EventId = 40, INTEGER(count), 0) AS logo_clickthrough,
IF(EventId = 41, INTEGER(count), 0) AS endframe_clickthrough,
IF(EventId = 42, INTEGER(count), 0) AS startframe_clickthrough,
IF(EventId = 61, INTEGER(count), 0) AS share_facebook,
IF(EventId = 62, INTEGER(count), 0) AS share_twitter,
IF(EventId = 63, INTEGER(count), 0) AS open_social_panel,
IF(EventId = 70, INTEGER(count), 0) AS embed_code_requested,
IF(EventId = 80, INTEGER(count), 0) AS player_impression,
IF(EventId = 81, INTEGER(count), 0) AS player_loaded,
IF(EventId = 90, INTEGER(count), 0) AS html5_impression,
IF(EventId = 91, INTEGER(count), 0) AS html5_load,
IF(EventId = 95, INTEGER(count), 0) AS fallback_impression,
IF(EventId = 96, INTEGER(count), 0) AS fallback_load,
IF(EventId = 152, INTEGER(count), 0) AS object_impression,
IF(EventId = 200, INTEGER(count), 0) AS ping,
IF(EventId = 250, INTEGER(count), 0) AS facebook_clickthrough,
IF(EventId = 251, INTEGER(count), 0) AS twitter_clickthrough,
IF(EventId = 252, INTEGER(count), 0) AS other_clickthrough,
IF(EventId = 253, INTEGER(count), 0) AS qr_clickthrough,
IF(EventId = 254, INTEGER(count), 0) AS banner_clickthrough,
IF(EventId = 280, INTEGER(count), 0) AS banner_impression,
IF(EventId = 281, INTEGER(count), 0) AS banner_loaded,
IF(EventId = 282, INTEGER(count), 0) AS banner_data_loaded,
IF(EventId = 284, INTEGER(count), 0) AS banner_forward,
IF(EventId = 285, INTEGER(count), 0) AS banner_back,
IF(EventId = 300, INTEGER(count), 0) AS mobile_preview_loaded,
IF(EventId = 301, INTEGER(count), 0) AS mobile_preview_clickthrough,
IF(EventId = 302, INTEGER(count), 0) AS mobile_preview_clickthrough_back,
IF(EventId = 310, INTEGER(count), 0) AS product_search_click,
IF(EventId = 311, INTEGER(count), 0) AS promo_code_click,
IF(EventId = 320, INTEGER(count), 0) AS player_share_facebook,
IF(EventId = 321, INTEGER(count), 0) AS player_share_twitter,
IF(EventId = 322, INTEGER(count), 0) AS player_share_googleplus,
IF(EventId = 323, INTEGER(count), 0) AS player_share_email,
IF(EventId = 324, INTEGER(count), 0) AS player_share_embed,
IF(EventId = 401, INTEGER(count), 0) AS youtube_error_2,
IF(EventId = 402, INTEGER(count), 0) AS youtube_error_100,
IF(EventId = 403, INTEGER(count), 0) AS youtube_error_101
FROM
(
SELECT
VideoId, EventId, count(*) as count, Date(timestamp) as date_1
FROM [data.data_1]
GROUP EACH BY VideoId, EventId, date_1
)
ORDER BY data_loaded DESC;
Then just a GROUP BY on ID and timestamp flattens these into the full aggregated table.
Am I doing this the right way and do I just need to run it on a small partition of the dataset at a time, or is there a better way to aggregate like this that uses BigQuery more efficiently?
Thanks in advance,
Mat

My guess is that you're running out of resources because of the ORDER BY at the end. Everything else should be able to be done in parallel. Also note that if you remove the ORDER BY, you will be able to use the 'allow large results' flag and write out a large table of the results (if the results are > 128MB).
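Not part of the original answer, but as a rough illustration: a sketch of how the second aggregation step might be run with the 'allow large results' flag through the google-cloud-bigquery Python client. The staging table [data.stage_1] and the destination table name are placeholders, and the SUM list is abbreviated to a few of the event columns; the remaining columns follow the same pattern.
from google.cloud import bigquery

client = bigquery.Client()

# Second step: collapse the per-EventId rows produced by the first query into
# one row per VideoId and day. No ORDER BY, so the work can stay parallel.
flatten_sql = """
SELECT
  VideoId, date_1,
  SUM(user_play)   AS user_play,
  SUM(auto_play)   AS auto_play,
  SUM(pause)       AS pause,
  SUM(data_loaded) AS data_loaded
FROM [data.stage_1]
GROUP EACH BY VideoId, date_1
"""

job_config = bigquery.QueryJobConfig(
    use_legacy_sql=True,       # the bracketed table syntax above is legacy SQL
    allow_large_results=True,  # lets the result exceed the 128MB response limit
    destination=bigquery.TableReference.from_string("my-project.data.events_flat"),
)
client.query(flatten_sql, job_config=job_config).result()  # wait for the job to finish
Dropping the ORDER BY keeps the query parallelizable, and writing to a destination table with allow_large_results is what lets the output grow past the 128MB limit mentioned above.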

Related

How to slice elements between other elements with python

I have a list of numbers and I want to slice out all the elements between the occurrences of the number 192 in the list and collect each slice into its own list.
My list:
[192, 0, 1, 0, 1, 192, 12, 0, 5, 0, 1, 0, 1, 66, 218, 0, 10, 5, 115, 116, 97, 116, 115, 1, 108, 192, 20, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 155, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 156, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 154, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 157]
I want something like this:
[192, 0, 1, 0, 1 ]
[192, 12, 0, 5, 0, 1, 0, 1, 66, 218, 0, 10, 5, 115, 116, 97, 116, 115, 1, 108]
[192, 20, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 155]
until the end of the list.
Here's one possible way to do it:
# input list
lst = [192, 0, 1, 0, 1, 192, 12, 0, 5, 0, 1, 0, 1, 66, 218, 0, 10, 5, 115, 116, 97, 116, 115, 1, 108, 192, 20, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 155, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 156, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 154, 192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 157]
# list of indexes where 192 is found,
# plus one extra index for the final slice
indexes = [i for i, n in enumerate(lst) if n == 192] + [len(lst)]
# create the slices between consecutive indexes
[lst[indexes[i]:indexes[i+1]] for i in range(len(indexes) - 1)]
The result will be:
[[192, 0, 1, 0, 1],
[192, 12, 0, 5, 0, 1, 0, 1, 66, 218, 0, 10, 5, 115, 116, 97, 116, 115, 1, 108],
[192, 20],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 155],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 156],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 154],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 157]]
You can build a generator with itertools.groupby that uses 192's equality method as a key function, pair the output of the generator with zip and then use itertools.chain.from_iterable to join the pairs (the example below assumes your list is stored in variable l):
from itertools import groupby, chain
i = (list(g) for _, g in groupby(l, key=(192).__eq__))
[list(chain.from_iterable(p)) for p in zip(i, i)]
This returns:
[[192, 0, 1, 0, 1],
[192, 12, 0, 5, 0, 1, 0, 1, 66, 218, 0, 10, 5, 115, 116, 97, 116, 115, 1, 108],
[192, 20],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 155],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 156],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 154],
[192, 53, 0, 1, 0, 1, 0, 0, 0, 162, 0, 4, 74, 125, 133, 157]]

Per-pixel coloring of the image in Python

I need to color a 256 by 256 image using values from a matrix.
Example:
I have a matrix [[-64, -64, -64], [-8, 10, 8], [10, 50, 22]] and colors for these values: -64 = (0, 0, 0, 255), -8 = (25, 25, 25, 255), 10 = (45, 255, 255, 255), etc.
How can I quickly fill the output_matrix with color tuples and form an image through Image.fromarray(color_matrix.astype('uint8'), 'RGBA')?
Function
def draw_tile(matrix: np.matrix, type: str, img_size: tuple) -> Image:
    """
    Draws an image with the given data as the image size and the size of its values for x and y.
    :param matrix: numpy matrix with all measurement data
    :param type: descriptor defining how to draw the input data
    :param img_size: image size
    :return: image
    """
    def doppler_var_dict(var: float):
        return {
            var <= -30: 13,
            -30 < var <= -25: 0,
            -25 < var <= -20: 1,
            -20 < var <= -15: 2,
            -15 < var <= -10: 3,
            -10 < var <= -5: 4,
            -5 < var <= 0: 5,
            0 < var <= 1: 6,
            1 < var <= 5: 7,
            5 < var <= 10: 8,
            10 < var <= 15: 9,
            15 < var <= 20: 10,
            20 < var <= 25: 11,
            25 < var <= 30: 12,
            var >= 30: 14,
        }[1]

    def rainfall_var_dict(var: float):
        return {
            var <= 0.2: 0,
            0.2 < var <= 0.5: 1,
            0.5 < var <= 1.5: 2,
            1.5 < var <= 2.5: 3,
            2.5 < var <= 4: 4,
            4 < var <= 6: 5,
            5 < var <= 10: 6,
            10 < var <= 15: 7,
            15 < var <= 20: 8,
            20 < var <= 35: 9,
            35 < var <= 50: 10,
            50 < var <= 80: 11,
            80 < var <= 120: 12,
            120 < var <= 200: 13,
            200 < var <= 300: 14,
            var >= 300: 1
        }[1]

    def reflectivity_var_dict(var: float):
        return {
            var <= -4: 0,
            -4 < var <= -3.5: 1,
            -3.5 < var <= -3: 2,
            -3 < var <= -2.5: 3,
            -2.5 < var <= -.5: 4,
            -.5 < var <= -0: 5,
            -0 < var <= .25: 6,
            .25 < var <= 0.5: 7,
            .5 < var <= 1: 8,
            1 < var <= 1.25: 9,
            1.25 < var <= 1.5: 10,
            1.5 < var <= 2: 11,
            2 < var <= 2.5: 12,
            2.5 < var <= 3: 13,
            3 < var <= 3.5: 14,
            3.5 < var <= 4: 15,
            4 < var <= 5: 16,
            5 < var <= 6: 17,
            var >= 6: 18
        }[1]

    doppler_color = [(0, 0, 0, 0), (55, 255, 195, 150), (0, 250, 255, 150), (0, 195, 255, 150),
                     (0, 100, 255, 150), (0, 0, 255, 150), (140, 140, 140, 150), (150, 0, 0, 150), (255, 0, 0, 150),
                     (255, 85, 0, 150), (255, 165, 0, 150), (255, 165, 80, 150), (255, 230, 130, 150),
                     (65, 65, 65, 150),
                     (255, 255, 0, 150)]
    rainfall_color = [(0, 0, 0, 0), (200, 200, 200, 150), (180, 180, 255, 150), (120, 120, 255, 150),
                      (20, 20, 255, 150), (0, 216, 195, 150), (0, 150, 144, 150), (0, 102, 102, 150),
                      (255, 255, 0, 150),
                      (255, 200, 0, 150), (255, 150, 0, 150), (255, 100, 0, 150), (255, 0, 0, 150), (200, 0, 0, 150),
                      (120, 0, 0, 150), (40, 0, 0, 150)]
    z_d_r_color = [(90, 0, 150, 150), (115, 0, 255, 150), (213, 0, 255, 150), (255, 0, 0, 150), (176, 0, 0, 150),
                   (255, 85, 0, 150), (255, 220, 0, 150), (119, 255, 0, 150), (0, 255, 255, 150), (0, 255, 162, 150),
                   (0, 162, 255, 150), (0, 0, 255, 150), (255, 0, 77, 150), (50, 2, 163, 150), (173, 173, 173, 150),
                   (145, 145, 145, 150), (120, 120, 120, 150), (92, 92, 92, 150), (60, 60, 60, 150), (0, 0, 0, 0)]
    z_d_r_color = list(reversed(z_d_r_color))

    dict_dicts = {
        DOPPLER_RADIAL: doppler_var_dict,
        RADAR_RAINFALL: rainfall_var_dict,
        HORIZONTAL: reflectivity_var_dict,
        DIFFERENTIAL: reflectivity_var_dict
    }
    color_map_dict = {
        DOPPLER_RADIAL: doppler_color,
        RADAR_RAINFALL: rainfall_color,
        HORIZONTAL: z_d_r_color,
        DIFFERENTIAL: z_d_r_color
    }

    var_dict = dict_dicts[type]
    try:
        color_shem = color_map_dict[type]
    except KeyError as ee:
        return HttpResponseServerError('Error in bufr_image: wrong data type')

    with timer.Profiler('color_matrix time'):
        color_matrix = np.array(list(map(lambda var: color_shem[var_dict(var)], matrix.reshape(-1))))
    color_matrix = color_matrix.reshape((img_size[0], img_size[1], 4))
    img = Image.fromarray(color_matrix.astype('uint8'), 'RGBA')
    return img
Can you suggest an algorithm that will run in less than a second for a 256 by 256 matrix?
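Not an answer from the original thread, but one vectorized sketch that typically runs well under a second: replace the per-pixel dict lookups with numpy.searchsorted over the threshold edges and a fancy-indexed palette. The example assumes the doppler thresholds and the doppler_color list from the question, and that matrix and img_size are the same arguments as in draw_tile:
import numpy as np
from PIL import Image

# Bin edges for the doppler case: bin b means edges[b-1] < var <= edges[b]
edges = np.array([-30, -25, -20, -15, -10, -5, 0, 1, 5, 10, 15, 20, 25, 30])

# Reorder the question's doppler_color so that row b is the colour of bin b
# (bin 0 is var <= -30, which the original dict maps to colour index 13)
order = [13] + list(range(13)) + [14]
palette = np.array([doppler_color[k] for k in order], dtype=np.uint8)

values = np.asarray(matrix, dtype=float).reshape(-1)   # flatten the 256x256 values
bins = np.searchsorted(edges, values, side='left')     # vectorized threshold lookup
color_matrix = palette[bins].reshape(img_size[0], img_size[1], 4)
img = Image.fromarray(color_matrix, 'RGBA')
Both searchsorted and the palette indexing run in C across all 65,536 pixels at once, so the Python-level per-pixel work disappears.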

How to convert a Uint8ClampedArray array into a png image in node.js?

I am using this library to read PNG images:
https://www.npmjs.com/package/pngjs2
But in general, if I have a Uint8ClampedArray plus the width and height of the array, how can I convert that and save it as a PNG image in Node.js?
Thanks
You can use the same library to create a png image if the dimensions are known and the data is in the form of a Uint8ClampedArray. Example:
var fs = require('fs'),
PNG = require('pngjs2').PNG;
var img_width = 16;
var img_height = 16;
var img_data = Uint8ClampedArray.from([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 134, 133, 110, 6, 97, 137, 82, 249, 97, 142, 79, 255, 93, 142, 74, 255, 90, 140, 71, 255, 90, 142, 70, 255, 79, 129, 60, 250, 115, 134, 92, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 133, 152, 125, 15, 111, 151, 96, 255, 223, 255, 209, 255, 174, 253, 148, 255, 158, 249, 126, 255, 141, 249, 103, 255, 71, 145, 43, 255, 68, 143, 42, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 137, 158, 131, 20, 111, 153, 96, 255, 216, 255, 201, 255, 172, 247, 145, 255, 156, 244, 124, 255, 139, 242, 102, 255, 72, 145, 44, 255, 75, 144, 47, 20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 137, 158, 131, 25, 110, 154, 94, 255, 196, 252, 178, 255, 157, 242, 125, 255, 144, 239, 110, 255, 129, 237, 91, 255, 70, 145, 42, 255, 70, 142, 43, 25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 132, 153, 128, 30, 107, 155, 90, 255, 177, 245, 151, 255, 134, 233, 100, 255, 125, 230, 87, 255, 114, 229, 73, 255, 69, 146, 41, 255, 66, 140, 40, 30, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 126, 154, 120, 55, 103, 155, 83, 255, 154, 236, 125, 255, 111, 223, 71, 255, 109, 222, 69, 255, 109, 225, 69, 255, 69, 146, 40, 255, 63, 133, 41, 44, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 116, 142, 107, 82, 100, 154, 79, 255, 145, 229, 114, 255, 103, 218, 62, 255, 105, 218, 65, 255, 106, 220, 66, 255, 69, 145, 39, 255, 67, 125, 49, 82, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 71, 119, 56, 249, 126, 178, 106, 255, 122, 174, 104, 255, 128, 194, 105, 255, 140, 226, 109, 255, 105, 215, 65, 255, 103, 214, 63, 255, 104, 215, 63, 255, 84, 167, 53, 255, 78, 139, 54, 255, 78, 142, 54, 255, 71, 127, 50, 250, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 31, 91, 8, 240, 63, 157, 29, 255, 134, 222, 103, 255, 153, 229, 124, 255, 166, 233, 140, 255, 110, 213, 73, 255, 100, 210, 61, 255, 100, 210, 61, 255, 125, 221, 91, 255, 124, 221, 89, 255, 78, 179, 43, 255, 54, 122, 29, 240, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 27, 78, 7, 42, 39, 106, 14, 253, 60, 156, 26, 255, 120, 210, 88, 255, 127, 217, 96, 255, 119, 214, 85, 255, 96, 207, 56, 255, 98, 209, 59, 255, 95, 204, 56, 255, 70, 166, 37, 255, 57, 131, 30, 253, 48, 108, 24, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 29, 87, 9, 42, 42, 107, 18, 249, 63, 160, 28, 255, 107, 201, 73, 255, 121, 212, 88, 255, 108, 210, 72, 255, 87, 194, 50, 255, 62, 153, 30, 255, 50, 118, 25, 249, 43, 103, 23, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 32, 93, 13, 38, 46, 109, 22, 241, 68, 166, 32, 255, 99, 197, 63, 255, 89, 185, 55, 255, 54, 141, 24, 255, 44, 108, 20, 241, 36, 93, 19, 38, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42, 97, 12, 26, 47, 110, 25, 230, 68, 162, 37, 255, 46, 127, 17, 255, 39, 98, 16, 230, 33, 89, 13, 26, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30, 
93, 21, 16, 50, 112, 26, 223, 41, 101, 19, 225, 20, 72, 0, 13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
var img_png = new PNG({width: img_width, height: img_height})
img_png.data = Buffer.from(img_data);
img_png.pack().pipe(fs.createWriteStream('tick.png'))

Write variables in PLC from EXCEL

I have an ActiveX communication driver for TCP/IP that allows me to read and write to a PLC from an Excel file. I want to write 5 values (memory words) to the PLC, each taken from a different cell in Excel. I tried it with a loop, but I ended up writing only one value to all 5 variables. Now I am trying a Select Case, and it does not work either. Please help. The code is as follows:
For tt = 1 To 5
valor(1) = Val(Cells(1, 4).Value)
valor(2) = Val(Cells(2, 4).Value)
valor(3) = Val(Cells(3, 4).Value)
valor(4) = Val(Cells(4, 4).Value)
valor(5) = Val(Cells(5, 4).Value)
Next tt
Select Case valor(tt)
Case valor(1)
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 5, 1, 1000, 300, valor)
Case valor(2)
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 6, 1, 1000, 300, valor)
Case valor(3)
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 7, 1, 1000, 300, valor)
Case valor(4)
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 8, 1, 1000, 300, valor)
Case valor(5)
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 9, 1, 1000, 300, valor)
End Select
The line:
MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 9, 1, 1000, 300, valor)
holds the communication settings for the PLC. The last parameter is the value we need to write, which I take from a cell.
The code is VBA in Excel 2002.
THANKS!!!
Try this:
For tt = 1 To 5
valor(1) = Val(Cells(1, 4).Value)
valor(2) = Val(Cells(2, 4).Value)
valor(3) = Val(Cells(3, 4).Value)
valor(4) = Val(Cells(4, 4).Value)
valor(5) = Val(Cells(5, 4).Value)
Select Case valor(tt)
Case valor(1)
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 5, 1, 1000, 300, valor)
Case valor(2)
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 6, 1, 1000, 300, valor)
Case valor(3)
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 7, 1, 1000, 300, valor)
Case valor(4)
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 8, 1, 1000, 300, valor)
Case valor(5)
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 9, 1, 1000, 300, valor)
End Select
Next tt
As cboden pointed out, the select-statement is outside of the for-loop. Instead of implementing a select-statement, you should do it like this (read the rest of this post for an explanation):
For tt = 1 To 5
valor = Val(Cells(tt, 4).Value)
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 4+tt, 1, 1000, 300, valor)
Next tt
Explanation:
Keep in mind that the for-loop will run all code inside the loop once for each value of tt (1,2,3,4,5), which means that this:
For tt = 1 To 5
valor(1) = Val(Cells(1, 4).Value)
valor(2) = Val(Cells(2, 4).Value)
valor(3) = Val(Cells(3, 4).Value)
valor(4) = Val(Cells(4, 4).Value)
valor(5) = Val(Cells(5, 4).Value)
Next tt
Can (and should) be written like this:
For tt = 1 To 5
valor(tt) = Val(Cells(tt, 4).Value)
Next tt
The select-statement is not only wrong but also unnecessary. The correct version (if you choose to put it in the for-loop, which I do not recommend since it is redundant and makes the code harder to read) is:
Select Case tt
Case 1
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 5, 1, 1000, 300, valor(1))
Case 2
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 6, 1, 1000, 300, valor(2))
Case 3
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 7, 1, 1000, 300, valor(3))
Case 4
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 8, 1, 1000, 300, valor(4))
Case 5
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 9, 1, 1000, 300, valor(5))
End Select
The above can be shortened to:
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 4+tt, 1, 1000, 300, valor(tt))
for each value of tt (1,2,3,4,5) and if put into the for-loop together with valor(tt) = Val(Cells(tt, 4).Value) will form:
For tt = 1 To 5
valor(tt) = Val(Cells(tt, 4).Value)
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 4+tt, 1, 1000, 300, valor(tt))
Next tt
Since we do not need to keep the array of valor we can remove the valor(tt) and replace it with valor, thus making the finished code:
For tt = 1 To 5
valor = Val(Cells(tt, 4).Value)
res2 = MB1.Write("10.56.35.214", "10.56.35.22", 502, 0, 16, 4+tt, 1, 1000, 300, valor)
Next tt
Also I would suggest that you, in the future, refrain from posting IP-addresses in such a public forum.

OS X GCD multi-thread concurrency uses more CPU but executes slower than single thread

I have a method which does a series of calculations which take quite a bit of time to complete. The objects that this method does computations on are generated at runtime and can range from a few to a few thousand. Obviously it would be better if I could run these computations across several threads concurrently, but when I try that, my program uses more CPU yet takes longer than running them one-by-one. Any ideas why?
let itemsPerThread = (dataArray.count / 4) + 1
for var i = 0; i < dataArray.count; i += itemsPerThread
{
let name = "ComputationQueue\(i)".bridgeToObjectiveC().cString()
let compQueue = dispatch_queue_create(name, DISPATCH_QUEUE_CONCURRENT)
dispatch_async(compQueue,
{
let itemCount = i + itemsPerThread < dataArray.count ? itemsPerThread : dataArray.count - i - 1
let subArray = dataArray.bridgeToObjectiveC().subarrayWithRange(NSMakeRange(i, itemCount)) as MyItem[]
self.reallyLongComputation(subArray, increment: increment, outputIndex: self.runningThreads-1)
})
NSThread.sleepForTimeInterval(1)
}
Alternatively:
If I run this same thing, but with a single dispatch_async call on the whole dataArray rather than on the subarrays, it completes much faster while using less CPU.
What you want to do (this is my guess) should look something like this:
//
// main.swift
// test
//
// Created by user3441734 on 12/11/15.
// Copyright © 2015 user3441734. All rights reserved.
//
import Foundation
let computationGroup = dispatch_group_create()
var arr: Array<Int> = []
for i in 0..<48 {
arr.append(i)
}
print("arr \(arr)")
func job(inout arr: Array<Int>, workers: Int) {
let count = arr.count
let chunk = count / workers
guard chunk * workers == count else {
print("array.cout divided by workers must by integer !!!")
return
}
let compQueue = dispatch_queue_create("test", DISPATCH_QUEUE_CONCURRENT)
let syncQueue = dispatch_queue_create("aupdate", DISPATCH_QUEUE_SERIAL)
for var i = 0; i < count; i += chunk
{
let j = i
var tarr = arr[j..<j+chunk]
dispatch_group_enter(computationGroup)
dispatch_async(compQueue) { () -> Void in
for k in j..<j+chunk {
// long time computation
var z = 100000000
repeat {
z--
} while z > 0
// update with chunk
tarr[k] = j
}
dispatch_async(syncQueue, { () -> Void in
for k in j..<j+chunk {
arr[k] = tarr[k]
}
dispatch_group_leave(computationGroup)
})
}
}
dispatch_group_wait(computationGroup, DISPATCH_TIME_FOREVER)
}
var stamp: Double {
return NSDate.timeIntervalSinceReferenceDate()
}
print("running on dual core ...\n")
var start = stamp
job(&arr, workers: 1)
print("job done by 1 worker in \(stamp-start) seconds")
print("arr \(arr)\n")
start = stamp
job(&arr, workers: 2)
print("job done by 2 workers in \(stamp-start) seconds")
print("arr \(arr)\n")
start = stamp
job(&arr, workers: 4)
print("job done by 4 workers in \(stamp-start) seconds")
print("arr \(arr)\n")
start = stamp
job(&arr, workers: 6)
print("job done by 6 workers in \(stamp-start) seconds")
print("arr \(arr)\n")
with results
arr [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]
running on dual core ...
job done by 1 worker in 5.16312199831009 seconds
arr [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
job done by 2 workers in 2.49235796928406 seconds
arr [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24]
job done by 4 workers in 3.18479603528976 seconds
arr [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36]
job done by 6 workers in 2.51704299449921 seconds
arr [0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, 24, 24, 24, 24, 24, 24, 24, 24, 32, 32, 32, 32, 32, 32, 32, 32, 40, 40, 40, 40, 40, 40, 40, 40]
Program ended with exit code: 0
You can use the following pattern for distributing a job between any number of workers (the number of workers that gives you the best performance depends on the worker definition and on the resources available in your environment). Generally, for any kind of long-running calculation (transformation) you can expect some performance gain; in a dual-core environment, up to 50%. If your worker uses highly optimized functions that already use more cores 'by default', the performance gain can be close to nothing :-)
// generic implementation
// 1) the job distributes data between workers as fairly as possible
// 2) workers do their tasks in parallel
// 3) the order of the resulting array reflects the order of the input array
// 4) there is no requirement for the worker block to return
//    the same type as the elements of your input array
func job<T,U>(arr: [T], workers: Int, worker: T->U)->[U] {
guard workers > 0 else { return [U]() }
var res: Dictionary<Int,[U]> = [:]
let workersQueue = dispatch_queue_create("workers", DISPATCH_QUEUE_CONCURRENT)
let syncQueue = dispatch_queue_create("sync", DISPATCH_QUEUE_SERIAL)
let group = dispatch_group_create()
var j = min(workers, arr.count)
var i = (0, 0, arr.count)
var chunk: ArraySlice<T> = []
repeat {
let a = (i.1, i.1 + i.2 / j, i.2 - i.2 / j)
i = a
chunk = arr[i.0..<i.1]
dispatch_group_async(group, workersQueue) { [i, chunk] in
let arrs = chunk.map{ worker($0) }
dispatch_sync(syncQueue) {[i,arrs] in
res[i.0] = arrs
}
}
j--
} while j != 0
dispatch_group_wait(group, DISPATCH_TIME_FOREVER)
let idx = res.keys.sort()
var results = [U]()
idx.forEach { (idx) -> () in
results.appendContentsOf(res[idx]!)
}
return results
}
You need to
Get rid of the 1 second sleep. This is artifically reducing the degree to which you get parallel execution, because you're waiting before starting the next thread. You are starting 4 threads - and you are therefore artifically delaying the start (and potentially the finish) of the final thread by 3 seconds.
Use a single concurrent queue, not one per dispatch block. A concurrent queue will start blocks in the order in which they are dispatched, but does not wait for one block to finish before starting the next one - i.e. it will run blocks in parallel.
NSArray is a thread-safe class. I presume that it uses a multiple-reader/single-writer lock internally, which means there is probably no advantage to be obtained from creating a set of subarrays. You are, however, incurring the overhead of creating the subArray.
Multiple threads running on different cores cannot talk to the same cache line at the same time. Typical cache line size is 64 bytes, which seems unlikely to cause a problem here.
