Tensorflow Multi-threaded QueueRunner

I have a graph:
                                     /-- preprocessing_op1 -> op2 -> img --\
slice_input_producer([imgs, labels])                                        tf.train.batch(num_threads=n)
                                     \---------------- label --------------/
which is a typical Data I/O pipeline.
Problem: The multiple threads used by tf.train.batch() have a race condition.
e.g. Thread1 fetches sample1_img and sample2_label (because Thread2 already took sample1_label), producing the mismatched pair (sample1_img, sample2_label). I guess this is because slice_input_producer keeps 2 separate queues for imgs and labels, and the two queues work independently.
Q1. Does each of the n enqueueing threads run its own replica of the subgraph? If yes, does setting num_threads=n require n times more memory for the corresponding subgraph at runtime? If not, do the threads run different parts of the subgraph for a single enqueue op?
Q2 (solved). If I create a FIFOQueue and enqueue a tuple of (img, label), the pair will be dequeued atomically and multi-threading will actually help. Is this correct? (Although utilization won't be 100%, because the label tensor waits for the preprocessing of the img tensor.)
Q3 (solved). Is there a function like tuple_input_producer() which takes a list of tensors and internally uses only one queue?
Update (Q2,Q3)
I was wrong about slice_input_producer.
The problem only happens with two queues, not with slice_input_producer.
So just use slice_input_producer, and if two tensors need to go into different queues, I can use a single-threaded bottleneck (QueueRunner) to bundle them together.
Example code (0.11):
import tensorflow as tf
import numpy as np

a = tf.train.string_input_producer(map(str, range(100)), shuffle=False).dequeue()
b = tf.train.string_input_producer(map(str, range(100)), shuffle=False).dequeue()
op1 = tf.identity(a)
op2 = tf.identity(op1)
c1, c2 = tf.train.batch([op2, b], num_threads=10, batch_size=10)

with tf.Session() as sess, tf.device('/gpu:0'):
    sess.run([tf.initialize_all_variables()])
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess, coord)
    for i in range(10):
        d1, d2 = sess.run([c1, c2])
        print d1, d2
    coord.request_stop()
    coord.join(threads)
Result (see the first line):
['0' '2' '1' '7' '4' '3' '6' '8' '9' '5'] ['0' '2' '1' '6' '5' '4' '7' '8' '9' '3']
['10' '11' '12' '13' '14' '15' '16' '17' '18' '19'] ['10' '11' '12' '13' '14' '15' '16' '17' '18' '19']
['20' '21' '22' '23' '24' '25' '26' '27' '28' '29'] ['20' '21' '22' '23' '24' '25' '26' '27' '28' '29']
['30' '31' '33' '32' '34' '35' '36' '37' '38' '39'] ['30' '31' '33' '32' '34' '35' '36' '37' '38' '39']
['40' '41' '42' '43' '44' '45' '46' '47' '48' '49'] ['40' '41' '42' '43' '44' '45' '46' '47' '48' '49']
['50' '51' '52' '53' '54' '55' '56' '57' '58' '59'] ['50' '51' '52' '53' '54' '55' '56' '57' '58' '59']
['60' '61' '62' '63' '64' '65' '66' '67' '68' '69'] ['60' '61' '62' '63' '64' '65' '66' '67' '68' '69']
['70' '71' '72' '73' '74' '75' '76' '77' '78' '79'] ['70' '71' '72' '73' '74' '75' '76' '77' '78' '79']
['80' '81' '82' '83' '84' '85' '86' '87' '88' '89'] ['80' '81' '82' '83' '84' '85' '86' '88' '89' '87']
['90' '91' '92' '93' '94' '95' '96' '97' '98' '99'] ['90' '91' '92' '93' '94' '95' '96' '97' '98' '99']

You have parallel run calls, called "steps", and each step maintains its own copy of the tensors produced during execution. So in the worst case you need N times more memory for N parallel steps. In practice it tends to be better than N times, because memory is released as soon as a tensor is no longer needed. Stateful objects like queues and variables are shared across steps.
What's happening in your case is the following scenario:
step1: dequeue queue1
step2: dequeue queue1
step2: dequeue queue2
step1: dequeue queue2
You can see that queues get out of sync for both steps. Two ways to avoid it:
Don't issue parallel run calls (num_threads=1)
Combine two queues into a single queue with images/labels, and dequeue from that queue in parallel.
With the second approach you would have a single dequeue op that returns an image/label pair, and things should stay synchronized because dequeues are atomic.
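A minimal sketch of that second approach against the Update's example code, using the same old queue-based API (the queue capacity and thread counts here are arbitrary choices, not part of the original answer):

import tensorflow as tf

# The two out-of-sync sources from the example: two independent producer queues.
a = tf.train.string_input_producer(list(map(str, range(100))), shuffle=False).dequeue()
b = tf.train.string_input_producer(list(map(str, range(100))), shuffle=False).dequeue()

# One combined queue that holds (a, b) pairs, so a pair is enqueued/dequeued atomically.
paired_queue = tf.FIFOQueue(capacity=32, dtypes=[a.dtype, b.dtype], shapes=[(), ()])
enqueue_pair = paired_queue.enqueue([a, b])

# A single enqueue thread acts as the bundling bottleneck; the pairing happens here.
tf.train.add_queue_runner(tf.train.QueueRunner(paired_queue, [enqueue_pair]))

# Downstream batching can now use many threads safely, because each dequeue
# returns a matched pair from the combined queue.
a_paired, b_paired = paired_queue.dequeue()
c1, c2 = tf.train.batch([a_paired, b_paired], num_threads=10, batch_size=10)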

I've tried your Update's example code with tf=0.12.1.
When the iteration changes from range(10) to range(100), the mismatch also happened.
If I modify the tf.train.batch parameter to num_threads=1, the problem goes away.
I've not yet tried "Combine two queues into a single queue with images/labels, and dequeue from that queue in parallel".

Related

How to apply recursion to solve this problem

The problem is:
Given a digit string, return all possible letter combinations that the number could represent, according to the letters on the buttons of a telephone keypad.
The returned strings must be lexicographically sorted.
Example-1:
Input: "23"
Output: ["ad", "ae", "af", "bd", "be", "bf", "cd", "ce", "cf"]
Example-2:
Input: "9"
Output: ["w", "x", "y", "z"]
Example-3:
Input: "246"
Output: ["agm", "agn", "ago", "ahm", ..., "cho", "cim", "cin", "cio"] {27 elements}
I've racked my brain on this and tried a lot, but I can't get past this part. My idea was to use a recursive function that combines the letters of each digit with the letters of the other digits and to use itertools.combinations() over them, but I'm unable to complete the function.
What I've tried is :-
times, str_res = 0, ""

def getval(lst, times):
    if times == len(lst) - 1:
        for i in lst[times]:
            yield i
    else:
        for i in lst[times]:
            # Broken step: getval() returns a generator, so `i + getval(...)` is
            # str + generator and fails as soon as the result is iterated.
            yield i + getval(lst, times + 1)

dct = {"2": ("a", "b", "c"), "3": ("d", "e", "f"), "4": ("g", "h", "i"),
       "5": ("j", "k", "l"), "6": ("m", "n", "o"), "7": ("p", "q", "r", "s"),
       "8": ("t", "u", "v"), "9": ("w", "x", "y", "z"), "1": ("")}

str1, res = "23", []
if len(str1) == 1:
    print(dct[str1[0]])
else:
    temp = [dct[i] for i in str1]
    str_res = getval(temp, times)
    print(str_res)
Please suggest me your ideas over this problem or in completing the function...
It's not itertools.combinations that you need, it's itertools.product.
from itertools import product

def all_letter_comb(s, dct):
    for p in product(*map(dct.get, s)):
        yield ''.join(p)

dct = {"2": ("a", "b", "c"), "3": ("d", "e", "f"), "4": ("g", "h", "i"),
       "5": ("j", "k", "l"), "6": ("m", "n", "o"), "7": ("p", "q", "r", "s"),
       "8": ("t", "u", "v"), "9": ("w", "x", "y", "z"), "1": ("")}

for s in ['23', '9', '246']:
    print(s)
    print(list(all_letter_comb(s, dct)))
    print()
Output:
23
['ad', 'ae', 'af', 'bd', 'be', 'bf', 'cd', 'ce', 'cf']
9
['w', 'x', 'y', 'z']
246
['agm', 'agn', 'ago', 'ahm', 'ahn', 'aho', 'aim', 'ain', 'aio', 'bgm', 'bgn', 'bgo', 'bhm', 'bhn', 'bho', 'bim', 'bin', 'bio', 'cgm', 'cgn', 'cgo', 'chm', 'chn', 'cho', 'cim', 'cin', 'cio']
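Since the question title asks specifically about recursion, here is a recursive sketch equivalent to the product-based solution (not part of the original answer; it reuses the dct mapping from above):

def letter_combs(s, dct):
    # Base case: an empty digit string has exactly one combination, the empty string.
    if not s:
        return ['']
    first, rest = dct[s[0]], letter_combs(s[1:], dct)
    # Prepend each letter of the first digit to every combination of the remaining digits.
    return [letter + tail for letter in first for tail in rest]

print(letter_combs('23', dct))  # ['ad', 'ae', 'af', 'bd', 'be', 'bf', 'cd', 'ce', 'cf']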
If I am not wrong, this is a LeetCode problem; you can find multiple answers there.

Create dictionary from unwieldy string of key/values

I need to transform a list of strings similar to this:
"ABOC000 RECORD 0 Msg-type\=0220 Bit-map\=3450G83H403894JH Xbit-map\=0000000000000010 Proc code\=312000 Tran amt\=000000000000 Tran datetime\=0613064645 Trace nbr\=000000 Local time\=02:46:37 Local date\=06/13 Exp date\=24/02 Sett date\=06/13 Merchant\=6011 Pos entry\=051 Card seq no\=000 Acqr inst id\=2349823498 Cord \=23049583049583405983045983405983405900 Retr ref\=111111111111 Resp code\=00 Crd acpt trmid\=CS61252 Crd acpt id\=ISPA/PULSE Crd acpt loc\=000 8TH AVENUE BOREALIS XXUS Name\=MERCHANT NAME Tran curr\=840 Natl cond code\=1010000002U Reason codes\=004 Rsn code map\=40 Advice reason\=31 Ddsi data len\=022 Ddsi data map\=B2 Pseudo term\=070792 Acqr netid\=PUL Processor id\=INT789 Proc flags\= Info text\=NI24PS20ID16 03 "
into a list of dicts with key/values.
This is using Python 3.7 -- I've gone down the list comprehension and regex paths, but have not found a workable solution yet. The difficulty lies in:
keys and values sometimes being multiple words (having spaces)
some keys not always being present
A short example of what I'm aiming to end up with:
[{"RECORD":"0", "Msg-type":"0220", "Bit-map":"3450G83H403894JH", "Xbit-map":"0000000000000010", "Proc code":"312000" ... }]
Assuming that the values don't contain any whitespace (otherwise it would be hard to distinguish what part belongs to the previous value or to the next key), and stripping off the beginning of the string (not sure how the {'RECORD': '0'} fits in the picture), you can use re.findall with the following regex:
s = r"Msg-type\=0220 Bit-map\=3450G83H403894JH Xbit-map\=0000000000000010 Proc code\=312000 Tran amt\=000000000000 Tran datetime\=0613064645 Trace nbr\=000000 Local time\=02:46:37 Local date\=06/13 Exp date\=24/02 Sett date\=06/13 Merchant\=6011 Pos entry\=051 Card seq no\=000 Acqr inst id\=2349823498 Cord \=23049583049583405983045983405983405900 Retr ref\=111111111111 Resp code\=00 Crd acpt trmid\=CS61252 Crd acpt id\=ISPA/PULSE Crd acpt loc\=000 8TH AVENUE BOREALIS XXUS Name\=MERCHANT NAME Tran curr\=840 Natl cond code\=1010000002U Reason codes\=004 Rsn code map\=40 Advice reason\=31 Ddsi data len\=022 Ddsi data map\=B2 Pseudo term\=070792 Acqr netid\=PUL Processor id\=INT789 Proc flags\= Info text\=NI24PS20ID16 03 "
import re
d = dict(re.findall(r'([A-Za-z][A-Za-z \-]*)\\=([^\s]+)', s))
Which gives:
{'Acqr inst id': '2349823498',
'Acqr netid': 'PUL',
'Advice reason': '31',
'Bit-map': '3450G83H403894JH',
'Card seq no': '000',
'Cord ': '23049583049583405983045983405983405900',
'Crd acpt id': 'ISPA/PULSE',
'Crd acpt loc': '000',
'Crd acpt trmid': 'CS61252',
'Ddsi data len': '022',
'Ddsi data map': 'B2',
'Exp date': '24/02',
'Info text': 'NI24PS20ID16',
'Local date': '06/13',
'Local time': '02:46:37',
'Merchant': '6011',
'Msg-type': '0220',
'NAME Tran curr': '840',
'Natl cond code': '1010000002U',
'Pos entry': '051',
'Proc code': '312000',
'Processor id': 'INT789',
'Pseudo term': '070792',
'Reason codes': '004',
'Resp code': '00',
'Retr ref': '111111111111',
'Rsn code map': '40',
'Sett date': '06/13',
'TH AVENUE BOREALIS XXUS Name': 'MERCHANT',
'Trace nbr': '000000',
'Tran amt': '000000000000',
'Tran datetime': '0613064645',
'Xbit-map': '0000000000000010'}
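If the complete set of field names is known up front (which the examples suggest), one way to cope with multi-word values such as Crd acpt loc and Name is to anchor the regex on the known keys instead of on whitespace. A rough sketch, with a deliberately shortened key list for illustration:

import re

# Illustrative subset; the real list would be maintained by hand from the record layout.
KEYS = ['Msg-type', 'Bit-map', 'Xbit-map', 'Proc code', 'Crd acpt loc', 'Name',
        'Tran curr', 'Proc flags', 'Info text']

# Longest names first so a short key never shadows a longer one in the alternation.
key_alt = '|'.join(re.escape(k) for k in sorted(KEYS, key=len, reverse=True))

# Key, escaped '=', then everything (lazily) up to the next known key or end of string.
pattern = re.compile(r'({0})\\=(.*?)(?=\s+(?:{0})\\=|\s*$)'.format(key_alt))

def parse_record(s):
    return {k: v.strip() for k, v in pattern.findall(s)}

With the full key list, multi-word values such as 000 8TH AVENUE BOREALIS XXUS and MERCHANT NAME stay intact, because a value only ends where the next known key begins.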

Reading large text file into a dataframe for data analysis in Python

I know similar questions have been asked before, but I still cannot figure out the best way to process the data for my program.
I have a large text file (50,000 to 5,000,000 lines of text). I need to process each line of this file and write it into a DataFrame so that I can do some data analysis on it.
The DataFrame has 9 columns, mostly floats and some strings, and the number of rows is roughly the number of lines in the input file.
Currently, I am reading this file line by line using "with open...", then using regex to extract the required data and writing it as a row into the DataFrame. As this goes through a for loop, it takes forever to complete.
What is the best way to do this? Any pointers or sample programs? Should I even be using a DataFrame?
Here is my code.
def gcodetodf(self):
    with open(self.inputfilepath, 'r') as ifile:
        lflag = False
        for item in ifile:
            layermatch = self.layerpattern.match(item)
            self.tlist = item.split(' ')
            self.clist = re.split(r"(\w+)", item)
            if layermatch and (str(self.tlist[2][:-1]) == 'end' or int(self.tlist[2][:-1]) == (self.endlayer + 1)):
                break
            if (layermatch and int(self.tlist[2][:-1]) == self.startlayer) or lflag is True:
                lflag = True
                # clist = re.split(r"(\w+)", item)
                map_gcpat = {bool(self.gonepattern.match(item)): self.gc_g1xyef,
                             bool(self.gepattern.match(item)): self.gc_g1xye,
                             bool(self.gtrpattern.match(item)): self.gc_g1xyf,
                             bool(self.resetextpattern.match(item)): self.gc_g92e0,
                             bool(self.ftpattern.match(item)): self.gc_ftype,
                             bool(self.toolcompattern.match(item)): self.gc_toolcmt,
                             bool(self.layerpattern.match(item)): self.gc_laycmt,
                             bool(self.zpattern.match(item)): self.gc_g1z}
                map_gcpat.get(True, self.contd)()
                # print(self.newdataframe)
an example function that writes to the dataframe looks like this:
def gc_g1xye(self):
    self.newdataframe = self.newdataframe.append(
        {'Xc': float(self.tlist[1][1:]), 'Yc': float(self.tlist[2][1:]), 'Zc': self.gc_z,
         'E': float(self.tlist[3][1:]),
         'F': None, 'FT': self.ft_var, 'EW': self.tc_ew, 'LH': self.tc_lh, 'Layer': self.cmt_layer},
        ignore_index=True)
sample input file:
........
G1 X159.8 Y140.2 E16.84505
G1 X159.8 Y159.8 E17.56214
M204 S5000
M205 X30 Y30
G0 F2400 X159.6 Y159.8
G0 X159.33 Y159.33
G0 X159.01 Y159.01
M204 S500
M205 X20 Y20
;TYPE:SKIN
G1 F1200 X140.99 Y159.01 E18.22142
G1 X140.99 Y140.99 E18.8807
G1 X159.01 Y140.99 E19.53999
G1 X159.01 Y159.01 E20.19927
M204 S5000
M205 X30 Y30
G0 F2400 X150.21 Y150.21
M204 S500
M205 X20 Y20
G1 F1200 X149.79 Y150.21 E20.21464
G1 X149.79 Y149.79 E20.23
G1 X150.21 Y149.79 E20.24537
G1 X150.21 Y150.21 E20.26073
M204 S5000
M205 X30 Y30
G0 F2400 X150.61 Y150.61
M204 S500
M205 X20 Y20
G1 F1200 X149.39 Y150.61 E20.30537
G1 X149.39 Y149.39 E20.35
G1 X150.61 Y149.39 E20.39464
..........
Beware that DataFrame.append returns a copy of your old DataFrame with the new rows added: it does not work in place. Constructing a DataFrame row by row using append therefore runs in O(n^2) instead of O(n), which is rather bad if you have 5 million rows...
What you want to do instead is to append each row to a list first (a list of dicts is fine), and then create the DataFrame object from that once all the parsing is done. This will be much faster because appending to a list is done in constant time, so your total complexity should be O(n) instead.
def gc_g1xye(self):
    self.data.append(
        {'Xc': float(self.tlist[1][1:]), 'Yc': float(self.tlist[2][1:]), 'Zc': self.gc_z,
         'E': float(self.tlist[3][1:]),
         'F': None, 'FT': self.ft_var, 'EW': self.tc_ew, 'LH': self.tc_lh, 'Layer': self.cmt_layer})

...
# Once the parsing is done:
self.newdataframe = pd.DataFrame(self.data)
Is this the best way of doing it? It looks like a good start to me. Should you be using a DataFrame? From what you say you want to do with the data once you've parsed it, a DataFrame sounds like a good option.
As a random unrelated tip, I recommend the tqdm package for showing a progress bar of your for-loop. It's super easy to use, and it helps you in judging whether it's worth waiting for that loop to finish!
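For example, a minimal sketch of wrapping the question's file loop with tqdm (desc and unit are arbitrary labels; since a plain file handle has no known length, the bar shows a running count and rate unless a total= is supplied):

from tqdm import tqdm

with open(self.inputfilepath, 'r') as ifile:
    # tqdm wraps any iterable and prints a live progress counter to stderr.
    for item in tqdm(ifile, desc='parsing gcode', unit=' lines'):
        ...  # same per-line processing as before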

Rasa NLU: How to use multiple categorical slots with same values?

I just started working with Rasa NLU and I have some trouble understanding the usage of categorical slots with the same values. I have 3 different types of risk, each a categorical slot with the values low, medium and high.
How can the bot differentiate between the three risks and understand which slot to fill, given that the intent is the same for each?
Or do I need to use different intents for each?
Right now what I see is (I removed unrelated logs):
How tired are you?
1: low (low)
2: medium (medium)
3: high (high)
medium
DEBUG:rasa_core.processor:Received user message 'medium' with intent '{'name': 'inform', 'confidence': 0.88372623999657118}' and entities '[{'start': 0, 'end': 6, 'value': 'medium', 'entity': 'fatigue', 'extractor': 'ner_crf'}]'
DEBUG:rasa_core.processor:Current slot values:
fatigue: medium
injury: None
stress: None
How stressed are you?
1: low (low)
2: medium (medium)
3: high (high)
low
DEBUG:rasa_core.processor:Received user message 'low' with intent '{'name': 'inform', 'confidence': 0.88762049990079372}' and entities '[{'start': 0, 'end': 3, 'value': 'low', 'entity': 'fatigue', 'extractor': 'ner_crf'}]'
DEBUG:rasa_core.processor:Current slot values:
fatigue: low
injury: None
stress: None
All the user replies have the intent inform.
An example story is:
* _greet[]
- utter_ask_fatigue
* _inform[fatigue=low]
- utter_ask_injury
* _inform[injury=medium]
- utter_ask_stress
* _inform[stress=low]
- utter_on_it
- action_reply
You can do it with one entity and four slots.
The entity may be defined as "info", with text values (i.e. low, medium, high).
The four slots: the first one is "info", which will be auto-filled by the recognized "info" entity defined above. The other three are "fatigue", "stress" and "injury", which can be filled by bot actions such as action_fill_fatigue, action_fill_stress and action_fill_injury.
An example story will make it clear:
* _greet[]
- utter_ask_fatigue
* _inform[info=low]
- action_fill_fatigue
- utter_ask_injury
* _inform[info=medium]
- action_fill_injury
- utter_ask_stress
* _inform[info=low]
- action_fill_stress
- utter_on_it
- action_reply
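To complement the story above, here is a rough sketch of what one of those slot-filling actions could look like (this assumes the older rasa_core custom-action API matching the logs in the question; the class name and comments are made up):

from rasa_core.actions.action import Action
from rasa_core.events import SlotSet

class ActionFillFatigue(Action):
    """Copies the generic 'info' slot into the 'fatigue' slot."""

    def name(self):
        # Must match the action name used in the stories (action_fill_fatigue).
        return 'action_fill_fatigue'

    def run(self, dispatcher, tracker, domain):
        # Pin whatever value the shared 'info' entity produced to the risk type
        # the bot just asked about.
        return [SlotSet('fatigue', tracker.get_slot('info'))]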

distributed Tensorflow tracking timestamps for synchronization operations

I am new to TensorFlow. Currently, I am trying to evaluate the performance of distributed TensorFlow using the Inception model provided by the TensorFlow team.
What I want is to generate timestamps for some critical operations in a Parameter Server - Worker architecture, so I can measure the bottleneck (the network lag due to parameter transfer/synchronization, or the parameter computation cost) on replicas for one iteration (batch).
I came up with the idea of adding a customized dummy py_func operator designed to print timestamps inside inception_distributed_train.py, with some control dependencies. Here are some pieces of code that I added:
def timer(s):
    # Note: needs threading, time, datetime and os.getpid imported in the file.
    print("-------- thread ID ", threading.current_thread().ident,
          ", ---- Process ID ----- ", getpid(), " ~~~~~~~~~~~~~~~ ", s,
          datetime.fromtimestamp(time.time()).strftime('%Y-%m-%d %H:%M:%S.%f'))
    return False

dummy1 = tf.py_func(timer, ["got gradients, before dequeues token "], tf.bool)
dummy2 = tf.py_func(timer, ["finished dequeueing the token "], tf.bool)
I modified
apply_gradients_op = opt.apply_gradients(grads, global_step=global_step)
with tf.control_dependencies([apply_gradients_op]):
    train_op = tf.identity(total_loss, name='train_op')
into
with tf.control_dependencies([dummy1]):
    apply_gradients_op = opt.apply_gradients(grads, global_step=global_step)
with tf.control_dependencies([apply_gradients_op]):
    with tf.control_dependencies([dummy2]):
        train_op = tf.identity(total_loss, name='train_op')
hoping to print the timestamps before evaluating the apply_gradient_op and after finishing evaluating the apply_gradient_op by enforcing node dependencies.
I did similar things inside sync_replicas_optimizer.apply_gradients, by adding two dummy print nodes before and after update_op:
dummy1 = tf.py_func(timer, ["---------- before update_op "], tf.bool)
dummy2 = tf.py_func(timer, ["---------- finished update_op "], tf.bool)

# sync_op will be assigned to the same device as the global step.
with ops.device(global_step.device), ops.name_scope(""):
    with ops.control_dependencies([dummy1]):
        update_op = self._opt.apply_gradients(aggregated_grads_and_vars, global_step)

# Clear all the gradients queues in case there are stale gradients.
clear_queue_ops = []
with ops.control_dependencies([update_op]):
    with ops.control_dependencies([dummy2]):
        for queue, dev in self._one_element_queue_list:
            with ops.device(dev):
                stale_grads = queue.dequeue_many(queue.size())
                clear_queue_ops.append(stale_grads)
I understand that apply_gradients_op is the train_op returned by sync_replicas_optimizer.apply_gradients, and that it is the op that dequeues a token (global_step) from the sync_queue managed by the chief worker via the chief_queue_runner, so that a replica can finish the current batch and start a new one.
In theory, apply_gradients_op should take some time, since a replica has to wait until it can dequeue the token (global_step) from the sync_queue. However, the printed results I got for one replica show a very short time difference for executing apply_gradients_op (~1/1000 sec), and sometimes the print output is nondeterministic (especially for the chief worker). Here is a snippet of the output on the workers (I am running 2 workers and 1 PS):
chief worker (worker 0) output
worker 1 output
My questions are:
1) I need to record the time TensorFlow takes to execute an op (such as train_op, apply_gradients_op, compute_gradients_op, etc.)
2) Is this the right direction to go, given my ultimate goal is to record the elapsed time for executing certain operations (such as the difference between the time a replica finishes computing gradients and the time it gets the global_step from sync_token)?
3) If this is not the way to go, please give me some insight into possible ways I could achieve my ultimate goal.
Thank you so much for reading my long post; I have spent weeks working on this!
