I am implementing a program for feature selection in large academic documents. The first step is to read each file, generate grams, and do some pre-computation. I used multiprocessing.Pool to make the program run faster. Here is my code for this part:
# number of processes
processNum = 4
pool_precompute = mp.Pool(processes = processNum)
fileNum = len(filelist)
offset = fileNum // processNum
ProcessList = []
for i in range(processNum):
    if (i == processNum - 1):
        start = i * offset
        end = fileNum
    else:
        start = i * offset
        end = start + offset
    # call the function
    print (start, end, i)
    ProcessList.append(pool_precompute.apply_async(get_kgrams_df_pmi, args = (start, end, filelist, i)))
    #pool_precompute.apply_async(get_kgrams_df_pmi, args = (start, end, filelist, i))
pool_precompute.close()
pool_precompute.join()
My program must wait for all the created processes to finish before going on to the next steps. However, although I used join(), my program does not wait at all; it seems that join() has no effect. Note that each process handles a share of the files. The processes do not need to communicate with each other and do not share any variables. get_kgrams_df_pmi returns an integer computed in the function (the number of words in the document corpus). I would appreciate it if anyone could find the problem.
It was simply because there were errors in my function called by the multiprocessing pool: the tasks failed and stopped without printing any error information.
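One way to make such failures visible is to call get() on each AsyncResult returned by apply_async: get() re-raises, in the parent process, any exception raised inside the worker. A minimal sketch, assuming a stand-in function in place of get_kgrams_df_pmi (count_words and collect_results are hypothetical names, not part of the original program):

```python
import multiprocessing as mp


def count_words(n):
    """Hypothetical stand-in for get_kgrams_df_pmi: fails on bad input."""
    if n < 0:
        raise ValueError("negative word count: %d" % n)
    return n


def collect_results(async_results):
    """Wait for every task; return (values, errors) instead of losing errors."""
    values, errors = [], []
    for res in async_results:
        try:
            values.append(res.get())  # re-raises the worker's exception here
        except Exception as exc:
            errors.append(exc)
    return values, errors


if __name__ == "__main__":
    with mp.Pool(processes=2) as pool:
        pending = [pool.apply_async(count_words, args=(n,)) for n in (3, -1, 5)]
        values, errors = collect_results(pending)
    print("values:", values)
    print("errors:", errors)
```

In the original program the same idea would mean looping over ProcessList and calling get() on each entry after close() and join().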
I am trying to simulate an n-dimensional game of life for the first t=6 time steps. My Nim code is a straightforward port from Python and it works correctly, but instead of the expected speedup, for n=4, t=6 it takes 2 seconds to run, which is an order of magnitude slower than my CPython version. Why is my code so slow? What can I do to speed it up? I am compiling with -d:release and --opt:speed
I represent each point in space with a single 64-bit integer.
That is, I map (x_0, x_1, ..., x_{n-1}) to sum x_i * 32^i. I can do that because I know that after 6 time steps each coordinate satisfies -15 <= x_i <= 15, so there is no overflow.
The rules are:
alive - has 2 or 3 alive neighbours: stays alive
      - any other number of them: becomes dead
dead  - has 3 alive neighbours: becomes alive
      - else: stays dead
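For reference, the packed-coordinate encoding and the rules above can be sketched in Python roughly as follows. This is an illustration under the stated assumptions (4 dimensions, base 32), not the original CPython version; encode, NEIGHBOURS and step are hypothetical names.

```python
from collections import Counter
from itertools import product

DIM = 4
BASE = 32  # each coordinate gets one base-32 "digit"


def encode(coords):
    """Pack (x_0, ..., x_{n-1}) into one integer: sum of x_i * 32**i."""
    return sum(x * BASE ** i for i, x in enumerate(coords))


# Offsets of all 3**DIM - 1 neighbours, already packed.
NEIGHBOURS = [encode(d) for d in product((-1, 0, 1), repeat=DIM) if any(d)]


def step(alive):
    """Return the set of alive cells after one time step."""
    counts = Counter(cell + d for cell in alive for d in NEIGHBOURS)
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in alive)}


line = {encode((-1, 0, 0, 0)), encode((0, 0, 0, 0)), encode((1, 0, 0, 0))}
print(len(NEIGHBOURS), len(step(line)))
```

Because every coordinate stays well inside -15..15, adding a packed offset to a packed cell never carries into the next coordinate, which is what makes the single-integer representation safe.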
Below is my code. The critical part is the proc nxt, which takes the set of active cells and returns the set of active cells at the next time step. This proc is called 6 times. The only thing I'm interested in is the number of alive cells.
I run the code on the following input:
.##...#.
.#.###..
..##.#.#
##...#.#
#..#...#
#..###..
.##.####
..#####.
Code:
import sets, tables, intsets, times, os, math
const DIM = 4
const ROUNDS = 6
const REG_SIZE = 5
const MAX_VAL = 2^(REG_SIZE-1)
var grid = initIntSet()
# Inits neighbours
var neigbours: seq[int]
proc initNeigbours(base,depth: int) =
  if depth == 0:
    if base != 0:
      neigbours.add(base)
  else:
    initNeigbours(base*2*MAX_VAL-1, depth-1)
    initNeigbours(base*2*MAX_VAL+0, depth-1)
    initNeigbours(base*2*MAX_VAL+1, depth-1)
initNeigbours(0,DIM)
echo neigbours
# Calculates next iteration:
proc nxt(grid: IntSet): IntSet =
  var counting: CountTable[int]
  for x in grid:
    for dx in neigbours:
      counting.inc(x+dx)
  for x, count in counting.pairs:
    if count == 3 or (count == 2 and x in grid):
      result.incl(x)
# Loads input
var row = 0
while true:
  var line = stdin.readLine
  if line == "":
    break
  for col in 0..<line.len:
    if line[col] == '#':
      grid.incl((row-MAX_VAL)*2*MAX_VAL + col-MAX_VAL)
  inc row
# Run computation
let time = cpuTime()
for i in 1..ROUNDS:
  grid = nxt(grid)
echo "Time taken: ", cpuTime() - time
echo "Result: ", grid.len
discard stdin.readLine
Your code runs on my computer in about 0.02 seconds:
Time taken: 0.020875947
Result: 2276
Time taken: 0.01853268
Result: 2276
Time taken: 0.021355269
Result: 2276
I changed the part where the input is read to this:
# Loads input
var row = 0
let input = open("input.txt")
for line in input.lines:
  for i, col in line:
    if col == '#':
      grid.incl((row-MAX_VAL)*2*MAX_VAL + i-MAX_VAL)
  inc row
input.close()
But that shouldn't affect performance; it just looks better to my eyes. I compiled with:
nim c -d:danger script.nim
Using Nim 1.4.2. -d:danger is the flag for maximum speed before entering deeper waters.
But even compiling in debug mode:
$ nim c -r script.nim
Time taken: 0.07699487199999999
Result: 2276
That is way faster than 2 seconds. There has to be some other problem on your end. Sorry for the non-answer.
For a more detailed understanding, I have attached a link to the file:
In-detail understanding code
I have data in lists, with syntax like this:
i = [a.b>c.d , e.f.g>h.i.j ]
l = [a.b , e.f.g ]
n = [a.b>c.d , e.f.g ]
That is, each element of a list has multiple sub-elements separated by "." and ">".
import requests

for i in range(0, len(l)):
    reac={}
    reag={}
    t = l[i].split(">")
    REAC = t[0]
    Reac = REAC.split(".")
    for o in range(len(Reac)):
        reaco = "https://ai.chemistryinthecloud.com/smilies/" + Reac[o]
        respo = requests.get(reaco)
        reac[o] ={"Smile":Reac[o],"Details" :respo.json()}
    if (len(t) != 1):
        REAG = t[1]
        Reag = REAG.split(".")
        for k in range(len(Reag)):
            reagk = "https://ai.chemistryinthecloud.com/smilies/" + Reag[k]
            repo = requests.get(reagk)
            reag[k] = {"Smile": Reag[k], "Details" :repo.json()}
        res = {"Reactants": list(reac.values()), "Reagents": list(reag.values())}
        boo.append(res)
    else:
        res = {"Reactants": list(reac.values()), "Reagents": "No reagents"}
        boo.append(res)
We split out all the elements, and for each element we call a third-party API. That takes too much time.
Is there any way to reduce this time and optimize the for loop?
It currently takes around 1 minute to respond. We want to bring that down to 5-10 seconds.
You can start multiple requests and process them when they are finished with ThreadPoolExecutor.
That might speed it up a bit.
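A minimal sketch of that idea, assuming the HTTP call is wrapped in a function passed in as fetch. Here fake_fetch is a placeholder so the example runs offline; fetch_all and build_result are hypothetical helper names, not part of the question's code.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(smiles, fetch, max_workers=8):
    """Call fetch(smile) concurrently, once per distinct SMILES string."""
    unique = list(dict.fromkeys(smiles))  # de-duplicate, keep order
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        details = list(executor.map(fetch, unique))
    return dict(zip(unique, details))


def build_result(reaction, details):
    """Assemble the same dict shape as the loop in the question."""
    parts = reaction.split(">")
    reactants = [{"Smile": s, "Details": details[s]} for s in parts[0].split(".")]
    if len(parts) == 1:
        return {"Reactants": reactants, "Reagents": "No reagents"}
    reagents = [{"Smile": s, "Details": details[s]} for s in parts[1].split(".")]
    return {"Reactants": reactants, "Reagents": reagents}


def fake_fetch(smile):
    """Placeholder for the real HTTP call, so the sketch runs offline."""
    return {"length": len(smile)}


reactions = ["a.b>c.d", "e.f.g"]
all_smiles = [s for r in reactions for part in r.split(">") for s in part.split(".")]
details = fetch_all(all_smiles, fake_fetch)
boo = [build_result(r, details) for r in reactions]
print(boo)
```

In real use, fetch would wrap the requests.get(...).json() call from the question. max_workers=8 is an arbitrary starting point, and how much it helps depends on how many concurrent requests the API tolerates.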
I can't shorten the string I'm iterating over while the loop is running.
I want to write a function that takes a string and deletes repeated sequences in it.
For example:
if I have the string aaaaabbbbbbbcccccccDDDDDDaaaaa
I should get abcDa in return.
I tried to run over the string with a for loop, and every time I see a new letter I save it in a variable, which adds up to the fixed string.
def string_sequence_fixing(string):
    c = ''
    for char in my_str:
        if c != char:
            c = char
        else:
            my_str = my_str.replace(c, '', my_str.count(c) - 1)
    return my_str
The problem I want to avoid is too many iterations.
When I see a new character I want to delete all the other repetitions of it,
but the second line from the end does not update the "condition" of the for loop.
Short Answer: loops don't work that way.
Longer answer:
Here is some simple pseudo code, for your perusal:
j=99
print "J is " j
for j=0;20;5
    print j \t
end
print "Now J is " j
The output may surprise you.
run
J is 99
0 5 10 15 20
Now J is 99
The reason is: the variable j in the loop is NOT the same as the j variable outside the loop.
I like to use the term "stack" (some languages claim they don't use a stack; in those cases I call it a "not-stack stack"). The stack simply means a temporary storage space in memory.
The initial variable "j" goes into the "program data space". The loop variable "j" goes into the "stack data space".
Using a variable doesn't 'really' mean you are using a variable; it's just a mnemonic for a memory space. Let's have another look at that sample code:
pointer-to-program-space-variable-named-j = 99 (poke into memory location 1:4500)
pointer-to-stack-space-variable-named-j = 0 (poke into memory location 87:300)
print pointer-to-stack-space-variable-named-j followed by tab
increment pointer-to-stack-space-variable-named-j by 5
repeat until pointer-to-stack-space-variable-named-j = 20
print pointer-to-program-space-variable-named-j
With this knowledge, let's look at your code to see what is going on:
def string_sequence_fixing(string):
    c = ''
    for char in *STACK*.my_str:
        if c != char:
            c = char
        else:
            my_str = my_str.replace(c, '', *PROGRAM*.my_str.count(c) - 1)
    return my_str
See how they are different variables? NEVER assume that a loop variable and a program variable are the same. You need to redo your algorithm to accomplish what you want to do.
Also, see the link provided by @David Cullen.
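In Python specifically, this can be checked directly. The for statement evaluates my_str once and iterates over that string object; assigning to the name inside the loop binds it to a new string and leaves the one being iterated untouched. A small sketch (visited_chars is a hypothetical helper, not from the question):

```python
def visited_chars(text):
    """Record which characters the loop visits while `text` is rebound."""
    seen = []
    for char in text:
        seen.append(char)
        text = text.replace(char, "")  # new string; the loop keeps the old one
    return seen, text


seen, remaining = visited_chars("aabbc")
print(seen)       # ['a', 'a', 'b', 'b', 'c']
print(repr(remaining))  # ''
```

The loop still visits all five original characters even though the name text refers to an empty string by the end.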
You can use groupby() from itertools. Code is like:
from itertools import groupby

data = 'aaabbbcccDDDDDEfggghiij'
dataN = ''
for d in groupby(data):
    dataN += d[0]
print(dataN)
output:
abcDEfghij
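If you would rather not import anything, the loop the question was reaching for can be sketched as below: it builds a new string instead of editing the one being iterated. This is one possible version, not the only way to write it; the parameter name text is an assumption.

```python
def string_sequence_fixing(text):
    """Collapse each run of repeated characters to a single character."""
    result = []
    previous = None
    for char in text:
        if char != previous:  # a new run starts here
            result.append(char)
            previous = char
    return "".join(result)


print(string_sequence_fixing("aaaaabbbbbbbcccccccDDDDDDaaaaa"))  # abcDa
```

It makes a single pass over the input, so the number of iterations equals the length of the string.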
I'm a beginner in programming and have struggled for a while with one task.
I want to write a program which finds out how many iterations are needed to arrive at the number 6174 from a specified number.
For example, if I take the number 2341 and sort its digits:
1) 4321-1234=3087
2) 8730-378=8352
3) 8532-2358=6174 (in this case 3 iterations are needed.)
I have to use a while loop, so that the code runs until it reaches the number 6174 and then stops.
I wrote this code:
n =input('write for nummbers ')
n=str(n)
i=0
i+=1 #"i" show how many times iteration happend.
large = "".join(sorted(n, reverse=True))
little = "".join(sorted(n,))
n = int(large) - int(little)
print(n, i)
Can you give me a hint on how I could run it with a while loop?
# untested, all bugs are free ;)
n = input('write for nummbers ')
n = int(n)  # you need n as a number
i = 0
while n != 6174:
    i += 1  # "i" shows how many iterations have happened
    digits = str(n).zfill(4)  # keep leading zeros, e.g. 999 -> "0999"
    large = "".join(sorted(digits, reverse=True))
    little = "".join(sorted(digits))
    n = int(large) - int(little)
    print(n, i)
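The same loop can be wrapped in a function so that the count is returned rather than printed. This is a sketch under two assumptions that are not in the question: the input is between 1 and 9999, and numbers whose four digits are all equal are rejected, because they collapse to 0 and would loop forever. kaprekar_steps is a hypothetical name.

```python
def kaprekar_steps(n):
    """Count the iterations needed to reach 6174 from a 4-digit number."""
    if not 0 < n < 10000:
        raise ValueError("expected a number between 1 and 9999")
    if len(set(str(n).zfill(4))) == 1:
        raise ValueError("all digits are equal, so 6174 is never reached")
    steps = 0
    while n != 6174:
        digits = str(n).zfill(4)  # pad so that 999 is treated as 0999
        large = int("".join(sorted(digits, reverse=True)))
        little = int("".join(sorted(digits)))
        n = large - little
        steps += 1
    return steps


print(kaprekar_steps(2341))  # 3, matching the worked example in the question
```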
tl;dr: My code "works", in that it gives me the answer I need. I just can't get it to stop running when it reaches that answer. I'm stuck with scrolling back through the output.
I'm a complete novice at programming/Python. In order to hone my skills, I decided to see if I could program my own "solver" for Implied Equity Risk Premium from Prof. Damodaran's Valuation class. Essentially, the code takes some inputs and "guesses and tests" a series of interest rates until it gets a "close" value to the input.
Right now my code spits out an output list, and I can scroll back through it to find the answer. It's correct. However, I cannot for the life of me get the code to "stop" at the correct value with the while function.
I have the following code:
per = int(input("Enter the # of periods forecast ->"))
divbb = float(input("Enter the initial dividend + buyback value ->"))
divgr = float(input("Enter the div + buyback growth rate ->"))
tbondr = float(input("Enter the T-Bond rate ->"))+0.000001
sp = int(input("Enter the S&P value->"))
total=0
pv=0
for i in range(1,10000):
    erp = float(i/10000)
    a = divbb
    b = divgr
    pv = 0
    temppv = 0
    print (sp-total, erp)
    for i in range(0, per):
        a=a * (1+b)
        temppv = a / pow((1+erp),i)
        pv=pv+temppv
    lastterm=(a*1+tbondr)/((erp-tbondr)*pow(1+erp,per))
    total=(pv+lastterm)
From his example, with the inputs:
per = 5
divbb = 69.46
divgr = 0.0527
tbondr = 0.0176
sp = 1430
By scrolling back through the output, I can see my code produces the correct minimum at erp=0.0755.
My question is: where do I stick the while to stop this code at that minimum? I've tried a lot of variations, but can't get it. What I'm looking for is, basically:
while (sp-total) > |1|, keep running the code.
per = 5
divbb = 69.46
divgr = 0.0527
tbondr = 0.0176 + 0.000001  # same offset as in the question; avoids dividing by zero when erp == tbondr
sp = 1430
total = 0
pv = 0
i = 1
while abs(sp - total) > 1 and i < 10000:  # upper bound mirrors the original range(1, 10000)
    erp = i / 10000.
    a = divbb
    b = divgr
    pv = 0
    temppv = 0
    print(sp - total, erp)
    for j in range(0, per):
        a = a * (1 + b)
        temppv = a / pow((1 + erp), j)
        pv = pv + temppv
    lastterm = (a * 1 + tbondr) / ((erp - tbondr) * pow(1 + erp, per))
    total = (pv + lastterm)
    i += 1
This should work. Obviously, there are a million ways to do this. The general gist is that the while loop stops as soon as its condition is no longer met. You could also test on every pass of the for loop and include a break statement, but because you don't know in advance when it will stop, I think a while loop is better in this case.
Let me give you a quick rundown of two different ways you could solve a problem like this:
Using a while loop:
iterator = start value
while condition(iterator):
    do some stuff
    increment iterator
Using a for loop:
for i in range(startvalue, maxvalue):
    do some stuff
    if condition:
        break
Two more things: if you're iterating over large ranges in Python 2, use the generator xrange (in Python 3, range is already lazy). Also, it's probably a bad idea to reuse i inside your for loop.
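Applied to the ERP search, the for-loop-with-break pattern might look like the sketch below. The valuation formula is copied as posted in the question, not checked against the course material. total_value and find_erp are hypothetical names, and skipping rates at or below the T-bond rate is an assumption made here to avoid a zero or negative denominator.

```python
def total_value(erp, per, divbb, divgr, tbondr):
    """Present value as computed in the question (formula copied as posted)."""
    a = divbb
    pv = 0.0
    for j in range(per):
        a = a * (1 + divgr)
        pv += a / pow(1 + erp, j)
    lastterm = (a * 1 + tbondr) / ((erp - tbondr) * pow(1 + erp, per))
    return pv + lastterm


def find_erp(sp, per, divbb, divgr, tbondr, tol=1.0, steps=10000):
    """Return the first grid rate whose total is within tol of sp, else None."""
    for i in range(1, steps):
        erp = i / steps
        if erp <= tbondr:
            continue  # assumption: skip rates where erp - tbondr is not positive
        if abs(sp - total_value(erp, per, divbb, divgr, tbondr)) <= tol:
            return erp  # leaving the loop early plays the role of break
    return None


erp = find_erp(sp=1430, per=5, divbb=69.46, divgr=0.0527, tbondr=0.0176, tol=5.0)
print(erp)
```

The tolerance of 5.0 is deliberately looser than the question's 1: with a grid step of 0.0001 the total may change by more than 2 between neighbouring rates, so a narrow band could be stepped over. Returning None makes that case visible instead of looping forever.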
I recommend CS101 from Udacity.com for learning Python. Also, if you're interested in algorithms, work through the problems at projecteuler.net