I'm working with the Theano library.
I have 1000 objects of size 114, stored in a variable of size 1000x114.
I have a function that takes that variable and produces 1000 numbers between 0 and 113.
I need to create a function that takes, for each of the 1000 objects, the number at the position given by the previous function.
How can I do it?
This is what works:
input_var = T.imatrix('inputs')
index = something
index_fn = theano.function([input_var], index, name="index function")
This is what doesn't work:
num = input_var[:][index + 48]
num_fn = theano.function([input_var], num, name="num function")
The result of the num_fn is simply the same data I give it as input.
I don't know what the +48 is supposed to do, but I'm assuming index + 48 is the vector of indices your function returned. With Theano (and NumPy) indexing, input_var[:] is just the whole matrix, so your second approach is equivalent to input_var[index + 48] and returns the rows whose indices are in index + 48. Instead, you need to pair each row number with its index, e.g. by using a range:
num = input_var[T.arange(T.shape(input_var)[0]), index + 48]
num_fn = theano.function([input_var], num, name="num function")
When I call a function that doesn't return anything, the variables passed as arguments remain unchanged outside the function.
For example:
def func(var):
    var += 1

a = 0
for i in range(3):
    func(a)
    print(a)
logically results in
0
0
0
But it doesn't seem to work the same way when the input is a NumPy array:
import numpy as np

def func(var):
    var += 1

a = np.zeros(3)
for i in range(3):
    func(a)
    print(a)
Output:
[1. 1. 1.]
[2. 2. 2.]
[3. 3. 3.]
Thus, all modifications were applied to array globally, not inside the function. Why is it happening? And, more generally, are there any special rules on how to handle np arrays as functions input?
In Python, any value passed to a function is passed by object reference. So, in the first case, where you pass a number to your function, var is set to a reference to the object that represents the number 0. In Python, even numbers are objects. To increment var in this case actually means to rebind it to the object that represents the number 0+1, which is the object that represents 1. Note that the object that represents the number 0 is not changed. Within the function, it is simply replaced.
When you pass a NumPy array to your function, it is likewise passed in by object reference. Hence, var now holds a reference to your array a. Incrementing an array by arr += 1 means adding 1 to each of its elements. Hence, the actual content of the object that var references has to change. However, var still references the same object after the incrementation.
Take a look at the following code:
import numpy as np

def func(vals):
    print('Before inc: ', id(vals))
    vals += 1
    print('After inc: ', id(vals))
When you pass in a number literal, vals is set to a reference to the object representing that number. This object has a unique id, which you can obtain with the id function. After the incrementation, vals is a reference to the object representing a number that is one greater than the first one. You can verify that by calling id again after the incrementation. So, the output of the above function is something like:
Before inc: 4351114480
After inc: 4351114512
Note that there are two different objects. When you now pass in a NumPy array, the resulting ids are the same:
a = np.zeros(3)
func(a)
Before inc: 4496241408
After inc: 4496241408
If you want to modify an array inside a function and don't want the changes to take effect outside of the function, you have to copy your array:
def func(vals):
    _vals = vals.copy()
    # doing stuff with `_vals` won't change the array passed to `vals`
+= for int (and float, str, ...) creates a new value. Such types are known as immutable, because each individual object cannot be changed.
>>> i = 1
>>> id(i)
4550541072
>>> i += 1
>>> id(i) # different id
4550541104
This means incrementing such a value inside a function creates a new value inside the function. Any references outside of the function are unaffected.
+= for np.array (and list, Counter, ...) modifies the content. Such types are known as mutable, because each individual object can be changed.
>>> l = [0, 1, 2, 3]
>>> id(l)
4585425088
>>> l += [4]
>>> id(l)
4585425088
This means incrementing such a value inside a function changes the value of the object visible inside and outside of the function. Any references inside and outside of the function point to the exact same object, and show its changed value.
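The difference between mutating and rebinding can be seen side by side in a short sketch. The function names here are made up for illustration; the point is only that `arr += 1` changes the caller's array, while `arr = arr + 1` builds a new array and leaves the caller's untouched.

```python
import numpy as np

def add_in_place(arr):
    arr += 1        # mutates the object the caller passed in

def add_rebind(arr):
    arr = arr + 1   # builds a new array; only the local name is rebound
    return arr

a = np.zeros(3)
add_in_place(a)     # a is now [1. 1. 1.]

b = np.zeros(3)
c = add_rebind(b)   # b stays [0. 0. 0.], c is [1. 1. 1.]
print(a, b, c)
```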
I have a train dataset which has 43 attributes. Each of the attributes have some tuple values as objects (as in strings with certain characters).
Now, I'm trying to scale the values using a scaler, but it gives the following error:
could not convert string to float: '?'
Now, I don't know how to convert objects to int or float in a single command and converting it for each of the 43 attributes one by one is a bit tedious.
So I want to know how to do it for the complete dataset with a single command.
I use the convert function, which tries to parse the string as an int.
If it cannot, it tries to parse it as a float, and if it still cannot, it assigns the value 0 (you can change this default for strings that are neither an int nor a float).
l = []

def convert(s):
    x = 0  # default for values that are not numbers
    try:
        x = int(s)
    except ValueError:
        try:
            x = float(s)
        except ValueError:
            pass
    l.append(x)

for i in ['1','2','3','?','4.5']:
    convert(i)
print(l)
#[1, 2, 3, 0, 4.5]
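To apply the same idea to a whole table at once, one option is to make the converter return its result and use it in a nested comprehension. This is a minimal sketch in plain Python: `to_number` and the two sample rows are hypothetical, standing in for the 43-attribute dataset.

```python
def to_number(text, default=0):
    """Return text as an int if possible, else a float, else the default."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return default

# Hypothetical rows of string attributes standing in for the dataset.
rows = [["1", "2.5", "?"], ["4", "?", "6"]]
cleaned = [[to_number(cell) for cell in row] for row in rows]
print(cleaned)  # [[1, 2.5, 0], [4, 0, 6]]
```

Returning the value instead of appending to a global list keeps the function reusable for any number of columns.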
So I have a code that is a mutant dna generator - more specifically, it yields 100 strands with a point mutation frequency of 0.066% for any base, derived from an original strand I have specified in the code. The problem I'm having with it, however, is the print statement. I don't get an output, and I'm not sure why. Here's my code:
import random

class DNA_mutant_generator:
    def __init__ (self, dna):
        self.dna = dna
        self.bases = bases

    def mutate(self, mutation_rate=0.0066): #add a specification of the mutation rate of 0.066%
        result = []
        mutations = []
        for base in self.dna:
            if random.random() < mutation_rate: #random.random() returns the next random floating point number in range [0.0,1.0)
                new_base = bases[bases.index(base) - random.randint(1, 3)] # subtracts a random integer within the range of 1 to 3 from the position of the base in the bases list. This creates the newly mutated base.
                result.append(new_base)#append that new base to the result list
                mutations.append((base, new_base))#appends the original base, as well as the new mutated base to a list of tuples
            else:
                result.append(base)#
        return "".join(result), mutations # returns mutated strands, as well as mutations
        for result_string, mutations in results:
            if mutations: # skip writing out unmutated strings
                print(result_string, mutations)

bases = "ACTG" #specifies the bases in the DNA strand
orig_dna = "GGCTCTCCAACAGgtaagcactgaagggtagaaggcatcgtctgtctcagacatgtctggcaccatccgctaagacattaccacgtgggtctcgagtatagctaacacccttctgtttggcagCTTACAGGAGCGAACGTTGG"
dna = orig_dna.upper()
dna_mutants = DNA_mutant_generator(dna)
dna_mutants.mutate()
Does anybody know what else I should add in order to get the output I specified in my function? I did include a print statement, so I'm not sure why the code is not yielding anything.
EDIT 2:
Should the code look something like this, then?
import random

class DNA_mutant_generator:
    def __init__ (self, dna):
        self.dna = dna
        self.bases = bases

    def mutate(self, mutation_rate=0.0066): #add a specification of the mutation rate of 0.066%
        result = []
        mutations = []
        for base in self.dna:
            if random.random() < mutation_rate: #random.random() returns the next random floating point number in range [0.0,1.0)
                new_base = bases[bases.index(base) - random.randint(1, 3)] # subtracts a random integer within the range of 1 to 3 from the position of the base in the bases list. This creates the newly mutated base.
                result.append(new_base)#append that new base to the result list
                mutations.append((base, new_base))#appends the original base, as well as the new mutated base to a list of tuples
            else:
                result.append(base)#
        return_value = "".join(result), mutations # returns mutated strands, as well as mutations
        for result_string in results:
            if mutations: # skip writing out unmutated strings
                print(result_string, mutations)
        return return_value

results = [dna_mutants.mutate() for _ in range(100)] #prints 100 strands
bases = "ACTG" #specifies the bases in the DNA strand
orig_dna = "GGCTCTCCAACAGgtaagcactgaagggtagaaggcatcgtctgtctcagacatgtctggcaccatccgctaagacattaccacgtgggtctcgagtatagctaacacccttctgtttggcagCTTACAGGAGCGAACGTTGG"
dna = orig_dna.upper()
dna_mutants = DNA_mutant_generator(dna)
dna_mutants.mutate()
But if I move results outside of the function, so that mutate is not called again from within itself, I get this error message:
results = [dna_mutants.mutate() for _ in range(100)] #prints 100 strands
NameError: global name 'dna_mutants' is not defined
You are returning before your print statement with the following line:
return "".join(result), mutations # returns mutated strands, as well as mutations
If you want to print information after this line, remove the return statement, assign the expression to a variable instead, and then return that variable at the end of the function.
    return_value = "".join(result), mutations # returns mutated strand, as well as mutations
    if mutations: # skip writing out unmutated strings
        print(return_value[0], mutations)
    return return_value
Edit: Your new problem arises because you've created a recursive function that calls itself over and over again. Every time a function calls itself, it requires more space on the stack, and you called it so many times that your stack "overflowed".
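Putting both points together, one possible arrangement is sketched below: mutate only returns its result, and the list of 100 strands is built after the generator object exists, outside the method. The class name, the bases parameter, and the short test strand are illustrative choices, not the asker's exact code.

```python
import random

BASES = "ACTG"

class DNAMutantGenerator:
    def __init__(self, dna, bases=BASES):
        self.dna = dna
        self.bases = bases

    def mutate(self, mutation_rate=0.0066):
        """Return (mutated_strand, [(old_base, new_base), ...])."""
        result = []
        mutations = []
        for base in self.dna:
            if random.random() < mutation_rate:
                # Stepping back 1 to 3 positions (with wrap-around) always
                # lands on a different base in a 4-letter alphabet.
                new_base = self.bases[self.bases.index(base) - random.randint(1, 3)]
                result.append(new_base)
                mutations.append((base, new_base))
            else:
                result.append(base)
        return "".join(result), mutations

# Create the object first, then call mutate() from outside the method.
generator = DNAMutantGenerator("GGCTCTCCAACAG")
results = [generator.mutate() for _ in range(100)]
for strand, mutations in results:
    if mutations:  # skip unmutated strands
        print(strand, mutations)
```

Because the printing loop lives outside mutate, the method never calls itself, and the name used in the list comprehension is already defined when it runs.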
I would like to input a DNA sequence and make some sort of generator that yields sequences that have a certain frequency of mutations. For instance, say I have the DNA strand "ATGTCGTCACACACCGCAGATCCGTGTTTGAC", and I want to create mutations with a T->A frequency of 5%. How would I go about to creating this? I know that creating random mutations can be done with a code like this:
import random

def mutate(string, mutation, threshold):
    dna = list(string)
    for index, char in enumerate(dna):
        if char in mutation:
            if random.random() < threshold:
                dna[index] = mutation[char]
    return ''.join(dna)
But what I am truly not sure how to do is make a fixed mutation frequency. Anybody know how to do that? Thanks.
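If "fixed frequency" is read as "exactly that share of the eligible bases must change" (an assumption; the question does not spell it out), one way to sketch it is to pick the positions with random.sample instead of testing each base against a threshold. The function name mutate_exact is made up for this example.

```python
import random

def mutate_exact(dna, mutation, fraction):
    """Mutate exactly round(fraction * n) of the n eligible positions."""
    chars = list(dna)
    # Positions whose base has an entry in the mutation mapping.
    eligible = [i for i, c in enumerate(chars) if c in mutation]
    count = round(fraction * len(eligible))
    # random.sample picks `count` distinct positions.
    for i in random.sample(eligible, count):
        chars[i] = mutation[chars[i]]
    return "".join(chars)

dna = "ATGTCGTCACACACCGCAGATCCGTGTTTGAC"
mutated = mutate_exact(dna, {"T": "A"}, 0.5)
print(mutated)
```

The example uses 50% because this strand contains only 8 T's, so 5% of them rounds to zero mutations; on a long strand the same call with 0.05 would change 5% of the T's, up to rounding.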
EDIT:
So should the formatting look like this if I'm using a byte array, because I'm getting an error:
import random

dna = "ATGTCGTACGTTTGACGTAGAG"

def mutate(dna, mutation, threshold):
    dna = bytearray(dna) #if you don't want to modify the original
    for index in range(len(dna)):
        if dna[index] in mutation and random.random() < threshold:
            dna[index] = mutation[char]
    return dna

mutate(dna, {"A": "T"}, 0.05)
print("my dna now:", dna)
error: "TypeError: string argument without an encoding"
EDIT 2:
import random

myDNA = bytearray("ATGTCGTCACACACCGCAGATCCGTGTTTGAC")

def mutate(dna, mutation, threshold):
    dna = myDNA # if you don't want to modify the original
    for index in range(len(dna)):
        if dna[index] in mutation and random.random() < threshold:
            dna[index] = mutation[char]
    return dna

mutate(dna, {"A": "T"}, 0.05)
print("my dna now:", dna)
yields an error
You asked me about a function that prints all possible mutations; here it is. The number of outputs grows exponentially with the length of your input, so the function only prints the possibilities and does not store them (that could consume a lot of memory). I wrote a recursive function first; it should not be used with very large input. I will also add a non-recursive function that should work without those limits.
def print_all_possibilities(dna, mutations, index = 0, print = print):
    if index < 0: return #invalid value for index
    while index < len(dna):
        if chr(dna[index]) in mutations:
            # branch 1: leave this position unmutated
            print_all_possibilities(dna, mutations, index + 1, print)
            # branch 2: mutate this position in a copy
            dnaCopy = bytearray(dna)
            dnaCopy[index] = ord(mutations[chr(dna[index])])
            print_all_possibilities(dnaCopy, mutations, index + 1, print)
            return
        index += 1
    print(dna.decode("ascii"))

# for testing
print_all_possibilities(bytearray(b"AAAATTTT"), {"A": "T"})
This works for me on python 3, I also can explain the code if you want.
Note: This function requires a bytearray as given in the function test.
Explanation:
This function searches for a place in dna where a mutation can happen. It starts at index, so it normally begins at 0 and goes to the end. That is what the while-loop is for: it increases index on every pass (it is basically a normal iteration, like a for loop). If the function finds a place where a mutation can happen (if chr(dna[index]) in mutations:), it copies the dna and lets the copy mutate (dnaCopy[index] = ord(mutations[chr(dna[index])])). Note that a bytearray is an array of numeric values, so I use chr and ord all the time to convert between string and int. The function then calls itself on both the original and the mutated copy to look for further possible mutations; both calls skip the positions already scanned, so they begin at index + 1. The job of printing is thereby passed on to the called functions, so the current call has nothing left to do and exits with return. If no further mutation is possible, the function prints the dna itself, because it makes no further call that would do so.
It may sound complicated, but it is a more or less elegant solution. Also, to understand a recursion you have to understand a recursion, so don't bother yourself if you don't understand it for now. It could help if you try this out on a sheet of paper: Take an easy dna string "TTATTATTA" with the possible mutation "A" -> "T" (so we have 8 possible mutations) and do this: Go through the string from left to right and if you find a position, where the sequence can mutate (here it is just the "A"'s), write this string down again, this time let the string mutate at the given position, so that your second string is slightly different from the original. In the original and the copy, mark how far you came (maybe put a "|" after the letter you let mutate) and repeat this procedure with the copy as new original. If you don't find any possible mutation, then underline the string (This is the equivalent to printing it). At the end you should have 8 different strings all underlined. I hope that can help to understand it.
EDIT: Here is the non-recursive function:
def print_all_possibilities(dna, mutations, printings = -1, print = print):
    mut_possible = []
    for index in range(len(dna)):
        if chr(dna[index]) in mutations: mut_possible.append(index)
    if printings < 0: printings = 1 << len(mut_possible)
    for number in range(min(printings, 1 << len(mut_possible))):
        dnaCopy = bytearray(dna) # don't change the original
        counter = 0
        while number:
            if number & (1 << counter):
                index = mut_possible[counter]
                dnaCopy[index] = ord(mutations[chr(dna[index])])
                number &= ~(1 << counter)
            counter += 1
        print(dnaCopy.decode("ascii"))

# for testing
print_all_possibilities(bytearray(b"AAAATTTT"), {"A": "T"})
This function comes with an additional parameter, which can control the number of maximum outputs, e.g.
print_all_possibilities(bytearray(b"AAAATTTT"), {"A": "T"}, 5)
will only print 5 results.
Explanation:
If your dna has x possible positions where it can mutate, you have 2 ^ x possible mutations, because at every such place the dna can either mutate or not. This function finds all positions where your dna can mutate and stores them in mut_possible (that's the code of the first for-loop). Now mut_possible contains all positions where the dna can mutate, so we have 2 ^ len(mut_possible) possible mutations (len(mut_possible) is the number of elements in mut_possible). I wrote 1 << len(mut_possible); it's the same value, computed with a bit shift. If printings is a negative number, the function prints all possibilities and sets printings to the number of possibilities. If printings is positive but lower than the number of possibilities, the function prints only printings mutations, because min(printings, 1 << len(mut_possible)) returns the smaller number, which is printings. Otherwise, the function prints all possibilities. The variable number then runs through range(...), so this loop, which prints one mutation per pass, executes the desired number of times, and number increases by one every time (e.g., range(4) yields 0, 1, 2, 3).
Next we use number to create a mutation. To understand this step you have to understand binary numbers. If our number is 10, it is 1010 in binary. These bits, read from the right, tell us at which places we have to modify our copy of the dna (dnaCopy). The lowest bit is a 0, so we don't modify the first position where a mutation can happen; the next bit is a 1, so we modify the second such position; after that there is a 0, and so on. To "read" the bits we use the variable counter. number & (1 << counter) returns a non-zero value if bit number counter is set, and in that case we modify our dna at the corresponding position where a mutation can happen. That position is stored in mut_possible, so the index we want is mut_possible[counter]. After we have mutated our dna at that position, we set the bit to 0 to show that we have already handled it. That is done with number &= ~(1 << counter). After that we increase counter to look at the other bits. The while-loop only continues as long as number is not 0, i.e. as long as number has at least one bit set (we still have to modify at least one position of the dna). Once the while-loop is finished, dnaCopy is fully modified and we print our result.
I hope these explanations could help. I see that you are new to python, so take yourself time to let that sink in and contact me if you have any further questions.
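As a cross-check of the 2 ^ x counting argument, the same enumeration can be sketched on plain strings with itertools.product from the standard library. This is an alternative formulation, not the code above: all_mutations is a made-up name, and it yields the strings instead of printing them.

```python
from itertools import product

def all_mutations(dna, mutations):
    """Yield every string obtainable by mutating any subset of eligible positions."""
    # For each position, list the characters it may take: (original, mutated)
    # where a mutation is possible, otherwise just (original,).
    choices = [(c, mutations[c]) if c in mutations else (c,) for c in dna]
    for combo in product(*choices):
        yield "".join(combo)

variants = list(all_mutations("TTATTATTA", {"A": "T"}))
print(len(variants))  # 8
```

The strand has three A's, so the generator yields 2 ^ 3 = 8 strings, matching the paper exercise described earlier. Because it is a generator, the caller can stop early instead of storing every result.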
After what I read this question seems easy to answer. The chance is high that I misunderstood something, so please correct me if I am wrong.
If you want a 5% chance of changing a T to an A, then you should write
mutate(yourString, {"T": "A"}, 0.05)
I also suggest using a bytearray instead of a string. A bytearray is similar to a string, but it can only contain bytes (values from 0 to 255), whereas a string can contain more characters; unlike a string, a bytearray is mutable. By using a bytearray you don't need to create your temporary list or join it at the end. If you do that, your code looks like this:
import random

def mutate(dna, mutation, threshold):
    if isinstance(dna, str):
        dna = bytearray(dna, "utf-8")
    else:
        dna = bytearray(dna)
    for index in range(len(dna)):
        if chr(dna[index]) in mutation and random.random() < threshold:
            dna[index] = ord(mutation[chr(dna[index])])
    return dna

dna = "ATGTCGTACGTTTGACGTAGAG"
print("DNA first:", dna)
newDNA = mutate(dna, {"A": "T"}, 0.05)
print("DNA now:", newDNA.decode("ascii")) #use decode to make newDNA a string
After all the stupid problems I had with the bytearray version, here is the version that operates on strings:
import random

def mutate(string, mutation, threshold):
    dna = list(string)
    for index, char in enumerate(dna):
        if char in mutation:
            if random.random() < threshold:
                dna[index] = mutation[char]
    return ''.join(dna)

dna = "ATGTCGTACGTTTGACGTAGAG"
print("DNA first:", dna)
newDNA = mutate(dna, {"A": "T"}, 0.05)
print("DNA now:", newDNA)
If you use the string version with larger input, both the computation time and the memory used will grow. The bytearray version should be the better choice when you want to do this with much larger input.
I am trying to understand these instructions.
Set up a new function in your main program file named “summer” that takes a list as a parameter and returns a value we will determine in the next steps.
In the “summer” function, set up a loop that uses a counter variable named “n” that will take on the values 0, 2, 4, 6, 8, 10, 12.
Each time through the loop, you are to call your “powerval” function from the “mymath” module passing as parameters item “n” and “n+1” from the list of data passed into “summer”. Add up all these values and return the final result to the caller.
So far I have:
def summer(list):
    for n in range(0,13,2):
        value=powerval(n,n+1)
After that I am lost. How do I perform step 3?
You add them up:
from mymath import powerval

def summer(somelist):
    sum = 0
    for n in range(0, 13, 2):
        sum += powerval(somelist[n], somelist[n + 1])
    return sum
So the return value of powerval() is added to the total sum so far, which was started at 0. You do need to pass in the somelist[n] and somelist[n + 1] values, not the indices themselves.
You need to add them up:
from mymath import powerval

def summer(lst):
    total = 0
    for n in range(0, 13, 2):
        total += powerval(lst[n], lst[n + 1])
    return total
I'm not sure where you use lst (I renamed list to lst, because list is a built-in type), so I'm guessing you're trying to get the nth and (n + 1)th elements from that list.
You can use the built-in sum function to accomplish this concisely :)
def summer(myList):
    return sum(powerval(myList[n], myList[n+1]) for n in range(0, 13, 2))
This is usually also the fastest of these variants.
PS: It's not a good idea to name your list "list", because that shadows the built-in list type in Python. That's why I renamed it to myList in the example above.
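To try summer without the mymath module, a stand-in for powerval is needed. The sketch below assumes powerval(a, b) raises a to the power b, which the name suggests but the question never confirms; the sample list is made up and has the 14 items that n = 0, 2, ..., 12 requires.

```python
def powerval(base, exponent):
    # Hypothetical stand-in for mymath.powerval; the real one is not shown.
    return base ** exponent

def summer(values):
    # Pairs items 0 and 1, 2 and 3, ..., 12 and 13, then adds up the results.
    return sum(powerval(values[n], values[n + 1]) for n in range(0, 13, 2))

data = [2, 3, 1, 5, 4, 2, 3, 3, 2, 2, 5, 1, 10, 0]
print(summer(data))  # 8 + 1 + 16 + 27 + 4 + 5 + 1 = 62
```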