Making a random string based on random.random and a frequency table

I have this code:
import random
base_distribution = {'A': 0.345, 'C': 0.158, 'G': 0.059, 'T': 0.437}
def get_random_uniform_sequence(alphabet, k):
    '''Return a uniformly distributed random sequence of length k
    drawn from an alphabet (Bernoulli distribution).
    alphabet = list or array of strings representing the alphabet
    from which the generated string is composed.
    k = an integer giving the length of the generated string.
    '''
    return ''.join(random.choice(alphabet) for _ in range(k))
# choose a random symbol according to a given distribution
def weighted_choice(distribuition_probs):
    r = random.random()
    # make a choice based on r
    for k, v in distribuition_probs.items():
        if v >= r:
            print(k, v)
# generate a random sequence
def bernoulli_sequence(symbol_distribution, length):
    return ''.join(weighted_choice(symbol_distribution) for i in range(length))
I need to generate a random sequence of length k over an alphabet, with each symbol drawn according to given probabilities (base_distribution), using random.random. It is a task in a course I am taking, but I am not sure my weighted_choice function does what is asked.
I know that numpy's random choice would be better, but the task does not ask for that. What am I doing wrong?
I would appreciate any tip.
Thank you for your time and attention!
Paulo
PS - I hope I haven't offended anyone here, because lately I have noticed that I am being ignored around here! 8(

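I got this done.
import bisect
import itertools
import random
def weighted_choice(prob_distribution):
    chars = list(prob_distribution.keys())
    probs = list(prob_distribution.values())
    cumdist = list(itertools.accumulate(probs))
    # scale r by the total so the weights need not sum to exactly 1
    r = random.random() * cumdist[-1]
    # look the symbol up in chars (the original indexed an undefined name, choices)
    return chars[bisect.bisect(cumdist, r)]
Worked fine!
Thank you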
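For comparison, the same selection can be done with a plain loop over a running total, using nothing but random.random. In the question's loop each probability is compared with r on its own and nothing is returned, so ''.join ends up receiving None. The sketch below is only one possible way to write it; the rng parameter is a hypothetical hook added here so the draw can be tested, and the scaling by the total is an assumption made because the given weights sum to 0.999 rather than 1.

```python
import random

base_distribution = {'A': 0.345, 'C': 0.158, 'G': 0.059, 'T': 0.437}

def weighted_choice(distribution_probs, rng=random.random):
    """Pick one symbol; rng is a hypothetical hook so the draw can be tested."""
    total = sum(distribution_probs.values())
    r = rng() * total  # scale r because the weights may not sum to exactly 1
    running = 0.0
    for symbol, prob in distribution_probs.items():
        running += prob  # cumulative probability up to and including symbol
        if r < running:
            return symbol
    return symbol  # guard against floating-point round-off on the last symbol

def bernoulli_sequence(symbol_distribution, length):
    return ''.join(weighted_choice(symbol_distribution) for _ in range(length))

print(bernoulli_sequence(base_distribution, 20))
```

The key difference from the question's version is that r is compared with the running total, not with each individual probability, and that the chosen symbol is returned instead of printed.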

Related

How to generate list of string with Random Characters using python3

I tried with the below code but it seems not to give the intended output.
ran = ''.join(random.choice(string.ascii_uppercase+string.digits) for x in range(10))
So the above code gives '6U1S75' but I want output like
['6U1S75', '4Z4UKK', '9111K4',....]
Please help.
I think this is elegant:
from string import digits, ascii_letters
from random import choices
def rand_list_of_strings(list_size, word_size, pool=ascii_letters + digits):
    return ["".join(choices(pool, k=word_size)) for _ in range(list_size)]
I used ascii_letters instead of ascii_uppercase to get both upper and lower case characters; you can change the pool to suit your needs.
Example use of the above function :
>>> rand_list_of_strings(4, 5)
['wBSbH', 'rJoH8', '9Gx4q', '8Epus']
>>> rand_list_of_strings(4, 10)
['UWyRglswlN', 'w0Yr7xlU5L', 'p0e6rghGMS', 'Z8zX2Vqyve']
>>>
The first argument is the list size, the second argument is the length of each string, and the function returns a list. Do note that this should not be used for cryptographic purposes.
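If the strings do need to be unpredictable (tokens, temporary passwords), the standard library's secrets module is the usual alternative to random. The variant below is a sketch under that assumption; secure_list_of_strings is a name made up for this example.

```python
import secrets
from string import ascii_letters, digits

def secure_list_of_strings(list_size, word_size, pool=ascii_letters + digits):
    """Like rand_list_of_strings, but draws each character with secrets.choice."""
    return [
        "".join(secrets.choice(pool) for _ in range(word_size))
        for _ in range(list_size)
    ]

print(secure_list_of_strings(4, 6))
```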
Take a look at this.
import random
import string
list_size = 10
word_size = 4
ran = []
for i in range(list_size):
    rans = ''
    for j in range(word_size):
        rans += random.choice(string.ascii_uppercase + string.digits)
    ran.append(rans)
Though the above solution is clearer and should be preferred, if you absolutely want to do this with list comprehension...
list_size = 10
word_size = 4
ran = [
    ''.join([
        random.choice(string.ascii_uppercase + string.digits)
        for j in range(word_size)
    ])
    for i in range(list_size)
]

how to add characters from array into one string python

I'm trying to change the characters of x to lower or upper case depending on whether they are in r or c. The problem is that I can't get all the changed characters into one string.
import unittest
def fun_exercise_6(x):
    y = []
    r = 'abcdefghijkl'
    c = 'mnopqrstuvwxz'
    for i in range(len(x)):
        if(x[i] in r):
            y += x[i].lower()
        elif(x[i] in c):
            y += x[i].upper()
    return y
class TestAssignment1(unittest.TestCase):
    def test1_exercise_6(self):
        self.assertTrue(fun_exercise_6("osso") == "OSSO")
    def test2_exercise_6(self):
        self.assertTrue(fun_exercise_6("goat") == "gOaT")
    def test3_exercise_6(self):
        self.assertTrue(fun_exercise_6("bag") == "bag")
    def test4_exercise_6(self):
        self.assertTrue(fun_exercise_6("boat") == "bOaT" )
if __name__ == '__main__':
    unittest.main()
Using a list as you are using is probably the best approach while you are figuring out whether or not each character should be uppered or lowered. You can join your list using str's join method. In your case, you could have your return statement look like this:
return ''.join(y)
What this does is join a collection of strings (your individual characters) into one new string, using the string you call join on ('') as the separator.
For example, ''.join(['a', 'b', 'c']) will turn into 'abc'
This is a much better solution than making y a string as strings are immutable data types. If you make y a string when you are constructing it, you would have to redefine and reallocate the ENTIRE string each time you appended a character. Using a list, as you are doing, and joining it at the end would allow you to accumulate the characters and then join them all at once, which is comparatively very efficient.
If you define y as an empty string, y = "", instead of an empty list, you will get y as one string. When you declare y = [] and add an item to it, you are adding a string to a list of strings, not a character to a string.
You can't compare a list and a string.
"abc" == ["a", "b", "c"] # False
The initial value of y in the fun_exercise_6 function must be ""
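Putting the list-and-join advice together, the function from the question could look roughly like this. It is a sketch, not the only fix: the names lower_set, upper_set and chars are chosen here for readability, and characters found in neither set (for example 'y', which is missing from the question's second string) are silently dropped, exactly as in the original loop.

```python
def fun_exercise_6(x):
    lower_set = 'abcdefghijkl'   # characters to emit in lower case
    upper_set = 'mnopqrstuvwxz'  # characters to emit in upper case
    chars = []
    for ch in x:
        if ch in lower_set:
            chars.append(ch.lower())
        elif ch in upper_set:
            chars.append(ch.upper())
    # join the collected characters into one string so it compares equal to a str
    return ''.join(chars)

print(fun_exercise_6("goat"))
```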

How can i optimise my code and make it readable?

The task is:
The user enters a number. You take one digit from the left and one from the right and sum them. Then you take the rest of the number and sum every digit in it. This gives two results. You have to sort them from largest to smallest and join them into one number. I solved it, but I don't like how it looks. The task is pretty simple, but my code looks like trash. Maybe I should use more built-in functions and libraries. If so, could you please recommend some? Thank you
a = int(input())
b = [int(i) for i in str(a)]
closesum = 0
d = []
e = ""
farsum = b[0] + b[-1]
print(farsum)
b.pop(0)
b.pop(-1)
print(b)
for i in b:
    closesum += i
print(closesum)
d.append(int(closesum))
d.append(int(farsum))
print(d)
for i in sorted(d, reverse = True):
    e += str(i)
print(int(e))
input()
You can use reduce
from functools import reduce
a = [0,1,2,3,4,5,6,7,8,9]
print(reduce(lambda x, y: x + y, a))
# 45
and you can just pass in a shortened list instead of popping elements: b[1:-1]
The first two lines:
str_input = input() # input will always read strings
num_list = [int(i) for i in str_input]
The for loop at the end is unnecessary, and there is no need to sort only 2 elements. You can just use a simple if..else condition to print what you want.
You don't need a loop to sum a slice of a list. You can also use join to concatenate a list of strings without looping. This implementation sorts the two sums as integers and converts them to strings afterwards using map(str, ...); sorting the strings themselves would compare them lexicographically and put '9' ahead of '10'.
farsum = b[0] + b[-1]
closesum = sum(b[1:-1])
"".join(map(str, sorted((farsum, closesum), reverse=True)))
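Combining the suggestions from the answers, the whole task fits in a short function. This is a sketch that assumes the input has at least two digits and contains only digits; the function name combine is made up for the example.

```python
def combine(number_text):
    """Sum the outer digits and the inner digits, then join the two sums, largest first."""
    digits = [int(ch) for ch in number_text]
    farsum = digits[0] + digits[-1]   # first and last digit
    closesum = sum(digits[1:-1])      # everything in between
    ordered = sorted((farsum, closesum), reverse=True)  # numeric sort, not string sort
    return int("".join(map(str, ordered)))

print(combine("12345"))
```

Sorting before converting to strings matters when one sum has more digits than the other: for "4555" the sums are 9 and 10, and a string sort would place "9" first.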

working with huge int represented as string

I have been learning Python for just two months, so please be patient .)
I stumbled on a very interesting task: perform arithmetic operations on huge numbers (n: int, number of digits < 10**6) represented as str (len(str) < 10**6). I need to split this string into parts, do some simple arithmetic, and give the output back as a string. Converting to int and back takes about 0.5 sec:
from time import time
from random import choice
string = ''
digits = [str(i) for i in range(10)]
for _ in range(1, 99999):
    string += choice(digits)
start = time()
a, b = string[:50000], string[50000:] # Split in half
a, b = map(int, [a, b])
print('str to int:', time() - start)
# Output: str to int: 0.1092...
c = a + b
start = time()
output = ''.join([str(part) for part in [a, b, c]])
print('int to str:', time() - start)
# Output: int to str: 0.4836...
The problem starts when I need to perform this operation in a loop with hundreds of iterations.
This is something like a reverse-hashing function.
My simple question is this: what kind of optimisation is possible in this situation?
With my best regards! .)
Vadim, I can't answer your question directly. However, let me point you towards the gmpy2 library. The principal author is one of the outrageously clever people we used to see often here on SO; you could use gmpy2 itself, or have a look at the source code.
>>> from gmpy2 import *
>>> s = '012345678901234'
>>> t = '123456789012345'
>>> N = mpz(s)+mpz(t)
>>> N
mpz(135802467913579)
>>> str(N)
'135802467913579'
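If installing a third-party library is not an option and the arithmetic is as simple as addition, one alternative is to work on the decimal strings directly in fixed-size chunks, so that no full-length int/str conversion is needed. The sketch below is an assumption-laden illustration, not a benchmarked solution: add_decimal_strings and its chunk parameter are names invented here, the inputs are assumed to be non-negative decimal strings, and whether it beats plain int conversion for a given workload would have to be timed.

```python
def add_decimal_strings(a, b, chunk=18):
    """Add two non-negative decimal strings chunk by chunk, right to left."""
    base = 10 ** chunk
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad so both have the same length
    parts = []
    carry = 0
    for end in range(width, 0, -chunk):
        start = max(0, end - chunk)
        total = int(a[start:end]) + int(b[start:end]) + carry
        carry, rest = divmod(total, base)
        # keep leading zeros inside a chunk so digit positions stay aligned
        parts.append(str(rest).zfill(end - start))
    if carry:
        parts.append(str(carry))
    # chunks were produced least-significant first
    return ''.join(reversed(parts)).lstrip('0') or '0'

print(add_decimal_strings('012345678901234', '123456789012345'))
```

Each chunk is converted with int on at most 18 digits, so the work grows linearly with the length of the strings.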

Repeat string to certain length

What is an efficient way to repeat a string to a certain length? Eg: repeat('abc', 7) -> 'abcabca'
Here is my current code:
def repeat(string, length):
    cur, old = 1, string
    while len(string) < length:
        string += old[cur-1]
        cur = (cur+1)%len(old)
    return string
Is there a better (more pythonic) way to do this? Maybe using list comprehension?
Jason Scheirer's answer is correct but could use some more exposition.
First off, to repeat a string an integer number of times, you can use overloaded multiplication:
>>> 'abc' * 7
'abcabcabcabcabcabcabc'
So, to repeat a string until it's at least as long as the length you want, you calculate the appropriate number of repeats and put it on the right-hand side of that multiplication operator:
def repeat_to_at_least_length(s, wanted):
    return s * (wanted//len(s) + 1)
>>> repeat_to_at_least_length('abc', 7)
'abcabcabc'
Then, you can trim it to the exact length you want with an array slice:
def repeat_to_length(s, wanted):
    return (s * (wanted//len(s) + 1))[:wanted]
>>> repeat_to_length('abc', 7)
'abcabca'
Alternatively, as suggested in pillmod's answer that probably nobody scrolls down far enough to notice anymore, you can use divmod to compute the number of full repetitions needed, and the number of extra characters, all at once:
def pillmod_repeat_to_length(s, wanted):
    a, b = divmod(wanted, len(s))
    return s * a + s[:b]
Which is better? Let's benchmark it:
>>> import timeit
>>> timeit.repeat('scheirer_repeat_to_length("abcdefg", 129)', globals=globals())
[0.3964178159367293, 0.32557755894958973, 0.32851039397064596]
>>> timeit.repeat('pillmod_repeat_to_length("abcdefg", 129)', globals=globals())
[0.5276265419088304, 0.46511475392617285, 0.46291469305288047]
So, pillmod's version is something like 40% slower, which is too bad, since personally I think it's much more readable. There are several possible reasons for this, starting with its compiling to about 40% more bytecode instructions.
Note: these examples use the new-ish // operator for truncating integer division. This is often called a Python 3 feature, but according to PEP 238, it was introduced all the way back in Python 2.2. You only have to use it in Python 3 (or in modules that have from __future__ import division) but you can use it regardless.
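To repeat the comparison on your own machine, something like the following should do. It is a sketch: it redefines both functions so that it runs on its own, and the iteration count of 100,000 is an arbitrary choice. Absolute timings will differ from the numbers quoted above.

```python
import timeit

def repeat_to_length(s, wanted):
    # over-repeat, then slice down to the wanted length
    return (s * (wanted // len(s) + 1))[:wanted]

def pillmod_repeat_to_length(s, wanted):
    # full repetitions plus a prefix for the remainder
    full, extra = divmod(wanted, len(s))
    return s * full + s[:extra]

# both approaches should agree before their speed is compared
assert repeat_to_length("abc", 7) == pillmod_repeat_to_length("abc", 7) == "abcabca"

for func in (repeat_to_length, pillmod_repeat_to_length):
    seconds = timeit.timeit(lambda: func("abcdefg", 129), number=100_000)
    print(f"{func.__name__}: {seconds:.3f} s")
```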
For Python 2:
def repeat_to_length(string_to_expand, length):
    return (string_to_expand * ((length/len(string_to_expand))+1))[:length]
For Python 3:
def repeat_to_length(string_to_expand, length):
    return (string_to_expand * (int(length/len(string_to_expand))+1))[:length]
This is pretty pythonic:
newstring = 'abc'*5
print(newstring[0:6])
def rep(s, m):
    a, b = divmod(m, len(s))
    return s * a + s[:b]
from itertools import cycle, islice
def srepeat(string, n):
    return ''.join(islice(cycle(string), n))
Perhaps not the most efficient solution, but certainly short & simple:
def repstr(string, length):
    return (string * length)[0:length]
repstr("foobar", 14)
Gives "foobarfoobarfo". One thing about this version is that if length < len(string) then the output string will be truncated. For example:
repstr("foobar", 3)
Gives "foo".
Edit: actually to my surprise, this is faster than the currently accepted solution (the 'repeat_to_length' function), at least on short strings:
from timeit import Timer
t1 = Timer("repstr('foofoo', 30)", 'from __main__ import repstr')
t2 = Timer("repeat_to_length('foofoo', 30)", 'from __main__ import repeat_to_length')
t1.timeit() # gives ~0.35 secs
t2.timeit() # gives ~0.43 secs
Presumably if the string was long, or length was very high (that is, if the wastefulness of the string * length part was high) then it would perform poorly. And in fact we can modify the above to verify this:
from timeit import Timer
t1 = Timer("repstr('foofoo' * 10, 3000)", 'from __main__ import repstr')
t2 = Timer("repeat_to_length('foofoo' * 10, 3000)", 'from __main__ import repeat_to_length')
t1.timeit() # gives ~18.85 secs
t2.timeit() # gives ~1.13 secs
How about string * (length // len(string)) + string[0:(length % len(string))]
I use this:
def extend_string(s, l):
    return (s*l)[:l]
Not that there haven't been enough answers to this question, but there is a repeat function in itertools; you just need to join its output. Note that n here is the number of repetitions, not the target length:
from itertools import repeat
def rep(s, n):
    return ''.join(repeat(s, n))
Yay recursion!
def trunc(s,l):
    if l > 0:
        return s[:l] + trunc(s, l - len(s))
    return ''
Won't scale forever, but it's fine for smaller strings. And it's pretty.
I admit I just read the Little Schemer and I like recursion right now.
This is one way to do it using a list comprehension, though it's increasingly wasteful as the length of the rpt string increases.
def repeat(rpt, length):
    # build length copies of rpt, then keep only the first length characters
    return ''.join([rpt for x in range(length)])[:length]
Another FP approach:
def repeat_string(string_to_repeat, repetitions):
    return ''.join([string_to_repeat for n in range(repetitions)])
def extended_string(word, length):
    extra_long_word = word * (length//len(word) + 1)
    required_string = extra_long_word[:length]
    return required_string
print(extended_string("abc", 7))
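# count the occurrences of 'a' in the first n characters of s repeated
# (s and n are assumed to be defined already)
c = s.count('a')
div = n//len(s)
if n%len(s) == 0:
    c = c*div
else:
    m = n%len(s)
    c = c*div + s[:m].count('a')
print(c)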
Currently print(f"{'abc'*7}") generates:
abcabcabcabcabcabcabc

Resources