What is an efficient way to repeat a string to a certain length? Eg: repeat('abc', 7) -> 'abcabca'
Here is my current code:
def repeat(string, length):
    cur, old = 1, string
    while len(string) < length:
        string += old[cur-1]
        cur = (cur+1)%len(old)
    return string
Is there a better (more pythonic) way to do this? Maybe using list comprehension?
Jason Scheirer's answer is correct but could use some more exposition.
First off, to repeat a string an integer number of times, you can use overloaded multiplication:
>>> 'abc' * 7
'abcabcabcabcabcabcabc'
So, to repeat a string until it's at least as long as the length you want, you calculate the appropriate number of repeats and put it on the right-hand side of that multiplication operator:
def repeat_to_at_least_length(s, wanted):
    return s * (wanted//len(s) + 1)
>>> repeat_to_at_least_length('abc', 7)
'abcabcabc'
Then, you can trim it to the exact length you want with a slice:
def repeat_to_length(s, wanted):
    return (s * (wanted//len(s) + 1))[:wanted]
>>> repeat_to_length('abc', 7)
'abcabca'
Alternatively, as suggested in pillmod's answer that probably nobody scrolls down far enough to notice anymore, you can use divmod to compute the number of full repetitions needed, and the number of extra characters, all at once:
def pillmod_repeat_to_length(s, wanted):
    a, b = divmod(wanted, len(s))
    return s * a + s[:b]
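As a quick sanity check, the two variants can be compared over a range of inputs. This is only a minimal sketch: the function names mirror the ones above, and the strings and lengths tested are arbitrary choices made for illustration.

```python
def repeat_to_length(s, wanted):
    # Over-repeat, then trim to the exact length.
    return (s * (wanted // len(s) + 1))[:wanted]


def pillmod_repeat_to_length(s, wanted):
    # Whole repetitions plus a partial prefix.
    full, extra = divmod(wanted, len(s))
    return s * full + s[:extra]


# Both should agree for every non-empty string and non-negative length.
for s in ("a", "abc", "abcdefg"):
    for wanted in range(0, 30):
        assert repeat_to_length(s, wanted) == pillmod_repeat_to_length(s, wanted)

print(repeat_to_length("abc", 7))  # abcabca
```

Note that both versions divide by len(s), so an empty string raises ZeroDivisionError in either one.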
Which is better? Let's benchmark it:
>>> import timeit
>>> timeit.repeat('repeat_to_length("abcdefg", 129)', globals=globals())
[0.3964178159367293, 0.32557755894958973, 0.32851039397064596]
>>> timeit.repeat('pillmod_repeat_to_length("abcdefg", 129)', globals=globals())
[0.5276265419088304, 0.46511475392617285, 0.46291469305288047]
So, pillmod's version is something like 40% slower, which is too bad, since personally I think it's much more readable. There are several possible reasons for this, starting with its compiling to about 40% more bytecode instructions.
Note: these examples use the new-ish // operator for truncating integer division. This is often called a Python 3 feature, but according to PEP 238, it was introduced all the way back in Python 2.2. You only have to use it in Python 3 (or in modules that have from __future__ import division) but you can use it regardless.
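The difference between the two division operators can be seen directly. The following is a small illustration, assuming Python 3; the variable names are made up for the example.

```python
wanted, s = 7, "abc"

print(wanted // len(s))  # 2 (an int)
print(wanted / len(s))   # a float in Python 3, roughly 2.33

# In Python 3, multiplying a str by a float is an error,
# which is why the repeat count must come from // (or int()).
try:
    s * (wanted / len(s))
except TypeError as exc:
    print("TypeError:", exc)
```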
def repeat_to_length(string_to_expand, length):
    return (string_to_expand * ((length/len(string_to_expand))+1))[:length]
For Python 3:
def repeat_to_length(string_to_expand, length):
    return (string_to_expand * (int(length/len(string_to_expand))+1))[:length]
This is pretty pythonic:
newstring = 'abc'*5
print(newstring[0:6])
def rep(s, m):
    a, b = divmod(m, len(s))
    return s * a + s[:b]
from itertools import cycle, islice
def srepeat(string, n):
    return ''.join(islice(cycle(string), n))
Perhaps not the most efficient solution, but certainly short & simple:
def repstr(string, length):
    return (string * length)[0:length]
repstr("foobar", 14)
Gives "foobarfoobarfo". One thing about this version is that if length < len(string) then the output string will be truncated. For example:
repstr("foobar", 3)
Gives "foo".
Edit: actually to my surprise, this is faster than the currently accepted solution (the 'repeat_to_length' function), at least on short strings:
from timeit import Timer
t1 = Timer("repstr('foofoo', 30)", 'from __main__ import repstr')
t2 = Timer("repeat_to_length('foofoo', 30)", 'from __main__ import repeat_to_length')
t1.timeit() # gives ~0.35 secs
t2.timeit() # gives ~0.43 secs
Presumably if the string was long, or length was very high (that is, if the wastefulness of the string * length part was high) then it would perform poorly. And in fact we can modify the above to verify this:
from timeit import Timer
t1 = Timer("repstr('foofoo' * 10, 3000)", 'from __main__ import repstr')
t2 = Timer("repeat_to_length('foofoo' * 10, 3000)", 'from __main__ import repeat_to_length')
t1.timeit() # gives ~18.85 secs
t2.timeit() # gives ~1.13 secs
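The slowdown follows from the size of the temporary string that each approach builds before slicing. A rough sketch of that arithmetic, using a hypothetical helper intermediate_sizes that is not part of either answer:

```python
def intermediate_sizes(s, length):
    # Length of the temporary string each approach builds before slicing.
    naive = len(s) * length                      # (s * length)[:length]
    trimmed = len(s) * (length // len(s) + 1)    # (s * (length // len(s) + 1))[:length]
    return naive, trimmed


print(intermediate_sizes("foofoo", 30))         # (180, 36)
print(intermediate_sizes("foofoo" * 10, 3000))  # (180000, 3060)
```

In the second case the short version allocates a 180000-character string to keep 3000 characters of it, which is consistent with the timings above.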
How about string * (length // len(string)) + string[0:(length % len(string))]? (Use // rather than / so that the repeat count stays an integer in Python 3.)
I use this:
def extend_string(s, l):
    return (s*l)[:l]
Not that there haven't been enough answers to this question, but there is a repeat function in itertools; you just need to join its output:
from itertools import repeat
def rep(s,n):
    return ''.join(repeat(s,n))
Yay recursion!
def trunc(s,l):
    if l > 0:
        return s[:l] + trunc(s, l - len(s))
    return ''
Won't scale forever, but it's fine for smaller strings. And it's pretty.
I admit I just read the Little Schemer and I like recursion right now.
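To make "won't scale forever" concrete: the function makes one recursive call per repetition, so a short string repeated to a large length runs into the interpreter's recursion limit. A small sketch; the length used is simply twice whatever sys.getrecursionlimit() reports.

```python
import sys


def trunc(s, l):
    if l > 0:
        return s[:l] + trunc(s, l - len(s))
    return ''


print(trunc("abc", 7))  # abcabca

# One call per repetition: a one-character string needs as many
# nested calls as the target length.
too_long = sys.getrecursionlimit() * 2
try:
    trunc("a", too_long)
    hit_limit = False
except RecursionError:
    hit_limit = True
print(hit_limit)
```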
This is one way to do it using a list comprehension, though building a list of copies only to join and slice it is more wasteful than plain string multiplication.
def repeat(rpt, length):
    return ''.join([rpt for x in range(length // len(rpt) + 1)])[:length]
Another FP approach:
def repeat_string(string_to_repeat, repetitions):
    return ''.join([string_to_repeat for n in range(repetitions)])
def extended_string(word, length):
    extra_long_word = word * (length//len(word) + 1)
    required_string = extra_long_word[:length]
    return required_string
print(extended_string("abc", 7))
c = s.count('a')
div=n//len(s)
if n%len(s)==0:
    c= c*div
else:
    m = n%len(s)
    c = c*div+s[:m].count('a')
print(c)
Currently print(f"{'abc'*7}") generates:
abcabcabcabcabcabcabc
temp = "75.1,77.7,83.2,82.5,81.0,79.5,85.7"
I am stuck in this assignment and unable to find a relevant answer to help.
I’ve used .split(",") and float()
and I am still stuck here.
temp = "75.1,77.7,83.2,82.5,81.0,79.5,85.7"
li = temp.split(",")
def avr(li):
    av = 0
    for i in li:
        av += float(i)
    return av/len(li)
print(avr(li))
You can use sum() to add the elements of a tuple of floats:
temp = "75.1,77.7,83.2,82.5,81.0,79.5,85.7"
def average(s_vals):
    vals = tuple(float(v) for v in s_vals.split(","))
    return sum(vals) / len(vals)
print(average(temp))
Admittedly similar to the answer by @emacsdrivesmenuts (GMTA).
However, I opted to use the efficient map function, which should scale nicely for larger strings. This approach removes the explicit for loop and the per-value float() call, and passes these operations to the lower-level (highly optimised) C implementation.
For example:
def mean(s):
    vals = tuple(map(float, s.split(',')))
    return sum(vals) / len(vals)
Example use:
temp = '75.1,77.7,83.2,82.5,81.0,79.5,85.7'
mean(temp)
>>> 80.67142857142858
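As a possible alternative that none of the answers above use, the standard library's statistics module can do the averaging. A minimal sketch, assuming Python 3.8 or later for statistics.fmean; the function name mean_fmean is made up for this example.

```python
from statistics import fmean  # available from Python 3.8


def mean_fmean(s):
    # Split on commas, convert each piece to float, and average.
    return fmean(map(float, s.split(',')))


temp = '75.1,77.7,83.2,82.5,81.0,79.5,85.7'
print(mean_fmean(temp))
```

The result should agree with the sum()/len() versions above up to floating-point rounding.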
I tried with the below code but it seems not to give the intended output.
ran = ''.join(random.choice(string.ascii_uppercase+string.digits) for x in range(10))
So the above code gives '6U1S75' but I want output like
['6U1S75', '4Z4UKK', '9111K4',....]
Please help.
I think this is elegant:
from string import digits, ascii_letters
from random import choices
def rand_list_of_strings(list_size, word_size, pool=ascii_letters + digits):
    return ["".join(choices(pool, k=word_size)) for _ in range(list_size)]
I used ascii_letters instead of ascii_uppercase to have both upper- and lowercase values; you can edit it to suit your needs.
Example use of the above function :
>>> rand_list_of_strings(4, 5)
['wBSbH', 'rJoH8', '9Gx4q', '8Epus']
>>> rand_list_of_strings(4, 10)
['UWyRglswlN', 'w0Yr7xlU5L', 'p0e6rghGMS', 'Z8zX2Vqyve']
>>>
The first argument is the list size, the second argument is the length of each string, and the function returns a list. Do note that this should not be used for cryptographic purposes.
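If the strings do need to be unpredictable (tokens, passwords and the like), the standard library's secrets module is the usual replacement for random. A minimal sketch under that assumption; secure_rand_list_of_strings is a hypothetical name, not part of the answer above.

```python
import secrets
from string import ascii_letters, digits


def secure_rand_list_of_strings(list_size, word_size, pool=ascii_letters + digits):
    # secrets.choice draws from the operating system's secure random source.
    return ["".join(secrets.choice(pool) for _ in range(word_size))
            for _ in range(list_size)]


print(secure_rand_list_of_strings(4, 10))
```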
Take a look at this.
import random
import string
list_size = 10
word_size = 4
ran = []
for i in range(list_size):
    rans = ''
    for j in range(word_size):
        rans += random.choice(string.ascii_uppercase + string.digits)
    ran.append(rans)
Though the above solution is clearer and should be preferred, if you absolutely want to do this with list comprehension...
list_size = 10
word_size = 4
ran = [
    ''.join([
        random.choice(string.ascii_uppercase + string.digits)
        for j in range(word_size)
    ])
    for i in range(list_size)
]
In this kata you need to build a function to return either true/True or false/False if a string can be seen as the repetition of a simpler/shorter subpattern or not.
For example:
has_subpattern("a") == False #no repeated pattern
has_subpattern("aaaa") == True #created repeating "a"
has_subpattern("abcd") == False #no repeated pattern
has_subpattern("abababab") == True #created repeating "ab"
has_subpattern("ababababa") == False #cannot be entirely reproduced repeating a pattern
Strings will never be empty and can be composed of any character (just consider upper- and lowercase letters as different entities) and can be pretty long (keep an eye on performances!).
My solution is:
def has_subpattern(string):
    string_size = len(string)
    for i in range(1, string_size):
        slice1 = string[:i]
        appearence_count = string.count(slice1)
        slice1_len = len(slice1)
        if appearence_count > 0:
            if appearence_count * slice1_len == string_size:
                return True
    return False
Obviously there are weak and slow spots, like slice1 = string[:i] and string.count() inside the loop.
Are there better ways to solve this, or ways to improve performance?
Short regex approach:
import re
def has_subpattern_re(s):
    return bool(re.search(r'^(\w+)\1+$', s))
It provides performance close to the initial has_subpattern approach on small strings:
import timeit
...
print(timeit.timeit('has_subpattern("abababab")', 'from __main__ import has_subpattern'))
0.7413144190068124
print(timeit.timeit('has_subpattern_re("abababab")', 'from __main__ import re, has_subpattern_re'))
0.856149295999785
But it gives a significant performance increase (about 3-5 times faster) on long strings:
print(timeit.timeit('has_subpattern("ababababababababababababababababababababababababa")', 'from __main__ import has_subpattern'))
14.669428467008402
print(timeit.timeit('has_subpattern_re("ababababababababababababababababababababababababa")', 'from __main__ import re, has_subpattern_re'))
4.308312018998549
And one more test for an even longer string:
print(timeit.timeit('has_subpattern("ababababababababababababababababababababababababaababababababababababababababababababababababababab")', 'from __main__ import has_subpattern'))
35.998031173992786
print(timeit.timeit('has_subpattern_re("ababababababababababababababababababababababababaababababababababababababababababababababababababab")', 'from __main__ import re, has_subpattern_re'))
7.010367843002314
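One caveat about the regex approach: \w only matches word characters, while the kata allows any character. The sketch below contrasts the pattern from the answer with a hypothetical variant (has_subpattern_re_any is my name, not the answerer's) that uses "." with re.DOTALL and re.fullmatch; it has not been benchmarked here.

```python
import re


def has_subpattern_re_w(s):
    # The pattern from the answer above: \w only matches word characters.
    return bool(re.search(r'^(\w+)\1+$', s))


def has_subpattern_re_any(s):
    # Hypothetical variant: "." with DOTALL matches any character, and
    # fullmatch anchors the pattern at both ends of the string.
    return bool(re.fullmatch(r'(.+?)\1+', s, flags=re.DOTALL))


print(has_subpattern_re_w("abab"), has_subpattern_re_any("abab"))  # True True
print(has_subpattern_re_w("!?!?"), has_subpattern_re_any("!?!?"))  # False True
print(has_subpattern_re_any("ababa"))                              # False
```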
Within standard Python, the bottlenecks here are the calls to count() (which already runs at C speed) and the Python-level loop.
The loop itself may be hard to speed up (although Cython may be of some help).
Hence, the most important optimization is to reduce the number of iterations.
Also, string is a standard Python module, so it is better not to use it as a variable name.
One obvious way is to keep range() from exceeding half the size of the input (+ 2: + 1 for rounding issues, + 1 because range() excludes its end point):
def has_subpattern_loop(text):
    for i in range(1, len(text) // 2 + 2):
        subtext = text[:i]
        num_subtext = text.count(subtext)
        if num_subtext > 1 and num_subtext * len(subtext) == len(text):
            return True
    return False
A much more effective way of restricting the number of calls to count is to skip the computation when i is not a divisor of the length of the input.
def has_subpattern_loop2(text):
    for i in range(1, len(text) // 2 + 2):
        if len(text) % i == 0:
            subtext = text[:i]
            num_subtext = text.count(subtext)
            if num_subtext > 1 and num_subtext * len(subtext) == len(text):
                return True
    return False
Even better would be to generate only the divisors of the length of the input.
This could be done using sympy and the approach outlined here:
import sympy as sym
import functools
def get_divisors(n):
    if n == 1:
        yield 1
        return
    # factorint() returns a {prime: exponent} dict
    factors = list(sym.factorint(n).items())
    nfactors = len(factors)
    f = [0] * nfactors
    while True:
        yield functools.reduce(lambda x, y: x * y, [factors[x][0]**f[x] for x in range(nfactors)], 1)
        i = 0
        while True:
            f[i] += 1
            if f[i] <= factors[i][1]:
                break
            f[i] = 0
            i += 1
            if i >= nfactors:
                return
def has_subpattern_divs(text):
    for i in get_divisors(len(text)):
        subtext = text[:i]
        num_subtext = text.count(subtext)
        if num_subtext > 1 and num_subtext * len(subtext) == len(text):
            return True
    return False
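If pulling in sympy is not an option, the divisors can also be enumerated by plain trial division. The sketch below is an illustration rather than the answer's method: proper_divisors and has_subpattern_divs_plain are hypothetical names, and the check rebuilds the string from its prefix instead of using count(). It was not part of the benchmarks.

```python
def proper_divisors(n):
    # Divisors of n smaller than n, found by trial division up to sqrt(n).
    small, large = [], []
    i = 1
    while i * i <= n:
        if n % i == 0:
            small.append(i)
            if i != n // i:
                large.append(n // i)
        i += 1
    return [d for d in small + large[::-1] if d < n]


def has_subpattern_divs_plain(text):
    n = len(text)
    for d in proper_divisors(n):
        # The prefix of length d is a subpattern exactly when
        # repeating it n // d times rebuilds the whole text.
        if text[:d] * (n // d) == text:
            return True
    return False


print(proper_divisors(12))                    # [1, 2, 3, 4, 6]
print(has_subpattern_divs_plain("abababab"))  # True
```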
A completely different approach is the one proposed in @ВладДавидченко's answer:
def has_subpattern_find(text):
    return (text * 2).find(text, 1) != len(text)
or the more memory efficient variant (requires ~50% less additional memory compared to has_subpattern_find()). The extension must be shorter than the input itself, otherwise the input would trivially be found at its own end:
def has_subpattern_find2(text):
    return (text + text[:len(text) // 2]).find(text, 1) > 0
and it is based on the idea that if the input is an exactly self-repeating string, the string itself must be found at a non-trivial offset in a circularly extended string:
Input: abab
Extension1: abababab
Found1: |-abab
Extension2: ababab
Found2: |-abab
Input: ababa
Extension1: ababaababa
Found1: |----ababa
Extension2: ababaab
Found2: NOT FOUND!
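The doubling trick can be checked exhaustively against the direct definition on small inputs. This is a minimal sketch: has_subpattern_naive is a reference implementation written for this check, and the two-letter alphabet and maximum length of 8 are arbitrary.

```python
from itertools import product


def has_subpattern_find(text):
    return (text * 2).find(text, 1) != len(text)


def has_subpattern_naive(text):
    # Direct definition: some proper prefix, repeated, rebuilds the text.
    n = len(text)
    return any(n % d == 0 and text[:d] * (n // d) == text for d in range(1, n))


# Compare both on every string of length 1..8 over the alphabet {a, b}.
for size in range(1, 9):
    for chars in product("ab", repeat=size):
        text = "".join(chars)
        assert has_subpattern_find(text) == has_subpattern_naive(text)

print(has_subpattern_find("abab"), has_subpattern_find("ababa"))  # True False
```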
The find-based methods are the fastest overall, with has_subpattern_find() being the fastest for small inputs, and has_subpattern_find2() generally getting faster for intermediate and large inputs (especially in the False case).
Among the other methods, for shorter inputs the direct looping approaches (especially has_subpattern_loop2()) are the fastest, closely followed (but sometimes surpassed) by has_subpattern_re(); as soon as the input gets bigger (and especially for the False outcome), has_subpattern_divs() becomes by far the fastest (aside from the find-based ones), as shown by the following benchmarks.
For the True outcome, has_subpattern_loop2() gets to be the fastest due to the very small number of iterations required, which is independent of the input size.
The input is generated as a function of n using:
import string
def gen_input(n, m=0):
    g = string.ascii_lowercase
    if not m:
        m = n
    offset = '!' if n % 2 else ''
    return g[:n] * (m // min(n, len(g)) + 2) + offset
so that if n is even, the has_subpattern*() always return True and the opposite for odd n.
Note that, in general, the has_subpattern() function will depend not only on the raw size of the input but also on the length of the repeating string, if any. This is not explored in the benchmarks, except for the odd/even separation.
(Benchmark plots: "Even Inputs" and "Odd Inputs".)
(Full code available here).
(EDITED to include some more solutions as well as a comparison with the regex-based solution from @RomanPerekhrest)
(EDITED to include some more solutions based on the find approach from @ВладДавидченко)
Found one more solution, which will probably be useful:
def has_subpattern(string):
    return (string * 2).find(string, 1) != len(string)
Learning Python just for two months, so please be patient .)
Stumbled at very interesting task - perform arithmetic operations with huge numbers ( n: int, number of digits < 10**6 ), represented as str ( len(str) < 10**6 ). I need to split this string in parts, make some simple arithmetics, and give output back as string. Converting to int and back takes about 0.5 sec:
from time import time
from random import choice
string = ''
digits = [str(i) for i in range(10)]
for _ in range(1, 99999):
    string += choice(digits)
start = time()
a, b = string[:50000], string[50000:] # Split in half
a, b = map(int, [a, b])
print('str to int:', time() - start)
# Output: str to int: 0.1092...
c = a + b
start = time()
output = ''.join([str(part) for part in [a, b, c]])
print('int to str:', time() - start)
# Output: int to str: 0.4836...
The problem starts when I need to perform this operation in a loop with hundreds of iterations.
This is something about reverse hashing-like functions.
My simple question is this: what kind of optimisation is possible in this kind of situation?
With my best regards! .)
Vadim, I can't answer your question directly. However, let me point you towards the gmpy2 library. The principal author is one of the outrageously clever people we used to see often here on SO; you could use gmpy2 itself, or have a look at the source code.
>>> from gmpy2 import *
>>> s = '012345678901234'
>>> t = '123456789012345'
>>> N = mpz(s)+mpz(t)
>>> N
mpz(135802467913579)
>>> str(N)
'135802467913579'
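A side note for anyone re-running the question's timing code with the built-in int on a recent interpreter: Python 3.11 (and some later security releases of older versions) limits int/str conversions to 4300 digits by default, so converting 50000-digit strings raises ValueError unless the limit is lifted. A minimal sketch of the question's round trip under that assumption; the sizes follow the question, the rest is illustrative.

```python
import sys
from random import choice

# Lift the int <-> str digit limit where it exists (0 means "no limit").
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(0)

digits = "0123456789"
# Build the test string with join rather than repeated +=.
string = "".join(choice(digits) for _ in range(100_000))

a, b = int(string[:50000]), int(string[50000:])  # split in half
c = a + b
output = str(c)

# Round trip: the decimal string converts back to the same integer.
assert int(output) == c
print(len(output))
```

Only lift the limit for input you trust; it exists to guard against very slow conversions of attacker-supplied numbers.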
Maybe I'm missing something but I can't find a straightforward way to accomplish this simple task. When I go to negate a binary number through the "~" operator it returns a negative number due to the two's complement:
>>> bin(~0b100010) # this won't return '0b011101'
'-0b100011'
What about if I just want to switch 0s into 1s and vice-versa, like in classic logical complement?
>>> bin(0b111111 ^ 0b100010)
'0b11101'
>>>
YOU's answer as a function:
def complement(n):
    size = len(format(n, 'b'))
    comp = n ^ ((1 << size) - 1)
    return '0b{0:0{1}b}'.format(comp, size)
>>> complement(0b100010)
'0b011101'
I made it preserve the bit length of the original. The int constructor doesn't care about the leading zeros:
>>> complement(0b1111111100000000)
'0b0000000011111111'
>>> int(complement(0b1111111100000000), 2)
255
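The same idea can be written with int.bit_length() and a mask, which also makes the connection to ~ explicit: masking ~n keeps only the low bits and discards the two's-complement sign. A small sketch; complement_bits and its width parameter are hypothetical names used for illustration.

```python
def complement_bits(n, width=None):
    # Hypothetical helper: flip the low `width` bits of a non-negative int.
    if width is None:
        width = max(n.bit_length(), 1)  # treat 0 as the one-bit value "0"
    mask = (1 << width) - 1
    return ~n & mask  # same as n ^ mask when n fits in `width` bits


print(bin(complement_bits(0b100010)))                     # 0b11101
print(format(complement_bits(0b100010), '06b'))           # 011101
print(complement_bits(0b1111111100000000))                # 255
print(format(complement_bits(0b100010, width=8), '08b'))  # 11011101
```

Passing an explicit width is useful when the value is meant to live in a fixed-size field, such as a byte.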
Ultra nasty:
>>> '0b' + ''.join('10'[int(x)] for x in format(0b100010,'b')).lstrip('0')
'0b11101'
Here are another couple of functions I came up with that return the complement of a number.
A one-liner:
def complement(c):
    return c ^ int('1'*len(format(c, 'b')), 2)
A more mathematical way:
def complement(c):
    n=0
    for b in format(c, 'b'): n=n<<1|int(b)^1
    return n
Moreover, one-linerizing this last one with functools (such baroque):
import functools
def complement(c):
    return functools.reduce(lambda x, y: x<<1|y, [int(b)^1 for b in format(c, 'b')])
Finally, a uselessly nerdish variant of the first one that uses math.log to count the binary digits:
import math
def complement(c):
    return c ^ int('1' * math.floor(math.log((c|1)<<1, 2)), 2)
Another function, more of a 'hack', for complementing an integer. You can use the same logic for complementing a binary string. I wonder why I did not come across external Python libraries that do the same; a future version of Python should take care of this in its built-ins.
def complement(x):
    b = bin(x)[2:]
    c = []
    for num in b:
        if num == '1': c.append('0')
        elif num == '0': c.append('1')
    cat = ''.join(c)
    res = int(cat, 2)
    return res