Replace string characters with their word index

Note the two consecutive spaces in this string:
string = "Hello there  everyone!"
for i, c in enumerate(string):
    print(i, c)
0 H
1 e
2 l
3 l
4 o
5
6 t
7 h
8 e
9 r
10 e
11
12
13 e
14 v
15 e
16 r
17 y
18 o
19 n
20 e
21 !
How can I make a list len(string) long, with each value containing the word count up to that point in the string?
Expected output: 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2
The only way I could do it was by looping through each character, setting a space=True flag and increasing a counter each time I hit a non-space character when space == True. This is probably because I'm most proficient with C, but I would like to learn a more Pythonic way to solve this.

I feel like your solution is not that far from being Pythonic. Maybe you can use the built-in zip function to iterate over consecutive pairs of characters and detect local changes (going from a space to a non-space character means a new word starts):
string = "Hello there  everyone!"
def word_index(phrase):
    nb_words = 0
    for a, b in zip(phrase, phrase[1:]):
        if a == " " and b != " ":
            nb_words += 1
        yield nb_words
print(list(word_index(string)))
This also makes use of generators, which are quite common in Python (see the documentation for the yield keyword). You can probably do the same with itertools.accumulate instead of the for loop, but I'm not sure it wouldn't obfuscate the code (see the third item of The Zen of Python). Here is what it would look like. Note that I used a lambda function here, not because I think it's the best choice, but simply because I couldn't find a meaningful function name:
import itertools
def word_index(phrase):
    char_pairs = zip(phrase, phrase[1:])
    new_words = map(lambda p: int(p[0] == " " and p[1] != " "), char_pairs)
    return itertools.accumulate(new_words)
Like the first version, this second version returns an iterator. Returning an iterator is usually a good idea, as it makes no assumption about whether your user wants to materialize the values. A user who wants a list can always call list(it), as I did in the first piece of code. Iterators simply give you the values one by one, so at any point in time only a single value is held in memory:
for index in word_index(string):
    print(index)
Note that phrase[1:] makes a copy of the phrase, which doubles the memory used. This can be improved with itertools.islice, which returns an iterator (and therefore uses only constant extra memory). The second version would then look like this:
def word_index(phrase):
    char_pairs = zip(phrase, itertools.islice(phrase, 1, None))
    new_words = map(lambda p: int(p[0] == " " and p[1] != " "), char_pairs)
    return itertools.accumulate(new_words)
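The zip-based versions above pair each character with its successor, so they produce len(phrase) - 1 values: the first character gets no entry. One possible way to get exactly len(phrase) values, sketched here, is to pair each character with its predecessor instead and give the first character a dummy non-space predecessor. The name word_indices and the "x" sentinel are choices made for this illustration, not part of the answer above.

```python
from itertools import accumulate, chain


def word_indices(phrase):
    """Return a list with one word index per character of `phrase`."""
    # Predecessor of every character; "x" is an arbitrary non-space
    # sentinel standing in for "the character before the first one".
    previous = chain("x", phrase)
    # 1 where a word starts (non-space right after a space), else 0.
    starts = (
        int(prev == " " and cur != " ")
        for prev, cur in zip(previous, phrase)
    )
    # Running total of word starts = word index at each position.
    return list(accumulate(starts))


print(word_indices("Hello there  everyone!"))
```

With this sentinel, leading spaces count as belonging to word 0 and the first real word after them becomes word 1; whether that is the desired behaviour depends on the use case.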

Related

ValueError: invalid literal for int() after reading input into a tuple

I am writing code that takes some numbers as a tuple and then checks whether any of them are divisible by 3.
I am a beginner in Python and only know some basic things about tuples. My code is below:
def Div3and5():
    data=tuple(input("Enter 3 numbers:"))
    c=[]
    a=0
    for i in range(0,len(data)):
        d=data[i]
        c.append(d)
    m=[int(x) for x in c]
    print(m)
    for i in m:
        if m[i]%3==0:
            print("It is not divisible")
Div3and5()
So, when I run this code I get an error which is:
ValueError: invalid literal for int() with base 10: ','
As far as I can tell, the values are stored as integers, and when I print c it clearly shows all the elements. Then I try to convert each element to an integer, but I get this error and I don't know why. Can you tell me the reason for the error? Also, is there a straightforward way to apply this divisibility operation directly to a tuple, without converting it to a list first?
Thanks

You are likely entering the numbers with spaces (or commas) in between. tuple() applied to a string splits it into single characters, so these spaces (or commas) make it into the tuple and can't be converted into ints.
Try using str.split() instead to put the input numbers into a list.
def Div3and5():
    c = input("Enter 3 numbers:").split(",")
    # Gives you the list e.g. ["3", "4", "5"]
    m = [int(x) for x in c]
    # Gives you the list e.g. [3, 4, 5]
    for i in m:
        if i % 3 == 0:
            print(f"{i} is divisible by 3")
Div3and5()
Remember that str.split() accepts a delimiter as an argument. In this case I've used a comma ',', but you can use a space ' ' instead, depending on how the user should enter the input.
Note also that you were writing if m[i] % 3 == 0 in the if statement instead of if i % 3 == 0. This is not correct, since in each iteration of the loop i is an element of the list m, not an index.
Also, when i % 3 == 0 holds, i is divisible by 3, so the print should say that the number is divisible; you were printing that it's not divisible (probably a typo).
If you want all the numbers divisible by both 3 and 5, you can change the condition like this:
if i % 3 == 0 and i % 5 == 0:
    print(f"{i} is divisible by 3 and 5")
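To make the cause of the error concrete, here is a small sketch of what tuple() does to an input string and how splitting avoids it. It also touches on the second part of the question: a tuple can be iterated directly, with no list conversion. The helper names parse_numbers and divisible_by are made up for this illustration, and a comma separator is assumed.

```python
def parse_numbers(raw, sep=","):
    """Split `raw` on `sep` and convert each piece to int (hypothetical helper)."""
    # int() tolerates surrounding whitespace, so "3, 4" parses fine.
    return tuple(int(part) for part in raw.split(sep))


def divisible_by(numbers, divisor):
    """Return the numbers divisible by `divisor`; iterates the tuple directly."""
    return [n for n in numbers if n % divisor == 0]


raw = "3,4,6"
# tuple() on a string yields one element per character, commas included,
# which is why int(',') raised the ValueError in the question.
print(tuple(raw))                # ('3', ',', '4', ',', '6')
numbers = parse_numbers(raw)
print(numbers)                   # (3, 4, 6)
print(divisible_by(numbers, 3))  # [3, 6]
```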

Here is the answer to your question:
def Div3and5():
    c = input("Enter 3 numbers:")
    parts = c.split(",")  # not named "list", which would shadow the built-in
    c = tuple(parts)
    m = [int(x) for x in c]
    for i in m:
        if i % 3 == 0:
            print(f'{i} is divisible by 3')
Div3and5()
# Enter 3 numbers separated by ",", e.g. 123,45,67

Array sorting timing out for huge size of arrays

I am doing an online code challenge. I have an array which I need to sort while recording the minimum number of swaps required to sort it. I have the following code.
def minSwap(ar):
    c = 0
    for i in range(0, len(ar)):
        if ar[i] == i+1:
            continue
        else:
            for k in range(i+1, len(ar)):
                if ar[k] == i+1:
                    ar[k] = ar[i]
                    ar[i] = i+1
                    c = c+1
                    break
    return c
This code passes the majority of test cases; however, for really large inputs, where the array size is beyond (let's say) 50000, it times out.
How can I identify the faulty block of code? I can't see a way to tweak it further.

Looking at the problem statement, it looks like you want to sort a list that contains the numbers 1 through n.
If the final list is expected to be [1, 2, 3, 4, 5, 6, 7, 8, ...], then all you need to do is insert i+1 at the current position and pop i+1 from its old position. That reduces the number of moves needed to sort the list.
Here's code that does this with the fewest moves, at least based on my tests. Note that an insert followed by a pop shifts the elements in between, so a move here is not the same operation as a pairwise swap.
def minSwap(ar):
    c=0
    for i in range(0, len(ar)):
        if ar[i] != i+1:
            #find the value of i+1 from i+1th position
            temp = ar.index(i+1,i+1)
            ar.insert(i,i+1) #insert i+1 in the ith position
            ar.pop(temp+1) #remove the value of i+1 from the right
            c+=1 #every time you do a swap, increment the counter
    print (ar) #if you want to check if ar is correct, use this print stmt
    return c
a = [1,3,4,5,6,7,2,8]
print (minSwap(a))
The total number of swaps for the above example is 1. It just inserts 2 in the second place and pops out 2 from position 6.
I ran the code for a = [1,6,5,4,3,8,2,7] and it swapped in 5 moves.
I ran the code for a = [1,3,5,4,6,8,2,7] and it swapped in 3 moves.
If you are trying to figure out how this works, use a print statement right after the if statement. It will tell you the element being swapped.

From your code I take it that sorting itself isn't the issue here, since you know you'll end up with ar[i] == i+1. Given that, why not change your else block to swap the current element into its own slot, and repeat until ar[i] is correct? This removes the inner search loop.
else:
    while ar[i] != i+1:
        temp = ar[i]
        ar[i] = ar[temp - 1]
        ar[temp - 1] = temp
        c = c + 1  # count each swap, as in the original code
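Put together with the rest of the function, the swap-into-place idea might look like the sketch below. It assumes ar is a permutation of 1..n (as the question's code implies) and works on a copy so the caller's list is left alone; the name min_swaps is chosen for this example. Every swap puts at least one value into its final slot, so the total work is linear in the length of the list.

```python
def min_swaps(ar):
    """Count swaps needed to sort a permutation of 1..n (assumed input)."""
    values = list(ar)  # copy, so the caller's list is not modified
    swaps = 0
    for i in range(len(values)):
        # Keep sending the value at position i to its own slot until
        # the value that belongs at position i arrives here.
        while values[i] != i + 1:
            target = values[i] - 1
            values[i], values[target] = values[target], values[i]
            swaps += 1
    return swaps


print(min_swaps([1, 3, 2, 4]))  # 1
print(min_swaps([4, 1, 2, 3]))  # 3
```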

You don't actually need to do a sort on this array. You just need to figure out the minimum number of swaps needed. If we just look at the following pattern, we can form a hypothesis to be tested:
1234 = 0
1324 = 1, swap 2 and 3
1423 = 2, swap 2 and 4, swap 3 and 4
4213 = 2, swap 1 and 4, swap 3 and 4
4123 = 3, swap 4 and 1, swap 4 and 2, swap 4 and 3
Based on these observations, I think we can work on the hypothesis that the answer will be max(0, n - 1), where n is the number of "out of place" elements. Be aware that this only holds when all the out-of-place elements belong to a single cycle: [2, 1, 4, 3] has four out-of-place elements but needs only two swaps.
Then the code simplifies to:
def minSwap(ar):
    c = 0
    for i in range(0, len(ar)):
        if ar[i] != i+1:
            c = c + 1
    return max(0, c - 1)
Note that I don't actually know Python well, so treat this as a sketch; since Python has no ?: operator, the clamp to zero is written with max().
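Counting out-of-place elements can be refined by looking at cycles: following position -> value - 1 from any start eventually returns to that start, and a cycle of length k can be fixed with k - 1 swaps. The sketch below sums that over all cycles without modifying the list. It assumes the input is a permutation of 1..n, and the function name is made up for this example.

```python
def min_swaps_by_cycles(ar):
    """Minimum swaps to sort a permutation of 1..n, via cycle lengths."""
    seen = [False] * len(ar)
    swaps = 0
    for start in range(len(ar)):
        length = 0
        pos = start
        # Walk the cycle containing `start`, marking visited positions.
        while not seen[pos]:
            seen[pos] = True
            pos = ar[pos] - 1  # slot where the value at `pos` belongs
            length += 1
        # A cycle of `length` elements needs length - 1 swaps;
        # length is 0 for positions already visited.
        if length > 1:
            swaps += length - 1
    return swaps


print(min_swaps_by_cycles([4, 1, 2, 3]))  # 3 (one cycle of four)
print(min_swaps_by_cycles([2, 1, 4, 3]))  # 2 (two cycles of two)
```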

find the first occurrence of a number greater than k in a sorted array

For a given sorted list, the program should return the index of the first number in the list that is greater than the number given as input.
When I run the code to check whether it works, I get two outputs: one is the value and the other is None.
Say I give an input of 3 for the code below. The expected output is the index of 20, i.e. 1; instead I get 1 followed by None.
If I give any value that is greater than every number in the list, I get the correct output, i.e. "The entered number is greater than the numbers in the list".
num_to_find = int(input("Enter the number to be found"))
a=[2,20,30]
def occur1(a,num_to_find):
    j = i = 0
    while j==0:
        if a[len(a)-1] > num_to_find:
            if num_to_find < a[i]:
                j=1
                print(i)
                break
            else:
                i = i + 1
        else:
            ret_state = "The entered number is greater than the numbers in the list"
            return ret_state
print(occur1(a,num_to_find))

This code is difficult to reason about because of its extra variables, poor variable names (j is typically used as an index, not a boolean flag), use of break, nested conditionals and side effects. It is also inefficient: in the worst case it visits every element in the list and fails to take full advantage of the list being sorted. However, it does appear to work.
Your first misunderstanding is likely that print(i) prints the index of the next largest element rather than the element itself. In your example call occur1([2, 20, 30], 3), 1 is where 20 lives in the list.
Secondly, once the found element is printed, the function breaks out of the loop and falls off the end, which returns None, and the outer print dutifully prints that None. Hopefully this explains your output. Replace print(i) and break with return i to get the index (or return a[i] to get the element) and your immediate problem is fixed.
Having said that, Python has a built-in module for this: bisect. Here's an example:
from bisect import bisect_right
a = [1, 2, 5, 6, 8, 9, 15]
index_of_next_largest = bisect_right(a, 6)
print(a[index_of_next_largest]) # => 8
If the next number greater than k is out of bounds, you can try/except that or use a conditional to report the failure as you see fit. This function takes advantage of the fact that the list is sorted using a binary search algorithm, which cuts the search space in half on every step. The time complexity is O(log(n)), which is very fast.
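As one way to handle the out-of-bounds case with a conditional, the sketch below wraps bisect_right and returns the index, which is what the question asked for, or None when no element is greater. The function name index_of_first_greater is an assumption made for this example.

```python
from bisect import bisect_right


def index_of_first_greater(sorted_values, k):
    """Index of the first element > k in a sorted list, or None if absent."""
    # bisect_right returns the insertion point to the right of any
    # elements equal to k, i.e. the index of the first element > k.
    i = bisect_right(sorted_values, k)
    return i if i < len(sorted_values) else None


a = [2, 20, 30]
print(index_of_first_greater(a, 3))   # 1
print(index_of_first_greater(a, 30))  # None
```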
If you do wish to stick with a linear algorithm similar to your solution, you can simplify your logic to:
def occur1(a, num_to_find):
    for n in a:
        if n > num_to_find:
            return n
# test it...
a = [2, 5, 10]
for i in range(11):
    print(i, "->", occur1(a, i))
Output:
0 -> 2
1 -> 2
2 -> 5
3 -> 5
4 -> 5
5 -> 10
6 -> 10
7 -> 10
8 -> 10
9 -> 10
10 -> None
Or, if you want the index of the next largest number:
def occur1(a, num_to_find):
    for i, n in enumerate(a):
        if n > num_to_find:
            return i
But I want to stress that for a sorted list the binary search is far superior to the linear search. For a list of a billion elements, the binary search will make about 30 comparisons in the worst case (log2 of a billion is roughly 30), where the linear version will make a billion comparisons. The only reason not to use it is if the list can't be guaranteed to be sorted, which isn't the case here.
To make this more concrete, you can play with this program (but use the builtin module in practice):
import random
def bisect_right(a, target, lo=0, hi=None, cmps=0):
    if hi is None:
        hi = len(a)
    mid = (hi - lo) // 2 + lo
    cmps += 1
    if lo <= hi and mid < len(a):
        if a[mid] < target:
            return bisect_right(a, target, mid + 1, hi, cmps)
        elif a[mid] > target:
            return bisect_right(a, target, lo, mid - 1, cmps)
        else:
            return cmps, mid + 1
    return cmps, mid + 1
def linear_search(a, target, cmps=0):
    for i, n in enumerate(a):
        cmps += 1
        if n > target:
            return cmps, i
    return cmps, i
if __name__ == "__main__":
    random.seed(42)
    trials = 10**3
    list_size = 10**4
    binary_search_cmps = 0
    linear_search_cmps = 0
    for n in range(trials):
        test_list = sorted([random.randint(0, list_size) for _ in range(list_size)])
        test_target = random.randint(0, list_size)
        res = bisect_right(test_list, test_target)[0]
        binary_search_cmps += res
        linear_search_cmps += linear_search(test_list, test_target)[0]
    binary_search_avg = binary_search_cmps / trials
    linear_search_avg = linear_search_cmps / trials
    s = "%s search made %d comparisons across \n%d searches on random lists of %d elements\n(found the element in an average of %d comparisons\nper search)\n"
    print(s % ("binary", binary_search_cmps, trials, list_size, binary_search_avg))
    print(s % ("linear", linear_search_cmps, trials, list_size, linear_search_avg))
Output:
binary search made 12820 comparisons across
1000 searches on random lists of 10000 elements
(found the element in an average of 12 comparisons
per search)
linear search made 5013525 comparisons across
1000 searches on random lists of 10000 elements
(found the element in an average of 5013 comparisons
per search)
The more elements you add, the worse the situation looks for the linear search.

I would do something along the lines of:
num_to_find = int(input("Enter the number to be found"))
a=[2,20,30]
def occur1(a, num_to_find):
    for i in a:
        if i > num_to_find:
            return a.index(i)
    return "The entered number is greater than the numbers in the list"
print(occur1(a, num_to_find))
This gives the output 1 when you input 3.
The reason yours gives you two outputs is that the value is printed twice: once by print(i) inside the function, and once by the outer print, which prints the function's return value (None after the break).

Algorithm for generating all string combinations

Say I have a list of strings, like so:
strings = ["abc", "def", "ghij"]
Note that the length of a string in the list can vary.
The way you generate a new string is to take one letter from each element of the list, in order. Examples: "adg" and "bfi", but not "dch" because the letters are not in the same order in which they appear in the list. So in this case where I know that there are only three elements in the list, I could fairly easily generate all possible combinations with a nested for loop structure, something like this:
for a in strings[0]:
    for b in strings[1]:
        for c in strings[2]:
            print(a + b + c)
The issue arises when I don't know beforehand how long the list of strings is going to be. If the list is n elements long, then my solution requires n nested for loops.
Can anyone point me towards a relatively simple solution? I was thinking of a DFS-based solution where I turn each letter into a node and create a connection between all letters in adjacent strings, but this seems like too much effort.

In Python, you would use itertools.product.
For example:
>>> import itertools
>>> for comb in itertools.product("abc", "def", "ghij"):
...     print(''.join(comb))
adg
adh
adi
adj
aeg
aeh
...
Or, using an unpack:
>>> words = ["abc", "def", "ghij"]
>>> print('\n'.join(''.join(comb) for comb in itertools.product(*words)))
(same output)
The algorithm used by product is quite simple, as can be seen in its source code (Look particularly at function product_next). It basically enumerates all possible numbers in a mixed base system (where the multiplier for each digit position is the length of the corresponding word). A simple implementation which only works with strings and which does not implement the repeat keyword argument might be:
def product(words):
    if words and all(len(w) for w in words):
        indices = [0] * len(words)
        while True:
            # Change ''.join to tuple for a more accurate implementation
            yield ''.join(w[indices[i]] for i, w in enumerate(words))
            for i in range(len(indices), 0, -1):
                if indices[i - 1] == len(words[i - 1]) - 1:
                    indices[i - 1] = 0
                else:
                    indices[i - 1] += 1
                    break
            else:
                break
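The mixed-base idea can also be shown directly: treat a number n as a sequence of digits where the digit for each word has base len(word), and read off one letter per digit. The sketch below is only an illustration of that numbering, not how itertools is implemented; nth_combination is a hypothetical name.

```python
import itertools


def nth_combination(words, n):
    """Return the n-th combination (0-based) in itertools.product order."""
    chars = []
    # The last word is the least significant digit, so peel digits
    # off from the right using each word's length as the base.
    for word in reversed(words):
        n, digit = divmod(n, len(word))
        chars.append(word[digit])
    return "".join(reversed(chars))


words = ["abc", "def", "ghij"]
total = 3 * 3 * 4  # product of the word lengths
print(nth_combination(words, 0))          # adg
print(nth_combination(words, 4))          # aeg
print(nth_combination(words, total - 1))  # cfj

# Sanity check against itertools.product for this example.
expected = ["".join(c) for c in itertools.product(*words)]
print([nth_combination(words, n) for n in range(total)] == expected)  # True
```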

From your solution it seems that you need as many for loops as there are strings. For each character in the final string, you need a for loop that goes through the list of possible characters. You can get that with a recursive solution: every time you go one level deeper in the recursion, you run one for loop, and you have as many levels of recursion as there are strings.
Here is an example in python:
strings = ["abc", "def", "ghij"]
def rec(generated, k):
    if k==len(strings):
        print(generated)
        return
    for c in strings[k]:
        rec(generated + c, k+1)
rec("", 0)

Here's how I would do it in Javascript (I assume that every string contains no duplicate characters):
function getPermutations(arr)
{
    return getPermutationsHelper(arr, 0, "");
}
function getPermutationsHelper(arr, idx, prefix)
{
    var foundInCurrent = [];
    for(var i = 0; i < arr[idx].length; i++)
    {
        var str = prefix + arr[idx].charAt(i);
        if(idx < arr.length - 1)
        {
            foundInCurrent = foundInCurrent.concat(getPermutationsHelper(arr, idx + 1, str));
        }
        else
        {
            foundInCurrent.push(str);
        }
    }
    return foundInCurrent;
}
Basically, I'm using a recursive approach. My base case is when I have no more words left in my array, in which case I simply add prefix + c to my array for every c (character) in my last word.
Otherwise, I try each letter in the current word, and pass the prefix I've constructed on to the next word recursively.
For your example array, I got:
adg adh adi adj aeg aeh aei aej afg afh afi afj bdg bdh bdi
bdj beg beh bei bej bfg bfh bfi bfj cdg cdh cdi cdj ceg ceh
cei cej cfg cfh cfi cfj

Python: How to display all print statements in a if-else statement

This is one of my lab questions: I am trying to write a program that generates a list of N random integers between 0 and 19 and counts the elements in each of the ranges below 5, 10, 15 and 20. I want to print all of the 'There are {} elements between x and y' statements.
When I run the program, it only shows the first one, and not the others. How do I correct it?
from random import randint
import sys
while True:
    nb_of_elements = input('How many element do you want to generate? ')
    try:
        nb_of_elements = int(nb_of_elements)
        break
    except ValueError:
        print('Input is not an integer, try again...')
L = [randint(0, 19) for _ in range (nb_of_elements)]
print('The list is :', L)
number = [0] * 4
for i in range (nb_of_elements):
    number[L[i] // 5]+=1
for i in range(4):
    if number[i] < 5:
        print('There are {} elements between 0 and 4'.format (number[i]))
    elif 5<= number[i] < 10:
        print('There are {} elements between 5 and 9'.format(number[i]))
    elif 10<= number[i] < 15:
        print('There are {} elements between 10 and 14'.format(number[i]))
    else:
        print('There are {} elements between 15 and 20'.format(number[i]))

Your mistake is that you're attempting to count numbers in a range twice.
First, you use the trick with integer division:
for i in range (nb_of_elements):
    number[L[i] // 5]+=1
So, number already contains the count of elements in the ranges 0--4, 5--9, 10--14 and 15--19 (inclusive).
Then, in your if-elif-elif-else block, you test whether each value of number falls in one of these ranges. number, however, contains counts, not list elements. On average, each of its four entries will be about nb_of_elements / 4.
You don't need the if-elif-elif-else block. Instead, loop through range(4) as you do now, and print each element number[i]. Each one corresponds to the next range (you need a little arithmetic to print the range bounds: 5*i and 5*i+4 will do).
It's kind-of interesting that you came up with a smart way to count the numbers in a range (number[L[i]//5] += 1), and then fell back to standard range comparison in an if-elif-else chain. I guess one can outsmart oneself.

You have already found a smart way to fill the number list with counts. Now you may want a smart way to print it. You can use enumerate to get the current index in the for loop; from this index you can compute the 'between X and Y' bounds.
counts = [0] * 4
for i in range (nb_of_elements):
    counts[L[i] // 5]+=1
# Loop the counts, and keep track of the index for enumerate
for i,count in enumerate(counts):
    # i * 5 will be [0,5,10,15] and i * 5 + 5 will be [5,10,15,20]
    print('There are {} elements between {} and {}'.format (count, i*5, i*5 + 5))
#The list is : [7, 10, 5]
#There are 0 elements between 0 and 5
#There are 2 elements between 5 and 10
#There are 1 elements between 10 and 15
#There are 0 elements between 15 and 20
In Python, a range is exclusive at the upper end, meaning range(0, 5) is [0, 1, 2, 3, 4]. I have chosen this notation for the print call as well: it now states 'between 0 and 5' (exclusive) instead of 'between 0 and 4' (inclusive) as in your code. This can of course be easily changed by replacing i*5 + 5 with i*5 + 4.
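For reference, a self-contained variant with inclusive upper bounds could look like the sketch below. The function names bucket_counts and describe, and the width and buckets parameters, are assumptions introduced for this example; the bucketing itself is the same integer-division trick used in the question.

```python
from random import randint


def bucket_counts(values, width=5, buckets=4):
    """Count how many values fall in [0, width), [width, 2*width), ..."""
    counts = [0] * buckets
    for v in values:
        counts[v // width] += 1  # integer division picks the bucket
    return counts


def describe(counts, width=5):
    """Build one message per bucket, using inclusive bounds (0-4, 5-9, ...)."""
    return [
        "There are {} elements between {} and {}".format(
            count, i * width, i * width + width - 1
        )
        for i, count in enumerate(counts)
    ]


values = [randint(0, 19) for _ in range(10)]
print("The list is :", values)
for line in describe(bucket_counts(values)):
    print(line)
```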
