Trying to understand a line of Python code - python-3.x

I am new to Python, and when I searched for a way to get a string's length without using len(), I found this answer:
sum([1 for _ in "your string goes here"])
Can someone help me understand this line? For example, what is the 1 doing there?

This is basically equivalent to this:
lst = []
for dontCareAboutTheName in "your string goes here":
    lst.append(1)
print(sum(lst))
The list comprehension basically collects the number 1 for each character it finds while looping through the string. So the list will contain exactly as many elements as the length of the string. And since all those list elements are 1, when calculating the sum of all those elements, you end up with the length of the string.
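To make the counting idea concrete, here is a minimal sketch. The first version uses a generator expression instead of a list comprehension, so no intermediate list is built; the loop does the same counting with an explicit counter. The names text, length and counter are only for illustration.

```python
text = "your string goes here"

# Generator expression: sum() adds 1 for every character the loop produces.
length = sum(1 for _ in text)

# Same counting with an explicit counter. '_' is only a conventional name
# for a loop variable whose value is ignored; any other name works too.
counter = 0
for _ in text:
    counter += 1

print(length, counter, len(text))  # 21 21 21
```

The value of each character is never used; only the number of loop iterations matters, which is exactly the length of the string.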

Related

Python - how to recursively search a variable substring in texts that are elements of a list

Let me explain better what I mean in the title.
Examples of strings to search (strings of variable length, each one an element of a list that is very large in reality):
STRINGS = ['sftrkpilotndkpilotllptptpyrh', 'ffftapilotdfmmmbtyrtdll', 'gftttepncvjspwqbbqbthpilotou', 'htfrpilotrtubbbfelnxcdcz']
The substring to find is known for sure to be:
contained in each element of STRINGS
also contained in a SOURCE string
of a certain fixed LENGTH (5 characters in this example).
SOURCE = ['gfrtewwxadasvpbepilotzxxndffc']
I am trying to write a Python 3 program that finds this hidden 5-character word in SOURCE and the position(s) at which it occurs in each element of STRINGS.
I am also trying to store the results in an array or a dictionary (I do not know what is more convenient at the moment).
Moreover, I need to perform other searches of the same type but with different LENGTH values, so this value should be provided by a variable in order to be of more general use.
I know that the first point has already been solved in previous posts, but never (as far as I know) together with the second point, which is the part of the code I have not been able to handle successfully (I am not posting my code because I know it is too far from being fixable).
Any help from this great community is highly appreciated.
-- Maurizio
You can iterate over every substring of the given length in the source string and, for each one, find its positions within each of the other strings (with the re module, or with built-in string methods as in the active line below). If at least one occurrence was found in every string, yield the result:
import re

def find(source, strings, length):
    # + 1 so that the last substring of the source is also checked
    for i in range(len(source) - length + 1):
        sub = source[i:i+length]
        positions = {}
        for s in strings:
            # positions[s] = [m.start() for m in re.finditer(re.escape(sub), s)]
            positions[s] = [j for j in range(len(s)) if s.startswith(sub, j)]  # Using built-in functions.
            if not positions[s]:
                break  # sub is missing from s, so move on to the next substring
        else:
            yield sub, positions  # runs only if the inner loop did not break
And the generator can be used as illustrated in the following example:
import pprint
pprint.pprint(dict(find(
    source='gfrtewwxadasvpbepilotzxxndffc',
    strings=['sftrkpilotndkpilotllptptpyrh',
             'ffftapilotdfmmmbtyrtdll',
             'gftttepncvjspwqbbqbthpilotou',
             'htfrpilotrtubbbfelnxcdcz'],
    length=5
)))
which produces the following output:
{'pilot': {'ffftapilotdfmmmbtyrtdll': [5],
           'gftttepncvjspwqbbqbthpilotou': [21],
           'htfrpilotrtubbbfelnxcdcz': [4],
           'sftrkpilotndkpilotllptptpyrh': [5, 13]}}
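The two ways of collecting positions in the answer (the commented-out re.finditer line and the active str.startswith line) are not quite interchangeable: re.finditer returns non-overlapping matches only, while testing str.startswith at every index also reports overlapping ones. A small sketch to show the difference; the function names are hypothetical.

```python
import re

def positions_regex(sub, s):
    # re.finditer scans left to right and skips past each match,
    # so overlapping occurrences are not reported.
    return [m.start() for m in re.finditer(re.escape(sub), s)]

def positions_startswith(sub, s):
    # str.startswith(prefix, start) is tested at every index,
    # so overlapping occurrences are kept.
    return [j for j in range(len(s)) if s.startswith(sub, j)]

print(positions_regex('aa', 'aaaa'))       # [0, 2]
print(positions_startswith('aa', 'aaaa'))  # [0, 1, 2]
print(positions_regex('pilot', 'sftrkpilotndkpilotllptptpyrh'))  # [5, 13]
```

For a word like "pilot" the results coincide, since it cannot overlap with itself; the choice only matters for substrings that can.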

Finding position of first letter of a substring in a list of strings (Python 3)

I have a list of strings, and I'm trying to find the position of the first letter of a substring within that list of strings. I'm using the find() method to do this, but when I print the position, Python prints the correct position followed by -1, as if it could not find the substring right after having found it. How can I get the position of the first letter of the substring without the -1 after the correct value?
Here is my code:
mylist = ["blasdactiverehu", "sdfsfgiuyremdn"]
word = "active"
if any(word in x for x in mylist) == True:
    for x in mylist:
        position = x.find(word)
        print(position)
The output is:
5
-1
I expected the output to just be:
5
I think it may be related to the fact that the loop searches for the substring in every string of the list: after it has found the position, it keeps searching the remaining strings, and find() returns -1 for them because there is only one occurrence of the substring "active". However, I'm not sure how to stop searching after successfully finding one substring. Any help is appreciated, thank you.
Indeed, your code will not work as you want it to: once any() confirms that at least one of the strings contains the substring, the loop still calls find() on each and every string, and find() returns -1 for every string that does not contain it.
A good way to avoid that is to use a generator expression together with next():
default_val = -1
position = next((x.find(word) for x in mylist if word in x), default_val)
print(position)
It will simply give you the position of the substring word in the first string x in mylist that satisfies the condition word in x, or default_val if no string does.
By the way, there is no need to check == True when using any(); it already returns True or False, so you can simply write if any(...): ...
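If a generator with next() feels unfamiliar, the same "stop at the first match" behaviour can be written as a plain loop that returns as soon as find() succeeds. This is a sketch; first_position is a hypothetical helper name, and it returns -1 when nothing matches, mirroring find().

```python
def first_position(word, strings):
    """Return the index of word in the first string that contains it, else -1."""
    for x in strings:
        position = x.find(word)
        if position != -1:
            return position  # returning here stops the search immediately
    return -1  # no string in the list contained word

mylist = ["blasdactiverehu", "sdfsfgiuyremdn"]
print(first_position("active", mylist))   # 5
print(first_position("missing", mylist))  # -1
```

Unlike the any() check followed by find(), this calls find() only once per string and never looks past the first match.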

Convert a string into an integer of its ascii values

I am trying to write a function that takes a string txt and returns an int formed from the ASCII codes of the string's characters. It also takes a second argument, n, an int that specifies the number of digits each character should translate to (shorter codes are padded with leading zeros). The default value of n is 3, so n is always at least 3, and the input string is always non-empty.
Example outputs:
string_to_number('fff')
102102102
string_to_number('ABBA', n = 4)
65006600660065
My current strategy is to split txt into its characters by converting it into a list. Then I convert the characters into their ord values and append them to a new list. I then try to combine the elements of this new list into a number (e.g. going from ['102', '102', '102'] to ['102102102']). Finally, I try to convert the first element of this list (i.e. the only element) into an integer. My current code looks like this:
def string_to_number(txt, n=3):
    characters = list(txt)
    ord_values = []
    for character in characters:
        ord_values.append(ord(character))
    joined_ord_values = ''.join(ord_values)
    final_number = int(joined_ord_values[0])
    return final_number
The issue is that I get a TypeError. I can write code that successfully returns the integer for a single-character string, but for strings with more than one character I can't, because of this TypeError. Is there any way of fixing this? Thank you, and apologies if this is quite long.
Try this:
def string_to_number(text, n=3):
    return int(''.join('{:0>{}}'.format(ord(c), n) for c in text))
print(string_to_number('fff'))
print(string_to_number('ABBA', n=4))
Output:
102102102
65006600660065
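The format string '{:0>{}}' packs several things into one spec, so here it is taken apart. This is a small illustration; the variable names are only for the example.

```python
code = ord('A')  # 65
width = 4

# '{:0>{}}': fill character '0', alignment '>' (right), and the width is
# taken from the next positional argument because of the nested '{}'.
print('{:0>{}}'.format(code, width))  # 0065

# The same format spec written as an f-string (Python 3.6+).
print(f'{code:0>{width}}')  # 0065

# int() drops leading zeros, which is why the result for 'ABBA' starts with 65.
print(int('0065006600660065'))  # 65006600660065
```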
Edit: without a generator expression, as the OP asked in a comment:
def string_to_number(text, n=3):
    l = []
    for c in text:
        l.append('{:0>{}}'.format(ord(c), n))
    return int(''.join(l))
Useful link(s):
string formatting in python: contains pretty much everything you need to know about string formatting in python
The join method expects an iterable of strings, so you'll need to convert your ASCII codes into strings. This almost gets it done:
ord_values.append(str(ord(character)))
except that it doesn't respect your number-of-digits requirement.
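One way to add the missing padding while keeping the asker's loop is str.zfill, which left-pads a string with zeros to a given width. This is a sketch of that option (zfill is my choice here, not something from the answer); it also parses the whole joined string instead of only its first character.

```python
def string_to_number(txt, n=3):
    ord_values = []
    for character in txt:
        # str() makes the code acceptable to join();
        # zfill(n) left-pads it with zeros up to n digits.
        ord_values.append(str(ord(character)).zfill(n))
    # Convert the whole joined string; int() drops any leading zeros.
    return int(''.join(ord_values))

print(string_to_number('fff'))        # 102102102
print(string_to_number('ABBA', n=4))  # 65006600660065
```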

How to detect palindrome cycle length in a string?

Suppose a string looks like "abaabaabaabaaba". The palindrome cycle here is 3, because you can find the string "aba" at every 3rd position, and you can extend the palindrome by concatenating any number of "aba"s to the string.
I think it's possible to detect this efficiently using Manacher's Algorithm but how?
You can find it easily by searching for the string S in S+S, starting at index 1. The first index you find is the cycle length you want (it may be the length of the entire string, if there is no shorter cycle). In Python it would be something like:
In [1]: s = "abaabaabaabaaba"
In [2]: print((s + s).index(s, 1))
3
The 1 is there to skip index 0, which would be a trivial match.
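One caveat worth knowing: the S in S+S trick finds the smallest rotation that maps the string onto itself, which equals the cycle length only when the cycle divides the string's length. For a string like "abaab" (a cycle of 3 that does not fit a whole number of times), it returns 5. A different technique, the KMP prefix function, gives the smallest period in both cases. The sketch below compares the two; it is not Manacher's algorithm, the function names are hypothetical, and both assume a non-empty string.

```python
def rotation_period(s):
    # Smallest k > 0 such that rotating s left by k gives s back
    # (the S in S+S trick from the answer). Assumes s is non-empty.
    return (s + s).index(s, 1)

def smallest_period(s):
    # KMP prefix function: pi[i] is the length of the longest proper prefix
    # of s[:i + 1] that is also a suffix of it. Assumes s is non-empty.
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    # Smallest p such that s[i] == s[i + p] for every valid i.
    return len(s) - pi[-1]

print(rotation_period("abaabaabaabaaba"), smallest_period("abaabaabaabaaba"))  # 3 3
print(rotation_period("abaab"), smallest_period("abaab"))  # 5 3
```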

Are two strings anagrams or not?

I want to find out whether two strings are anagrams or not.
I thought of sorting them and then comparing them one character at a time, but is there an algorithm for sorting strings? Or is there another way to do it? (Simple ideas or code, please, because I am a beginner.) Thanks.
Strings are lists of characters in Haskell, so the standard sort simply works.
> import Data.List
> sort "hello"
"ehllo"
Your idea of sorting and then comparing sounds fine for checking anagrams.
I can give you an idea (I am not very familiar with Haskell).
Take an array with 26 slots, all starting at 0: A[26] = {0, 0, ..., 0}.
For each character in the first string, increment the slot for that letter: if you find 'a', do A[0] = A[0] + 1; if you find 'b', do A[1] = A[1] + 1; and so on.
For each character in the second string, decrement the slot for that letter in the same array (if you find 'a', do A[0] = A[0] - 1).
Finally, check whether all the array elements are 0. If they are, the strings are definitely anagrams; otherwise they are not.
Note: you can extend this to uppercase letters in the same way.
It is not necessary to count each letter.
You can simply sort both strings and then compare the two lists element by element.
For example, you have this
"cinema" and "maneci"
In Haskell a String is already a list of characters:
['c','i','n','e','m','a'] and ['m','a','n','e','c','i']
Then you can sort these lists and compare them character by character.
Note that you will have these cases:
-- compare two (sorted) strings element by element
example :: String -> String -> Bool
example [] [] = True
example [] _ = False
example _ [] = False
example (h1:t1) (h2:t2) = if h1 == h2 then example t1 t2 else False
In The Joy of Haskell's "Finding Success and Failure", pp. 11-14, the authors offer the following code, which works:
import Data.List
isAnagram :: String -> String -> Bool
isAnagram word1 word2 = (sort word1) == (sort word2)
After loading your module (I loaded practice.hs into Clash), you can enter two strings; if they are anagrams, isAnagram returns True:
*Practice> isAnagram "julie" "eiluj"
True
