Python get character position matches between 2 strings - string

I'm looking to encode text using a custom alphabet. While I have a decoder for such a thing, I'm finding encoding more difficult.
I have tried string.find, string.index, itertools and several loops. I would like to take each matching position, as an integer, and add it to a list. I know it's something simple I'm overlooking, and any of these options could probably give me the result I want; I'm just hitting a roadblock for some reason.
alphabet = '''h8*jklmnbYw99iqplnou b'''
toencode = 'You win'
I would like the integer position of each match between the 2 strings to be appended to a list. I imagine the output to look similar to this:
[9,18,19,20,10,13,17]

Ok, I just tried a bit harder and got this working. For anyone who ever wants to reference this, I did the following:
newlist = []
for char in toencode:
    # Append the index of every position in alphabet that holds char.
    for index, letter in enumerate(alphabet):
        if char == letter:
            newlist.append(index)
print(newlist)
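Note that alphabet contains some characters more than once ('n', 'l', 'b' and '9'), so the nested loop above appends every matching position; for 'n' it adds both 7 and 17. A minimal sketch, assuming you want exactly one integer per input character, is to build a lookup dictionary once. In the dict comprehension below, later positions overwrite earlier ones, so each character maps to its last occurrence, which matches the sample output in the question. The names position_of, encode and decode are hypothetical.

```python
alphabet = '''h8*jklmnbYw99iqplnou b'''
toencode = 'You win'

# Map each character to its position; for repeated characters the
# last occurrence wins because later keys overwrite earlier ones.
position_of = {char: index for index, char in enumerate(alphabet)}

def encode(text):
    # Raises KeyError if text contains a character missing from alphabet.
    return [position_of[char] for char in text]

def decode(positions):
    return ''.join(alphabet[index] for index in positions)

encoded = encode(toencode)
print(encoded)          # [9, 18, 19, 20, 10, 13, 17]
print(decode(encoded))  # You win
```

If you prefer the first occurrence instead, alphabet.index(char) returns that, at the cost of scanning the alphabet once per character.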

Related

Python ord() and chr()

I have:
txt = input('What is your sentence? ')
list = [0]*128
for x in txt:
    list[ord(x)] += 1
for x in list:
    if x >= 1:
        print(chr(list.index(x)) * x)
As I understand it, this should print each character in the sentence, repeated as many times as it occurs, like:
))
111
3333
etc.
For the string "aB)a2a2a2)" the output is correct:
))
222
B
aaaa
For the string "aB)a2a2a2" the output is wrong:
)
222
)
aaaa
I feel like all my bases are covered but I'm not sure what's wrong with this code.
When you do list.index(x), you're searching the list for the first index at which that value appears. That's not actually what you want, though: you want the specific index of the value you just read, even if the same value also occurs earlier in the list.
The best way to get indexes alongside values from a sequence is with enumerate:
for i, x in enumerate(list):
    if x >= 1:
        print(chr(i) * x)
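A minimal sketch of the difference, using a small hypothetical counts list in which the value 1 occurs twice (at indexes 1 and 5):

```python
counts = [0, 1, 0, 3, 0, 1]  # hypothetical counts; the value 1 occurs twice

# list.index always returns the FIRST index holding the value.
looked_up = [counts.index(c) for c in counts if c >= 1]
# enumerate yields the actual index of each element as it is visited.
actual = [i for i, c in enumerate(counts) if c >= 1]

print(looked_up)  # [1, 3, 1]
print(actual)     # [1, 3, 5]
```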
That should get you the output you want, but there are several other things that would make your code easier to read and understand. First of all, using list as a variable name is a very bad idea, as it shadows the name of the builtin list type in your namespace. That is confusing for anyone reading your code, and you can even confuse yourself if you later want to use the real list for some purpose and forget you've already reused the name for a variable of your own.
The other issue is also about variable names, but it's more subtle. Your two loops both use a loop variable named x, but the value means something different each time: the first loop is over the characters in the input string, while the second is over the counts of each character. Meaningful variable names would make this much clearer.
Here are all my suggested fixes combined:
text = input('What is your sentence? ')
counts = [0]*128
for character in text:
    counts[ord(character)] += 1
for index, count in enumerate(counts):
    if count >= 1:
        print(chr(index) * count)
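As an alternative (a different technique from the fixed-size list used above), collections.Counter counts characters of any code point, whereas the 128-slot list raises IndexError for non-ASCII input such as 'é'. A minimal sketch; the function name character_runs is hypothetical, and the fixed sample string stands in for input():

```python
from collections import Counter

def character_runs(text):
    """Return one string per distinct character, repeated by its count,
    ordered by code point like the 128-slot list version."""
    counts = Counter(text)
    return [char * counts[char] for char in sorted(counts)]

for line in character_runs("aB)a2a2a2"):
    print(line)
```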

Python - how to recursively search a variable substring in texts that are elements of a list

Let me explain better what I mean in the title.
Examples of strings to search (strings of variable length, each one an element of a list that is very large in reality):
STRINGS = ['sftrkpilotndkpilotllptptpyrh', 'ffftapilotdfmmmbtyrtdll', 'gftttepncvjspwqbbqbthpilotou', 'htfrpilotrtubbbfelnxcdcz']
The substring to find is one that I know for sure:
is contained in each element of STRINGS,
is also contained in a SOURCE string, and
has a fixed LENGTH (5 characters in this example).
SOURCE = ['gfrtewwxadasvpbepilotzxxndffc']
I am trying to write a Python 3 program that finds this hidden 5-character word in SOURCE and the position(s) at which it occurs in each element of STRINGS.
I am also trying to store the results in an array or a dictionary (I do not know what is more convenient at the moment).
Moreover, I need to perform other searches of the same type but with different LENGTH values, so this value should be provided by a variable in order to be of more general use.
I know that the first point has already been solved in previous posts, but never (as far as I know) together with the second point, which is the part of the code I could not get working (I am not posting my code because I know it is too far from being fixable).
Any help from this great community is highly appreciated.
-- Maurizio
You can iterate over the source string and, for each substring of the given length, find its positions within each of the other strings (either with the re module, as in the commented-out line, or with the built-in str.startswith). If at least one occurrence was found in each of the strings, yield the result:
import re
def find(source, strings, length):
    # Slide a window of the given length over source; the +1 includes the last window.
    for i in range(len(source) - length + 1):
        sub = source[i:i+length]
        positions = {}
        for s in strings:
            # positions[s] = [m.start() for m in re.finditer(re.escape(sub), s)]
            positions[s] = [j for j in range(len(s)) if s.startswith(sub, j)]  # Using built-in functions.
            if not positions[s]:
                break
        else:
            # Reached only when the inner loop did not break: sub occurs in every string.
            yield sub, positions
And the generator can be used as illustrated in the following example:
import pprint
pprint.pprint(dict(find(
source='gfrtewwxadasvpbepilotzxxndffc',
strings=['sftrkpilotndkpilotllptptpyrh',
'ffftapilotdfmmmbtyrtdll',
'gftttepncvjspwqbbqbthpilotou',
'htfrpilotrtubbbfelnxcdcz'],
length=5
)))
which produces the following output:
{'pilot': {'ffftapilotdfmmmbtyrtdll': [5],
'gftttepncvjspwqbbqbthpilotou': [21],
'htfrpilotrtubbbfelnxcdcz': [4],
'sftrkpilotndkpilotllptptpyrh': [5, 13]}}
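If the strings are long, checking every window of source against every string can get slow. A different technique (set intersection, not the generator above) first keeps only the substrings of the given length that occur in source and in every element of strings, and only then looks up their positions. A minimal sketch; the function names are hypothetical, and positions include overlapping occurrences:

```python
def substrings(text, length):
    """Set of all distinct substrings of text with the given length."""
    return {text[i:i + length] for i in range(len(text) - length + 1)}

def all_positions(sub, text):
    """Every start index of sub in text, overlapping matches included."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        start = text.find(sub, start + 1)
    return positions

def common_substrings(source, strings, length):
    candidates = substrings(source, length)
    for s in strings:
        candidates &= substrings(s, length)  # keep only shared substrings
    return {sub: {s: all_positions(sub, s) for s in strings}
            for sub in sorted(candidates)}

result = common_substrings(
    'gfrtewwxadasvpbepilotzxxndffc',
    ['sftrkpilotndkpilotllptptpyrh', 'ffftapilotdfmmmbtyrtdll',
     'gftttepncvjspwqbbqbthpilotou', 'htfrpilotrtubbbfelnxcdcz'],
    5,
)
print(result)
```

The nested dictionary also covers the storage question: result[sub][string] gives the list of positions, and length stays a parameter for other searches.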

How to find sentence clauses that match word sequences? python

I have a large number of sentences from which I want to extract clauses/segments that match certain word combinations. I have the following code that works, but only with a single one-word identifier. I cannot find a way to extend it to multiple identifiers or to identifiers of two words. I thought this would be simple and already asked by others, but I could not find the answer. Can anybody help me?
This is my code:
import pandas as pd
df = pd.read_csv('text.csv')
identifiers = ('what')
sentence = df['A']
for i in sentence:
    i = i.split()
    if identifiers in i:
        index = i.index(identifiers)
        print(i[index:])
Given a sentence like this:
"Given that I want to become an entrepreneur, I am wondering what collage to attend."
and a list of two-word identifiers such as this:
identifiers = ['I am', 'I can', ..., 'I will']  # There could be dozens
how can I achieve a result like this?
I am wondering what collage to attend.
I tried extending the code above, using isin(), and something like if any([x in i for x in identifiers]), but found no solution. Any suggestions?
It does not work for multi-word phrases because you used split. With no arguments, str.split() splits on runs of whitespace, so no element of the resulting list can contain a space, and 'I am' can never equal one of them.
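If you want to keep the split-based approach from the question, one option is to split each identifier as well and compare slices of the word list. A minimal sketch; clause_from_words is a hypothetical helper that returns the clause starting at the earliest matching word, or None:

```python
def clause_from_words(sentence, identifiers):
    words = sentence.split()
    # Pre-split each identifier into its own list of words.
    identifier_words = [identifier.split() for identifier in identifiers]
    for start in range(len(words)):
        for phrase in identifier_words:
            # Compare a slice of the sentence's words with the whole phrase.
            if words[start:start + len(phrase)] == phrase:
                return ' '.join(words[start:])
    return None

sentence = "Given that I want to become an entrepreneur, I am wondering what collage to attend."
print(clause_from_words(sentence, ['I am', 'I can', 'I will']))
```

Because the words are re-joined with single spaces, runs of whitespace in the original sentence are collapsed in the result.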
You can use the in operator directly to test whether one string contains another:
>>> sentence = "Given that I want to become an entrepreneur, I am wondering what collage to attend."
>>> identifiers = ['I am', 'I can', 'I will']
>>> for i in identifiers:
... if i in sentence:
...     print(sentence[sentence.index(i):])
...
I am wondering what collage to attend.
Your attempt any([x in sentence for x in identifiers]) builds, for these strings, the list
[True, False, False]
While that is somewhat useful, it does not give you the index, so you would need another loop over this result to find it. (And the any part is unnecessary unless you only want to know whether a sentence contains such a phrase at all.)
On its own, the [x in sentence ...] list comprehension yields only True and False values, which you cannot use to slice the sentence, so it's a dead end.
But it suggests an alternative:
>>> [sentence.index(x) for x in identifiers if x in sentence]
[45]
which leads us to a list of results:
>>> [sentence[sentence.index(x):] for x in identifiers if x in sentence]
['I am wondering what collage to attend.']
If you add 'I want' to your list of identifiers, you still get a correct result, now consisting of two sentence fragments (both running to the end of the sentence):
['I am wondering what collage to attend.', 'I want to become an entrepreneur, I am wondering what collage to attend.']
(For fun, while I'm at it: if you want to clip off everything from the first comma onward, add a regex that matches everything up to the first comma:
>>> import re
>>> [re.match(r'^([^,]+)', sentence[sentence.index(x):]).group(1) for x in identifiers if x in sentence]
['I am wondering what collage to attend.', 'I want to become an entrepreneur']
The .group(1) call at the end just extracts the captured text from the match object as a regular string.)
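For dozens of identifiers, a single compiled regular expression can replace the loop. This is a different technique from the answer above: it joins the escaped identifiers into one alternation with \b word boundaries (so 'I am' does not match inside 'I amended') and returns the text from the earliest match. A minimal sketch; first_clause is a hypothetical name:

```python
import re

def first_clause(sentence, pattern):
    """Return sentence from the earliest identifier match, or None."""
    match = pattern.search(sentence)
    return sentence[match.start():] if match else None

identifiers = ['I am', 'I can', 'I will']
# re.escape guards against identifiers containing regex metacharacters.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, identifiers)) + r')\b')

sentence = "Given that I want to become an entrepreneur, I am wondering what collage to attend."
print(first_clause(sentence, pattern))  # I am wondering what collage to attend.
```

With pandas, the function could be applied to the whole column, for example df['A'].map(lambda s: first_clause(s, pattern)), assuming every cell holds a string.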

Trying to understand a Python line code

I am new to Python, and while searching for a way to get a string's length without using len(), I found this answer:
sum([1 for _ in "your string goes here"])
Can someone help me understand this line? What's the 1 doing there, for example?
This is basically equivalent to this:
lst = []
for dontCareAboutTheName in "your string goes here":
    lst.append(1)
print(sum(lst))
The list comprehension basically collects the number 1 for each character it finds while looping through the string. So the list will contain exactly as many elements as the length of the string. And since all those list elements are 1, when calculating the sum of all those elements, you end up with the length of the string.
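The same idea can be written without building the intermediate list: passing a generator expression to sum adds one per character as it goes. A minimal sketch; count_characters is a hypothetical name:

```python
def count_characters(text):
    # Add 1 for every character; the loop variable's value is unused, hence _.
    return sum(1 for _ in text)

print(count_characters("your string goes here"))  # 21
```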

How can I write the following script in Python?

The program I want to write adds two strings, S1 and S2, that are made of digits.
example: S1='129782004977', S2='754022234930', SUM='883804239907'
So far I've written this, but it still has a problem: it does not give me the correct SUM.
def addS1S2(S1,S2):
    N=abs(len(S2)-len(S1))
    if len(S1)<len(S2):
        S1=N*'0'+S1
    if len(S2)<len(S1):
        S2=N*'0'+S2
    # The first part pads the shorter string with zeros so both have the same length.
    S=''
    r=0
    for i in range(len(S1)-1,-1,-1):
        s=int(S1[i])+int(S2[i])+r
        if s>9:
            r=1
            S=str(10-s)+S
        if s<9:
            r=0
            S=str(s)+S
        print(S)
        if r==1:
            S=str(r)+S
    return S
This appears to be homework, so I will not give full code but just a few pointers.
There are three problems with your algorithm. If you fix those, then it should work.
10-s will give you negative numbers, hence all those - signs in the sum. Change it to s-10.
You are missing all the 9s: when s is exactly 9, neither if branch runs. Change if s<9: to if s<=9:, or even better, just else:
You should not add r to the string in every iteration, but just at the very end, after the loop.
Also, instead of using those convoluted if statements to check r and subtract 10 from s, you can use integer division and modulo: r = s // 10 and s = s % 10, or just r, s = divmod(s, 10).
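To illustrate the divmod pointer for a single column (not the full solution), here is a minimal sketch; add_column is a hypothetical helper that adds two digits plus the incoming carry and returns the new carry and the digit to write:

```python
def add_column(digit1, digit2, carry):
    # divmod(n, 10) returns (n // 10, n % 10): the carry and the last digit.
    return divmod(digit1 + digit2 + carry, 10)

print(add_column(7, 0, 0))  # (0, 7)
print(add_column(7, 5, 0))  # (1, 2)
print(add_column(9, 0, 1))  # (1, 0)
```

Note that a column summing to exactly 9 needs no special case here, since divmod(9, 10) is (0, 9).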
If this is not homework: Just use int(S1) + int(S2).
