Cryptopals challenge 4 concern - python-3.x

I am not getting the desired results for Cryptopals challenge 4, set 1.
The idea of the program is to check whether any of these 300-ish strings has been XORed with a single character. So, as a brute force, my solution is to take every string, XOR it with every character on the keyboard, and check whether any of the results produces an English sentence; if not, check the next string. Here is my code:
MY_DICT = {}
index = 0
my_plaintext = "Now that the party is jumping"
# fills the dictionary with hex strings from the txt file
with open("hexstrings.txt") as f:
    my_list = f.readlines()
for x in my_list:
    MY_DICT[index] = x.rstrip('\n')
    index = index + 1
i = 0
input()  # this is just here to help me keep track of where I am when running it
# this loop fills possible_plaintexts with all the possible 255 XORs of the i'th string
# of the dictionary that was previously filled from the txt file
for i in range(326):
    possible_plaintexts = brute_force_singlechar_xor(MY_DICT[i])
    print(possible_plaintexts)
    if possible_plaintexts == my_plaintext:  # line of concern
        print("ya found it yay :) ")
I'm sure that my brute-force function works, because it worked properly on the last problem, where I XORed every possible character against a string. I also know that the plaintext is the one provided, because I saw the solution. I'm just not sure why my program isn't recognizing that the plaintext is in the dictionary.
(I am aware that using a scoring system to score every string for how close it is to English would be easier, but this is the way I chose to do it for now, until I figure out how to get my scoring function to work.)
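For reference, brute_force_singlechar_xor isn't shown in the question; a plausible sketch of what such a helper could look like (an assumption, not the asker's actual code), assuming it returns every candidate decryption of a hex string:

def brute_force_singlechar_xor(hex_string):
    # decode the hex string to raw bytes, then XOR with every possible single-byte key
    raw = bytes.fromhex(hex_string)
    return [bytes(b ^ key for b in raw).decode('ascii', errors='replace')
            for key in range(256)]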

What does your dictionary "possible_plaintexts" look like when you print it?
Can you spot the solution in the printed output? How is it printed?
The decrypted string should also have a '\n' character.
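In other words, compare each candidate string individually (or use membership testing) and account for the trailing newline; a minimal sketch, assuming brute_force_singlechar_xor returns a list of candidate strings:

for i in range(len(MY_DICT)):
    possible_plaintexts = brute_force_singlechar_xor(MY_DICT[i])
    for candidate in possible_plaintexts:
        if candidate.rstrip('\n') == my_plaintext:
            print("ya found it yay :) ")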

Basic string slicing from indices

I will state the obvious: I am a beginner. I should also mention that I have been coding in Zybooks, which affects things. My textbook hasn't helped me much.
I tried sub_lyric = rhyme_lyric[ : ]
Zybooks should be able to take an index range and get back only that part of the sentence, but my book doesn't explain how to do that. If it throws a [4:7] at it, then it should output cow. Hopefully I have explained everything well.
You need to write it as:
sub_lyric = rhyme_lyric[start_index:end_index]
A string is a sequence of characters, and you can use string slicing to extract any sub-text from the main one. As you have observed:
sub_lyric = rhyme_lyric[:]
will copy the entire content of rhyme_lyric to sub_lyric.
To select only a portion of the text, specify the start_index (strings start at index 0) up to the end_index (not included).
sub_lyric = rhyme_lyric[4:7]
will extract the characters in rhyme_lyric from position 4 (included) to position 7 (not included), so the result will be cow.
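For example (rhyme_lyric here is a made-up value chosen so that positions 4 through 6 spell cow):

rhyme_lyric = 'The cow jumped over the moon'
sub_lyric = rhyme_lyric[4:7]
print(sub_lyric)  # cow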
You can check more on string slicing here: Python 3 introduction

Python ord() and chr()

I have:
txt = input('What is your sentence? ')
list = [0]*128
for x in txt:
    list[ord(x)] += 1
for x in list:
    if x >= 1:
        print(chr(list.index(x)) * x)
As per my understanding, this should output each character in the sentence repeated by its count, like:
))
111
3333
etc.
For the string "aB)a2a2a2)" the output is correct:
))
222
B
aaaa
For the string "aB)a2a2a2" the output is wrong:
)
222
)
aaaa
I feel like all my bases are covered but I'm not sure what's wrong with this code.
When you do list.index(x), you're searching the list for the first index at which that value appears. That's not actually what you want, though; you want the specific index of the value you just read, even if the same value also occurs earlier in the list.
The best way to get indexes alongside values from a sequence is with enumerate:
for i, x in enumerate(list):
    if x >= 1:
        print(chr(i) * x)
That should get you the output you want, but there are several other things that would make your code easier to read and understand. First of all, using list as a variable name is a very bad idea, as it shadows the built-in list type's name in your namespace. That makes the code confusing for anyone reading it, and you may even confuse yourself if you later want to use the normal list type and forget you've already used the name for a variable of your own.
The other issue is also about variable names, but it's a bit more subtle. Your two loops both use a loop variable named x, but the meaning of the value is different each time. The first loop is over the characters in the input string, while the second is over the counts of each character. Using meaningful variable names would make things a lot clearer.
Here's a combination of all my suggested fixes together:
text = input('What is your sentence? ')
counts = [0]*128
for character in text:
    counts[ord(character)] += 1
for index, count in enumerate(counts):
    if count >= 1:
        print(chr(index) * count)
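As an aside (not part of the original answer), the standard library's collections.Counter can do the counting step for you; a minimal sketch that produces the same kind of output for ASCII input:

from collections import Counter

text = input('What is your sentence? ')
counts = Counter(text)  # maps each character to how many times it appears
for character in sorted(counts):
    print(character * counts[character])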

Split string in Lua and print selected keys

I'm looking for a little help with splitting a string using Lua and printing selected parts of it. I have this code so far:
b = "an example string"
for i in string.gmatch(b, "%w+") do
    print(i)
end
Output is...
an
example
string
How do I go about printing only bits of the result?
I've tried the following, but it just prints nil for each word:
b = "an example string"
for i in string.gmatch(b, "%w+") do
    print(i[1])
end
So if I wanted to print:
string
example
How would this work? I was pretty sure I could just index the value by its key, like [0] or [1], but I must be wrong.
In this use case the sample text will remain the same, only time stamps will change in the string. I just need to reorder the words.
Any help is greatly appreciated :)
The best way I can find is to use the loop to store the matches in an array. Then you can access them by numeric index:
b = "an example string"
local words = {}
for i in string.gmatch(b, "%w+") do
    table.insert(words, i)
end
print(words[3])
print(words[2])
In addition to the existing (probably preferable) answer, you could also do some manual work with a counter:
counter = 0
for i in string.gmatch(b, "%w+") do
    counter = counter + 1
    if counter > 1 then print(i) end
end
Or, here's a one-liner (though it wouldn't scale to larger strings, and it doesn't insert a newline between the second and third word):
print(string.match(b, "%w+%s+(%w+)%s+(%w+)"))

Python get character position matches between 2 strings

I'm looking to encode text using a custom alphabet; while I have a decoder for such a thing, I'm finding encoding more difficult.
I've attempted string.find, string.index, itertools, and several loop-based approaches. I would like to take each position, convert it to an integer, and add it to a list. I know it's something simple I'm overlooking, and all of these options would probably yield a way to get the desired results; I'm just hitting a roadblock for some reason.
alphabet = '''h8*jklmnbYw99iqplnou b'''
toencode = 'You win'
I would like the outcome appended to a list as the integer positions of the matches between the two strings. I imagine the output would look similar to this:
[9,18,19,20,10,13,17]
OK, I just tried a bit harder and got this working. For anyone who ever wants to reference this, I did the following:
newlist = []
for p in enumerate(toencode):
    for x in enumerate(alphabet):
        if p[1] == x[1]:
            newlist.append(x[0])
print(newlist)
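A more compact equivalent of those nested loops (same behaviour: it records every matching position, so a character that appears more than once in the alphabet contributes multiple indexes):

newlist = [i for ch in toencode for i, a in enumerate(alphabet) if a == ch]
print(newlist)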

How can I create a more efficient way of parsing words between two large text files (Python 3.6.4)

I'm brand new to Python and this was my first attempt in applying what I've learned, but I know I'm being inefficient. The code works but it takes a couple minutes to finish executing on a novel sized text file.
Is there a more efficient way to reach the same output? Any styling critiques would also be appreciated. Thank you!
def realWords(inFile, dictionary, outFile):
    with open(inFile, 'r') as inf, open(dictionary, 'r') as dictionary, open(outFile, 'w') as outf:
        realWords = ''
        dList = []
        for line in dictionary:
            dSplit = line.split()
            for word in dSplit:
                dList.append(word)
        for line in inf:
            wordSplit = line.split()
            for word in wordSplit:
                if word in dList:
                    realWords += word + ' '
        outf.write(realWords)
        print('File of real words created')
        inf.close()
        dictionary.close()
        outf.close()
'''
I created a function to compare the words in a text file to real words taken
from a reference dictionary (like the Webster Unabridged Dictionary). It
takes a text file and breaks it up into individual word components. It then
compares each word to each word in the reference dictionary text file in
order to test whether the word is a real word or not. This is done so as to
eliminate non-real words, names, and some other junk. For each word that
passes the test, each word is then added to the same empty string. Once all
words have been parsed, the output string containing all real words is
written to a new text file.
'''
For every single word in your novel, you search the ENTIRE dictionary once to see if you can find that word. That's really slow.
You can benefit from using a set() data structure, which lets you quickly determine, in constant time, whether an element is inside of it or not.
Furthermore, by getting rid of string concatenation and using .join() instead, you can speed your code up some more.
I made some adjustments to your code so it uses set() and .join(), which should speed it up considerably:
def realWords(inFile, dictionary, outFile):
    with open(inFile, 'r') as inf, open(dictionary, 'r') as dictionary, open(outFile, 'w') as outf:
        realWords = []  # note: a list, for constant-time appends
        dList = set()
        for line in dictionary:
            dSplit = line.split()
            for word in dSplit:
                dList.add(word)
        for line in inf:
            wordSplit = line.split()
            for word in wordSplit:
                if word in dList:  # done in constant time because dList is a set
                    realWords.append(word)
        outf.write(' '.join(realWords))
        print('File of real words created')
        inf.close()
        dictionary.close()
        outf.close()
You can use a set() to do quick lookup of words, and you can increase the string concatenation speed by using " ".join(your_list), something like:
def write_real_words(in_file, dictionary, out_file):
    with open(in_file, 'r') as i, open(dictionary, 'r') as d, open(out_file, 'w') as o:
        dictionary_words = set()
        for l in d:
            dictionary_words |= set(l.split())
        real_words = [word for l in i for word in l.split() if word in dictionary_words]
        o.write(" ".join(real_words))
        print('File of real words created')
As for style, the above is mostly PEP-compliant. I've shortened the variable names to avoid horizontal scrolling in the code block here; I'd suggest using something more descriptive for real-world usage.
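A hypothetical call, with placeholder file names:

write_real_words('novel.txt', 'webster_words.txt', 'real_words_out.txt')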
I wrote a possible response. The main comments I have are:
1) Modularize the functions more; that is, each function should do fewer things (i.e. should do one thing very well). The function realWords can only be reused in the very specific case you want to do all of exactly what you propose. The functions below do fewer things, so they are more likely to be reused.
2) I added functionality to remove special chars from words to avoid Type II error (that is, to avoid missing a real word and calling it nonsense)
3) I added functionality to store everything that is designated as not a real word. The main QC step for this workflow would be iteratively examining the output going into the "nonsense" category and systematically eliminating true words that were missed.
4) Store the real words in the dictionary as a set in python to guarantee minimal lookup time.
5) I did not run this because I do not have appropriate input files, so I may have a few typos or bugs in there.
# real words could be missed if they adjoin a special character; strip all incoming words of special chars
def clean_words_in_line(input_line):
    """Iterate through a line, remove special characters, return clean words."""
    chars_to_strip = [":", ";", ",", "."]  # add characters as need be to remove them
    clean_words = []
    for dirty_word in input_line:
        clean_word = dirty_word
        for char in chars_to_strip:
            clean_word = clean_word.strip(char)
        clean_words.append(clean_word)
    return clean_words

def ref_words_to_set(dct_file):
    """Iterate through a source file containing known words, build a list of real words, return as a set."""
    clean_word_list = []
    with open(dct_file, 'r') as dt_fh:
        for line in dt_fh:
            line = line.strip().split()
            clean_line = clean_words_in_line(line)
            for word in clean_line:
                clean_word_list.append(word)
    clean_word_set = set(clean_word_list)  # convert to a set to minimize lookup time
    return clean_word_set

def find_real_words(my_novel, cws):
    """Iterate through a book or novel, check for clean words."""
    words_in_dict = []
    quite_possibly_runcible = []
    with open(my_novel) as mn_fh:
        for line in mn_fh:
            line = line.strip().split()
            clean_line = clean_words_in_line(line)
            for word in clean_line:
                if word in cws:
                    words_in_dict.append(word)
                else:
                    quite_possibly_runcible.append(word)
    return words_in_dict, quite_possibly_runcible

set_of_real_words = ref_words_to_set("The_Webster_Unabridged_Dictionary.txt")
(real_words, non_sense) = find_real_words("Don_Quixote.txt", set_of_real_words)

with open("Verified_words.txt", 'a') as outF:
    outF.write(" ".join(real_words) + "\n")

with open("Lears_words.txt", 'a') as n_outF:
    n_outF.write(" ".join(non_sense) + "\n")
This answer is for understanding, rather than just giving better code.
What you need to do is study Big O notation.
The complexity of reading the dictionary is O(number of lines in dictionary * number of words per line), or just O(number of words in dictionary).
The complexity of reading inf looks similar at first. However, idiomatic Python includes deceptive practices; namely, if word in dList is not a constant-time operation for some types (for a list it is linear in the list's length). Additionally, the Python language requires a new object for += here (although in limited circumstances it can optimize that out, don't rely on it), so that part's complexity is equal to O(length of realWords). Assuming most words are in fact in the dictionary, that is roughly the length of the file.
So your overall complexity for this step is O(number of words in infile * number of words in dictionary) with the optimization, or O((number of words in infile)² * number of words in dictionary) without it.
Since the complexity of the first step is smaller, and smaller components disappear in Big O analysis, the overall complexity is just the complexity of the second half.
The other answers give you a complexity of O(number of words in dictionary + number of words in file), which is irreducible since the two sides of the + are unrelated. Of course, this assumes no hash collisions, but as long as your dictionary is not subject to user input, that's a safe assumption. (If it is, grab the blist package from PyPI for a convenient container with good worst-case performance.)
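To see the list-versus-set difference concretely, here is a small timing sketch (not from the original answer; the numbers will vary by machine):

import timeit

words_list = [str(i) for i in range(100_000)]
words_set = set(words_list)

# membership in a list scans elements one by one: O(n)
print(timeit.timeit("'99999' in words_list", globals=globals(), number=100))
# membership in a set hashes the key: O(1) on average
print(timeit.timeit("'99999' in words_set", globals=globals(), number=100))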
