I'm trying to get this spellchecker I came across online to work, but no luck. Any help would be appreciated. Original code from http://norvig.com/spell-correct.html
import re, collections, codecs

def words(text): return re.findall('[a-z]+', text.lower())

def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

file = codecs.open('C:\88888\88888\88888\88888\8888\A Word.txt', encoding='utf-8', mode='r')
NWORDS = train(words(file.read()))
alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1]
    replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=NWORDS.get)
Error:
File "C:\8888\8888\8888\8888\88888\SpellCheck.py", line 11
file = codecs.open('C:\888\888\888\8888\88888\A Word.txt', encoding='utf-8', mode='r')
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
OK, try this:
take a string literal containing '\x' and try to do something with it,
or try
x = '\x.....'
That reproduces your kind of error, right? In an ordinary string literal a backslash starts an escape sequence, and escapes such as '\x' or '\U' must be followed by valid hex digits.
So if you have a string defined as, say,
x = '\y\o\u \c\a\n \n\e\v\e\r \c\h\a\n\g\e \t\h\i\s \i\n \p\y\t\h\o\n'
then you are just out of luck.
It is also a bummer whenever a path containing '\' gets pasted into source code as an ordinary literal.
To fix the problem you could try some looping or recursive code like the one in:
How to remove illegal characters from path and filenames?
C:\88888\88888\88888\88888\8888\A Word.txt - that's the strangest path I've seen this year :)
Try replacing it with C:\\88888\\88888\\88888\\88888\\8888\\A Word.txt
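As a rough sketch of the escaping fix, here are three ways to spell a Windows path so that the backslashes are not treated as escape sequences. The path used here is a made-up placeholder (the real one in the question is masked); it starts with `C:\U...` only because `\U` is the escape named in the error message.

```python
# Three equivalent ways to write a Windows path in Python source.
# NOTE: this path is a hypothetical placeholder, not the asker's real path.
escaped = 'C:\\Users\\me\\A Word.txt'   # double every backslash
raw = r'C:\Users\me\A Word.txt'         # raw string: backslashes are kept literally
forward = 'C:/Users/me/A Word.txt'      # forward slashes also work for opening files on Windows

print(escaped == raw)  # True: both literals contain the same characters
```

Any of the three can be passed to codecs.open() or open(); the raw string is usually the least error-prone when copying a path from Explorer.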
I am writing Python code to find all possible password combinations that follow specific rules:
the password should contain letters A-Z a-z
it should contain numbers 0-9
it should contain special symbols
the first character must be a capital letter
from itertools import permutations
pw = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789[#_!#$%^&*()<>?/\|}{~:]"
firstchar = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
c = permutations(pw, 2) #3 is the password length for providing sample output quickly
f=open("password.txt","w+")
f.truncate(0)
for x in firstchar:
    for i in c:
        current_pw = x + "".join(i)
        f.write( "\t" + current_pw + "\n" )
The output contains only passwords starting with A and then stops; it doesn't iterate for B, C, ...
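I think the iterator returned by permutations() is used up after the first pass of the outer loop, so nothing is left for B, C, ...
So define c inside the loop, once for each first character.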
from itertools import permutations
pw = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789[#_!#$%^&*()<>?/\|}{~:]"
firstchar = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
f=open("password.txt","w+")
f.truncate(0)
for x in firstchar:
    c = permutations(pw, 2) # c is defined here
    for i in c:
        current_pw = x + "".join(i)
        f.write( "\t" + current_pw + "\n" )
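The exhaustion effect is easy to see on a small scale. The sketch below uses a tiny made-up character set (`chars` and `first_chars` are illustrative names, not from the question) and compares a shared iterator with one that is recreated on every outer pass.

```python
from itertools import permutations

chars = "ab1"        # hypothetical tiny character set, to keep the output small
first_chars = "AB"   # hypothetical first characters

# Shared iterator: fully consumed during the first outer pass.
shared = permutations(chars, 2)
buggy = []
for x in first_chars:
    for p in shared:              # yields nothing on the second outer pass
        buggy.append(x + "".join(p))

# Fresh iterator per outer pass: every first character gets all tails.
fixed = []
for x in first_chars:
    for p in permutations(chars, 2):
        fixed.append(x + "".join(p))

print(len(buggy), len(fixed))  # 6 12
```

Note that permutations(pw, 2) never reuses a character within one tuple; if repeated characters should be allowed, itertools.product(pw, repeat=2) is the call to look at instead.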
How can I shuffle two strings s and t (shuffle(s, t)) with the requirement that, however we shuffle, the characters of s keep their original relative order in the result, and so do the characters of t? The result should be returned as a collection of strings without duplicates.
I have the following test:
print(shuffle('ab', 'cd'))
Result:
['abcd', 'acbd', 'acdb', 'cabd', 'cadb', 'cdab']
Thanks a lot.
This method will shuffle two strings and return a list of shuffles between them where the order of the characters is the same as in the original strings. If there are duplicate characters there will be duplicate results as well.
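def shuffle(s1, s2):
    # Base cases: insert the single remaining character at every position.
    # Both strings are assumed to be non-empty.
    if len(s1) == 1:
        return [s2[:i] + s1 + s2[i:] for i in range(len(s2) + 1)]
    if len(s2) == 1:
        return [s1[:i] + s2 + s1[i:] for i in range(len(s1) + 1)]
    # Take the first character of either string and shuffle the rest.
    return [s1[0]+ s for s in shuffle(s1[1:], s2)] + [s2[0] + s for s in shuffle(s1, s2[1:])]

print(shuffle("ab", "cd"))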
It works by getting the first character of each string and recursively shuffling the rest and adding this character to each element in the list. When there is one character remaining on each of the strings it returns a list where the character is added in each position of the other string. Hope it helps.
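Since the question asks for results without duplicates, here is a minimal variant of the same recursion that collects the results in a set. The name `shuffle_unique` is made up for this sketch, and it also accepts empty strings, which the version above does not.

```python
def shuffle_unique(s, t):
    """Return the set of interleavings of s and t that keep each string's own order."""
    if not s:
        return {t}
    if not t:
        return {s}
    # Either the next character comes from s, or it comes from t.
    return ({s[0] + rest for rest in shuffle_unique(s[1:], t)} |
            {t[0] + rest for rest in shuffle_unique(s, t[1:])})

print(sorted(shuffle_unique('ab', 'cd')))
# ['abcd', 'acbd', 'acdb', 'cabd', 'cadb', 'cdab']
```

Sorting is only there to make the printed order predictable; the set itself is unordered.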
You can also apply a condition to the final shuffled list to generate a new list from it:
S=shuffle('ab','cd')
nl=[]
for w in S:
    if(w.index('a')<w.index('b') and w.index('c')<w.index('d')):
        nl.append(w)
So nl is your new list as per your requirement :)
If I understood the question correctly, this should work. Note that as you add letters, this becomes a long-running problem: with 4 letters there are 6 possible combinations for each entry in the list, and with 8 letters there are 5,040 for each entry.
import random
import math
InputList = ['ab','cd']
PossibleUniqueCombinations = math.factorial(len("".join(InputList))-1)
print (PossibleUniqueCombinations)
TargetList = []
UniqueCombinationList = []
for lst in InputList:
    UniqueCnt = 0
    FirstChar = lst[0]
    TheRest = list(lst[1:])
    while UniqueCnt < PossibleUniqueCombinations:
        if InputList.index(lst) == 0:
            LeftList = []
        else:
            LeftList = InputList[0:InputList.index(lst)]
        RightList = list(InputList[InputList.index(lst)+1:])
        TargetList = LeftList + TheRest + RightList
        TargetStr = ''.join(TargetList)
        TargetStr = ''.join(random.sample(TargetStr, len(TargetStr)))
        ShuffledStr = FirstChar + ''.join(TargetStr)
        try:
            FndIdx = UniqueCombinationList.index(ShuffledStr)
        except ValueError:
            UniqueCombinationList.append(ShuffledStr)
            UniqueCnt += 1
for combo in UniqueCombinationList:
    print(combo)
I'm having a brain fart. How do I decode a string that contains percent-escapes like this?
t = '%2Fdata%2F'
print(t.decode('utf8'))
AttributeError: 'str' object has no attribute 'decode'
I'm expecting /data/
2F is the hexadecimal code of the / character. Python has a chr function that returns the character for a given integer code.
So you need to take the two hex digits after each % and "decode" them ("hex" -> chr(int("hex",16))) into a character.
def decode_utf(string):
    for i in range(string.count("%")):
        tmp_index = string.index("%")
        hex_chr = string[tmp_index:tmp_index + 3]
        #replace only one character at a time
        string = string.replace(hex_chr, chr(int(hex_chr[1:],16)),1)
    return string

print(decode_utf("%2Fdata%2F"))
#/data/
print(decode_utf("hello%20world%21"))
#hello world!
Edit 1:
The previous code breaks if the string contains %25 (an encoded % character); use the code below.
def decode_utf(string):
    utf_characters = []
    tmp_index = 0
    for i in range(string.count("%")):
        tmp_index = string.index("%",tmp_index)
        hex_chr = string[tmp_index:tmp_index + 3]
        if not hex_chr in utf_characters:
            utf_characters.append(hex_chr)
        tmp_index += 1
    for hex_chr in utf_characters:
        string = string.replace(hex_chr, chr(int(hex_chr[1:],16)))
    return string

print(decode_utf("%25t%20e%21s%2ft%25"))
#%t e!s/t%
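For comparison, the Python 3 standard library already provides this decoding as urllib.parse.unquote. A short sketch of how it behaves on the inputs above, including the %25 case:

```python
from urllib.parse import unquote

print(unquote('%2Fdata%2F'))        # /data/
print(unquote('hello%20world%21'))  # hello world!
print(unquote('%2520'))             # %20  (decoded once, not twice)
```

Because unquote scans the string once from left to right, a % produced by decoding %25 is not decoded again, which is the case a replace-based approach can get wrong.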
#!/usr/bin/python3.4
import urllib.request
import os
import re
os.chdir('/home/whatever/')
a = open('Shopstxt.csv','r')
b = a.readlines()
a.close()
c = len(b)
d = list(zip(*(e.split(';') for e in b)))
shopname = []
shopaddress = []
shopcity = []
shopphone = []
shopwebsite = []
f = d[0]
g = d[1]
h = d[2]
i = d[3]
j = d[4]
e = -1
for n in range(0, 5):
    e = e + 1
    sn = f[n]
    sn.title()
    print(sn)
    shopname.append(sn)
    sa = g[n]
    sa.title()
    shopaddress.append(sa)
    sc = h[n]
    sc.title()
    shopcity.append(sc)
Shopstxt.csv is all upper case letters and I want to convert them to title case. I thought this would do it, but it doesn't: it still leaves them all upper case. What am I doing wrong?
I also want to save the file back. I'm pressed for time, so I just want to check a couple of things quickly as well.
When I combine the file back together, before writing it back to the drive, do I have to add a '\n' at the end of each line, or is the '\n' included automatically when I write each line to the file?
Strings are immutable, so you need to assign the result of title():
sn = sn.title()
sa = sa.title()
sc = sc.title()
Also, if you do this:
with open("bla.txt", "wt") as outfile:
    outfile.write("stuff")
    outfile.write("more stuff")
then this will not automatically add line endings.
A quick way to add line endings would be this:
textblobb = "\n".join(list_of_text_lines)
with open("bla.txt", "wt") as outfile:
    outfile.write(textblobb)
As long as textblobb isn't inefficiently large and fits into memory, that should do the trick nicely.
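Putting both points together, here is a small sketch. The two sample rows are invented stand-ins for the contents of Shopstxt.csv, and an in-memory io.StringIO buffer is used in place of a real file so the snippet runs anywhere.

```python
import io

# Hypothetical sample rows standing in for the contents of Shopstxt.csv.
lines = ["CORNER SHOP;12 MAIN STREET;SPRINGFIELD",
         "BOOK BARN;7 HIGH STREET;SHELBYVILLE"]

# title() returns a new string, so the result has to be kept.
titled = [line.title() for line in lines]

# write() adds nothing by itself, so join with "\n" and end with a final newline.
buffer = io.StringIO()   # in-memory stand-in for open("bla.txt", "wt")
buffer.write("\n".join(titled) + "\n")

print(buffer.getvalue())
```

Lines obtained from readlines() already end with '\n', so if those are written back unchanged no extra newline is needed; it only has to be added when the line endings were stripped or the lines were rebuilt from their fields.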
Use the .title() method when defining your variables like I did in the code below. As others have mentioned, strings are immutable so save yourself a step and create the string you need in one line.
for n in range(0, 5):
    e = e + 1
    sn = f[n].title() ### Grab and modify the list index before assigning to your variable
    print(sn)
    shopname.append(sn)
    sa = g[n].title() ###
    shopaddress.append(sa)
    sc = h[n].title() ###
    shopcity.append(sc)
So far, I have this:
def main():
    bad_filename = True
    l =[]
    while bad_filename == True:
        try:
            filename = input("Enter the filename: ")
            fp = open(filename, "r")
            for f_line in fp:
                a=(f_line)
                b=(f_line.strip('\n'))
                l.append(b)
                print (l)
            bad_filename = False
        except IOError:
            print("Error: The file was not found: ", filename)
main()
This is my program, and when I run it this is what I get:
['1,2,3,4,5']
['1,2,3,4,5', '6,7,8,9,0']
['1,2,3,4,5', '6,7,8,9,0', '1.10,2.20,3.30,0.10,0.30']
but instead I need to get
[1,2,3,4,5]
[6,7,8,9,0.00]
[1.10,2.20,3.30,0.10,0.30]
Each line of the file is a series of numbers separated by commas, but to Python they are just characters. You need one more conversion step to turn each string into a list of numbers. First split on commas to create a list of strings, each of which is a number. Then use what is called a "list comprehension" (or a for loop) to convert each string into a number:
b = f_line.strip('\n').split(',')
c = [float(v) for v in b]
l.append(c)
If you really want to reset the list each time through the loop (your desired output shows only the current line each time), then instead of appending, just assign the numerical list to l:
b = f_line.strip('\n').split(',')
l = [float(v) for v in b]
List comprehension is a shorthand way of saying:
l = []
for v in b:
    l.append(float(v))
You don't need the variable a, or the extra parentheses around the values assigned to a and b.
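Here is the whole conversion as one runnable sketch. An in-memory io.StringIO object stands in for the opened file (an assumption made so the example needs no file on disk), and `rows` is just an illustrative name for the list called l above.

```python
import io

# In-memory stand-in for the input file, using the values shown in the question.
fp = io.StringIO("1,2,3,4,5\n6,7,8,9,0\n1.10,2.20,3.30,0.10,0.30\n")

rows = []
for f_line in fp:
    fields = f_line.strip().split(',')        # list of number strings
    rows.append([float(v) for v in fields])   # convert each string to a float

for row in rows:
    print(row)
```

Each printed row is a list of floats, so integers in the file show up as 1.0, 2.0 and so on; formatting them with a fixed number of decimals would be a separate step.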