I want to get random but unique lines/words from a txt file in Python, but it doesn't work for me.
This is my code:
f=open("Order#.txt", "r")
aaawdad = f.read()
words = aaawdad.split()
Repeat = len(words)
driver = webdriver.Chrome(options=option)
df = pd.read_csv('Order#.txt', sep='\t')
uniques = df[df.columns[0]].unique()
for i in range(Repeat):
    Mainlink = 'https://footlocker.narvar.com/footlocker/tracking/startrack?order_number=' + uniques
    driver.get(Mainlink)
The text file looks like this :
Order#1
Order#2
Order#3
…
You didn't attach the file, but I think you should read the lines of the text file into a list, remove the duplicates, and then pick the entries by random index.
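The suggestion above could be sketched roughly as follows. This is only a minimal sketch, assuming one order number per line; the helper name `pick_random_unique` and the sample lines are made up for illustration and are not from the question.

```python
import random


def pick_random_unique(lines, count=None, seed=None):
    """Return up to `count` distinct lines in random order (hypothetical helper)."""
    # dict.fromkeys drops duplicates while keeping the first-seen order
    unique_lines = list(dict.fromkeys(line.strip() for line in lines if line.strip()))
    if count is None or count > len(unique_lines):
        count = len(unique_lines)
    rng = random.Random(seed)
    # random.sample draws without replacement, so no line is returned twice
    return rng.sample(unique_lines, count)


# Sample data standing in for the contents of the text file (assumption)
sample_lines = ["Order#1\n", "Order#2\n", "Order#3\n", "Order#2\n"]
picked = pick_random_unique(sample_lines)
print(picked)
```

With a real file, the list returned by `f.readlines()` could be passed in place of `sample_lines`, and each picked value appended to the tracking URL inside the loop.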
I have written the code to extract the numbers and the company name from the extracted pdf file.
sample pdf content:
#88876 - Sample1, GTRHEUSKYTH, -99WED,-0098B
#99945 - SAMPLE2, DJWHVDFWHEF, -8876D,-3445G
The above example is what my pdf file contains. I want to extract the App number, which is the five digits after # (e.g. 88876), and the App name, which follows the - (e.g. Sample1), and write them to an Excel file as separate columns named App_number and App_name.
Please refer to the code below, which I have tried.
import PyPDF2, re
import csv
for k in range(1,100):
    pdfObj = open(r"C:\\Users\merge.pdf",'rb')
    object = PyPDF2.PdfFileReader("C:\\Users\merge.pdf")
    pdfReader = PyPDF2.PdfFileReader(pdfObj)
    NumPages = object.getNumPages()
    pdfReader.numPages
    for i in range(0, NumPages):
        pdfPageObj = pdfReader.getPage(i)
        text = pdfPageObj.extractText()
        x=re.findall('(?<=#).[0-9]+', text)
        y=re.findall("(?<=\- )(.*?)(?=,)", text)
        print(x)
        print(y)
with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(x)
Please offer some suggestions.
Try this:
import re

text = '#88876 - Sample1, GTRHEUSKYTH'
# Raw strings keep the backslash-free patterns free of escape warnings
App_number = re.search(r'(?<=#).[0-9]+', text).group()
App_name = re.search(r"(?<=- )(.*?)(?=,)", text).group()
With the first regex you get the first run of consecutive digits after #; with the second you get everything between "- " and the next comma.
Hope it helped
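To get both values into separate columns, the two lookups can be combined into one pattern with two groups and the resulting pairs written with `csv.writer`. The following is a minimal sketch, assuming the extracted text looks like the two sample lines from the question; the combined pattern and the in-memory buffer are illustrative choices, not part of the original answer.

```python
import csv
import io
import re

# Sample text copied from the question; real input would come from the PDF extraction
text = """#88876 - Sample1, GTRHEUSKYTH, -99WED,-0098B
#99945 - SAMPLE2, DJWHVDFWHEF, -8876D,-3445G"""

# One pattern with two groups: the digits after '#', then the name between '- ' and ','
pattern = re.compile(r"#(\d+) - (.*?),")
rows = pattern.findall(text)

# Written to an in-memory buffer here; a file opened with
# open("out.csv", "w", newline="") can be passed to csv.writer the same way
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["App_number", "App_name"])
writer.writerows(rows)
print(buffer.getvalue())
```

Because `findall` returns one tuple per match when the pattern has two groups, `writerows` puts the number and the name into separate columns instead of splitting a single string into characters.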
I have a .csv table that looks like this:
original csv
I want to get a new .csv data that looks like this:
new csv
I already got to the point that I have the second csv with the unique values of the SITENAMES in the first column, but now I'm struggling to append the SPECIESNAMES into the second column.
uri = 'file:///C:/Users/t/Desktop/T/Natura/Python/20220214_Natura2000_specieslist.txt'
csvLyr = QgsVectorLayer(uri, "csvLayer", "delimitedtext")
spalten = ["SITECODE"]
sitecodes = pd.read_csv(uri, usecols=spalten)
spalten2 = ["SPECIESNAME_deutsch"]
species = pd.read_csv(uri, usecols=spalten2)
#### Step 2: Use unique() to get the unique values of the site codes and write them as a new column to a csv
sitecodes_unique = sitecodes.SITECODE.unique()
print(sitecodes_unique)
print(len(sitecodes_unique))
path = 'C:/Users/t/Desktop/T/Natura/Python/Ergebnisse'
if not os.path.isdir(path):
    os.makedirs(path)
with open('C:/Users/t/Desktop/T/Natura/Python/Ergebnisse/sitecodes_namen.csv', 'w+', newline='') as f:
    wr = csv.writer(f)
    for line in sitecodes_unique:
        sitecodes_unique_split = line.split(',')
        wr.writerow(sitecodes_unique_split)
Try this plain Python code as a viable alternative; it reads a csv file directly instead of a txt file. I've tried to use collections, as mentioned by @JonSG:
import pandas as pd
from collections import defaultdict

sitecodes_df = pd.read_csv('file:///C:/Users/t/Desktop/T/Natura/Python/20220214_Natura2000_specieslist.csv', index_col=False)

# Collect all species names that belong to each site code
sitecodes_namen = defaultdict(list)
for code, name in zip(sitecodes_df['SITECODE'], sitecodes_df['SPECIESNAME_deutsch']):
    sitecodes_namen[code].append(str(name))

# One row per site code, with its species names joined by commas
df = pd.DataFrame(
    [(code, ','.join(names)) for code, names in sitecodes_namen.items()],
    columns=['SITECODE', 'SPECIESNAME_deutsch'],
)
df.to_csv('C:/Users/t/Desktop/T/Natura/Python/Ergebnisse/sitecodes_namen.csv', index=False)
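The grouping step on its own can be tried without pandas. This is a small sketch on made-up sample rows (the site codes and species names below are invented for illustration); only the two column names are taken from the question.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for the real species list (assumption)
sample_csv = """SITECODE,SPECIESNAME_deutsch
DE0001,Biber
DE0001,Fischotter
DE0002,Biber
"""

# Collect every species name under its site code
species_by_site = defaultdict(list)
for row in csv.DictReader(io.StringIO(sample_csv)):
    species_by_site[row["SITECODE"]].append(row["SPECIESNAME_deutsch"])

# Join the names so that each site code maps to a single string
result = {site: ",".join(names) for site, names in species_by_site.items()}
print(result)
```

Each key of `result` is one unique site code, so writing its items row by row gives the two-column layout asked for.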
For school, I need to create a spell checker, using python. I decided to do it using a GUI created with tkinter. I need to be able to input a text (.txt) file that will be checked, and a dictionary file, also a text file. The program needs to open both files, check the check file against the dictionary file, and then display any words that are misspelled.
Here's my code:
import tkinter as tk
from tkinter.filedialog import askopenfilename
def checkFile():
    # get the sequence of words from a file
    text = open(file_ent.get())
    dictDoc = open(dict_ent.get())
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        text = text.replace(ch, ' ')
    words = text.split()
    # make a dictionary of the word counts
    wordDict = {}
    for w in words:
        wordDict[w] = wordDict.get(w,0) + 1
    for k in dictDict:
        dictDoc.pop(k, None)
    misspell_lbl["text"] = dictDoc
# Set-up the window
window = tk.Tk()
window.title("Temperature Converter")
window.resizable(width=False, height=False)
# Setup Layout
frame_a = tk.Frame(master=window)
file_lbl = tk.Label(master=frame_a, text="File Name")
space_lbl = tk.Label(master=frame_a, width = 6)
dict_lbl =tk.Label(master=frame_a, text="Dictionary File")
file_lbl.pack(side=tk.LEFT)
space_lbl.pack(side=tk.LEFT)
dict_lbl.pack(side=tk.LEFT)
frame_b = tk.Frame(master=window)
file_ent = tk.Entry(master=frame_b, width=20)
dict_ent = tk.Entry(master=frame_b, width=20)
file_ent.pack(side=tk.LEFT)
dict_ent.pack(side=tk.LEFT)
check_btn = tk.Button(master=window, text="Spellcheck", command=checkFile)
frame_c = tk.Frame(master=window)
message_lbl = tk.Label(master=frame_c, text="Misspelled Words:")
misspell_lbl = tk.Label(master=frame_c, text="")
message_lbl.pack()
misspell_lbl.pack()
frame_a.pack()
frame_b.pack()
check_btn.pack()
frame_c.pack()
# Run the application
window.mainloop()
I want the file to check against the dictionary and display the misspelled words in the misspell_lbl.
The test files I'm using to make it work, and to submit with the assignment are here:
check file
dictionary file
I preloaded the files to the site that I'm submitting this on, so it should just be a matter of entering the file name and extension, not the entire path.
I'm pretty sure the problem is with my function to read and check the file. I've been beating my head against a wall trying to solve this, and I'm stuck. Any help would be greatly appreciated.
Thanks.
The first problem is with how you try to read the files. open(...) will return a _io.TextIOWrapper object, not a string and this is what causes your error. To get the text from the file, you need to use .read(), like this:
def checkFile():
    # get the sequence of words from a file
    with open(file_ent.get()) as f:
        text = f.read()
    with open(dict_ent.get()) as f:
        dictDoc = f.read().splitlines()
The with open(...) as f part gives you a file object called f, and automatically closes the file when the block ends. This is a more concise version of
f = open(...)
text = f.read()
f.close()
f.read() will get the text from the file. For the dictionary I also added .splitlines() to turn the newline separated text into a list.
I couldn't really see where you'd tried to check for misspelled words, but you can do it with a list comprehension.
misspelled = [x for x in words if x not in dictDoc]
This gets every word which is not in the dictionary file and adds it to a list called misspelled. Altogether, the checkFile function now looks like this, and works as expected:
def checkFile():
    # get the sequence of words from a file
    with open(file_ent.get()) as f:
        text = f.read()
    with open(dict_ent.get()) as f:
        dictDoc = f.read().splitlines()
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        text = text.replace(ch, ' ')
    words = text.split()
    # make a dictionary of the word counts
    wordDict = {}
    for w in words:
        wordDict[w] = wordDict.get(w,0) + 1
    misspelled = [x for x in words if x not in dictDoc]
    misspell_lbl["text"] = misspelled
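The checking step can also be separated from the GUI so that it is easy to test on its own. Below is a minimal sketch, assuming that case should be ignored and that each misspelled word should be reported only once; the function name `find_misspelled` and the sample sentence are hypothetical.

```python
import string


def find_misspelled(text, dictionary_words):
    """Return words from `text` that are not in `dictionary_words`, in first-seen order."""
    # Replace every punctuation character with a space so "fox," becomes "fox"
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    cleaned = text.translate(table)
    # A set gives fast membership tests; lowercasing is an assumption of this sketch
    known = {word.lower() for word in dictionary_words}
    misspelled = []
    for word in cleaned.split():
        if word.lower() not in known and word not in misspelled:
            misspelled.append(word)
    return misspelled


# Made-up sample text and dictionary for illustration
result = find_misspelled(
    "The quick brwn fox, the lazy dgo.",
    ["the", "quick", "brown", "fox", "lazy", "dog"],
)
print(result)  # ['brwn', 'dgo']
```

Inside checkFile, the label could then be set with `misspell_lbl["text"] = ", ".join(find_misspelled(text, dictDoc))`, which shows the words as plain text rather than as a list.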
Here I'm trying to convert a few numbers inside a list read from a file into floats, but my output still comes out as strings. Where is the problem?
table = []
fileName = input("Enter the name of the file: ")
readFile = open(fileName)
lines = readFile.readlines()
for line in lines:
    line = line.split()
    for item in line:
        item = float(item)
        table.append(item)
print(table)
You should append the item that is a float (stored in the variable item), not the string version (stored in the variable line), and do so inside the loop so that each item is added as the loop iterates. I also use split() on each line so that every three numbers end up in their own nested list.
Here is the fixed code:
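table = []
filename = input("Enter the name of the file: ")
with open(filename) as readFile:
    for line in readFile:
        # Convert every whitespace-separated item on this line to a float
        row = [float(item) for item in line.split()]
        table.append(row)
print(table)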
OR:
with open(filename) as readFile:
    # One inner list of floats per line of the file
    table = [[float(item) for item in line.split()] for line in readFile]
print(table)
Output:
[[2.0,7.0,6.0],[9.0,5.0,1.0],[4.0,3.0,8.0]]
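The reason the original loop kept strings is that `item = float(item)` only rebinds the loop variable; it does not change the list the item came from. The sketch below shows the conversion on an in-memory stand-in for the file, assuming three whitespace-separated numbers per line as in the output above.

```python
import io

# Stand-in for the input file; the values mirror the output shown above (assumption)
sample_file = io.StringIO("2 7 6\n9 5 1\n4 3 8\n")

table = []
for line in sample_file:
    fields = line.split()
    if not fields:
        # Skip blank lines so they do not produce empty rows
        continue
    # Build a new list of floats instead of rebinding the loop variable
    table.append([float(field) for field in fields])
print(table)
```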
2331,0,13:30:08,25.35,22.05,23.8,23.9,23.5,23.7,5455,350,23.65,132,23.6,268,23.55,235,23.5,625,23.45,459,23.7,83,23.75,360,23.8,291,23.85,186,23.9,331,0,1,25,1000,733580089,name,,,
I have a line like this. How can I cut it? I only need the first 9 values, like this:
2331,0,13:30:08,25.35,22.05,23.8,23.9,23.5,23.7,5455
I saved the original data as a .txt file. Could I overwrite the original file with the shortened line and save it?
Use either the csv module or plain file I/O with the string split function.
For example:
import csv
with open('some.txt', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # Each row is a list of fields; keep the first 9
        print(row[:9])
or if everything is on a single line and you don't want to use a csv interface
with open('some.txt', 'r') as f:
    line = f.read()
print(line.split(",")[:9])
If you have a file called "content.txt":
with open("content.txt", "r") as f:
    contentFile = f.read()
# Keep only the first 9 comma-separated values
output = ",".join(contentFile.split(",")[:9])
# Overwrite the original file with the shortened line
with open("content.txt", "w") as f:
    f.write(output)
If all your values are stored in an Array, you can slice like this:
arrayB = arrayA[:9]
To get your values into an array, you could split your string at every ",":
arrayA = inputString.split(",")
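Putting the split, slice, and rewrite steps together, a file with one or more such lines could be trimmed in place roughly like this. It is a minimal sketch: the helper name `keep_first_fields` is hypothetical, and the demonstration runs on a temporary file holding a shortened copy of the line from the question.

```python
import os
import tempfile


def keep_first_fields(path, count, sep=","):
    """Rewrite `path` in place, keeping only the first `count` fields of each line."""
    with open(path, "r") as f:
        lines = f.read().splitlines()
    # Split each line, slice off the leading fields, and join them back together
    trimmed = [sep.join(line.split(sep)[:count]) for line in lines]
    with open(path, "w") as f:
        f.write("\n".join(trimmed) + "\n")
    return trimmed


# Demonstration on a temporary file so no real data is overwritten
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "content.txt")
    with open(demo_path, "w") as f:
        f.write("2331,0,13:30:08,25.35,22.05,23.8,23.9,23.5,23.7,5455,350,23.65\n")
    result = keep_first_fields(demo_path, 9)
    with open(demo_path) as f:
        written = f.read()
print(result)
```

Note that the desired output in the question actually lists ten values (it ends in 5455); passing `10` as `count` would keep that one as well.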