How to make it so my code remembers what it has written in a text file? - python-3.x

Hello, Python newbie here.
I have code that writes names into a text file. It takes the names from a website, and on that website the same name may appear multiple times. Within a single run, it filters duplicates down to one name by checking whether the name has already been written to the text file. But when I run the code again, it ignores the names that are already in the text file and only filters out names it has written during the same session. So my question is: how do I make it remember what it has written?
kaupan_nimi = driver.find_element_by_xpath("//span[@class='store_name']").text
with open("mainostetut_yritykset.txt", "r+") as tiedosto:
    if kaupan_nimi in tiedosto:
        print("\033[33mNimi oli jo tiedostossa\033[0m")  # "Name was already in the file"
    else:
        print("\033[32mUusi asiakas vahvistettu!\033[0m")  # "New customer confirmed!"
        # Writes the company name to the text file
        tiedosto.seek(0)
        data = tiedosto.read(100)
        if len(data) > 0:
            tiedosto.write("\n")
        tiedosto.write(kaupan_nimi)
There is the code that I think is the problem. Please correct me if I am wrong.

There are two main issues with your current code.
The first is that you are likely only going to be able to detect duplicated names if they are back to back. That is, if the prior name that you're seeing again was the very last thing written into the file. That's because all the lines in the file except the last one will have newlines at the end of them, but your names do not have newlines. You're currently looking for an exact match for a name as a line, so you'll only ever have a chance to see that with the last line, since it doesn't have a newline yet. If the list of names you are processing is sorted, the duplicates will naturally be clumped together, but if you add in some other list of names later, it probably won't pick up exactly where the last list left off.
The second issue in your code is that it will tend to clobber anything that gets written more than 100 characters into the file, starting every new line at that point, once it starts filling up a bit.
Let's look at the different parts of your code:
if kaupan_nimi in tiedosto:
This is your duplicate check. It treats the file as an iterator and reads each line, checking whether kaupan_nimi is an exact match for any of them. This will fail for every line except possibly the last one, because those lines end with "\n" while kaupan_nimi does not.
I would suggest instead reading the file only once per batch of names, and keeping a set of names in your program's memory that you can check your names-to-be-added against. This will be more efficient, and won't require repeated reading from the disk, or run into newline issues.
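The newline mismatch described above can be reproduced without a real file. The following is a minimal sketch that uses io.StringIO as a stand-in for the opened file; the helper names naive_contains and stripped_contains are made up for illustration and are not part of the original code.

```python
import io

def naive_contains(f, name):
    """Mimic `name in file`: each line is compared with its trailing newline intact."""
    f.seek(0)
    return name in f

def stripped_contains(f, name):
    """Compare against lines with the trailing newline removed."""
    f.seek(0)
    return any(line.rstrip("\n") == name for line in f)

# Three names; only the last line has no trailing newline.
fake_file = io.StringIO("Alice\nBob\nCarol")

print(naive_contains(fake_file, "Bob"))     # the stored line is "Bob\n", so no match
print(naive_contains(fake_file, "Carol"))   # the last line has no newline, so it matches
print(stripped_contains(fake_file, "Bob"))  # matches once the newline is stripped
```

Only the final line can match in the naive version, which is why duplicates were caught only when they were back to back.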
tiedosto.seek(0)
data = tiedosto.read(100)
if len(data) > 0:
    tiedosto.write("\n")
This code appears to be checking if the file is empty or not. However, it always leaves the file position just past character 100 (or at the end of the file if there were fewer than 100 characters in it so far). You can probably fit several names in that first 100 characters, but after that, you'll always end up with the names starting at index 100 and going on from there. This means you'll get names written on top of each other.
If you take my earlier advice and keep a set of known names, you could check that set to see if it is empty or not. This doesn't require doing anything to the file, so the position you're operating on it can remain at the end all of the time. Another option is to always end every line in the file with a newline so that you don't need to worry about whether to prepend a newline only if the file isn't empty, since you know that at the end of the file you'll always be writing a fresh line. Just follow each name with a newline and you'll always be doing the right thing.
Here's how I'd put things together:
# if possible, do this only once, at the start of the website reading procedure:
with open("mainostetut_yritykset.txt", "r+") as tiedosto:
    known_names = set(name.strip() for name in tiedosto)  # names already in the file

    # do the next parts in some kind of loop over the names you want to add
    # (something() is a placeholder for however you collect the names)
    for kaupan_nimi in something():
        if kaupan_nimi in known_names:  # duplicate found
            print("\033[33mNimi oli jo tiedostossa\033[0m")
        else:  # not a duplicate
            print("\033[32mUusi asiakas vahvistettu!\033[0m")
            tiedosto.write(kaupan_nimi)  # write out the name
            tiedosto.write("\n")  # and always add a newline afterwards
            # alternatively, if you can't have a trailing newline at the end, use:
            # if known_names:
            #     tiedosto.write("\n")
            # tiedosto.write(kaupan_nimi)
            known_names.add(kaupan_nimi)  # update the set of names
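For a version that can be run without the scraping part, here is a minimal sketch of the same idea. It is a variation rather than the exact code above: it reads the known names first and then reopens the file in append mode, so the write position never needs managing. The function names load_known_names and add_new_names and the temporary file path are assumptions made for the example.

```python
import os
import tempfile

def load_known_names(path):
    """Return the set of names already stored in the file, one per line."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        return set()  # first run: nothing has been written yet

def add_new_names(path, names):
    """Append names that are not in the file yet; return the names that were added."""
    known = load_known_names(path)
    added = []
    with open(path, "a", encoding="utf-8") as f:  # append mode always writes at the end
        for name in names:
            name = name.strip()
            if name and name not in known:
                f.write(name + "\n")  # every line ends with a newline
                known.add(name)
                added.append(name)
    return added

# Two "sessions" against the same (temporary) file.
demo_path = os.path.join(tempfile.mkdtemp(), "names.txt")
print(add_new_names(demo_path, ["Kauppa A", "Kauppa B", "Kauppa A"]))
print(add_new_names(demo_path, ["Kauppa B", "Kauppa C"]))
```

The second call only adds the name that the first call had not written, which is the "remembering" behaviour the question asks for.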

Related

how to fix if-else problem in calling functions in python?

if file is not None:
    content = file.readlines()
    if 'I' and 'J' in content:
        display_oval()
    else:
        display_polygon()
In this case, suppose I open a file containing I and J. I expect display_oval() to be called, but display_polygon() is called instead. When I open a file that does not contain I and J, display_polygon() is called as expected.
When I replace 'I' and 'J' with 'I' or 'J' and open a file containing I and J, display_oval() works fine. But when I open a file that does not contain I and J, nothing works.
I want to call display_oval() if the file contains I and J, and display_polygon() otherwise. How can this be done?
You have a couple of intersecting issues with your code.
The first issue is that 'I' and 'J' in content gets grouped as ('I') and ('J' in content), which is surely not what you intend. A non-empty string like 'I' is always truthy, so testing it that way is not useful. You probably mean 'I' in content and 'J' in content.
But that's not enough to fix your code (it makes fewer inputs match, not more). The condition will still not work right because your content is a list of strings, all but the last of which will be newline terminated. When applied to a list, the in operator looks for an element that equals the value exactly; it only does a substring search when both operands are strings.
I'm not exactly sure what fix would be best for that second issue. It depends on the logic of your program, and the contents of your file. If you want to test if I and J show up as individual lines in the file (each separately, on a line with no other characters), you might want to test for 'I\n' in content and 'J\n' in content using the same content you're using now. On the other hand, if you want to check for a capital I and J characters anywhere in the text of the file, without regard to lines, then you probably need to change content instead of changing the matching logic. Use content = file.read() to read the whole file into a single string, rather than a list of strings. Then 'I' in content will do a substring search.
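Both fixes can be sketched side by side. This is only an illustration under the assumption that the file holds I and J on their own lines; the helper names has_both_lines and has_both_chars are made up for the example.

```python
def has_both_lines(content_lines):
    """True if 'I' and 'J' each appear as a whole line (newlines ignored)."""
    stripped = [line.rstrip("\n") for line in content_lines]
    return "I" in stripped and "J" in stripped

def has_both_chars(text):
    """True if the characters 'I' and 'J' appear anywhere in the text."""
    return "I" in text and "J" in text

content = ["I\n", "J\n", "K"]   # what file.readlines() would return
text = "".join(content)         # what file.read() would return

# The original condition only tests 'J' in content, and 'J' != 'J\n'.
print('I' and 'J' in content)
print(has_both_lines(content))
print(has_both_chars(text))
```

Which helper fits depends on whether I and J are meant to be whole lines or just characters somewhere in the file.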

Python '\r' print on the same line but won't erase the previous text

I am trying to print a report for each iteration. Since each iteration takes a really long time to run, I use print together with end="\r" to show the current item being processed.
Here's the dummy code:
import time
y = list(range(50))
print("epoch\ttrain loss\ttest loss\ttrain avg\ttest avg\ttime\tutime")
for e in range(10):
    for i in range(50):
        print("training {}/{} batches".format(i,50), end = '\r')
        time.sleep(0.05)
    print('{}\t{:2f}\t{:2f}\t{:2f}\t{:2f}\t{:2.1f}\t{:2.1f}'.format(y[0]+e,y[1]+e,y[2]+e,y[3]+e,y[4]+e,y[5]+e,y[6]+e))
Expected Result
This is my expected result, where the progress information is completely erased after each iteration. (I am running it in Jupyter notebook, and it looks fine)
The Result that I am getting
However, when I run it on linux terminal, the progress information is not completely erased, and the result is overlaying on top of the progress.
Why is it so? How to solve it?
\r simply moves the cursor back to the beginning of the current line. Anything printed after the \r is printed "on top of" the content previously there. On a real printer/teletype this would be literally true, with two characters getting printed in the same position ("overstruck"). On a terminal, the new characters replace the old ones (but only in positions that you actually write to).
You can take advantage of this behavior of terminals by printing spaces. You need at least as many spaces as the content you want to erase, but not enough to make the terminal wrap to the next line (this may be impossible if the line was printed all the way to the last character).
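In your case, you know that the line won't be more than 22 characters long, so you could print '\r' + ' ' * 22 + '\r' just before the report line (go back to the beginning of the line, print 22 blanks, then go back to the beginning of the line again). Using that string as end= on the progress print itself would blank each message as soon as it is drawn.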
\r only moves the cursor to the start of the line. It does not clear the text.
You have to make sure the new output is at least as long as the previously printed output, for example by padding it with spaces, because returning to the start of the line does not erase what was already there.
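A minimal sketch of the padding approach both answers describe, assuming the progress text never exceeds 22 characters. The helpers progress_line and clear_line are hypothetical names, not part of any library.

```python
import sys

def progress_line(text, width):
    """Return text preceded by a carriage return and padded with blanks to width."""
    return "\r" + text.ljust(width)

def clear_line(width):
    """Return a sequence that blanks the first width columns and homes the cursor."""
    return "\r" + " " * width + "\r"

WIDTH = 22  # length of "training 49/50 batches", the longest progress message

for i in range(3):
    # Each message overwrites the previous one, including any leftover tail.
    sys.stdout.write(progress_line("training {}/{} batches".format(i, 50), WIDTH))
    sys.stdout.flush()

# Blank the progress text once, right before the report line.
sys.stdout.write(clear_line(WIDTH))
print("epoch 0 finished")
```

Padding every message to a fixed width keeps a shorter message from leaving the tail of a longer one behind, and the single clear_line call removes the progress text before the report is printed.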

Possible names for a process Linux?

I am trying to write a script in which I read from /proc/.../stat. One of the values in the space separated list is the name of the process, which does not interest me for the time being. I would like to read some other value after it. My idea was to move forward a certain number of values using spaces as the separator. A potential problem with this though is that I could have /proc/.../stat containing something like 1234 (asdf asdf) S .... The space in the process name would cause the program to read asdf) instead of S as intended.
So my question is can the process name have spaces in it? If so how could I differentiate between the values in /proc/.../stat?
I, personally, hate the way this file is laid out for precisely the reason you stated. With that said, it is possible to parse it unambiguously no matter what the process name is. This is important, because not only may the process name contain spaces, it may also contain the close-bracket character.
The method I suggest is to manually parse out the process name, and use space delimiting for everything else.
The process name should be defined as starting at the first open-bracket character on the line and ending at the last close-bracket character on the line. Since the other fields on the line don't have a user-controlled format, this should reliably single the process name out, no matter how weirdly the process is named.
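The rule above (first open bracket to last close bracket) could look like this in Python. It is a sketch that works on a line of text already read from the stat file; parse_stat_line is a hypothetical helper name and the sample lines are invented.

```python
def parse_stat_line(line):
    """Split a stat line into (pid, process_name, remaining_fields)."""
    start = line.index("(")   # first open bracket starts the name
    end = line.rindex(")")    # last close bracket ends the name
    pid = int(line[:start].strip())
    name = line[start + 1:end]
    rest = line[end + 1:].split()  # everything after the name is space separated
    return pid, name, rest

# Names containing a space or a close bracket are still parsed correctly.
print(parse_stat_line("1234 (asdf asdf) S 1 1234"))
print(parse_stat_line("42 (a) b) R 0"))
```

After this split, rest[0] is the state field, so later fields can be indexed from there without worrying about the name.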

Writing to a text file in separate lines

I am creating a program that, at the end of all the inputs, writes to a text file, and the output comes out as one big line. Other than going into my text file and manually changing it to multiple lines, how do you write to a text file from Python in separate lines?
Eta:
Line one that holds one piece of inputted information
Line two that holds another piece of inputted information
Line three that holds a final piece of inputted information
I've tried writing twice to a file before closing it, but it returns an error saying it expected 1 argument and received more.
You should post at least a failed attempt that we can fix; but, due to the simplistic nature of the problem, I'll just give a quick answer anyway.
Steps:
1. Open the file in write ('w') mode (note that this blanks the file)
2. Write a line
3. Write a new-line ('\n'). Note that this step can be combined with the previous one
4. Repeat steps 2 and 3 for all your lines
5. Close the file.
So, here's an implementation of the above. Note how we can use a with statement to do the first and last steps in one (and other benefits).
lines = ['Line one', 'Line two', 'Line three']
with open('your_file.txt', 'w') as f:
    for l in lines:
        f.write(l + '\n')
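The error mentioned in the question most likely comes from passing several values to write(), which accepts exactly one string. As a hedged sketch, here are two ways to get one item per line; the function names and the temporary file path are assumptions for the example.

```python
import os
import tempfile

def write_lines(path, lines):
    """Write each item on its own line using a single-argument write() call."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")

def write_lines_with_print(path, lines):
    """Same result, letting print() add the newline via its file= parameter."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            print(line, file=f)

demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
write_lines(demo_path, ["Line one", "Line two", "Line three"])
with open(demo_path, encoding="utf-8") as f:
    print(f.read(), end="")
```

print() is convenient when the values are not all strings, since it converts each argument itself.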

searching elements of list in file

The list is named disks and its contents are shown below:
disks
['5000cca025884d5\n', '5000cca025a1ee6\n']
The file is named p and its contents are shown below:
c0t5000CCA025884D5Cd0 solaris
/scsi_vhci/disk@g5000cca025884d5c
c0t5000CCA025A1EE6Cd0
/scsi_vhci/disk@g5000cca025a1ee6c
c3t50060E8007DB981Ad1
/pci@400/pci@1/pci@0/pci@8/SUNW,emlxs@0/fp@0,0/ssd@w50060e8007db981a,1
c3t50060E8007DB981Ad2
/pci@400/pci@1/pci@0/pci@8/SUNW,emlxs@0/fp@0,0/ssd@w50060e8007db981a,2
c3t50060E8007DB981Ad3
/pci@400/pci@1/pci@0/pci@8/SUNW,emlxs@0/fp@0,0/ssd@w50060e8007db981a,3
c3t50060E8007DB981Ad4
I want to search for the elements of the list in the file.
There are a couple of things to look at here:
I haven't actually used re.match() before, but I can see the first issue: your list of disks has a newline character after every entry, which will break the matches. Also, re.match() only matches at the start of the string. The disk IDs appear partway through your lines, so you need to search within the line using re.search(). Finally, you should make the comparison case insensitive; one option is to make everything lowercase, as your disks list already is.
Try adapting your loop like so:
import re

# .strip() will get rid of new lines and .lower() will make the string lowercase
for line in q:
    if re.search(disks[0].strip(), line.lower()):
        print(line)
If that doesn't fix it, I would try making it print out disks[0].strip() and line for every iteration of the loop (not just when it matches the if clause) to make sure it's reading in what you think it is.
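To check every entry of the list rather than only disks[0], the loop can be generalised. The sketch below swaps re.search() for a plain substring test with the in operator, which is enough here because the disk IDs contain no pattern characters; find_disk_lines is a hypothetical helper and sample_lines stands in for the lines of the file.

```python
def find_disk_lines(disks, lines):
    """Return the lines that contain any of the disk IDs, ignoring case."""
    wanted = [d.strip().lower() for d in disks]  # drop newlines, normalise case
    matches = []
    for line in lines:
        lowered = line.lower()
        if any(d in lowered for d in wanted):
            matches.append(line.rstrip("\n"))
    return matches

disks = ['5000cca025884d5\n', '5000cca025a1ee6\n']
sample_lines = [
    "c0t5000CCA025884D5Cd0 solaris\n",
    "c0t5000CCA025A1EE6Cd0\n",
    "c3t50060E8007DB981Ad1\n",
]

for match in find_disk_lines(disks, sample_lines):
    print(match)
```

With a real file, passing the open file object as lines works the same way, since iterating over it yields one line at a time.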
