How to replace multiple duplicate strings in a file without deleting anything else (Python 3)

Ok so I have this code:
for line in fileinput.FileInput("zero.html",inplace=1):
    if '{problem}' in line:
        rep = 'a dynamic var'
        line = line.replace('{problem}', rep)
        print(line)
Now, the problem is that it replaces the text fine, but it deletes all other lines without '{problem}' in it. How can I replace '{problem}' with something else, without deleting the other lines? Also, I have multiple occurrences of '{problem}' in my file, and I want each one to be changed to a different, random string.
Thanks!

The if statement doesn't say what to do with the line if it doesn't contain '{problem}'. So as written, your code just ignores those lines. You could add an else clause that prints the line. Or you could just drop the if test, like this:
import fileinput

for line in fileinput.FileInput("zero.html", inplace=1):
    rep = 'a dynamic var'
    line = line.replace('{problem}', rep)
    # line already ends with '\n', so suppress print's own newline
    print(line, end='')
The replace method will leave the line unchanged if it doesn't contain '{problem}'.
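The question also asks for each occurrence to receive a different random string, which a replace with one fixed value does not do. A minimal sketch of one way to get that, assuming random 8-letter lowercase tokens are acceptable: re.sub accepts a function as the replacement and calls it once per match. The names random_token and fill_placeholders are hypothetical helpers, not part of the original code.

```python
import random
import re
import string

def random_token(length=8):
    """Return a random lowercase string (hypothetical helper; the length is an assumption)."""
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(length))

def fill_placeholders(text, placeholder='{problem}'):
    """Replace every occurrence of placeholder with its own random token."""
    # re.escape makes the braces literal; the lambda runs once per match.
    return re.sub(re.escape(placeholder), lambda match: random_token(), text)

sample = "first {problem}\nuntouched line\nsecond {problem} and third {problem}\n"
result = fill_placeholders(sample)
print(result, end='')
```

Inside the fileinput loop, the same helper could be applied per line with print(fill_placeholders(line), end='').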

Related

Python: Reading line with 'readline()' function and appending to a list

My code:
In my file I have these numbers in a list:
charge_account = ['4654145', '9658115', '5658845', '5658045', '6181531', '2134874', '5964554']
I am reading the file with a function, appending each line to a list and then returning the list:
import os
os.system('cls')
def fileReader():
    contentList = []
    with open('charge_accounts.txt','r') as f:
        line = f.readline().rstrip('\n')
        while line !="":
            line = f.readline().rstrip(' \n')
            contentList.append(line)
        # print(contentList)
        # print(len(contentList))
        #contentList = contentList[:-1]
    print(contentList)
    return contentList
Now my question is: when I read all the file content and append it to my list, I get an extra blank string at the end of the list.
output:
['4654145', '9658115', '5658845', '5658045', '6181531', '2134874', '5964554', '']
I have worked around it with slicing (the commented-out line), but I still have not figured out why I am getting the '' at the end of the list. I tried filtering it out, but nothing happens. I have checked whether there is an extra line at the end of the file. What am I doing wrong?
There are a couple of things. You are reading the file line by line in the while loop, and each pass reads a new line before appending it. So after the last line has been read the while condition is still true, and the loop reads once more: readline() returns an empty string at end of file, and that empty string is still added to your list.
But you don't need a while loop: use lines = f.readlines(). It will read the whole file in a list, and you almost have the list you are aiming for. Almost, because you need to strip each element:
def fileReader():
    with open('charge_accounts.txt','r') as f:
        lines = f.readlines()
    return [line.strip() for line in lines]
print(fileReader())
while line !="":
    contentList.append(line)
    line = f.readline().rstrip(' \n')
print(contentList)
I realized I had to append the value from the priming readline() (the one I read before the loop started) to the list. contentList.append(line) had to be the first statement in the while loop. This removes the blank entry at the end of the list and, as I realize in hindsight, also means I had been skipping the first readline value.
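The append-then-read order described above can be sketched as a small self-contained function. This is an illustration under assumptions: io.StringIO stands in for charge_accounts.txt, the sample data is made up, and read_accounts is a hypothetical name. Stripping happens only when appending, so the end-of-file test still sees the raw value returned by readline().

```python
import io

def read_accounts(handle):
    """Collect every line using a priming read followed by append-then-read."""
    content = []
    line = handle.readline()
    while line != "":            # readline() returns "" only at end of file
        content.append(line.rstrip('\n'))
        line = handle.readline()
    return content

# io.StringIO stands in for charge_accounts.txt (sample data is an assumption)
fake_file = io.StringIO("4654145\n9658115\n5658845\n")
accounts = read_accounts(fake_file)
print(accounts)
```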

Python3 - Problem during removing a line from a text file

I am trying to delete a line from a text file after opening it and without storing it in any list variable using f.readlines() or anything like that.
I don't have the option of reading the file's contents into a variable, changing them, and writing them back or to another file. The file is constantly being appended to by another program, so I cannot do anything of that kind.
I am using f.seek() to reset the pointer to the beginning of the file, and using f.readline() as well as f.tell() to know the length of the first line. After that I am trying to replace each character with a blank space using while loop.
pos=0
eol = 0
ll=0
with open('file1.txt','rb+') as f:
    f.seek(pos,1) #position at the beginning of the file
    print(f.readline()) #reading the first line
    pos = f.tell() #storing the length of first line
#the while loop will run from 0 to pos and replace every character with blank space
while eol != pos:
    with open('file1.txt','rb+') as f:
        f.seek(eol,1)
        f.write(b' ')
    eol += 1 #incrementing the eol variable to move the file pointer to next character
The code works, but with one problem that I can't figure out.
For example, if this is the original file
file1.txt
this is line 1
this is line 2
this is line 3
after running the program , my output is
this is line 2
this is line 3
the first line is getting deleted but there is a bunch of white space in front of the 2nd line.
Maybe I am missing a simple logic here.
Any help will be appreciated.
Thank you
Update :
If I have understood it correctly, I have changed the code as follows, writing b'\r' (a carriage return) instead of b' ', which resulted in this:
the code :
while eol != pos-1:
    with open('file1.txt','rb+') as f:
        f.seek(eol,0)
        f.write(b'\r')
    eol += 1
the result :
original :
this is line 1
this is line 2
this is line 3
after execution
this is line 2
this is line 3
you see the 1st line is removed but followed with '\r'
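One observation that may explain the leftover whitespace: writing into a file opened with 'rb+' replaces bytes in place and never makes the file shorter, so the bytes of the first line stay in the file as whatever was written over them. A minimal sketch that shows this, assuming a temporary file stands in for file1.txt; blank_first_line is a hypothetical helper that overwrites the first line in a single write instead of reopening the file once per character.

```python
import os
import tempfile

def blank_first_line(path):
    """Overwrite the first line's bytes (not its newline) with spaces, in place."""
    with open(path, 'rb+') as f:
        first = f.readline()
        body_len = len(first.rstrip(b'\r\n'))
        f.seek(0)
        f.write(b' ' * body_len)

# A temporary file stands in for file1.txt (sample content is an assumption)
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'this is line 1\nthis is line 2\nthis is line 3\n')
    demo_path = tmp.name

size_before = os.path.getsize(demo_path)
blank_first_line(demo_path)
size_after = os.path.getsize(demo_path)
with open(demo_path, 'rb') as f:
    result = f.read()
os.remove(demo_path)

print(size_before, size_after)
print(result)
```

The file size is the same before and after, which is why the first line appears as blank space rather than disappearing.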

str.format places last variable first in print

The purpose of this script is to parse a text file (sys.argv[1]), extract certain strings, and print them in columns. I start by printing the header. Then I open the file, and scan through it, line by line. I make sure that the line has a specific start or contains a specific string, then I use regex to extract the specific value.
The matching and extraction work fine.
My final print statement doesn't work properly.
import re
import sys
print("{}\t{}\t{}\t{}\t{}".format("#query", "target", "e-value",
                                  "identity(%)", "score"))
with open(sys.argv[1], 'r') as blastR:
    for line in blastR:
        if line.startswith("Query="):
            queryIDMatch = re.match('Query= (([^ ])+)', line)
            queryID = queryIDMatch.group(1)
            queryID.rstrip
        if line[0] == '>':
            targetMatch = re.match('> (([^ ])+)', line)
            target = targetMatch.group(1)
            target.rstrip
        if "Score = " in line:
            eValue = re.search(r'Expect = (([^ ])+)', line)
            trueEvalue = eValue.group(1)
            trueEvalue = trueEvalue[:-1]
            trueEvalue.rstrip()
            print('{0}\t{1}\t{2}'.format(queryID, target, trueEvalue), end='')
The problem occurs when I try to print the columns. When I print the first 2 columns, it works as expected (except that it's still printing new lines):
#query target e-value identity(%) score
YAL002W Paxin1_129011
YAL003W Paxin1_167503
YAL005C Paxin1_162475
YAL005C Paxin1_167442
The 3rd column is a number in scientific notation like 2e-34
But when I add the 3rd column, eValue, it breaks down:
#query target e-value identity(%) score
YAL002W Paxin1_129011
4e-43YAL003W Paxin1_167503
1e-55YAL005C Paxin1_162475
0.0YAL005C Paxin1_167442
0.0YAL005C Paxin1_73182
I have removed all new lines, as far as I know, using the rstrip() method.
At least three problems:
1) queryID.rstrip and target.rstrip are missing the call parentheses (), so the method is never called.
2) Something like trueEvalue.rstrip() doesn't mutate the string; it returns a new one. You would need
trueEvalue = trueEvalue.rstrip()
if you want to keep the change.
3) This might be a problem, but without seeing your data I can't be 100% sure. The r in rstrip stands for "right". If trueEvalue is 4e-43\n then it is true that trueEvalue.rstrip() would be free of newlines. But your values seem to be something like \n4e-43. If you simply use .strip(), newlines will be removed from both sides.
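The three points above can be checked with a short, self-contained example. The sample value is made up for illustration; it only shows that strings are immutable and how rstrip() and strip() differ.

```python
raw = "\n4e-43\n"

raw.rstrip()                 # result is discarded; raw is unchanged
unchanged = raw

right_only = raw.rstrip()    # removes trailing whitespace only
both_sides = raw.strip()     # removes leading and trailing whitespace

print(repr(unchanged))
print(repr(right_only))
print(repr(both_sides))
```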

IndexError: list index out of range, but list length OK

New to programming, looking for a deeper understanding of what's happening.
Goal: open a file and print the first 10 lines. (similar to head command)
Code:
with open('file') as f:
    for i in range(0,10):
        print([line.strip('\n') for line in f][i])
Result: prints first line fine, then returns the out of range error
File: Is a simple text file with 20 lines, no more than 50 chars per line
FYI - Removed range line and printed both type(list) and length(20). Printed specific indexes without issue (unless >1 in a row)
Able to get the desired result with different code, but trying to improve using with/as
You can actually iterate over a file, which is what you should be doing here.
with open('file') as f:
    for i, line in enumerate(f, start=1):
        # Get out of the loop once we have printed 10 lines
        if i > 10:
            break
        # Line already has a '\n' at the end
        print(line, end='')
The reason that your code is failing is your list comprehension:
[line.strip('\n') for line in f]
The first time through your loop, that consumes all of the lines in your file. Now your file has no more lines to read, so the next time through it creates an empty list and tries to get the [1]st element. But that doesn't exist, because nothing was left to read once the file was at its end.
If you wanted to keep your code mostly as-is you could do
with open('file') as f:
    lines = [line.rstrip('\n') for line in f]
for i in range(10):
    print(lines[i])
But that's also silly, because you could just do
lines = f.readlines()
But that's also silly if you just want up to the 10th line, because you could do this (the lines returned by readlines() keep their '\n', so they are joined with an empty string):
with open('file') as f:
    print(''.join(f.readlines()[:10]), end='')
Some further explanation:
The shortest and worst way you could fix your code is by adding one line of code:
with open('file') as f:
    for i in range(0,10):
        f.seek(0) # Add this line
        print([line.strip('\n') for line in f][i])
Now your code will work - but this is a horrible way to get your code to work. The reason that your code isn't working the way you expect in the first place is that files are consumable iterators. That means that when you read from them eventually you run out of things to read. Here's a simple example:
import io
file = io.StringIO('''
This is is a file
It has some lines
okay, only three.
'''.strip())
for line in file:
    print(file.tell(), repr(line))
This outputs
18 'This is is a file\n'
36 'It has some lines\n'
53 'okay, only three.'
Now if you try to read from the file:
print(file.read())
You'll see that it doesn't output anything. That's because you've "consumed" the file. I mean obviously it's still on disk, but the iterator has reached the end of the file. But as shown, you can seek in the file.
print(file.tell())
file.seek(0)
print(file.tell())
print(file.read())
And you'll see your entire file printed. But what about those other positions?
file.seek(36)
print(file.read()) # => okay, only three.
As a side note, you can also specify how much to read:
file.seek(36)
print(file.read(4)) # => okay
print(file.tell()) # => 40
So when we read from a file or iterate over it we consume the iterator and get to the end of the file. Let's put your new tools to work and go back to your original code and explore what's happening.
with open('file') as f:
    print(f.tell())
    lines = [line.rstrip('\n') for line in f]
    print(f.tell())
    print(len([line for line in f]))
    print(lines)
You'll see that you're at a different location in the file. And the second list comprehension produces an empty list. That's because when a list comprehension is evaluated it executes immediately. So when you do this:
for i in range(10):
    print([line.strip('\n') for line in f][i])
The first time through, i = 0 and the list comprehension reads to the end of the file. Then it takes the [0]th element of the list, which is the first line in the file. But your file iterator is now at the end of the file.
So now we get back to the beginning of the loop and i = 1. We iterate to the end of the file again, but we're already at the end, so there are no lines to read and we get an empty list []. We then try to get its [1]st element, but there's nothing there. So we get an IndexError.
List comprehensions can be useful, but when you're beginning it's usually much easier to write a for loop and then turn it into a list comprehension. So you might write something like this:
with open('file') as f:
    for i, line in enumerate(f):
        if i < 10:
            print(line.rstrip())
Now, we shouldn't print inside a list comprehension, so instead we'll collect everything. We start out by putting what we want:
[line.rstrip()
Now add the for bit:
[line.rstrip() for i, line in enumerate(f)
And finally add the filter and our closing bracket:
[line.rstrip() for i, line in enumerate(f) if i < 10]
For more on list comprehensions, this is a fantastic resource: http://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/
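One caveat with the filtered comprehension above: the condition i < 10 is checked for every line, so the whole file is still iterated. A minimal sketch of an alternative, assuming only the first lines are wanted: itertools.islice stops the iteration after a fixed number of items. The helper name head and the sample data are assumptions for illustration.

```python
import io
from itertools import islice

def head(handle, count=10):
    """Return up to count lines from handle, without trailing newlines."""
    # islice ends the iteration after count items instead of filtering every line.
    return [line.rstrip('\n') for line in islice(handle, count)]

# io.StringIO stands in for a 20-line text file (sample data is an assumption)
fake_file = io.StringIO(''.join('line {}\n'.format(n) for n in range(1, 21)))
first_ten = head(fake_file)
print(first_ten)
```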

Using python to remove nearly identical lines in txt file, with the exception of first and last lines

Here is a snippet from a text file I am working on.
http://pastebin.com/4Uba5i4P
I would like to use python to detect those big repeating "~ Move" lines (Which are not identical except for the "~ Move" part.), and remove all but the first and last of those lines.
How would I start to go about this?
You could read the file line by line like this:
`## Open the file with read only permit
f = open('myTextFile.txt')
## Read the first line
line = f.readline()
## If the file is not empty keep reading one line at a time
## till the end of the file
while line:
    print(line)
    line = f.readline()
f.close()`
With this you could then edit this sample to test each line with a string search like this:
`if line.find("~Move") == -1:
    break
else:
    line = line[5:-1]`
Though this assumes that the ~Move is all at the beginning of the line. Hope this helps, if not leave a comment and I'll try and help.
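For the actual goal of the question (keep only the first and last line of each repeating block), here is a minimal sketch using itertools.groupby. It assumes the repeating lines can be recognized by the substring "~ Move" and that a block means consecutive lines; the pastebin content is not visible here, so the marker, the sample data, and the helper name collapse_runs are all assumptions.

```python
from itertools import groupby

def collapse_runs(lines, marker='~ Move'):
    """Keep only the first and last line of each consecutive run containing marker."""
    kept = []
    for is_move, run in groupby(lines, key=lambda line: marker in line):
        run = list(run)
        if is_move and len(run) > 2:
            # Drop everything between the first and last line of the run.
            kept.extend([run[0], run[-1]])
        else:
            kept.extend(run)
    return kept

sample = [
    'header',
    '~ Move 1',
    '~ Move 2',
    '~ Move 3',
    'middle',
    '~ Move 4',
    'footer',
]
print(collapse_runs(sample))
```

The same function works on the lines of a file, for example collapse_runs(f.readlines()), since the marker test does not depend on trailing newlines.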
