How to modify and print list items in python? - python-3.x

I am a beginner in python, working on a small logic, i have a text file with html links in it, line by line. I have to read each line of the file, and print the individual links with same prefix and suffix,
so that the model looks like this.
<item>LINK1</item>
<item>LINK2</item>
<item>LINK3</item>
and so on.
I have tried this code, but something is wrong in my approach,
def file_read(fname):
with open(fname) as f:
#Content_list is the list that contains the read lines.
content_list = f.readlines()
for i in content_list:
print(str("<item>") + i + str("</item>"))
file_read(r"C:\Users\mandy\Desktop\gd.txt")
In the output, the suffix was not as expected, as i am a beginner, can anyone sort this out for me?
<item>www.google.com
</item>
<item>www.bing.com
</item>

I think when you use .readLine you also put the end of line character into i.
If i understand you correctly and you want to print
item www.google.com item
Then try
https://www.journaldev.com/23625/python-trim-string-rstrip-lstrip-strip
print(str("") + i.strip() + str(""))

When you use the readlines() method, it also includes the newline character from your file ("\n") before parsing the next line.
You could use a method called .strip() which strips off spaces or newline characters from the beginning and end of each line which would correctly format your code.
def file_read(fname):
with open(fname) as f:
#Content_list is the list that contains the read lines.
content_list = f.readlines()
for i in content_list:
print(str("<item>") + i.strip() + str("</item>"))
file_read(r"C:\Users\mandy\Desktop\gd.txt")

I assume you wanted to print in the following way
www.google.com
When you use readlines it gives extra '\n' at end of each line. to avoid that you can strip the string and in printing you can use fstrings.
with open(fname) as f:
lin=f.readlines()
for i in lin:
print(f"<item>{i.strip()}<item>")
Another method:
with open('stacksource') as f:
lin=f.read().splitlines()
for i in lin:
print(f"<item>{i}<item>")
Here splitlines() splits the lines and gives a list

Related

Text parsing with specific element Python

I have an aligment result with multiple sequences as text file. I want to split each result into new text file. Far now I can detect each sequence with '>', and split into files. However, new text files writen without line that contains '>'.
with open("result.txt",'r') as fo:
start=0
op= ' '
cntr=1
# print(fo.readlines())
for x in fo.readlines():
# print(x)
if (x[0]== '>'):
if (start==1):
with open(str(cntr)+'.txt','w') as opf:
opf.write(op)
opf.close()
op= ' '
cntr+=1
else:
start=1
else:
if (op==''):
op=x
else:
op= op + '\n' + x
fo.close()
print('completed')
>P51051.1 RecName: Full=Melatonin receptor type 1B; Short=Mel-1B-R; Short=Mel1b
receptor [Xenopus laevis]
Length=152
this is how I want to see as a beginning of each text file but they start as
receptor [Xenopus laevis]
Length=152
How can I include from the beginning?
You can do it like this:
with open("result.txt", encoding='utf-8') as fo:
for index, txt in enumerate(fo.read().split(">")):
if txt:
with open(f'{index}.txt', 'w') as opf:
opf.write(txt)
You should provide the encoding of the file e.g. utf-8, no need to specify read r, there is no need to close the file if you are using a context manager i.e. with and you just need to use read instead of readlines to get a string then call split on the string. I'm using enumerate to get a counter as well as enumerate objects. And f-string as it is a better way for string concatenation.

Remove all text after last occurence in file

I have a bunch of lines inside a text file that looks like this
STANGHOLMEN_BVP01_03_ME41_DELTAT_PV
STANGHOLMEN_TA02_TF01_FO_OP
STANGHOLMEN_VV01_PV01_SP2
STANGHOLMEN_VS01_GT11_EFFBEG_X1
I am trying to remove the text after the last occurrence of _
So this is how i try to make my text look
STANGHOLMEN_BVP01_03_ME41_DELTAT
STANGHOLMEN_TA02_TF01_FO
STANGHOLMEN_VV01_PV01
STANGHOLMEN_VS01_GT11_EFFBEG
its usually around 700 lines, Best way to do this?
You can parse the file line by line and add the content to a new file. To split the string you can use rsplit with maxsplit=1.
>>> with open("f_in.txt") as f_in, open("f_out.txt","w") as f_out:
... for line in f_in:
... f_out.write(line.rsplit('_', maxsplit=1)[0])
... f_out.write("\n")
You can user rfind() (returning index of substring looking from right side in simple words) from standard library, it will be the simplest way, but not so reliable.
last_index = string.rfind("_")
Next you have to slice yours string
new_string = string[:index]
You can use rsplit() and use the index[0] value.
For example if txt = 'STANGHOLMEN_VS01_GT11_EFFBEG_X1
txt1 = txt.rsplit('_',1)[0] will give you the values upto EFFBEG.
with open("f_in.txt") as f_in, open("f_out.txt","w") as f_out:
for line in f_in:
f_out.write(line.rsplit('_', maxsplit=1)[0])
f_out.write("\n")
This worked, however now all my text is in a long line, before it was sorted in lines.

file reading in python usnig different methods

# open file in read mode
f=open(text_file,'r')
# iterate over the file object
for line in f.read():
print(line)
# close the file
f.close()
the content of file is "Congratulations you have successfully opened the file"! when i try to run this code the output comes in following form:
c (newline) o (newline) n (newline) g.................
...... that is each character is printed individually on a new line because i used read()! but with readline it gives the answer in a single line! why is it so?
r.read() returns one string will all characters (the full file content).
Iterating a string iterates it character wise.
Use
for line in f: # no read()
instead to iterate line wise.
f.read() returns the whole file in a string. for i in iterates something. For a string, it iterates over its characters.
For readline(), it should not print the line. It would read the first line of the file, then print it character by character, like read. Is it possible that you used readlines(), which returns the lines as a list.
One more thing: there is with which takes a "closable" object and auto-closes it at the end of scope. And you can iterate over a file object. So, your code can be improved like this:
with open(text_file, 'r') as f:
for i in f:
print(i)

Split text in text file into lines

I need to split a text file into lines.
I imported the text file into python but print(readline()) prints the whole file.
with open('laxdaela_saga.en.txt', 'r+') as f:
for line in f.readlines():
print(line)
I eventually need to count unique words in the text file and other stats, but one step is to divide into lines. This is the step I'm dealing with.
You can use split() function of Python. It splits the given string into an array based on some pattern.
In your case, the pattern will be newline \n.
so split('\n') should do it.
Try this
with open('laxdaela_saga.en.txt', 'r+') as f:
for line in f.readlines():
x = line.split()
print(x)
Hope this will be of your help.

Search text file for word from list then output word that matched in Python 3.x

I have been searching a large directory of text files for files that match a list of words. How do I have python output the word from the list that matches?
This is what I have so far. It writes the file name every time one of the words from the list is found. I want to add the matching word to the line with the file name so I have the file name and 1 matched word each time. How do I do that?
ngwrds= ['words'...]
for filename in os.listdir(os.getcwd()):
with open(filename, 'r') as searchfile:
for line in searchfile:
if any(x in line for x in ngwrds):
with open("keyword.txt", 'a') as out:
out.write(filename + '\n')
The input is a long text file a line might read like this:
The company reported depreciation of $1.20.
The if one of the search words from the list was depreciation then the output file would look like this:
filename depreciation
Thank you.
I am not sure what out is and I can't run your code from where I am but you could try something like this:
ngwrds= ['words'...]
for filename in os.listdir(os.getcwd()):
with open(filename, 'r') as searchfile:
for line in searchfile:
line = line.strip().split(" ")
for word in line:
if word in ngwrds:
out.write(filename + " " + word)
strip gets rid of whitespace on either end of line. split returns a list of the words in line.

Resources