I need to split a text file into lines.
I imported the text file into Python, but print(readline()) prints the whole file.
with open('laxdaela_saga.en.txt', 'r+') as f:
    for line in f.readlines():
        print(line)
I eventually need to count the unique words in the text file and compute other statistics, but one step is to divide it into lines. This is the step I'm dealing with.
You can use Python's split() method. It splits the given string into a list based on a separator.
In your case, the separator is the newline character \n,
so split('\n') should do it.
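A quick way to see how split('\n') behaves on text that ends with a newline, compared with splitlines(). The sample string is made up and stands in for the result of f.read():

```python
# Hypothetical sample text standing in for the contents of a file read with f.read()
text = "first line\nsecond line\nthird line\n"

# split('\n') leaves an empty string after the final newline
by_split = text.split("\n")
print(by_split)  # ['first line', 'second line', 'third line', '']

# splitlines() drops the line endings without producing a trailing empty item
by_splitlines = text.splitlines()
print(by_splitlines)  # ['first line', 'second line', 'third line']
```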
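Try this. Iterating over the file already gives you one line at a time, and split() with no argument then splits each line into words on whitespace: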
with open('laxdaela_saga.en.txt', 'r+') as f:
    for line in f.readlines():
        x = line.split()
        print(x)
Hope this helps.
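Since the eventual goal is to count unique words, here is a minimal sketch using collections.Counter on top of the line-by-line split. It assumes a "word" is any whitespace-separated token, compared case-insensitively, with punctuation left attached; the function name count_words and the sample lines are hypothetical:

```python
from collections import Counter

def count_words(lines):
    """Count whitespace-separated words across an iterable of lines.

    Assumption: words are compared case-insensitively and punctuation is
    left attached (so "saga," and "saga" count as different words).
    """
    counts = Counter()
    for line in lines:
        # split() with no argument splits on any run of whitespace
        # and ignores the trailing newline
        counts.update(word.lower() for word in line.split())
    return counts

# Made-up sample; a file object works too, since iterating it yields lines
sample = ["the cat sat\n", "The dog sat\n"]
counts = count_words(sample)
print(len(counts))  # 4 unique words
print(counts.most_common(2))
```

With the real file you could pass the file object directly, e.g. `with open('laxdaela_saga.en.txt') as f: counts = count_words(f)`, and then read `len(counts)` for the number of unique words.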
I have a text file with a bunch of lines that look like this:
STANGHOLMEN_BVP01_03_ME41_DELTAT_PV
STANGHOLMEN_TA02_TF01_FO_OP
STANGHOLMEN_VV01_PV01_SP2
STANGHOLMEN_VS01_GT11_EFFBEG_X1
I am trying to remove the last occurrence of _ and the text after it.
This is how I want the text to look:
STANGHOLMEN_BVP01_03_ME41_DELTAT
STANGHOLMEN_TA02_TF01_FO
STANGHOLMEN_VV01_PV01
STANGHOLMEN_VS01_GT11_EFFBEG
It's usually around 700 lines. What is the best way to do this?
You can read the file line by line and write the result to a new file. To split each string, you can use rsplit() with maxsplit=1.
>>> with open("f_in.txt") as f_in, open("f_out.txt", "w") as f_out:
...     for line in f_in:
...         f_out.write(line.rsplit('_', maxsplit=1)[0])
...         f_out.write("\n")
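You can use rfind() from the standard library, which returns the index of the last occurrence of a substring (in simple words, it searches from the right side). It is the simplest way, but not as robust: if the string contains no _, rfind() returns -1, and slicing with that index silently drops the last character.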
last_index = string.rfind("_")
Next, slice your string up to that index:
new_string = string[:last_index]
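A minimal sketch of the rfind() approach that guards against the -1 case described above. The helper name strip_last_field is hypothetical:

```python
def strip_last_field(s, sep="_"):
    """Remove the last `sep` and everything after it.

    If `sep` does not occur, rfind() returns -1; slicing with s[:-1] would
    silently drop the last character, so return the string unchanged instead.
    """
    last_index = s.rfind(sep)
    if last_index == -1:
        return s
    return s[:last_index]

print(strip_last_field("STANGHOLMEN_VV01_PV01_SP2"))  # STANGHOLMEN_VV01_PV01
print(strip_last_field("NOUNDERSCORE"))  # NOUNDERSCORE
```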
You can use rsplit() and take the element at index 0.
For example, if txt = 'STANGHOLMEN_VS01_GT11_EFFBEG_X1',
then txt1 = txt.rsplit('_', 1)[0] gives you everything up to EFFBEG, i.e. 'STANGHOLMEN_VS01_GT11_EFFBEG'.
with open("f_in.txt") as f_in, open("f_out.txt", "w") as f_out:
    for line in f_in:
        f_out.write(line.rsplit('_', maxsplit=1)[0])
        f_out.write("\n")
This worked; however, now all my text is on one long line, whereas before it was split into lines.
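One likely cause of the single long line: rsplit() leaves the trailing "\n" in the part that gets discarded, so the newline only survives if it is written back explicitly (which is what the f_out.write("\n") call does). A sketch that strips the newline first and then writes it back, so lines without an _ don't end up with a doubled newline either. The function name trim_lines is hypothetical, and in-memory io.StringIO objects stand in for f_in.txt and f_out.txt:

```python
import io

def trim_lines(f_in, f_out, sep="_"):
    """Write each input line to f_out without its last `sep`-separated field.

    The trailing newline is removed first so that it is not thrown away
    together with the last field; it is then written back explicitly,
    which keeps one record per output line.
    """
    for line in f_in:
        line = line.rstrip("\n")
        f_out.write(line.rsplit(sep, maxsplit=1)[0] + "\n")

# In-memory files stand in for f_in.txt / f_out.txt here
src = io.StringIO(
    "STANGHOLMEN_BVP01_03_ME41_DELTAT_PV\n"
    "STANGHOLMEN_TA02_TF01_FO_OP\n"
)
dst = io.StringIO()
trim_lines(src, dst)
result = dst.getvalue()
print(result)
```

With real files the call would be `with open("f_in.txt") as f_in, open("f_out.txt", "w") as f_out: trim_lines(f_in, f_out)`.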
I am a beginner in Python working on a small piece of logic. I have a text file with HTML links in it, one per line. I have to read each line of the file and print each link with the same prefix and suffix,
so that the output looks like this:
<item>LINK1</item>
<item>LINK2</item>
<item>LINK3</item>
and so on.
I have tried this code, but something is wrong with my approach:
def file_read(fname):
    with open(fname) as f:
        # content_list is the list that contains the read lines.
        content_list = f.readlines()
        for i in content_list:
            print(str("<item>") + i + str("</item>"))

file_read(r"C:\Users\mandy\Desktop\gd.txt")
In the output, the suffix is not where I expected it. As I am a beginner, can anyone sort this out for me?
<item>www.google.com
</item>
<item>www.bing.com
</item>
I think that when you use .readlines() you also get the end-of-line character in i.
If I understand you correctly, you want to print
<item>www.google.com</item>
Then try this (more on strip(), rstrip(), and lstrip(): https://www.journaldev.com/23625/python-trim-string-rstrip-lstrip-strip):
print(str("<item>") + i.strip() + str("</item>"))
When you use the readlines() method, each line it returns still ends with the newline character ("\n") from your file.
You can use the .strip() method, which removes whitespace (including spaces and newline characters) from the beginning and end of each line, so the output is formatted correctly:
def file_read(fname):
    with open(fname) as f:
        # content_list is the list that contains the read lines.
        content_list = f.readlines()
        for i in content_list:
            print(str("<item>") + i.strip() + str("</item>"))

file_read(r"C:\Users\mandy\Desktop\gd.txt")
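To see the newline that readlines() keeps, and the difference between strip() (removes all surrounding whitespace) and rstrip("\n") (removes only the line ending), here is a small demonstration on an in-memory file. The link text is sample data, and io.StringIO stands in for a file opened in text mode:

```python
import io

# io.StringIO behaves like a file opened in text mode
f = io.StringIO("www.google.com\n  www.bing.com  \n")
lines = f.readlines()
print(lines)  # ['www.google.com\n', '  www.bing.com  \n']

for line in lines:
    # strip() trims spaces and the newline; rstrip("\n") only trims the newline
    print(repr(line.strip()), repr(line.rstrip("\n")))
# 'www.google.com' 'www.google.com'
# 'www.bing.com' '  www.bing.com  '
```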
I assume you wanted to print in the following way:
<item>www.google.com</item>
When you use readlines(), each line has an extra '\n' at the end. To avoid that, you can strip the string, and for printing you can use f-strings.
with open(fname) as f:  # fname is the path to your file, as in the question
    lin = f.readlines()
    for i in lin:
        print(f"<item>{i.strip()}</item>")
Another method:
with open('stacksource') as f:
    lin = f.read().splitlines()
    for i in lin:
        print(f"<item>{i}</item>")
Here, splitlines() splits the text into lines and returns them as a list, without the trailing newline characters.
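If the file contains an empty line (for example, a blank line at the end), the loops above would print an empty item for it. A small sketch that skips blank lines and builds the whole output as one string; the function name wrap_links is hypothetical:

```python
def wrap_links(text, prefix="<item>", suffix="</item>"):
    """Wrap every non-blank line of `text` in prefix/suffix."""
    return "\n".join(
        f"{prefix}{line.strip()}{suffix}"
        for line in text.splitlines()
        if line.strip()  # skip empty or whitespace-only lines
    )

# Usage with the file from the question:
# with open(r"C:\Users\mandy\Desktop\gd.txt") as f:
#     print(wrap_links(f.read()))

print(wrap_links("www.google.com\nwww.bing.com\n\n"))
# <item>www.google.com</item>
# <item>www.bing.com</item>
```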
Suppose I have a file.txt where each line contains six values separated by commas:
a,b,c,d,e,f
How can I get each line as a list of the form [a,b,c,d,e,f]?
I would suggest using .split(), like this:
with open("file.txt", "r") as f:
    for line in f:
        # strip() removes the trailing newline before splitting on commas
        split_line = line.strip().split(",")
        print(split_line)
This prints the values on each line as a list.
If you have any questions about why or how this works, just ask! :D
Thanks for your tips, I've finally managed to do what I wanted with this piece of code:
with open(l, "r") as f:
    for line in f:
        inner_list = [elt.strip() for elt in line.split(',')]
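As an alternative to splitting by hand (a swapped-in technique, not what the answers above used), the standard-library csv module already turns each line into a list and also copes with quoted fields that contain commas. A minimal sketch, with io.StringIO standing in for `open("file.txt", newline="")` and made-up sample rows:

```python
import csv
import io

# io.StringIO stands in for open("file.txt", newline="")
f = io.StringIO("a,b,c,d,e,f\n1, 2, 3, 4, 5, 6\n")

rows = []
for row in csv.reader(f):
    # csv.reader already splits on commas; strip() removes stray spaces
    rows.append([field.strip() for field in row])

print(rows)  # [['a', 'b', 'c', 'd', 'e', 'f'], ['1', '2', '3', '4', '5', '6']]
```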
So basically I have a list in a file, and I only want to print the lines containing an A.
Here is a small part of the list:
E5341,21/09/2015,C102,440,E,0
E5342,21/09/2015,C103,290,A,290
E5343,21/09/2015,C104,730,N,0
E5344,22/09/2015,C105,180,A,180
E5345,22/09/2015,C106,815,A,400
So I only want to print the lines containing A.
Sorry, I'm still new at Python.
I tried using a single print to print the whole line but ended up failing; I guess I will always suck at Python.
You just have to:
open the file
read the lines
split each line at ","
if the fifth part of the split line is equal to "A", print the line
Code:
filepath = 'file.txt'
with open(filepath, 'r') as f:
    lines = f.readlines()
for line in lines:
    if line.split(',')[4] == "A":
        print(line, end="")  # line already ends with "\n", so don't add another
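The code above raises IndexError on any line with fewer than five fields (a blank trailing line, for instance, splits into just ['\n']). A sketch that skips such lines and compares the fifth field after stripping spaces; the function name lines_with_status is hypothetical, and the sample rows come from the question:

```python
def lines_with_status(lines, status="A", column=4):
    """Yield lines whose field at `column` (0-based; 4 = fifth) equals `status`.

    Lines with fewer fields (e.g. a blank trailing line) are skipped instead
    of raising IndexError.
    """
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) > column and fields[column].strip() == status:
            yield line.rstrip("\n")

sample = [
    "E5341,21/09/2015,C102,440,E,0\n",
    "E5342,21/09/2015,C103,290,A,290\n",
    "E5343,21/09/2015,C104,730,N,0\n",
    "\n",
]
for match in lines_with_status(sample):
    print(match)  # E5342,21/09/2015,C103,290,A,290
```

With the real file: `with open('file.txt') as f: for line in lines_with_status(f): print(line)`.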
I have been searching a large directory of text files for files that match a list of words. How do I make Python output the word from the list that matches?
This is what I have so far. It writes the file name every time one of the words from the list is found. I want to add the matching word to the line with the file name, so that I get the file name and one matched word each time. How do I do that?
import os

ngwrds= ['words'...]
for filename in os.listdir(os.getcwd()):
    with open(filename, 'r') as searchfile:
        for line in searchfile:
            if any(x in line for x in ngwrds):
                with open("keyword.txt", 'a') as out:
                    out.write(filename + '\n')
The input is a long text file; a line might read like this:
The company reported depreciation of $1.20.
Then, if one of the search words from the list was depreciation, the output file would look like this:
filename depreciation
Thank you.
I am not sure what out is, and I can't run your code from where I am, but you could try something like this:
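import os

ngwrds= ['words'...]
with open("keyword.txt", "a") as out:  # output file from the question
    for filename in os.listdir(os.getcwd()):
        with open(filename, 'r') as searchfile:
            for line in searchfile:
                line = line.strip().split(" ")
                for word in line:
                    if word in ngwrds:
                        # one "filename word" entry per line
                        out.write(filename + " " + word + "\n")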
strip() removes whitespace from both ends of the line, and split(" ") returns a list of the words in the line.
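One caveat with split(" "): punctuation stays attached to words, so a line ending in "depreciation." would not match "depreciation". A sketch that strips punctuation from both ends of each word and uses a set for the keyword lookup. The function name find_matches and the file name report.txt are hypothetical, and matching is assumed to be whole-word and case-sensitive:

```python
import string

def find_matches(filename, lines, keywords):
    """Return "filename word" entries for every keyword found in `lines`.

    Assumptions: matching is on whole words, case-sensitive, and punctuation
    at either end of a word is ignored, so "depreciation." still matches.
    """
    keywords = set(keywords)  # set membership checks are fast
    results = []
    for line in lines:
        for word in line.split():
            word = word.strip(string.punctuation)
            if word in keywords:
                results.append(f"{filename} {word}")
    return results

lines = ["The company reported depreciation of $1.20.\n"]
print(find_matches("report.txt", lines, ["depreciation", "revenue"]))
# ['report.txt depreciation']
```

Inside the os.listdir() loop you would call it with the open file as `lines` and write each returned entry followed by "\n" to the output file.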