Remove all text after last occurrence in file - python-3.x

I have a bunch of lines inside a text file that look like this:
STANGHOLMEN_BVP01_03_ME41_DELTAT_PV
STANGHOLMEN_TA02_TF01_FO_OP
STANGHOLMEN_VV01_PV01_SP2
STANGHOLMEN_VS01_GT11_EFFBEG_X1
I am trying to remove the text after the last occurrence of _
so that my text looks like this:
STANGHOLMEN_BVP01_03_ME41_DELTAT
STANGHOLMEN_TA02_TF01_FO
STANGHOLMEN_VV01_PV01
STANGHOLMEN_VS01_GT11_EFFBEG
It's usually around 700 lines. What is the best way to do this?

You can parse the file line by line and add the content to a new file. To split the string you can use rsplit with maxsplit=1.
>>> with open("f_in.txt") as f_in, open("f_out.txt", "w") as f_out:
...     for line in f_in:
...         f_out.write(line.rsplit('_', maxsplit=1)[0])
...         f_out.write("\n")

You can use rfind() from the standard library, which returns the index of the last occurrence of a substring (in simple words, it searches from the right side). It is the simplest way, but not so reliable (it returns -1 if there is no _ in the string).
last_index = string.rfind("_")
Next you have to slice your string:
new_string = string[:last_index]
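A minimal sketch of the rfind() approach above, with the fragile case handled explicitly: when the separator is missing, rfind() returns -1, and slicing with -1 would silently drop the last character. The helper name strip_last_field is hypothetical, not something from the question.

```python
def strip_last_field(text, sep="_"):
    """Return text up to (not including) the last sep; unchanged if sep is absent."""
    last_index = text.rfind(sep)
    if last_index == -1:
        # No separator found: text[:-1] would cut off the final character,
        # so return the string as it is.
        return text
    return text[:last_index]


print(strip_last_field("STANGHOLMEN_VV01_PV01_SP2"))
print(strip_last_field("NOUNDERSCORE"))
```

Whether a line without any _ should be kept unchanged or reported as an error depends on your data; keeping it is only an assumption here.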

You can use rsplit() and take the element at index 0.
For example, if txt = 'STANGHOLMEN_VS01_GT11_EFFBEG_X1', then
txt1 = txt.rsplit('_', 1)[0] will give you the value up to EFFBEG.

with open("f_in.txt") as f_in, open("f_out.txt", "w") as f_out:
    for line in f_in:
        f_out.write(line.rsplit('_', maxsplit=1)[0])
        f_out.write("\n")
This worked; however, now all my text is in one long line. Before, it was split into lines.
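One way to make sure every entry stays on its own line is to strip the newline first, cut at the last _, and then write the newline back explicitly. This is only a sketch: the function name trim_lines is hypothetical, and io.StringIO stands in for f_in.txt and f_out.txt so it can be tried without touching the disk.

```python
import io


def trim_lines(lines):
    """Yield each line without the text after its last underscore, newline-terminated."""
    for line in lines:
        # Remove the line ending first so it cannot get lost in the split.
        stem = line.rstrip("\n").rsplit("_", maxsplit=1)[0]
        yield stem + "\n"


# In-memory stand-ins for the input and output files.
f_in = io.StringIO("STANGHOLMEN_TA02_TF01_FO_OP\nSTANGHOLMEN_VV01_PV01_SP2\n")
f_out = io.StringIO()
f_out.writelines(trim_lines(f_in))
print(f_out.getvalue(), end="")
```

With real files, the same generator can be passed to f_out.writelines() inside the with block from the answer.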

Related

How to modify and print list items in python?

I am a beginner in Python, working on a small piece of logic. I have a text file with HTML links in it, one per line. I have to read each line of the file and print each link with the same prefix and suffix,
so that the output looks like this:
<item>LINK1</item>
<item>LINK2</item>
<item>LINK3</item>
and so on.
I have tried this code, but something is wrong in my approach:
def file_read(fname):
    with open(fname) as f:
        # content_list is the list that contains the read lines.
        content_list = f.readlines()
        for i in content_list:
            print(str("<item>") + i + str("</item>"))
file_read(r"C:\Users\mandy\Desktop\gd.txt")
In the output, the suffix was not where I expected it. As I am a beginner, can anyone sort this out for me?
<item>www.google.com
</item>
<item>www.bing.com
</item>
I think when you use .readlines() you also get the end-of-line character in i.
If I understand you correctly, you want to print
<item>www.google.com</item>
Then try strip():
https://www.journaldev.com/23625/python-trim-string-rstrip-lstrip-strip
print(str("<item>") + i.strip() + str("</item>"))
When you use the readlines() method, each line keeps the newline character ("\n") from your file at its end.
You could use the .strip() method, which removes whitespace (including newline characters) from the beginning and end of each line. That gives you the correctly formatted output.
def file_read(fname):
    with open(fname) as f:
        # content_list is the list that contains the read lines.
        content_list = f.readlines()
        for i in content_list:
            print(str("<item>") + i.strip() + str("</item>"))
file_read(r"C:\Users\mandy\Desktop\gd.txt")
I assume you wanted to print in the following way:
<item>www.google.com</item>
When you use readlines(), each line has an extra '\n' at the end. To avoid that, you can strip the string, and for printing you can use f-strings.
with open(fname) as f:
    lin = f.readlines()
    for i in lin:
        print(f"<item>{i.strip()}</item>")
Another method:
with open('stacksource') as f:
    lin = f.read().splitlines()
    for i in lin:
        print(f"<item>{i}</item>")
Here splitlines() splits the text into lines and returns a list without the newline characters.
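To put the answers above together, here is a small sketch that wraps every link in the tags and also skips blank lines. Skipping blank lines is an assumption on my side (the question does not say what should happen with them), and the name wrap_items is hypothetical. io.StringIO stands in for gd.txt; a file opened with open() can be passed in the same way.

```python
import io


def wrap_items(lines):
    """Wrap each non-empty line in <item> tags after stripping whitespace."""
    return [f"<item>{line.strip()}</item>" for line in lines if line.strip()]


# In-memory stand-in for the text file with one link per line.
sample = io.StringIO("www.google.com\nwww.bing.com\n")
for item in wrap_items(sample):
    print(item)
```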

Split text in text file into lines

I need to split a text file into lines.
I imported the text file into python but print(readline()) prints the whole file.
with open('laxdaela_saga.en.txt', 'r+') as f:
    for line in f.readlines():
        print(line)
I eventually need to count unique words in the text file and other stats, but one step is to divide into lines. This is the step I'm dealing with.
You can use Python's str.split() method. It splits the given string into a list based on a separator.
In your case, the separator for lines is the newline \n,
so split('\n') on the whole text should do it.
Try this:
with open('laxdaela_saga.en.txt', 'r+') as f:
    for line in f.readlines():
        # split() with no arguments splits the line into words
        x = line.split()
        print(x)
Hope this helps.
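Since the end goal is counting unique words, here is a rough sketch of that next step. It assumes words are separated by whitespace and compared case-insensitively; punctuation is not removed, so "saga" and "saga." would count as two words. The function name unique_words and the sample text are made up for illustration.

```python
import io


def unique_words(lines):
    """Collect the distinct whitespace-separated words, lowercased."""
    words = set()
    for line in lines:
        words.update(word.lower() for word in line.split())
    return words


# Made-up sample text standing in for the opened file.
sample = io.StringIO("the cat sat\nThe dog sat\n")
print(len(unique_words(sample)))
```

A file object can be passed directly, because iterating over it already yields one line at a time.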

How do I replace the 4th item in a list that is in a file that starts with a particular string?

I need to search for a name in a file and, in the line starting with that name, replace the fourth item in the list that is separated by commas. I have begun trying to program this with the following code, but I have not got it to work.
with open("SampleFile.txt", "r") as f:
    newline=[]
    for word in f.line():
        newline.append(word.replace(str(String1), str(String2)))
with open("SampleFile.txt", "w") as f:
    for line in newline :
        f.writelines(line)
#this piece of code replaced every occurrence of String1 with String 2
f = open("SampleFile.txt", "r")
for line in f:
    if line.startswith(Name):
        if line.contains(String1):
            newline = line.replace(str(String1), str(String2))
#this came up with a syntax error
It would help people answer your question if you gave some dummy data. I suggest you back up your data: you can save the edited data to a new file, or you can copy the old file to a backup folder before working on the data (think about using "from shutil import copyfile" and then "copyfile(src, dst)"). Otherwise a mistake could easily ruin your data without any easy way to restore it.
You can't replace the string with "newline = line.replace(str(String1), str(String2))"! Think about "strong" as your search term and a line like "Armstrong,Paul,strong,44": if you replace "strong" with "weak" you would get "Armweak,Paul,weak,44".
I hope the following code helps you:
filename = "SampleFile.txt"
filename_new = filename.replace(".", "_new.")
search_term = "Smith"
with open(filename) as src, open(filename_new, 'w') as dst:
    for line in src:
        if line.startswith(search_term):
            items = line.split(",")
            items[4-1] = items[4-1].replace("old", "new")
            line = ",".join(items)
        dst.write(line)
If you work with a csv-file you should have a look at the csv module.
PS: My files contain the following data (the file names are not in the files!):
SampleFile.txt           SampleFile_new.txt
Adams,George,m,old,34    Adams,George,m,old,34
Adams,Tracy,f,old,32     Adams,Tracy,f,old,32
Smith,John,m,old,53      Smith,John,m,new,53
Man,Emily,w,old,44       Man,Emily,w,old,44
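For reference, the csv module mentioned above could be used roughly like this. It is a sketch under a few assumptions: the function name replace_fourth_field is hypothetical, io.StringIO stands in for the two files, and the name is compared with the whole first field (row[0] == name) instead of using startswith(), which is stricter than the code in the answer.

```python
import csv
import io


def replace_fourth_field(src, dst, name, old, new):
    """Copy CSV rows from src to dst, editing field 4 of rows whose first field is name."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        # Only touch rows that belong to the name and actually have a fourth field.
        if row and row[0] == name and len(row) >= 4:
            row[3] = row[3].replace(old, new)
        writer.writerow(row)


# In-memory stand-ins for SampleFile.txt and SampleFile_new.txt.
src = io.StringIO("Adams,George,m,old,34\nSmith,John,m,old,53\n")
dst = io.StringIO()
replace_fourth_field(src, dst, "Smith", "old", "new")
print(dst.getvalue(), end="")
```

With real files, the csv documentation recommends opening them with newline="" so the module can handle line endings itself.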

Search text file for word from list then output word that matched in Python 3.x

I have been searching a large directory of text files for files that match a list of words. How do I have python output the word from the list that matches?
This is what I have so far. It writes the file name every time one of the words from the list is found. I want to add the matching word to the line with the file name so I have the file name and 1 matched word each time. How do I do that?
ngwrds= ['words'...]
for filename in os.listdir(os.getcwd()):
    with open(filename, 'r') as searchfile:
        for line in searchfile:
            if any(x in line for x in ngwrds):
                with open("keyword.txt", 'a') as out:
                    out.write(filename + '\n')
The input is a long text file; a line might read like this:
The company reported depreciation of $1.20.
Then, if one of the search words from the list was depreciation, the output file would look like this:
filename depreciation
Thank you.
I am not sure what out is, and I can't run your code from where I am, but you could try something like this:
ngwrds= ['words'...]
for filename in os.listdir(os.getcwd()):
    with open(filename, 'r') as searchfile:
        for line in searchfile:
            line = line.strip().split(" ")
            for word in line:
                if word in ngwrds:
                    # out is the output file opened as in the question's code
                    out.write(filename + " " + word + "\n")
strip() gets rid of whitespace on either end of the line. split(" ") returns a list of the words in the line.
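A slightly more defensive sketch of the same idea, in case it helps. It assumes a match means a whole word, and it strips common punctuation so that a word at the end of a sentence still matches. The function name find_matches and the second keyword are hypothetical; the sample line is the one from the question.

```python
def find_matches(lines, keywords):
    """Return every keyword occurrence found as a whole word in lines, in order."""
    wanted = set(keywords)
    found = []
    for line in lines:
        for word in line.split():
            # Strip surrounding punctuation so "depreciation." still matches.
            word = word.strip(".,;:!?\"'()")
            if word in wanted:
                found.append(word)
    return found


sample_lines = ["The company reported depreciation of $1.20.\n"]
for word in find_matches(sample_lines, ["depreciation", "amortization"]):
    print("filename " + word)
```

In the real script, the lines would come from the opened searchfile, and the literal "filename" would be the filename variable from the loop.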

Use Python to parse comma separated string with text delimiter coming from stdin

I have a csv file that is being fed to my Python script via stdin.
This is a comma separated file with quotations as text delimiter.
Here is an example line:
457,"Last,First",NYC
My script so far splits each line by looking for commas, but how do I make it aware of the text delimiter quotes?
My current script:
for line in sys.stdin:
    line = line.strip()
    line.split(',')
    print line
The code splits the name into two since it does not recognize the quotations enclosing that text field. I need the name to remain as a single element.
If it matters, the data is being fed through stdin within a hadoop-streaming program.
Thanks!
Well, you could do it more manually, with something like this:
import sys

for line in sys.stdin:
    row = []
    enclosed = False
    word = ''
    # walk through the line character by character
    for character in line.rstrip('\n'):
        if character == '"':
            # a quote toggles whether we are inside a quoted field
            enclosed = not enclosed
        elif character == ',' and not enclosed:
            row.append(word)
            word = ''
        else:
            word += character
    row.append(word)  # the last field has no trailing comma
    print(row)
I haven't tested it nor thought about it for too long, but it seems to me it could work. Someone more fluent in Python syntax could probably find something better for doing the trick, though ;)
Attempting to answer my own question. If I read it right, it may be possible to send the streaming input into a csv reader like so:
import csv
import sys

for line in csv.reader(sys.stdin):
    print(line)
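To check how csv.reader treats the example line, here is a small sketch. It is written for Python 3, the helper name parse_rows is hypothetical, and io.StringIO stands in for sys.stdin so it can be run without piping data in.

```python
import csv
import io


def parse_rows(stream):
    """Parse comma-separated lines, keeping quoted fields such as "Last,First" intact."""
    return list(csv.reader(stream))


# In-memory stand-in for sys.stdin; pass sys.stdin itself in the real script.
sample = io.StringIO('457,"Last,First",NYC\n')
for row in parse_rows(sample):
    print(row)
```

By default csv.reader uses a comma as the delimiter and a double quote as the quote character, which matches the format described in the question.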
