Is there a faster way to extract lines from a file? - python-3.x

I have a set of files that I need to search through and extract certain lines. Right now, I'm using a for loop but this is proving costly in terms of time. Is there a faster way than the below?
import re
for file in files:
    localfile = open(file, 'r')
    for line in localfile:
        if re.search("Common English Words", line):
            words = line.split("|")[0]
            # Append words to file words.txt
            open("words.txt","a+").write(words + "\n")

Well, for one thing, you are creating a new file object (and file descriptor) every time you write to the words.txt file.
I ran some tests and found that Python's garbage collection does in fact close open file descriptors when they become unreachable (at least in my test case; CPython's reference counting does this promptly, but other Python implementations don't guarantee it).
However, opening the file every time you want to append to it is going to be costly. For future reference, it is considered good practice to use with ... as blocks for opening files.
TL;DR:
One improvement you could make is to open the file you are writing to just once.
Here is what that would look like:
import re
with open("words.txt","a+") as words_file:
    for file in files:
        localfile = open(file, 'r')
        for line in localfile:
            if re.search("Common English Words", line):
                words = line.split("|")[0]
                # Append words to file words.txt
                words_file.write(words + "\n")
Like I said, using with ... as statements when opening files is considered best practice. We can fully implement this best practice like so:
import re
with open("words.txt","a+") as words_file:
    for file in files:
        with open(file, 'r') as localfile:
            for line in localfile:
                if re.search("Common English Words", line):
                    words = line.split("|")[0]
                    # Append words to file words.txt
                    words_file.write(words + "\n")
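Since "Common English Words" is a plain literal rather than a regular expression, a substring test with in should do the same job as re.search with less overhead per line. Here is a minimal sketch under that assumption; the helper name extract_words and its parameters are hypothetical, and files is assumed to be a list of paths, as in the question.

```python
from typing import Iterable

MARKER = "Common English Words"  # literal text to look for (taken from the question)


def extract_words(files: Iterable[str], out_path: str = "words.txt") -> int:
    """Append the first '|'-separated field of every matching line to out_path.

    Hypothetical helper; returns the number of lines written.
    """
    count = 0
    # Open the output file once, in append mode, for the whole run
    with open(out_path, "a") as words_file:
        for path in files:
            with open(path, "r") as localfile:
                for line in localfile:
                    # Plain substring test instead of re.search: the pattern is a literal
                    if MARKER in line:
                        words_file.write(line.split("|")[0] + "\n")
                        count += 1
    return count
```

If you do need a real regular expression, compiling it once with re.compile outside the loop is the equivalent change.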

Related

what is the best way to modify a particular line of a csv file without erasing everything?

I have a CSV file open in Notepad:
I want to modify my CSV to replace line 4 with 'Temperature Level Wave Degre':
My code:
with open(file.csv, 'w') as m:
    content = m.readlines()
    line_4 = (content[3])
    line_4.replace(line_4, 'Temperature Level Wave Degre')
My csv is empty!
What is the best way to modify a particular line of a csv file without erasing everything ?
The most efficient code?
Thank you.
Here is a solution using Python; however, as I mentioned in my comment, there are likely more efficient command-line tools (I would use awk).
with open(filename, 'r+') as f:
    lines = f.readlines()  # read lines and store in list
    lines[3] = 'Temperature Level Wave Degre\n'  # change line 4, don't forget to add a newline
    f.seek(0)  # come back to beginning of file
    f.writelines(lines)  # write the new lines
    f.truncate()  # ensure we don't keep old stuff from the previous file
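The r+ version above holds the whole file in memory. For a large CSV, one alternative (a different technique from the answer, sketched under stated assumptions) is to stream the file into a temporary file, swap in the new line as it passes, and then move the temp file over the original with os.replace. The helper name replace_line is hypothetical, and index is zero-based, so line 4 is index 3.

```python
import os
import shutil
import tempfile


def replace_line(path: str, index: int, new_text: str) -> None:
    """Replace the line at zero-based `index` with `new_text`, streaming the file.

    Hypothetical helper: writes to a temp file in the same directory, then swaps it in.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so os.replace stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        # Wrap the descriptor first so it is always closed, even if open(path) fails
        with os.fdopen(fd, "w") as dst, open(path, "r") as src:
            for i, line in enumerate(src):
                if i == index:
                    dst.write(new_text.rstrip("\n") + "\n")
                else:
                    dst.write(line)
        shutil.copymode(path, tmp_path)  # mkstemp creates owner-only files; keep original mode
        os.replace(tmp_path, path)  # swap the rewritten file into place
    except BaseException:
        os.remove(tmp_path)
        raise
```

For the question above, that would be replace_line("file.csv", 3, "Temperature Level Wave Degre").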

How to replace multiple tabs with only one tab using python3 + Pandas in a given .csv File

I'm trying to replace multiple tabs with a single tab using Python 3 + Pandas in a given .csv file, but I can't find a way to solve this problem. If my function is:
def function(csv_file):
    # remove multiple tabs, i.e. turn a \t \t b into a \t b
    [...]
The file must remain a CSV file.
How could I do it?
A csv is just a text file that can be parsed with tailored tools, but can also be read as plain text. So, you can use regex to substitute consecutive \t instances.
You still need to provide more details, but take this as a provisional answer.
import re
with open('test.csv', 'r') as fo:
    text = fo.read()
print(text)
print(repr(text))
text = re.sub(r'\t+', r'\t', text)
print(text)
print(repr(text))
Output
test sdasdf
asfasdf asdf asfasdf asdf
'test\t\tsdasdf\nasfasdf\tasdf\tasfasdf\t\tasdf'
# after regex
test sdasdf
asfasdf asdf asfasdf asdf
'test\tsdasdf\nasfasdf\tasdf\tasfasdf\tasdf'
Notice the last print does not have any consecutive tabs.
Now you can write the text back to a CSV file.
import os
with open('test_temp.csv', 'w') as fo:
    fo.write(text)
# os.remove('test.csv')
# os.rename('test_temp.csv', 'test.csv')
It is a good idea to write a temp file, remove the original, and finally rename the temp. This is so you have a safe copy at all times for odd situations like corrupt file writes, power outages, or any other contingency.
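Putting the read, regex and temp-file steps together, here is a minimal sketch. It uses os.replace (Python 3.3+), which overwrites the destination in one call, instead of os.remove followed by os.rename. On POSIX systems that rename is atomic when both paths are on the same filesystem, so there is no moment where test.csv is missing. The helper name collapse_tabs and the ".tmp" suffix are assumptions for illustration.

```python
import os
import re


def collapse_tabs(path: str) -> None:
    """Collapse runs of tabs into a single tab, then swap the result into place.

    Hypothetical helper combining the regex and temp-file steps from the answer.
    """
    with open(path, "r") as fo:
        text = fo.read()
    text = re.sub(r"\t+", "\t", text)  # one or more tabs -> exactly one tab

    tmp_path = path + ".tmp"  # assumed temp name; pick one that cannot clash
    with open(tmp_path, "w") as fo:
        fo.write(text)
    # os.replace overwrites the destination, so no separate os.remove is needed
    os.replace(tmp_path, path)
```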

Is there a way to open a file for the user to read?

I know it's possible to open a file and edit it, but is it possible for a Python program to open a file so the user can read it? I'm trying to find a way to open a .txt file so the user can see what was written in it.
If you want to print to the terminal/command line, you could do something like this:
with open("line_file.txt", "r") as f:
    lines = f.readlines()
for line in lines:  # lines is a list, so iterate it directly (not lines())
    print(line)
But make a habit of reading files line by line, and of stripping each line as well:
with open('file.txt', 'r') as f:
    line = f.readline()
    while line:
        print(line.strip())
        line = f.readline()
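A file object is itself an iterator over its lines, so the readline()/while loop can also be written as a plain for loop that still reads one line at a time. A small sketch; the helper name show_file is hypothetical. It uses rstrip("\n") rather than strip() so that leading indentation in the file is preserved; switch back to strip() if you want surrounding whitespace removed.

```python
def show_file(path: str) -> None:
    """Print a text file to the terminal, one line at a time (hypothetical helper)."""
    with open(path, "r") as f:
        for line in f:  # iterating the file object reads lazily, line by line
            print(line.rstrip("\n"))  # drop only the trailing newline; print adds its own
```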

How to add text to a file in python3

Let's say I have the following file,
dummy_file.txt (contents below):
first line
third line
How can I add a line to that file right in the middle so the end result is:
first line
second line
third line
I have looked into opening the file with the append option, however that adds the line to the end of the file.
with open("dummy_file.txt", 'r') as file:
    lines = file.readlines()
lines.insert(1, "second line\n")
with open("dummy_file.txt", 'w') as output:
    output.writelines(lines)
So:
We open the file and read all the lines into a list.
We insert the desired new line into the list, ending it with \n so it stays on its own line.
We open the file again, but this time for writing.
We write all the lines from the list.
But I wouldn't recommend this method for big files, because the whole file is held in memory.
The standard file methods don't support inserting into the middle of a file. You need to read the file, add your new data to the data that you read in, and then re-write the whole file.
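You still have to rewrite the whole file, but you don't have to hold it all in memory at once. A minimal sketch of a streaming variant, assuming a temp file in the same directory is acceptable: copy lines across one at a time, emit the new line when its position comes up, then move the temp file over the original. The helper name insert_line is hypothetical, and index is zero-based, so index 1 puts the new line second.

```python
import os
import shutil
import tempfile


def insert_line(path: str, index: int, new_line: str) -> None:
    """Insert new_line so that it becomes line `index` (zero-based).

    Hypothetical helper: streams the file through a temp file instead of
    loading every line into memory. If the file has fewer than `index`
    lines, the new line is appended at the end.
    """
    text = new_line.rstrip("\n") + "\n"
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as dst, open(path, "r") as src:
            inserted = False
            last = ""
            for i, line in enumerate(src):
                if i == index:
                    dst.write(text)  # new line goes before the current one
                    inserted = True
                dst.write(line)
                last = line
            if not inserted:
                # Index past the end: make sure the last line is terminated first
                if last and not last.endswith("\n"):
                    dst.write("\n")
                dst.write(text)
        shutil.copymode(path, tmp_path)  # keep the original file's permission bits
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

For the example above, insert_line("dummy_file.txt", 1, "second line") gives first line / second line / third line.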

Add text to specific text line

I'm kind of new to Linux programming. I've searched everywhere and haven't found an answer to my question. I have a file, let's call it config.txt/.ini.
My question is: is there any way, with a script, to find some text in the file and, if the search text is found, do something?
For example:
Search for: 'my/text/mytext'
And add: ';' to the beginning of the line.
Or even delete the line.
Have you considered looking at tools such as:
awk
sed
perl
python
which all can do this fairly easily.
Awk is probably the slimmest (and likely the fastest) of these:
awk '{sub(/root/, "yoda"); print}'
will substitute the first match of the regexp root with the string yoda on each line.
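For comparison, here is a rough Python equivalent of that awk one-liner, written as a generator over lines. Passing count=1 to re.sub replaces only the first match on each line, which mirrors awk's sub() (gsub() would correspond to count=0). The helper name sub_first is hypothetical.

```python
import re
import sys
from typing import Iterable, Iterator


def sub_first(lines: Iterable[str], pattern: str, replacement: str) -> Iterator[str]:
    """Yield each line with only the first match of pattern replaced (like awk's sub)."""
    regex = re.compile(pattern)
    for line in lines:
        yield regex.sub(replacement, line, count=1)


# Usage, filtering stdin to stdout the way awk does:
#   sys.stdout.writelines(sub_first(sys.stdin, r"root", "yoda"))
```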
Since your question is vague, you didn't define what kind of script, and I'm currently learning Python, I took the time to write a Python script that removes the lines in foo.txt that contain "mytext". So yes, it is possible, and there are countless other ways to do it as well.
import re

# Open the file and read all the lines into a list
with open("foo.txt", "r") as f:
    lines = f.readlines()

# Write back only the lines that don't match our criteria for removal
with open("foo.txt", "w") as f:
    for line in lines:
        if re.search("mytext", line) is None:
            f.write(line)
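The question also asked about adding ';' to the beginning of matching lines instead of deleting them. A minimal sketch of that variant, following the same read-everything-then-rewrite approach as the script above (fine for small config files). The helper name comment_out_lines and its prefix parameter are hypothetical. Lines that already start with the prefix are skipped so that running it twice doesn't stack semicolons.

```python
def comment_out_lines(path: str, needle: str, prefix: str = ";") -> int:
    """Prefix every line containing `needle` with `prefix`; return how many changed.

    Hypothetical helper. Reads the whole file into memory, like the script above.
    """
    with open(path, "r") as f:
        lines = f.readlines()

    changed = 0
    with open(path, "w") as f:
        for line in lines:
            # Plain substring match, so characters like '.' in needle are taken literally
            if needle in line and not line.startswith(prefix):
                line = prefix + line
                changed += 1
            f.write(line)
    return changed
```

For the question above, that would be comment_out_lines("config.ini", "my/text/mytext").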
