some_file.txt: (berore)
one
two
three
four
five
...
How can I effectively modify large file in Python?
with open("some_file.txt", "r+") as file:
for idx, line in enumerate(file.readlines()):
file.writeline(f'{idx} {line}') # something like this
some_file.txt: (after)
1 one
2 two
3 three
4 four
5 five
...
Don't try to load your entire file in memory, because the file may be too large for that. Instead, read line by line:
with open('input.txt') as inp, open('output.txt', 'w') as out:
idx = 1
for line in inp:
out.write(f'{idx} {line}'
idx += 1
You can't insert into the middle of a file without re-writing it. This is an operating system thing, not a Python thing.
Use pathlib for path manipulation. Rename the original file. Then copy it to a new file, adding the line numbers as you go. Keep the old file until you verify the new file is correct.
Open files are iterable, so you can use enumerate() on them directly without having to use readlines() first. The second argument to enumerate() is the number to start the count with. So the loop below will number the lines starting with 1.
from pathlib import Path
target = Path("some_file.txt")
# rename the file with ".old" suffix
original = target.rename(target.with_suffix(".old"))
with original.open("r") as source, target.open("w") as sink:
for line_no, line in enumerate(source, 1):
sink.writeline(f'{line_no} {line}')
I am a little bit confused in how to read all lines in many files where the file names have format from "datalog.txt.98" to "datalog.txt.120".
This is my code:
import json
file = "datalog.txt."
i = 97
for line in file:
i+=1
f = open (line + str (i),'r')
for row in f:
print (row)
Here, you will find an example of one line in one of those files:
I need really to your help
I suggest using a loop for opening multiple files with different formats.
To better understand this project I would recommend researching the following topics
for loops,
String manipulation,
Opening a file and reading its content,
List manipulation,
String parsing.
This is one of my favourite beginner guides.
To set the parameters of the integers at the end of the file name I would look into python for loops.
I think this is what you are trying to do
# create a list to store all your file content
files_content = []
# the prefix is of type string
filename_prefix = "datalog.txt."
# loop from 0 to 13
for i in range(0,14):
# make the filename variable with the prefix and
# the integer i which you need to convert to a string type
filename = filename_prefix + str(i)
# open the file read all the lines to a variable
with open(filename) as f:
content = f.readlines()
# append the file content to the files_content list
files_content.append(content)
To get rid of white space from file parsing add the missing line
content = [x.strip() for x in content]
files_content.append(content)
Here's an example of printing out files_content
for file in files_content:
print(file)
So, I'm trying to answer a coding question. It's supposed to create a random knock knock joke from an external text file, but I can't figure out how to get the joke randomized. It just prints the first joke.
The below is my code:
# Saving filepath to a variable
# makes a smoother transition to the Sandbox
filepath = "KnockKnock.txt"
# When finished copy all code after this line into the Sandbox
# Open the file as read-only
inFile = open(filepath, "r")
# Get the first line and do something with it
line = inFile.readline()
# Write your program below
print("Knock-Knock")
print("Who's there?")
print (line)
print(line + "who?")
line = inFile.readline()
print(line)
line = inFile.readline()
inFile.close()
Any idea how to get a random joke instead of it just doing the first one in the file?
Assuming your file KnockKnock.txt has the jokes in pairs, every other line, then we can read all of the jokes into a list of 2-tuples, containing the setup and punchline.
import random
...
# read in file and make a list of jokes
with open('KnockKnock.txt', 'r') as infile:
# make a list of lines from file
in_lines = infile.readlines()
# pair every line with every other line - setup and punchline
jokes = list(zip(in_lines[0::2], in_lines[1::2]))
# choose a random joke
setup, punchline = random.choice(jokes)
# print the joke
print("Knock-Knock")
print("Who's there?")
print(setup)
print(setup + " who?")
print(punchline)
I've read every thing I can find and tried about 20 examples from SO and google, and nothing seems to work.
This should be very simple, but I cannot get it to work. I just want to point to a folder, and replace every double quote in every file in the folder. That is it. (And I don't know Python well at all, hence my issues.) I have no doubt that some of the scripts I've tried to retask must work, but my lack of Python skill is getting in the way. This is as close as I've gotten, and I get errors. If I don't get errors it seems to do nothing. Thanks.
import glob
import csv
mypath = glob.glob('\\C:\\csv\\*.csv')
for fname in mypath:
with open(mypath, "r") as infile, open("output.csv", "w") as outfile:
reader = csv.reader(infile)
writer = csv.writer(outfile)
for row in reader:
writer.writerow(item.replace("""", "") for item in row)
You don't need to use csv-specific file opening and writing, I think that makes it more complex. How about this instead:
import os
mypath = r'\path\to\folder'
for file in os.listdir(mypath): # This will loop through every file in the folder
if '.csv' in file: # Check if it's a csv file
fpath = os.path.join(mypath, file)
fpath_out = fpath + '_output' # Create an output file with a similar name to the input file
with open(fpath) as infile
lines = infile.readlines() # Read all lines
with open(fpath_out, 'w') as outfile:
for line in lines: # One line at a time
outfile.write(line.replace('"', '')) # Remove each " and write the line
Let me know if this works, and respond with any error messages you may have.
I found the solution to this based on the original answer provided by u/Jeff. It was actually smart quotes (u'\u201d') to be exact, not straight quotes. That is why I could get nothing to work. That is a great way to spend like two days, now if you'll excuse me I have to go jump off the roof. But for posterity, here is what I used that worked. (And note - there is the left curving smart quote as well - that is u'\u201c'.
mypath = 'C:\\csv\\'
myoutputpath = 'C:\\csv\\output\\'
for file in os.listdir(mypath): # This will loop through every file in the folder
if '.csv' in file: # Check if it's a csv file
fpath = os.path.join(mypath, file)
fpath_out = os.path.join(myoutputpath, file) #+ '_output' # Create an output file with a similar name to the input file
with open(fpath) as infile:
lines = infile.readlines() # Read all lines
with open(fpath_out, 'w') as outfile:
for line in lines: # One line at a time
outfile.write(line.replace(u'\u201d', ''))# Remove each " and write the line
infile.close()
outfile.close()
I have a very large file ~40GB and 674,877,098 lines I want to read and extract specific columns from. I can get about 3GB of data transferred then I get the following error.
Traceback (most recent call last):
File "C:\Users\Codes\Read_cat_write.py", line 44, in <module>
tid = int(columns[2])
IndexError: list index out of range
Sample of data that is being read in.
1,100000000,100000000,39,2.704006988169216e15,310057,0
2,100000001,100000000,38,2.650346740514816e15,303904,0.01
3,100000002,100000000,37,2.136985003098112e15,245039,0.03
4,100000003,100000000,36,2.29479163101184e15,263134,0.05
5,100000004,100000000,35,1.834645477916672e15,210371,0.06
6,100000005,100000000,34,1.814063860416512e15,208011,0.08
7,100000006,100000000,33,1.808883592986624e15,207417,0.1
8,100000007,100000000,32,1.806241248575488e15,207114,0.12
9,100000008,100000000,31,1.651783621410816e15,189403,0.14
10,100000009,100000000,30,1.634821184946176e15,187458,0.16
Code
from itertools import islice
F = r'C:\Users\Outfiles\comp_cat_raw.txt'
w = open(r'C:\Users\Outfiles\comp_cat_3col.txt','a')
def filesave(TID,M,R):
X = str(TID)
Y = str(M)
Z = str(R)
w.write(X)
w.write('\t')
w.write(Y)
w.write('\t')
w.write(Z)
w.write('\n')
N = 680000000
f = open(F) #Opens file
f.readline() # Strips Header
nlines = islice(f, N) #slices file to only read N lines
for line in nlines:
if line !='':
line = line.strip()
line = line.replace(',',' ') # Replace comma with space
columns = line.split() # Splits into column
tid = int(columns[2])
m = float(columns[4])
r = float(columns[6])
filesave(tid,m,r)
w.close()
I have looked at the file being read in at the point where the error occurs, but I don't see anything wrong with the file so I am at a loss as to the cause of this error.
Chances are, there is some line with maybe one single comma in there, or none, or an empty line, whatever. Probably just put a try-except statement around the statement and catch the index error, probably printing out the line in question, and you should be done. Besides that, there are some things in your code, that might be worth to improve.
Have a look at the csv module especially. It has some optimized C-code exactly for what you want to do, so it should be much faster. This answer shows mainly how to write the iteration with csv.
This whole slice construction seems to be superfluous. A simple for line in f: will do and is the most efficient way to handle this iteration.
Use line.split(',') directly, instead of replacing them first with spaces.
Use with open(F) as f: instead of calling close yourself. For this script it might make no difference, but this way you make sure, that you e.g. don't create open file handles in case of errors.