I have a directory containing several .txt files. I am interested only in the first 7 characters of each line. I created a dictionary whose keys are the file names and whose values are the first 7 characters of each line in each file. Then I wrote two for loops, as below:
files = glob.glob('/Users/andreasportelli/Desktop/sqldata/*')
customer = {"customer_id":[],"order_id":[]} #create a dictionary
for file in files:
    x = open(file, 'r', encoding='ascii', errors='replace')
    customer["customer_id"].append(str(file))
for file in files:
    x = open(file, 'r', encoding='ascii', errors='replace')
    customer["order_id"].append(x.readline(7))
The problem is that this way it only reads the first line of each file, not all of them.
How do I make it iterate over every line in each file?
Thanks
You only need to loop over the file names once, and then use a nested loop to grab each line in the file. Hope that helps!
files = glob.glob('/Users/andreasportelli/Desktop/sqldata/*')
customer = {"customer_id":[],"order_id":[]} #create a dictionary
for file in files:
    customer["customer_id"].append(file)
    for line in open(file, 'r', encoding='ascii', errors='replace'):
        customer["order_id"].append(line[:7])
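Note that this appends one file name but many order ids per file, so the two lists won't line up row for row. If you need the prefixes grouped per file, one option is a dict keyed by file name. A minimal sketch, assuming the same directory of text files (`read_order_ids` is a hypothetical helper name):

```python
import glob
import os

def read_order_ids(pattern):
    """Map each matching file's base name to the 7-character
    prefixes of its lines."""
    orders_by_file = {}
    for path in glob.glob(pattern):
        with open(path, 'r', encoding='ascii', errors='replace') as f:
            orders_by_file[os.path.basename(path)] = [line[:7] for line in f]
    return orders_by_file

# e.g. read_order_ids('/Users/andreasportelli/Desktop/sqldata/*')
```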
some_file.txt: (before)
one
two
three
four
five
...
How can I effectively modify a large file in Python?
with open("some_file.txt", "r+") as file:
    for idx, line in enumerate(file.readlines()):
        file.writeline(f'{idx} {line}') # something like this
some_file.txt: (after)
1 one
2 two
3 three
4 four
5 five
...
Don't try to load your entire file into memory, because the file may be too large for that. Instead, read it line by line:
with open('input.txt') as inp, open('output.txt', 'w') as out:
    idx = 1
    for line in inp:
        out.write(f'{idx} {line}')
        idx += 1
You can't insert into the middle of a file without re-writing it. This is an operating system thing, not a Python thing.
Use pathlib for path manipulation. Rename the original file. Then copy it to a new file, adding the line numbers as you go. Keep the old file until you verify the new file is correct.
Open files are iterable, so you can use enumerate() on them directly without having to use readlines() first. The second argument to enumerate() is the number to start the count with. So the loop below will number the lines starting with 1.
from pathlib import Path
target = Path("some_file.txt")
# rename the file with ".old" suffix
original = target.rename(target.with_suffix(".old"))
with original.open("r") as source, target.open("w") as sink:
    for line_no, line in enumerate(source, 1):
        sink.write(f'{line_no} {line}')
Hello, I am very new to coding. I am writing a small Python script but I am stuck. The goal is to compare the contents of log.txt to the contents of LargeFile.txt, and to store every line of log.txt that does not match any line of LargeFile.txt in outfile.txt. With the code below I only get the first line of log.txt repeated over and over in outfile.txt.
logfile = open('log1.txt', 'r') # This file is 8KB
keywordlist = open('LargeFile.txt', 'r') # This file is 1.4GB
outfile = open('outfile.txt', 'w')
loglines = [n for n in logfile]
keywords = [n for n in keywordlist]
for line in loglines:
    for word in keywords:
        if line not in word:
            outfile.write(line)
outfile.close()
So conceptually you're trying to check whether any line of your 1+ GB file occurs in your 8 KB file.
This means one of the files needs to be loaded into RAM, and the smaller file is the natural choice. The other file can be read sequentially and does not need to be loaded in full.
We need
a list of lines from the smaller file
an index of those lines for quick look-ups (we'll use a dict for this)
a loop that runs through the large file and checks each line against the index, making note of every matching line it finds
a loop that outputs the original lines and uses the index to determine whether they are unique or not.
The sample below prints the complete output to the console. Write it to a file as needed.
with open('log1.txt', 'r') as f:
    log_lines = list(f)

index = {line: [] for line in log_lines}

with open('LargeFile.txt', 'r') as f:
    for line_num, line in enumerate(f, 1):
        if line in index:
            index[line].append(line_num)

for line in log_lines:
    if len(index[line]) == 0:
        print(f'{line} -> unique')
    else:
        print(f'{line} -> found {len(index[line])}x')
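To send the results to outfile.txt instead of the console, the final loop can write out only the lines with an empty match list. A minimal sketch (`write_unique` is a hypothetical helper; it expects the `log_lines` list and `index` dict built as above):

```python
def write_unique(log_lines, index, out_path):
    """Write to out_path every line whose match list in index is empty."""
    with open(out_path, 'w') as out:
        for line in log_lines:
            if not index[line]:  # no occurrences found in the large file
                out.write(line)
```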
I have a folder containing multiple .txt files named one.txt, two.txt, three.txt, and so on. I need to read one.txt and write its content into a list named onefile[], then read two.txt and write its content into a list named twofile[], and so on. How can I do this?
Update! I tried the code below; now how can I print the values in each list?
def writeinlist(file_path,i):
    multilist = {}
    output = open(file_path,'r')
    globals()['List%s' % i] = output
    print('List%s' % i)

input_path = Path(Path.home(), "Desktop", "NN")
index=1
for root, dirs, files in os.walk(input_path):
    for file in files:
        file_path = Path(root, file)
        writeinlist(file_path,index)
        index+=1
Update2: how can I delete the \n from the values?
value_list1 = files_dict['file1']
print('Values of file1 are:')
print(value_list1)
I used the following to create a dictionary with dynamic keys (the names of the files), where each value is a list whose elements are the lines of the corresponding file.
First, contents of onefile.txt:
First file first line
First file second line
First file third line
Contents of twofile.txt:
Second file first line
Second file second line
My code:
import os
import pprint
files_dict = {}
for file in os.listdir("/path/to/folder"):
    if file.endswith(".txt"):
        key = file.split(".")[0]
        full_filename = os.path.join("/path/to/folder", file)
        with open(full_filename, "r") as f:
            files_dict[key] = f.readlines()
pprint.pprint(files_dict)
Output:
{'onefile': ['First file first line\n',
             'First file second line\n',
             'First file third line'],
 'twofile': ['Second file first line\n', 'Second file second line']}
Another way to do this that's a bit more Pythonic:
import os
import pprint
files_dict = {}
for file in [
    f
    for f in os.listdir("/path/to/folder")
    if f.endswith(".txt")
]:
    with open(os.path.join("/path/to/folder", file), "r") as fo:
        files_dict[file.split(".")[0]] = fo.readlines()
pprint.pprint(files_dict)
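As for the Update2 question about removing the \n characters: reading the whole file and calling str.splitlines() returns the lines without their line endings, unlike readlines(). A sketch (`read_lines_stripped` is a hypothetical helper name):

```python
def read_lines_stripped(path):
    """Return the file's lines with trailing newlines removed."""
    with open(path, "r") as f:
        return f.read().splitlines()
```

Using this in place of f.readlines() above would produce values like 'First file first line' instead of 'First file first line\n'.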
I am a little bit confused about how to read all lines in many files, where the file names range from "datalog.txt.98" to "datalog.txt.120".
This is my code:
import json
file = "datalog.txt."
i = 97
for line in file:
    i+=1
    f = open (line + str (i),'r')
    for row in f:
        print (row)
Here, you will find an example of one line in one of those files:
I really need your help.
I suggest using a loop to open the multiple files with their different numeric suffixes.
To better understand this project I would recommend researching the following topics:
for loops,
String manipulation,
Opening a file and reading its content,
List manipulation,
String parsing.
This is one of my favourite beginner guides.
To set the bounds of the integers at the end of the file names, I would look into Python for loops.
I think this is what you are trying to do
# create a list to store all your file content
files_content = []
# the prefix is of type string
filename_prefix = "datalog.txt."
# loop over the file numbers 98 to 120
for i in range(98, 121):
    # build the filename from the prefix and
    # the integer i, which you need to convert to a string type
    filename = filename_prefix + str(i)
    # open the file and read all its lines into a variable
    with open(filename) as f:
        content = f.readlines()
    # append the file content to the files_content list
    files_content.append(content)
To get rid of the whitespace from file parsing, strip each line before appending it:
    content = [x.strip() for x in content]
    files_content.append(content)
Here's an example of printing out files_content
for file in files_content:
    print(file)
I have a folder with a couple thousand images named 10000.jpg, 10001.jpg, etc., and a csv file with two columns: id and name.
The csv id matches the images in the folder.
I need to rename the images per the name column in the csv (e.g. from 10000.jpg to name1.jpg).
I've been trying os.rename() inside a for loop, as below.
with open('train_labels.csv') as f:
    lines = csv.reader(f)
    for line in lines:
        os.rename(line[0], line[1])
This gives me an encoding error inside the loop.
Any idea what I'm missing in the logic?
I also tried another strategy (below), but got the error: IndexError: list index out of range.
with open('train_labels.csv', 'rb') as csvfile:
    lines = csv.reader(csvfile, delimiter = ' ', quotechar='|')
    for line in lines:
        os.rename(line[0], line[1])
I also got the same error. When I opened the CSV file in Notepad, I found that there was no comma between the id and the name, so please check that first. Otherwise, you can see the solutions in Renaming images in folder.
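If the file really is comma-separated, a minimal rename loop might look like this (a sketch; `rename_from_csv` is a hypothetical helper, and it assumes the id column has no extension and the images live in `folder` — adjust as needed):

```python
import csv
import os

def rename_from_csv(csv_path, folder):
    """Rename folder/<id>.jpg to folder/<name>.jpg for each CSV row."""
    with open(csv_path, newline='') as f:
        for row in csv.reader(f):  # default delimiter is ','
            old = os.path.join(folder, row[0] + '.jpg')
            new = os.path.join(folder, row[1] + '.jpg')
            os.rename(old, new)
```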