How to work with headers in a log file? (Python 3)

I have a log file (.txt) which I want to open and read line by line, convert the data, and store with Pandas. However, it has a header with some useful information I want to grab. What is best practice when working with header sections? For example, I need to grab the "CAN-bus address", whose value is stored on the next row. The "CAN-bus address" part will be the same in another file, but the "460 (Machine)" will change. How do I achieve that effectively? If I run my code I get the error "TypeError: '_io.TextIOWrapper' object is not subscriptable".
Any guidance would be appreciated! I could write a nasty bit of code to get this data the next time through the loop with the help of a few if statements and Booleans but there must be a better way to do this.
Also, what is a good way to detect when the header is over and the data is starting? Just compare every line with "DateTime"?
log file:
Developer
Raw data extractor
Date Range to extract
From
12/18/2022
Until
02/01/2023
CAN-bus address
460 (Machine)
DateTime GPStime CAN-bus data
19 December 2022 07:20:53 1671430853 0162c0c1cafe0000
19 December 2022 07:20:53 1671430853 05000000003e3c00
...
Code:
with open(filePath) as openfileobject: # Open the file
    for row, line in enumerate(openfileobject): # Read file line by line
        if line.lower() == 'CAN-bus address'.lower(): # identify the CAN message ID
            print(line)
            print(openfileobject[row+1])
I have tried consecutive if statements and Boolean variables to keep track of whether we have found the correct row or not. It gets messy.

I hope this helps.
skip = 0
can_addr = []
# open the text file
with open("temp.txt", "r") as f:
    for line in f:
        # strip surrounding whitespace and the newline character
        line = line.strip()
        if "Developer" in line:
            # start skipping at the line containing "Developer"
            skip = 1
            continue
        if "DateTime" in line:
            # skip until the line that has "DateTime", and skip that line too
            skip = 0
            continue
        if skip == 1:
            continue
        # At this stage, print(line) gives the data rows below the header
        # add the address field into the array
        can_addr.append(line[28:38])
print(can_addr)
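For the specific "value on the next line" part of the question, one option is to drive the file iterator manually with next(), so the line after the matching header can be read directly. This is a minimal sketch (the helper function name is illustrative, and a tiny sample file is written first so it runs on its own):

```python
# Build a minimal sample log so this sketch is self-contained
with open("temp.txt", "w") as f:
    f.write("Developer\nCAN-bus address\n460 (Machine)\n"
            "DateTime GPStime CAN-bus data\n")

def read_header_value(path, key):
    # Return the line that follows the given header key, or None
    with open(path) as f:
        for line in f:
            if line.strip().lower() == key.lower():
                # next() advances the same iterator, so this reads
                # the line immediately after the matching header
                return next(f, "").strip()
    return None

can_address = read_header_value("temp.txt", "CAN-bus address")
print(can_address)  # 460 (Machine)
```

This also explains the TypeError in the question: a file object is an iterator, not a sequence, so it cannot be indexed with openfileobject[row+1], but next() can advance it.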

Related

Deleting a particular column/row from a CSV file using python

I want to delete the row that matches a given user input in a particular column.
Let's say I get an employee ID and delete all its corresponding values in that row.
I'm not sure how to approach this problem; other sources suggest using a temporary csv file to copy all the values into and re-iterate.
Since these are very primitive requirements, I would just do it manually.
Read it line by line - if you want to delete the current line, just don't write it back.
If you want to delete a column, for each line, parse it as csv (using the module csv - do not use .split(',')!) and discard the correct column.
The upside of this approach is that it is very light on memory and about as fast as it can be runtime-wise.
That's pretty much the way to do it.
Something like:
import shutil

file_path = "test.csv"

# Creates a test file
data = ["Employee ID,Data1,Data2",
        "111,Something,Something",
        "222,Something,Something",
        "333,Something,Something"]
with open(file_path, 'w') as write_file:
    for item in data:
        write_file.write(item + "\n")
# /Creates a test file

input("Look at the test.csv file if you like, close it, then press enter.")

employee_ID = "222"
with open(file_path) as read_file:
    with open("temp_file.csv", 'w') as temp_file:
        for line in read_file:
            if employee_ID in line:
                continue  # skip (delete) the matching row
            temp_file.write(line)
shutil.move("temp_file.csv", file_path)
If you have other data that may match the employee ID, then you'll have to parse the line and check the employee ID column specifically.
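That parsing is what the csv module is for. A sketch that drops a whole column by its header name (the function name and sample data here are illustrative, not from the question):

```python
import csv

def drop_column(src_path, dst_path, column_name):
    # Copy src to dst, discarding one column selected by its header name
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        idx = header.index(column_name)  # ValueError if the column is missing
        writer.writerow(header[:idx] + header[idx + 1:])
        for row in reader:
            writer.writerow(row[:idx] + row[idx + 1:])

# Tiny demo file
with open("test.csv", "w", newline="") as f:
    f.write("Employee ID,Data1,Data2\n111,a,b\n222,c,d\n")

drop_column("test.csv", "pruned.csv", "Data1")
```

Using csv.reader instead of .split(',') matters as soon as any field contains a quoted comma.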

Read some specific lines from a big file in python

I want to read some specific lines from a large text file, where the line numbers are in a list, for example:
list_Of_line = [3991, 3992, ...]. I want to check whether the string "this city" is in line number 3991, 3992, ... or not, and I want to access those lines directly. How can I do this in Python?
Text_File is like below
Line_No
......................
3990 It is a big city.
3991 I live in this city.
3992 I love this city.
.......................
There is no way to "directly access" a specific line number of a file outright, since lines can start at any position and can be of any length. The only way to know where each line is in a file is therefore to read every character of the file and locate each newline character.
Understanding that, you can read through the file once, recording the position of the end of each line with the file object's tell method, so that you can then "directly access" any line number you want with the seek method and read that specific line. Note that in Python 3, calling tell while iterating a text-mode file with a for loop raises OSError, so the index is built with readline instead:
list_of_lines = [3991, 3992]
with open('file.txt') as file:
    # Build an index of the start offset of every line. tell() is called
    # between readline() calls, which is allowed; calling it while
    # iterating with "for line in file" is not.
    positions = [0]
    while file.readline():
        positions.append(file.tell())
    for line_number in list_of_lines:
        file.seek(positions[line_number - 1])
        if 'this city' in file.readline():
            print(f"'this city' found in line #{line_number}")
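If reading the whole file once is acceptable, the standard library's linecache module does this offset bookkeeping for you. A sketch on a throwaway temp file (linecache still reads and caches the entire file on first access, so the explicit index above scales better for truly huge files):

```python
import linecache
import os
import tempfile

# Write a small sample file so the sketch is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("It is a big city.\nI live in this city.\nI love this city.\n")
    path = tmp.name

# linecache.getline uses 1-based line numbers and caches the whole file
found = [n for n in (2, 3) if "this city" in linecache.getline(path, n)]
print(found)  # [2, 3]

os.unlink(path)
```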

Python: How can I read in a file and search for the line that has the string that indicates the data I need to extract?

I am fairly new to Python, and I have been given an assignment in my research group to extract the important data from an output file. The output file is very large, containing data split into sections. Each section is headed with a title in all capital letters, such as "SURFACE TEMPERATURE," and the following 100-600 lines all contain relevant data. Essentially, I need to read in the file and search for the line that has the string that indicates the data. The number of rows for each data set is fixed, but the location in the text file is not. I then need to save the desired data to a different list. Any help or direction would be appreciated.
I have a decent idea about how to open and read the file in python, but I am at a loss when trying to figure out how to search for the section of data and save it to a new list/array.
This is how I understand your data file to be structured:
TEST1
asdf
asdf
asdf
TEST2
asdf
asdf
asdf
DATA WE WANT
xxxx
xxxx
xxxx
To parse this, we would do the following:
# opening the datafile like this is a best practice
with open("tfile.txt") as infile:
    data = infile.readlines()

# clean up the data
data = [x.strip() for x in data]

# set up the list we'll store the data in
data_list = []

# loop through the data
saving_data = False
for item in data:
    if item == "DATA WE WANT":
        # check if we're at the right header
        print("Data found")
        saving_data = True
        continue
    elif item == "":
        # check to see if line is empty
        saving_data = False
        continue
    elif item == item.upper():
        # check to see if the current item is a header
        print("Header:", item)
        saving_data = False
        continue
    elif saving_data:
        data_list.append(item)

print(data_list)
It's important to check everything before saving the data, as with a file as large as yours it can be hard to tell if you are successful or not.
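A variant of the same all-caps check can collect every section into a dictionary in one pass, so the wanted section is just a lookup afterwards. This is a sketch over an in-memory list standing in for the cleaned-up file lines; note that numeric data lines equal their own upper(), so the header test may need adapting for a real file:

```python
sections = {}
current = None
# Stand-in for the cleaned-up lines of the real file
data = ["TEST1", "asdf", "asdf", "DATA WE WANT", "xxxx", "xxxx"]
for item in data:
    if item and item == item.upper():
        # A non-empty all-caps line starts a new section
        current = sections.setdefault(item, [])
    elif item and current is not None:
        # Any other non-empty line belongs to the current section
        current.append(item)
print(sections["DATA WE WANT"])  # ['xxxx', 'xxxx']
```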

Python 3 going through a file until EOF. File is not just a set of similar lines needing processing

The answers to questions of the type "How do I do "while not eof(file)""
do not quite cover my issue
I have a file with a format like
header block
data
another header block
more data (with arbitrary number of data lines in each data block)
...
I do not know how many header-data sets there are
I have successfully read the first block, then a set of data using loops that look for the blank line at the end of the data block.
I can't just use the "for each line in openfile" type approach as I need to read the header-data blocks one at a time and then process them.
How can I detect the last header-data block?
My current approach is to use a try except construction and wait for the exception. Not terribly elegant.
It's hard to answer without seeing any of your code...
But my guess is that you are reading the file with fp.read():
fp = open("a.txt")
while True:
    data = fp.read()
Instead:
always pass read() the length of data you expect, and
check whether the returned chunk is an empty string, not None (read() returns '' at EOF).
For example:
fp = open("a.txt")
while True:
    header = fp.read(headerSize)
    if header == '':
        # End of file
        break
    # read dataSize from the header here (pseudocode)
    data = fp.read(dataSize)
    if data == '':
        # Error reading file
        raise FileError('Error reading file')
    process_your_data(data)
This is some time later but I post this for others who do this search.
The following script, suitably adjusted, will read a file and deliver lines until the EOF.
"""
Script to read a file until the EOF
"""
def get_all_lines(the_file):
    for line in the_file:
        if line.endswith('\n'):
            line = line[:-1]
        yield line

line_counter = 1
data_in = open('OAall.txt')
for line in get_all_lines(data_in):
    print(line)
    print(line_counter)
    line_counter += 1
data_in.close()
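For the original "process one header-data block at a time" problem, a generator can yield each (header, data lines) pair and stop naturally at EOF, with no exception handling needed. A sketch assuming blocks are separated by blank lines, as in the question; io.StringIO stands in for a real file object:

```python
import io

def read_blocks(f):
    # Yield (header, data_lines) tuples; a blank line ends a block
    header, data = None, []
    for line in f:
        line = line.rstrip("\n")
        if not line:
            if header is not None:
                yield header, data
            header, data = None, []
        elif header is None:
            header = line
        else:
            data.append(line)
    if header is not None:
        # The last block may not be followed by a blank line
        yield header, data

sample = io.StringIO("header A\n1\n2\n\nheader B\n3\n")
blocks = list(read_blocks(sample))
print(blocks)  # [('header A', ['1', '2']), ('header B', ['3'])]
```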

IndexError: list index out of range, but list length OK

New to programming, looking for a deeper understanding of what's happening.
Goal: open a file and print the first 10 lines. (similar to head command)
Code:
with open('file') as f:
    for i in range(0, 10):
        print([line.strip('\n') for line in f][i])
Result: prints first line fine, then returns the out of range error
File: Is a simple text file with 20 lines, no more than 50 chars per line
FYI - I removed the range line and printed both the type (list) and the length (20). I printed specific indexes without issue (unless reading more than one in a row).
Able to get the desired result with different code, but trying to improve using with/as
You can actually iterate over a file. Which is what you should be doing here.
with open('file') as f:
    for i, line in enumerate(f, start=1):
        # Get out of the loop once we've printed 10 lines
        if i > 10:
            break
        # Line already has a '\n' at the end
        print(line, end='')
The reason that your code is failing is because of your list comprehension:
[line.strip('\n') for line in f]
The first time through your loop, that comprehension consumes all of the lines in your file. The file iterator now has no more lines, so the next time through it builds an empty list and tries to get its [1]st element. But that doesn't exist, because the iterator is already at the end of your file.
If you wanted to keep your code mostly as-is you could do
lines = [line.rstrip('\n') for line in f]
for i in range(10):
    print(lines[i])
But that's also silly, because you could just do
lines = f.readlines()
But that's also silly if you just want up to the 10th line, because you could do this:
with open('file') as f:
    print(''.join(f.readlines()[:10]))
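A middle ground that avoids both re-reading the file and loading all of it is itertools.islice, which stops consuming the iterator after ten lines:

```python
import io
from itertools import islice

# io.StringIO stands in for open('file'); islice works the same on a file object
f = io.StringIO("\n".join(f"line {n}" for n in range(1, 21)))

# Consume only the first 10 lines; the rest of the file is never read
first_ten = [line.rstrip("\n") for line in islice(f, 10)]
print("\n".join(first_ten))
```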
Some further explanation:
The shortest and worst way you could fix your code is by adding one line of code:
with open('file') as f:
    for i in range(0, 10):
        f.seek(0)  # Add this line
        print([line.strip('\n') for line in f][i])
Now your code will work - but this is a horrible way to get your code to work. The reason that your code isn't working the way you expect in the first place is that files are consumable iterators. That means that when you read from them eventually you run out of things to read. Here's a simple example:
import io
file = io.StringIO('''
This is is a file
It has some lines
okay, only three.
'''.strip())
for line in file:
print(file.tell(), repr(line))
This outputs
18 'This is is a file\n'
36 'It has some lines\n'
53 'okay, only three.'
Now if you try to read from the file:
print(file.read())
You'll see that it doesn't output anything. That's because you've "consumed" the file. I mean obviously it's still on disk, but the iterator has reached the end of the file. But as shown, you can seek in the file.
print(file.tell())
file.seek(0)
print(file.tell())
print(file.read())
And you'll see your entire file printed. But what about those other positions?
file.seek(36)
print(file.read()) # => okay, only three.
As a side note, you can also specify how much to read:
file.seek(36)
print(file.read(4)) # => okay
print(file.tell()) # => 40
So when we read from a file or iterate over it we consume the iterator and get to the end of the file. Let's put your new tools to work and go back to your original code and explore what's happening.
with open('file') as f:
    print(f.tell())
    lines = [line.rstrip('\n') for line in f]
    print(f.tell())
    print(len([line for line in f]))
    print(lines)
You'll see that you're at a different location in the file. And the second list comprehension produces an empty list. That's because when a list comprehension is evaluated it executes immediately. So when you do this:
for i in range(10):
    print([line.strip('\n') for line in f][i])
The first time through, i = 0, and the list comprehension reads to the end of the file. It takes the [0]th element of that list, the first line in the file, but the file iterator is now at the end of the file.
So we get back to the top of the loop and i = 1. We try to iterate to the end of the file again, but we're already there, so the comprehension produces an empty list [] whose [1]st element we try to get. But there's nothing there. So we get an IndexError.
List comprehensions can be useful, but when you're beginning it's usually much easier to write a for loop and then turn it into a list comprehension. So you might write something like this:
with open('file') as f:
    for i, line in enumerate(f):
        if i < 10:
            print(line.rstrip())
Now, we shouldn't print inside a list comprehension, so instead we'll collect everything. We start out by putting what we want:
[line.rstrip()
Now add the for bit:
[line.rstrip() for i, line in enumerate(f)
And finally add the filter and our closing brace:
[line.rstrip() for i, line in enumerate(f) if i < 10]
For more on list comprehensions, this is a fantastic resource: http://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/