I have three files: two .gz files and one .log file. These files are pretty big. Below is a sample copy of my original data. I want to extract the entries that correspond to the last 24 hours.
a.log.1.gz
2018/03/25-00:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
2018/03/25-10:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
2018/03/25-20:08:50.486601 1.5M 7FE9D3D41706 qojfcmqcacaeia
2018/03/25-24:08:50.980519 16K 7FE9BD1AF707 user: number is 93823004
2018/03/26-00:08:50.981908 1389 7FE9BDC2B707 user 7fb31ecfa700
2018/03/26-10:08:51.066967 0 7FE9BDC91700 Exit Status = 0x0
2018/03/26-15:08:51.066968 1 7FE9BDC91700 std:ZMD:
a.log.2.gz
2018/03/26-20:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
2018/03/26-24:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
2018/03/27-00:08:50.486601 1.5M 7FE9D3D41706 qojfcmqcacaeia
2018/03/27-10:08:50.980519 16K 7FE9BD1AF707 user: number is 93823004
2018/03/27-20:08:50.981908 1389 7FE9BDC2B707 user 7fb31ecfa700
2018/03/27-24:08:51.066967 0 7FE9BDC91700 Exit Status = 0x0
2018/03/28-00:08:51.066968 1 7FE9BDC91700 std:ZMD:
a.log
2018/03/28-10:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
2018/03/28-20:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
I am getting the result below, but it is not cleaned.
result.txt
2018/03/27-20:08:50.981908 1389 7FE9BDC2B707 user 7fb31ecfa700
2018/03/27-24:08:51.066967 0 7FE9BDC91700 Exit Status = 0x0
2018/03/28-00:08:51.066968 1 7FE9BDC91700 std:ZMD:
2018/03/28-10:08:48.638553 508 7FF4A8F3D704 snononsonfvnosnovoosr
2018/03/28-20:08:48.985053 346K 7FE9D2D51706 ahelooa afoaona woom
The code below pulls the last 24 hours of lines.
from datetime import datetime, timedelta
import glob
import gzip
from pathlib import Path
import shutil

def open_file(path):
    if Path(path).suffix == '.gz':
        return gzip.open(path, mode='rt', encoding='utf-8')
    else:
        return open(path, encoding='utf-8')

def parsed_entries(lines):
    for line in lines:
        yield line.split(' ', maxsplit=1)

def earlier():
    return (datetime.now() - timedelta(hours=24)).strftime('%Y/%m/%d-%H:%M:%S')

def get_files():
    return ['a.log'] + list(reversed(sorted(glob.glob('a.log.*'))))

output = open('output.log', 'w', encoding='utf-8')
files = get_files()
cutoff = earlier()

for i, path in enumerate(files):
    with open_file(path) as f:
        lines = parsed_entries(f)
        # Assumes that your files are not empty
        date, line = next(lines)
        if cutoff <= date:
            # Skip files that can just be appended to the output later
            continue
        for date, line in lines:
            if cutoff <= date:
                # We've reached the first entry of our file that should be
                # included
                output.write(line)
                break
        # Copies from the current position to the end of the file
        shutil.copyfileobj(f, output)
        break
else:
    # In case ALL the files are within the last 24 hours
    i = len(files)

for path in reversed(files[:i]):
    with open_file(path) as f:
        # Assumes that your files have trailing newlines.
        shutil.copyfileobj(f, output)

# Cleanup, it would get closed anyway when garbage collected or process exits.
output.close()
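One detail worth spelling out: the plain string comparison cutoff <= date only works because the '%Y/%m/%d-%H:%M:%S' format is zero-padded and ordered from the most to the least significant field, so lexicographic order matches chronological order. A quick check:

# Zero-padded timestamps in this format compare the same way as datetimes:
assert '2018/03/25-23:59:59' < '2018/03/26-00:08:50'
assert '2018/03/26-09:59:59' < '2018/03/26-10:08:51'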
I want to use the function below to clean the lines:
import nltk
from nltk.stem import WordNetLemmatizer

def _clean_logs(line):
    # noinspection SpellCheckingInspection
    lemmatizer = WordNetLemmatizer()
    clean_line = line.strip()
    clean_line = clean_line.lstrip('0123456789.- ')
    cleaned_log = " ".join(
        [lemmatizer.lemmatize(word) for word in nltk.word_tokenize(clean_line)])
    cleaned_log = cleaned_log.replace('"', ' ')
    return cleaned_log
Now, I want to use the above clean function to clean the dirty data. I am not sure how to use it while pulling the last 24 hours. I want to make it memory efficient as well as fast.
To answer the "make it memory efficient" part of your question, you can use the re module instead of replace:

import re

for line in lines:
    line = re.sub('[T:-]', '', line)

This will reduce the complexity of your code and give better performance.
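To combine the cleaning with the 24-hour extraction without loading everything into memory, one option is to replace the shutil.copyfileobj(f, output) calls with a small streaming helper. This is only a sketch, reusing _clean_logs and the f/output objects from the code above; write_cleaned is a hypothetical helper name, not part of the original code:

def write_cleaned(lines, output):
    # Stream the remaining raw lines through _clean_logs one at a time,
    # so only a single line is held in memory.
    for raw_line in lines:
        output.write(_clean_logs(raw_line) + '\n')

Calling write_cleaned(f, output) wherever the original code calls shutil.copyfileobj(f, output) keeps the same single-pass structure while producing cleaned output; for consistency you may also want to pass the first matched line through _clean_logs before output.write(line).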
Related
I have many text files. I tried to convert them into a single CSV file, but it takes a huge amount of time. I started the code at night before I slept; by morning it had processed only 4500 files and was still running.
Is there a faster way to convert the text files into CSV?
Here is my code:
import pandas as pd
import os
import glob
from tqdm import tqdm

# create empty dataframe
csvout = pd.DataFrame(columns =["ID","Delivery_person_ID" ,"Delivery_person_Age" ,"Delivery_person_Ratings","Restaurant_latitude","Restaurant_longitude","Delivery_location_latitude","Delivery_location_longitude","Order_Date","Time_Orderd","Time_Order_picked","Weather conditions","Road_traffic_density","Vehicle_condition","Type_of_order","Type_of_vehicle", "multiple_deliveries","Festival","City","Time_taken (min)"])

# get list of files
file_list = glob.glob(os.path.join(os.getcwd(), "train/", "*.txt"))

for filename in tqdm(file_list):
    # next file/record
    mydict = {}
    with open(filename) as datafile:
        # read each line and split on " " space
        for line in tqdm(datafile):
            # Note: partition results in 3 string parts: "key", " ", "value"
            # array slice third parameter [::2] means steps=+2
            # so only take 1st and 3rd item
            name, var = line.partition(" ")[::2]
            mydict[name.strip()] = var.strip()
    # put dictionary in dataframe
    csvout = csvout.append(mydict, ignore_index=True)

# write to csv
csvout.to_csv("train.csv", sep=";", index=False)
Here is my example text file.
ID 0xb379
Delivery_person_ID BANGRES18DEL02
Delivery_person_Age 34.000000
Delivery_person_Ratings 4.500000
Restaurant_latitude 12.913041
Restaurant_longitude 77.683237
Delivery_location_latitude 13.043041
Delivery_location_longitude 77.813237
Order_Date 25-03-2022
Time_Orderd 19:45
Time_Order_picked 19:50
Weather conditions Stormy
Road_traffic_density Jam
Vehicle_condition 2
Type_of_order Snack
Type_of_vehicle scooter
multiple_deliveries 1.000000
Festival No
City Metropolitian
Time_taken (min) 33.000000
CSV is a very simple data format for which you don't need any sophisticated tools to handle. Just text and separators.
In your hopefully simple case there is no need to use pandas and dictionaries.
The exception would be if your data files are corrupt, missing some columns or containing additional ones to skip. But even in that case you can handle such issues better within your own code, so you have more control over it and can still get results within seconds.
Assuming your data files are not corrupt and have all columns in the right order, with none missing and no additional ones (so you can rely on their proper formatting), just try this code:
from time import perf_counter as T
sT = T()
filesProcessed = 0
columns = ["ID","Delivery_person_ID" ,"Delivery_person_Age" ,"Delivery_person_Ratings","Restaurant_latitude","Restaurant_longitude","Delivery_location_latitude","Delivery_location_longitude","Order_Date","Time_Orderd","Time_Order_picked","Weather conditions","Road_traffic_density","Vehicle_condition","Type_of_order","Type_of_vehicle", "multiple_deliveries","Festival","City","Time_taken (min)"]

import glob, os
file_list = glob.glob(os.path.join(os.getcwd(), "train/", "*.txt"))

csv_lines = []
csv_line_counter = 0

for filename in file_list:
    filesProcessed += 1
    with open(filename) as datafile:
        csv_line = ""
        for line in datafile.read().splitlines():
            # print(line)
            var = line.partition(" ")[-1]
            csv_line += var.strip() + ';'
        csv_lines.append(str(csv_line_counter) + ';' + csv_line[:-1])
        csv_line_counter += 1

with open("train.csv", "w") as csvfile:
    csvfile.write(';' + ';'.join(columns) + '\n')
    csvfile.write('\n'.join(csv_lines))

eT = T()
print(f'> {filesProcessed=}, {(eT-sT)=:8.6f}')
I guess you will get the result at a speed beyond your expectations (in seconds, not minutes or hours).
On my computer, extrapolating from the processing time of 100 files, the time required for 50,000 files will be about 3 seconds.
I could not replicate the slowness. I took the example data file and created 5000 copies of it. Then I ran your code with tqdm and without. The code below shows the version without:
import time
import csv
import os
import glob
import pandas as pd
from tqdm import tqdm

csvout = pd.DataFrame(columns =["ID","Delivery_person_ID" ,"Delivery_person_Age" ,"Delivery_person_Ratings","Restaurant_latitude","Restaurant_longitude","Delivery_location_latitude","Delivery_location_longitude","Order_Date","Time_Orderd","Time_Order_picked","Weather conditions","Road_traffic_density","Vehicle_condition","Type_of_order","Type_of_vehicle", "multiple_deliveries","Festival","City","Time_taken (min)"])

file_list = glob.glob(os.path.join(os.getcwd(), "sample_files/", "*.txt"))

t1 = time.time()

for filename in file_list:
    # next file/record
    mydict = {}
    with open(filename) as datafile:
        # read each line and split on " " space
        for line in datafile:
            # Note: partition results in 3 string parts: "key", " ", "value"
            # array slice third parameter [::2] means steps=+2
            # so only take 1st and 3rd item
            name, var = line.partition(" ")[::2]
            mydict[name.strip()] = var.strip()
    # put dictionary in dataframe
    csvout = csvout.append(mydict, ignore_index=True)

# write to csv
csvout.to_csv("train.csv", sep=";", index=False)

t2 = time.time()
print(t2-t1)
The times I got were:
tqdm 33 seconds
no tqdm 34 seconds
Then I ran using the csv module:
t1 = time.time()

with open('output.csv', 'a', newline='') as csv_file:
    columns = ["ID","Delivery_person_ID" ,"Delivery_person_Age" ,"Delivery_person_Ratings","Restaurant_latitude","Restaurant_longitude","Delivery_location_latitude","Delivery_location_longitude","Order_Date","Time_Orderd","Time_Order_picked","Weather conditions","Road_traffic_density","Vehicle_condition","Type_of_order","Type_of_vehicle", "multiple_deliveries","Festival","City","Time_taken (min)"]
    mydict = {}
    d_Writer = csv.DictWriter(csv_file, fieldnames=columns, delimiter=',')
    d_Writer.writeheader()
    for filename in file_list:
        with open(filename) as datafile:
            for line in datafile:
                name, var = line.partition(" ")[::2]
                mydict[name.strip()] = var.strip()
        d_Writer.writerow(mydict)

t2 = time.time()
print(t2-t1)
The time for this was:
csv 0.32231569290161133 seconds.
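The slowdown in the original script comes from csvout = csvout.append(mydict, ignore_index=True), which builds a brand-new DataFrame (copying all previous rows) on every file. If you still want a DataFrame at the end, a common pattern is to collect plain dicts in a list and construct the frame once. This is only a sketch, assuming the same file_list and columns as in the code above:

rows = []
for filename in file_list:
    mydict = {}
    with open(filename) as datafile:
        for line in datafile:
            name, var = line.partition(" ")[::2]
            mydict[name.strip()] = var.strip()
    rows.append(mydict)

# One DataFrame construction instead of one copy per file.
csvout = pd.DataFrame(rows, columns=columns)
csvout.to_csv("train.csv", sep=";", index=False)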
Try it like this.
import glob

with open('my_file.csv', 'a') as csv_file:
    for path in glob.glob('./*.txt'):
        with open(path) as txt_file:
            txt = txt_file.read() + '\n'
            csv_file.write(txt)
It took me over 3 minutes to loop over a 4 GB text file, counting the number of lines and the number of words and characters per line as I go. Is there a faster way to do this?
This is my code:
import time
import csv
import sys

csv.field_size_limit(sys.maxsize)

i = 0
countwords = {}
countchars = {}
start = time.time()

with open("filename.txt", "r", encoding="utf-8") as file:
    for line in csv.reader(file, delimiter="\t"):
        i += 1
        countwords[i] = len(str(line).split())
        countchars[i] = len(str(line))
        if i % 10000 == 0:
            print(i)

end = time.time()

if i > 0:
    print(i)
    print(sum(countwords.values()) / i)
    print(sum(countchars.values()) / i)
    print(end - start)
From my limited testing (on a Unix dictionary) I get only a minor speedup using numpy, but any win is a win. I'm not sure whether csv.reader is a good way of parsing tab-delimited text, but I have not checked whether it gives a more optimal speed.
import time
import numpy

# Holds count of words and letters per line of input
countwords = numpy.array([])
countchars = numpy.array([])

# Holds total count of words and letters per file
word_sum = 0
char_sum = 0

start = time.time()

file_in = open("filename.txt", "rt", encoding="utf-8")
for line in file_in:
    # cleanup the line, split it into fields by TAB character
    line = line.strip()
    fields = line.split('\t')

    # Count the fields, and the letters of each field's content
    field_count = len(fields)
    char_count = len(line) - field_count  # don't count the '\t' chars too

    # keep a separate count of the fields and letters by line
    # (numpy.append returns a new array, so the result must be re-assigned)
    countwords = numpy.append(countwords, field_count)
    countchars = numpy.append(countchars, char_count)

    # Keep a running total to save summation at the end
    word_sum += field_count
    char_sum += char_count

file_in.close()
end = time.time()

print("Total Words:   %3d" % (word_sum))
print("Total Letters: %3d" % (char_sum))
print("Elapsed Time:  %.2f" % (end - start))
You can avoid allocating extra data and keep running totals instead of per-line dictionaries:
import time
import csv
import sys

csv.field_size_limit(sys.maxsize)

countwords = 0
countchars = 0
i = 0
start = time.time()

with open("filename.txt", "r", encoding="utf-8") as file:
    for i, line in enumerate(csv.reader(file, delimiter="\t")):
        words = str(line).split()  # we allocate just 1 extra string
        wordsLen = len(words)
        countwords += wordsLen
        # to avoid a possible allocation we sum the lengths of the words
        # we already have, then add the spaces in between, which is
        # wordsLen - 1
        countchars += sum(len(word) for word in words) + wordsLen - 1
        if i % 10000 == 0:
            print(i)

end = time.time()

if i > 0:
    print(i)
    print(countwords / i)
    print(countchars / i)
    print(end - start)
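The original attempt used itertools.chain.from_iterable here, which fails because len() cannot be applied to an iterator. If you prefer that route, a working variant (it needs import itertools and is only a sketch of the same character count) is:

import itertools

# Count characters across all words without joining them into one string,
# then add the wordsLen - 1 separating spaces as before.
countchars += sum(1 for _ in itertools.chain.from_iterable(words)) + wordsLen - 1

In practice sum(len(word) for word in words), as used above, gives the same count without iterating character by character.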
I managed to write another, faster version (using an idea I saw in a different thread), but it currently has a disadvantage compared to Kingsley's numpy code: it does not save data per line, only aggregate data. In any case, here it is:
import time

start = time.time()

f = open("filename.txt", 'rb')
lines = 0
charcount = 0
wordcount = 0
#i=10000
buf_size = 1024 * 1024
read_f = f.raw.read

buf = read_f(buf_size)
while buf:
    lines += buf.count(b'\n')  # count newlines to count lines
    '''while lines/i>1:
        print(i)
        i+=10000'''
    charcount += len(buf.strip())
    wordcount += len(buf.strip().split())
    buf = read_f(buf_size)

end = time.time()
print(end - start)
print(lines)
print(charcount / lines)
print(wordcount / lines)
I am trying to read an FM signal which was recorded as a WAV file using GNU Radio Companion, using Python. I am attaching the .grc file used.
I can clearly hear the recorded signal, but reading the data gives a null ([]) value.
The Python code:
import soundfile as sf
data, fs = sf.read('/home/fm_record_RSM_10_01_2019_dat.wav')
for i in data:
    print(i)
this gives
data
array([], dtype=float64)
fs
96000
When the following code is used,
import wave
input_wave_file= wave.open('/home/fm_record_RSM_10_01_2019_dat.wav', 'r')
nc,sw,fr,nf,ct,cn = input_wave_file.getparams()
another error is raised as given below
Error Traceback (most recent call last)
<ipython-input-3-5009fe3506e7> in <module>()
1 import wave
2
----> 3 input_wave_file= wave.open('/home/fm_record_RSM_10_01_2019_dat.wav', 'r')
4 nc,sw,fr,nf,ct,cn = input_wave_file.getparams()
5 frame_data = input_wave_file.readframes(5)
~/anaconda3/lib/python3.7/wave.py in open(f, mode)
508 mode = 'rb'
509 if mode in ('r', 'rb'):
--> 510 return Wave_read(f)
511 elif mode in ('w', 'wb'):
512 return Wave_write(f)
~/anaconda3/lib/python3.7/wave.py in __init__(self, f)
162 # else, assume it is an open file object already
163 try:
--> 164 self.initfp(f)
165 except:
166 if self._i_opened_the_file:
~/anaconda3/lib/python3.7/wave.py in initfp(self, file)
131 raise Error('file does not start with RIFF id')
132 if self._file.read(4) != b'WAVE':
--> 133 raise Error('not a WAVE file')
134 self._fmt_chunk_read = 0
135 self._data_chunk = None
Error: not a WAVE file
Could someone help me find what the problem could be? Is it due to a mistake in the settings of the WAV record block in the .grc file, or a mistake in the Python file? Kindly help.
Thanks a lot,
Msr
#! /usr/bin/env python3
import soundfile as sf
import wave
import sys

if len(sys.argv) < 2:
    print("Expected filename.wav on cmdline")
    quit(1)

data, fs = sf.read(sys.argv[1])
for i in range(10):
    # print the first 10 samples only
    print(data[i])
print('...')

input_wave_file = wave.open(sys.argv[1], 'r')
nc, sw, fr, nf, ct, cn = input_wave_file.getparams()
print('nc', nc)
print('sw', sw)
print('fr', fr)
print('nf', nf)
print('ct', ct)
print('cn', cn)

chunk = 1024
data = input_wave_file.readframes(chunk)
print('data[0:10] =', data[0:10])
print('data[0:10] =', end='')
for i in range(10):
    print(data[i], ' ', end='')
print('')
In a Linux environment, I put the above into a file named playsound.py.
Then I executed (at the cmdline prompt):
$ chmod +x playsound.py
$ ./playsound.py file.wav
[ 0.06454468 0.05557251]
[ 0.06884766 0.05664062]
[ 0.0552063 0.06777954]
[ 0.04733276 0.0708313 ]
[ 0.05505371 0.065979 ]
[ 0.05358887 0.06677246]
[ 0.05621338 0.06045532]
[ 0.04891968 0.06298828]
[ 0.04986572 0.06817627]
[ 0.05410767 0.06661987]
...
nc 2
sw 2
fr 44100
nf 32768
ct NONE
cn not compressed
data[0:10] = b'C\x08\x1d\x07\xd0\x08#\x07\x11\x07'
data[0:10] =67 8 29 7 208 8 64 7 17 7
file.wav was some existing .wav file I had handy.
I previously tried
for i in data:
    print(i)
as you had done, that worked also, but the output was too much.
I think you should check that the filename you are supplying points to a valid WAV file.
For instance, the path you list is "/home/filename.wav" .
Usually it will be at least "/home/username/filename.wav"
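One quick way to see what the file actually starts with is to peek at its first 12 bytes; a valid WAV file must have b'RIFF' at offset 0 and b'WAVE' at offset 8, which is exactly the check that raised the "not a WAVE file" error. A small sketch, using the path from the question:

# Peek at the RIFF header of the recorded file.
with open('/home/fm_record_RSM_10_01_2019_dat.wav', 'rb') as f:
    header = f.read(12)
print(header[:4], header[8:12])   # expect b'RIFF' b'WAVE'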
I am using ftplib to connect to an FTP site. I want to get the most recently uploaded file and download it. I am able to connect to the FTP server and list the files; I have also put them in a list and got the date field converted. Is there any function/module which can pick the most recent date and output the whole line from the list?
#!/usr/bin/env python
import ftplib
import os
import socket
import sys
import time

HOST = 'test'

def main():
    try:
        f = ftplib.FTP(HOST)
    except (socket.error, socket.gaierror), e:
        print 'cannot reach to %s' % HOST
        return
    print "Connected to ftp server"

    try:
        f.login('anonymous', 'al#ge.com')
    except ftplib.error_perm:
        print 'cannot login anonymously'
        f.quit()
        return
    print "logged on to the ftp server"

    data = []
    f.dir(data.append)
    for line in data:
        datestr = ' '.join(line.split()[0:2])
        orig_date = time.strptime(datestr, '%d-%m-%y %H:%M%p')

    f.quit()
    return

if __name__ == '__main__':
    main()
RESOLVED:
data = []
f.dir(data.append)
datelist = []
filelist = []

for line in data:
    col = line.split()
    datestr = ' '.join(line.split()[0:2])
    date = time.strptime(datestr, '%m-%d-%y %H:%M%p')
    datelist.append(date)
    filelist.append(col[3])

combo = zip(datelist, filelist)
who = dict(combo)

for key in sorted(who.iterkeys(), reverse=True):
    print "%s: %s" % (key, who[key])
    filename = who[key]
    print "file to download is %s" % filename
    try:
        f.retrbinary('RETR %s' % filename, open(filename, 'wb').write)
    except ftplib.error_perm:
        print "Error: cannot read file %s" % filename
        os.unlink(filename)
    else:
        print "***Downloaded*** %s " % filename
    return

f.quit()
return
One problem: is it possible to retrieve just the first element from the dictionary? What I did here is make the for loop run only once and exit via return, which gives me the first sorted value. That works, but I don't think it is good practice to do it this way.
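If the goal is just the newest entry, you do not need the sorted-dict-plus-early-return trick at all. A sketch, reusing the datelist and filelist built above:

# struct_time tuples compare chronologically, so max() over the zipped
# (date, filename) pairs picks the newest entry without any dict at all.
latest_date, latest_file = max(zip(datelist, filelist))
print("file to download is %s" % latest_file)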
For those looking for a full solution for finding the latest file in a folder:
MLSD
If your FTP server supports MLSD command, a solution is easy:
entries = list(ftp.mlsd())
entries.sort(key = lambda entry: entry[1]['modify'], reverse = True)
latest_name = entries[0][0]
print(latest_name)
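For reference, ftp.mlsd() yields (name, facts) pairs, and the modify fact is a YYYYMMDDHHMMSS-style timestamp, which is why a plain string sort on entry[1]['modify'] orders entries chronologically. A slightly defensive variant (only a sketch; it skips the "." and ".." entries some servers include in the listing) could be:

# Ignore the current/parent directory entries some servers return.
entries = [e for e in ftp.mlsd() if e[0] not in ('.', '..')]
entries.sort(key=lambda entry: entry[1]['modify'], reverse=True)
latest_name = entries[0][0]
print(latest_name)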
LIST
If you need to rely on the obsolete LIST command, you have to parse the proprietary listing it returns.
Common *nix listing is like:
-rw-r--r-- 1 user group 4467 Mar 27 2018 file1.zip
-rw-r--r-- 1 user group 124529 Jun 18 15:31 file2.zip
With a listing like this, this code will do:
from dateutil import parser
# ...

lines = []
ftp.dir("", lines.append)

latest_time = None
latest_name = None

for line in lines:
    tokens = line.split(maxsplit = 9)
    time_str = tokens[5] + " " + tokens[6] + " " + tokens[7]
    time = parser.parse(time_str)
    if (latest_time is None) or (time > latest_time):
        latest_name = tokens[8]
        latest_time = time

print(latest_name)
This is a rather fragile approach.
MDTM
A more reliable, but way less efficient, approach is to use the MDTM command to retrieve timestamps of individual files/folders:
names = ftp.nlst()

latest_time = None
latest_name = None

for name in names:
    time = ftp.voidcmd("MDTM " + name)
    if (latest_time is None) or (time > latest_time):
        latest_name = name
        latest_time = time

print(latest_name)
For an alternative version of the code, see the answer by #Paulo.
Non-standard -t switch
Some FTP servers support a proprietary non-standard -t switch for NLST (or LIST) command.
lines = ftp.nlst("-t")
latest_name = lines[-1]
See How to get files in FTP folder sorted by modification time.
Downloading found file
No matter what approach you use, once you have the latest_name, you download it as any other file:
with open(latest_name, 'wb') as f:
    ftp.retrbinary('RETR ' + latest_name, f.write)
See also
Get the latest FTP folder name in Python
How to get FTP file's modify time using Python ftplib
Why don't you use the following dir option?

ftp.dir('-t', data.append)

With this option the file listing is time-ordered from newest to oldest. Then just retrieve the first file in the list to download it, as in the sketch below.
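A rough sketch of that approach, assuming a *nix-style listing whose first line is already a file entry and whose file name is the last field (the -t switch itself is non-standard and server-dependent):

data = []
ftp.dir('-t', data.append)               # newest entries first
newest = data[0].split(maxsplit=8)[-1]   # file name is the 9th field
with open(newest, 'wb') as fp:
    ftp.retrbinary('RETR %s' % newest, fp.write)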
With NLST, as shown in Martin Prikryl's response,
you can use the sorted built-in:
ftp = FTP(host="127.0.0.1", user="u",passwd="p")
ftp.cwd("/data")
file_name = sorted(ftp.nlst(), key=lambda x: ftp.voidcmd(f"MDTM {x}"))[-1]
If you have all the dates in time.struct_time (strptime will give you this) in a list then all you have to do is sort the list.
Here's an example :
#!/usr/bin/python
import time

dates = [
    "Jan 16 18:35 2012",
    "Aug 16 21:14 2012",
    "Dec 05 22:27 2012",
    "Jan 22 19:42 2012",
    "Jan 24 00:49 2012",
    "Dec 15 22:41 2012",
    "Dec 13 01:41 2012",
    "Dec 24 01:23 2012",
    "Jan 21 00:35 2012",
    "Jan 16 18:35 2012",
]

def main():
    datelist = []
    for date in dates:
        date = time.strptime(date, '%b %d %H:%M %Y')
        datelist.append(date)
    print datelist
    datelist.sort()
    print datelist

if __name__ == '__main__':
    main()
I don't know how your FTP server behaves, but your example was not working for me. I changed some lines related to the date-sorting part:
import sys
from ftplib import FTP, error_perm
import os
import socket
import time

# Connects to the ftp
ftp = FTP(ftpHost)
ftp.login(yourUserName, yourPassword)

data = []
datelist = []
filelist = []
ftp.dir(data.append)

for line in data:
    col = line.split()
    datestr = ' '.join(line.split()[5:8])
    date = time.strptime(datestr, '%b %d %H:%M')
    datelist.append(date)
    filelist.append(col[8])

combo = zip(datelist, filelist)
who = dict(combo)

for key in sorted(who.iterkeys(), reverse=True):
    print "%s: %s" % (key, who[key])
    filename = who[key]
    print "file to download is %s" % filename
    try:
        ftp.retrbinary('RETR %s' % filename, open(filename, 'wb').write)
    except error_perm:
        print "Error: cannot read file %s" % filename
        os.unlink(filename)
    else:
        print "***Downloaded*** %s " % filename

ftp.quit()
import operator

with open("D://program.txt") as f:
    Results = {}
    for line in f:
        part_one, part_two = line.split()
        Results[part_one] = part_two

c = sum(int(Results[x]) for x in Results)
r = c / 12
d = len(Results)
F = max(Results.items(), key=operator.itemgetter(1))[0]
u = min(Results.items(), key=operator.itemgetter(1))[0]

print("Number of entries are", d)
print("Student with HIGHEST mark is", F)
print("Student with LOWEST mark is", u)
print("Avarage mark is", r)

Results = [(v, k) for k, v in Results.items()]
Results.sort(reverse=True)
for v, k in Results:
    print(k, v)

import sys
orig_stdout = sys.stdout
f = open('D://programssr.txt', 'w')
sys.stdout = f

print('Number of entries are', d)
print("Student with HIGHEST mark is", F)
print("Student with LOWEST mark is", u)
print("Avarage mark is", r)
for v, k in Results:
    print(k, v)

sys.stdout = orig_stdout
f.close()
I want to read a txt file, but the problem is that the code can't compute the results I want to write to a new file because of the NAMES and MARKS header in the file. If you remove that line, it works fine. I want to do the calculations without removing NAMES and MARKS from the txt file. What am I doing wrong?
NAMES MARKS
Lux 95
Veron 70
Lesley 88
Sticks 80
Tipsey 40
Joe 62
Goms 18
Wesley 35
Villa 11
Dentist 72
Onty 50
Just consume the first line using the next() function before looping over the file:
with open("D://program.txt") as f:
Results = {}
next(f)
for line in f:
part_one,part_two = line.split()
Results[part_one] = part_two
Note that file objects are iterator-like (one-shot iterables): when you loop over them you consume the items and have no access to them anymore.
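A tiny illustration of that one-shot behaviour, using a hypothetical demo.txt:

with open("demo.txt") as f:
    header = next(f)   # consumes the first line
    rest = list(f)     # yields only the remaining lines
    again = list(f)    # [] -- the file iterator is already exhausted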