CSV reader with .txt file [duplicate] - python-3.x

I'm using Python and I'm not sure how to do this.
I want to read many lines from files, but starting from the second line. The files all have different numbers of lines, so I can't hard-code a stop value.
The example below reads from the first line to the 16th line.
But I need to read each file from its second line to the end.
Thank you! :)
from itertools import islice

with open('filename') as fin:
    for line in islice(fin, 1, 16):
        print(line)

You should be able to call next and discard the first line:
with open('filename') as fin:
    next(fin)  # cast into oblivion
    for line in fin:
        ...  # do something
This is simple and easy because fin is an iterator: next consumes its first line, and the for loop picks up from the second.

with open("filename", "r") as fin:   # text mode, so the lines are str rather than bytes
    print(fin.readlines()[1:])       # everything after the first line (reads the whole file into memory)

Looking at the documentation for islice
itertools.islice(iterable, stop)
itertools.islice(iterable, start, stop[, step])
Make an iterator that returns selected elements from the iterable. If start is non-zero, then elements from the iterable are skipped until start is reached. Afterward, elements are returned consecutively unless step is set higher than one which results in items being skipped. If stop is None, then iteration continues until the iterator is exhausted, if at all; otherwise, it stops at the specified position. Unlike regular slicing, islice() does not support negative values for start, stop, or step. Can be used to extract related fields from data where the internal structure has been flattened (for example, a multi-line report may list a name field on every third line).
I think you can just tell it to start at the second line (index 1, since islice counts from zero) and iterate until the end. e.g.
with open('filename') as fin:
    for line in islice(fin, 1, None):  # <--- keep the start at 1, change 16 to None
        print(line)
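A tiny self-contained demonstration of the skip-the-header pattern (an in-memory list stands in for the file object here, since islice accepts any iterable):

```python
from itertools import islice

lines = ["header\n", "row 1\n", "row 2\n", "row 3\n"]  # stand-in for a file
body = list(islice(lines, 1, None))  # start at index 1, no stop value
print(body)
```

Because the stop value is None, this works for files of any length without counting lines first.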

Related

Python efficient way to search for a pattern in text file

I need to find a pattern in a text file, which isn't big.
Therefore loading the entire file into RAM isn't a concern for me - as advised here:
I tried to do it in two ways:
with open(inputFile, 'r') as file:
    for line in file.readlines():
        for date in dateList:
            if re.search('{} \d* 1'.format(date), line):
OR
with open(inputFile, 'r') as file:
    contents = file.read()
    for date in dateList:
        if re.search('{} \d* 1'.format(date), contents):
The second one proved to be much faster.
Is there an explanation for this, other than the fact that I am using one less loop with the second approach?
As pointed out in the comments, the two snippets are not equivalent: the second one only looks for the first match of each date in the whole file. Besides this, the first is also more expensive, because the (relatively costly) format call is repeated for every date on every line. Storing the regexps and precompiling them should help a lot. Even better: you can build a single regexp that matches all the dates at once, using something like:
regexp = '({}) \d* 1'.format('|'.join('{}'.format(date) for date in dateList))
with open(inputFile, 'r') as file:
    contents = file.read()
    # Search for the first matching date from dateList
    if re.search(regexp, contents):
Note that you can use findall if you want all of them.
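A minimal, runnable sketch of the combined-pattern approach (the file contents and date list here are made up for illustration, and re.escape is added in case a date contains regex metacharacters):

```python
import re

# Hypothetical data standing in for the file contents and dateList
contents = "2024-01-01 42 1\nother line\n2024-02-02 7 1\n"
dateList = ["2024-01-01", "2024-02-02"]

# One alternation-based pattern, compiled once, instead of
# re-formatting a pattern per date per line
regexp = re.compile(r'({}) \d* 1'.format('|'.join(map(re.escape, dateList))))

first = regexp.search(contents)       # first match only
all_dates = regexp.findall(contents)  # every matching date
print(first.group(1))
print(all_dates)
```

Compiling once up front means the alternation is built a single time, however many lines or dates there are.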

Output data from subprocess command line by line

I am trying to read a large data file (millions of rows, in a very specific format) using a pre-built (C) routine. I then want to yield the results, line by line, via a generator function.
I can read the file OK, but whereas just running:
<command> <filename>
directly in Linux prints the results line by line as it finds them, I've had no luck replicating this inside my generator function. It seems to output the whole lot as a single string that I have to split on newlines, and of course everything then has to be read before I can yield line 1.
This code will read the file, no problem:
import subprocess
import config

file_cmd = '<command> <filename>'
for rec in subprocess.check_output([file_cmd], shell=True).decode(config.ENCODING).split('\n'):
    yield rec
(ENCODING is set in config.py to iso-8859-1 - it's a Swedish site)
The code I have works, in that it gives me the data, but in doing so it tries to hold the whole lot in memory. I have larger files to process which are likely to exhaust the available memory, so this isn't an option.
I've played around with bufsize on Popen, but had no success (and I also can't decode or split after the Popen, though I guess the fact that I need to split at all is actually my problem!).
I think I have this working now, so will answer my own question in the event somebody else is looking for this later ...
import shlex  # needed for shlex.split below

proc = subprocess.Popen(shlex.split(file_cmd), stdout=subprocess.PIPE)
while True:
    output = proc.stdout.readline()
    if output == b'' and proc.poll() is not None:
        break
    if output:
        yield output.decode(config.ENCODING).strip()
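A shorter variant of the same idea (a sketch; the printf command and ENCODING constant stand in for the poster's <command> <filename> and config.ENCODING): since proc.stdout is itself an iterator of lines, you can loop over it directly and let the for loop stop at EOF:

```python
import shlex
import subprocess

ENCODING = "iso-8859-1"  # stands in for config.ENCODING

def read_lines(file_cmd):
    """Yield decoded lines from the command's stdout, one at a time."""
    proc = subprocess.Popen(shlex.split(file_cmd), stdout=subprocess.PIPE)
    for raw in proc.stdout:              # file-like: iterates line by line
        yield raw.decode(ENCODING).strip()
    proc.wait()                          # reap the child once output is exhausted

# Illustrative usage with a stand-in command:
for line in read_lines("printf 'a\\nb\\nc\\n'"):
    print(line)
```

Because the generator pulls one line at a time from the pipe, only a single line needs to be held in memory at once.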

The Python code below runs fine the first time, but the second time it gets into a loop because I use the same directory. What do I need to change to avoid the infinite loop?

My code ran correctly the first time, since new.txt did not exist yet, but the second time it gets into an infinite loop.
for file in files:  # here it gets into the loop
    try:
        for line in open(file):
            line = line.strip()
            print(line)
            with open(os.path.join(path, new), "a") as file:  # creating the new file and storing the existing files' data
                file.write(line)
                file.write("\n")
            file.close()
Finally I understand what the problem is. When you open the 'new.txt' file you iterate over its lines, but on each iteration you also append a new line to it, so the number of lines becomes "infinite" and you get an infinite loop.
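One way to break the cycle (a sketch; the directory, file names, and contents here are illustrative stand-ins for the poster's path, new, and files): skip the output file when iterating, and open it once, outside the loop:

```python
import os
import tempfile

path = tempfile.mkdtemp()   # illustrative directory for the demo
new = "new.txt"             # the output file from the question

# Hypothetical input files standing in for the poster's `files` list
files = ["a.txt", "b.txt", new]
for name, text in [("a.txt", "one\ntwo\n"), ("b.txt", "three\n")]:
    with open(os.path.join(path, name), "w") as f:
        f.write(text)

with open(os.path.join(path, new), "w") as out:  # opened once, never re-read
    for name in files:
        if name == new:
            continue        # skip the file we are writing to
        for line in open(os.path.join(path, name)):
            out.write(line.strip() + "\n")
```

Since new.txt is never opened for reading, appending to it can no longer feed the loop.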

Python maths quiz alphabetical sorting

I'm struggling to get my code to work in Python. It is meant to print the data in alphabetical order but doesn't. Can anyone help?
if choice.lower() == 'az':
    dictionary = {}
    fi = open("class1.txt", 'r')
    data = fi.readlines()
    for line in sorted(data):
        print(data.rstrip());
You don't need to use readlines here; sorted is able to iterate over the file just fine. And you should be printing line.rstrip() instead of data.rstrip().
with open("class1.txt", 'r') as fi:
    for line in sorted(fi):
        print(line.rstrip())
I've also shown how to use a context manager here (the with line); it causes the file to be closed automatically at the end of the block.
You're applying rstrip to the wrong variable:
print(line.rstrip())
should do the trick.

Checking/Writing lines to a .txt file using Python

I'm new both to this site and python, so go easy on me. Using Python 3.3
I'm making a hangman-esque game, and all is working bar one aspect. I want to check whether a string is in a .txt file, and if not, write it on a new line at the end of the file. Currently I can write to the text file on a new line, but if the string already exists it still gets written again. My code is below:
Note that my text file has each string on a separate line.
write = 1
if over == 1:
    print("I Win")
    wordlibrary = file('allwords.txt')
    for line in wordlibrary:
        if trial in line:
            write = 0
        if write == 1:
            with open("allwords.txt", "a") as text_file:
                text_file.write("\n")
                text_file.write(trial)
Is this really the indentation from your program?
As written above, in the first iteration of the loop over wordlibrary, trial is compared to the first line, and since (from your symptoms) it is not contained in that line, the program moves straight on to the next part of the loop body: write is still 1, so it appends trial to the text file.
cheers,
Amnon
You don't need to know the number of lines in the file beforehand. Just use a file iterator. You can find the documentation here: http://docs.python.org/2/library/stdtypes.html#bltin-file-objects
Pay special attention to the readlines method.
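A corrected sketch of the check-then-append logic for Python 3 (the helper name add_word and the exact-line comparison are my own choices; the question's `trial in line` would also count substrings as matches):

```python
def add_word(trial, path="allwords.txt"):
    """Append trial on its own line unless an identical line already exists."""
    with open(path) as wordlibrary:
        # Decide *after* scanning the whole file, not inside the loop
        if any(line.strip() == trial for line in wordlibrary):
            return False                 # already present, write nothing
    with open(path, "a") as text_file:
        text_file.write("\n" + trial)
    return True
```

The check completes before the file is reopened for appending, so a word is written at most once.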
