Need assistance on creating for loop - python-3.x

I'm trying to write bytes from one file to a second file, then go back to the first file and delete the bytes written. I'm doing this one byte at a time (first byte essentially copied and written to 2nd file, then that byte is removed from the first file).
The problem I'm having is creating a for loop (assuming that's the best way to go about this) to make this happen. My current code is below:
in_file = open('file', "rb")
data = in_file.read()
length = len(in_file.read())
in_file.close()
out_file = open('file2', "wb")
out_file.write(data[length:length+1])
out_file.close()
in_file = open('file', "wb")
in_file.write(data[1:])
in_file.close()

in_file = open('file', "rb")
data = in_file.read()
length = len(in_file.read())
in_file.close()
out_file = open('file2', "ab")
out_file.write(data[length:length+1])
out_file.close()
in_file = open('file', "wb")
in_file.write(data[1:])
in_file.close()

in_file = open('file', "rb")
data = in_file.read()
length = len(in_file.read())
in_file.close()
out_file = open('file2', "ab")
out_file.write(data[length:length+1])
out_file.close()
in_file = open('file', "wb")
in_file.write(data[1:])
in_file.close()
I guess the way I saw this happening is that I write the first byte outside of the loop, and then use a for loop to append each subsequent byte between the two files. I've tried creating a for loop for that sequence, but I keep receiving errors about trying to access a closed file, so I'm not sure when/where to "close" my file. The reason I'm doing this is that eventually I will convert each byte to a different byte value (the files I'm dealing with contain obfuscated bytes and I need to convert them back).
I appreciate any assistance!

Keep the 2 files open until you're done. Closing and opening files can't happen as fast as your program tries to execute them.
You may have to flush the file object before you do the final (and only) close, i.e.
in_file.close()    # no need to flush the in-file
out_file.flush()   # do flush the out-file
out_file.close()
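A minimal sketch of that structure, assuming the file names from the question; the loop body is where each obfuscated byte could be converted before writing:
with open('file', 'rb') as in_file, open('file2', 'wb') as out_file:
    data = in_file.read()
    for i in range(len(data)):
        # convert/transform the byte here before writing, if needed
        out_file.write(data[i:i+1])
    out_file.flush()

# every byte has been copied, so empty the source file in one step
with open('file', 'wb'):
    pass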

Related

How can I delete every second line in a very big text file?

I have a very big text file and I want to delete every second line. How can I do it in an effective way?
I have written code like this:
_file = open("merged_DGM.txt", "r")
text = _file.readlines()
for i, j in enumerate(text):
if i % 2 == 0:
del text[i]
_file.close()
_file = open("half_DGM.txt", "w")
for i in text:
_file.write(i)
_file.close()
It works for small text files, but for big files it loads the whole text into a variable, and after 10 minutes it still had not solved the problem.
Any suggestions would be appreciated.
The file object returned by open inherits from io.IOBase and can be iterated over. By iterating directly over the file you avoid loading the whole file into memory at once.
with open("merged_DGM.txt", "r") as in_file, open("half_DGM.txt", "w") as out_file:
    for index, line in enumerate(in_file):
        if index % 2:
            out_file.write(line)
Since enumerate counts from zero, if index % 2 keeps lines 2, 4, 6, ... and drops lines 1, 3, 5, ..., matching your del text[i] for even i.

Python 3.7.0 on Windows 10 unexpected behavior with open()

I'm new to Python and am seeing something unexpected based on other languages
I've worked with.
This code writes to a log file.
import datetime
import time

date = datetime.datetime.today().strftime('%Y-%m-%d')
mtime = time.strftime("%H:%M:%S")
myfile = "log_file." + date + ".txt"
# fh = open(myfile, "a")  # Read program won't read new records if the open
#                         # is done outside the loop
for x in range(100):
    fh = open(myfile, "a")  # Read program works as expected if this open is inside the loop
    mtime = time.strftime("%H:%M:%S")
    msg = str(mtime + " This is entry number " + str(x+1) + "\n")
    fh.write(msg)
    time.sleep(2)
    fh.close
This code prints out new records written to the log file
import datetime
import time

date = datetime.datetime.today().strftime('%Y-%m-%d')
myfile = "log_file." + date + ".txt"

# This reads through all the records currently in the file.
lastLine = None
with open(myfile, 'r') as f:
    while True:
        line = f.readline()
        if not line:
            break
        # print(line)
        lastLine = line

# This prints out all of the new lines that are added to the file.
while True:
    with open(myfile, 'r') as f:
        lines = f.readlines()
        if lines[-1] != lastLine:
            lastLine = lines[-1]
            print(lines[-1])
    time.sleep(1)
If I place the open() in the write code before the for loop the read code never
sees the new records.
If I place the open() in the write code inside the loop the read code prints out
the new lines added to the file as expected. Is this correct behavior? If so, why?
The file is being written in buffered mode. The reason it works when the open() is inside the loop is that the file is opened repeatedly and the buffer is likely flushed each time. If you need writes to the file to be visible quickly in the reader, you can disable buffering when you open the file with the buffering=0 keyword argument (note that this is only allowed in binary mode); this should make the new lines visible quickly in the reader. You can also explicitly call f.flush() in the writer. See the docs on open() for more details.
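For instance, a minimal sketch of the f.flush() approach applied to the writer from the question (same file-name scheme assumed):
import datetime
import time

date = datetime.datetime.today().strftime('%Y-%m-%d')
myfile = "log_file." + date + ".txt"

fh = open(myfile, "a")  # open once, before the loop
for x in range(100):
    mtime = time.strftime("%H:%M:%S")
    fh.write(mtime + " This is entry number " + str(x + 1) + "\n")
    fh.flush()  # push the buffered line to disk so the reader sees it
    time.sleep(2)
fh.close()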

Python - Spyder 3 - Open a list of .csv files and remove all double quotes in every file

I've read everything I can find and tried about 20 examples from SO and google, and nothing seems to work.
This should be very simple, but I cannot get it to work. I just want to point to a folder, and replace every double quote in every file in the folder. That is it. (And I don't know Python well at all, hence my issues.) I have no doubt that some of the scripts I've tried to retask must work, but my lack of Python skill is getting in the way. This is as close as I've gotten, and I get errors. If I don't get errors it seems to do nothing. Thanks.
import glob
import csv

mypath = glob.glob('\\C:\\csv\\*.csv')
for fname in mypath:
    with open(mypath, "r") as infile, open("output.csv", "w") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        for row in reader:
            writer.writerow(item.replace('"', '') for item in row)
You don't need to use csv-specific reading and writing here; I think that makes it more complex than necessary. How about this instead:
import os

mypath = r'\path\to\folder'
for file in os.listdir(mypath):  # This will loop through every file in the folder
    if '.csv' in file:  # Check if it's a csv file
        fpath = os.path.join(mypath, file)
        fpath_out = fpath + '_output'  # Create an output file with a similar name to the input file
        with open(fpath) as infile:
            lines = infile.readlines()  # Read all lines
        with open(fpath_out, 'w') as outfile:
            for line in lines:  # One line at a time
                outfile.write(line.replace('"', ''))  # Remove each " and write the line
Let me know if this works, and respond with any error messages you may have.
I found the solution to this based on the original answer provided by u/Jeff. It was actually smart quotes (u'\u201d', to be exact), not straight quotes; that is why I could get nothing to work. That is a great way to spend like two days; now if you'll excuse me, I have to go jump off the roof. But for posterity, here is what I used that worked. (Note that there is the left-curving smart quote as well: u'\u201c'.)
import os

mypath = 'C:\\csv\\'
myoutputpath = 'C:\\csv\\output\\'
for file in os.listdir(mypath):  # This will loop through every file in the folder
    if '.csv' in file:  # Check if it's a csv file
        fpath = os.path.join(mypath, file)
        fpath_out = os.path.join(myoutputpath, file)  # + '_output'  # Create an output file with a similar name to the input file
        with open(fpath) as infile:
            lines = infile.readlines()  # Read all lines
        with open(fpath_out, 'w') as outfile:
            for line in lines:  # One line at a time
                outfile.write(line.replace(u'\u201d', ''))  # Remove each right smart quote and write the line
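A small variant (just a sketch, with the same folder layout assumed) that strips both curly quotes and any straight quotes in a single pass with str.translate:
import os

mypath = 'C:\\csv\\'
myoutputpath = 'C:\\csv\\output\\'
# map left curly, right curly, and straight double quotes to nothing
QUOTES = str.maketrans('', '', '\u201c\u201d"')

for name in os.listdir(mypath):
    if name.endswith('.csv'):
        with open(os.path.join(mypath, name)) as infile, \
             open(os.path.join(myoutputpath, name), 'w') as outfile:
            for line in infile:
                outfile.write(line.translate(QUOTES))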

add new row to numpy using realtime reading

I am using a microstacknode accelerometer and intend to save its readings into a csv file.
while True:
    numpy.loadtxt('foo.csv', delimiter=",")
    raw = accelerometer.get_xyz(raw=True)
    g = accelerometer.get_xyz()
    ms = accelerometer.get_xyz_ms2()
    a = numpy.asarray([[raw['x'],raw['y'],raw['z']]])
    numpy.savetxt("foo.csv",a,delimiter=",",newline="\n")
However, only one line ever gets saved. Any help? I'm still quite a noobie at Python.
NumPy is not the best solution for this type of thing.
This should do what you intend:
while True:
    raw = accelerometer.get_xyz(raw=True)
    fobj = open('foo.csv', 'a')
    fobj.write('{},{},{}\n'.format(raw['x'], raw['y'], raw['z']))
    fobj.close()
Here fobj = open('foo.csv', 'a') opens the file in append mode. So if the file already exists, the next write will go to the end of the file, keeping the existing data.
Let's have look at your code. This line:
numpy.loadtxt('foo.csv', delimiter=",")
reads the whole file but does not do anything with the data it reads, because you don't assign the result to a variable. You would need to do something like this:
data = numpy.loadtxt('foo.csv', delimiter=",")
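For example, a quick check (just a sketch; it assumes foo.csv already holds a few rows):
import numpy

# ndmin=2 keeps the result two-dimensional even when the file has one row
data = numpy.loadtxt('foo.csv', delimiter=',', ndmin=2)
print(data.shape)  # (number_of_rows, 3)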
This line:
numpy.savetxt("foo.csv",a,delimiter=",",newline="\n")
creates a new file with the name foo.csv, overwriting the existing one. That is why you see only one line: the last one written.
This should do the same but does not open and close the file all the time:
with open('foo.csv', 'a') as fobj:
    while True:
        raw = accelerometer.get_xyz(raw=True)
        fobj.write('{},{},{}\n'.format(raw['x'], raw['y'], raw['z']))
The with open() statement opens the file with the promise to close it even in the case of an exception, for example if you break out of the while True loop with Ctrl-C.

Merging multiple text files into one and related problems

I'm using Windows 7 and Python 3.4.
I have several multi-line text files (all in Persian) and I want to merge them into one under one condition: each line of the output file must contain the whole text of one input file. This means that if there are nine text files, the output file must have exactly nine lines, each line containing the text of a single file. I wrote this:
import os

os.chdir('C:\Dir')
with open('test.txt', 'w', encoding='UTF8') as OutFile:
    with open('news01.txt', 'r', encoding='UTF8') as InFile:
        while True:
            _Line = InFile.readline()
            if len(_Line) == 0:
                break
            else:
                _LineString = str(_Line)
                OutFile.write(_LineString)
It worked for that one file, but the result seems to take up more than one line in the output file, and the output also contains disturbing characters like &amp and &nbsp and things like that, even though the source files don't contain any of them.
Also, I've got some other texts: news02.txt, news03.txt, news04.txt ... news09.txt.
Considering all these:
How can I correct my code so that it reads all the files one after another, putting each on only one line?
How can I clean out these unfamiliar and strange characters, or prevent them from appearing in my final text?
Here is an example that will do the merging portion of your question:
def merge_file(infile, outfile, separator=""):
    print(separator.join(line.strip("\n") for line in infile), file=outfile)

def merge_files(paths, outpath, separator=""):
    with open(outpath, 'w') as outfile:
        for path in paths:
            with open(path) as infile:
                merge_file(infile, outfile, separator)
Example use:
merge_files(["C:\file1.txt", "C:\file2.txt"], "C:\output.txt")
Note this makes the rather large assumption that the contents of infile can fit into memory. That is reasonable for most text files, but possibly quite unreasonable otherwise. If your text files will be very large, you can use this alternate merge_file implementation:
def merge_file(infile, outfile, separator=""):
    for line in infile:
        outfile.write(line.strip("\n") + separator)
    outfile.write("\n")
It's slower, but shouldn't run into memory problems.
Answering question 1:
You were right about the UTF-8 part.
You probably want to create a function that takes multiple files, as a tuple of file objects or of path strings (or as *args). Then read every input file and replace each "\n" (newline) with a delimiter (default ""). out_file can itself appear in in_files, but this assumes the contents of the files fit into memory. Both out_file and the entries of in_files can be file objects or path strings.
def write_from_files(out_file, in_files, delimiter="", dir="C:\Dir"):
    import io
    import os
    import html.parser  # See part 2 of the answer
    os.chdir(dir)
    output = []
    for file in in_files:
        file_ = file
        if not isinstance(file_, io.TextIOWrapper):
            file_ = open(file_, "r", -1, "UTF-8")  # If it isn't a file object, open it
        file_.seek(0, 0)
        output.append(file_.read().replace("\n", delimiter))  # Replace all newlines with the delimiter
        file_.close()  # Close the file to prevent IO errors
    if not isinstance(out_file, io.TextIOWrapper):
        out_file = open(out_file, "w", -1, "UTF-8")
    joined = html.parser.HTMLParser().unescape("\n".join(output))  # Turn HTML entities back into characters
    out_file.write(joined)
    out_file.close()
    return joined  # Do not have to return
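Example use, with the file names from the question:
write_from_files('test.txt', ['news01.txt', 'news02.txt', 'news03.txt'])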
Answering question 2:
I think you may have copied the text from a webpage. This does not happen for me. The &amp and &nbsp are HTML entities for the ampersand (&) and the non-breaking space. You may need to replace them with their corresponding characters. I would use html.parser. As you can see above, it turns HTML escape sequences into Unicode literals. E.g.:
>>> html.parser.HTMLParser().unescape("Alpha &lt β")
'Alpha < β'
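Note that since Python 3.4 the module-level html.unescape() function does the same job (HTMLParser.unescape was later deprecated in its favor):
import html

print(html.unescape("Alpha &lt; β"))  # prints: Alpha < β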
This will not work in Python 2.x, since the module was renamed in 3.x. In Python 2.x, replace those lines with:
import HTMLParser
HTMLParser.HTMLParser().unescape("\n".join(output))
