How to write for loop output in formatted way - python-3.x

I'm new to Python. I want to write out I and Q data, where each I and Q block has 200000 samples. I want arrays I1 to I3000 and Q1 to Q3000.
I'm using the code below, but every time all the values end up in the single I and Q variables. I'm attaching my code.
Could you please help me with that? I can't upload my file here because it is very big, and it is a binary file.
What I want exactly is
I1=[....]
Q1=[....]
....
....
I3000=[...]
Q3000=[...]
from bitstring import ConstBitStream

with open(file, 'rb') as fd:
    b = ConstBitStream(fd)
    for rc in range(0, 3000):
        for j in range(0, 200000):
            aux = b.read('bits:8')
            I = aux.int
            I_ch = []          # note: re-created on every pass, so only the last sample survives
            I_ch.append(I)
            aux = b.read('bits:8')
            Q = aux.int
            Q_ch = []          # same problem here
            Q_ch.append(Q)
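Rather than creating 3000 separately named variables, the usual approach is to collect the samples into a list of lists (or a dict keyed by block number), creating each per-block list once per outer iteration instead of once per sample. A minimal sketch, assuming the bitstring library, the file variable from the code above, and the same layout (a signed 8-bit I byte followed by a signed 8-bit Q byte, 200000 pairs per block):
from bitstring import ConstBitStream

I_blocks = []   # I_blocks[0] plays the role of "I1", I_blocks[2999] of "I3000"
Q_blocks = []

with open(file, 'rb') as fd:
    b = ConstBitStream(fd)
    for rc in range(3000):
        I_ch = []    # created once per block, so it accumulates all 200000 samples
        Q_ch = []
        for j in range(200000):
            I_ch.append(b.read('int:8'))   # one signed 8-bit I sample
            Q_ch.append(b.read('int:8'))   # one signed 8-bit Q sample
        I_blocks.append(I_ch)
        Q_blocks.append(Q_ch)
Accessing I_blocks[0] and Q_blocks[0] then gives what the post calls I1 and Q1, and so on.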

Related

py2neo cursor appears to consume everything into memory rather than stream data

I am running a query against a Neo4j server, which I expect to return >100M rows (but just a few columns), and then write the results into a CSV file. This works well for queries that return up to 10-20M rows but becomes tricky as the resultant rows go up into the 10^8 range.
I thought writing the results row by row (ideally buffered) should be a solution, but the csv writer appears to only write to disk once the whole code executes (i.e. at the end of the iteration), rather than in chunks as expected. In the example below, I tried explicitly flushing the file (which did not work). I also do not get any output on stdout, indicating that the iteration is not occurring as intended.
The memory usage of the process is growing rapidly, however, over 12 GB last I checked. That makes me think that the cursor is trying to fetch all the data before starting the iteration, which it should not do, unless I have misunderstood something.
Any ideas?
from py2neo import Graph
import csv

# g (a Graph instance) and query are assumed to be defined earlier
cursor = g.run(query)

with open('bigfile.csv', 'w') as csvfile:
    fieldnames = cursor.keys()
    writer = csv.writer(csvfile)
    # writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    # writer.writeheader()
    i = 0
    j = 1
    for rec in cursor:
        # writer.writerow(dict(rec))
        writer.writerow(rec.values())
        i += 1
        if i == 50000:
            print(str(i * j) + '...')
            csvfile.flush()
            i = 0
            j += 1
Isn't the main problem the size of the query, rather than the method of writing the results to the CSV file? If you're chunking the writing process, perhaps you should chunk the querying process as well, since the results are stored in memory while the file writing is taking place.
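If chunking the query is the way to go, one possible shape for it is SKIP/LIMIT pagination. This is only a sketch: it assumes the Cypher query has a stable ordering so it can be paginated, reuses the g name from the question, and paged_query is a hypothetical variant of the original query with " SKIP $skip LIMIT $limit" appended to it.
import csv

PAGE = 50000   # rows fetched per round trip

with open('bigfile.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    skip = 0
    while True:
        cursor = g.run(paged_query, skip=skip, limit=PAGE)
        rows = cursor.data()              # materialises only this page as a list of dicts
        if not rows:
            break
        for rec in rows:
            writer.writerow(rec.values())
        skip += PAGE
        print(str(skip) + '...')
Note that deep SKIP values get progressively slower on the server side, so for result sets in the 10^8 range a keyset-style pagination on an indexed property is usually the better long-term choice.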

Reading file and getting values from a file. It shows only first one and others are empty

I am reading a file using with open in Python and then doing all the other operations inside the with block. When calling the function, only the first operation inside the block gives a result, while the others are empty. I can do this with another approach such as readlines, but I did not find out why this one does not work. I thought the reason might be that the file gets closed, but with open takes care of that. Could anyone please suggest what's wrong?
def read_datafile(filename):
    with open(filename, 'r') as f:
        a = [lines.split("\n")[0] for number, lines in enumerate(f) if number == 2]
        b = [lines.split("\n")[0] for number, lines in enumerate(f) if number == 3]
        c = [lines.split("\n")[0] for number, lines in enumerate(f) if number == 2]
    return a, b, c

read_datafile('data_file_name')
I only get values for a and all the others are empty. When 'a' is commented out, I get a value for b and the others are empty.
Updates
The file looks like this:
-0.6908270760153553 -0.4493128078936575 0.5090918714784820
0.6908270760153551 -0.2172871921063448 0.5090918714784820
-0.0000000000000000 0.6666999999999987 0.4597549674638203
0.3097856229862140 -0.1259623621214220 0.5475896447896115
0.6902143770137859 0.4593623621214192 0.5475896447896115
The construct
with open(filename) as handle:
    a = [line for line in handle if condition]
    b = [line for line in handle]
will always return an empty b because the iterator in a already consumed all the data from the open filehandle. Once you reach the end of a stream, additional attempts to read anything will simply return nothing.
If the input is seekable, you can rewind it and read all the same lines again; or you can close it (explicitly, or implicitly by leaving the with block) and open it again - but a much more efficient solution is to read it just once, and pick the lines you actually want from memory. Remember that reading a byte off a disk can easily take several orders of magnitude more time than reading a byte from memory. And keep in mind that the data you read could come from a source which is not seekable, such as standard output from another process, or a client on the other side of a network connection.
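For completeness, the rewind alternative mentioned above is just an explicit seek back to the start of the file between the two comprehensions. A quick sketch, extending the placeholder construct above and only valid for a regular, seekable file:
with open(filename) as handle:
    a = [line for line in handle if condition]
    handle.seek(0)                  # rewind to the beginning of the file
    b = [line for line in handle]   # this comprehension now sees all the lines again
The read-once version recommended above then looks like this: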
def read_datafile(filename):
    with open(filename, 'r') as f:
        lines = [line for line in f]
    a = lines[2]
    b = lines[3]
    c = lines[2]
    return a, b, c
If the file could be too large to fit into memory at once, you end up with a different set of problems. Perhaps in this scenario, where you only seem to want a few lines from the beginning, only read that many lines into memory in the first place.
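One way to do that is itertools.islice, which stops reading after the requested number of lines. A sketch; the how_many parameter is just an illustrative name:
from itertools import islice

def read_datafile(filename, how_many=4):
    # read only the first how_many lines instead of the whole file
    with open(filename, 'r') as f:
        lines = list(islice(f, how_many))
    a = lines[2]
    b = lines[3]
    c = lines[2]
    return a, b, c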
What exactly are you trying to do with this script? The lines variable here may not contain what you want: it will contain a single line because the file gets enumerated by lines.

Nested For loop over csv files

I have 2 .csv datasets from the same source. I was attempting to check if any of the items from the first dataset are still present in the second.
#!/usr/bin/python
import csv
import json
import click


@click.group()
def cli(*args, **kwargs):
    """Command line tool to compare and generate a report of items that still persist from one report to the next."""
    pass


@click.command(help='Compare the keysets and return a list of old keys still active in the new keyset.')
@click.option('--inone', '-i', default='keys.csv', help='Specify the file of the old keyset')
@click.option('--intwo', '-i2', default='keys2.csv', help='Specify the file of the new keyset')
@click.option('--output', '-o', default='results.json', help='--output, -o, Sets the name of the output.')
def compare(inone, intwo, output):
    csvfile = open(inone, 'r')
    csvfile2 = open(intwo, 'r')
    jsonfile = open(output, 'w')
    reader = csv.DictReader(csvfile)
    comparator = csv.DictReader(csvfile2)
    for line in comparator:
        for row in reader:
            if row == line:
                print('#', end='')
                json.dump(row, jsonfile)
                jsonfile.write('\n')
            print('|', end='')
        print('-', end='')


cli.add_command(compare)

if __name__ == '__main__':
    cli()
Say each csv file has 20 items in it. It currently iterates 40 times and ends, whereas I was expecting it to iterate 400 times and create a report of the items remaining.
Everything but the iteration seems to be working. Does anyone have thoughts on a better approach?
Iterating 40 times sounds just about right - when you iterate through your DictReader, you're essentially iterating through the wrapped file lines, and once you're done iterating it doesn't magically reset to the beginning - the iterator is done.
That means that your code will start by taking the first item from the comparator (1), then iterate over all items in the reader (20), then get the next line from the comparator (1); at that point it won't have anything left to iterate over in the reader, so it will just keep looping over the remaining comparator lines (18), resulting in a total of 40 loops.
If you really want to iterate over all of the lines (and memory is not an issue), you can store them as lists and then you get a new iterator whenever you start a for..in loop, so:
reader = list(csv.DictReader(csvfile))
comparator = list(csv.DictReader(csvfile2))
Should give you an instant fix. Alternatively, you can reset your reader 'stream' after the loop with csvfile.seek(0).
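A sketch of that seek-based reset, reusing the default file names from the question; note that the DictReader has to be rebuilt after the rewind so the header row is not treated as data:
import csv
import json

with open('keys.csv', 'r') as csvfile, open('keys2.csv', 'r') as csvfile2, \
        open('results.json', 'w') as jsonfile:
    comparator = csv.DictReader(csvfile2)
    for line in comparator:
        csvfile.seek(0)                     # rewind the old keyset file for every comparator row
        reader = csv.DictReader(csvfile)    # rebuild the reader so the header row is skipped again
        for row in reader:
            if row == line:
                json.dump(row, jsonfile)
                jsonfile.write('\n')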
That being said, if you're going to compare lines only, and you expect that not many lines will differ, you can load the first line with csv.reader() to get the header and then forgo the csv.DictReader altogether by comparing the raw lines directly. Then, when there is a change, you can pass the line to csv.reader() to get it properly parsed and map it to the header to get the field names.
That should be significantly faster on large data sets, plus seeking through the file can give you the benefit of never having the need to store in memory more data than the current I/O buffer.
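A rough sketch of that line-level comparison, reusing the default file names from the question and assuming both CSVs share the same header and column order. This particular sketch keeps the old keyset's raw lines in a set for speed, which trades some memory for avoiding repeated rescans of the file:
import csv
import json

with open('keys.csv', 'r') as old_file, open('keys2.csv', 'r') as new_file:
    header = old_file.readline()
    new_file.readline()                          # skip the second file's header
    old_lines = set(old_file)                    # raw lines of the old keyset
    still_active = [line for line in new_file if line in old_lines]

fieldnames = next(csv.reader([header]))          # parse the header once

with open('results.json', 'w') as jsonfile:
    for line in still_active:
        values = next(csv.reader([line]))        # parse only the lines that matched
        json.dump(dict(zip(fieldnames, values)), jsonfile)
        jsonfile.write('\n')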

Split large strings by delimiters

I am trying to process the output of a system('./foo') command. If I directly redirect the output to a file with system('./foo > output') and read the file with dlmread into MATLAB, it works fine, but I'm trying to avoid writing a huge ASCII file (about 1e7 lines) to the hard disk every time I do this.
So I want to deal with the output directly by reading it into a huge string and splitting the string. It works fine for small files:
[a,b] = system('./foo')
b=strsplit(b);
cellfun(@str2num, bb);
b=cellfun(@str2num, b(1:end),'UniformOutput',0);
b=cell2mat(b);
Unfortunately, already the strsplit step consumes way too much memory, so that MATLAB gets killed by the OOM killer.
I found the alternative:
b=textscan(b,'%s','delimiter',' ','multipleDelimsAsOne',1);
But it also consumes way too much memory.
Can somebody help me with a better idea of how to split that string of numbers and read it into a matrix, or more generally how to avoid writing the output of the command to a file on the hard disk?
Edit (I'm writing here because there is not enough space in the comments):
@MZimmerman6: I have now tried a version with dlmread, with and without pre-allocation, as well as your proposal as far as I understood it.
In fact, the loop is much slower than dlmread:
clear all
close all
tic
ttags1=dlmread('tmp.txt',' ',1,3);
toc
clear all
tic
[~,result]=system('perl -e ''while(<>){};print$.,"\n"'' tmp.txt');
numLines1=str2double(result);
ttags=zeros(numLines1,1);
ttags=dlmread('tmp.txt',' ',1,3);
toc
clear all
tic
fid = fopen('tmp.txt');
count = 1;
[~,result]=system('perl -e ''while(<>){};print$.,"\n"'' tmp.txt');
numLines1=str2double(result);
temp = cell(numLines1,1);
for i = 1:numLines1
    tline = fgetl(fid);
    if ischar(tline)
        vals = textscan(tline,'%f','delimiter',',');
        temp{i} = transpose(vals{1});
    end
end
fclose(fid);
temp = cell2mat(temp);
toc
The result is:
Elapsed time is 19.762470 seconds.
Elapsed time is 21.546079 seconds.
Elapsed time is 796.755343 seconds.
Am I doing something wrong?
Thank you & Best Regards
You should not try to read the entire file into memory, as this can be extremely memory heavy. I would recommend reading the file line by line, processing each line individually, and then storing the results in a cell array. Once the parsing is done, you can convert that into a normal matrix.
The first thing I would do is create a small Perl script to count the number of lines in the file you are reading, so that you can pre-allocate memory for the data. Call this file countlines.pl. (Information gathered from here.)
Perl - Countlines.pl
while (<>) {};
print $.,"\n";
This file will only be two lines, but will quickly count the total lines in the file.
You can then use the result of this script to pre-allocate and then do your line-by-line parsing. In my testing I used a simple comma-separated file, so you can adjust textscan to handle things as you want.
MATLAB Script
% get number of lines in data file
numLines = str2double(perl('countlines.pl','text.txt'));
fid = fopen('text.txt');
count = 1;
temp = cell(numLines,1);
for i = 1:numLines
    tline = fgetl(fid);
    if ischar(tline)
        vals = textscan(tline,'%f','delimiter',',');
        temp{i} = transpose(vals{1});
    end
end
fclose(fid);
temp = cell2mat(temp);
This should run relatively quickly depending on your file size, and do what you want. Of course you can edit how the parsing is done inside the loop, but this should be a good starting point.
Note for the future: do not try to read large amounts of data into memory unless it is completely necessary.

How do I write a Python program that computes the average from a .dat file?

I have this so far but I don't know how to write over the .dat file:
def main():
    fname = input("Enter filename:")
    infile = open(fname, "r")
    data = infile.read()
    print(data)
    for line in infile.readlines():
        score = int(line)
        counts[score] = counts[score]+1
    infile.close()
    total = 0
    for c in enumerate(counts):
        total = total + i*c
    average = float(total)/float(sum(counts))
    print(average)

main()
Here is my .dat file:
4
3
5
6
7
My statistics professor expects us to learn Python to compute the mean and standard deviation. All I need to know is how to compute the mean, and then I've got the rest figured out. I also want to know how Python writes over each line in a .dat file. Could someone tell me how to fix this code? I've never done programming before.
To answer your question, as I understand it, in three parts:
How to read the file in
In your example you use
infile.read()
which reads the entire contents of the file into a string and takes you to the end of the file. Therefore the following
infile.readlines()
will read nothing more. You should omit the first read().
How to compute the mean
There are many ways to do this in Python - more or less elegant - and I guess it also depends on exactly what the problem is. But in the simplest case you can just sum and count the values as you go, then divide the sum by the count at the end to get the result:
infile = open("d.dat", "r")
total = 0.0
count = 0
for line in infile.readlines():
print ("reading in line: ",line)
try:
line_value = float(line)
total += line_value
count += 1
print ("value = ",line_value, "running total =",total, "valid lines read = ",count)
except:
pass #skipping non-numeric lines or characters
infile.close()
The try/except part is just in case you have lines or characters in the file that can't be turned into floats, these will be skipped.
How to write to the .dat file
Finally you seem to be asking how to write the result back out to the d.dat file. Not sure whether you really need to do this, it should be acceptable to just display the result as in the above code. However if you do need to write it back to the same file, just close it after reading from it, reopen it for writing (in 'append' mode so output goes to the end of the file), and output the result using write().
outfile = open("d.dat","a")
outfile.write("\naverage = final total / number of data points = " + str(total/count)+"\n")
outfile.close()
fname = input("Enter filename:")
infile = open(fname, "r")
data = infile.readline() #Reads first line
print(data)
data = infile.readline() #Reads second line
print(data)
You can put this in a loop.
Also, these values will come in as strings; convert them to floats using float(data) each time.
Also, the guys over at StackOverflow are not as bad at math as you think. This could have easily been answered there. (And maybe in a better fashion)
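Putting the suggestions from this last answer together, a minimal sketch of the loop-and-convert version; the variable names are illustrative, and it assumes every line of the file holds exactly one number:
fname = input("Enter filename:")
total = 0.0
count = 0
with open(fname, "r") as infile:
    for line in infile:          # iterates over the file line by line
        total += float(line)     # convert each line from a string to a float
        count += 1
print("average =", total / count)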
