I have searched a lot for this but can't find answers anywhere. I am trying to do something like the following:
cat somefile.txt | grep somepattern | ./script.lua
I haven't found a single resource on handling piped I/O in Lua, and can't figure out how to do it. Is there a good, non-hackish way to tackle it? Preferably buffered for lower memory usage, but I'll settle for reading the whole file at once if that's the only alternative.
It would be really disappointing to have to write it into a temp file and then load it into the program.
Thanks in advance.
The standard library has io.stdin and io.stdout, which you can use for input and output without having to resort to temporary files. You can also use io.read instead of someFile:read, and it will read from stdin by default.
http://www.lua.org/pil/21.1.html
Buffering is the responsibility of the operating system that provides the pipes. You don't need to worry too much about it when writing your programs.
edit: Apparently when you mentioned buffering you were thinking about reading part of the file as opposed to loading the whole file into a string. io.read can take a numeric parameter to read up to a certain number of bytes from input, returning nil if no characters could be read.
local size = 2^13 -- good buffer size (8K)
while true do
  local block = io.read(size)
  if not block then break end
  io.write(block)
end
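For comparison, here is a minimal Python sketch of the same block-at-a-time loop, so that only one block is in memory at once. copy_in_blocks and BLOCK_SIZE are hypothetical names; in a real pipeline you would pass sys.stdin.buffer and sys.stdout.buffer instead of the in-memory io.BytesIO streams used here.

```python
import io

BLOCK_SIZE = 2 ** 13  # 8 KiB, matching the Lua loop above


def copy_in_blocks(src, dst, size=BLOCK_SIZE):
    """Hypothetical helper: copy binary stream src to dst one block at a time."""
    while True:
        block = src.read(size)
        if not block:  # read() returns b"" once the input is exhausted
            break
        dst.write(block)


# Demo with in-memory streams standing in for stdin/stdout.
source = io.BytesIO(b"x" * 20000)
sink = io.BytesIO()
copy_in_blocks(source, sink)
```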
Another (simpler) alternative is the io.lines() iterator called without a filename, which reads lines from stdin. Example:
for line in io.lines() do
  print(line)
end
UPDATE: To read a given number of characters at a time, you can write a wrapper around io.lines. Example:
function io.chars(n, filename)
  n = n or 1 -- default number of characters to read at a time
  local chars = ''
  local wrap, yield = coroutine.wrap, coroutine.yield
  return wrap(function()
    for line in io.lines(filename) do
      line = chars .. line .. '\n'
      while #line >= n do
        yield(line:sub(1, n))
        line = line:sub(n + 1)
      end
      chars = line
    end
    if chars ~= '' then yield(chars) end
  end)
end

for text in io.chars(30) do
  io.write(text)
end
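For comparison, a Python generator can play the role of the Lua coroutine: it keeps the leftover characters between lines and resumes where it stopped. This is a sketch, not a port of io.chars; chars is a hypothetical name, and it takes a text stream (for example sys.stdin) instead of a filename. Unlike the Lua version, which re-appends '\n' to every line returned by io.lines, Python keeps the line endings exactly as they appear in the input.

```python
import io


def chars(stream, n=1):
    """Hypothetical helper: yield the text of stream in pieces of n characters.

    The last piece may be shorter than n.
    """
    pending = ""
    for line in stream:  # iterating a text stream yields lines with their "\n"
        pending += line
        while len(pending) >= n:
            yield pending[:n]
            pending = pending[n:]
    if pending:
        yield pending


pieces = list(chars(io.StringIO("hello\nworld\n"), 4))
```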
I am trying to read all the strings from the file "Text.txt" and add them to a vector using this code:
std::ifstream in;
in.open("Text.txt");
std::vector<std::string> vec;
std::string str;
while (!in.eof()) {
    in >> str;
    vec.push_back(str);
}
The problem is that I read the last string twice.
Any idea why this is happening?
Thank you!
It's explained elsewhere on this site already.
In this order:
1) in.eof() checks eofbit, which is false: no read operation has hit the end of the file yet, so the loop continues.
2) in >> str encounters the end of file and sets eofbit (and failbit). It also leaves str unchanged from the last iteration.
3) You push the old, unchanged str, which is now in your vector twice.
4) You exit the loop when in.eof() checks eofbit.
Your misunderstanding is that in.eof() does something to detect the end-of-file condition, but it doesn't. It just checks eofbit, and eofbit isn't set until a >> operation actually runs into the end of the file.
Carefully read the documentation for ios::eof, excerpted here:
std::ios::eof
Returns true if the eofbit error state flag is set for the stream.
This flag is set by all standard input operations when the End-of-File
is reached in the sequence associated with the stream.
Note that the value returned by this function depends on the last
operation performed on the stream (and not on the next).
To fix the problem
in >> str returns the stream itself, which converts to false when the read fails, so just base your loop condition on that.
while (in >> str)
    vec.push_back(str);
I'm currently learning to use Python for binary files. I came across this code in the book I'm reading:
FILENAME = 'pc_rose_copy.txt'

def display_contents(filename):
    fp = open(filename, 'rb')
    print(fp.read())
    fp.close()

def encrypt(filename):
    fp = open(filename, 'r+b')
    text = fp.read()
    fp.seek(0)
    for c in text:
        if c <= 128:
            fp.write(bytes([c+128]))
        else:
            fp.write(bytes([c-128]))
    fp.close()

display_contents(FILENAME)
encrypt(FILENAME)
display_contents(FILENAME)
I have several questions about this code that the book doesn't answer:
1) In line 13 ("if c <= 128"), since the file was opened in binary mode, is each character read as its index in the ASCII table (i.e., is that equivalent to 'if ord(c) <= 128' had the file not been opened in binary mode)?
2) If so, then what's the point in checking if any character's index is higher than 128, since this is a .txt with a passage from Romeo and Juliet?
3) This point is more of a curiosity, so pardon my naivety. I know this doesn't apply in this case, but say the script encounters a 'c' with a byte value of 128, and so adds 128 to it. What would a byte value of 256 look like: would it be 11111111 00000001?
What's really happening is that the script toggles the most significant bit of every byte. This is equivalent to adding 128 to bytes below 128 and subtracting 128 from bytes of 128 or more. You can see this by looking at the file contents before and after running the script (xxd -b file.txt on Linux or macOS shows the exact bits of each byte).
Here's a run on some sample text:
File Contents Before:
11110000 10011111 10011000 10000100 00001010
File Contents After:
01110000 00011111 00011000 00000100 10001010
Running the script twice (or any even number of times) restores the original text by toggling all of the high bits back to the original values.
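The claim that toggling the top bit is the same as adding or subtracting 128 can be checked directly. This Python sketch flips the bit with XOR (b ^ 0x80) rather than the book's if/else; toggle_high_bits is a hypothetical helper, and the input bytes are the "before" values from the run above.

```python
def toggle_high_bits(data: bytes) -> bytes:
    """Hypothetical helper: flip the most significant bit of every byte."""
    return bytes(b ^ 0x80 for b in data)  # 0x80 == 0b10000000


before = bytes([0b11110000, 0b10011111, 0b10011000, 0b10000100, 0b00001010])
after = toggle_high_bits(before)
restored = toggle_high_bits(after)  # a second pass restores the original
```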
Question / Answer:
1) If the file is ASCII-encoded, yes. e.g. for a file abc\n, the values of c are 97, 98, 99, and 10 (newline). You can verify this by adding print(c) inside the loop. This script will also work* on non-ASCII encoded files (the example above is UTF-8).
2) So that we can flip the bits back. Even if we were only handling ASCII files (which isn't guaranteed), the bytes we get from encrypting ASCII files will be 128 or larger, since we've added 128 to each byte. So we still need to handle that case in order to decrypt our own files.
3) As is, the script crashes, because bytes() requires values in the range 0 <= x < 256 (see documentation). You can create a file that breaks the script with echo -n -e '\x80\x80\x80' > 128.txt. The script should be using < instead to handle this case properly.
* Except for 3)
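The off-by-one in point 3 can be reproduced without touching a file. This sketch works on in-memory bytes rather than the book's r+b file handling; encrypt_bytes and encrypt_bytes_as_written are hypothetical names, and the only change in encrypt_bytes is < 128 in place of <= 128.

```python
def encrypt_bytes(text: bytes) -> bytes:
    """Hypothetical helper: the book's per-byte rule with < 128 instead of <= 128."""
    out = bytearray()
    for c in text:
        if c < 128:
            out.append(c + 128)
        else:
            out.append(c - 128)
    return bytes(out)


def encrypt_bytes_as_written(text: bytes) -> bytes:
    """Hypothetical helper: the book's rule as printed (c <= 128), to show the crash."""
    return bytes(c + 128 if c <= 128 else c - 128 for c in text)


try:
    encrypt_bytes_as_written(b"\x80\x80\x80")
    crashed = False
except ValueError:  # bytes() rejects values outside range(0, 256)
    crashed = True
```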
I think that the encrypt function is also meant to be a decrypt function.
encrypt turns a text file into a binary file containing only high bytes (values of 128 and above), and the else clause goes back from high bytes to text. I think that if you added an extra encrypt(FILENAME) call you'd get the original file back.
In an ASCII text file, c cannot really be 128: the highest printable value is 126 (~), and 127 is the DEL control character. If c were 128, adding 128 would give 0 if the arithmetic wrapped around modulo 256, as it does for unsigned char in C. Python's bytes() does not wrap; it raises a ValueError for 256 instead.
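To make the modulo-256 wrap-around concrete: taking the sum modulo 256 reproduces C's unsigned char behaviour in Python. add_wrapping is a hypothetical helper for illustration. With wrapping, adding 128 to every byte is exactly the same as toggling its high bit, so the if/else would no longer be needed.

```python
def add_wrapping(c: int, k: int) -> int:
    """Hypothetical helper: add k to byte value c modulo 256, like C's unsigned char."""
    return (c + k) % 256


wrapped = add_wrapping(128, 128)  # 256 wraps around to 0
```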
I am having trouble determining when I have reached the end of a file in python with file.readline
fi = open('myfile.txt', 'r')
line = fi.readline()
if line == EOF:  # or something similar
    dosomething()
c = fp.read()
if c is None:
will not work, because then I would lose data on the next line, and if a line only has a carriage return I would miss an empty line.
I have looked at dozens of related posts, and they all just use the built-in loops that stop when the file is exhausted. I am not looping, so this doesn't work for me. Also, my files are several GB in size with hundreds of thousands of lines, and a script could spend days processing one. So I need to know how to tell when I am at the end of a file in Python 3. Any help is appreciated. Thank you!
I ran into this same exact problem. My specific issue was iterating over two files, where the shorter one was only supposed to read a line on specific reads of the longer file.
As some mentioned here, the natural Pythonic way to iterate line by line is to, well, just iterate. My solution to stick with this 'naturalness' was to use the iterator property of the file manually. Something like this:
with open('myfile') as lines:
    try:
        while True:  # Just to fake a lot of readlines and hit the end
            current = next(lines)
    except StopIteration:
        print('EOF!')
You can of course embellish this with your own IOWrapper class, but this was enough for me. Just replace all calls to readline with calls to next, and don't forget to catch the StopIteration.
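A sketch of the two-file case described above, assuming a hypothetical rule of reading from the shorter file on every other line of the longer one. io.StringIO stands in for the two files, and next_line_or_none is a hypothetical helper; the built-in two-argument form next(lines, None) does the same thing in one call.

```python
import io


def next_line_or_none(lines):
    """Hypothetical helper: next line from a file iterator, or None at EOF."""
    try:
        return next(lines)
    except StopIteration:  # raised by next() once the file is exhausted
        return None


longer = io.StringIO("a1\na2\na3\na4\na5\na6\n")
shorter = io.StringIO("b1\nb2\n")

pairs = []
for i, line in enumerate(longer):
    if i % 2 == 0:  # hypothetical rule: read the shorter file on every other line
        other = next_line_or_none(shorter)
        pairs.append((line.rstrip("\n"), None if other is None else other.rstrip("\n")))
```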
The simplest way to check whether you've reached EOF with fi.readline() is to check the truthiness of the return value:
line = fi.readline()
if not line:
    dosomething()  # EOF reached
Reasoning
According to the official documentation
f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.
and the only falsy string in Python is the empty string ('').
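To see why the truthiness test distinguishes a blank line from EOF, here is a small sketch using io.StringIO as a stand-in for a real file (its readline() follows the same rules): the blank line comes back as '\n', which is truthy, and only the end of the data gives ''.

```python
import io

# io.StringIO stands in for a real file opened in text mode.
f = io.StringIO("first\n\nlast")  # blank middle line, no trailing newline
results = []
while True:
    line = f.readline()
    if not line:  # '' only at EOF; a blank line is '\n', which is truthy
        break
    results.append(line)
```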
You can use the output of the tell() function to determine if the last readline changed the current position of the stream.
fi = open('myfile.txt', 'r')
pos = fi.tell()
while True:
    li = fi.readline()
    newpos = fi.tell()
    if newpos == pos:  # stream position hasn't changed -> EOF
        break
    pos = newpos
    # process li here
fi.close()
According to the Python Tutorial:
f.tell() returns an integer giving the file object’s current position in the file represented as number of bytes from the beginning of the file when in binary mode and an opaque number when in text mode.
...
In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)) and the only valid offset values are those returned from the f.tell(), or zero.
Since the value returned from tell() can be used to seek(), they would have to be unique (even if we can't guarantee what they correspond to). Therefore, if the value of tell() before and after a readline() is unchanged, the stream position is unchanged, and the EOF has been reached (or some other I/O exception of course). Reading an empty line will read at least the newline and advance the stream position.
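A minimal sketch of the tell() approach on a real temporary file, showing that an empty line still moves the position forward, so the loop does not stop early. The file contents are made up for the demonstration, and the file is created with the standard tempfile module.

```python
import os
import tempfile

# Write a small sample file (made-up contents) with an empty line in the middle.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\n\nthree\n")
    path = tmp.name

lines_read = []
with open(path, "r") as fi:
    pos = fi.tell()
    while True:
        li = fi.readline()
        newpos = fi.tell()
        if newpos == pos:  # position unchanged: nothing was read, so EOF
            break
        lines_read.append(li)  # the empty line "\n" still advanced the position
        pos = newpos

os.remove(path)
```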
This is a demonstrative example using f.tell() and f.read() with a chunk of data:
Assuming my input.txt file contains:
hello
hi
hoo
foo
bar
Test:
with open('input.txt', 'r') as f:
    # Read chunk of data
    chunk = 4
    while True:
        line = f.read(chunk)
        if not line:
            line = "i've read Nothing"
            print("EOF reached. What i read when i reach EOF:", line)
            break
        else:
            print('Read: {} at position: {}'.format(line.replace('\n', ''), f.tell()))
On Windows, where each line ending is stored on disk as two bytes (\r\n), this will output:
Read: hell at position: 4
Read: ohi at position: 9
Read: hoo at position: 14
Read: foo at position: 19
Read: bar at position: 24
EOF reached. What i read when i reach EOF: i've read Nothing
with open(FILE_PATH, 'r') as fi:
    for line in iter(fi.readline, ''):
        parse(line)
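The two-argument form iter(callable, sentinel) calls fi.readline() repeatedly and stops as soon as it returns the sentinel '', i.e. at EOF. In this sketch parse is a hypothetical stand-in (the answer doesn't define it) and io.StringIO replaces the file at FILE_PATH.

```python
import io


def parse(line):
    """Hypothetical stand-in for the answer's parse(): drop the line ending."""
    return line.rstrip("\n")


fi = io.StringIO("alpha\n\nbeta\n")
# iter(callable, sentinel) keeps calling fi.readline() until it returns ''
parsed = [parse(line) for line in iter(fi.readline, "")]
```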
I am reading from a file in Fortran which has an undetermined number of floating point values on each line (for now, there are about 17 values on a line). I would like to read the 'n'th value on each line into a given floating point variable. How should I go about doing this?
In C, I did it by reading the entire line into a string and then doing something like the following:
for(int il = 0; il < l; il++)
{
    for(int im = -il; im <= il; im++)
        pch = strtok(NULL, "\t ");
}
for(int im = -l; im <= m; im++)
    pch = strtok(NULL, "\t ");
dval = atof(pch);
Here I am continually reading a value and throwing it away (thus shortening the string) until I am ready to accept the value I am trying to read.
Is there any way I can do this in Fortran? Is there a better way to do this in Fortran? The problem with my Fortran code seems to be that read(tline, '(f10.15)') tline1 does not shorten tline (tline is my string holding the entire line and tline1 what I am trying to parse it into), so I cannot use the same method as in my C routine.
Any help?
The issue is that Fortran is a record-based I/O system while C is stream-based.
If you have access to a Fortran 2003 compliant compiler (modern versions of gfortran should work), you can use the stream ACCESS specifier to do what you want.
An example can be found here.
Of course, if you were really inclined, you could just call your C function directly from Fortran. Interfacing the two languages is generally simple, typically only requiring a wrapper with a lowercase name and an appended underscore (depending on compiler and platform, of course). Passing arrays or strings back and forth is typically less trivial, but this example wouldn't need that.
Once the data is in a character array, you can read it into another variable as you are doing, with the ADVANCE="no" specifier, i.e.:
do i = 1, numberIWant
    read(tline, '(F10.15)', ADVANCE="no") tline1
end do
where tline1 should contain your number at the end of the loop.
Because of the record-based I/O, a READ statement normally discards whatever is left of the current record once it finishes. ADVANCE="no" tells it not to.
If you know exactly at what position the value you want starts, you can use the T edit descriptor to initiate the next read from that position.
Let's say, for instance, that the width of each field is 10 characters and you want to read the fifth value. The read statement will then look something like the following.
read(file_unit, '(t41, f10.5)') value1
P.S.: You can dynamically create a format string at runtime, with the correct number after the t, by using a character variable as the format and an internal file write to put in this number.
Let's say you want the value that starts at position n. It will then look something like this (I alternated between single and double quotes to try to make it more clear where each string starts and stops):
write(my_format, '(a, i0, a)') "(t", n, ', f10.5)'
read(file_unit, my_format) value1
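To make the column arithmetic concrete: with fixed-width fields of width w, field n starts at column (n-1)*w + 1, which gives t41 for the fifth field of width 10. This Python sketch builds the same format string and, for comparison, slices the field directly; t_format and nth_fixed_field are hypothetical helpers, and the width of 10 with 5 decimals are the example values from the answer.

```python
def t_format(n: int, width: int = 10, decimals: int = 5) -> str:
    """Hypothetical helper: Fortran format that tabs to the n-th fixed-width field."""
    start = (n - 1) * width + 1  # Fortran columns are 1-based
    return f"(t{start}, f{width}.{decimals})"


def nth_fixed_field(line: str, n: int, width: int = 10) -> float:
    """Hypothetical helper: slice the n-th fixed-width field (0-based Python slicing)."""
    start = (n - 1) * width
    return float(line[start:start + width])


fmt = t_format(5)
```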
Using read(*,*) in Fortran doesn't seem to work if the string to be read from the user contains spaces.
Consider the following code:
character(Len = 1000) :: input = ' '
read(*,*) input
If the user enters the string "Hello, my name is John Doe", only "Hello," will be stored in input; everything after the space is disregarded. My assumption is that the compiler assumes that "Hello," is the first argument, and that "my" is the second, so to capture the other words, we'd have to use something like read(*,*) input1, input2, input3... etc. The problem with this approach is that we'd need to create large character arrays for each input, and need to know exactly how many words will be entered.
Is there any way around this? Some function that will actually read the whole sentence, spaces and all?
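character(100) :: line
! Prompt without a trailing newline (standard non-advancing output)
write(*, '(a)', advance='no') 'Enter some text: '
! The (a) format reads the whole line, spaces included
read(*, '(a)') line
write(*, '(a)') trim(line)
end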
... will read a line of text of maximum length 100 (enough for most practical purposes) and write it out back to you. Modify to your liking.
Instead of read(*, *), try read(*, '(a)'). I'm no Fortran expert, but the second argument to read is the format specifier (equivalent to the second argument to sscanf in C). * there means list-directed input, which splits the input at blanks and commas, and that is what you don't want. You can also say a14 if you want to read 14 characters as a string, for example.