Read system call on text files - Linux

So I have the code below, which I received in an examination, and there were two parts I didn't really know how to solve.
#define MAX_LINE 4096
char line[MAX_LINE];
fd = open("../.././test.txt", O_RDONLY);
read(fd, line, MAX_LINE);
read(fd, line, MAX_LINE);
Explain the minimum and maximum number of text lines that could be read by the given code.
Change the code to read exactly one line of text.
Thank you!

Explain the minimum and maximum number of text lines that could be read by the given code.
Minimum is 0, or a fraction of a line (for example, if the data contains no newline at all). For the maximum, note that the two reads consume up to 8192 bytes in total. If the line terminator is assumed to be \n alone and lines must be non-empty, that is at most 4096 lines; if it is \r\n, at most 8192/3 ≈ 2730; and if lines are allowed to be empty with \n as the terminator, 8192. I would say the question is radically underspecified, and complain.
Change the code to read exactly one line of text.
Again this is radically underspecified. If use of stdio is allowed, which isn't stated (and stdio functions aren't system calls), there are several choices. If it isn't, you would have to write a loop that reads and appends one byte at a time until it sees a newline.
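For illustration, here is a minimal sketch of such a loop using only open/read/close (the file name and buffer size are taken from the question; error handling is kept deliberately minimal):
#include <fcntl.h>
#include <unistd.h>

#define MAX_LINE 4096

int main(void) {
    char line[MAX_LINE];
    int fd = open("../.././test.txt", O_RDONLY);
    if (fd == -1)
        return 1;
    size_t i = 0;
    /* Read one byte at a time until newline, EOF, or a full buffer. */
    while (i < MAX_LINE - 1) {
        char c;
        ssize_t n = read(fd, &c, 1);
        if (n <= 0 || c == '\n')
            break;
        line[i++] = c;
    }
    line[i] = '\0'; /* line now holds exactly one line of text */
    close(fd);
    return 0;
}
Reading one byte at a time is slow, but it is the simplest way to stop exactly at the newline when restricted to the plain read system call.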

I think fgets() is appropriate to read one line.
#define MAX_LINE 4096
char line[MAX_LINE];
FILE *fp;
fp = fopen("../.././test.txt", "r");
if (fp != NULL)
    fgets(line, sizeof(line), fp); // read one line


What's really happening in this Python encryption script?

I'm currently learning to use Python for binary files. I came across this code in the book I'm reading:
FILENAME = 'pc_rose_copy.txt'

def display_contents(filename):
    fp = open(filename, 'rb')
    print(fp.read())
    fp.close()

def encrypt(filename):
    fp = open(filename, 'r+b')
    text = fp.read()
    fp.seek(0)
    for c in text:
        if c <= 128:
            fp.write(bytes([c+128]))
        else:
            fp.write(bytes([c-128]))
    fp.close()

display_contents(FILENAME)
encrypt(FILENAME)
display_contents(FILENAME)
I have several doubts regarding this code for which I can't find an answer in the book:
1) In the line "if c <= 128", since the file was opened in binary mode, each character is read as its index in the ASCII table (i.e., is that equivalent to "if ord(c) <= 128" had the file not been opened in binary mode)?
2) If so, then what's the point in checking if any character's index is higher than 128, since this is a .txt with a passage from Romeo and Juliet?
3) This point is more of a curiosity, so pardon the naivety. I know this doesn't apply in this case, but say the script encounters a c with a byte value of 128, and so adds 128 to it. What would a value of 256 look like as bytes -- would it be 11111111 00000001?
What's really happening is that the script is toggling the most significant bit of every byte. This is equivalent to adding or subtracting 128 to each byte. You can see this by looking at the file contents before/after running the script (xxd -b file.txt on Linux or Mac will show you the exact bits/bytes).
Here's a run on some sample text:
File Contents Before:
11110000 10011111 10011000 10000100 00001010
File Contents After:
01110000 00011111 00011000 00000100 10001010
Running the script twice (or any even number of times) restores the original text by toggling all of the high bits back to the original values.
Question / Answer:
1) If the file is ASCII-encoded, yes. e.g. for a file abc\n, the values of c are 97, 98, 99, and 10 (newline). You can verify this by adding print(c) inside the loop. This script will also work* on non-ASCII encoded files (the example above is UTF-8).
2) So that we can flip the bits back. Even if we were only handling ASCII files (which isn't guaranteed), the bytes we get from encrypting ASCII files will be 128 or larger, since we've added 128 to each byte. So we still need to handle that case in order to decrypt our own files.
3) As is, the script crashes, because bytes() requires values in the range 0 <= x < 256 (see documentation). You can create a file that breaks the script with echo -n -e '\x80\x80\x80' > 128.txt. The script should be using < instead to handle this case properly.
* Except for 3)
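Since toggling the most significant bit is just XOR with 0x80, a minimal fixed sketch of encrypt (same file handling as the book's version; the assumption here is that a reversible round-trip is the goal) avoids the off-by-one at 128 entirely:
def encrypt(filename):
    fp = open(filename, 'r+b')
    text = fp.read()
    fp.seek(0)
    for c in text:
        # c ^ 0x80 toggles the high bit; every value 0-255 stays in range,
        # and running the function twice restores the original file
        fp.write(bytes([c ^ 0x80]))
    fp.close()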
I think that the encrypt function is also meant to be a decrypt function.
The encrypt goes from a text file to a binary file with only high bytes, but the else clause is for going back from high bytes to text. I think that if you added an extra encrypt(FILENAME) you'd get the original file back.
c cannot really be 128 in a plain text file; the highest value there would be 126 (~), and 127 is the DEL "character". But if c were 128, adding 128 would wrap around to 0 if we worked modulo 256, as unsigned char arithmetic does in C; in Python, bytes() raises an error instead, as noted above.

Write char in line as many times as the length of previous line

I would like to write a character as many times as the length of the previous line.
Starting point:
This is a line and I want to write a char under it.
I want to write char = as many times as the length of this line. This is the result I would like to obtain:
This is a line and I want to write a char under it.
===================================================
How can I achieve this result with the fewest keystrokes?
I suspect there is something shorter but the following works with the cursor starting over the initial line:
Yp:s/./=/g
It duplicates the line (Yp), then replaces each character on the new line with an = (:s/./=/g)
Update
An even shorter version from Doktor OSwaldo
YpVr=
Duplicates the line, selects it and replaces all characters with =
And, if you are using this a lot, it'll be even shorter as a macro.
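For example, a mapping along these lines in your vimrc would do it (the <leader>= key choice is just an illustration):
" Duplicate the current line, select the copy, and overwrite it with =
nnoremap <leader>= YpVr=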

Handling piped io in Lua

I have searched a lot for this but can't find answers anywhere. I am trying to do something like the following:
cat somefile.txt | grep somepattern | ./script.lua
I haven't found a single resource on handling piped I/O in Lua, and can't figure out how to do it. Is there a good, non-hackish way to tackle it? Preferably buffered for lower memory usage, but I'll settle for reading the whole file at once if that's the only alternative.
It would be really disappointing to have to write it into a temp file and then load it into the program.
Thanks in advance.
The standard library has an io.stdin and an io.stdout that you can use for input and output without having to resort to temporary files. You can also use io.read instead of someFile:read and it will read from stdin by default.
http://www.lua.org/pil/21.1.html
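For example, to slurp everything from the pipe in one go (a minimal sketch; "*a" is the read-all format in Lua 5.1 and 5.2, and plain "a" also works in 5.3+):
-- Read all of stdin into one string and echo it back.
local everything = io.read("*a")
io.write(everything)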
The buffering is the responsibility of the operating system that provides the pipes. You don't need to worry too much about it when writing your programs.
edit: Apparently when you mentioned buffering you were thinking about reading part of the file as opposed to loading the whole file into a string. io.read can take a numeric parameter to read up to a certain number of bytes from input, returning nil if no characters could be read.
local size = 2^13 -- good buffer size (8K)
while true do
    local block = io.read(size)
    if not block then break end
    io.write(block)
end
Another (simpler) alternative is the io.lines() iterator but without a filename inside the parentheses. Example:
for line in io.lines() do
    print(line)
end
UPDATE: To get a number of characters you can write a wrapper around this. Example:
function io.chars(n, filename)
    n = n or 1 -- default number of characters to read at a time
    local chars = ''
    local wrap, yield = coroutine.wrap, coroutine.yield
    return wrap(function()
        for line in io.lines(filename) do
            line = chars .. line .. '\n'
            while #line >= n do
                yield(line:sub(1, n))
                line = line:sub(n + 1)
            end
            chars = line
        end
        if chars ~= '' then yield(chars) end
    end)
end

for text in io.chars(30) do
    io.write(text)
end

Strange behaviour or maybe a bug in sprintf() in C++

I just found a strange thing about sprintf() (the C++ library function).
Have a look at these two solutions:
Time limit Exceeded solution
Accepted Solution
The only difference between them is that I used
sprintf(a,"%d%c",n,'\0');
in the TLE solution, while in the AC solution I replaced the above sprintf() with
sprintf(a,"%d",n);
You can also observe that the ACed solution took only 0.01s and 2.8MB of memory, but the TLE solution took around 11.8MB (check here).
And one more thing: the program that gave TLE runs in 0s on IDEONE with extreme input data, so is it a bug in CodeChef itself?
Can somebody please explain whether this is a bug, or whether some considerable unknown operation is happening here?
Thanks in advance.
First of all, the difference between the two programs is not with sscanf, but with sprintf. A diff of the code explains:
--- ac.c  2014-05-24 14:31:18.074977661 -0500
+++ tle.c 2014-05-24 14:30:52.270650109 -0500
@@ -4,7 +4,7 @@
 string mul(string m, int n){
     char a[4];
-    sprintf(a,"%d",n);
+    sprintf(a,"%d%c",n,'\0');
     int l1 = strlen(a);
     //printf("len : %d\n",l1);
     int l2 = m.length();
Second, by explicitly packing the string with %c and '\0', you are reducing by one the number of digits that can be stored in a. You also need to check the return of sprintf. From man printf:
Upon successful return, these functions return the number of characters printed (not including the trailing '\0' used to end output to strings).
In your case you are most likely writing beyond the end of your a[4] string and are experiencing undefined results. With a[4] you have space for only 999\0. When you explicitly add %c + '\0', you reduce that to 99\0\0. If your number exceeds 99, then sprintf will write beyond the end of the string because you are explicitly packing an additional '\0'. In the original case sprintf(a,"%d",n); 999 can be stored without issue relying on sprintf to append '\0' as a[3].
Test with n = 9999: sprintf will still write the digits to a, but the "%d%c" version returns 5 while writing 6 bytes (four digits plus two '\0'), well beyond the space available in a[4]; even the plain "%d" version writes 5 bytes. The behavior of your code is a dice-roll at that point.
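A safer pattern, as a minimal sketch, is snprintf, which never writes past the buffer and whose return value tells you whether the output would have been truncated:
#include <stdio.h>

int main(void) {
    char a[4];
    int n = 9999;
    /* snprintf writes at most sizeof(a) bytes including the '\0', and
       returns the length it would have needed, so a return value
       >= sizeof(a) signals truncation. */
    int needed = snprintf(a, sizeof(a), "%d", n);
    if (needed < 0 || (size_t)needed >= sizeof(a)) {
        fprintf(stderr, "buffer too small for %d\n", n);
        return 1;
    }
    printf("stored: %s\n", a);
    return 0;
}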

MS Visual C++ 2008 char buffer longer than defined

I've got a char* buffer to hold a file that I read in binary mode. I know the length of the file is 70 bytes, and this is the value being used to produce a buffer of the correct size. The problem is, there are 17 or 18 extra characters in the array, so some random characters are being added to the end. Could this be a Unicode issue?
ulFLen stores the size of the file in bytes and has the correct value (70 for the file I'm testing on).
//Set up a buffer to store the file
pcfBuffer = new char[ulFLen];
//Reading the file
cout<<"Inputting File...";
fStream.seekg(0,ios::beg);
fStream.read(pcfBuffer,ulFLen);
if(!fStream.good()){cout<<"FAILED"<<endl;}else{cout<<"SUCCESS"<<endl;}
As it is a char array, you probably forgot a terminating NUL character.
The right way in this case would be:
//Set up a buffer to store the file and a terminating NUL character
pcfBuffer = new char[ulFLen+1];
//Reading the file
cout<<"Inputting File...";
fStream.seekg(0,ios::beg);
fStream.read(pcfBuffer,ulFLen);
if(!fStream.good()){cout<<"FAILED"<<endl;}else{cout<<"SUCCESS"<<endl;}
// Add NUL character
pcfBuffer[ulFLen] = 0;
But note that you only need a terminating NUL character for routines that depend on it, like the string routines or printing with printf's %s. If you use routines that rely on the fact that you know the length (70 characters), it will work without a NUL character, too.
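As an aside, a minimal sketch of an alternative that sidesteps manual NUL handling altogether is to read the file into a std::string, which tracks its own length (the function and its name are just an illustration, not the asker's code):
#include <fstream>
#include <iterator>
#include <string>

// Read a whole file into a std::string; no manual NUL terminator needed.
std::string readFile(const char* path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}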
Add the following snippet after the data has been read; it will add the terminating zero which is needed.
And by the way, it should be pcfBuffer = new char[ulFLen+1];
size_t read_count = fStream.gcount();
if (read_count <= ulFLen)
    pcfBuffer[read_count] = 0;
This will work no matter how much data has actually been read (in your case gcount() should always return 70, so you could do the following instead: pcfBuffer[70] = 0;).
