Write unformatted (binary data) to stdout

I want to write unformatted (binary) data to STDOUT in a Fortran 90 program. I am using AIX Unix and unfortunately it won't let me open unit 6 as "unformatted". I thought I would try and open /dev/stdout instead under a different unit number, but /dev/stdout does not exist in AIX (although this method worked under Linux).
Basically, I want to pipe my program's output directly into another program, avoiding an intermediate file, much as gzip -c does. Is there some other way I can achieve this, given the two problems described above?

I would try converting the data with TRANSFER() to a long character string and printing it with non-advancing I/O. The problem will be your processor's limit on the record length: if it is too short, you will end up with an unexpected end-of-record marker somewhere in the output. Also, your processor may not write the unprintable characters the way you would like.
i.e., something like
character(len=max_length) :: buffer
buffer = transfer(data,buffer)
write(*,'(a)',advance='no') trim(buffer)
The largest problem I see is the unprintable characters. See also "A suprise with non-advancing I/O".
---EDIT---
Another possibility, try to use file /proc/self/fd/1 or /dev/fd/1
test:
open(11,file='/proc/self/fd/1',access='stream',action='write')
write(11) 11
write(11) 1.1
close(11)
end

This is more of a comment on @VladimirF's answer than a new answer, but I can't add comments yet. You can first inquire about the file name of the preconnected I/O unit and then open an unformatted connection to it:
character(1024) :: stdout
inquire(6, name = stdout)
open(11, file = stdout, access = 'stream', action = 'write')
This is probably the most convenient way, but it uses stream access, a Fortran 2003 feature. Without this, you can only use sequential access (which adds header data to each record) or direct access (which does not add headers but requires a fixed record length).
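To illustrate what "header data" means for the bytes that reach the pipe, here is a minimal sketch in Python (not Fortran) of how a sequential unformatted record is commonly framed. It assumes 4-byte little-endian length markers before and after the payload, which is what gfortran writes by default on x86; other compilers and platforms (including big-endian AIX) may differ, so treat the layout and the helper names as assumptions.

```python
import struct


def sequential_record(payload: bytes, marker_format: str = "<i") -> bytes:
    """Frame payload the way a sequential unformatted WRITE commonly does.

    Assumption: a length marker (4-byte little-endian here) precedes and
    follows the data. The real layout is compiler-dependent.
    """
    marker = struct.pack(marker_format, len(payload))
    return marker + payload + marker


def stream_record(payload: bytes) -> bytes:
    """Stream access writes the payload bytes only, with no framing."""
    return payload


value = struct.pack("<i", 11)            # the integer 11 as 4 raw bytes
framed = sequential_record(value)        # 12 bytes: marker, data, marker
unframed = stream_record(value)          # 4 bytes: data only
```

A consumer reading the pipe as raw binary would see the extra marker bytes in the first case, which is why stream access is the convenient choice here.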

Unexpected periodic, non-continuous output for OCaml program

Someone reports that given a stream of strings on the serial port which is pipelined to the OCaml program below, the output of the program is not continuous, but instead it appears in chunks (of a few tens of lines), as if buffered.
What can be the cause of the non-continuous output?
(The output buffer should be flushed after each new line due to the use of '%!'. So this shouldn't be the cause, right?)
let tp = ref 0
let get_next_entry ic =
  try
    let (ts, pred, v) = Scanf.fscanf ic " #%d %s#(%d)\n" (fun x y z -> (x,y,z)) in
    Printf.printf "at timepoint %d (timestamp %d): %s(%d)\n%!" !tp ts pred v;
    incr tp;
    true
  with End_of_file ->
    false
let _ =
  while get_next_entry stdin do
    ()
  done
The OCaml version used is 4.05.

It is a threefold problem. From the least likely to the most likely.
The glitching output
It is all in the eye of the beholder: how the program's output looks depends on the environment in which it runs, i.e., on the program that runs your program and renders its output on a visual device. In other words, it involves many variables that lie beyond the scope of this program.
With that said, let me explain what flush means for the printf function. The printf facility relies on buffered channels, and each channel is roughly a pair of a buffer and a system-specific file descriptor. When someone (including printf) outputs to a channel, the information first goes into the buffer and remains there until the next portion of information overflows the buffer (i.e., there is no more space in it) or until the flush function is called explicitly. Then the buffer is flushed, which means that its contents are transferred to the operating system (e.g., using the write system call or library function).
What happens afterward is system-dependent. If the file descriptor is associated with a regular file, you can expect the information to be passed to it in full (though the file system has its own hierarchy of caches, so there are caveats here too). If the descriptor is associated with a Unix-style shell process through a pipe, the data goes into the pipe's buffer, is extracted from it by the shell, and is printed through a terminal interface, usually provided by some terminal emulator. By default shells are line-buffered, so each line should be printed as a whole unless the user of the shell changes its parameters somehow.
Basically, I hope you get the idea: it is not your program that manipulates the terminal and lights up pixels on your monitor. Your program just outputs data, and some other program receives this data and draws it on the screen. It is that other program (a terminal or terminal emulator, e.g., minicom) that makes the output glitchy, not yours. Your program is doing its best to be printed correctly: a full line or nothing.
Your program is glitching
And it is. The in_channel is also buffered, so it will accumulate some bytes before handing them to fscanf. Therefore, you can't just read from the buffered channel and expect a real-time response from it. The most reliable way for you would be to use the Unix module and process the input using your own buffering.
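The idea of doing your own buffering on top of unbuffered reads can be sketched as follows. This is an illustrative Python sketch rather than OCaml, and lines_from_fd is a hypothetical helper name; os.read here plays the role that Unix.read would play in the OCaml program.

```python
import os


def lines_from_fd(fd, chunk_size=4096):
    """Yield complete lines from fd as soon as their bytes have arrived."""
    pending = b""
    while True:
        # os.read returns as soon as any bytes are available; it does not
        # wait for chunk_size bytes to accumulate.
        chunk = os.read(fd, chunk_size)
        if not chunk:                      # empty read means end of input
            break
        pending += chunk
        # Everything before the last newline is complete; keep the rest.
        *complete, pending = pending.split(b"\n")
        for line in complete:
            yield line
    if pending:                            # final line without a newline
        yield pending


# Usage (not executed here): for line in lines_from_fd(0): handle(line)
```

Each line is handed to the caller as soon as its terminating newline has been read, independent of how much more input is still to come.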
The glitching input
Finally, the program producing the input can also deliver the information in chunks. This is especially true for serial interfaces, so make sure that you have set up your terminal interface correctly using the Unix.tcsetattr function. In particular, when your program is blocked on input, the operating system may decide not to wake it up on each arriving character or line. This behavior is controlled by the terminal interface (see the canonical and non-canonical modes; if your input doesn't contain newlines, you should use the non-canonical mode).
Lastly, the device itself could be jittery; if you have an oscilloscope nearby, you can observe the signals it is sending. Also make sure that you have configured your serial port as prescribed in the user manual of your device.

One possibility is that fscanf is waiting until it sees everything it's looking for.

How do I seek for holes and data in a sparse file in golang [duplicate]

I want to copy files from one place to another and the problem is I deal with a lot of sparse files.
Is there any (easy) way of copying sparse files without becoming huge at the destination?
My basic code:
out, err := os.Create(bricks[0] + "/" + fileName)
in, err := os.Open(event.Name)
io.Copy(out, in)

Some background theory
Note that io.Copy() pipes raw bytes – which is sort of understandable once you consider that it pipes data from an io.Reader to an io.Writer which provide Read([]byte) and Write([]byte), correspondingly.
As such, io.Copy() is able to deal with absolutely any source providing
bytes and absolutely any sink consuming them.
On the other hand, the location of the holes in a file is a "side-channel" information which "classic" syscalls such as read(2) hide from their users.
io.Copy() is not able to convey such side-channel information in any way.
IOW, initially, file sparseness was an idea to just have efficient storage of the data behind the user's back.
So, no, there's no way io.Copy() could deal with sparse files in itself.
What to do about it
You'd need to go one level deeper and implement all this using the syscall package and some manual tinkering.
To work with holes, you should use the SEEK_HOLE and SEEK_DATA special values for the lseek(2) syscall which, while formally non-standard, are supported by all major platforms.
Unfortunately, support for those "whence" values is present
neither in the stock syscall package (as of Go 1.8.1)
nor in the golang.org/x/sys tree.
But fear not, there are two easy steps:
First, the stock syscall.Seek() is actually mapped to lseek(2)
on the relevant platforms.
Next, you'd need to figure out the correct values for SEEK_HOLE and
SEEK_DATA for the platforms you need to support.
Note that they are free to be different between different platforms!
Say, on my Linux system I can simply run
$ grep -E 'SEEK_(HOLE|DATA)' </usr/include/unistd.h
# define SEEK_DATA 3 /* Seek to next data. */
# define SEEK_HOLE 4 /* Seek to next hole. */
…to figure out the values for these symbols.
Now, say, you create a Linux-specific file in your package
containing something like
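// +build linux

package sparse // hypothetical package name

const (
	SEEK_DATA = 3 // value taken from <unistd.h> on Linux
	SEEK_HOLE = 4 // value taken from <unistd.h> on Linux
)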
and then use these values with the syscall.Seek().
The file descriptor to pass to syscall.Seek() and friends
can be obtained from an opened file using the Fd() method
of os.File values.
The pattern to use when reading is to detect regions containing data, and read the data from them – see this for one example.
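The pattern of alternating SEEK_DATA and SEEK_HOLE can be sketched as below. This is a Python illustration rather than Go (Python's os module exposes os.SEEK_DATA and os.SEEK_HOLE on platforms that support them, such as Linux), and data_regions is a hypothetical helper name. On filesystems without hole support, the kernel typically reports the whole file as a single data region.

```python
import errno
import os


def data_regions(path):
    """Return [(start, end), ...] byte ranges the filesystem reports as data."""
    regions = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Jump to the next byte that belongs to a data region.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:   # only a hole remains before EOF
                    break
                raise
            # Find where that region ends; end of file counts as a hole.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            regions.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return regions
```

A copier would then read only the returned ranges from the source and seek over the gaps in the destination.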
Note that this deals with reading sparse files; but if you'd want to actually transfer them as sparse – that is, with keeping this property of them, – the situation is more complicated: it appears to be even less portable, so some research and experimentation is due.
On Linux, it appears you could try to use fallocate(2) with
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE to try to punch a hole at the
end of the file you're writing to; if that legitimately fails
(with syscall.EOPNOTSUPP), you just shovel as many zeroed blocks to the destination file as covered by the hole you're reading – in the hope
the OS will do the right thing and will convert them to a hole by itself.
Note that some filesystems do not support holes at all – as a concept.
One example is the filesystems in the FAT family.
What I'm leading you to is that the inability to create a sparse file might
actually be a property of the target filesystem in your case.
You might find Go issue #13548 "archive/tar: add support for writing tar containing sparse files" to be of interest.
One more note: you might also consider checking whether the destination directory resides on the same filesystem as the source file and, if it does, using syscall.Rename() (on POSIX systems)
or os.Rename() to just move the file across directories without
actually copying its data.

You don't need to resort to syscalls.
package main

import "os"

func main() {
	f, _ := os.Create("/tmp/sparse.dat")
	defer f.Close()
	f.Write([]byte("start"))
	f.Seek(1024*1024*10, 0) // jump 10 MiB from the start, leaving a hole
	f.Write([]byte("end"))
}
Then you'll see:
$ ls -l /tmp/sparse.dat
-rw-rw-r-- 1 soren soren 10485763 Jun 25 14:29 /tmp/sparse.dat
$ du /tmp/sparse.dat
8 /tmp/sparse.dat
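It's true you can't use io.Copy as is. Instead you need to implement an alternative to io.Copy that reads a chunk from src and checks whether it is all '\0'. If it is, just call dst.Seek(int64(len(chunk)), os.SEEK_CUR) to skip past that part in dst; if the source ends in such a chunk, also set the final size (for example with dst.Truncate), since seeking alone does not extend the file. That particular implementation is left as an exercise to the reader :)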
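One way the exercise could look is sketched below, in Python rather than Go to keep it short. copy_sparse and its chunk_size parameter are hypothetical names, and whether the skipped ranges really become holes depends on the destination filesystem. Note that this approach also turns long runs of genuine zero bytes into holes, which read back identically.

```python
import os


def copy_sparse(src, dst, chunk_size=64 * 1024):
    """Copy src to dst (binary file objects), seeking over all-zero chunks."""
    zeros = bytes(chunk_size)
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        if chunk == zeros[:len(chunk)]:
            # Skip instead of writing, so the filesystem may leave a hole.
            dst.seek(len(chunk), os.SEEK_CUR)
        else:
            dst.write(chunk)
    # Seeking alone does not grow the file; fix the size if it ends in zeros.
    dst.truncate(dst.tell())
```

It would be called with two open files, for example open(src_path, "rb") and open(dst_path, "wb").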

How does the bittorrent assemble the missing pieces?

I use BitTorrent and sometimes encounter files that have no seeds (and therefore have missing pieces).
In that case, I sometimes force the transfer to stop and try to open the incomplete file (for example, an image file).
With luck, the downloaded image can be viewed even though some parts are lost.
I would like to artificially reproduce this situation, and here's how I tried:
1) splitting a BMP image file of about 1 megabyte into 16-kilobyte pieces with the Linux split command,
2) then truncating just one of the pieces to 0 bytes,
3) and after that, rejoining all the pieces with the cat command.
However, in this case, unlike the torrent's "lost pieces" situation, the file becomes completely corrupt and can not be read.
Theoretically it does not seem like anything special, but what's wrong? And how can I achieve what I want?
I would appreciate your help.

Use dd:
dd if=/dev/zero of=image.jpg bs=1 conv=notrunc seek=X count=Y
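where X is the offset in the file at which you want to start erasing and Y is the number of bytes to overwrite. Unlike emptying one of the split pieces, this keeps all later bytes at their original offsets.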
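The same in-place overwrite can be sketched in Python, for cases where scripting it is more convenient than calling dd. zero_range is a hypothetical helper name; like the dd command with conv=notrunc, it leaves the rest of the file untouched.

```python
def zero_range(path, offset, count):
    """Overwrite count bytes starting at offset with zeros, in place."""
    # "r+b" opens for update without truncating the existing contents.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(bytes(count))
```

For example, zero_range("image.bmp", 16384 * 5, 16384) would blank the sixth 16-kilobyte piece of a file named image.bmp (the file name is only an example).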
About the corruption, it depends on the type of file, the piece you are losing and the program you are using to read it.
For instance, JPG files use a variable bit-length encoding, meaning that losing just one bit may corrupt the whole file from that point on. For exactly that reason, there can be resynchronization points where the bitstream is reset, so that from such a point on the file looks OK again. But those resync points are optional when writing the file, and not every reader honors them in case of corruption...
And anyway, losing part of the headers will make the file totally unreadable.

Windows console application with gets() ROP exploit

I'm trying (for learning purposes) to take advantage of the gets() function vulnerability using the return-oriented programming (ROP) technique. The target program is a Windows console application that at some point asks for input and then uses gets() to store it in a local 80-character array.
I created a file that contains 80 'a' characters at the beginning, followed by some extra characters and the address 0x5da06c48 to overwrite the saved EIP.
I'm opening the file in a text editor and copy-pasting the content into the console as input. I've used IDA Pro (or OllyDbg) to set a breakpoint right after the return from the gets() function and noticed that the address was corrupted: it was set to 0x3fa03f48 (two 3f substitutions).
I've tried other addresses as well. Some of them work, but most of the time the address is corrupted (sometimes characters are missing or substituted, sometimes it is truncated).
How can I get around this problem? Any suggestion will be highly appreciated!

Copy-Pasting binary data is hit-and-miss. Have you tried feeding the input into your test program directly from the file using input redirection?

First of all, keep track of the endianness of your platform. If you think your bytes are in the right order but you are still getting malformed input, it might be that your shell or text editor isn't binary-safe. You are better off writing the exploit in a scripting language such as Python, using the subprocess module, which allows you to write data directly to a process's stdin pipe.
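A minimal sketch of that suggestion follows. The byte 0x3f is ASCII '?', which suggests that a character-set conversion replaced bytes it could not represent; writing raw bytes to the pipe avoids any such conversion. build_input and feed are hypothetical helper names, the filler is left to the caller because the question does not say how many extra characters are needed, and the trailing newline is an assumption about how gets() terminates its read.

```python
import struct
import subprocess


def build_input(buffer_len: int, filler: bytes, address: int) -> bytes:
    """Buffer padding, caller-chosen filler, then the address packed as a
    32-bit little-endian value (x86 byte order), ended by a newline."""
    return b"a" * buffer_len + filler + struct.pack("<I", address) + b"\n"


def feed(command, data: bytes) -> bytes:
    """Run command, write data to its stdin as raw bytes, return its stdout."""
    result = subprocess.run(command, input=data,
                            stdout=subprocess.PIPE, check=False)
    return result.stdout
```

Because input= is given bytes, no text encoding is applied on the way to the child process.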

Doing file operations with 64-bit addresses in C + MinGW32

I'm trying to read in a 24 GB XML file in C, but it won't work. I'm printing out the current position using ftell() as I read it in, but once it gets to a big enough number, it goes back to a small number and starts over, never getting even 20% of the way through the file. I assume this is a problem with the range of the variable that's used to store the position (long), which can go up to about 4,000,000,000 according to http://msdn.microsoft.com/en-us/library/s3f49ktz(VS.80).aspx, while my file is 25,000,000,000 bytes in size. A long long should work, but how would I change what my compiler (Cygwin/mingw32) uses, or get it to provide fopen64?

The ftell() function returns a long, which is only 32 bits wide on 32-bit systems (and on Windows in general), so it cannot represent offsets of 2^31 bytes (2 GB) or more. So you can't get the file offset for a 24 GB file to fit into a 32-bit long.
You may have the ftell64() function available, or the standard fgetpos() function may return a larger offset to you.
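The "goes back to a small number and starts over" symptom matches plain 32-bit wraparound, which can be worked out with a few lines of Python. This only models the arithmetic of a signed 32-bit counter; what a particular C library's ftell() actually returns past the limit is implementation-specific, and as_int32 is a hypothetical helper name.

```python
def as_int32(offset: int) -> int:
    """Value a signed 32-bit long would hold after storing offset."""
    wrapped = offset & 0xFFFFFFFF            # keep the low 32 bits
    if wrapped >= 1 << 31:                   # top bit set: negative in two's complement
        wrapped -= 1 << 32
    return wrapped


last_good = as_int32(2**31 - 1)              # largest representable offset
first_bad = as_int32(2**31)                  # wraps to the most negative value
whole_file = as_int32(25_000_000_000)        # the file size from the question
```

A 64-bit position, as returned by the functions mentioned above, has no such problem for a file of this size.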

You might try using the OS provided file functions CreateFile and ReadFile. According to the File Pointers topic, the position is stored as a 64bit value.

Unless you can use a 64-bit method as suggested by Loadmaster, I think you will have to break the file up.
This resource seems to suggest it is possible using _telli64(). I can't test this though, as I don't use mingw.

I don't know of any way to do this with a single file. It is a bit of a hack, but if splitting the file up properly isn't a real option, you could write a few functions that split the file temporarily: one that uses ftell() to move through the current part and switches to the next part when it reaches the split point, and another that stitches the parts back together before exiting. It is an absolutely botched-up approach, but if no better solution comes to light, it could be a way to get the job done.

I found the answer. Instead of using fopen, fseek, fread, fwrite... I'm using _open, _lseeki64, read, write. And I am able to write and seek in files larger than 4 GB.
Edit: It seems the latter functions are about 6x slower than the former ones. I'll give the bounty to anyone who can explain that.
Edit: Oh, I learned here that read() and friends are unbuffered. What is the difference between read() and fread()?

Even if the ftell() in the Microsoft C library returns a 32-bit value and thus obviously will return bogus values once you reach 2 GB, just reading the file should still work fine. Or do you need to seek around in the file, too? For that you need _ftelli64() and _fseeki64().
Note that unlike some Unix systems, you don't need any special flag when opening the file to indicate that it is in some "64-bit mode". The underlying Win32 API handles large files just fine.
