Writing a number with a lot of digits into a file using MPIR - ubuntu-14.04

I'm using MPIR/Ubuntu 14.04.
I have a big integer with a lot of digits, like 2^1920, and I don't know how to write it into a *.txt file.
FILE *result;
result=fopen("Number.txt","a+");
gmp_fprintf(result,"%d",xyz);
fclose(result);
didn't work.
Are there some other options I can use?

The gmp_printf() function (and thus gmp_fprintf() as well) requires a special format specifier for an mpz_t object (which I guess xyz is). You should use %Zd instead of plain %d, which does not work. To be pedantic, it is undefined behavior in C to use an inadequate format specifier.
If you don't need "full-featured" formatted output, then you might also take a look at mpz_out_str(), which allows you to specify the base (like 2 or 10):
size_t mpz_out_str (FILE *stream, int base, const mpz_t op)
Alternatively, you might use the mpz_out_raw() function, which just "dumps" the whole number as it is stored, in binary format:
size_t mpz_out_raw (FILE *stream, const mpz_t op)
Output op on stdio stream stream, in raw binary format. The integer is
written in a portable format, with 4 bytes of size information, and
that many bytes of limbs. Both the size and the limbs are written in
decreasing significance order (i.e., in big-endian).

What is the difference between a null-terminated string and a string that is not terminated by null in x86 assembly language

I'm currently learning assembly programming by following Kip Irvine's "assembly language x86 programming" book.
In the book, the author states
The most common type of string ends with a null byte (containing 0).
Called a null-terminated string
In the subsequent section of the book, the author had a string example without the null byte
greeting1 BYTE "Welcome to the Encryption Demo program "
So I was just wondering, what is the difference between a null-terminated string and a string that is not terminated by null in x86 assembly language? Are they interchangeable? Or are they not equivalent to each other?
There's nothing specific to asm here; it's the same issue in C. It's all about how you store strings in memory and keep track of where they end.
what is the difference between a null-terminated string and a string that is not terminated by null?
A null-terminated string has a 0 byte after it, so you can find the end with strlen (e.g. with a slow repne scasb). This makes it usable as an implicit-length string, like C uses.
NASM Assembly - what is the ", 0" after this variable for? explains the NASM syntax for creating one in static storage with db. db usage in nasm, try to store and print string shows what happens when you forget the 0 terminator.
Are they interchangeable?
If you know the length of a null-terminated string, you can pass pointer+length to a function that wants an explicit-length string. That function will never look at the 0 byte, because you will pass a length that doesn't include the 0 byte. It's not part of the string data proper.
But if you have a string without a terminator, you can't pass it to a function or system-call that wants a null-terminated string. (If the memory is writeable, you could store a 0 after the string to make it into a null-terminated string.)
In Linux, many system calls take strings as C-style implicit-length null-terminated strings. (i.e. just a char* without passing a length).
For example, open(2) takes a string for the path: int open(const char *pathname, int flags); You must pass a null-terminated string to the system call. It's impossible to create a file with a name that includes a '\0' in Linux (same as most other Unix systems), because all the system calls for dealing with files use null-terminated strings.
OTOH, write(2) takes a memory buffer which isn't necessarily a string. It has the signature ssize_t write(int fd, const void *buf, size_t count);. It doesn't care if there's a 0 at buf+count because it only looks at the bytes from buf to buf+count-1.
You can pass a string to write(). It doesn't care. It's basically just a memcpy into the kernel's pagecache (or into a pipe buffer or whatever for non-regular files). But like I said, you can't pass an arbitrary non-terminated buffer as the path arg to open().
Or are they not equivalent to each other?
Implicit-length and explicit-length are the two major ways of keeping track of string data/constants in memory and passing them around. They solve the same problem, but in opposite ways.
Long implicit-length strings are a bad choice if you sometimes need to find their length before walking through them. Looping through a string is a lot slower than just reading an integer. Finding the length of an implicit-length string is O(n), while for an explicit-length string it is of course O(1) (it's already known!). At least the length in bytes is known; the length in Unicode characters might not be, if the string uses a variable-length encoding like UTF-8 or UTF-16.
How a string is terminated has nothing to do with assembly. Historically, terminators have included '$' and CRLF [13,10] (i.e. [0D,0A]), and those two bytes are sometimes reversed, as with GEDIT under Linux. Conventions are determined by how your system is going to interact with itself or other systems. As an example, my applications are strictly oriented around ASCII; therefore, if I read a file that's UTF-8 or UTF-16, my application would fail miserably. NULLs or any kind of termination can be optional.
Consider this example
Title: db 'Proto_Sys 1.00.0', 0, 0xc4, 0xdc, 0xdf, 'CPUID', 0, 'A20', 0
db 'AXCXDXBXSPBPSIDIESDSCSSSFSGS'
Err00: db 'Retry [Y/N]', 0
I've implemented a routine where if CX=0 then it's assumed a NULL terminated string is to be displayed, otherwise only one character is read and repeated CX times. That is why 0xc4 0xdc 0xdf are not terminated. Similarly, there isn't a terminator before 'Retry [Y/N]' because the way my algo is designed, there doesn't need to be.
The only thing you need concern yourself with is what is the source of your data or does your application need to be compatible with something else. Then you just simply implement whatever you need to make it work.

Is there a popular Linux/Unix format for binary diffs?

I'm going to be producing binary deltas of multi-gigabyte files.
Naively, I'm intending to use the following format:
struct chunk {
uint64_t offset;
uint64_t length;
uint8_t data[];
};
struct delta {
uint8_t file_a_checksum[32]; // These are calculated while the
uint8_t file_b_checksum[32]; // gzipped chunks are being written
uint8_t chunks_checksum[32]; // at the 96 octet offset.
uint8_t gzipped_chunks[];
};
I only need to apply these deltas to the original file_a that was used to generate a delta.
Is there anything I'm missing here?
Is there an existing binary delta format which has the features I'm looking for, yet isn't too much more complex?
For arbitrary binaries, of course it makes sense to use a general purpose tool:
xdelta
bspatch
rdiff-backup (rsync)
git diff
(Yes, git diff works on files that aren't under version control. git diff --binary --no-index dir1/file.bin dir2/file.bin )
I would usually recommend a generic tool before writing your own, even if there is a little overhead. While none of the tools in the above list produce binary diffs in a format quite as ubiquitous as the "unified diff" format, they are all "close to" standard tools.
There is one other fairly standardised format that might be relevant for you: the humble hexdump. The xxd tool dumps binaries into a fairly standard text format by default:
0000050: 2020 2020 5858 4428 3129 0a0a 0a0a 4e08 XXD(1)....N.
That is, offset followed by a series of byte values. The exact format is flexible and configurable with command-line switches.
However, xxd can also be used in reverse mode to write those bytes instead of dumping them.
So if you have a file called patch.hexdump:
00000aa: bbccdd
Then running xxd -r patch.hexdump my.binary will modify the file my.binary to modify three bytes at offset 0xaa.
Finally, I should also mention that dd can seek into a binary file and read/write a given number of bytes, so I guess you could use "shell script with dd commands" as your patch format.

(open + write) vs. (fopen + fwrite) to kernel /proc/

I have a very strange bug. If I do:
int fd = open("/proc/...", O_WRONLY);
write(fd, argv[1], strlen(argv[1]));
close(fd);
everything works, including for very long strings (length > 1024).
If I do:
FILE *fd = fopen("/proc/...", "wb");
fwrite(argv[1], 1, strlen(argv[1]), fd);
fclose(fd);
the string is cut off around 1024 characters.
I'm running an ARM embedded device with a 3.4 kernel. I have debugged in the kernel and I see that the string is already cut when I reach the very early function vfs_write (I spotted this function with a WARN_ON instruction to get the stack).
The problem is the same with fputs vs. puts.
I can use fwrite for a very long string (>1024) if I write to a standard rootfs file. So the problem is really linked to how the kernel handles /proc.
Any idea what's going on?
Probably the problem is with buffers.
The issue is that special files, such as those in /proc, are, well..., special; they are not always simple streams of bytes, and may have to be written to (or read from) with specific sizes and/or offsets. You don't say which file you are writing to, so it is impossible to be sure.
Then, the call to fwrite() assumes that the output stream is a simple stream of bytes, so it does smart, fancy things such as buffering, splicing, and copying the given data. In particular, a buffered fwrite() can split one large write into BUFSIZ-sized pieces (often 1024 bytes), which matches where your string gets cut. On a regular file it will just work, but on a special file, funny things may happen.
Just to be sure, try to run strace with both versions of your program and compare the outputs. If you wish, post them for additional comments.

Fastest Way to Copy Buffer or C-String into a std::string

Let's say I have char buffer[64] and uint32_t length, and buffer might or might not be null-terminated. If it is null-terminated, the rest of the buffer is filled with nulls. The length variable holds the length of buffer.
I would like to copy it into a std::string without extra nulls at the end of the string object.
Originally, I tried:
std::string s(buffer, length);
which copies the extra nulls when buffer is filled with nulls at the end.
I can think of:
char buffer2[128];
strncpy(buffer2, buffer, 128);
const std::string s(buffer2);
But it is kind of wasteful because it copies twice.
I wonder whether there is a faster way. I know I need to benchmark to tell exactly which way is faster...but I would like to look at some other solutions and then benchmark...
Thanks in advance.
1. If you can, I'd simply add a '\0' at the end of your buffer and then use the c-string version of the string constructor.
2. If you can't, you need to determine if there's a '\0' in your buffer, and while you're at it, you might as well count the number of characters you encounter before the '\0'. You can then use that count with the (buffer, length) form of the string constructor:
#include <string.h>
//...
std::string s(buffer, strnlen(buffer, length));
3. If you can't do 1. and don't want to iterate over buffer twice (once to determine the length, once in the string constructor), you could do:
char last_char = buffer[length-1];
buffer[length-1] = '\0';
std::string s(buffer); //the c-string form since we're sure there's a '\0' in the buffer now
if(last_char!='\0' && s.length()==(length-1)) {
//With good buffer sizes, this might not need to cause reallocation of the strings internal buffer
s.push_back(last_char);
}
I leave the benchmarking to you. It is possible that the c-string version of the constructor uses something like strlen internally anyway to avoid reallocations so there might not be much to gain from using the c-string version of the string constructor.
You can use any of the canonical ways to do this.
A faster way is surely to implement a smart pointer yourself (or use something ready-made such as std::shared_ptr).
Each smart pointer points to the first char of the array.
Each time you "copy" the array, you don't do a true copy; you simply add a reference to that array.
So a "copy" takes O(1) instead of O(N).

Safe alternative to _vscwprintf

In MSVC, a number of string functions offer the original, a safe version, and a strsafe version. For example, sprintf, sprintf_s, and StringCchPrintf are all equivalents, increasing in safety (by some metric).
Now, I have a bit of code that does:
int bufsize = _vscwprintf(fmt, args) + 1;
std::vector<wchar_t> buffer(bufsize);
int len = _vsnwprintf_s(&buffer[0], bufsize, bufsize-1, fmt, args);
To allocate a buffer of the correct size.
While looking through the strsafe functions, I found an alternative for _vsnwprintf_s, but none for _vscwprintf. A check of Google didn't seem to return anything.
Is there a strsafe way of writing that bit of code, or alternate functions for both that I'm missing, or is mixing an original and a strsafe function acceptable? (No safety warnings are given about the current way on /W4 with PREfast, all rules.)
_vscwprintf() merely computes the size of the wchar_t[] array you need to safely format the string; it doesn't actually write anything to a buffer. Accordingly, you don't need a safe version of the function, and none exists.
