Write log in stdout (MPI) - cygwin

I am using MPI on Windows with Cygwin. I am trying to use a critical section so that each process writes its log separately, but no matter what I do I always get a mixed-up log.
setbuf(stdout, 0);
int totalProcess;
MPI_Comm_size(MPI_COMM_WORLD, &totalProcess);
int processRank;
MPI_Comm_rank(MPI_COMM_WORLD, &processRank);
int rank = 0;
while (rank < totalProcess) {
    if (processRank == rank) {
        printf("-----%d-----\n", rank);
        printf("%s", logBuffer);
        printf("-----%d-----\n", rank);
        //fflush(stdout);
    }
    rank++;
    MPI_Barrier(MPI_COMM_WORLD);
}
I run MPI on a single machine (emulation mode):
mpirun -v -np 2 ./bin/main.out
I want a dedicated log section per process; what am I doing wrong?
(Even as I wrote it I suspected it would not work correctly...)

This is the same problem asked about here; there is enough buffering going on at various different layers that there's no guarantee the final output will reflect the order in which the individual processes wrote, although in practice it can work for "small enough" outputs.
But if the goal is something like a logfile, MPI-IO provides mechanisms to write to a file in exactly such a way: MPI_File_write_ordered writes output to the file in rank order. As an example:
#include <string.h>
#include <stdio.h>
#include "mpi.h"

int main(int argc, char** argv)
{
    int rank, size;
    MPI_File logfile;
    char mylogbuffer[1024];
    char line[128];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_File_open(MPI_COMM_WORLD, "logfile.txt", MPI_MODE_WRONLY | MPI_MODE_CREATE,
                  MPI_INFO_NULL, &logfile);

    /* write initial message */
    sprintf(mylogbuffer, "-----%d-----\n", rank);
    sprintf(line, "Message from proc %d\n", rank);
    for (int i = 0; i < rank; i++)
        strcat(mylogbuffer, line);
    sprintf(line, "-----%d-----\n", rank);
    strcat(mylogbuffer, line);

    MPI_File_write_ordered(logfile, mylogbuffer, strlen(mylogbuffer), MPI_CHAR, MPI_STATUS_IGNORE);

    /* write another message */
    sprintf(mylogbuffer, "-----%d-----\nAll done\n-----%d-----\n", rank, rank);
    MPI_File_write_ordered(logfile, mylogbuffer, strlen(mylogbuffer), MPI_CHAR, MPI_STATUS_IGNORE);

    MPI_File_close(&logfile);
    MPI_Finalize();

    return 0;
}
Compiling and running gives:
$ mpicc -o log log.c -std=c99
$ mpirun -np 5 ./log
$ cat logfile.txt
-----0-----
-----0-----
-----1-----
Message from proc 1
-----1-----
-----2-----
Message from proc 2
Message from proc 2
-----2-----
-----3-----
Message from proc 3
Message from proc 3
Message from proc 3
-----3-----
-----4-----
Message from proc 4
Message from proc 4
Message from proc 4
Message from proc 4
-----4-----
-----0-----
All done
-----0-----
-----1-----
All done
-----1-----
-----2-----
All done
-----2-----
-----3-----
All done
-----3-----
-----4-----
All done
-----4-----
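If you really do want the per-rank sections on stdout rather than in a file, one common alternative (not what the answer above uses) is to gather every rank's buffer to rank 0 and let only that rank print, so the ordering is controlled by a single process. A minimal sketch, assuming each rank's logBuffer fits in a fixed-size array; compile with mpicc -std=c99 as above:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char** argv)
{
    int rank, size;
    char logBuffer[256];                       /* assumed maximum log size per rank */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    snprintf(logBuffer, sizeof(logBuffer), "Message from proc %d\n", rank);

    int len = (int)strlen(logBuffer) + 1;      /* include the trailing NUL */
    int *lens = NULL, *displs = NULL;
    char *all = NULL;

    /* rank 0 learns how long every rank's buffer is */
    if (rank == 0)
        lens = malloc(size * sizeof(int));
    MPI_Gather(&len, 1, MPI_INT, lens, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        displs = malloc(size * sizeof(int));
        int total = 0;
        for (int i = 0; i < size; i++) { displs[i] = total; total += lens[i]; }
        all = malloc(total);
    }

    /* collect all buffers on rank 0 */
    MPI_Gatherv(logBuffer, len, MPI_CHAR,
                all, lens, displs, MPI_CHAR, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < size; i++)
            printf("-----%d-----\n%s-----%d-----\n", i, all + displs[i], i);
        free(lens); free(displs); free(all);
    }

    MPI_Finalize();
    return 0;
}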

Related

iotop does not show any disk read statistics

I am running a test to check disk read statistics. Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

int main(int argc, char* argv[])
{
    int count = 1000, size;
    char block[4096] = "0";
    int fd = open("file1.txt", O_RDONLY | O_SYNC);
    //int pid = getpid();
    system("pid=$(ps -a | grep 'a.out' | awk '{print $1}'); iotop -bokp $pid > test1c.out &");
    system("echo 'Starts reading in 10'");
    srand(time(NULL));
    system("sleep 1");
    while (count--) {
        int random = (rand() % 16) * 666;
        printf("%d;", random);
        lseek(fd, random, SEEK_SET);
        size = read(fd, block, 4096);
        printf("Number of bytes read: %d\n", size);
        fsync(fd);
        //printf("Read 4kb from the file.\n");
    }
    system("sleep 1");
    system("killall iotop");
    return 0;
}
As you can see, I am opening a large file, getting the PID of my a.out process, and passing it to iotop. After that I am repeatedly seeking to a random offset in the file and reading a 4 KB block.
If you run this code on your system, you'll realize that iotop output shows 0kb reads throughout, which makes no sense. Am I doing something wrong?
Clearing the caches solved the issue. I found the script for clearing caches on this page:
https://www.tecmint.com/clear-ram-memory-cache-buffer-and-swap-space-on-linux/
sync; echo 1 > /proc/sys/vm/drop_caches
sync; echo 2 > /proc/sys/vm/drop_caches
sync; echo 3 > /proc/sys/vm/drop_caches
Does the trick!
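If you want the test program itself to take care of this so the reads really hit the disk and show up in iotop, here is a minimal sketch of doing the same thing from C before the read loop. It needs root, and it simply writes the same value to the same /proc file that the script above uses:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Drop the page cache so subsequent read() calls go to disk. */
static int drop_caches(void)
{
    sync();                                        /* flush dirty pages first */
    int fd = open("/proc/sys/vm/drop_caches", O_WRONLY);
    if (fd < 0) { perror("open drop_caches"); return -1; }
    if (write(fd, "3", 1) != 1) perror("write drop_caches");
    close(fd);
    return 0;
}

int main(void)
{
    if (drop_caches() == 0)
        puts("page cache dropped");
    return 0;
}

Call drop_caches() once before the while(count--) loop (and run the program as root) and iotop should start reporting real disk reads.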

Only one thread created with MPI & no speed-up with OpenMP

I actually have two questions but it seems they may be connected:
1) I've tried to run a basic MPI example:
#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("I am %d from %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
It should output something like:
I am 0 from 2
I am 1 from 2
However, I'm getting the following:
$ mpicc mpi_hello.c -o hello
$ mpirun -np 4 ./hello
I am 0 from 1
I am 0 from 1
I am 0 from 1
I am 0 from 1
$ mpirun -np 2 ./hello
I am 0 from 1
I am 0 from 1
Is it somehow connected to thread definition in Linux? I'm running it on Ubuntu 16.04.
2) My OpenMP program:
#include <omp.h>
#include <math.h>
#include <time.h>
#include <iostream>
#include <stdio.h>

const int N = 10000;
int matrix[N][N];

int main()
{
    #pragma omp parallel num_threads(2)
    #pragma omp for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            matrix[i][j] = 1 + i;

    clock_t t;
    t = clock();

    #pragma omp parallel num_threads(2)
    #pragma omp for
    for (int i = 0; i < N; i++)
    {
        matrix[i][i] = 0;
        for (int j = 0; j < N; j++)
            if (j != i)
                matrix[i][i] += sin(cos(log(matrix[i][j] + matrix[j][i])));
    }

    t = clock() - t;
    std::cout << "It took " << ((float)t)/CLOCKS_PER_SEC << " sec" << std::endl;
    return 0;
}
It works correctly and uses 2 threads. However, it loads two processors (~100% CPU) and takes the same time (~34 seconds) as the similar sequential version (which loads one processor at ~50% CPU). I know OpenMP may need some time to start up, but how can both programs end up taking the same time?
Answering the MPI part of the question.
Getting "I am 0 from 1" from every process usually means the binary was compiled against one MPI implementation but launched with a different implementation's mpirun, so each process runs as its own singleton. Do you have MPICH installed? If so, try compiling and running with the matching wrapper and launcher:
$ mpicc.mpich mpi_hello.c -o hello
$ mpirun.mpich -np 4 ./hello
I am 0 from 4
I am 1 from 4
I am 2 from 4
I am 3 from 4
It should work.
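A quick way to confirm whether the compiler wrapper and the launcher actually match is to print the MPI library version from inside the program; MPI_Get_library_version is part of MPI-3 and is available in both MPICH and Open MPI. A small sketch:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int len, rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_library_version(version, &len);

    /* If this reports a different implementation than your mpirun,
       the processes will not see each other and each reports size 1. */
    if (rank == 0)
        printf("size=%d, library: %s\n", size, version);

    MPI_Finalize();
    return 0;
}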

Real parallelism in Linux shell

I am trying to get real parallelism in a Linux shell, but I can't achieve it.
I have two programs: allones, which only prints '1' characters, and allzeros, which only prints '0' characters.
When I execute "./allones & ./allzeros &", I get long runs of '0's and long runs of '1's that mix in big chunks (e.g. "1111...111000...000111...111000...000"). My processor has 8 cores.
However, when I ran my own programs on a multi-core FPGA (with no OS), distributing them across different cores, I got something like "011000101000011010...".
How can I run it on Linux to get a result similar to what I get on a multi-core FPGA?
Sounds like you're experiencing libc's default line buffering:
Here's a test program spam.c:
#include <stdio.h>

int main(int argc, char** argv) {
    while (1) {
        printf("%s", argv[1]);
    }
}
We can run it with:
$ ./spam 0 & ./spam 1 & sleep 1; killall spam
11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111(...)000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000(...)
On my systems, each block is exactly 1024 bytes long, strongly hinting at a buffering issue.
Here's the same code with an fflush to defeat the buffering:
#include <stdio.h>

int main(int argc, char** argv) {
    while (1) {
        printf("%s", argv[1]);
        fflush(stdout);
    }
}
This is the new output:
100111001100110011001100110011001100110011100111001110011011001100110011001100110011001100110011001100110011001100110011001100011000110001100110001100100110011001100111001101100110011001100110011001100110000000000110010011000110011
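Equivalently, you can disable stdout buffering once at startup with setvbuf instead of flushing after every printf; a minimal variant of spam.c:

#include <stdio.h>

int main(int argc, char** argv) {
    /* Make stdout unbuffered: every printf is written out immediately. */
    setvbuf(stdout, NULL, _IONBF, 0);
    while (1) {
        printf("%s", argv[1]);
    }
}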

Linux read() system call takes longer than my expectation ( serial port programming )

I am trying to read data sent from /dev/ttyUSB0 and print it out in byte (hex) format.
Question:
I expect the data to be printed once 40 bytes have been read. However, it takes much longer than I expect: the read() system call hangs even though I believe more than 40 bytes must already have arrived. The data does eventually get printed, but it should not take that long. Did I do anything wrong in this program?
thanks
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <termios.h>
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>

#define BAUDRATE B9600
#define MODEMDEVICE "/dev/ttyUSB0"
#define FALSE 0
#define TRUE 1

int main()
{
    int fd, c, res;
    struct termios oldtio, newtio;
    unsigned char buf[40];

    fd = open(MODEMDEVICE, O_RDWR | O_NOCTTY);
    if (fd < 0) { perror(MODEMDEVICE); exit(-1); }

    tcgetattr(fd, &oldtio);
    bzero(&newtio, sizeof(newtio));
    newtio.c_cflag = BAUDRATE | CS8 | CLOCAL | CREAD;
    newtio.c_iflag = IGNPAR | ICRNL;
    newtio.c_oflag = 1;
    newtio.c_lflag = ICANON;

    tcflush(fd, TCIOFLUSH);
    tcsetattr(fd, TCSANOW, &newtio);

    int i;
    while (1) {
        res = read(fd, buf, 40);
        if (res == 40) {
            printf("res reaches 40 \n");
        }
        printf("res: %d\n", res);
        for (i = 0; i < res; ++i) {
            printf("%02x ", buf[i]);
        }
        return 0;
    }
}
--------------------raw mode code------------------------
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <termios.h>
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <unistd.h>

#define BAUDRATE B9600
#define MODEMDEVICE "/dev/ttyUSB0"
#define _POSIX_SOURCE 1 /* POSIX compliant source */
#define FALSE 0
#define TRUE 1

volatile int STOP = FALSE;

int main()
{
    int fd, c, res;
    struct termios oldtio, newtio;
    unsigned char buf[255];

    fd = open(MODEMDEVICE, O_RDWR | O_NOCTTY);
    if (fd < 0) { perror(MODEMDEVICE); exit(-1); }

    tcgetattr(fd, &oldtio); /* save current port settings */
    bzero(&newtio, sizeof(newtio));
    newtio.c_cflag = BAUDRATE | CRTSCTS | CS8 | CLOCAL | CREAD;
    newtio.c_iflag = IGNPAR;
    newtio.c_oflag = 0;

    /* set input mode (non-canonical, no echo, ...) */
    newtio.c_lflag = 0;
    newtio.c_cc[VTIME] = 0;   /* no inter-character timer */
    newtio.c_cc[VMIN] = 40;   /* block until 40 characters arrive */

    tcflush(fd, TCIFLUSH);
    tcsetattr(fd, TCSANOW, &newtio);

    int i;
    while (STOP == FALSE) {
        res = read(fd, buf, 255);
        for (i = 0; i < res; ++i) {
            printf("%02x \n", buf[i]);
        }
    }

    tcsetattr(fd, TCSANOW, &oldtio);
    return 0;
}
It can now print out the data as soon as the buffer is full (that is, after 40 bytes).
One more question:
When I modify the printf to
printf("%02x ", buf[i]); (removing the "\n")
it no longer prints anything when the buffer is full until more bytes are received. Why does this happen?
Thanks
You need to switch the terminal to raw mode to disable line buffering.
Citing this answer:
The terms raw and cooked only apply to terminal drivers. "Cooked" is called canonical and "raw" is called non-canonical mode.
The terminal driver is, by default, a line-based system: characters are buffered internally until a carriage return (Enter or Return) before they are passed to the program - this is called "cooked". This allows certain characters to be processed (see stty(1)), such as Ctrl-D, Ctrl-S, Ctrl-U and Backspace; essentially rudimentary line editing. The terminal driver "cooks" the characters before serving them up.
The terminal can be placed into "raw" mode where the characters are not processed by the terminal driver but are sent straight through (it can be set that INTR and QUIT characters are still processed). This allows programs like emacs and vi to use the entire screen more easily.
You can read more about this in the "Canonical mode" section of the termios(3) manpage.
See e.g. this or this for how to achieve that programmatically (I did not check the code, but it should be easy to find).
Alternatively you could use e.g. strace or ltrace to check what stty -F /dev/ttyUSB0 raw does (or read the manual page where it is described).
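For reference, here is a minimal sketch of doing it directly from C: cfmakeraw() (a common BSD/glibc extension declared in termios.h) switches the descriptor to non-canonical mode, and the VMIN/VTIME values below just mirror the 40-byte requirement from the question:

#define _DEFAULT_SOURCE        /* for cfmakeraw() on glibc */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <termios.h>
#include <unistd.h>

/* Put the serial fd into raw (non-canonical) mode and make read()
   return once at least 40 bytes are available. */
static void make_raw(int fd)
{
    struct termios tio;
    if (tcgetattr(fd, &tio) != 0) { perror("tcgetattr"); exit(1); }
    cfmakeraw(&tio);            /* clears ICANON, ECHO, signal characters, ... */
    tio.c_cc[VMIN]  = 40;       /* block until 40 bytes have arrived */
    tio.c_cc[VTIME] = 0;        /* no inter-byte timeout */
    if (tcsetattr(fd, TCSANOW, &tio) != 0) { perror("tcsetattr"); exit(1); }
}

int main(void)
{
    int fd = open("/dev/ttyUSB0", O_RDWR | O_NOCTTY);   /* same device as above */
    if (fd < 0) { perror("open"); return 1; }
    make_raw(fd);
    unsigned char buf[40];
    ssize_t n = read(fd, buf, sizeof(buf));             /* returns once 40 bytes arrived */
    printf("read %zd bytes\n", n);
    close(fd);
    return 0;
}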
EDIT:
Regarding printf without a newline: an fflush(stdout); immediately after it should help (another layer of line buffering, this time in stdio, is taking place).
You might consider reading this and maybe this.

Detach a linux process from pseudo-tty, but keep the tty running?

I want to debug a console Linux application with two xterm windows: one window for gdb and another for the application (e.g. mc).
What I do now is run 'tty && sleep 1024d' in the second xterm window (this gives me its pseudo-tty name) and then use GDB's 'tty' command with that device to redirect the program to the other xterm window. However, GDB warns that it cannot set a controlling terminal, and certain minor things don't work (e.g. handling window resizing), because 'sleep 1024d' is still running in that xterm window.
Is there a better way to do it (other than launching the process from the shell and attaching to it from gdb)?
I have slightly modified the program given in a related bug report so that it stores the tty name in a file:
http://sourceware.org/bugzilla/show_bug.cgi?id=11403
Here is an example using it:
$ xterm -e './disowntty ~/tty.tmp' & sleep 1 && gdb --tty $(cat ~/tty.tmp) /usr/bin/links
/* tty;exec disowntty */
#include <sys/ioctl.h>
#include <unistd.h>
#include <stdio.h>
#include <limits.h>
#include <stdlib.h>
#include <signal.h>

static void
end (const char *msg)
{
    perror (msg);
    for (;;)
        pause ();
}

int
main (int argc, const char *argv[])
{
    FILE *tty_name_file;
    const char *tty_filename;

    if (argc <= 1)
        return 1;
    else
        tty_filename = argv[1];

    void (*orig) (int signo);

    setbuf (stdout, NULL);
    orig = signal (SIGHUP, SIG_IGN);
    if (orig != SIG_DFL)
        end ("signal (SIGHUP)");

    /* Verify we are the sole owner of the tty. */
    if (ioctl (STDIN_FILENO, TIOCSCTTY, 0) != 0)
        end ("TIOCSCTTY");

    printf ("%s %s\n", tty_filename, ttyname (STDIN_FILENO));
    tty_name_file = fopen (tty_filename, "w");
    fprintf (tty_name_file, "%s\n", ttyname (STDIN_FILENO));
    fclose (tty_name_file);

    /* Disown the tty. */
    if (ioctl (STDIN_FILENO, TIOCNOTTY) != 0)
        end ("TIOCNOTTY");

    end ("OK, disowned");
    return 1;
}
