Only one thread created with MPI & no speed-up with OpenMP - linux

I actually have two questions but it seems they may be connected:
1) I've tried to run a basic MPI example:
#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char* argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("I am %d from %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
It should output something like:
I am 0 from 2
I am 1 from 2
However, I'm getting the following:
$ mpicc mpi_hello.c -o hello
$ mpirun -np 4 ./hello
I am 0 from 1
I am 0 from 1
I am 0 from 1
I am 0 from 1
$ mpirun -np 2 ./hello
I am 0 from 1
I am 0 from 1
Is this somehow related to how threads are defined in Linux? I'm running it on Ubuntu 16.04.
2) My OpenMP program:
#include <omp.h>
#include <math.h>
#include <time.h>
#include <iostream>
#include <stdio.h>
const int N = 10000;
int matrix[N][N];
int main()
{
    #pragma omp parallel num_threads(2)
    #pragma omp for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            matrix[i][j] = 1+i;
    clock_t t;
    t = clock();
    #pragma omp parallel num_threads(2)
    #pragma omp for
    for (int i = 0; i < N; i++)
    {
        matrix[i][i] = 0;
        for (int j = 0; j< N; j++)
            if (j != i)
                matrix[i][i] += sin(cos(log(matrix[i][j] + matrix[j][i])));
    }
    t = clock() - t;
    std::cout << "It took " << ((float)t)/CLOCKS_PER_SEC << " sec" << std::endl;
    return 0;
}
It works correctly and uses 2 threads. However, it loads 2 processors (~100% CPU) and takes the same time (~34 seconds) as the similar sequential version (which loads 1 processor, ~50% CPU). I know that OpenMP may need some time to start up, but how can that result in the same run time for both programs?

Answering the MPI part of the question.
Do you have MPICH installed? If so, try to compile and run like this:
$ mpicc.mpich mpi_hello.c -o hello
$ mpirun.mpich -np 4 ./hello
I am 0 from 4
I am 1 from 4
I am 2 from 4
I am 3 from 4
It should work.


Will fallocate on mmap file reduce the memory consumption?

I am trying to mmap a 64M file into memory and then read and write to it. Sometimes a certain range within the file is no longer used, so I call fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 16 << 20, 16 << 20); to release a 16M range inside it.
However, the memory consumption does not seem to change (according to both free -m and cat /proc/meminfo).
I understand that fallocate punches a hole in the file and returns that range to the filesystem, but I'm not sure whether it also reduces memory consumption for mmapped files.
If it does not reduce the memory consumption, where does that range go? Can another process get the underlying memory previously allocated to it?
big.file is a regular 64M file, not a sparse file.
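#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <linux/falloc.h>
#include <stdint.h>
#define MMAP_SIZE (64 << 20) /* the file is 64M, as described above */
int main(int argc, char *argv[])
{
    uint8_t *addr;
    /* offset was not defined in the original post; 0 is an assumed placeholder */
    size_t offset = 0;
    int fd = open("home/big.file", O_RDWR | O_CREAT, 0777);
    if (fd < 0)
        return -1;
    addr = (uint8_t *)mmap(NULL, MMAP_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) /* mmap signals failure with MAP_FAILED, not a negative pointer */
        return -1;
    printf("data[0x%zx] = %d\n", offset, addr[offset]);
    getchar();
    /* punch a 16M hole starting at the 16M mark, keeping the file size unchanged */
    fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 16 << 20, 16 << 20);
    getchar();
    close(fd);
    return 0;
}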

syscall lseek() is slower in ubuntu 2.6.32 vs 2.6.24

I am running the following code on Ubuntu 14 and 18. It is 6x slower on 18 on the same hardware. Is there something I am doing wrong?
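#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
int main(int argc, char *argv[])
{
    int fd;
    off_t m;
    time_t start, ed;
    int i, k;
    if (argc<2) exit(0);
    fd = open(argv[1], O_RDWR|O_CREAT, 0644); /* O_CREAT requires a mode argument */
    if (fd<0) {
        printf("cannot open file %s\n", argv[1]);
        exit(0);
    }
    start = time(NULL);
    /* 50 million lseek() calls; each one is a system call */
    for(k=0; k<100; ++k) {
        for(i=0;i<500000;++i) {
            m = lseek(fd, 0, SEEK_SET);
            if (m== -1) {
                printf("lseek failed\n");
                exit(0);
            }
        }
    }
    ed = time(NULL);
    printf("Time: %ld\n", (long)(ed-start));
    return 0;
}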
On Ubuntu 14 this takes 4 seconds.
On Ubuntu 18 it takes 24 seconds.
The hardware is the same.
zx485 was right: the Spectre mitigations in the kernel were slowing it down by a factor of 6.
Disabling the protections with the following commands on Red Hat 7 brought the timing back to normal:
# echo 0 > /sys/kernel/debug/x86/pti_enabled
# echo 0 > /sys/kernel/debug/x86/retp_enabled
# echo 0 > /sys/kernel/debug/x86/ibrs_enabled

Execute a command in another terminal via /dev/pts

I have a terminal that uses STDIN 3 (/proc/xxxx/fd/0 -> /dev/pts/3)
So if (in another terminal) I do:
echo 'do_something_command' > /dev/pts/3
The command is shown in my first (pts/3) terminal, but the command is not executed. And if (in this terminal pts/3) I'm in a program waiting for some data from stdin, the data is written on screen but the program does not capture it from stdin.
What I want to do is execute the command "do_something_command" and not only show it.
Can someone explain this behavior to me? How do I achieve my intention?
I completely get what you are asking. You can achieve this by writing and executing a small piece of code in C yourself. This should give you some idea.
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/ioctl.h>
#include <string.h>
#include <unistd.h>
void print_help(char *prog_name) {
    printf("Usage: %s [-n] DEVNAME COMMAND\n", prog_name);
    printf("Usage: '-n' is an optional argument if you want to push a new line at the end of the text\n");
    printf("Usage: Will require 'sudo' to run if the executable is not setuid root\n");
    exit(1);
}
int main (int argc, char *argv[]) {
    char *cmd = NULL, *nl = "\n";
    int i, fd;
    int devno, commandno, newline;
    int mem_len;
    devno = 1; commandno = 2; newline = 0;
    if (argc < 3) {
        print_help(argv[0]);
    }
    if (argc > 3 && argv[1][0] == '-' && argv[1][1] == 'n') {
        devno = 2; commandno = 3; newline=1;
    } else if (argc > 3 && argv[1][0] == '-' && argv[1][1] != 'n') {
        printf("Invalid Option\n");
        print_help(argv[0]);
    }
    fd = open(argv[devno],O_RDWR);
    if(fd == -1) {
        perror("open DEVICE");
        exit(1);
    }
    /* Join the remaining arguments into one space-separated command string. */
    mem_len = 0;
    for (i = commandno; i < argc; i++) {
        mem_len += strlen(argv[i]) + 2;
        if (i > commandno) {
            cmd = (char *)realloc((void *)cmd, mem_len);
        } else { // i == commandno
            cmd = (char *)malloc(mem_len);
            if (cmd != NULL)
                cmd[0] = '\0'; /* strcat needs an initialized, empty string */
        }
        if (cmd == NULL) {
            perror("malloc");
            exit(1);
        }
        strcat(cmd, argv[i]);
        strcat(cmd, " ");
    }
    if (newline == 0)
        usleep(225000);
    /* TIOCSTI pushes one byte at a time into the terminal's input queue. */
    for (i = 0; cmd[i]; i++)
        ioctl (fd, TIOCSTI, cmd+i);
    if (newline == 1)
        ioctl (fd, TIOCSTI, nl);
    close(fd);
    free((void *)cmd);
    exit (0);
}
Compile it and run it with sudo. For example, to execute a command on /dev/pts/3, run sudo ./a.out -n /dev/pts/3 whoami, which runs whoami on /dev/pts/3.
This code was completely taken from this page.
You seem to use the wrong quotes around the command.
Either remove the quotes and the echo command, or use echo and back-ticks (`).
Try:
echo `date` > /dev/pts/3
or just
date > /dev/pts/3
Note that whatever runs on /dev/pts/3 wouldn't be able to read what pops up "from behind".

Write log in stdout (MPI)

I am using MPI on Windows with Cygwin. I am trying to use a critical section so that the processes write their logs one at a time, but whatever I do, I always get interleaved output.
setbuf(stdout, 0);
int totalProcess;
MPI_Comm_size(MPI_COMM_WORLD, &totalProcess);
int processRank;
MPI_Comm_rank(MPI_COMM_WORLD, &processRank);
int rank = 0;
while (rank < totalProcess) {
    if (processRank == rank) {
        printf("-----%d-----\n", rank);
        printf("%s", logBuffer);
        printf("-----%d-----\n", rank);
        //fflush(stdout);
    }
    rank ++;
    MPI_Barrier(MPI_COMM_WORLD);
}
I run MPI on a single machine (emulation mode):
mpirun -v -np 2 ./bin/main.out
I want a separate, contiguous block of log output per process. What am I doing wrong?
(When I wrote it, I suspected it would not work correctly...)
This is the same problem asked about here; there is enough buffering going on at various different layers that there's no guarantee that the final output will reflect the order that the individual processes wrote, although in practice it can work for "small enough" outputs.
But if the goal is something like a logfile, MPI-IO provides a mechanism for writing to a file in exactly this way: MPI_File_write_ordered, which writes each process's output to the file in rank order. As an example:
#include <string.h>
#include <stdio.h>
#include "mpi.h"
int main(int argc, char** argv)
{
    int rank, size;
    MPI_File logfile;
    char mylogbuffer[1024];
    char line[128];
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_File_open(MPI_COMM_WORLD, "logfile.txt", MPI_MODE_WRONLY | MPI_MODE_CREATE,
                  MPI_INFO_NULL, &logfile);
    /* write initial message */
    sprintf(mylogbuffer,"-----%d-----\n", rank);
    sprintf(line,"Message from proc %d\n", rank);
    for (int i=0; i<rank; i++)
        strcat(mylogbuffer, line);
    sprintf(line,"-----%d-----\n", rank);
    strcat(mylogbuffer, line);
    MPI_File_write_ordered(logfile, mylogbuffer, strlen(mylogbuffer), MPI_CHAR, MPI_STATUS_IGNORE);
    /* write another message */
    sprintf(mylogbuffer,"-----%d-----\nAll done\n-----%d-----\n", rank, rank);
    MPI_File_write_ordered(logfile, mylogbuffer, strlen(mylogbuffer), MPI_CHAR, MPI_STATUS_IGNORE);
    MPI_File_close(&logfile);
    MPI_Finalize();
    return 0;
}
Compiling and running gives:
$ mpicc -o log log.c -std=c99
$ mpirun -np 5 ./log
$ cat logfile.txt
-----0-----
-----0-----
-----1-----
Message from proc 1
-----1-----
-----2-----
Message from proc 2
Message from proc 2
-----2-----
-----3-----
Message from proc 3
Message from proc 3
Message from proc 3
-----3-----
-----4-----
Message from proc 4
Message from proc 4
Message from proc 4
Message from proc 4
-----4-----
-----0-----
All done
-----0-----
-----1-----
All done
-----1-----
-----2-----
All done
-----2-----
-----3-----
All done
-----3-----
-----4-----
All done
-----4-----

Detach a linux process from pseudo-tty, but keep the tty running?

I want to debug a console linux application with 2 xterm windows: one window used for gdb and another used for the application (e.g. mc).
What I do now is run 'tty && sleep 1024d' in the second xterm window (this gives me its pseudo-tty name) and then run 'tty ' in gdb to redirect the program to that other xterm window. However, GDB warns that it cannot set a controlling terminal and certain minor functions don't work (e.g. handling window resizing), as 'sleep 1024d' is still running on that xterm window.
Any better way to do it (rather than launching the process from the shell and attaching to it from gdb)?
I have somewhat modified the program given in a related bug to store the filename somewhere
http://sourceware.org/bugzilla/show_bug.cgi?id=11403
Here is an example using it:
$ xterm -e './disowntty ~/tty.tmp' & sleep 1 && gdb --tty $(cat ~/tty.tmp) /usr/bin/links
/* tty;exec disowntty */
#include <sys/ioctl.h>
#include <unistd.h>
#include <stdio.h>
#include <limits.h>
#include <stdlib.h>
#include <signal.h>
static void
end (const char *msg)
{
  perror (msg);
  for (;;)
    pause ();
}
int
main (int argc, const char *argv[])
{
  FILE *tty_name_file;
  const char *tty_filename;
  if (argc <= 1)
    return 1;
  else
    tty_filename = argv[1];
  void (*orig) (int signo);
  setbuf (stdout, NULL);
  orig = signal (SIGHUP, SIG_IGN);
  if (orig != SIG_DFL)
    end ("signal (SIGHUP)");
  /* Verify we are the sole owner of the tty. */
  if (ioctl (STDIN_FILENO, TIOCSCTTY, 0) != 0)
    end ("TIOCSCTTY");
  printf("%s %s\n", tty_filename, ttyname(STDIN_FILENO));
  tty_name_file = fopen(tty_filename, "w");
  fprintf(tty_name_file, "%s\n", ttyname(STDIN_FILENO));
  fclose(tty_name_file);
  /* Disown the tty. */
  if (ioctl (STDIN_FILENO, TIOCNOTTY) != 0)
    end ("TIOCNOTTY");
  end ("OK, disowned");
  return 1;
}
