OpenMPI runtime error: Hello World

I'm able to successfully compile my code when I execute the make command. However, when I run the code as:
mpirun -np 4 test
The error generated is:
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[63067,1],2]
Exit code: 1
--------------------------------------------------------------------------
I don't have multiple MPI installations, so I don't expect a conflict there.
I've been having trouble with my Hello World Open MPI program. My main file is:
#include <iostream>
#include "mpi.h"
using namespace std;
int main(int argc, char* argv[]) {
    MPI_Init(&argc, &argv);
    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    // One line per rank, with spacing and a trailing newline
    cout << "The number of spawned processes is " << size << " and this is process " << rank << endl;
    MPI_Finalize();
    return 0;
}
My makefile is:
# Compiler
CXX = mpic++
# Compiler flags
CFLAGS = -Wall -lm
# Header and Library Paths
INCLUDE = -I/usr/local/include -I/usr/local/lib -I..
LIBRARY_INCLUDE = -L/usr/local/lib
LIBRARIES = -l mpi
# the build target executable
TARGET = test
all: $(TARGET)
$(TARGET): main.cpp
	$(CXX) $(CFLAGS) -o $(TARGET) main.cpp $(INCLUDE) $(LIBRARY_INCLUDE) $(LIBRARIES)
clean:
	rm $(TARGET)
The output of mpic++ --version is:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.1.0 (clang-802.0.42)
Target: x86_64-apple-darwin16.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
And the output of mpirun --version is:
mpirun (Open MPI) 2.1.1
Report bugs to http://www.open-mpi.org/community/help/
What could be causing the issue?

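This is now resolved. It turns out that I have to run it as
mpirun -np 4 ./test
Without the ./ prefix, mpirun most likely looked the name up on PATH and launched the system test utility (for example /bin/test), which exits with status 1 when given no arguments. That matches the "Exit code: 1" in the error above.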
Ref: users-request@lists.open-mpi.org
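The lookup rule behind this can be illustrated with a small C++ sketch. resolve_program is a hypothetical helper written for this example, not part of Open MPI: it follows simplified execvp-style rules, where a program name containing a slash is used as given and any other name is searched for in each PATH directory in turn. Under that rule, test picks up the system utility (unless the current directory appears earlier on PATH), while ./test never consults PATH at all.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unistd.h>

// Simplified execvp-style resolution (illustration only):
// names containing '/' are taken literally, others are searched on PATH.
std::string resolve_program(const std::string& name, const std::string& path_var) {
    if (name.find('/') != std::string::npos)
        return name;                          // e.g. "./test" is used as-is
    std::istringstream dirs(path_var);
    std::string dir;
    while (std::getline(dirs, dir, ':')) {
        if (dir.empty()) dir = ".";           // an empty entry (e.g. "::") means the current directory
        std::string candidate = dir + "/" + name;
        if (access(candidate.c_str(), X_OK) == 0)
            return candidate;                 // first executable match wins
    }
    return "";                                // not found on PATH
}
```

With a typical PATH, resolve_program("test", ...) returns something like /bin/test or /usr/bin/test, which is why the freshly built binary was never started.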

Related

Application using LTTng: compile errors with aarch64-xilinx-linux-g++

I am trying to port LTTng to a Xilinx MPSoC running Linux. I have written a demo identical to the LTTng "Record user application events" tutorial, and it runs perfectly on Ubuntu:
g++ -c -I. hello-tp.c
g++ -c hello.c
g++ -o hello hello-tp.o hello.o -llttng-ust -ldl
but when I compile it for the ARM Linux platform, I get errors:
aarch64-xilinx-linux-g++ -mcpu=cortex-a72.cortex-a53 -march=armv8-a+crc -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security -Werror=format-security --sysroot=/home/david/project/zcu102/images/linux/sdk/sysroots/cortexa72-cortexa53-xilinx-linux -O2 -pipe -g -feliminate-unused-debug-types -c -I. hello-tp.c
In file included from hello-tp.c:4:
hello-tp.h:16:27: error: expected constructor, destructor, or type conversion before ‘(’ token
16 | LTTNG_UST_TRACEPOINT_EVENT(hello_world, my_first_tracepoint, LTTNG_ARGS, LTTNG_FIELDS)
| ^
make: *** [Makefile:14: hello-tp.o] Error 1
Here is the code:
hello-tp.h:
#undef LTTNG_UST_TRACEPOINT_PROVIDER
#define LTTNG_UST_TRACEPOINT_PROVIDER hello_world
#undef LTTNG_UST_TRACEPOINT_INCLUDE
#define LTTNG_UST_TRACEPOINT_INCLUDE "./hello-tp.h"
#if !defined(_HELLO_TP_H) || defined(LTTNG_UST_TRACEPOINT_HEADER_MULTI_READ)
#define _HELLO_TP_H
#include <lttng/tracepoint.h>
#define LTTNG_ARGS LTTNG_UST_TP_ARGS(int, my_integer_arg, char *, my_string_arg)
#define LTTNG_FIELDS LTTNG_UST_TP_FIELDS(lttng_ust_field_string(my_string_field, my_string_arg) lttng_ust_field_integer(int, my_integer_field, my_integer_arg))
LTTNG_UST_TRACEPOINT_EVENT(hello_world, my_first_tracepoint, LTTNG_ARGS, LTTNG_FIELDS)
#endif /* _HELLO_TP_H */
#include <lttng/tracepoint-event.h>
hello-tp.c
#define LTTNG_UST_TRACEPOINT_CREATE_PROBES
#define LTTNG_UST_TRACEPOINT_DEFINE
#include "hello-tp.h"
hello.c
#include <stdio.h>
#include "hello-tp.h"
int main(int argc, char *argv[])
{
unsigned int i;
puts("Hello, World!\nPress Enter to continue...");
/*
* The following getchar() call only exists for the purpose of this
* demonstration, to pause the application in order for you to have
* time to list its tracepoints. You don't need it otherwise.
*/
getchar();
/*
* An lttng_ust_tracepoint() call.
*
* Arguments, as defined in `hello-tp.h`:
*
* 1. Tracepoint provider name (required)
* 2. Tracepoint name (required)
* 3. `my_integer_arg` (first user-defined argument)
* 4. `my_string_arg` (second user-defined argument)
*
* Notice the tracepoint provider and tracepoint names are
* C identifiers, NOT strings: they're in fact parts of variables
* that the macros in `hello-tp.h` create.
*/
lttng_ust_tracepoint(hello_world, my_first_tracepoint, 23,
"hi there!");
for (i = 0; i < argc; i++) {
lttng_ust_tracepoint(hello_world, my_first_tracepoint,
i, argv[i]);
}
puts("Quitting now!");
lttng_ust_tracepoint(hello_world, my_first_tracepoint,
i * i, "i^2");
return 0;
}
Makefile
APP = hello
# Add any other object files to this list below
APP_OBJS = hello-tp.o hello.o
all: build
build: $(APP)
$(APP): $(APP_OBJS)
	$(CXX) -o $@ $(APP_OBJS) $(LDFLAGS) -llttng-ust -ldl
hello-tp.o : hello-tp.c hello-tp.h
	$(CXX) $(CXXFLAGS) -c -I. $<
hello.o : hello.c
	$(CXX) $(CXXFLAGS) -c $<
clean:
	rm -f $(APP) *.o
Has anyone run into this issue? I suspect the compiler, but I haven't found any clue.
I just ran into this problem. Check your LTTng version. The 2.13 release (current at the time of writing) uses LTTNG_UST_TRACEPOINT_PROVIDER, whereas older releases use TRACEPOINT_PROVIDER. The LTTNG_UST_ prefix was added to names throughout the API. See https://lttng.org/man/3/lttng-ust/v2.13/#doc-_compatibility_with_previous_apis
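For comparison, here is a sketch of the same hello-tp.h written against the pre-2.13 API used by older versions of the LTTng-UST tutorial. It assumes the target sysroot ships lttng-ust headers older than 2.13, so it is not a standalone program. With such headers, LTTNG_UST_TRACEPOINT_EVENT is not a macro, so g++ sees an unknown name followed by ( at file scope, which is consistent with the "expected constructor, destructor, or type conversion before '(' token" error.

```cpp
/* hello-tp.h using the pre-2.13 LTTng-UST names (sketch).
 * In hello-tp.c, define TRACEPOINT_CREATE_PROBES and TRACEPOINT_DEFINE
 * instead of the LTTNG_UST_ variants; in hello.c, call
 * tracepoint(hello_world, my_first_tracepoint, ...) instead of
 * lttng_ust_tracepoint(...). */
#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER hello_world

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "./hello-tp.h"

#if !defined(_HELLO_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define _HELLO_TP_H

#include <lttng/tracepoint.h>

TRACEPOINT_EVENT(
    hello_world,
    my_first_tracepoint,
    TP_ARGS(
        int, my_integer_arg,
        char *, my_string_arg
    ),
    TP_FIELDS(
        ctf_string(my_string_field, my_string_arg)
        ctf_integer(int, my_integer_field, my_integer_arg)
    )
)

#endif /* _HELLO_TP_H */

#include <lttng/tracepoint-event.h>
```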

Compiling issue using Boost.MPI: Fatal error in PMPI_Errhandler_set: invalid communicator

I just installed Boost 1.56.0 and Boost.MPI following this guide. However,
I get an error when running the first example of the Boost.MPI tutorial:
#include <boost/mpi/environment.hpp>
#include <boost/mpi/communicator.hpp>
#include <iostream>
namespace mpi = boost::mpi;
int main(int argc, char* argv[])
{
mpi::environment env(argc, argv);
mpi::communicator world;
std::cout << "I am process " << world.rank() << " of " << world.size()
<< "." << std::endl;
return 0;
}
I successfully compiled the program above on Linux with mpic++ using the command
mpic++ test.cpp -otest_mpi -lboost_mpi -lboost_serialization -lboost_system -lboost_filesystem -lboost_iostreams -lboost_graph_parallel
Then, when running
mpirun -np 4 ./test_mpi
I obtain the following errors:
Fatal error in PMPI_Errhandler_set: invalid communicator, error stack:
PMPI_Errhandler_set(118): MPI_Errhandler_set(comm=0xb7151be0, errh=0xb71822c0) failed
PMPI_Errhandler_set(70): Invalid communicator
I am working on a virtual machine on which both Open MPI and MPICH are installed. I read elsewhere that the problem might be caused by a conflict between the two installations (their compiler wrappers and libraries). I also verified that MPI without Boost works correctly by running a "Hello, World!" program.
Do you have any idea about what is causing the error and how I can solve it?
Thanks in advance,
Pierpaolo

gperftools CPU profiler does not support multiple processes?

According to the documentation at http://gperftools.googlecode.com/svn/trunk/doc/cpuprofile.html, the CPU profiler supports multiple processes and generates an independent output file for each:
If your program forks, the children will also be profiled (since they
inherit the same CPUPROFILE setting). Each process is profiled
separately; to distinguish the child profiles from the parent profile
and from each other, all children will have their process-id appended
to the CPUPROFILE name.
But when I try the following:
// main_cmd_argv.cpp
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <gperftools/profiler.h>
int loop(int n) {
int sum = 0;
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
sum = i + j;
if (sum %3 == 0) {
sum /= 3;
}
}
}
return 0;
}
int main(int argc, char* argv[]) {
printf("%s\n%s\n", getenv("CPUPROFILE"), getenv("CPUPROFILESIGNAL"));
if (argc > 1 && strcmp(argv[1], "-s")==0) {
// single process
loop(100000);
printf("stoped\n");
} else if (argc > 1 && strcmp(argv[1], "-m")==0) {
// multi process
pid_t pid = fork();
if (pid < 0) {
printf("fork error\n");
return -1;
}
if (pid == 0) {
loop(100000);
printf("child stoped\n");
} else if (pid > 0) {
loop(10000);
printf("father stoped\n");
wait(NULL);
}
}
return 0;
}
// makefile
GPerfTools=/home/adenzhang/tools/gperftools
CCFLAGS=-fno-omit-frame-pointer -g -Wall
ALL_BINS=main_cmd_argv
all:$(ALL_BINS)
main_cmd_argv:main_cmd_argv.o
	g++ $(CCFLAGS) -o $@ $^ -L./ -L$(GPerfTools)/lib -Wl,-Bdynamic -lprofiler -lunwind
.cpp.o:
	g++ $(CCFLAGS) -c -I./ -I$(GPerfTools)/include -fPIC -o $@ $<
clean:
	rm -f $(ALL_BINS) *.o *.prof
// shell command
$ make
g++ -fno-omit-frame-pointer -g -Wall -c -I./ -I/home/adenzhang/tools/gperftools/include -fPIC -o main_cmd_argv.o main_cmd_argv.cpp
g++ -fno-omit-frame-pointer -g -Wall -o main_cmd_argv main_cmd_argv.o -L./ -L/home/adenzhang/tools/gperftools/lib -Wl,-Bdynamic -lprofiler -lunwind
$ env CPUPROFILE=main_cmd_argv.prof ./main_cmd_argv -s
main_cmd_argv.prof
(null)
stoped
PROFILE: interrupts/evictions/bytes = 6686/3564/228416
$ /home/adenzhang/tools/gperftools/bin/pprof --text ./main_cmd_argv ./main_cmd_argv.prof
Using local file ./main_cmd_argv.
Using local file ./main_cmd_argv.prof.
Removing killpg from all stack traces.
Total: 6686 samples
6686 100.0% 100.0% 6686 100.0% loop
0 0.0% 100.0% 6686 100.0% __libc_start_main
0 0.0% 100.0% 6686 100.0% _start
0 0.0% 100.0% 6686 100.0% main
$ rm main_cmd_argv.prof
$ env CPUPROFILE=main_cmd_argv.prof ./main_cmd_argv -m
main_cmd_argv.prof
(null)
father stoped
child stoped
PROFILE: interrupts/evictions/bytes = 0/0/64
PROFILE: interrupts/evictions/bytes = 68/36/2624
$ ls
main_cmd_argv main_cmd_argv.cpp main_cmd_argv.o main_cmd_argv.prof Makefile
$ /home/adenzhang/tools/gperftools/bin/pprof --text ./main_cmd_argv ./main_cmd_argv.prof
Using local file ./main_cmd_argv.
Using local file ./main_cmd_argv.prof.
$
It seems that gperftools does not support multiple processes. Could anyone please explain? Thanks!
Quite old, don't know if you found an answer or not, but...
It seems that every thread or forked process should register itself by calling ProfilerRegisterThread().
You can find more information in these two issues: Here and Here.
Also, here is some example code, similar to your test case, in which the forked processes are registered.
I'm currently using gperftools to profile an MPI program and came across this problem. After some googling, I found that ProfilerStart(_YOUR_PROF_FILE_NAME_) and ProfilerStop() ought to be called in every subprocess, and _YOUR_PROF_FILE_NAME_ must be different for each process. Then you can analyze the performance of each process separately.
Link (also asked by ZRJ):
https://groups.google.com/forum/#!topic/google-perftools/bmysZILR4ik
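A minimal C++ sketch of the per-process ProfilerStart()/ProfilerStop() advice above, under some assumptions: USE_GPERFTOOLS is a hypothetical macro (define it and link with -lprofiler to enable profiling), and the naming scheme (base name + "." + pid) is just one choice that mirrors the pid-suffix convention from the documentation. One plausible reason the CPUPROFILE-only run recorded nothing useful for the child is that fork(2) on Linux does not carry over setitimer() timers, which the profiler typically relies on for sampling; starting a session explicitly in the child sidesteps that.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>
#include <unistd.h>

#ifdef USE_GPERFTOOLS  // hypothetical switch: define it and link with -lprofiler
#include <gperftools/profiler.h>
#endif

// Unique profile name per process:
// ("main_cmd_argv.prof", 1234) -> "main_cmd_argv.prof.1234".
std::string per_process_profile_name(const std::string& base, pid_t pid) {
    return base + "." + std::to_string(static_cast<long>(pid));
}

// Run `work` inside this process's own profiler session. Intended to be
// called in the child right after fork() (and in the parent, if wanted).
template <typename Work>
void profile_this_process(const std::string& base, Work work) {
#ifdef USE_GPERFTOOLS
    const std::string name = per_process_profile_name(base, getpid());
    // ProfilerStart returns nonzero if profiling was started successfully.
    const bool started = ProfilerStart(name.c_str()) != 0;
#else
    (void)base;  // profiling compiled out: just run the work
#endif
    work();
#ifdef USE_GPERFTOOLS
    if (started) ProfilerStop();  // flush and close this process's profile
#endif
}
```

In the question's program, the child branch could call profile_this_process("main_cmd_argv.prof", [] { loop(100000); }) (C++11 or later). To keep these explicit calls as the only way profiling starts, it is probably simplest to run without CPUPROFILE in the environment.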

Linking cuda object file

I have one .cu file that contains my cuda kernel, and a wrapper function that calls the kernel. I have a bunch of .c files as well, one of which contains the main function. One of these .c files calls the wrapper function from the .cu to invoke the kernel.
I compile these files as follows:
LIBS=-lcuda -lcudart
LIBDIR=-L/usr/local/cuda/lib64
CFLAGS = -g -c -Wall -Iinclude -Ioflib
NVCCFLAGS =-g -c -Iinclude -Ioflib
CFLAGSEXE =-g -O2 -Wall -Iinclude -Ioflib
CC=gcc
NVCC=nvcc
objects := $(patsubst oflib/%.c,oflib/%.o,$(wildcard oflib/*.c))
table-hash-gpu.o: table-hash.cu table-hash.h
	$(NVCC) $(NVCCFLAGS) table-hash.cu -o table-hash-gpu.o
main: main.c $(objects) table-hash-gpu.o
	$(CC) $(CFLAGSEXE) $(objects) table-hash-gpu.o -o udatapath udatapath.c $(LIBS) $(LIBDIR)
So far everything is fine. table-hash.cu calls a function from one of the .c files. When linking main, I get an error that the function is not present. Can someone please tell me what is going on?
nvcc compiles .cu files as C++ (the host code is passed to the host C++ compiler), which implies C++ name mangling. If you need to call a function compiled with a C compiler from C++, you must tell the C++ compiler that the function has C linkage. I presume that the errors you are seeing are analogous to this:
$ cat cfunc.c
float adder(float a, float b, float c)
{
return a + 2.f*b + 3.f*c;
}
$ cat cumain.cu
#include <cstdio>
float adder(float, float, float);
int main(void)
{
float result = adder(1.f, 2.f, 3.f);
printf("%f\n", result);
return 0;
}
$ gcc -m32 -c cfunc.c
$ nvcc -o app cumain.cu cfunc.o
Undefined symbols:
"adder(float, float, float)", referenced from:
_main in tmpxft_0000b928_00000000-13_cumain.o
ld: symbol(s) not found
collect2: ld returned 1 exit status
Here we have code compiled with nvcc (and therefore by the host C++ compiler) trying to call a C function and getting a link error, because the C++ code expects a mangled name for adder in the supplied object file. If main is changed like this:
$ cat cumain.cu
#include <cstdio>
extern "C" float adder(float, float, float);
int main(void)
{
float result = adder(1.f, 2.f, 3.f);
printf("%f\n", result);
return 0;
}
$ nvcc -o app cumain.cu cfunc.o
$ ./app
14.000000
It works. With the declaration qualified as extern "C", the C++ compiler does not apply C++ name mangling and linkage rules when referencing adder, and the resulting code links correctly.
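A common way to apply this in a mixed C/CUDA project is a single header whose declarations get C linkage only when compiled as C++ (including by nvcc), so the .c and .cu files can share it. The sketch below assumes a hypothetical header named adder.h; the definition is repeated in the same file only so the example compiles on its own.

```cpp
#include <cassert>

/* ---- adder.h (hypothetical header shared by .c and .cu files) ---- */
#ifdef __cplusplus
extern "C" {  // compiled as C++ (g++ or nvcc): use unmangled C linkage
#endif

float adder(float a, float b, float c);

#ifdef __cplusplus
}  // end extern "C"
#endif

/* ---- cfunc.c ----
 * In the real build this definition lives in the C file compiled by gcc.
 * It is repeated here only so the sketch builds as one file; because the
 * declaration above already has C linkage, this definition inherits it. */
float adder(float a, float b, float c)
{
    return a + 2.f * b + 3.f * c;
}
```

The same applies in the other direction: a wrapper function defined in a .cu file and called from C code must also be declared extern "C", otherwise the C caller looks for an unmangled name that the nvcc-produced object does not provide.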

Why I'm not getting "Multiple definition" error from the g++?

I tried to link my executable with two static libraries using g++. Both static libraries define a function with the same name. I expected a "multiple definition" error from the linker, but I did not get one. Can anyone explain why?
staticLibA.h
#ifndef _STATIC_LIBA_HEADER
#define _STATIC_LIBA_HEADER
int hello(void);
#endif
staticLibA.cpp
#include <cstdio>
#include "staticLibA.h"
int hello(void)
{
printf("\nI'm in staticLibA\n");
return 0;
}
output:
g++ -c -Wall -fPIC -m32 -o staticLibA.o staticLibA.cpp
ar -cvq ../libstaticLibA.a staticLibA.o
a - staticLibA.o
staticLibB.h
#ifndef _STATIC_LIBB_HEADER
#define _STATIC_LIBB_HEADER
int hello(void);
#endif
staticLibB.cpp
#include <cstdio>
#include "staticLibB.h"
int hello(void)
{
printf("\nI'm in staticLibB\n");
return 0;
}
output:
g++ -c -Wall -fPIC -m32 -o staticLibB.o staticLibB.cpp
ar -cvq ../libstaticLibB.a staticLibB.o
a - staticLibB.o
main.cpp
extern int hello(void);
int main(void)
{
hello();
return 0;
}
output:
g++ -c -o main.o main.cpp
g++ -o multipleLibsTest main.o -L. -lstaticLibA -lstaticLibB -lstaticLibC -ldl -lpthread -lrt
The linker does not look at staticLibB, because by the time staticLibA is linked, there are no unfulfilled dependencies.
That's an easy one. An object is only pulled out of a library if it defines a referenced symbol that hasn't already been defined. Only one of the hellos is pulled (from A). You'd get errors if you linked with the .o files directly.
When the linker tries to link main.o into multipleLibsTest and sees that hello() is unresolved, it starts searching the libraries in the order given on the command line. It will find the definition of hello() in staticLibA and will terminate the search.
It will not look in staticLibB or staticLibC at all.
If staticLibB.o also defined another symbol that is not in staticLibA and that main needs, staticLibB.o would be pulled into the executable as well, and you would get a multiple definition of hello error: individual .o files are pulled out of the library whole, so two of them would provide hello(). Reversing the order of staticLibA and staticLibB on the link command line would then make that error go away.
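The archive rule described in these answers can be modeled in a few lines of C++. This is a deliberately simplified illustration, not a real linker API: Member, Archive and link_archives() are hypothetical, libraries are processed in a single left-to-right pass (roughly like GNU ld without --start-group/--end-group), and each archive is rescanned until it stops contributing members.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct Member {                         // one .o file inside a .a archive
    std::string name;
    std::set<std::string> defines;      // symbols this object defines
    std::set<std::string> references;   // symbols this object needs
};
using Archive = std::vector<Member>;

struct LinkResult {
    std::vector<std::string> pulled;    // members copied into the executable
    std::set<std::string> duplicates;   // symbols defined more than once
    std::set<std::string> unresolved;   // symbols still undefined at the end
};

LinkResult link_archives(const std::set<std::string>& main_refs,
                         const std::vector<Archive>& archives) {
    LinkResult r;
    std::set<std::string> defined;
    std::set<std::string> undefined = main_refs;
    for (const Archive& ar : archives) {          // command-line order
        std::set<std::string> used;               // members already pulled from this archive
        bool changed = true;
        while (changed) {                         // rescan while members keep being pulled
            changed = false;
            for (const Member& m : ar) {
                if (used.count(m.name)) continue;
                bool needed = false;              // pull only if it resolves something
                for (const std::string& s : m.defines)
                    if (undefined.count(s)) { needed = true; break; }
                if (!needed) continue;
                used.insert(m.name);
                r.pulled.push_back(m.name);
                changed = true;
                for (const std::string& s : m.defines) {  // the whole .o comes along
                    if (!defined.insert(s).second) r.duplicates.insert(s);
                    undefined.erase(s);
                }
                for (const std::string& s : m.references)
                    if (!defined.count(s)) undefined.insert(s);
            }
        }
    }
    r.unresolved = undefined;
    return r;
}
```

Modeling the question, link_archives({"hello"}, {libA, libB}) pulls only staticLibA.o. If staticLibB.o also defines a second symbol that main needs, the same order reports hello as a duplicate, and swapping the two libraries removes the duplicate.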
