Segmentation fault in MPI_Info_create and MPI_Finalize (C++)

Programs in C++. Developer forum

Post by Anonymous

For the following source code:

#include <mpi.h>
#include <cstdlib>
#include <iostream>

using namespace std;

int main(int argc, char **argv){

    int allocresult, infocreateresult;
    int tabsize = atoi(*(argv + 1));

    int *myrank = new int(0);
    int *ranks = new int(0);

    MPI_Init(&argc,&argv);
    MPI_Comm_rank(MPI_COMM_WORLD, myrank);
    MPI_Comm_size(MPI_COMM_WORLD, ranks);

    MPI_Info *info1;
    infocreateresult = MPI_Info_create(info1);
    double **tab1init;
    double *tab1;
    MPI_Aint size1;

    if(!*myrank){
        // initialization block
        size1 = tabsize * sizeof(double);
    } else {
        int workchunk = tabsize;
        workchunk /= (*ranks - 1);

        if(*myrank == *ranks - 1){
            workchunk += tabsize % (*ranks - 1);
        }
        size1 = workchunk * sizeof(double);
    }

    allocresult = MPI_Alloc_mem(size1, *info1, tab1init);

    tab1 = *tab1init;

    // final block
    MPI_Info_free(info1);
    MPI_Free_mem(tab1);
    MPI_Finalize();

    return 0;
}
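The size computation in this code gives rank 0 the whole table and splits it among the remaining `*ranks - 1` ranks, with the last rank also taking `tabsize % (*ranks - 1)`. A minimal sketch of that arithmetic as standalone functions makes the per-rank sizes easy to check without MPI. The names chunk_elements and chunk_bytes are hypothetical (not from the original post), and the sketch assumes at least two ranks whenever a worker rank is queried, because the original divides by `*ranks - 1`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the original post) mirroring the size
// computation in the reproducer: rank 0 holds the whole table, ranks
// 1..nranks-1 share it, and the last rank also takes the remainder.
int chunk_elements(int tabsize, int rank, int nranks) {
    if (rank == 0) {
        return tabsize;                     // "initialization block": full table
    }
    assert(nranks >= 2 && rank < nranks);   // worker ranks divide by nranks - 1
    const int workers = nranks - 1;
    int workchunk = tabsize / workers;
    if (rank == nranks - 1) {
        workchunk += tabsize % workers;     // last rank absorbs the remainder
    }
    return workchunk;
}

// Byte count the reproducer passes to MPI_Alloc_mem as size1.
std::size_t chunk_bytes(int tabsize, int rank, int nranks) {
    return static_cast<std::size_t>(chunk_elements(tabsize, rank, nranks)) * sizeof(double);
}
```

For `mpirun -n 3 a.out 210` this yields 210 elements on rank 0 and 105 on each of ranks 1 and 2; for `a.out 200`, ranks 1 and 2 get 100 each.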
I got different results depending on which OpenMPI version I used:

$ mpirun --version
mpirun (Open MPI) 5.0.8
$ mpiCC --version
g++ (GCC) 15.2.1 20251211 (Red Hat 15.2.1-5)

$ mpiCC test99.cpp
$ mpirun -n 3 a.out 210
[grad:78087:0:78087] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7f707ae61978)
[grad:78085:0:78085] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7f83ecc61978)
[grad:78086:0:78086] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7f7382461978)
==== backtrace (tid:  78087) ====
0  /lib64/libucs.so.0(ucs_handle_error+0x2e4) [0x7f707a1b2df4]
1  /lib64/libucs.so.0(+0x17aed) [0x7f707a1b4aed]
2  /lib64/libucs.so.0(+0x17cbd) [0x7f707a1b4cbd]
3  /lib64/libc.so.6(+0x1a070) [0x7f707aa28070]
4  /usr/lib64/openmpi/lib/libmpi.so.40(PMPI_Info_create+0x26) [0x7f707b07d5c6]
5  a.out() [0x4005c0]
6  /lib64/libc.so.6(+0x3575) [0x7f707aa11575]
7  /lib64/libc.so.6(__libc_start_main+0x88) [0x7f707aa11628]
8  a.out() [0x400445]
=================================
==== backtrace (tid:  78085) ====
0  /lib64/libucs.so.0(ucs_handle_error+0x2e4) [0x7f83eccb3df4]
1  /lib64/libucs.so.0(+0x17aed) [0x7f83eccb5aed]
2  /lib64/libucs.so.0(+0x17cbd) [0x7f83eccb5cbd]
3  /lib64/libc.so.6(+0x1a070) [0x7f83ec828070]
4  /usr/lib64/openmpi/lib/libmpi.so.40(PMPI_Info_create+0x26) [0x7f83ece7d5c6]
5  a.out() [0x4005c0]
6  /lib64/libc.so.6(+0x3575) [0x7f83ec811575]
7  /lib64/libc.so.6(__libc_start_main+0x88) [0x7f83ec811628]
8  a.out() [0x400445]
=================================
==== backtrace (tid:  78086) ====
0  /lib64/libucs.so.0(ucs_handle_error+0x2e4) [0x7f7381d8bdf4]
1  /lib64/libucs.so.0(+0x17aed) [0x7f7381d8daed]
2  /lib64/libucs.so.0(+0x17cbd) [0x7f7381d8dcbd]
3  /lib64/libc.so.6(+0x1a070) [0x7f7382028070]
4  /usr/lib64/openmpi/lib/libmpi.so.40(PMPI_Info_create+0x26) [0x7f738267d5c6]
5  a.out() [0x4005c0]
6  /lib64/libc.so.6(+0x3575) [0x7f7382011575]
7  /lib64/libc.so.6(__libc_start_main+0x88) [0x7f7382011628]
8  a.out() [0x400445]
=================================
--------------------------------------------------------------------------
prterun noticed that process rank 0 with PID 78085 on node grad exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------
and for an older OpenMPI version:

$ mpirun --version
mpirun (Open MPI) 4.1.1
$ mpiCC --version
g++ (GCC) 11.5.0 20240719 (Red Hat 11.5.0-11)

$ mpiCC test99.cpp
$ mpirun -n 3 a.out 200
[vmi2927342:31228:0:31228] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
==== backtrace (tid:  31228) ====
0  /lib64/libucs.so.0(ucs_handle_error+0x2e4) [0x7f0bb0b80714]
1  /lib64/libucs.so.0(+0x2a2ac) [0x7f0bb0b822ac]
2  /lib64/libucs.so.0(+0x2a46a) [0x7f0bb0b8246a]
3  /lib64/libc.so.6(__cxa_finalize+0x60) [0x7f0bba241870]
4  /usr/lib64/openmpi/lib/openmpi/mca_pml_ucx.so(+0x3987) [0x7f0bb808b987]
=================================
[vmi2927342:31228] *** Process received signal ***
[vmi2927342:31228] Signal: Segmentation fault (11)
[vmi2927342:31228] Signal code:  (-6)
[vmi2927342:31228] Failing at address: 0x3e8000079fc
[vmi2927342:31228] [ 0] /lib64/libc.so.6(+0x3fc30)[0x7f0bba23fc30]
[vmi2927342:31228] [ 1] /lib64/libc.so.6(__cxa_finalize+0x60)[0x7f0bba241870]
[vmi2927342:31228] [ 2] /usr/lib64/openmpi/lib/openmpi/mca_pml_ucx.so(+0x3987)[0x7f0bb808b987]
[vmi2927342:31228] *** End of error message ***
[vmi2927342:31227:0:31227] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
==== backtrace (tid:  31227) ====
0  /lib64/libucs.so.0(ucs_handle_error+0x2e4) [0x7f89601ef714]
1  /lib64/libucs.so.0(+0x2a2ac) [0x7f89601f12ac]
2  /lib64/libucs.so.0(+0x2a46a) [0x7f89601f146a]
3  /lib64/libc.so.6(__cxa_finalize+0x60) [0x7f8964441870]
4  /usr/lib64/openmpi/lib/openmpi/mca_pml_ucx.so(+0x3987) [0x7f896216b987]
=================================
[vmi2927342:31227] *** Process received signal ***
[vmi2927342:31227] Signal: Segmentation fault (11)
[vmi2927342:31227] Signal code:  (-6)
[vmi2927342:31227] Failing at address:  0x3e8000079fb
[vmi2927342:31227] [ 0] /lib64/libc.so.6(+0x3fc30)[0x7f896443fc30]
[vmi2927342:31227] [ 1] /lib64/libc.so.6(__cxa_finalize+0x60)[0x7f8964441870]
[vmi2927342:31227] [ 2] /usr/lib64/openmpi/lib/openmpi/mca_pml_ucx.so(+0x3987)[0x7f896216b987]
[vmi2927342:31227] *** End of error message ***
[vmi2927342:31226:0:31226] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
==== backtrace (tid:  31226) ====
0  /lib64/libucs.so.0(ucs_handle_error+0x2e4) [0x7fdb3c04c714]
1  /lib64/libucs.so.0(+0x2a2ac) [0x7fdb3c04e2ac]
2  /lib64/libucs.so.0(+0x2a46a) [0x7fdb3c04e46a]
3  /lib64/libc.so.6(__cxa_finalize+0x60) [0x7fdb44c41870]
4  /usr/lib64/openmpi/lib/openmpi/mca_pml_ucx.so(+0x3987) [0x7fdb3f69c987]
=================================
[vmi2927342:31226] *** Process received signal ***
[vmi2927342:31226] Signal: Segmentation fault (11)
[vmi2927342:31226] Signal code:  (-6)
[vmi2927342:31226] Failing at address: 0x3e8000079fa
[vmi2927342:31226] [ 0] /lib64/libc.so.6(+0x3fc30)[0x7fdb44c3fc30]
[vmi2927342:31226] [ 1] /lib64/libc.so.6(__cxa_finalize+0x60)[0x7fdb44c41870]
[vmi2927342:31226] [ 2] /usr/lib64/openmpi/lib/openmpi/mca_pml_ucx.so(+0x3987)[0x7fdb3f69c987]
[vmi2927342:31226] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node vmi2927342 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
  • OpenMPI 5 crashes right at the start, inside MPI_Info_create. This does
    not happen with version 4.
  • OpenMPI version 4 runs everything through to MPI_Finalize at the end,
    where all ranks segfault at address (nil).
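For reference on the crash inside PMPI_Info_create: in the MPI C binding `int MPI_Info_create(MPI_Info *info)`, `info` is an output argument. The library stores a newly created handle at the address it receives, and `MPI_Info_free(MPI_Info *info)` later resets that variable to MPI_INFO_NULL. The code above declares `MPI_Info *info1;` without initializing it, so that store goes to an indeterminate address, which is a plausible (not verified here) source of the "invalid permissions for mapped object" fault. The sketch below only imitates this calling convention with a hypothetical Handle type and hypothetical handle_create/handle_free functions, so it compiles without MPI; with the real API the equivalent pattern is `MPI_Info info1; MPI_Info_create(&info1); ... MPI_Info_free(&info1);`.

```cpp
#include <cassert>

// Hypothetical stand-in for an opaque handle such as MPI_Info.
// It is NOT an MPI type; it only mimics the OUT-argument convention.
struct HandleObject {
    int entries = 0;   // a freshly created object has no entries
};
using Handle = HandleObject *;

// Same shape as `int MPI_Info_create(MPI_Info *info)`: the new handle is
// stored into the variable whose address the caller passes in.
int handle_create(Handle *out) {
    *out = new HandleObject{};   // write through the caller's pointer
    return 0;                    // 0 plays the role of MPI_SUCCESS here
}

// Same shape as `int MPI_Info_free(MPI_Info *info)`: releases the object
// and resets the caller's variable (MPI resets it to MPI_INFO_NULL).
int handle_free(Handle *h) {
    assert(*h != nullptr && "freeing a null handle is an error");
    delete *h;
    *h = nullptr;
    return 0;
}

// Correct pattern: the caller owns a Handle variable and passes its address.
// Declaring `Handle *p;` and passing `p` uninitialized would instead make
// handle_create write to an indeterminate address (undefined behavior).
bool create_and_free_roundtrip() {
    Handle info = nullptr;
    if (handle_create(&info) != 0) return false;
    const bool created = (info != nullptr && info->entries == 0);
    handle_free(&info);
    return created && info == nullptr;
}
```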
I can't find any error in the source code that could cause these problems.
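MPI_Alloc_mem also returns its result through an output argument: in `int MPI_Alloc_mem(MPI_Aint size, MPI_Info info, void *baseptr)`, `baseptr` is declared as `void *` but must be the address of a pointer variable, into which the library writes the start of the allocated segment, while `MPI_Free_mem(void *base)` takes that pointer itself. In the code above, tab1init is an uninitialized `double **` and `*info1` dereferences the uninitialized info1 again. Undefined behavior of this kind could also surface later, for example during shutdown as in the OpenMPI 4 run, but that is an assumption the logs do not prove. The sketch below imitates only the `baseptr` convention with hypothetical alloc_into/free_from functions backed by std::malloc; with MPI the corresponding pattern would be `double *tab1; MPI_Alloc_mem(size1, MPI_INFO_NULL, &tab1); ... MPI_Free_mem(tab1);`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in with the same shape as
// `int MPI_Alloc_mem(MPI_Aint size, MPI_Info info, void *baseptr)`
// (the info argument is omitted). `baseptr` is declared void *, but it
// must be the address of a pointer variable owned by the caller.
int alloc_into(std::size_t bytes, void *baseptr) {
    assert(baseptr != nullptr);
    void *mem = std::malloc(bytes);
    if (mem == nullptr) return 1;              // nonzero signals failure
    std::memcpy(baseptr, &mem, sizeof mem);    // copy the address into the caller's pointer
    return 0;
}

// Same shape as `int MPI_Free_mem(void *base)`: takes the pointer itself,
// not its address.
int free_from(void *base) {
    std::free(base);
    return 0;
}

// Correct pattern: declare `double *buf` and pass `&buf`.
double fill_and_sum(std::size_t n) {
    double *buf = nullptr;
    if (alloc_into(n * sizeof(double), &buf) != 0) return -1.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<double>(i);   // write through the returned buffer
        sum += buf[i];
    }
    free_from(buf);
    return sum;
}
```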
