I am Noriyuki Kushida.
If anyone has experience using OpenACC with the PGI compiler in an oct-file, please help me.
What I am trying to do is basically the same as the example at the following URL, but in an oct-file:
The above example works, but with Octave I get the following error:
=== Error Message ===
Failing in Thread:1
call to cuMemcpyDtoHAsync returned error 700: Illegal address during kernel execution
call to cuMemFreeHost returned error 700: Illegal address during kernel execution
I have talked with PGI support, and they acknowledged a behaviour that looks like a compiler issue.
(If I understand correctly, the table that maps host pointers to device pointers appears to be incorrect.)
However, because this issue only appears with Octave, they were not sure whether they can fix it.
So, if anyone knows how to use OpenACC in an oct-file, please help me.
GCC 8 worked when I switched “Complex” to “double”, but the entire code is huge, and converting every Complex variable to double is not practical.
With Complex, GCC 8 cannot compile the code.
So I would like to stick with the PGI compiler here.
(Again, this suggests to me that the problem is in the compiler and that there is nothing to be done on the Octave side.)
OS: Ubuntu 18.04
Octave: 4.2.2 installed with apt
PGI compiler: 19.5
I am attaching the sample code just in case.
Please copy the files and type the following commands:
% sh compile.sh
% octave callOct.m
If there is missing information, please let me know.
Dr Noriyuki Kushida
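=== testFFTGPU.cc ===
#include <octave/oct.h>
#include <openacc.h>
#include <cufft.h>
void inv_CUFFT(Complex *in_data, Complex *out_data, int nc, int nr, void *stream) {
  cufftHandle plan;
  // Octave is column-major, so the contiguous dimension (nc rows) is passed last.
  cufftResult ResPlan = cufftPlan2d(&plan, nr, nc, CUFFT_Z2Z);
  cufftSetStream(plan, (cudaStream_t) stream);  // use the OpenACC queue's stream
  cufftResult ResExec = cufftExecZ2Z(plan, (cufftDoubleComplex *) in_data,
                                     (cufftDoubleComplex *) out_data, CUFFT_INVERSE);
  cufftDestroy(plan);
}
DEFUN_DLD(testFFTGPU, args, , "testFFTGPU (X): inverse 2-D FFT of X via cuFFT") {
  ComplexMatrix Matrix = args(0).complex_matrix_value();
  ComplexMatrix out(Matrix.dims());
  Complex *pmat = Matrix.fortran_vec();
  Complex *pout = out.fortran_vec();
  dim_vector dv = Matrix.dims();  // not static: the size may change between calls
  int Nc = dv(0);
  int Nr = dv(1);
#pragma acc data copy(pmat[0:Nc*Nr],pout[0:Nc*Nr])
  {
    void *stream = acc_get_cuda_stream(acc_async_sync);
#pragma acc host_data use_device(pmat,pout)
    inv_CUFFT(pmat, pout, Nc, Nr, stream);  // pmat/pout are device addresses here
  }
  return octave_value(out);
}
=== end testFFTGPU.cc ===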
=== callOct.m ===
mat = rand(100,100);
mat1 = mat;
mat2 = mat;
fftGPU = testFFTGPU(mat2);
ifft_mat1 = ifft2(mat1);
fftGPU - ifft_mat1
=== end callOct.m ===
=== compile.sh ===
env CXX="pgc++ -noswitcherror" \
    CXXFLAGS="-fast -acc -ta=tesla:cc70,cuda9.2 -Minfo=accel -Mcudalib=cufft" \
    DL_LD="pgc++ -noswitcherror" \
    mkoctfile testFFTGPU.cc
=== end compile.sh ===