OpenACC(PGI) in OCT file


KUSHIDA Noriyuki

Dear All,

 

I am Noriyuki Kushida.

If anyone has experience using OpenACC with the PGI compiler in an oct-file, please help me.

 

What I am trying to do is basically the same as the example at the following URL, but inside an oct-file:

https://devtalk.nvidia.com/default/topic/523203/openacc-toolkit/fft-using-openacc/

 

The above example works, but when I run it from Octave I get the following error:

=== Error Message ===

Failing in Thread:1

call to cuMemcpyDtoHAsync returned error 700: Illegal address during kernel execution

 

Failing in Thread:1

call to cuMemFreeHost returned error 700: Illegal address during kernel execution

====================

 

 

I have talked with PGI support, and they acknowledged behaviour that looks like a compiler issue.

(If I understand correctly, the table that maps host pointers to device pointers appears to be incorrect.)

However, because the issue appears only with Octave, they were not sure whether they can fix it.

So, if anyone knows how to use OpenACC in an oct-file, please help me.

 

GCC 8 worked when I switched “Complex” to “double”, but the code base is huge and converting every Complex variable to double is not practical.

With Complex, GCC 8 cannot compile the code.

So, I would like to stick with the PGI compiler in this context.

(Again, this suggests to me that the problem is in the compiler and that nothing can be done on the Octave side.)
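
For reference, the Complex-to-double switch mentioned above does not necessarily require changing every declaration: the C++ standard guarantees that an array of std::complex<double> can be accessed as interleaved doubles (re0, im0, re1, im1, ...). The following is only a minimal sketch under that assumption. The helper names as_doubles and scale_in_place are hypothetical, and whether GCC 8 accepts such a double view in an OpenACC data clause for the real code has not been tested.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Octave's Complex is a typedef for std::complex<double>; redeclared here
// so that the sketch compiles without the Octave headers.
using Complex = std::complex<double>;

// Hypothetical helper: view a Complex buffer as interleaved doubles.
// The layout re0, im0, re1, im1, ... is guaranteed for std::complex<T>.
inline double *as_doubles(Complex *p)
{
  return reinterpret_cast<double *>(p);
}

// Hypothetical helper: scale n complex values in place via the double view.
// In an OpenACC build the loop could carry a directive such as
//   #pragma acc parallel loop copy(d[0:2*n])
// which is an untested assumption, not something confirmed with GCC 8 or PGI.
void scale_in_place(Complex *data, std::size_t n, double factor)
{
  double *d = as_doubles(data);
  for (std::size_t i = 0; i < 2 * n; ++i)
    d[i] *= factor;
}
```

With this approach the surrounding code keeps its Complex variables, and only the pointers handed to the accelerated region are viewed as double, with the element count doubled.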

 

Environment:

OS: Ubuntu 18.04

Octave: 4.2.2 installed with apt

PGI compiler: 19.5

Machine: dgxstation

 

I am attaching the sample code just in case.

Please copy the files and run the following commands:

% sh compile.sh

% octave callOct.m

 

If any information is missing, please let me know.

 

Best regards,

Dr Noriyuki Kushida

 

 

=== testFFTGPU.cc ===
#include <math.h>
#include <octave/oct.h>
#include <octave/parse.h>
#include <complex>
#include <chrono>
#include <fftw3.h>
#include "openacc.h"
#include "cufft.h"

void inv_CUFFT(Complex *in_data, Complex *out_data, int nc, int nr, void *stream)
{
    cufftHandle plan;
    cufftResult ResPlan = cufftPlan2d(&plan, nc,nr, CUFFT_Z2Z);
    cufftSetStream(plan, (cudaStream_t)stream);
    cufftResult ResExec = cufftExecZ2Z(plan,
                                      (cufftDoubleComplex*)in_data,
                                      (cufftDoubleComplex*)out_data,
                                      CUFFT_INVERSE);
    cufftDestroy(plan);
}


DEFUN_DLD(testFFTGPU, args, ,
            "main body;")
{
   ComplexMatrix Matrix(args(0).complex_matrix_value());
   octave_value_list retval;

   ComplexMatrix out(Matrix.dims());

   Complex *pmat = reinterpret_cast<Complex *> (const_cast<Complex *>(Matrix.fortran_vec()));
   Complex *pout = reinterpret_cast<Complex *> (const_cast<Complex *>(out.fortran_vec()));

   static dim_vector dv = Matrix.dims();

   int Nc = dv(0);
   int Nr = dv(1);

#pragma acc data copy(pmat[0:Nc*Nr],pout[0:Nc*Nr])
{
   void *stream = acc_get_cuda_stream(acc_async_sync);
#pragma acc host_data use_device(pmat,pout)
{
   inv_CUFFT(pmat,pout,Nc,Nr,stream);
}
}

   retval(0) = out;
   return retval;
}
=== end testFFTGPU.cc ===

 

=== callOct.m ===
mat = rand(100,100);
mat1 = mat;
mat2 = mat;
fftGPU = testFFTGPU(mat2);

ifft_mat1 = ifft2(mat1);
fftGPU - ifft_mat1
=== end callOct.m ===

=== compile.sh ===
#!/bin/bash
env CXX="pgc++ -noswitcherror " \
CXXFLAGS="-fast -acc -ta=tesla:cc70,cuda9.2 -Minfo=accel -Mcudalib=cufft" \
DL_LD="pgc++ -noswitcherror" \
mkoctfile testFFTGPU.cc
=== end compile.sh ===