I have recently started developing a one-stop library for high-performance scientific computing. It would provide support for various parallel computing frameworks: OpenMP for CPUs, CUDA for NVIDIA GPUs, and OpenCL for AMD platforms (OpenCL can be used on other platforms as well). I also plan to add cluster support using Open MPI.
Scientific computing packages like NumPy, which are based on BLAS, do quite well when the data is small, but the power of multithreading really shows when the data gets large. I took a [5000, 10000] array and multiplied it by a scalar, 2.0. NumPy took about 90 seconds on average, whereas a parallel version using OpenMP did it in 3 seconds. That is when I decided to build such a library.
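To make the comparison concrete, here is a minimal sketch of the kind of OpenMP loop I mean. It is not the library's actual code: the function name `scale_in_place` is hypothetical, and it assumes the [5000, 10000] array is stored as one flat, contiguous buffer of doubles.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (name chosen for illustration only): multiplies every
// element of a flat buffer by a scalar, splitting the loop across threads.
void scale_in_place(std::vector<double>& data, double factor) {
    // A signed loop index keeps the loop valid for older OpenMP versions.
    const long long n = static_cast<long long>(data.size());

    // Each thread gets its own chunk of the index range. The iterations are
    // independent, so no synchronization is needed inside the loop.
    #pragma omp parallel for
    for (long long i = 0; i < n; ++i) {
        data[static_cast<std::size_t>(i)] *= factor;
    }
}
```

With GCC or Clang this needs the `-fopenmp` flag. Without it the pragma is ignored and the loop simply runs on one thread, so the result is the same either way. For the array above, the call would look like `std::vector<double> a(5000UL * 10000UL, 1.0); scale_in_place(a, 2.0);`.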
But to manage such a large project I need guidance and support, and I will also need help, so I am reaching out to the connoisseurs of scientific computing.
I am thinking of building the whole library in C++ and providing language bindings. Please let me know your thoughts and feedback.