Cython error message: Buffer has wrong number of dimensions (expected 1, got 2)

I'm trying to code the least squares estimator in Cython for learning purposes. I got this basic version working:

import cython
import numpy as np
from scipy.linalg import inv
cimport numpy as np


def ols_c(np.ndarray x, np.ndarray y):
  cdef int nrowx = x.shape[0]
  cdef int ncolx = x.shape[1]
  cdef np.ndarray beta = np.zeros([ncolx,1], dtype=float) 
  cdef np.ndarray a1 = np.zeros([ncolx, ncolx], dtype=float)
  cdef np.ndarray a2 = np.zeros([ncolx, nrowx], dtype=float)
  a1 = inv(np.dot(x.T,x))
  a2 = np.dot(a1,x.T)
  beta = np.dot(a2,y)
  return(beta)

which is slightly slower than this NumPy version:

import numpy as np
from scipy.linalg import inv

def ols(x,y):
  a1 = inv(np.dot(x.T,x))
  a2 = np.dot(a1,x.T)
  beta = np.dot(a2,y)
  return(beta)
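
For reference, here is a usage sketch of the NumPy version on a small made-up dataset. It swaps scipy.linalg.inv for numpy.linalg.inv so that only NumPy is needed. That is an assumption of this sketch; for a small, well-conditioned matrix like this one both give the same result. The comments track array shapes because the typed Cython version in this question declares y and beta as 1-D. A 1-D y gives a 1-D beta, while an (n, 1) column y gives a 2-D beta.

```python
import numpy as np
from numpy.linalg import inv  # stand-in for scipy.linalg.inv used in the question


def ols(x, y):
    a1 = inv(np.dot(x.T, x))  # (k, k)
    a2 = np.dot(a1, x.T)      # (k, n)
    beta = np.dot(a2, y)      # (k,) if y is 1-D, (k, 1) if y is an (n, 1) column
    return beta


# Made-up data: an intercept column plus one regressor, with y = 1 + 2 * t exactly.
x = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

beta_1d = ols(x, y)                   # y passed as a 1-D vector
beta_col = ols(x, y.reshape(-1, 1))   # y passed as an (n, 1) column
print(beta_1d)                        # [1. 2.] up to rounding
print(beta_1d.shape, beta_col.shape)  # (2,) (2, 1)
```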

My guess is that this is due to inefficient array indexing. Following tutorials on the internet, I modified the basic Cython version like this:

import cython
import numpy as np
from scipy.linalg import inv
cimport numpy as np
DTYPE = np.float
ctypedef np.float_t DTYPE_t


def ols_c(np.ndarray[DTYPE_t, ndim=2] x, np.ndarray[DTYPE_t, ndim=1] y):
  cdef int nrowx = x.shape[0]
  cdef int ncolx = x.shape[1]
  cdef np.ndarray[DTYPE_t, ndim=1] beta = np.zeros([ncolx,1], dtype=float) 
  cdef np.ndarray[DTYPE_t, ndim=2] a1 = np.zeros([ncolx, ncolx], dtype=float)
  cdef np.ndarray[DTYPE_t, ndim=2] a2 = np.zeros([ncolx, nrowx], dtype=float)
  a1 = inv(np.dot(x.T,x))
  a2 = np.dot(a1,x.T)
  beta = np.dot(a2,y)
  return(beta)

But now it doesn't work; I get the following error message:

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

What causes this error? I also have some other questions:

What do these 2 lines actually do?

DTYPE = np.float
ctypedef np.float_t DTYPE_t

Also, if I understand correctly, writing cdef np.ndarray[DTYPE_t, ndim=2] x = np.zeros([ncol, nrow], dtype=float) creates a two-dimensional array x of floats with ncol columns and nrow rows. But what does [DTYPE_t, ndim=2] actually do? I haven't found any documentation on this.

Thank you in advance for your answers!

EDIT: it looks like it works if I replace DTYPE_t with double and comment out these two lines:

DTYPE = np.float
ctypedef np.float_t DTYPE_t

However, execution is still slow. What can I do to speed things up?


Regarding your speed question, have a look at http://simula.no/research/sc/publications/Simula.SC.578/simula_pdf_file, in particular this passage:

Trying to vectorize the code also resulted in very poor performance, for the same reasons. Vectorization uses slicing, and slices are Python objects not implemented in Cython.

Devectorizing your code will probably speed things up.
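
As a rough sketch of what devectorizing means for this estimator, the version below builds the normal equations X'X beta = X'y with explicit loops instead of np.dot and slicing. In Cython these loops would run over typed buffers (or typed memoryviews) with typed integer indices so that they compile to plain C. In the pure Python shown here they only illustrate the loop structure and will be slow. The last step also swaps the question's explicit inv(...) for np.linalg.solve, which solves the linear system directly. That swap is a suggestion of this sketch, not something the answer proposes, and ols_loops is a hypothetical name.

```python
import numpy as np


def ols_loops(x, y):
    """Devectorized OLS sketch.

    x: 2-D array of shape (n, k); y: 1-D array of shape (n,).
    Accumulates X'X and X'y element by element, then solves X'X b = X'y.
    """
    n, k = x.shape
    xtx = np.zeros((k, k))
    xty = np.zeros(k)
    for i in range(n):
        for a in range(k):
            xty[a] += x[i, a] * y[i]
            for b in range(k):
                xtx[a, b] += x[i, a] * x[i, b]
    # Solving the system avoids forming the explicit inverse of X'X.
    return np.linalg.solve(xtx, xty)


# Made-up, noise-free data so the true coefficients are recovered.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 3))
true_beta = np.array([1.0, -2.0, 0.5])
y = x @ true_beta

beta = ols_loops(x, y)
print(beta)  # close to [1.0, -2.0, 0.5]
```

Whether hand-written loops beat NumPy's BLAS-backed np.dot depends on the matrix sizes, so it is worth timing both (for example with timeit) before committing to one.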


What do these 2 lines actually do?

DTYPE = np.float
ctypedef np.float_t DTYPE_t

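It assigns the np.float (Python) type to a variable called DTYPE, and declares a C type definition (ctypedef) that makes DTYPE_t another name for np.float_t, the C double type from NumPy's Cython declarations. DTYPE is a runtime Python object (usable, for example, as dtype=DTYPE), whereas DTYPE_t exists only at compile time as a C type.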
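
A quick check in plain Python shows what the runtime half of those two lines amounts to. np.float was only an alias for the built-in float. It was deprecated in NumPy 1.20 and removed in 1.24, so on current NumPy DTYPE = np.float raises AttributeError, and DTYPE = float or DTYPE = np.float64 should be used instead. Either way the resulting dtype is NumPy's 64-bit float, the same C double that the compile-time np.float_t (or np.float64_t) describes.

```python
import numpy as np

# On current NumPy, the runtime line from the question becomes:
DTYPE = float  # or np.float64; np.float no longer exists

a = np.zeros(3, dtype=DTYPE)
print(a.dtype)                        # float64
print(np.dtype(DTYPE) == np.float64)  # True
print(a.itemsize)                     # 8 bytes per element, i.e. a C double
```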

The ctypedef keyword makes Cython emit the corresponding C/C++ typedef statement in the C code it generates.

A typedef'd type is just another name for the type it was defined from: the C compiler treats the two as the same type, so assigning a value of the original type to it does not produce a warning.

When using Cython, it helps to have at least a basic understanding of C or C++.
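
Neither answer addresses the ValueError itself. In a declaration such as np.ndarray[DTYPE_t, ndim=1], DTYPE_t fixes the element type and ndim fixes the number of dimensions. Cython checks an object against both whenever it is passed in as that argument or assigned to that variable, and it raises ValueError on a mismatch. In the typed version of ols_c, beta is declared ndim=1 but initialised with np.zeros([ncolx,1]). That is a 2-D array of shape (ncolx, 1), so this line alone triggers the error. Passing y as an (n, 1) column would trigger it too, because y is also declared ndim=1. The sketch below reproduces only the ndim part of that check in plain Python. check_buffer is a hypothetical helper written for illustration, not Cython's implementation.

```python
import numpy as np


def check_buffer(arr, ndim):
    """Hypothetical helper mimicking, in plain Python, the ndim check Cython
    performs when an array is bound to a variable declared as
    np.ndarray[DTYPE_t, ndim=N]. It is not part of Cython or NumPy."""
    if arr.ndim != ndim:
        raise ValueError(
            f"Buffer has wrong number of dimensions (expected {ndim}, got {arr.ndim})"
        )
    return arr


ncolx = 3

# The question's beta line: declared ndim=1 but initialised with a
# (ncolx, 1) array, which has two dimensions.
msg = None
try:
    check_buffer(np.zeros([ncolx, 1], dtype=float), ndim=1)
except ValueError as exc:
    msg = str(exc)
print(msg)  # Buffer has wrong number of dimensions (expected 1, got 2)

# A 1-D initialiser of length ncolx satisfies the declaration.
beta = check_buffer(np.zeros(ncolx, dtype=float), ndim=1)
print(beta.shape)  # (3,)

# The same rule applies to the argument y: an (n, 1) column is 2-D,
# but flattening it with ravel() gives the 1-D array the signature expects.
y_col = np.ones((5, 1))
y_flat = check_buffer(y_col.ravel(), ndim=1)
print(y_flat.shape)  # (5,)
```

In the typed Cython function, the corresponding fix would be to initialise beta with np.zeros(ncolx, dtype=float), or to skip the pre-allocation, since beta is reassigned to np.dot(a2, y) anyway. y should also be passed as a 1-D array. Note too that np.zeros([ncol, nrow]) has ncol rows and nrow columns, because the first entry of the shape is the number of rows.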

Link: http://www.djcxy.com/p/87922.html
