

Xception is a deep convolutional neural network architecture that involves depthwise separable convolutions. Google presented an interpretation of Inception modules in convolutional neural networks as an intermediate step between regular convolution and the depthwise separable convolution operation (a depthwise convolution followed by a pointwise convolution).

The limits of convolutions

First of all, let's take a look at convolutions. Convolution is a really expensive operation. The input image has a certain number of channels C, say 3 for a color image. It also has a certain dimension A, say \(100 \times 100\). We apply to it a convolution filter of size \(d \times d\), say \(3 \times 3\). The output has dimension \(K \times K\), where K is the resulting dimension after convolution, which depends on the padding applied (e.g. padding "same" would mean A = K). Each output value requires \(d \times d \times C\) multiplications, so for N kernels (the depth of the convolution) the total cost is \(N \times K \times K \times d \times d \times C\) multiplications.

To overcome the cost of such operations, depthwise separable convolutions have been introduced. They are themselves divided into 2 main steps. Depthwise convolution is the first step, in which instead of applying a convolution of size \(d \times d \times C\), we apply a convolution of size \(d \times d \times 1\), i.e. one spatial filter per input channel.
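The cost argument above can be checked with a few lines of arithmetic. This is a minimal sketch using the example values from the text (A = 100, C = 3, d = 3); the kernel count N = 64 is an illustrative assumption, not from the source.

```python
# Multiplication counts for standard vs. depthwise separable convolution,
# assuming "same" padding so the output dimension K equals the input A.
A, C, d, N = 100, 3, 3, 64  # N = 64 is an assumed, illustrative kernel count

K = A  # "same" padding: output spatial size equals input spatial size

# Standard convolution: each of the K*K outputs of each of the N kernels
# needs d*d*C multiplications.
standard = N * K * K * d * d * C

# Depthwise separable convolution:
#   depthwise step: one d x d x 1 filter per input channel,
#   pointwise step: N filters of size 1 x 1 x C mixing the channels.
depthwise = K * K * d * d * C
pointwise = N * K * K * C
separable = depthwise + pointwise

print(standard)   # 17280000
print(separable)  # 2190000
```

With these numbers the separable version needs roughly 8x fewer multiplications, which is the motivation the text gives for introducing it.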

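To make the two steps concrete, here is a naive NumPy sketch of a depthwise separable convolution (stride 1, "valid" padding). The function name and shapes are my own illustrative choices, not an API from the source.

```python
import numpy as np

def depthwise_separable_conv(x, depthwise_filters, pointwise_filters):
    """Naive depthwise separable convolution (stride 1, "valid" padding).

    x: (H, W, C) input image
    depthwise_filters: (d, d, C) -- one d x d spatial filter per channel
    pointwise_filters: (C, N) -- N filters of size 1 x 1 x C
    """
    H, W, C = x.shape
    d = depthwise_filters.shape[0]
    K = H - d + 1  # output spatial size with "valid" padding

    # Depthwise step: filter each channel independently (d x d x 1 per channel)
    dw = np.empty((K, K, C))
    for i in range(K):
        for j in range(K):
            patch = x[i:i + d, j:j + d, :]  # (d, d, C) window
            dw[i, j, :] = (patch * depthwise_filters).sum(axis=(0, 1))

    # Pointwise step: a 1x1 convolution that mixes the C channels into N outputs
    return dw @ pointwise_filters  # (K, K, N)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))
out = depthwise_separable_conv(
    x,
    rng.standard_normal((3, 3, 3)),  # depthwise: 3x3 filter per channel
    rng.standard_normal((3, 16)),    # pointwise: 16 output channels
)
print(out.shape)  # (6, 6, 16)
```

Note how the cross-channel mixing happens only in the cheap pointwise step; the expensive spatial filtering touches each channel on its own.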