Temporal Convolutional Network (TCN)

The Temporal Convolutional Network (TCN) is a neural network architecture for processing sequential data. On many tasks (e.g., time-series forecasting and natural language processing), it is regarded as a strong alternative to recurrent neural networks (RNNs).

  1. Causal Convolutions
  2. Dilated Convolutions
  3. Implementation

Paper: https://arxiv.org/pdf/1803.01271.pdf

1. Causal Convolutions

A causal convolution is the convolution operation adapted to the causal (language-model-style) setting. At its core it is still the familiar convolution, with a couple of tricks added so that:

  1. the number of output tokens equals the number of input tokens
  2. each token can only see itself and the tokens before it

When we use Conv1D to compute token representations for text, the convolution works as shown in the figure below:

Problems:

  • When kernel_size=2, the input has 6 tokens but only 5 tokens come out of the convolution.
  • When kernel_size=3, the representation of the first token already uses information from later tokens.

The fix is shown in the figure below:

Based on the kernel_size, we pad both sides of the input sequence before convolving. Now the input and output have the same number of tokens, and each token sees only itself and earlier tokens; later tokens are invisible to it. There is still one wrinkle: the padding only needs to go on the left, but since it is added on both sides here, we have to use slicing to remove the extra outputs introduced by the right-side padding.
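
A minimal sketch of this pad-then-slice trick in Paddle (the batch size, channel count, and kernel_size below are just illustrative):

import paddle
import paddle.nn as nn

# Illustrative shapes: batch=1, channels=8, sequence length=6.
x = paddle.randn([1, 8, 6])

kernel_size = 3
pad = kernel_size - 1                      # pad (k - 1) positions on BOTH sides
conv = nn.Conv1D(in_channels=8, out_channels=8,
                 kernel_size=kernel_size, padding=pad)

y = conv(x)                                # length becomes 6 + pad
y = y[:, :, :-pad]                         # slice off the extra outputs on the right

print(y.shape)                             # [1, 8, 6]: same length as the input, and
                                           # position t only depends on inputs <= t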

2. Dilated Convolutions

We want the last token to capture the semantics of the entire input, just as the hidden state of the last time step in an RNN is used to represent the whole input sequence. With the setup above, however, the last token only sees the previous token and itself, which is not enough information to represent the whole sequence. To fix this, we stack more convolutional layers and use dilated convolutions, as shown in the figure below:

As the figure shows, we add 3 convolutional layers. A brief analysis:

  1. The last token A in the first layer can only attend to 2 tokens.
  2. The last token B in the second layer can attend to 4 tokens.
  3. The last token C in the third layer can attend to 6 tokens.

We can see that the higher the layer, the more preceding tokens the last token attends to. By stacking multiple dilated causal convolution layers, we obtain a representation for the last token that covers more and more of the preceding context. For a classification task, we only need to feed the vector of the last token in the top layer to the classifier.
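
A small sketch of this stacking (the kernel size, dilations, and channel count here are hypothetical, not necessarily those in the figure); every layer keeps the sequence length, while the receptive field of the last position keeps growing:

import paddle
import paddle.nn as nn

def causal_conv(channels, kernel_size, dilation):
    # One dilated causal Conv1D: pad (k - 1) * d on both sides, then chop the right.
    pad = (kernel_size - 1) * dilation
    conv = nn.Conv1D(channels, channels, kernel_size,
                     padding=pad, dilation=dilation)
    return lambda x: conv(x)[:, :, :-pad]

x = paddle.randn([1, 4, 16])        # batch=1, channels=4, 16 time steps

# With kernel_size=3 and dilations 1, 2, 4, the last position's receptive field
# grows to 1 + 2 + 4 + 8 = 15 time steps after the third layer.
h = x
for d in (1, 2, 4):
    h = causal_conv(4, kernel_size=3, dilation=d)(h)

print(h.shape)                      # [1, 4, 16]: the length is preserved at every layer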

Also note that if the specified in_channels and out_channels differ, the dimensionality of the output tokens changes. In that case we can append one more Conv1D at the end to map the in_channels dimension to the out_channels dimension.
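
For instance (illustrative channel sizes), a Conv1D with kernel_size=1 changes only the channel dimension and leaves the sequence length untouched:

import paddle
import paddle.nn as nn

x = paddle.randn([1, 64, 10])                  # 64 channels, 10 time steps
project = nn.Conv1D(64, 128, kernel_size=1)    # pointwise (1x1) convolution over time
print(project(x).shape)                        # [1, 128, 10]: only the channel dim changes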

3. Implementation

Each layer is not a plain Conv1D layer but a temporal block (Temporal Block), whose architecture is as follows:

Example code (an implementation in Paddle):

import paddle
import paddle.nn as nn
from paddle.nn.utils import weight_norm


class Chomp1d(nn.Layer):
    """
    Remove the elements on the right.

    Args:
        chomp_size (int): The number of elements removed.
    """

    def __init__(self, chomp_size):
        super(Chomp1d, self).__init__()
        self.chomp_size = chomp_size

    def forward(self, x):
        return x[:, :, :-self.chomp_size]


class TemporalBlock(nn.Layer):
    """
    The TCN block, consisting of a dilated causal conv, ReLU, and a residual connection.

    Args:
        n_inputs ([int]): The number of channels in the input tensor.
        n_outputs ([int]): The number of filters.
        kernel_size ([int]): The filter size.
        stride ([int]): The stride size.
        dilation ([int]): The dilation size.
        padding ([int]): The size of zeros to be padded.
        dropout (float, optional): Probability of dropout the units. Defaults to 0.2.
    """

    def __init__(self,
                 n_inputs,
                 n_outputs,
                 kernel_size,
                 stride,
                 dilation,
                 padding,
                 dropout=0.2):

        super(TemporalBlock, self).__init__()
        self.conv1 = weight_norm(
            nn.Conv1D(n_inputs,
                      n_outputs,
                      kernel_size,
                      stride=stride,
                      padding=padding,
                      dilation=dilation))
        # Chomp1d is used to make sure the network is causal.
        # We pad by (k-1)*d on the two sides of the input for convolution,
        # and then use Chomp1d to remove the (k-1)*d output elements on the right.
        self.chomp1 = Chomp1d(padding)
        self.relu1 = nn.ReLU()
        self.dropout1 = nn.Dropout(dropout)

        self.conv2 = weight_norm(
            nn.Conv1D(n_outputs,
                      n_outputs,
                      kernel_size,
                      stride=stride,
                      padding=padding,
                      dilation=dilation))
        self.chomp2 = Chomp1d(padding)
        self.relu2 = nn.ReLU()
        self.dropout2 = nn.Dropout(dropout)

        self.net = nn.Sequential(self.conv1, self.chomp1, self.relu1,
                                 self.dropout1, self.conv2, self.chomp2,
                                 self.relu2, self.dropout2)
        self.downsample = nn.Conv1D(n_inputs, n_outputs,
                                    1) if n_inputs != n_outputs else None
        self.relu = nn.ReLU()
        self.init_weights()

    def init_weights(self):
        self.conv1.weight.set_value(
            paddle.tensor.normal(0.0, 0.01, self.conv1.weight.shape))
        self.conv2.weight.set_value(
            paddle.tensor.normal(0.0, 0.01, self.conv2.weight.shape))
        if self.downsample is not None:
            self.downsample.weight.set_value(
                paddle.tensor.normal(0.0, 0.01, self.downsample.weight.shape))

    def forward(self, x):
        out = self.net(x)
        res = x if self.downsample is None else self.downsample(x)
        return self.relu(out + res)
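
A quick usage sketch for the block above (the hyperparameters are just illustrative); a full TCN stacks several such blocks, usually with exponentially growing dilations:

# padding must equal (kernel_size - 1) * dilation so that Chomp1d keeps the block causal.
kernel_size, dilation = 3, 2
block = TemporalBlock(n_inputs=16, n_outputs=32,
                      kernel_size=kernel_size, stride=1,
                      dilation=dilation,
                      padding=(kernel_size - 1) * dilation)

x = paddle.randn([8, 16, 100])      # [batch, channels, time steps]
y = block(x)
print(y.shape)                      # [8, 32, 100]: same length, new channel size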

The weight normalization used in the code is a reparameterization of the network's weight vectors that decouples their length from their direction; that is, each weight vector is replaced by two parameters:

  1. weight_g, which represents the length (magnitude)
  2. weight_v, which represents the direction

The weight normalization paper: https://arxiv.org/pdf/1602.07868.pdf
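
A quick sketch (using Paddle's weight_norm, the same helper imported in the code above) that shows the two parameters:

import paddle.nn as nn
from paddle.nn.utils import weight_norm

conv = weight_norm(nn.Conv1D(16, 32, kernel_size=3))

# After the reparameterization the layer holds weight_g (length) and weight_v
# (direction) instead of a single weight tensor; the effective weight is
# g * v / ||v||.
print([name for name, _ in conv.named_parameters()])
# e.g. ['bias', 'weight_g', 'weight_v']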
