Linear Neural Networks 4: softmax Regression

青瓦松


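1. Classification Problems

Ways to represent labels:

Integer labels: use distinct integers for distinct classes, e.g. y ∈ {1, 2, 3} for {dog, cat, chicken}.
One-hot encoding: a one-hot encoding is a vector with as many components as there are classes. The component for the sample's class is set to 1 and all other components are set to 0. For example, with a three-dimensional label vector y, (1, 0, 0) corresponds to "cat", (0, 1, 0) to "chicken", and (0, 0, 1) to "dog".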
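As a minimal illustration of the one-hot encoding described above, the following sketch builds the vector from an ordered class list. The helper name `one_hot` and the class order are assumptions made for this example only.

```python
from typing import List, Sequence


def one_hot(label: str, classes: Sequence[str]) -> List[int]:
    """Return a one-hot vector for `label`, given the ordered class list."""
    index = classes.index(label)  # raises ValueError for an unknown label
    return [1 if i == index else 0 for i in range(len(classes))]


# Class order chosen to match the example in the text (assumption).
classes = ["cat", "chicken", "dog"]
print(one_hot("cat", classes))      # [1, 0, 0]
print(one_hot("chicken", classes))  # [0, 1, 0]
print(one_hot("dog", classes))      # [0, 0, 1]
```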

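2. Network Architecture

Affine function: a basic concept in mathematics and machine learning. It is a linear transformation of the input followed by a translation (the intercept, or bias, term).

To handle classification with a linear model, we define as many affine functions as there are outputs, one per output. With 4 features and 3 possible output classes, the affine functions are:

$$
\begin{aligned}
o_1 &= x_1 w_{11} + x_2 w_{12} + x_3 w_{13} + x_4 w_{14} + b_1, \\
o_2 &= x_1 w_{21} + x_2 w_{22} + x_3 w_{23} + x_4 w_{24} + b_2, \\
o_3 &= x_1 w_{31} + x_2 w_{32} + x_3 w_{33} + x_4 w_{34} + b_3.
\end{aligned}
$$

In vector form this simplifies to o = Wx + b. Like linear regression, softmax regression is a single-layer neural network. Because each output o1, o2 and o3 depends on all of the inputs x1, x2, x3 and x4, the output layer of softmax regression is also a fully connected layer.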
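To make o = Wx + b concrete, here is a small pure-Python sketch with 4 features and 3 outputs. The helper name `affine` and the numeric values of W, b and x are made up for illustration.

```python
from typing import List


def affine(W: List[List[float]], x: List[float], b: List[float]) -> List[float]:
    """Compute o = W x + b: one affine function per row of W."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]


# Hypothetical parameters: 3 outputs (rows) and 4 features (columns).
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.5, 0.5, 0.5, 0.5]]
b = [0.0, 1.0, -1.0]
x = [1.0, 2.0, 3.0, 4.0]

o = affine(W, x, b)
print(o)  # [1.0, 7.0, 4.0]
```

Every output uses all four inputs, which is what makes the layer fully connected.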

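3. The softmax Operation

softmax function: exponentiate each unnormalized prediction, then divide each exponentiated value by the sum of all of them:

$$
\hat{\mathbf{y}} = \mathrm{softmax}(\mathbf{o}), \qquad \hat{y}_j = \frac{\exp(o_j)}{\sum_k \exp(o_k)}.
$$

From this definition, softmax maps the unnormalized predictions to non-negative numbers that sum to 1, while keeping the model differentiable. The softmax operation does not change the ordering of the unnormalized predictions o; it only determines the probability assigned to each class. Although softmax is a nonlinear function, the outputs of softmax regression are still determined by an affine transformation of the input features. Softmax regression is therefore a linear model.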
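The three properties just listed (non-negative outputs, a sum of 1, unchanged ordering) can be checked with a short pure-Python sketch. The input values and the helper `ranking` are illustrative assumptions.

```python
import math
from typing import List


def softmax(o: List[float]) -> List[float]:
    """Exponentiate each entry and divide by the sum of the exponentials."""
    exps = [math.exp(v) for v in o]
    total = sum(exps)
    return [e / total for e in exps]


def ranking(values: List[float]) -> List[int]:
    """Indices of `values` sorted from smallest to largest."""
    return sorted(range(len(values)), key=lambda i: values[i])


o = [1.0, 3.0, 2.0]  # hypothetical unnormalized predictions
y_hat = softmax(o)

print(all(p > 0 for p in y_hat))        # True: every output is positive
print(abs(sum(y_hat) - 1.0) < 1e-12)    # True: the outputs sum to 1
print(ranking(o) == ranking(y_hat))     # True: the ordering is preserved
```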

Derivative of the softmax function and its derivation:
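A standard derivation, sketched here under the definition $\hat{y}_i = \exp(o_i) / \sum_k \exp(o_k)$. The Kronecker delta $\delta_{ij}$ (1 if $i = j$, otherwise 0) is notation introduced for this sketch. Applying the quotient rule:

```latex
\frac{\partial \hat{y}_i}{\partial o_j}
  = \frac{\delta_{ij}\exp(o_i)\sum_k \exp(o_k) - \exp(o_i)\exp(o_j)}
         {\left(\sum_k \exp(o_k)\right)^2}
  = \hat{y}_i\,\delta_{ij} - \hat{y}_i\,\hat{y}_j
  = \hat{y}_i\left(\delta_{ij} - \hat{y}_j\right)
```

So the diagonal entries are $\hat{y}_i(1 - \hat{y}_i)$ and the off-diagonal entries are $-\hat{y}_i\hat{y}_j$.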

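4. The Cross-Entropy Loss Function

Cross-entropy measures how much the probability distribution produced by the current model differs from the true distribution. It quantifies the gap between the actual output probabilities and the expected ones: the smaller the cross-entropy, the closer the two distributions. It is commonly used as the loss function for classification problems. The formula is:

$$
l(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_{j=1}^{q} y_j \log \hat{y}_j
$$

Here y is a one-hot vector of length q that represents the true label (1 in the position of the true class and 0 everywhere else), y_j is its j-th component, and ŷ_j is the predicted probability of class j.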
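A small worked example of the cross-entropy loss for a one-hot label, written in pure Python. The probability vectors are made up. Because y is one-hot, the sum reduces to the negative log of the probability assigned to the true class.

```python
import math
from typing import List


def cross_entropy(y: List[float], y_hat: List[float]) -> float:
    """l(y, y_hat) = -sum_j y_j * log(y_hat_j); terms with y_j == 0 are skipped."""
    return -sum(y_j * math.log(p_j) for y_j, p_j in zip(y, y_hat) if y_j != 0)


y = [0, 1, 0]           # true class is index 1
good = [0.1, 0.8, 0.1]  # hypothetical prediction that favors the true class
bad = [0.7, 0.2, 0.1]   # hypothetical prediction that favors a wrong class

print(round(cross_entropy(y, good), 3))  # 0.223, i.e. -log(0.8)
print(round(cross_entropy(y, bad), 3))   # 1.609, i.e. -log(0.2)
```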

Derivative of the cross-entropy loss function and its derivation:
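A standard derivation, sketched under the assumptions that $\hat{y}_j = \exp(o_j) / \sum_k \exp(o_k)$ and that y is one-hot, so $\sum_j y_j = 1$. Substituting softmax into the loss and differentiating with respect to the unnormalized prediction $o_j$:

```latex
l(\mathbf{y}, \hat{\mathbf{y}})
  = -\sum_{j=1}^{q} y_j \log \frac{\exp(o_j)}{\sum_{k=1}^{q} \exp(o_k)}
  = \log \sum_{k=1}^{q} \exp(o_k) - \sum_{j=1}^{q} y_j o_j

\frac{\partial l}{\partial o_j}
  = \frac{\exp(o_j)}{\sum_{k=1}^{q} \exp(o_k)} - y_j
  = \hat{y}_j - y_j
```

The gradient is the difference between the predicted probability and the one-hot label.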

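5. Why Classification Uses Cross-Entropy Rather than Mean Squared Error as the Loss

For classification, cross-entropy is generally a better choice than MSE for three reasons.

Efficient gradients: combined with softmax, the gradient of the cross-entropy loss with respect to each unnormalized prediction o_j is simply ŷ_j − y_j, so a confidently wrong prediction still receives a large gradient. With MSE on the softmax outputs, the gradient is multiplied by softmax-derivative terms that approach zero when the outputs saturate near 0 or 1, so learning can stall even when the prediction is badly wrong.

Probabilistic interpretation: minimizing cross-entropy is equivalent to maximizing the likelihood of the observed labels under the predicted distribution.

Optimization stability: for softmax regression the cross-entropy loss is convex in the parameters, which MSE applied to softmax outputs is not in general.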
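The gradient argument can be illustrated numerically. The sketch below compares the gradients of the two losses with respect to the unnormalized predictions for a confidently wrong prediction. It is pure Python and assumes the MSE loss is defined as 0.5 * Σ (ŷ_i − y_i)²; the input values are made up.

```python
import math
from typing import List


def softmax(o: List[float]) -> List[float]:
    exps = [math.exp(v) for v in o]
    total = sum(exps)
    return [e / total for e in exps]


def ce_grad(o: List[float], y: List[float]) -> List[float]:
    """Gradient of the cross-entropy loss w.r.t. o: y_hat - y."""
    return [p - t for p, t in zip(softmax(o), y)]


def mse_grad(o: List[float], y: List[float]) -> List[float]:
    """Gradient of 0.5 * sum_i (y_hat_i - y_i)^2 w.r.t. o (chain rule through softmax)."""
    p = softmax(o)
    d = [p_i - y_i for p_i, y_i in zip(p, y)]
    inner = sum(d_i * p_i for d_i, p_i in zip(d, p))
    # sum_i d_i * p_i * (delta_ij - p_j) = p_j * (d_j - inner)
    return [p_j * (d_j - inner) for p_j, d_j in zip(p, d)]


o = [10.0, 0.0, 0.0]  # the model is very confident in class 0
y = [0.0, 1.0, 0.0]   # but the true class is 1

ce_max = max(abs(g) for g in ce_grad(o, y))
mse_max = max(abs(g) for g in mse_grad(o, y))
print(ce_max > 0.9)    # True: cross-entropy still gives a large gradient
print(mse_max < 1e-3)  # True: the MSE gradient has almost vanished
```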

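6. Handling Overflow in the softmax Function

Before computing softmax, subtract max(x_k) from every x_k. This leaves the result unchanged:

$$
\hat{y}_i = \frac{\exp(x_i)}{\sum_k \exp(x_k)} = \frac{\exp(x_i - \max(x))}{\sum_k \exp(x_k - \max(x))}
$$

After the subtraction and normalization steps, some x_i − max(x) may be large negative numbers, so exp(x_i − max(x)) is close to zero and can underflow. That makes ŷ_i zero and log(ŷ_i) equal to -inf. A few steps of backpropagation later, we may find ourselves facing a screen full of dreaded nan results. Although we compute exponentials, we ultimately take their logarithm when computing the cross-entropy loss. By combining softmax and cross-entropy, we can avoid the numerical stability problems that would otherwise trouble us during backpropagation, because the logarithm cancels the exponential in the numerator:

$$
\log \hat{y}_i = x_i - \max(x) - \log \sum_k \exp(x_k - \max(x))
$$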
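The combined computation can be sketched in pure Python as follows. The function names `log_softmax` and `cross_entropy_from_logits` are chosen for this example. In plain Python, `math.exp(1000.0)` raises `OverflowError` and `math.log(0.0)` raises `ValueError`, so the naive formulas would fail on both inputs below.

```python
import math
from typing import List


def log_softmax(x: List[float]) -> List[float]:
    """log y_hat_i = (x_i - max) - log(sum_k exp(x_k - max))."""
    m = max(x)
    shifted = [v - m for v in x]
    # The largest shifted value is 0, so the sum is at least 1 and the log is safe.
    log_sum = math.log(sum(math.exp(v) for v in shifted))
    return [v - log_sum for v in shifted]


def cross_entropy_from_logits(x: List[float], label: int) -> float:
    """Cross-entropy for a class-index label, computed without forming y_hat."""
    return -log_softmax(x)[label]


# Large inputs: exp(1000) would overflow, the shifted version does not.
print(log_softmax([1000.0, 1000.0]))  # both entries equal -log(2)

# Very unlikely true class: y_hat underflows to 0, but its log stays finite.
print(cross_entropy_from_logits([0.0, -1000.0], 1))  # 1000.0
```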

7. Implementing softmax Classification

import torch
from torch.utils import data
import torchvision
from torchvision import transforms
import matplotlib.pyplot as plt
import typing
from matplotlib_inline import backend_inline
from IPython import display
import plotShow  # helper module not shown in this article; expected to provide setAxes(...)


# An accumulator that keeps running sums of n variables
class Accumulator:
    '''Accumulate sums over n variables.'''

    def __init__(self, n: int):
        self.data = [0.0] * n

    def add(self, *args):
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):
        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

class Animator:
    '''Plot data incrementally as an animation.'''

    def __init__(self, xlabel=None, ylabel=None, legend=None, xlim=None, ylim=None,
                 xscale='linear', yscale='linear', fmts=('-', 'm--', 'g-.', 'r:'),
                 nrows=1, ncols=1, figsize=(3.5, 2.5)):
        # Draw several lines incrementally
        if legend is None:
            legend = []
        backend_inline.set_matplotlib_formats('svg')
        self.fig, self.axes = plt.subplots(nrows, ncols, figsize=figsize)
        if nrows * ncols == 1:
            self.axes = [self.axes, ]
        # Capture the axis settings in a lambda
        self.configAxes = lambda: plotShow.setAxes(
            self.axes[0], xlabel, ylabel, xlim, ylim, xscale, yscale, legend)
        self.X, self.Y, self.fmts = None, None, fmts

    def add(self, x, y):
        # Add several data points to the figure
        if not hasattr(y, '__len__'):
            y = [y]
        n = len(y)
        if not hasattr(x, '__len__'):
            x = [x] * n
        if not self.X:
            self.X = [[] for _ in range(n)]
        if not self.Y:
            self.Y = [[] for _ in range(n)]
        for i, (a, b) in enumerate(zip(x, y)):
            if a is not None and b is not None:
                self.X[i].append(a)
                self.Y[i].append(b)
        self.axes[0].cla()
        for x, y, fmt in zip(self.X, self.Y, self.fmts):
            self.axes[0].plot(x, y, fmt)
        self.configAxes()
        display.display(self.fig)
        plt.draw()
        plt.pause(0.001)
        display.clear_output(wait=True)

    def show(self):
        display.display(self.fig)

def getFashionMnistLabels(labels: typing.Sequence):
    '''Return the text labels of the Fashion-MNIST dataset.'''
    textLabels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',
                  'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']
    return [textLabels[int(i)] for i in labels]


def showImages(imgs: list, numRows: int, numCols: int, titles: list = None, scale: float = 1.5):
    '''Plot a grid of images with optional titles.'''
    figsize = (numCols * scale, numRows * scale)
    _, axes = plt.subplots(numRows, numCols, figsize=figsize)
    axes = axes.flatten()
    for i, (ax, img) in enumerate(zip(axes, imgs)):
        if torch.is_tensor(img):  # image tensor
            ax.imshow(img.numpy())
        else:  # PIL image
            ax.imshow(img)
        ax.axes.get_xaxis().set_visible(False)
        ax.axes.get_yaxis().set_visible(False)
        if titles:
            ax.set_title(titles[i])
    return axes


def getDataloaderWorkers() -> int:
    '''Use 4 worker processes to read the data.'''
    return 4


# Load the dataset
def loadDataFashionMnist(batchSize: int, resize=None) -> tuple:
    '''Download the Fashion-MNIST dataset and load it into memory.'''
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnistTrain = torchvision.datasets.FashionMNIST(root='../data', train=True, transform=trans, download=True)
    mnistTest = torchvision.datasets.FashionMNIST(root='../data', train=False, transform=trans, download=True)
    return (data.DataLoader(mnistTrain, batchSize, shuffle=True, num_workers=getDataloaderWorkers()),
            data.DataLoader(mnistTest, batchSize, shuffle=False, num_workers=getDataloaderWorkers()))
# softmax function
def softmax(X: torch.Tensor) -> torch.Tensor:
    '''
    1. Exponentiate every entry (using exp);
    2. Sum over each row (each sample in the minibatch is one row) to get the
       normalization constant of each sample;
    3. Divide each row by its normalization constant so that the result sums to 1.
    '''
    XExp: torch.Tensor = torch.exp(X)
    partition: torch.Tensor = XExp.sum(1, keepdim=True)
    return XExp / partition


# Classification accuracy
def accuracy(yHat: torch.Tensor, y: torch.Tensor) -> float:
    '''Count the number of correct predictions.'''
    if len(yHat.shape) > 1 and yHat.shape[1] > 1:
        yHat = yHat.argmax(axis=1)
    cmp = yHat.type(y.dtype) == y
    return float(cmp.type(y.dtype).sum())


def evaluateAccuracy(net: torch.nn.Module, dataIter: data.DataLoader) -> float:
    '''Compute the accuracy of the model on the given dataset.'''
    if isinstance(net, torch.nn.Module):
        net.eval()  # switch the model to evaluation mode
    metric = Accumulator(2)  # number of correct predictions, number of predictions
    with torch.no_grad():
        for X, y in dataIter:
            metric.add(accuracy(net(X), y), y.numel())
    return metric[0] / metric[1]
# Training
def trainEpochCh3(net: torch.nn.Module,
                  trainIter: data.DataLoader,
                  loss: typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor],
                  updater: torch.optim.Optimizer) -> typing.Tuple[float, float]:
    '''Train the model for one epoch.'''
    # Switch the model to training mode
    if isinstance(net, torch.nn.Module):
        net.train()
    # Sum of training loss, sum of training accuracy, number of samples
    metric = Accumulator(3)
    for X, y in trainIter:
        yHat: torch.Tensor = net(X)
        l: torch.Tensor = loss(yHat, y)
        if isinstance(updater, torch.optim.Optimizer):
            # Built-in PyTorch optimizer and loss function
            updater.zero_grad()
            l.mean().backward()
            updater.step()
        else:
            # Custom optimizer and loss function
            l.sum().backward()
            updater(X.shape[0])
        metric.add(float(l.sum()), accuracy(yHat, y), y.numel())
    # Return the training loss and training accuracy
    return metric[0] / metric[2], metric[1] / metric[2]


def trainCh3(net: torch.nn.Module,
             trainIter: data.DataLoader,
             testIter: data.DataLoader,
             loss: typing.Callable[[torch.Tensor, torch.Tensor], torch.Tensor],
             numEpochs: int,
             updater: torch.optim.Optimizer) -> None:
    '''Train the model.'''
    animator = Animator(xlabel='epoch', xlim=[1, numEpochs], ylim=[0.3, 0.9],
                        legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(numEpochs):
        trainMetrics = trainEpochCh3(net, trainIter, loss, updater)
        testAcc = evaluateAccuracy(net, testIter)
        animator.add(epoch + 1, trainMetrics + (testAcc, ))
    trainLoss, trainAcc = trainMetrics
    assert trainLoss < 0.5, trainLoss
    assert trainAcc <= 1 and trainAcc > 0.7, trainAcc
    assert testAcc <= 1 and testAcc > 0.7, testAcc
# Optimization algorithm
def SGD(params: list[torch.Tensor], lr: float, batchSize: int):
    '''
    Minibatch stochastic gradient descent.
    params: the model parameters
    lr: learning rate, which sets the size of each update step
    batchSize: number of samples in the minibatch
    '''
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batchSize
            param.grad.zero_()


# Prediction
def predictCh3(net, testIter, n=6):
    '''Predict labels for the first minibatch and show the first n images.'''
    for X, y in testIter:
        break
    trues = getFashionMnistLabels(y)
    preds = getFashionMnistLabels(net(X).argmax(axis=1))
    titles = [true + '\n' + pred for true, pred in zip(trues, preds)]
    showImages(X[0:n].reshape((n, 28, 28)), 1, n, titles=titles[0:n])
    plt.show()

# softmax regression implemented from scratch
if __name__ == '__main__':
    # 1. Load the dataset
    batchSize: int = 256
    trainIter, testIter = loadDataFashionMnist(batchSize)

    # 2. Initialize the model parameters
    numInputs: int = 784  # each sample is a 28*28 image, flattened into a length-784 vector (one entry per pixel)
    numOutPuts: int = 10  # the dataset has 10 classes
    W = torch.normal(0.0, 0.01, size=(numInputs, numOutPuts), requires_grad=True)  # weights
    b = torch.zeros(numOutPuts, requires_grad=True)  # bias

    # 3. Define the softmax model
    def net(X: torch.Tensor) -> torch.Tensor:
        return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)

    # 4. Define the loss function
    def crossEntropy(yHat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        '''
        yHat: predicted probabilities
        y: label vector (class indices)
        '''
        return -torch.log(yHat[range(len(yHat)), y])

    # 5. Define the optimizer
    lr = 0.1

    def updater(batchSize: int):
        return SGD([W, b], lr, batchSize)

    # 6. Train the model
    numEpochs: int = 10
    trainCh3(net, trainIter, testIter, crossEntropy, numEpochs, updater)

    # 7. Predict
    predictCh3(net, testIter, n=10)

# Concise implementation of softmax regression
if __name__ == '__main__':
    # 1. Load the dataset
    batchSize: int = 256
    trainIter, testIter = loadDataFashionMnist(batchSize)

    # 2. Initialize the model parameters
    # PyTorch does not implicitly reshape the inputs, so a flatten layer
    # is placed before the linear layer to adjust the shape of the network input
    net = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))

    def initWeight(m):
        if type(m) == torch.nn.Linear:
            torch.nn.init.normal_(m.weight, std=0.01)

    net.apply(initWeight)

    # 3. Define the loss function (it takes unnormalized predictions and applies softmax internally)
    loss = torch.nn.CrossEntropyLoss(reduction='none')

    # 4. Define the optimization algorithm
    trainer = torch.optim.SGD(net.parameters(), lr=0.1)

    # 5. Train
    numEpochs: int = 10
    trainCh3(net, trainIter, testIter, loss, numEpochs, trainer)
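The update rule applied by the custom SGD function above, param ← param − lr · grad / batchSize, can be traced by hand in a pure-Python sketch. The helper name `sgd_step` and the numbers are illustrative assumptions; the gradients are taken to be summed over the minibatch, which is why they are divided by the batch size.

```python
from typing import List


def sgd_step(params: List[float], grads: List[float], lr: float, batch_size: int) -> List[float]:
    """Return params - lr * grad / batch_size (grads are summed over the minibatch)."""
    return [p - lr * g / batch_size for p, g in zip(params, grads)]


params = [1.0, -2.0]  # hypothetical parameter values
grads = [4.0, -8.0]   # hypothetical gradients summed over 4 samples
new_params = sgd_step(params, grads, lr=0.1, batch_size=4)
print(new_params)  # approximately [0.9, -1.8]
```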

Source: 檐苔

ANSA Scripting: Getting an Element Normal Consistent with the View Direction

In ANSA scripting we often need the normal vector of a selected element that agrees with the current view direction, for use in later operations. This article shows the process by creating the view vector and the normal vector of the selected element. First, a correction is needed: the rotation order of the GetViewAngles function in ANSA is Z->Y->Z, and clockwise is positive, which is the opposite of the usual Cartesian convention. This gives:

import math
from typing import List, Tuple
from ansa import base
from ansa import constants
from ansa import calc

# DECK is used throughout but is not defined in the original listing;
# it is assumed here to be the currently active deck.
DECK = base.CurrentDeck()


class ANSATYPE(object):
    '''Keyword names for the different solvers.'''

    @classmethod
    def eshell(cls, deck: int) -> str:
        typDict: dict = {
            constants.LSDYNA: 'ELEMENT_SHELL',
            constants.ABAQUS: 'SHELL',
            constants.NASTRAN: 'SHELL',
            constants.RADIOSS: 'SHELL',
            constants.ANSYS: 'SHELL',
            22: 'SHELL',  # OPTISTRUCT
            23: 'SHELL'   # MARC
        }
        try:
            _type = typDict[deck]
        except KeyError:
            raise ValueError('ERROR: the model deck setting is invalid')
        return _type

    @classmethod
    def node(cls, deck: int) -> str:
        typDict: dict = {
            constants.LSDYNA: 'NODE',
            constants.ABAQUS: 'NODE',
            constants.NASTRAN: 'GRID',
            constants.RADIOSS: 'NODE',
            constants.ANSYS: 'NODE',
            22: 'GRID',  # OPTISTRUCT
            23: 'NODE'   # MARC
        }
        try:
            _type = typDict[deck]
        except KeyError:
            raise ValueError('ERROR: the model deck setting is invalid')
        return _type


class Vector(object):

    @classmethod
    def point2Point(cls, point1: List[float], point2: List[float], unit: bool = True) -> List[float]:
        '''
        Compute the vector defined by two points.
        The vector points from point1 to point2, i.e. point2 - point1.
        '''
        if isinstance(point1, base.Entity):
            point1 = point1.position
        if isinstance(point2, base.Entity):
            point2 = point2.position
        vect: List[float] = [p2 - p1 for p1, p2 in zip(point1, point2)]
        return cls.unit(vect) if unit else vect

    @staticmethod
    def unit(vector: List[float]) -> List[float]:
        '''Normalize a vector to unit length.'''
        return calc.Normalize(vector)

    @staticmethod
    def distance(point1: List[float], point2: List[float]) -> float:
        '''Compute the distance between two points.'''
        if isinstance(point1, base.Entity):
            point1 = point1.position
        if isinstance(point2, base.Entity):
            point2 = point2.position
        return math.dist(point1, point2)

    @classmethod
    def angleCos(cls, vector1: List[float], vector2: List[float]) -> float:
        '''Compute the cosine of the angle between two spatial vectors.'''
        unit_vect01 = cls.unit(vector1)
        unit_vect02 = cls.unit(vector2)
        return sum([v1 * v2 for v1, v2 in zip(unit_vect01, unit_vect02)])

    @staticmethod
    def angle(vector1: List[float], vector2: List[float]) -> float:
        '''Compute the angle between two spatial vectors, in radians.'''
        return calc.CalcAngleOfVectors(vector1, vector2)

    @classmethod
    def elemNormal(cls, shellElem: base.Entity) -> List[float]:
        '''
        Return the normal vector of the element. If ANSA does not return one,
        compute it from the first three nodes of the element.
        '''
        vect: List[float] = base.GetNormalVectorOfShell(shellElem)
        if isinstance(vect, list):
            return vect
        else:
            nodeType: str = ANSATYPE.node(DECK)
            nodes: List[base.Entity] = base.CollectEntities(DECK, shellElem, nodeType, recursive=True)
            return cls.vec3Point(nodes[0], nodes[1], nodes[2])

    @classmethod
    def vec3Point(cls, point1: List[float], point2: List[float], point3: List[float]) -> List[float]:
        '''Compute the normal vector of the plane defined by three points in space.'''
        vect1: List[float] = cls.point2Point(point1, point2)
        vect2: List[float] = cls.point2Point(point1, point3)
        return calc.CrossProduct(vect1, vect2)


def getViewVector() -> List[float]:
    '''Get the view rotation angles from the view function and convert them into a view vector.'''
    # Get the view rotation angles
    angles: List[float] = base.GetViewAngles()
    angleRad = list(map(math.radians, angles))
    angleSin: List[float] = list(map(math.sin, angleRad))
    angleCos: List[float] = list(map(math.cos, angleRad))
    # Compute the three components of the view vector
    x: float = -angleSin[1]
    y: float = angleSin[0] * angleCos[1]
    z: float = angleCos[0] * angleCos[1]
    return [x, y, z]


def getShellViewVector(elem: base.Entity) -> List[float]:
    '''
    Name: getShellViewVector
    Description: get the normal vector of a shell element that agrees with the current view direction.
    Parameters:
        elem object  a shell-type element
    Returns:
        A vector (list): the normal of the input shell element that is consistent with the current view.
    '''
    # Validate the input
    if not isinstance(elem, base.Entity):
        raise TypeError('Please pass an ANSA entity')
    shellType = ANSATYPE.eshell(DECK)
    elemType = elem.ansa_type(DECK)
    if elemType != shellType and elemType != 'SOLIDFACET':
        raise TypeError(f'''Invalid ANSA type: this function requires a shell element or a solid
                        element face (SOLIDFACET), but got {elemType}''')
    # Element normal vector
    elemVect: List[float] = Vector.elemNormal(elem)
    # View vector
    viewVect: List[float] = getViewVector()
    # Angle between the two vectors: if it is acute, the element normal is the
    # wanted direction; if it is obtuse, the opposite of the element normal is
    angle: float = Vector.angle(elemVect, viewVect)
    if angle == -math.pi:
        raise ValueError(f'Invalid input vectors: got {elemVect} and {viewVect}, please check')
    if angle > math.pi / 2.0:
        elemVect = [-vect for vect in elemVect]
    return elemVect


def calcCog(entities: List[base.Entity]) -> List[float]:
    '''
    Name: calcCog
    Description: compute the COG of the input entities.
    Parameters:
        entities List[base.Entity]  an iterable of entities that have a COG, or a single entity
    Returns:
        The COG as a vector (list).
    '''
    if isinstance(entities, base.Entity):
        entities = [entities]
    length = len(entities)
    if length == 0:
        return [0.0, 0.0, 0.0]
    if length == 1:
        return base.Cog(entities[0])
    x, y, z = 0.0, 0.0, 0.0
    for ent in entities:
        cog = base.Cog(ent)
        x += cog[0]
        y += cog[1]
        z += cog[2]
    return [x / length, y / length, z / length]


def getMinDistEntityFromPoint(point: List[float], entities: List[base.Entity]) -> base.Entity:
    '''
    Name: getMinDistEntityFromPoint
    Description: pick the entity closest to point from the input entities.
    Parameters:
        point List[float]  coordinates of the reference point
        entities List[base.Entity]  an iterable of entities that have a COG, or a single entity
    Returns:
        The entity closest to point.
    '''
    if not entities:
        return None
    if isinstance(entities, base.Entity):
        return entities
    if len(entities) == 1:
        return entities[0]
    minDistance: float = math.inf
    minEntity: base.Entity = None
    for ent in entities:
        cog = base.Cog(ent)
        distance = Vector.distance(cog, point)
        if distance < minDistance:
            minDistance = distance
            minEntity = ent
    return minEntity


def main():
    types: Tuple[str] = (ANSATYPE.eshell(DECK), 'SOLIDFACET')
    entities: List[base.Entity] = base.PickEntities(DECK, types, initial_type=types[0])
    if not entities:
        return None
    if len(entities) == 1:
        entity = entities[0]
        cog = base.Cog(entity)
    else:
        cog = calcCog(entities)
        entity = getMinDistEntityFromPoint(cog, entities)
    vect = getShellViewVector(entity)
    fields = {'XT': cog[0], 'YT': cog[1], 'ZT': cog[2],
              'XH': cog[0] + vect[0], 'YH': cog[1] + vect[1], 'ZH': cog[2] + vect[2],
              'Name': 'element normal vector'}
    base.CreateEntity(DECK, 'DEFINE_VECTOR', fields)
    viewVect = getViewVector()
    fields = {'XT': cog[0], 'YT': cog[1], 'ZT': cog[2],
              'XH': cog[0] + viewVect[0], 'YH': cog[1] + viewVect[1], 'ZH': cog[2] + viewVect[2],
              'Name': 'view vector'}
    base.CreateEntity(DECK, 'DEFINE_VECTOR', fields)
    return None


if __name__ == '__main__':
    main()

Source: 檐苔