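Building an autograd engine: tinytorch 01

Community article, published January 21, 2024

This blog post originally lived on pythonstuff.com; this is its new home 🤗. I'll show you how to build your own mini pytorch from scratch.

This weekend I'm going to build my own autograd engine. I've done it once before, so it shouldn't be a problem. I'm starting from an empty git repo and I'll commit as I go, with no rebasing, so anyone who goes back through the history can see every change.

This is a dev blog, and I'll try to write explanations so you can recreate it. I'm not used to writing tutorials, so if you get stuck, DM me on Twitter (now X) @shxf0072. If I mess something up, all the code will be at github.com/joey00072/tinytorch. I'll leave commit IDs in the blog, so running git checkout COMMIT_ID takes you back to that point in time.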

tinytorch

Let's start with a Tensor wrapper around numpy.

import numpy as np


class Tensor:
    def __init__(self,data):
        self.data = data if isinstance(data,np.ndarray) else np.array(data)
        
    def __add__(self,other):
        return Tensor(self.data + other.data)

    def __mul__(self,other):
        return Tensor(self.data * other.data)
    
    def __repr__(self):
        return f"tensor({self.data})"
    
    
if __name__ == "__main__":
    x = Tensor([8])
    y = Tensor([5])    
    z = x+y

    print(z)

Nice, now we can add two tensors. If the `__add__` and `__mul__` methods look unfamiliar, search for "magic methods in python".
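
As a quick illustration of what those magic methods do, here is a minimal sketch that reuses the wrapper above (with `__mul__` actually multiplying). The variable names `via_operator`, `via_method`, and `product` are mine, just for the demo.

```python
import numpy as np


class Tensor:
    def __init__(self, data):
        self.data = data if isinstance(data, np.ndarray) else np.array(data)

    def __add__(self, other):
        return Tensor(self.data + other.data)

    def __mul__(self, other):
        return Tensor(self.data * other.data)

    def __repr__(self):
        return f"tensor({self.data})"


x = Tensor([8])
y = Tensor([5])

# Python rewrites `x + y` into `x.__add__(y)`, so both lines do the same work.
via_operator = x + y
via_method = x.__add__(y)
product = x * y  # dispatches to __mul__

print(via_operator, via_method, product)  # tensor([13]) tensor([13]) tensor([40])
```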

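My friends are calling me to play Valorant, so I'll be back in about 2 hours (19 Aug 2023, 20:02).

Back (19 Aug 2023, 23:02)

Addition and multiplication

Don't worry about the math, the code is simple.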

The derivative of a variable with respect to itself is 1. For example, for $f(x) = x$, the derivative is $\frac{df}{dx} = 1$.

Let's start with addition. If you have the function

$f(x) = x + 10$

its derivative is 1, because $\frac{d(x)}{dx} = 1$ and the derivative of a constant is 0, so the derivative of 10 is $0$. So $1 + 0 = 1$.

For two variables, $f(x, y) = x + y$. The derivative with respect to $x$: $\frac{\partial x}{\partial x} = 1$ and $\frac{\partial y}{\partial x} = 0$ since $y$ is treated as a constant, so $1 + 0 = 1$. The derivative with respect to $y$: $\frac{\partial x}{\partial y} = 0$ since $x$ is treated as a constant, and $\frac{\partial y}{\partial y} = 1$, so $0 + 1 = 1$.

So if $x = 10$ and $y = 20$, then for $f(x, y) = x + y$ we get $\frac{\partial f(x, y)}{\partial x} = 1$ and $\frac{\partial f(x, y)}{\partial y} = 1$.

Notice that addition passes an equal gradient back to both nodes.
Since z has gradient 1, x and y both get gradient 1. This will be useful for residual connections in transformers.
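
If you'd rather check this numerically than trust the algebra, here is a small sketch using central finite differences. The helper `numerical_grad` and the step size `eps` are my own choices for illustration, not part of tinytorch.

```python
def f(x, y):
    return x + y


def numerical_grad(fn, x, y, eps=1e-6):
    # Central differences: nudge one input at a time and hold the other fixed.
    dfdx = (fn(x + eps, y) - fn(x - eps, y)) / (2 * eps)
    dfdy = (fn(x, y + eps) - fn(x, y - eps)) / (2 * eps)
    return dfdx, dfdy


dfdx, dfdy = numerical_grad(f, 10.0, 20.0)

# Both inputs should receive (approximately) the same gradient of 1.
print(round(dfdx, 4), round(dfdy, 4))
```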

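Multiplication

Now let's look at multiplication. If you have the function $g(x) = x \cdot 10$, its derivative is 10, because $\frac{d(x)}{dx} = 1$ and the constant factor 10 just scales it. So $10 \cdot 1 = 10$.

$f(x, y) = x \cdot y$

For two variables, $f(x, y) = x \cdot y$ with $x = 10$ and $y = 20$. The derivative with respect to $x$: $\frac{\partial f}{\partial x} = 1 \cdot y = 1 \cdot 20 = 20$. The derivative with respect to $y$: $\frac{\partial f}{\partial y} = x \cdot 1 = 10 \cdot 1 = 10$.

Notice that in this case the derivative with respect to x takes the value of y (20), and the derivative with respect to y takes the value of x (10).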
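
The same kind of sanity check works for the product. This sketch compares the rule stated above (the gradient for x is y, the gradient for y is x) against a finite-difference estimate. `analytic_grad` and `numerical_grad` are hypothetical helper names I made up for the comparison.

```python
def g(x, y):
    return x * y


def analytic_grad(x, y):
    # Rule from the text: dg/dx = y and dg/dy = x.
    return y, x


def numerical_grad(fn, x, y, eps=1e-6):
    # Central differences, one input at a time.
    dgdx = (fn(x + eps, y) - fn(x - eps, y)) / (2 * eps)
    dgdy = (fn(x, y + eps) - fn(x, y - eps)) / (2 * eps)
    return dgdx, dgdy


x, y = 10.0, 20.0
exact = analytic_grad(x, y)
approx = numerical_grad(g, x, y)

print(exact)   # (20.0, 10.0)
print(approx)  # should be very close to the line above
```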

Let's write the code.

We'll create Add, Mul, and Function classes, move the operation logic into each class's forward method, and store the arguments in Function.args for use in backward.


class Function:
    def __init__(self,op,*args):
        self.op = op
        self.args = args        

class Add:
    @staticmethod
    def forward(x,y):
        return Tensor(x.data + y.data)
    
    @staticmethod
    def backward(ctx,grad):
        x,y = ctx.args
        return Tensor([1]) ,Tensor([1])

class Mul:
    @staticmethod
    def forward(x,y):
        return Tensor(x.data * y.data) # z = x*y
    
    @staticmethod
    def backward(ctx,grad):
        x,y = ctx.args
        return  Tensor(y.data), Tensor(x.data) #  dz/dx, dz/dy

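The Function class records every function/operation we have applied. For example, if we add x=10 and y=20, Function will have fn.op = Add and fn.args = (x, y), the two tensors holding 10 and 20.

During backpropagation we pass the Function object to backward as the context, so we can get the original arguments back.

Let's modify add and mul.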


class Tensor:
    def __init__(self,data):
        self.data = data if isinstance(data,np.ndarray) else np.array(data)
        self._ctx = None
        
    def __add__(self,other):
        fn = Function(Add,self,other)
        result = Add.forward(self,other)
        result._ctx = fn
        return result

    def __mul__(self,other):
        fn = Function(Mul,self,other)
        result = Mul.forward(self,other)
        result._ctx = fn
        return result
        
    
    def __repr__(self):
        return f"tensor({self.data})"

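So when you perform an operation:

First, store all the information about the operation in a Function object.
Then run op.forward.
Attach that Function object to the result node (as result._ctx).
Return the result.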
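
To see those steps in action, here is a minimal sketch that multiplies two tensors, inspects the recorded context, and calls backward by hand. It repeats trimmed-down copies of the classes from this post so it runs on its own. Passing `Tensor([1])` as `grad` is my assumption for the demo, since backward ignores `grad` at this stage.

```python
import numpy as np


class Function:
    def __init__(self, op, *args):
        self.op = op
        self.args = args


class Mul:
    @staticmethod
    def forward(x, y):
        return Tensor(x.data * y.data)  # z = x*y

    @staticmethod
    def backward(ctx, grad):
        x, y = ctx.args
        return Tensor(y.data), Tensor(x.data)  # dz/dx, dz/dy


class Tensor:
    def __init__(self, data):
        self.data = data if isinstance(data, np.ndarray) else np.array(data)
        self._ctx = None

    def __mul__(self, other):
        fn = Function(Mul, self, other)    # 1. record the op and its inputs
        result = Mul.forward(self, other)  # 2. run the forward pass
        result._ctx = fn                   # 3. attach the record to the result
        return result                      # 4. hand the result back

    def __repr__(self):
        return f"tensor({self.data})"


x = Tensor([10])
y = Tensor([20])
z = x * y

ctx = z._ctx
print(ctx.op.__name__)                     # Mul
print(ctx.args[0] is x, ctx.args[1] is y)  # True True

# Drive backward manually; the context gives us the original inputs back.
dz_dx, dz_dy = ctx.op.backward(ctx, Tensor([1]))
print(dz_dx, dz_dy)                        # tensor([20]) tensor([10])
```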

If you want to view this graph, create a new visualize.py file.

pip install graphviz
sudo apt-get install -y graphviz # IDK what to do for windows I use wsl

import graphviz
from tinytorch import *

G = graphviz.Digraph(format='png')
G.clear()
def visit_nodes(G:graphviz.Digraph,node:Tensor):
    uid = str(id(node))
    G.node(uid,f"Tensor: {str(node.data) } ")
    if node._ctx:
        ctx_uid = str(id(node._ctx))
        G.node(ctx_uid,f"Context: {str(node._ctx.op.__name__)}")
        G.edge(uid,ctx_uid)
        for child in node._ctx.args:
            G.edge(ctx_uid,str(id(child)))
            visit_nodes(G,child)


if __name__ == "__main__":
    x = Tensor([8])
    y = Tensor([5])    
    z = x+y
    visit_nodes(G,z)
    G.render(directory="vis",view=True)
    print(z)
    
    print(len(G.body))
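
If you don't have graphviz installed, the same traversal can be printed as indented text instead. This is only a sketch: `describe` is a hypothetical helper of mine, not something in the repo, and the classes are trimmed copies so the snippet runs on its own.

```python
import numpy as np


class Function:
    def __init__(self, op, *args):
        self.op = op
        self.args = args


class Add:
    @staticmethod
    def forward(x, y):
        return Tensor(x.data + y.data)


class Tensor:
    def __init__(self, data):
        self.data = data if isinstance(data, np.ndarray) else np.array(data)
        self._ctx = None

    def __add__(self, other):
        result = Add.forward(self, other)
        result._ctx = Function(Add, self, other)
        return result


def describe(node, depth=0, lines=None):
    # Walk from the output back to the inputs, like visit_nodes does,
    # but collect indented text lines instead of graphviz nodes.
    lines = [] if lines is None else lines
    indent = "  " * depth
    lines.append(f"{indent}Tensor: {node.data}")
    if node._ctx:
        lines.append(f"{indent}  Context: {node._ctx.op.__name__}")
        for child in node._ctx.args:
            describe(child, depth + 2, lines)
    return lines


x = Tensor([8])
y = Tensor([5])
z = x + y
print("\n".join(describe(z)))
```

For `z = x + y` this prints the result tensor first, then the Add context, then the two input tensors one level deeper.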

Here is the full tinytorch code at this point:

import numpy as np

class Tensor:
    def __init__(self,data):
        self.data = data if isinstance(data,np.ndarray) else np.array(data)
        self._ctx = None
        
    def __add__(self,other):
        fn = Function(Add,self,other)
        result = Add.forward(self,other)
        result._ctx = fn
        return result

    def __mul__(self,other):
        fn = Function(Mul,self,other)
        result = Mul.forward(self,other)
        result._ctx = fn
        return result
        
    
    def __repr__(self):
        return f"tensor({self.data})"
    
class Function:
    def __init__(self,op,*args):
        self.op = op
        self.args = args        

class Add:
    @staticmethod
    def forward(x,y):
        return Tensor(x.data + y.data)
    
    @staticmethod
    def backward(ctx,grad):
        x,y = ctx.args
        return Tensor([1]),Tensor([1])

class Mul:
    @staticmethod
    def forward(x,y):
        return Tensor(x.data * y.data) # z = x*y
    
    @staticmethod
    def backward(ctx,grad):
        x,y = ctx.args
        return  Tensor(y.data), Tensor(x.data) #  dz/dx, dz/dy
    
if __name__ == "__main__":
    x = Tensor([8])
    y = Tensor([5])    
    z = x*y
    print(z)

The code up to this point is at commit dc11629: https://github.com/joey00072/tinytorch

Sleeping now, backprop tomorrow.
