graphene_django source code walkthrough

Preface

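My current project uses GraphQL, but I did not know much about how it works internally. Clients can freely choose which fields to query and shape the response themselves, which is a very high degree of flexibility, and that much freedom on the surface implies complexity underneath. Searching Baidu turned up no article worth reading, so I set out to answer a few questions myself: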

What exactly is GraphQL?
How is GraphQL integrated with Django?
How does it drive the Django ORM to execute queries?

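At first I read the source directly in the IDE, but it soon became clear that this would not work. The code is full of interfaces and protocols whose implementations are scattered across several libraries, so it is not a small self-contained project. To understand how GraphQL connects to the database you need to be familiar with the ORM and then step through breakpoints several times before a picture emerges. Reading the source head-on touches too many files and code paths, so it has to be tackled piece by piece, layer by layer. My current plan is: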

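Learn the basic usage and specification of GraphQL. The libraries in every language are implementations of this specification, so the terminology has to be clear first.
Get a rough understanding of the three core libraries: graphene_django, graphene, and graphql (which here covers graphql-core and graphql-relay). In this walkthrough the three are nested like Russian dolls: each one wraps and calls the next.
Then look at how graphene_django stitches graphene and Django together. In a Django project graphene_django and graphene are used side by side, and in places the layering is blurry.
Then tie it together by following how the ORM is passed along and executed through the whole flow, focusing on how graphql decides which function to run based on the incoming parameters.
Background knowledge needed while reading the source: GraphQL basics and promise chaining.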

Environment

graphene==2.1.8
graphene-django==2.11.0
graphql-core==2.3.2
graphql-relay==2.0.1

Python line breakpoints

mapp/cluster/types.py
python3.8/site-packages/graphene_django/views.py
views.py:68
views.py:280
views.py:245
views.py:165
views.py:122
python3.8/site-packages/graphql/backend/core.py
core.py:32
core.py:45
python3.8/site-packages/graphql/execution/executor.py
executor.py:144
executor.py:531
executor.py:452
executor.py:365
executor.py:113
executor.py:59
python3.8/site-packages/graphql/execution/utils.py
utils.py:59

Notes

lib/python3.8/site-packages/graphql/execution/executor.py:144
graphql.execution.executor.execute
Promise.resolve(None).then(promise_executor).catch(on_rejected).then(on_resolve)
Promise chaining.

/Users/liuguojin/gitlab/venv_2/lib/python3.8/site-packages/graphene_django/debug/middleware.py:51
graphene_django.debug.middleware.DjangoDebugMiddleware
Here gql obtains the db context and appends the promise built from the query task to the list of pending query tasks.

graphql_relay.node.node.from_global_id
graphql_relay.utils.unbase64
The base64 conversion happens here.
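
As a rough illustration of that conversion, here is a standard-library sketch of how a relay-style global ID can be encoded and decoded. It assumes the global ID is simply the base64 form of "TypeName:id"; the function names mirror the relay helpers but are re-implemented here, and ClusterType is a hypothetical type name.

```python
import base64


def to_global_id(type_name, local_id):
    """Encode "TypeName:id" as base64 (stand-in for the relay helper)."""
    raw = "{}:{}".format(type_name, local_id)
    return base64.b64encode(raw.encode("utf-8")).decode("utf-8")


def from_global_id(global_id):
    """Decode a global ID back into (type_name, local_id)."""
    raw = base64.b64decode(global_id).decode("utf-8")
    # Split only on the first colon so that local ids may contain colons.
    type_name, local_id = raw.split(":", 1)
    return type_name, local_id


# "ClusterType" is a hypothetical type name used only for this demo.
gid = to_global_id("ClusterType", 42)
print(gid)
print(from_global_id(gid))
```

Note that the decoded local id comes back as a string, so a resolver that needs an integer primary key has to convert it.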

graphene_django.types.DjangoObjectType.get_node
By this point the query is obtained through the default path of the node pattern.
graphene.relay.node.Node.node_resolver
This is the main place where the code talks to the database.
graphql.backend.core.GraphQLCoreBackend

It is best to set settings.DEBUG to False; otherwise the classes below get pulled in, and there is no need to read them.
graphene_django.debug.middleware.DjangoDebugContext
graphene_django.debug.middleware.DjangoDebugMiddleware

GraphQLView core code

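graphene_django.views.GraphQLView inherits from django.views.generic.base.View and implements the view handlers dispatch and get_response.
Among its methods, execute_graphql_request implements the logic that calls into graphql. Below is GraphQLView with some annotations; anyone familiar with Django will follow this part.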

class GraphQLView(View):
    graphiql_version = "0.14.0"
    graphiql_template = "graphene/graphiql.html"
    react_version = "16.8.6"

    schema = None
    graphiql = False
    executor = None
    backend = None
    middleware = None
    root_value = None
    pretty = False
    batch = False

    def __init__(
            self,
            schema=None,
            executor=None,
            middleware=None,
            root_value=None,
            graphiql=False,
            pretty=False,
            batch=False,
            backend=None,
    ):
        if not schema:
            schema = graphene_settings.SCHEMA

        if backend is None:
            backend = get_default_backend()

        if middleware is None:
            middleware = graphene_settings.MIDDLEWARE

        self.schema = self.schema or schema
        if middleware is not None:
            if isinstance(middleware, MiddlewareManager):
                self.middleware = middleware
            else:
                self.middleware = list(instantiate_middleware(middleware))
        self.executor = executor
        self.root_value = root_value
        self.pretty = self.pretty or pretty
        self.graphiql = self.graphiql or graphiql
        self.batch = self.batch or batch
        self.backend = backend

    @method_decorator(ensure_csrf_cookie)
    def dispatch(self, request, *args, **kwargs):  # overrides the parent's dispatch and calls its own get_response
        try:
            # validation and other unimportant branches removed
            data = self.parse_body(request)
            show_graphiql = self.graphiql and self.can_display_graphiql(request, data)
            result, status_code = self.get_response(request, data, show_graphiql)
            return HttpResponse(
                status=status_code, content=result, content_type="application/json"
            )
        except HttpError as e:
            response = e.response
            response["Content-Type"] = "application/json"
            response.content = self.json_encode(
                request, {"errors": [self.format_error(e)]}
            )
            return response

    def get_response(self, request, data, show_graphiql=False):  # handle the gql request
        query, variables, operation_name, id = self.get_graphql_params(request, data)  # extract the parameters

        # the key statement
        execution_result = self.execute_graphql_request(
            request, data, query, variables, operation_name, show_graphiql
        )
        status_code = 200
        if execution_result:
            # some code removed here (it builds `response` from execution_result)
            result = self.json_encode(request, response, pretty=show_graphiql)
        else:
            result = None
        return result, status_code

    def execute_graphql_request(self, request, data, query, variables, operation_name, show_graphiql=False):
        # Bridge to graphql by satisfying the preconditions of its execute call. Very similar code can be seen in:
        # python3.8/site-packages/graphql/graphql.py(graphql.graphql.execute_graphql)
        try:
            backend = self.get_backend(request)
            # get a graphql.backend.base.GraphQLDocument
            # the schema (a registry) is the set of operations the current application supports
            document = backend.document_from_string(self.schema, query)  # build the GraphQLDocument that adapts to the graphql library
        except Exception as e:
            return ExecutionResult(errors=[e], invalid=True)
        try:
            extra_options = {}
            if self.executor:
                # We only include it optionally since
                # executor is not a valid argument in all backends
                extra_options["executor"] = self.executor

            # start executing the query; from here on you have to read the graphql source
            return document.execute(
                root_value=self.get_root_value(request),
                variable_values=variables,
                operation_name=operation_name,
                context_value=self.get_context(request),
                middleware=self.get_middleware(request),
                **extra_options
            )
        except Exception as e:
            return ExecutionResult(errors=[e], invalid=True)

    def json_encode(self, request, d, pretty=False):  # format the response
        if not (self.pretty or pretty) and not request.GET.get("pretty"):
            return json.dumps(d, separators=(",", ":"))

        return json.dumps(d, sort_keys=True, indent=2, separators=(",", ": "))

    def get_backend(self, request):
        return self.backend
    
    def parse_body(self, request):  # read the request parameters
        content_type = self.get_content_type(request)
        if content_type == "application/graphql":
            return {"query": request.body.decode()}

    @staticmethod
    def get_graphql_params(request, data):
        # extract the parameters
        query = request.GET.get("query") or data.get("query")
        variables = request.GET.get("variables") or data.get("variables")
        id = request.GET.get("id") or data.get("id")

        if variables and isinstance(variables, six.text_type):
            try:
                variables = json.loads(variables)
            except Exception:
                raise HttpError(HttpResponseBadRequest("Variables are invalid JSON."))

        operation_name = request.GET.get("operationName") or data.get("operationName")
        if operation_name == "null":
            operation_name = None
        return query, variables, operation_name, id

The execution.executor.execute chain

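graphql.execution.executor.execute is where the preparation for actually running the query begins, and it relies on promises.
Anyone who knows JavaScript will find this familiar: it is promise chaining, a way of taming callbacks. The promise used in the code is the promise library, a Python implementation of promises.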

def execute(
    schema,  # type: GraphQLSchema
    document_ast,  # type: Document
    root_value=None,  # type: Any
    context_value=None,  # type: Optional[Any]
    variable_values=None,  # type: Optional[Any]
    operation_name=None,  # type: Optional[str]
    executor=None,  # type: Any
    return_promise=False,  # type: bool
    middleware=None,  # type: Optional[Any]
    allow_subscriptions=False,  # type: bool
    **options  # type: Any
):
    # type: (...) -> Union[ExecutionResult, Promise[ExecutionResult]]

    if executor is None:
        executor = SyncExecutor()

    exe_context = ExecutionContext(
        schema,
        document_ast,
        root_value,
        context_value,
        variable_values or {},
        operation_name,
        executor,
        middleware,
        allow_subscriptions,
    )

    def promise_executor(v):
        # type: (Optional[Any]) -> Union[Dict, Promise[Dict], Observable]
        return execute_operation(exe_context, exe_context.operation, root_value)

    def on_rejected(error):
        # type: (Exception) -> None
        exe_context.errors.append(error)
        return None

    def on_resolve(data):
        # type: (Union[None, Dict, Observable]) -> Union[ExecutionResult, Observable]
        if isinstance(data, Observable):
            return data

        if not exe_context.errors:
            return ExecutionResult(data=data)

        return ExecutionResult(data=data, errors=exe_context.errors)

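    # Promise chaining: roughly, create a Promise and run promise_executor; if it raises, on_rejected records the error.
    # on_resolve then runs in either case, because on_rejected returns None instead of re-raising.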
    promise = (
        Promise.resolve(None).then(promise_executor).catch(on_rejected).then(on_resolve)
    )

    if not return_promise:
        exe_context.executor.wait_until_finished()  # by default no promise is returned; we wait for the query result instead
        return promise.get()
    else:
        clean = getattr(exe_context.executor, "clean", None)
        if callable(clean):
            clean()

    return promise

Introduction to promises

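Promises are mainly callback-based, while Python's asyncio is mainly event-loop-based. The two are similar, with the latter offering more encapsulation.
The implementation of Promise mainly uses the observer pattern:

The Promise.prototype.then and Promise.prototype.catch methods register observer callbacks on the observed Promise object and return a new Promise object, which is what makes chaining possible.
The observed object manages the transitions between its internal pending, fulfilled, and rejected states; the resolve and reject functions handed to the constructor's callback actively trigger a state transition and notify the observers.
Simple implementation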
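
The observer-style description above can be sketched in a few lines of Python. This is a toy written for this walkthrough, not the code of the promise library: MiniPromise and all of its methods are hypothetical names, it runs synchronously, and it does not flatten a promise returned from a handler. The second half replays the same chain shape that execute uses, with a resolver that fails.

```python
class MiniPromise:
    """Toy promise: observers are registered with then/catch and notified on settle."""

    PENDING, FULFILLED, REJECTED = "pending", "fulfilled", "rejected"

    def __init__(self, executor=None):
        self.state = self.PENDING
        self.value = None
        self._observers = []  # (on_fulfilled, on_rejected, child promise)
        if executor is not None:
            try:
                executor(self._resolve, self._reject)
            except Exception as exc:
                self._reject(exc)

    def _resolve(self, value):
        self._settle(self.FULFILLED, value)

    def _reject(self, error):
        self._settle(self.REJECTED, error)

    def _settle(self, state, value):
        if self.state != self.PENDING:
            return  # a promise settles only once
        self.state, self.value = state, value
        for observer in self._observers:
            self._notify(*observer)
        self._observers = []

    def _notify(self, on_fulfilled, on_rejected, child):
        handler = on_fulfilled if self.state == self.FULFILLED else on_rejected
        if handler is None:
            # No handler for this outcome: pass it through unchanged.
            child._settle(self.state, self.value)
            return
        try:
            child._resolve(handler(self.value))
        except Exception as exc:
            child._reject(exc)

    def then(self, on_fulfilled=None, on_rejected=None):
        child = MiniPromise()  # returning a new promise enables chaining
        if self.state == self.PENDING:
            self._observers.append((on_fulfilled, on_rejected, child))
        else:
            self._notify(on_fulfilled, on_rejected, child)
        return child

    def catch(self, on_rejected):
        return self.then(None, on_rejected)

    @classmethod
    def resolve(cls, value):
        return cls(lambda resolve, reject: resolve(value))


# Same chain shape as in execute, with a resolver that fails.
errors = []


def promise_executor(_):
    raise ValueError("resolver failed")


def on_rejected(error):
    errors.append(error)
    return None


def on_resolve(data):
    return {"data": data, "errors": [str(e) for e in errors]}


chained = MiniPromise.resolve(None).then(promise_executor).catch(on_rejected).then(on_resolve)
print(chained.state, chained.value)

# Observers registered while the promise is still pending are notified later.
callbacks = {}
pending = MiniPromise(lambda resolve, reject: callbacks.update(resolve=resolve))
seen = []
pending.then(seen.append)
callbacks["resolve"]("later")
print(seen)
```

The chain ends fulfilled even though the resolver raised, because the catch handler returns a normal value; that is how on_resolve can still build a result that carries the collected errors.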

field

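The remaining attention has to go to graphql.execution.executor.resolve_field. This part is also complex, so let's start with the graphene demo. In fact the example below can be debugged without Django or graphene_django, using just the demo that graphene provides.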

from graphene import ObjectType, String, Schema

class Query(ObjectType):
    # this defines a Field `hello` in our Schema with a single Argument `name`
    hello = String(name=String(default_value="stranger"))
    goodbye = String()

    # our Resolver method takes the GraphQL context (root, info) as well as
    # Argument (name) for the Field and returns data for the query Response
    def resolve_hello(root, info, name):
        print('resolve_hello for debug')
        return f'Hello {name}!'

    def resolve_goodbye(root, info):
        print('resolve_goodbye for debug')
        return 'See ya!'

schema = Schema(query=Query)

# we can query for our field (with the default argument)
query_string = '{ hello }'
result = schema.execute(query_string)
print(result.data['hello'])
# "Hello stranger!"

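All the resolvers we defined can be seen in schema._type_map.Query.fields. For the example above the mapping looks like the following. By graphene's convention resolver functions carry the resolve_ prefix by default; the keys shown here are the field names after that convention has been applied.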

{'hello': <graphql.type.definition.GraphQLField object at 0x10f180220>, 'goodbye': <graphql.type.definition.GraphQLField object at 0x10f180270>}

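Continuing execution leads to graphql.execution.executor.execute. By this point every parameter needed for execution is ready, and the next step is to look closely at how execution actually happens.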


class GraphQLCoreBackend(GraphQLBackend):
    """GraphQLCoreBackend will return a document using the default
    graphql executor"""

    def __init__(self, executor=None):
        # type: (Optional[Any]) -> None
        self.execute_params = {"executor": executor}

    def document_from_string(self, schema, document_string):
        # type: (GraphQLSchema, Union[Document, str]) -> GraphQLDocument
        if isinstance(document_string, ast.Document):
            document_ast = document_string
            document_string = print_ast(document_ast)  # similar to route matching: this determines which resolve_XXXX runs
        else:
            assert isinstance(
                document_string, string_types
            ), "The query must be a string"
            document_ast = parse(document_string)  # similar to route matching: this determines which resolve_XXXX runs
        # document_ast is a nested structure of selection_sets whose depth depends on the complexity of document_string; each level holds the Fields to execute. Read this together with get_field_def.
        return GraphQLDocument(
            schema=schema,
            document_string=document_string,
            document_ast=document_ast,  # note this argument
            execute=partial(  # note this argument
                execute_and_validate, schema, document_ast, **self.execute_params
            ),
            # partial (from functools) pre-binds the arguments that are already known to the execute function that will be called later
        )
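
To make the partial step concrete, here is a small standalone sketch. The execute_and_validate below is a hypothetical stand-in that only echoes its arguments, and the string values are placeholders; the point is just how schema and document_ast get bound early while the remaining arguments arrive at call time.

```python
from functools import partial


def execute_and_validate(schema, document_ast, variable_values=None, executor=None):
    """Hypothetical stand-in that just reports what it was called with."""
    return {
        "schema": schema,
        "document_ast": document_ast,
        "variable_values": variable_values,
        "executor": executor,
    }


# Bind what is known when the document is built ...
execute_params = {"executor": None}
execute = partial(execute_and_validate, "my-schema", "my-ast", **execute_params)

# ... and supply the per-request arguments later, as document.execute(...) does.
result = execute(variable_values={"id": 1})
print(result)
```

This is why the view can call document.execute(...) with only request-specific values: the schema and the parsed AST are already baked into the callable.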

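Pay attention to how the actual resolver is passed through the following functions (it is rather convoluted):

schema -> document -> execute -> execute_operation -> collect_fields -> execute_fields -> resolve_field -> field_def

At this point we have the result; it only needs to be associated with the corresponding query and returned. The code for returning it is not covered further here.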

    # graphql.execution.executor.resolve_field contains many details; here are roughly two of them.
    # This line is key: it decides which field handles the request. parent_type holds all the query methods the application defines on the current Query type.
    # graphql.type.schema.GraphQLSchema
    field_def = get_field_def(exe_context.schema, parent_type, field_name)
    # ... (some code omitted)
    executor = exe_context.executor  # pick the executor
    # pass along the function to run (resolve_fn_middleware), its arguments (args), and the executor that runs it (executor)
    result = resolve_or_error(resolve_fn_middleware, source, info, args, executor)

The executor simply calls fn and forwards the packed arguments to it, which is also why the first two parameters of resolve_hello are always root and info.


class SyncExecutor(object):
    def wait_until_finished(self):
        # type: () -> None
        pass

    def clean(self):
        pass

    def execute(self, fn, *args, **kwargs):
        # type: (Callable, *Any, **Any) -> Any
        return fn(*args, **kwargs)
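
The lookup-then-call flow can be imitated without any GraphQL library. The sketch below is a simplification under stated assumptions: build_fields, execute_fields, and the selections list are hypothetical stand-ins for the schema's field map, the executor loop, and the parsed AST of a query like { hello(name: "GraphQL") goodbye }.

```python
class SyncExecutor:
    def execute(self, fn, *args, **kwargs):
        return fn(*args, **kwargs)


class Query:
    """Hypothetical root type; resolvers follow the resolve_<field> convention."""

    @staticmethod
    def resolve_hello(root, info, name="stranger"):
        return "Hello {}!".format(name)

    @staticmethod
    def resolve_goodbye(root, info):
        return "See ya!"


def build_fields(type_cls):
    """Map field names to resolvers by stripping the resolve_ prefix."""
    prefix = "resolve_"
    return {
        attr[len(prefix):]: getattr(type_cls, attr)
        for attr in dir(type_cls)
        if attr.startswith(prefix)
    }


def execute_fields(fields, selections, root, info, executor):
    """Resolve each selected field: look up its resolver, then call it."""
    data = {}
    for field_name, args in selections:
        resolver = fields[field_name]  # plays the role of get_field_def
        # root and info are always passed first; field arguments follow as kwargs.
        data[field_name] = executor.execute(resolver, root, info, **args)
    return data


fields = build_fields(Query)
selections = [("hello", {"name": "GraphQL"}), ("goodbye", {})]
data = execute_fields(fields, selections, root=None, info={}, executor=SyncExecutor())
print(data)
```

Because the executor always forwards root and info positionally, every resolver has to accept them first, whatever other arguments the field declares.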

That covers the demo's query operation. Next, look at how a query ends up calling ORM methods, which mostly comes down to how graphene-django wraps things.
The logic starts at graphene_django.types.DjangoObjectType.

class DjangoObjectType(ObjectType):
    @classmethod
    def __init_subclass_with_meta__(    # works like a metaclass; see the __init_subclass__ magic method, which customizes subclass attributes
        cls,
        model=None,
        registry=None,
        skip_registry=False,
        only_fields=None,  # deprecated in favour of `fields`
        fields=None,
        exclude_fields=None,  # deprecated in favour of `exclude`
        exclude=None,
        filter_fields=None,
        filterset_class=None,
        connection=None,
        connection_class=None,
        use_connection=None,
        interfaces=(),
        convert_choices_to_enum=True,
        _meta=None,
        **options
    ):
        # a pile of constraints and public options that control the query scope, mode, and so on
        pass
    
    @classmethod
    def get_queryset(cls, queryset, info):
        return queryset

    @classmethod
    def get_node(cls, info, id):
        queryset = cls.get_queryset(cls._meta.model.objects, info)  # this line ties the Django model to the node; from here on you operate on the node
        try:
            return queryset.get(pk=id)
        except cls._meta.model.DoesNotExist:
            return None
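
Since get_node always goes through get_queryset, overriding get_queryset is a natural hook for restricting what a node lookup can see. The sketch below imitates that pattern without Django: FakeManager, BaseNodeType, ClusterType, and the dict used as info are all hypothetical stand-ins (in a real project info would be the resolve info object and the manager a real model manager).

```python
class FakeManager:
    """Hypothetical in-memory stand-in for a Django model manager/queryset."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, **conditions):
        matched = [
            row for row in self.rows
            if all(row.get(key) == value for key, value in conditions.items())
        ]
        return FakeManager(matched)

    def get(self, pk):
        for row in self.rows:
            if row["pk"] == pk:
                return row
        raise LookupError(pk)


class BaseNodeType:
    """Mirrors the shape of get_queryset/get_node shown above."""

    objects = FakeManager([])

    @classmethod
    def get_queryset(cls, queryset, info):
        return queryset

    @classmethod
    def get_node(cls, info, id):
        queryset = cls.get_queryset(cls.objects, info)
        try:
            return queryset.get(pk=id)
        except LookupError:
            return None


class ClusterType(BaseNodeType):
    objects = FakeManager([
        {"pk": 1, "name": "prod", "owner": "alice"},
        {"pk": 2, "name": "dev", "owner": "bob"},
    ])

    @classmethod
    def get_queryset(cls, queryset, info):
        # Only expose rows owned by the requesting user.
        return queryset.filter(owner=info["user"])


print(ClusterType.get_node({"user": "alice"}, 1))
print(ClusterType.get_node({"user": "alice"}, 2))
```

The second lookup returns None because the row is filtered out before get runs, which matches how get_node turns a missing object into None instead of an error.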

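The flow is mostly clear now, but the details of how SQL is executed still need a look. That means going back to see how graphene_django wraps graphql-core, graphql-relay, and the ORM. That part is truly long-winded: you need to be comfortable with the protocol and the terminology, and you have to jump back and forth between three libraries. I will add it later when I have time.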

Summary

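Having read this far, it is time for a summary. As I gradually got to know and use GraphQL, I felt quite uneasy about it.
GraphQL's core advantage is fetching resources in a single request. It looks cool, and a cool trick backed by somewhat complex machinery is acceptable. But fitting your own special requirements into that complex machinery really tests a developer's skill. Walking through the source confirmed my earlier guess: the more convenient it is to use, the more complex it is to implement. The interfaces and specifications involved are not something you can skim and then start modifying. Small projects with no real optimization needs, whose operations stay within what the examples cover, can give it a try; anything beyond that puts you in the deep end of development. All the documentation worth reading is listed in the references. In one sentence: REST is simple, GraphQL reads like gibberish, and there is still a long road ahead.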

References

https://graphql.cn/learn/
https://spec.graphql.cn/
https://docs.graphene-python.org/en/latest/execution/

https://github.com/graphql-python
https://github.com/graphql-python/graphql-core
https://github.com/graphql-python/graphql-relay-py
https://mengera88.github.io/2017/05/18/Promise%E5%8E%9F%E7%90%86%E8%A7%A3%E6%9E%90/
https://github.com/syrusakbary/promise
https://www.jianshu.com/p/ca1dfc5b4b4f
https://www.zhihu.com/question/38596306 Why hasn't GraphQL taken off?

不悟
