
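Problems

An unresolvable upstream blocks start/reload

With nginx, if any server in an upstream block cannot be resolved, both start and reload fail.
When nginx acts as a gateway in front of several services, a single service going down is enough to keep the whole gateway from starting, which is clearly unreasonable.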
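
A minimal sketch of a gateway config that would hit this problem. The hostnames `service-a.internal` and `service-b.internal` are placeholders, and the sketch assumes the first one currently has no DNS record:

```nginx
# Hypothetical gateway config; hostnames and ports are placeholders.
events {}

http {
    upstream service_a {
        server service-a.internal:8080;   # assumed to be unresolvable right now
    }

    upstream service_b {
        server service-b.internal:8080;   # healthy, but the gateway never comes up
    }

    server {
        listen 80;
        location /a/ { proxy_pass http://service_a; }
        location /b/ { proxy_pass http://service_b; }
    }
}
```

With a config like this, `nginx -t` (and therefore start/reload) should fail with an `[emerg]` message along the lines of `host not found in upstream "service-a.internal:8080"`, even though `service_b` is fine.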

only resolve on start/reload

By default, nginx resolves the servers in an upstream block at start/reload and caches the resulting IPs. A Docker container's IP can change when it restarts, so the cached IPs go stale.

Solutions

Use nginx variables

https://serverfault.com/questions/700894/make-nginx-ignore-site-config-when-its-upstream-cannot-be-reached
https://stackoverflow.com/questions/32845674/setup-nginx-not-to-crash-if-host-in-upstream-is-not-found
https://sandro-keil.de/blog/let-nginx-start-if-upstream-host-is-unavailable-or-down/
For example:

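    # 127.0.0.11 is Docker's embedded DNS server; cache each answer for 30s.
    resolver 127.0.0.11 valid=30s;
    location / {
        # Passing the target through a variable makes nginx resolve the name
        # at request time instead of at start/reload.
        set $target xxx_server_name:3031;
        uwsgi_pass $target;
    }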

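Pros

Solves both problems above.

Cons

  1. Does not support configuring multiple servers.
  2. Loses the benefits of nginx upstream blocks, such as the various load-balancing strategies.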

tengine ngx_http_upstream_dynamic

https://github.com/alibaba/tengine/blob/master/modules/ngx_http_upstream_dynamic_module/ngx_http_upstream_dynamic_module.c
https://tengine.taobao.org/document_cn/http_upstream_dynamic_cn.html
For example:

upstream backend {
    dynamic_resolve fallback=stale fail_timeout=30s;

    server a.com;
    server b.com;
}

server {
    ...

    location / {
        proxy_pass http://backend;
    }
}

Pros

  1. Backed by a large company and widely used.
  2. Solves the "only resolve on start/reload" problem.
  3. Works together with other nginx modules.

Cons

  1. start/reload still fails if any server cannot be resolved.

jdomain

https://github.com/wdaike/ngx...

upstream backend {
    jdomain www.baidu.com port=80;
    jdomain www.baidu.com port=81; # only one of these takes effect
}

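Pros

Can solve both problems above (the start/reload one only with the PR listed under Cons).

Cons

  1. Does not support the server directive or multiple servers (that is, multiple jdomain entries), and cannot be combined with other nginx modules.
  2. Cannot start while the domain is unresolved (there is a PR that fixes this): https://github.com/wdaike/ngx_upstream_jdomain/pull/12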

nginx-upstream-dynamic-servers

https://github.com/GUI/nginx-...
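
A minimal sketch of how this module is typically configured, assuming it is compiled in. The `resolve` flag on `server` is the same one used in the test diff further down. The upstream name and hostnames are placeholders, and the `resolver` address is Docker's embedded DNS as in the variables example:

```nginx
# Hypothetical config; "backend", a.example.com and b.example.com are placeholders.
http {
    # The module looks names up through the configured resolver.
    resolver 127.0.0.11;

    upstream backend {
        # "resolve" asks the module to re-resolve the name in the background
        # instead of only once at start/reload.
        server a.example.com:8080 resolve;
        server b.example.com:8080 resolve;
    }

    server {
        listen 80;
        location / {
            proxy_pass http://backend;
        }
    }
}
```

Because the servers still live in a normal upstream block, load-balancing directives and other upstream modules can be used alongside them.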

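Pros

  1. Solves both problems above.
  2. Works together with other nginx modules.

Cons

  1. Replaces nginx's server directive, so it has to be maintained as nginx versions change. The author has tested nginx 1.6, 1.7, 1.8, and 1.9.
  2. Not widely used.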

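Maintenance and testing

We use tengine-2.2.2. Reading the code shows that replacing tengine's server directive leaves id and host unset in the ngx_http_upstream_server_t struct. Neither field is used in our setup, so this is not a problem for us. If you need these two settings, consider forking the module and patching it.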

nginx-upstream-dynamic-servers

ngx_http_upstream_dynamic_servers.c:260

#if nginx_version >= 1007002
    us->name = u.url;
#endif
    us->addrs = u.addrs;
    us->naddrs = u.naddrs;
    us->weight = weight;
    us->max_fails = max_fails;
    us->fail_timeout = fail_timeout;

    return NGX_CONF_OK;

tengine-2.2.2

ngx_http_upstream_server_t:

typedef struct {
    ngx_str_t                        name;
    ngx_addr_t                      *addrs;
    ngx_uint_t                       naddrs;
    ngx_uint_t                       weight;
    ngx_uint_t                       max_fails;
    time_t                           fail_timeout;
    ngx_str_t                        id;
    ngx_str_t                        host;

    unsigned                         down:1;
    unsigned                         backup:1;
} ngx_http_upstream_server_t;

ngx_http_upstream.c:5610

    us->name = u.url;
    us->addrs = u.addrs;
    us->naddrs = u.naddrs;
    us->host = u.host;
    us->weight = weight;
    us->max_fails = max_fails;
    us->fail_timeout = fail_timeout;
    us->id = id;

    return NGX_CONF_OK;

Testing

Because nginx-upstream-dynamic-servers has to keep its server directive in step with each nginx/tengine upgrade, testing is worth a mention here.
https://github.com/alibaba/tengine/wiki/How-to-test

diff --git a/tests/nginx-tests/nginx-tests/upstream.t b/tests/nginx-tests/nginx-tests/upstream.t
index debbf5d..5dd670b 100644
--- a/tests/nginx-tests/nginx-tests/upstream.t
+++ b/tests/nginx-tests/nginx-tests/upstream.t
@@ -37,8 +37,8 @@ http {
     %%TEST_GLOBALS_HTTP%%
 
     upstream u {
-        server 127.0.0.1:8081 max_fails=3 fail_timeout=10s;
-        server 127.0.0.1:8082 max_fails=3 fail_timeout=10s;
+        server 127.0.0.1:8081 max_fails=3 fail_timeout=10s resolve;
+        server 127.0.0.1:8082 max_fails=3 fail_timeout=10s resolve;
     }

~/git/tengine/tests(52fff0e*) » TEST_NGINX_BINARY=/usr/sbin/nginx prove -r nginx-tests
...
nginx-tests/nginx-tests/upstream.t ........................... ok   
nginx-tests/nginx-tests/upstream_hash.t ...................... ok     
nginx-tests/nginx-tests/upstream_hash_memcached.t ............ skipped: Cache::Memcached not installed
nginx-tests/nginx-tests/upstream_ip_hash.t ................... skipped: no realip available
nginx-tests/nginx-tests/upstream_least_conn.t ................ ok   
nginx-tests/nginx-tests/upstream_zone_ssl.t .................. skipped: no http_ssl available
nginx-tests/nginx-tests/userid.t ............................. 1/34 skip() needs to know $how_many tests are in the block at nginx-tests/nginx-tests/userid.t line 185
...

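Summary

Neither the nginx-variables approach nor jdomain works well with other nginx upstream modules, so they are not recommended for demanding production systems.
nginx-upstream-dynamic-servers and tengine's ngx_http_upstream_dynamic can each be used where they fit your situation.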


enjolras1205