keepalived-vrrp_script

weight vs. priority

Let's start from the source code and see how the priority value of a vrrp_instance is computed:

    /* Update VRRP effective priority based on multiple checkers.
     * This is a thread which is executed every adver_int.
     */
    static int
    vrrp_update_priority(thread_t * thread)
    {
        vrrp_rt *vrrp = THREAD_ARG(thread);
        int prio_offset, new_prio;

        /* compute prio_offset right here */
        prio_offset = 0;

        /* Now we will sum the weights of all interfaces which are tracked. */
        if ((!vrrp->sync || vrrp->sync->global_tracking) && !LIST_ISEMPTY(vrrp->track_ifp))
            prio_offset += vrrp_tracked_weight(vrrp->track_ifp);

        /* Now we will sum the weights of all scripts which are tracked. */
        if ((!vrrp->sync || vrrp->sync->global_tracking) && !LIST_ISEMPTY(vrrp->track_script))
            prio_offset += vrrp_script_weight(vrrp->track_script);

        if (vrrp->base_priority == VRRP_PRIO_OWNER) {
            /* we will not run a PRIO_OWNER into a non-PRIO_OWNER */
            vrrp->effective_priority = VRRP_PRIO_OWNER;
        } else {
            /* WARNING! we must compute new_prio on a signed int in order
               to detect overflows and avoid wrapping. */
            new_prio = vrrp->base_priority + prio_offset;
            if (new_prio < 1)
                new_prio = 1;
            else if (new_prio > 254)
                new_prio = 254;
            vrrp->effective_priority = new_prio;
        }

        /* Register next priority update thread */
        thread_add_timer(master, vrrp_update_priority, vrrp, vrrp->adver_int);
        return 0;
    }

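As the code shows, the priority of each vrrp_instance is recomputed by a thread that runs every adver_int. The final (effective) priority is the configured value (vrrp->base_priority) plus the summed weights of the tracked interfaces and the tracked scripts, and the result is clamped to the range 1-254. The only exception is an instance whose base priority is VRRP_PRIO_OWNER, which always keeps that priority.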
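The clamping step can be isolated into a small standalone sketch. The function name and the constants below are made up for illustration and are not keepalived identifiers; in particular, the value 255 for the owner priority is an assumption based on the VRRP convention.

```c
#include <assert.h>

/* Hypothetical constants for this sketch; keepalived defines its own. */
enum { PRIO_MIN = 1, PRIO_MAX = 254, PRIO_OWNER = 255 };

/* Combine a configured base priority with the summed tracking weights.
 * The sum is done on a signed int so that a large negative offset
 * cannot wrap around before it is clamped. */
int effective_priority(int base_priority, int prio_offset)
{
    assert(base_priority >= PRIO_MIN && base_priority <= PRIO_OWNER);

    /* An address owner is never demoted by tracking weights. */
    if (base_priority == PRIO_OWNER)
        return PRIO_OWNER;

    int new_prio = base_priority + prio_offset;
    if (new_prio < PRIO_MIN)
        new_prio = PRIO_MIN;
    else if (new_prio > PRIO_MAX)
        new_prio = PRIO_MAX;
    return new_prio;
}
```

With a base priority of 100, an offset of 10 gives 110, while an offset of -200 is clamped to 1 rather than going negative.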

Next, let's see how "the sum of the weights of all scripts" is computed:

    /* Returns total weights of all tracked scripts :
     * - a positive weight adds to the global weight when the result is OK
     * - a negative weight subtracts from the global weight when the result is bad
     *
     */
    int
    vrrp_script_weight(list l)
    {
        element e;
        tracked_sc *tsc;
        int weight = 0;

        for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
            tsc = ELEMENT_DATA(e);
            if (tsc->scr->result == VRRP_SCRIPT_STATUS_DISABLED)
                continue;
            if (tsc->scr->result >= tsc->scr->rise) {
                if (tsc->weight > 0)
                    weight += tsc->weight;
            } else if (tsc->scr->result < tsc->scr->rise) {
                if (tsc->weight < 0)
                    weight += tsc->weight;
            }
        }

        return weight;
    }

Wait, what does result actually mean?

result

    static int
    vrrp_script_child_thread(thread_t * thread)
    {
        ....
        wait_status = THREAD_CHILD_STATUS(thread);

        if (WIFEXITED(wait_status)) {
            int status;
            status = WEXITSTATUS(wait_status);
            if (status == 0) {
                /* success */
                if (vscript->result < vscript->rise - 1) {
                    vscript->result++;
                } else {
                    if (vscript->result < vscript->rise)
                        log_message(LOG_INFO, "VRRP_Script(%s) succeeded", vscript->sname);
                    vscript->result = vscript->rise + vscript->fall - 1;
                }
            } else {
                /* failure */
                if (vscript->result > vscript->rise) {
                    vscript->result--;
                } else {
                    if (vscript->result >= vscript->rise)
                        log_message(LOG_INFO, "VRRP_Script(%s) failed", vscript->sname);
                    vscript->result = 0;
                }
            }
        }

        return 0;
    }

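According to the documentation, rise means that the vrrp_script is considered healthy only after rise consecutive successful checks. fall is analogous: the vrrp_script is considered failed only after fall consecutive failed checks.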

Now look at a comment taken from vrrp_track.h:45:

    /* VRRP script tracking results.
     * The result is an integer between 0 and rise-1 to indicate a DOWN state,
     * or between rise-1 and rise+fall-1 to indicate an UP state. Upon failure,
     * we decrease result and set it to zero when we pass below rise. Upon
     * success, we increase result and set it to rise+fall-1 when we pass above
     * rise-1.
     */

                        rise   rise+fall-1
    +------------------++----------------+
    0       DOWN  rise-1       UP

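The comment and diagram above show the range over which result moves and the vrrp_script state that each part of the range represents: values from 0 to rise-1 mean DOWN, and values from rise to rise+fall-1 mean UP (the code tests result >= rise).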

The initial value of result is set in vrrp_init_script:

    /* if run after vrrp_init_state(), it will be able to detect scripts that
     * have been disabled because of a sync group and will avoid to start them.
     */
    static void
    vrrp_init_script(list l)
    {
        vrrp_script *vscript;
        element e;

        for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
            vscript = ELEMENT_DATA(e);
            if (vscript->inuse == 0)
                vscript->result = VRRP_SCRIPT_STATUS_DISABLED;

            if (vscript->result == VRRP_SCRIPT_STATUS_INIT) {
                vscript->result = vscript->rise - 1; /* one success is enough */
                thread_add_event(master, vrrp_script_thread, vscript, vscript->interval);
            } else if (vscript->result == VRRP_SCRIPT_STATUS_INIT_GOOD) {
                vscript->result = vscript->rise; /* one failure is enough */
                thread_add_event(master, vrrp_script_thread, vscript, vscript->interval);
            }
        }
    }

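inuse starts at 0; once the script is referenced from a track_script block, inuse++ makes it 1. As a result, when keepalived starts (STATUS_INIT), result is initialized to rise-1: the script counts as failed, but one success is enough to bring it up. When keepalived is restarted (STATUS_INIT_GOOD), result is initialized to rise: the script counts as healthy, but one failure is enough to take it down. Combined with the vrrp_script_child_thread code above, this means the very first check already decides whether the vrrp_script is in the healthy or the failed state.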
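To make the counter behaviour easier to follow, here is a minimal standalone sketch of the same state machine. The struct and function names are hypothetical and only mirror the logic of vrrp_init_script and vrrp_script_child_thread quoted above; logging and the DISABLED state are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rise/fall/result fields of a script. */
struct script_state {
    int rise;    /* consecutive successes needed to go UP   */
    int fall;    /* consecutive failures needed to go DOWN  */
    int result;  /* 0..rise-1 = DOWN, rise..rise+fall-1 = UP */
};

/* Mirrors the two initial values: rise-1 on a fresh start,
 * rise when the script was previously known to be good. */
void script_init(struct script_state *s, int rise, int fall, bool was_good)
{
    assert(rise >= 1 && fall >= 1);
    s->rise = rise;
    s->fall = fall;
    s->result = was_good ? rise : rise - 1;
}

/* Apply the outcome of one check (exit status 0 means check_ok). */
void script_update(struct script_state *s, bool check_ok)
{
    if (check_ok) {
        if (s->result < s->rise - 1)
            s->result++;                       /* still climbing towards UP */
        else
            s->result = s->rise + s->fall - 1; /* jump to the top of UP */
    } else {
        if (s->result > s->rise)
            s->result--;                       /* still inside the UP band */
        else
            s->result = 0;                     /* drop to the bottom of DOWN */
    }
}

bool script_is_up(const struct script_state *s)
{
    return s->result >= s->rise;
}
```

With rise = 2 and fall = 3, a fresh script starts at result 1 (DOWN), a single success moves it to 4 (UP), and it then takes three consecutive failures (4, 3, 2, then 0) to bring it DOWN again.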

Back to the beginning

Now that the relationship between result and the state of a vrrp_script is clear, let's go back to vrrp_script_weight and look at how it computes the weight on each pass:

    if (tsc->scr->result >= tsc->scr->rise) {
        if (tsc->weight > 0)
            weight += tsc->weight;
    } else if (tsc->scr->result < tsc->scr->rise) {
        if (tsc->weight < 0)
            weight += tsc->weight;
    }

Conclusion

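If a vrrp_script is in the healthy state (tsc->scr->result >= tsc->scr->rise) and its configured weight is positive, the weight is added to the script weight total and therefore ends up in the priority of the vrrp_instance. If the weight is negative, it is ignored and has no effect on priority.

If a vrrp_script is in the failed state (tsc->scr->result < tsc->scr->rise) and its configured weight is negative, the weight is subtracted from the script weight total and therefore lowers the priority of the vrrp_instance. If the weight is positive, it is ignored and has no effect on priority.

In the test setup, MASTER has a configured priority of 100 and SLAVE a configured priority of 99. Two vrrp_scripts, A and B, are defined, each with a weight of 10.

By the analysis above, when a vrrp_script has a positive weight, a failed check merely means that the weight is not added to priority. And when A is set to -10 and B to 10, the analysis gives MASTER a priority of 100 once A fails (100 + 10 - 10). In principle that should not trigger a master/slave switch, but the logs showed exactly the opposite.

Debugging keepalived revealed the reason: SLAVE's priority of 99 plus its own vrrp_script weight of 10 gives SLAVE a final priority of 109, which is higher than MASTER's 100 + 10 (B) - 10 (A). That is what made MASTER change state.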
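The arithmetic of this scenario can be replayed with a short sketch. The struct and function names are hypothetical; the summing rule follows vrrp_script_weight as quoted earlier, and the script states (A failing on MASTER, everything healthy on SLAVE) are the assumption under which the numbers above work out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one tracked script: its weight and current state. */
struct tracked_script {
    int  weight;
    bool up;
};

/* Positive weights count only while UP, negative weights only while DOWN. */
int scripts_weight(const struct tracked_script *scripts, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        if (scripts[i].up && scripts[i].weight > 0)
            total += scripts[i].weight;
        else if (!scripts[i].up && scripts[i].weight < 0)
            total += scripts[i].weight;
    }
    return total;
}

/* Base priority plus script weights, clamped to 1..254. */
int node_priority(int base, const struct tracked_script *scripts, size_t n)
{
    assert(base >= 1 && base <= 254);
    int prio = base + scripts_weight(scripts, n);
    if (prio < 1)
        prio = 1;
    else if (prio > 254)
        prio = 254;
    return prio;
}
```

Feeding in A = -10 (failed) and B = 10 (healthy) with base 100 yields 100 for MASTER, while the same two scripts, both healthy, with base 99 yield 109 for SLAVE, so SLAVE wins the election.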

In The End

It starts with one thing
I don’t know why
It doesn’t even matter how hard you try

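Setting the weight of a Keepalived vrrp_script turns out to be a rather tricky business. The conclusion from the analysis above: when using Keepalived's VRRP for master/slave failover, keeping the settings on both sides consistent and choosing a suitable priority value are both critical.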

In The Very Ending

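A script in a vrrp_script block is treated as a successful check when it exits with 0; any other exit value is treated as a failed check (verified against the code).

When weight is positive, the weight is added to priority while the check succeeds and is not added when the check fails.
    MASTER check fails:
        a switch happens when MASTER priority < SLAVE priority + weight.
    MASTER check succeeds:
        MASTER stays master while MASTER priority + weight > SLAVE priority + weight.
When weight is negative, the weight does not affect priority while the check succeeds; when the check fails, priority becomes priority - abs(weight).
    MASTER check fails:
        a switch happens when MASTER priority - abs(weight) < SLAVE priority.
    MASTER check succeeds:
        MASTER stays master while MASTER priority > SLAVE priority.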
