[Repost] Building an automatically scalable cluster with OpenStack autoscaling templates

http://haodong.net.cn/2014/11/autoscaling/

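Background

Heat is an OpenStack component introduced as of the H (Havana) release. It aims to provide orchestration that makes managing a cluster easier. What makes a cluster a cluster is that the VMs inside it act as a whole and serve users as a single unit. What we want to implement here is automatic scaling of such a cluster. Heat models functionality as resources: while creating a cluster it relies on components such as Nova, Ceilometer and Neutron, and each of these appears in Heat as resources described in a template file. Templates can be written in YAML or JSON; YAML is the usual choice.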

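Understanding autoscaling

The AutoScaling concept first appeared in AWS, where Auto Scaling is a web service that launches or terminates VMs according to user-defined policies, schedules and health checks. In the I (Icehouse) release, the set of HOT template resources was expanded considerably, including support for AutoScaling-related resources.
According to the official site, AutoScaling in Heat is meant to provide the following:
1. Users can create an auto-scaling cluster from a supplied AutoScaling template or from one they write themselves.
2. AutoScaling APIs are exposed. Creating a scalable cluster is then no longer tied to writing a template first; users can call the AutoScaling API directly and create the cluster they need explicitly.
3. Scaling is not limited to VM management; other resources can be scaled as well.
4. Alongside the auto-scaling cluster, other resources are offered to the user, such as load balancing, cluster management and VM configuration.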

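Current status

As of the I release, the following goals have been met:
1. An auto-scaling cluster can be created from a template.
2. Based on the template, cluster management and VM configuration can be carried out while the cluster is being created.
Because Heat defines every entity as a resource, and other resources such as load balancers, IP pools and alarms are handled much like Server or Cluster resources, scaling those other resources is also possible.
Heat in the I release does not yet expose an AutoScaling API. A basic design exists, and it is still in development and testing. Below is a screenshot from the official site: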

There are too many input and output parameters to list here; see http://docs.heatautoscale.apiary.io/ for details.
From the API above, the AutoScaling feature consists of three modules: Scaling Group, Scaling Policy and Webhook. Each is described below.

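Resources used by autoscaling

AutoScaling relies mainly on three resources, ScalingGroup, ScalingPolicy and WebHook, each with a different role.
The ScalingGroup resource defines the allowed range for the number of member resources (maxsize and minsize), along with the type of the member resource and the resources it contains. In a YAML file, a ScalingGroup resource is expressed as follows: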
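As a rough illustration only, and assuming the HOT resource type OS::Heat::AutoScalingGroup that the full template later in this article uses, a scaling group might be sketched as below. The size limits and the two parameters are example values chosen for this sketch.

```yaml
heat_template_version: 2013-05-23

parameters:
  image:
    type: string
  flavor:
    type: string

resources:
  web_server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1        # the group never shrinks below one member
      max_size: 3        # the group never grows beyond three members
      resource:          # definition applied to every member of the group
        type: OS::Nova::Server
        properties:
          flavor: {get_param: flavor}
          image: {get_param: image}
```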

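A group can define the server resource to be scaled. When a running server's resource usage exceeds a threshold, the webhook is triggered and more servers are added; exactly what happens is decided by the ScalingPolicy. The ScalingPolicy resource sets the scaling strategy for the ScalingGroup: whether to add or remove resources, by how many, and the cooldown time. In a template it is defined as follows: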
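A minimal sketch of such a policy, assuming the OS::Heat::ScalingPolicy type from the full template later in this article. This is a fragment: it presumes a group named web_server_group is defined elsewhere in the same template, and the name scale_up_policy is made up for illustration.

```yaml
resources:
  scale_up_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity   # adjust by a relative amount
      auto_scaling_group_id: {get_resource: web_server_group}
      cooldown: 60                          # seconds to wait between adjustments
      scaling_adjustment: 1                 # +1 adds a member; -1 would remove one
```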

A Webhook holds the endpoint that has to be called when an executable scaling policy is triggered. For example:
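In a HOT template the webhook is not a resource you write yourself. As the full template later in this article suggests, it can be read from the alarm_url attribute of a scaling policy. A fragment sketching that, assuming a policy named scale_up_policy (a name invented for this example) exists in the same template:

```yaml
outputs:
  scale_up_url:
    description: Webhook URL; an HTTP POST to it triggers the scale-up policy
    value: {get_attr: [scale_up_policy, alarm_url]}
```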

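Besides these three resources, scaling also needs the Alarm resource from Ceilometer. It monitors how the servers are running and, once a threshold is reached, calls the webhook URL, which carries out the action defined by the scaling policy. The Alarm resource is structured as follows: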
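A sketch of an alarm wired to a policy, modeled on the OS::Ceilometer::Alarm resources in the full template later in this article. The period and threshold are illustrative values, and scale_up_policy is a hypothetical policy assumed to exist in the same template.

```yaml
resources:
  cpu_alarm_high:
    type: OS::Ceilometer::Alarm
    properties:
      meter_name: cpu_util          # meter to watch
      statistic: avg                # aggregate over each period
      period: 60                    # length of one evaluation window, in seconds
      evaluation_periods: 1         # windows that must breach the threshold
      threshold: 50
      comparison_operator: gt       # fire when avg(cpu_util) > threshold
      alarm_actions:                # URLs called when the alarm fires
        - {get_attr: [scale_up_policy, alarm_url]}
      # Only consider samples from servers tagged with this stack's ID.
      matching_metadata: {'metadata.user_metadata.stack': {get_param: "OS::stack_id"}}
```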

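Implementing autoscaling

We start from the official template on OpenStack's GitHub, strip out the load balancer parts, and use only the autoscaling group, scaling policy and alarm components. The result is a "clean", minimal scalable prototype. The template is as follows: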

heat_template_version: 2013-05-23
description: AutoScaling Wordpress
parameters:
  image:
    type: string
    description: Image used for servers
  key:
    type: string
    description: SSH key to connect to the servers
  flavor:
    type: string
    description: flavor used by the web servers
  NetID:
    type: string
    description: Network ID for the server
resources:
  db:
    type: OS::Nova::Server
    properties:
      flavor: {get_param: flavor}
      image: {get_param: image}
      key_name: {get_param: key}
      networks:
      - network: { get_param: NetID }
  web_server_scaleup_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: web_server_group}
      cooldown: 60
      scaling_adjustment: 1
  web_server_scaledown_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: {get_resource: web_server_group}
      cooldown: 60
      scaling_adjustment: -1
  cpu_alarm_high:
    type: OS::Ceilometer::Alarm
    properties:
      description: Scale-up if the average CPU > 30% for 50 seconds
      meter_name: cpu_util
      statistic: avg
      period: 50
      evaluation_periods: 1
      threshold: 30
      alarm_actions:
        - {get_attr: [web_server_scaleup_policy, alarm_url]}
      matching_metadata: {'metadata.user_metadata.stack': {get_param: "OS::stack_id"}}
      comparison_operator: gt
  cpu_alarm_low:
    type: OS::Ceilometer::Alarm
    properties:
      description: Scale-down if the average CPU < 10% for 5 minutes
      meter_name: cpu_util
      statistic: avg
      period: 300
      evaluation_periods: 1
      threshold: 10
      alarm_actions:
        - {get_attr: [web_server_scaledown_policy, alarm_url]}
      matching_metadata: {'metadata.user_metadata.stack': {get_param: "OS::stack_id"}}
      comparison_operator: lt
  web_server_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 3
      resource:
        type: OS::Nova::Server
        properties:
          flavor: {get_param: flavor}
          image: {get_param: image}
          key_name: {get_param: key}
          metadata: {"metering.stack": {get_param: "OS::stack_id"}}
          networks:
          - network: { get_param: NetID }
outputs:
  scale_up_url:
    description: >
      This URL is the webhook to scale up the autoscaling group.  You
      can invoke the scale-up operation by doing an HTTP POST to this
      URL; no body nor extra headers are needed.
    value: {get_attr: [web_server_scaleup_policy, alarm_url]}
  scale_dn_url:
    description: >
      This URL is the webhook to scale down the autoscaling group.
      You can invoke the scale-down operation by doing an HTTP POST to
      this URL; no body nor extra headers are needed.
    value: {get_attr: [web_server_scaledown_policy, alarm_url]}
  ceilometer_query:
    value:
      str_replace:
        template: >
          ceilometer statistics -m cpu_util
          -q metadata.user_metadata.stack=stackval -p 600 -a avg
        params:
          stackval: { get_param: "OS::stack_id" }
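The template declares four parameters (image, key, flavor and NetID) that must be supplied when the stack is created. One way to do that, sketched here under the assumption that a Heat environment file is used, is shown below. The file name and every value are placeholders, not taken from the original setup.

```yaml
# env.yaml (hypothetical file name); every value below is a placeholder
parameters:
  image: my-web-image
  key: my-keypair
  flavor: m1.small
  NetID: 00000000-0000-0000-0000-000000000000
```

With the heat command-line client, such a file would typically be passed with -e next to the template, for example heat stack-create -f autoscaling.yaml -e env.yaml mystack, where the template file name and stack name are again only examples.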

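This template does the following:
When average CPU utilization over 50 seconds exceeds 30% (the threshold set in cpu_alarm_high), one VM is added to the cluster.
When average CPU utilization over five minutes falls below 10%, one VM is removed automatically.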
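You may run into the following problems during creation:
1. When you run ceilometer alarm-list, an alarm may be stuck in the insufficient data state. A likely cause is that Ceilometer polls the meter too infrequently; the interval parameter in /etc/ceilometer/pipeline.yaml controls the polling interval, which defaults to 600 seconds, and can be lowered.
2. Problems during creation can be tracked down in /var/log/ceilometer/alarm-evaluator.log and compute.log. For example, I had left one parameter out of the template, so no data could be requested from the meter. After struggling with this for several days, I finally found the hint in compute.log.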
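For the first problem, the change in /etc/ceilometer/pipeline.yaml might look like the excerpt below. This is only a sketch: it assumes the sources/sinks layout of the file, which differs between releases, and the entry names may not match your installation. The only point is the interval value on the source that produces the CPU meter.

```yaml
sources:
    - name: cpu_source
      interval: 60      # polling interval in seconds; the default is 600
      meters:
          - "cpu"
      sinks:
          - cpu_sink
```

An alarm whose period is shorter than this interval will presumably still see too few samples, so it seems reasonable to keep the alarm period at least as long as the polling interval.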

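Once the Heat stack is up, you can check the alarm state with alarm-list, alarm-history -a and alarm-show -a. My results were as follows:

At this point the scalable cluster has been created. Next it needs to be tested to see whether the number of VMs in the cluster changes dynamically as CPU load rises.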

Testing

The test code is below. Its purpose is to push CPU usage on a single machine above 75%:

import java.io.IOException;

public class CPUTest {
    public static void main(String[] args) {
        CPUTestThread cpuTestThread = new CPUTestThread();
        // Start three worker threads that share the same Runnable.
        for (int i = 0; i < 3; i++) {
            Thread cpuTest = new Thread(cpuTestThread);
            cpuTest.start();
        }
        try {
            // Opens the Windows Task Manager to watch CPU usage. On systems
            // without this command an IOException is thrown and printed.
            Runtime.getRuntime().exec("taskmgr");
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }
}

class CPUTestThread implements Runnable {
    public void run() {
        int busyTime = 10;        // milliseconds of busy-waiting per cycle
        int idleTime = busyTime;  // milliseconds of sleep per cycle
        long startTime = 0;
        while (true) {
            startTime = System.currentTimeMillis();
            System.out.println(System.currentTimeMillis() + "," + startTime + "," + (System.currentTimeMillis() - startTime));
            // Spin until busyTime milliseconds have elapsed.
            while ((System.currentTimeMillis() - startTime) <= busyTime) {}
            try {
                Thread.sleep(idleTime);
            } catch (InterruptedException e) {
                System.out.println(e);
            }
        }
    }
}

The test passed.
