2.16.3.1.21. Yardstick Test Case Description TC092

SDN Controller resilience in HA configuration

test case id

OPNFV_YARDSTICK_TC092: SDN controller resilience and high availability in HA configuration

test purpose

This test validates SDN controller node high availability by verifying that there is no impact on the data plane connectivity when one SDN controller fails in an HA configuration, i.e. all existing configured network services (DHCP, ARP, L2, L3VPN, Security Groups) should continue to operate between the existing VMs while one SDN controller instance is offline and rebooting.

The test also validates that network service operations, such as creating a new VM in an existing or new L2 network, remain operational while one instance of the SDN controller is offline and recovering from the failure.

test method

This test case:
  1. fails one instance of an SDN controller cluster running in an HA configuration on the OpenStack controller node

  2. checks that the already configured L2 connectivity between existing VMs is not impacted

  3. verifies that the system never loses the ability to execute virtual network operations, even while the failed SDN controller is still recovering
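
The attackers, monitors and operations described in the sections below are wired together in the test case file. A minimal sketch of that file's shape, assuming the “GeneralHA” scenario type used by comparable Yardstick HA test cases (all keys and values shown are illustrative, not the final TC092 definition):

  scenarios:
    -
      type: "GeneralHA"
      options:
        attackers: []      # see “attackers” below
        monitors: []       # see “monitors” below
        operations: []     # see “operations” below
        steps: []          # ordered sequence, see “test sequence” below
      nodes:
        node1: node1.LF    # maps the logical name to a pod.yaml node
      runner:
        type: Duration
        duration: 1
      sla:
        outage_time: 5     # illustrative threshold, in seconds
        action: monitor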

attackers

In this test case, an attacker called “kill-process” is needed. This attacker includes three parameters:

  1. fault_type: which is used for finding the attacker’s scripts. It should be set to ‘kill-process’ in this test case

  2. process_name: which should be set to the name of the SDN controller process

  3. host: which is the name of the control node where the opendaylight process is running

example:
  • fault_type: “kill-process”

  • process_name: “opendaylight-karaf” (TBD)

  • host: node1
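
In the test case file, these parameters appear under an “attackers” list. A sketch following the kill-process attacker syntax of other Yardstick HA cases (the “key” field is an illustrative reference name, and the process name is still TBD as noted above):

  attackers:
    -
      fault_type: "kill-process"
      key: "kill-sdn-process"              # illustrative reference name
      process_name: "opendaylight-karaf"   # TBD
      host: node1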

monitors

In this test case, the following monitors are needed:
  1. ping_same_network_l2: a monitor pinging traffic between VMs in the same neutron network

  2. ping_external_snat: a monitor pinging traffic from the VMs to an external destination (e.g. google.com) to verify SNAT functionality

  3. SDN controller process monitor: a monitor checking the state of a specified SDN controller process. It measures the recovery time of the given process.
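
A sketch of how these monitors might be declared in the test case file. The process monitor follows Yardstick’s standard “process” monitor syntax; the ping monitors are shown with the generic monitor pattern, and the script keys, IP addresses and times are illustrative assumptions:

  monitors:
    -
      monitor_type: "general-monitor"      # ping between VMs in the same network
      key: "ping-same-network-l2"
      monitor_key: "ip-status"             # monitor script key, illustrative
      host: node1
      monitor_time: 10
      parameter:
        destination_ip: "192.168.0.123"    # peer VM IP, illustrative
    -
      monitor_type: "general-monitor"      # ping from a VM to an external host
      key: "ping-external-snat"
      monitor_key: "ip-status"             # illustrative
      host: node1
      monitor_time: 10
      parameter:
        destination_ip: "8.8.8.8"          # external destination, illustrative
    -
      monitor_type: "process"              # SDN controller process state
      key: "sdn-process-status"
      process_name: "opendaylight-karaf"   # TBD
      host: node1
      monitor_time: 30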

operations

In this test case, the following operations are needed:
  1. “nova-create-instance-in_network”: create a VM instance in one of the existing neutron networks.
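
A sketch of the corresponding “operations” entry, modeled on the general-operation pattern of other Yardstick HA cases; the operation_key, helper script and parameter values are illustrative assumptions:

  operations:
    -
      operation_type: "general-operation"
      key: "nova-create-instance-in_network"
      operation_key: "nova-create-instance"         # underlying helper script, illustrative
      action_parameter:
        serverconfig: "tc092-vm cirros m1.tiny test-network"   # name image flavor network, illustrative
      rollback_parameter:
        serverconfig: "tc092-vm"                    # VM to clean up afterwards, illustrative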

metrics

In this test case, there are two metrics:
  1. process_recover_time: which indicates the maximum time (in seconds) from the process being killed until it is recovered

  2. packet_drop: the number of packets dropped during the test, as measured by the monitors using pktgen.
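
In the test case file these metrics are enforced through SLA entries attached to the corresponding monitors, for example (threshold value is illustrative):

  sla:
    max_recover_time: 30    # upper bound on process_recover_time, in seconds

on the process monitor, with an analogous outage bound (e.g. max_outage_time) on the ping monitors that detect packet drop.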

test tool

Developed by the project. Please see folder: “yardstick/benchmark/scenarios/availability/ha_tools”

references

TBD

configuration

This test case needs two configuration files:

  1. test case file: opnfv_yardstick_tc092.yaml

    • Attackers: see above “attackers” description

    • Monitors: see above “monitors” description

      • waiting_time: which is the time (in seconds) from the process being killed until the monitors are stopped

    • SLA: see above “metrics” description

  2. POD file: pod.yaml

    The POD configuration should be recorded in pod.yaml first. The “host” item in this test case uses the node name from pod.yaml.
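
A minimal pod.yaml entry for the control node referenced by “host” could look as follows (every value here is a placeholder for the actual POD):

  nodes:
    -
      name: node1
      role: Controller
      ip: 192.168.100.101                 # placeholder
      user: root                          # placeholder
      key_filename: /root/.ssh/id_rsa     # placeholder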

test sequence

Description and expected result

pre-action

  1. The OpenStack cluster is set up with an SDN controller running in a three node cluster configuration.

  2. One or more neutron networks are created with two or more VMs attached to each of the neutron networks.

  3. The neutron networks are attached to a neutron router which is attached to an external network towards the DCGW.

  4. The master node of the SDN controller cluster is known.

step 1

Start IP connectivity monitors:
  1. Check the L2 connectivity between the VMs in the same neutron network.

  2. Check the external connectivity of the VMs.

Each monitor runs in an independent process.

Result: The monitor info will be collected.

step 2

Start the attacker: SSH to the VIM node and kill the SDN controller process specified by the attacker’s “process_name” parameter.

Result: One SDN controller service will be shut down.

step 3

Restart the SDN controller.

step 4

Create a new VM in the existing Neutron network while the SDN controller is offline or still recovering.

step 5

Stop the IP connectivity monitors after a period of time specified by “waiting_time”.

Result: The monitor info will be aggregated.

step 6

Verify the IP connectivity monitor results.

Result: The IP connectivity monitors should not report any packet drop failures.

step 7

Verify that process_recover_time, which indicates the maximum time (in seconds) from the process being killed until it is recovered, is within the SLA. This step blocks until either the process has recovered or a timeout occurs.

Result: process_recover_time is within the SLA limits; if not, the test case fails and stops.

step 8

Start IP connectivity monitors for the new VM:

  1. Check the L2 connectivity from the existing VMs to the new VM in the Neutron network.

  2. Check connectivity from one VM to an external host on the Internet to verify SNAT functionality.

Result: The monitor info will be collected.

step 9

Stop the IP connectivity monitors after a period of time specified by “waiting_time”.

Result: The monitor info will be aggregated.

step 10

Verify the IP connectivity monitor results.

Result: The IP connectivity monitors should not report any packet drop failures.
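
In a GeneralHA-style test case file, this sequence is expressed as an ordered “steps” list that references the attackers, monitors and operations above by key. An abbreviated sketch (keys and indices are illustrative and do not cover every step):

  steps:
    -
      actionKey: "ping-same-network-l2"              # step 1: start connectivity monitors
      actionType: "monitor"
      index: 1
    -
      actionKey: "kill-sdn-process"                  # step 2: fail one SDN controller instance
      actionType: "attacker"
      index: 2
    -
      actionKey: "nova-create-instance-in_network"   # step 4: create a VM during recovery
      actionType: "operation"
      index: 3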

test verdict

Fails only if SLA is not passed, or if there is a test case execution problem.