Internal Azure Load Balancer: How to load balance dependent services on one backend pool

Today I would like to present a common issue. Let’s assume the following scenario:

  1. The customer has an on-premises load-balanced application,
  2. This application consists of several modules,
  3. Every node of the load-balancing pool hosts a set of modules,
  4. Modules operate independently: they do not rely on any single host for the other modules they may require, but use the load balancer as an intermediary to reach them,
  5. Some modules use services provided by other modules,
  6. Communication is TCP-based; modules use more protocols than just HTTPS,
  7. The customer wants to lift and shift the application's infrastructure to Azure.

A good example of this scenario is outlined in the following KB by CyberArk: CyberArk KB article.
Microsoft describes this case in the following Azure Troubleshooting guide.

Microsoft makes reference to this issue by stating the following:

If an internal load balancer is configured inside a virtual network, and one of the participant backend VMs is trying to access the internal load balancer frontend, failures can occur when the flow is mapped to the originating VM. This scenario isn’t supported.

And of course, this is true. Let's evaluate the root cause of this issue.
As stated before, our customer wants to perform a lift-and-shift migration of their CyberArk infrastructure. In other words, they want to use as many built-in cloud services as possible. An Azure Load Balancer is the proposed solution.
An example of the architectural design by CyberArk is shown below:
Failing Azure Load Balancer architecture

First, let me present common Azure Load Balancer features and limitations:

  1. An Azure Load Balancer operates only at Layer 4 of the OSI model
  2. An Azure Load Balancer keeps a workload balanced using the standard inputs – source and destination IP addresses and TCP ports
  3. An internal Azure Load Balancer uses a virtual IP (VIP) from the customer's vNet, which keeps traffic internal
  4. An Azure VM can be part of only one internal and one external load balancer
  5. Only the primary network interface (vNIC) of a given Azure VM can be part of a load balancer backend pool, and only this interface can have the default gateway configured
  6. An Azure Load Balancer does not NAT outbound traffic. Simply put, it has no outbound IP address, only an inbound one. This means that the Azure Load Balancer changes only the destination IP, from its own VIP to the virtual machine's primary IP, while processing inbound NAT or load-balancing rules. From the Azure VM's perspective, the ALB is completely transparent.

The last limitation is the root cause of our problem. Microsoft states (let's repeat it, as it is crucial):

failures can occur when the flow is mapped to the originating VM.

This is because, after an IP packet leaves the Azure Load Balancer, it would have the same source and destination IP address. According to RFC 791 section 3.2, this kind of situation should not happen, and the Azure backbone drops such IP packets.

So, knowing that, we can try to build a workaround. From an architectural point of view, we need to create conditions where the source IP differs from the destination IP. This means that our VM must have an additional IP address.
Additional IPs on an Azure vNIC are supported, but only the primary IP of a given interface is used as the source of outgoing IP packets. This means that we cannot leverage this feature in our scenario.
Another possibility is to add a vNIC to the Azure VM. This is also supported: an Azure VM can have one primary vNIC and up to eight secondary vNICs. Secondary vNICs can be attached to any subnet within the primary vNIC's vNet (an Azure VM cannot span more than one vNet). This is enough for our purposes: we create an additional subnet and attach a secondary vNIC to it.
Secondary vNICs have one more constraint: they cannot have a default route. In fact, by default they cannot have any route at all. A secondary vNIC can receive traffic, but it is hard to send anything through it. So this naive approach will not work: the source IP will still be taken from the primary vNIC's IP address.

The trick is to change the VIP of the Azure Internal Load Balancer.
This is the only way to force the IP packet through the secondary network adapter. When the Azure Internal Load Balancer's VIP is in the same subnet as the secondary vNIC, we can leverage default IP behavior: local addresses are reached directly through the local network segment. Normally this is just an optimization, but in this scenario it is our only choice.
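Below is a minimal sketch of this idea using the Az PowerShell module. All names, the address prefix, and the resource group are hypothetical; the sketch only illustrates placing the ILB frontend (VIP) in the same dedicated subnet as the secondary vNICs:

# Hypothetical names: rg-app, vnet-app, snet-ilb, vm1-nic2, ilb-app
$rg  = "rg-app"
$loc = "westeurope"

# Add a dedicated subnet for the ILB frontend and the secondary vNICs
$vnet = Get-AzVirtualNetwork -Name "vnet-app" -ResourceGroupName $rg
Add-AzVirtualNetworkSubnetConfig -Name "snet-ilb" -AddressPrefix "10.0.2.0/24" -VirtualNetwork $vnet
$vnet = Set-AzVirtualNetwork -VirtualNetwork $vnet
$subnet = Get-AzVirtualNetworkSubnetConfig -Name "snet-ilb" -VirtualNetwork $vnet

# Secondary vNIC for one of the backend VMs, attached to the new subnet
$nic2 = New-AzNetworkInterface -Name "vm1-nic2" -ResourceGroupName $rg -Location $loc -SubnetId $subnet.Id

# Internal Load Balancer whose frontend (VIP) lives in the same subnet as the secondary vNICs
$fe = New-AzLoadBalancerFrontendIpConfig -Name "ilb-frontend" -SubnetId $subnet.Id
$be = New-AzLoadBalancerBackendAddressPoolConfig -Name "ilb-backend"
New-AzLoadBalancer -Name "ilb-app" -ResourceGroupName $rg -Location $loc -Sku Standard -FrontendIpConfiguration $fe -BackendAddressPool $be

Note that attaching a secondary vNIC to an existing VM requires the VM to be deallocated first, and the actual load-balancing rules still have to be added on top of this skeleton.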

The final architecture is shown in the following picture:

Happy Load Balancing!

Windows Server 2016 RDP certificate configuration

On Windows Server 2016, as well as on previous versions, there is no built-in utility for RDP certificate configuration. We can do this by leveraging the WMI interface, and the simplest way to accomplish that is a PowerShell script. The configuration script can be found below.

# Subject of the certificate to bind to the RDP listener
$CN = "CN=localhost"
# WMI object describing the RDP-Tcp listener settings
$RdpSetting = Get-WmiObject -Class "Win32_TSGeneralSetting" -Namespace root\cimv2\terminalservices -Filter "TerminalName='RDP-tcp'"
# Find the certificate with the given subject in the local machine store
$thumbprint = (Get-ChildItem -Path Cert:\LocalMachine\My | ? Subject -eq $CN).Thumbprint
# Bind the certificate to the RDP listener
Set-WmiInstance -Path $RdpSetting.__PATH -Arguments @{SSLCertificateSHA1Hash="$thumbprint"}
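To verify that the change took effect, the same WMI class can be queried again; this is just an optional sanity check:

# Read back the thumbprint currently bound to the RDP-Tcp listener
(Get-WmiObject -Class "Win32_TSGeneralSetting" -Namespace root\cimv2\terminalservices -Filter "TerminalName='RDP-tcp'").SSLCertificateSHA1Hash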

The presented method is universal, so it can be used on client versions such as Windows 10 as well as on Windows Server 2016.

“Root element is missing” error during SMA Runbook startup

This error is most often caused by a nested runbook invocation from an InlineScript or function context. Such an action is not permitted: we can call an InlineScript or a function from a workflow, but in the other direction we have to use one of the solutions described here.
I have just come across another cause of this error.
In PowerShell, the following syntax is normally available:

# Index directly into the expression result, without an intermediate variable
$var = (Get-Service)[0]

It uses the result of the expression in parentheses as an array, which is indexed without creating an intermediate variable.
Based on my experience and some tests, I can state that this syntax causes the “Root element is missing” error.
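A simple way around it, which worked in my tests, is to do what the syntax above skips: assign the expression result to an intermediate variable first and index that variable:

# Workaround sketch: store the result in a variable before indexing it
$services = Get-Service
$var = $services[0]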

Windows Terminal Services Logon “Access Denied”

I would like to describe the resolution of a problem with Terminal Services. When you are using Terminal Services in conjunction with a License Server on a separate machine, you may experience the following symptoms:

  • During the logon process, the user receives the message “Access denied.” It is shown instead of the logon screen, just after the “Welcome” message.
  • There are no related error messages in the Application and System event logs.
  • In the TerminalServices-LocalSessionManager event log, the following message is correlated with the user's logon attempt: “Session X has been disconnected, reason code 12”, where X is the number of the logon session granted to the logon attempt by the Session Manager.
  • This problem may occur on Windows 2008 R2 as well as on 2012 (R2).
  • A GPO policy update failure often occurs simultaneously.

A temporary solution to this problem may be to set the following registry value:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\
IgnoreRegUserConfigErrors (DWORD) = 1

After adding this registry value, you need to reboot the affected server.
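If you prefer to script the change, a small PowerShell equivalent (run elevated on the affected server) could look like this:

# Create or overwrite the IgnoreRegUserConfigErrors DWORD value
New-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\Terminal Server' -Name 'IgnoreRegUserConfigErrors' -PropertyType DWord -Value 1 -Force
# Reboot is required for the change to take effect
Restart-Computer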

Once the poor user experience is mitigated, you can calmly start the real diagnosis of what is wrong in your environment. In one of the cases, the real issue was a mistake in the Windows Firewall configuration of the domain controllers, applied by GPO: the offending GPO contained a rule denying “SMB over TCP” traffic.
It may be something different in your case, but it will always be something connected with the domain controllers.

Failover Cluster Generic Script Resource Failed 0x80070009

Hi All,

I have found an interesting behavior of the Failover Clustering feature in Windows 2012 R2. The message shown in the console and in the logs in this situation is quite misleading.
When you have configured a Generic Script Resource, as described in this Microsoft Product Team blog, you may find your resource down with the following status message:

The storage control block is invalid.

This is shown in the screenshot below:
Cluster resource status
When we display the extended error message, we find out that the error code is:

0x80070009

Extended error information
A similar entry can be found in the cluster event log:
Cluster system log

In this situation we should first verify that our script does not return the value 0x9 from any entry point function. Once we have made sure of that, this error tells us that we have made a syntax error in the Visual Basic script, and as a result the cluster resource manager was unable to compile and execute the script.

That is why checking a script (every script, not only Visual Basic!) is a good general practice. We can verify the Visual Basic script's behavior by running it from the command line, with the following command:

cscript.exe C:\full\path\to\file.vbs

A correctly written script should write nothing to the console except the script host banner, because a Generic Script Resource should contain only function definitions and no calls to them (this is a property of all callbacks).
In case of any syntax error, the command execution returns an error similar to this:
Script validation error

Have a nice Scripting Time!

Resources migration between subscriptions within MS Azure

In the Microsoft Azure cloud, any user can own more than one subscription. A subscription is the smallest payment unit; it is where you configure the payment method and define resource ownership boundaries.
Every subscription has its own billing cycle, so you receive as many bills as you have subscriptions. There is an exception for Enterprise Agreements, but that is out of scope of this text.
The important thing is that you can receive, for example, three bills, each for two euros. It can be really annoying.
This is the reason for consolidating subscriptions.
The only way to do that is to migrate resources from the other subscriptions into one chosen subscription.
On the Internet, there are plenty of questions about this, but the responses are unclear or outdated.
Fortunately, Microsoft has described it in the documentation, which can be found here.
One thing is worth noting: every resource type has its own migration policy, so whether and how a given resource type can be moved depends on it.
The full list of resource types that support migration can be found here.
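As a rough illustration of what the move itself looks like with the Az PowerShell module (the resource group names, resource name, and subscription ID below are placeholders only):

# Move a single resource to another subscription; names and IDs are examples
$resource = Get-AzResource -ResourceGroupName "rg-source" -Name "storageaccount01"
Move-AzResource -DestinationSubscriptionId "00000000-0000-0000-0000-000000000000" -DestinationResourceGroupName "rg-target" -ResourceId $resource.ResourceId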

Certificate Request generation for Microsoft Enterprise CA by openssl

A long time ago I wrote about generating Certificate Signing Requests from non-Windows machines. The main goal was to have such a request signed by a Microsoft Enterprise CA, and I mentioned vSphere infrastructure as an example.
I was recently asked a similar question. New vSphere versions require the Subject Alternative Name extension to exist in the certificate. The question was how to configure openssl to implement both functionalities.
Reaching the goal was quite simple, but not trivial. We can define several sections containing settings for request extensions; however, only one can be used when generating a specific certificate request.
It is good practice to reorganize the openssl configuration file so that it is dedicated to generating a single server certificate. In this way, we obtain a template for each server instance.
An example configuration file may look as follows:

openssl_conf = openssl_init

[ openssl_init ]
oid_section = new_oids

[ req ]
default_bits = 2048
default_keyfile = rui.key
distinguished_name = req_distinguished_name
encrypt_key = no
prompt = no
string_mask = nombstr
req_extensions = v3_req

[ new_oids ]
MsCaCertificateTemplate = 1.3.6.1.4.1.311.20.2

[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = digitalSignature, keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth, clientAuth
subjectAltName = DNS:server01, DNS:server01.domena.test
MsCaCertificateTemplate = ASN1:PRINTABLESTRING:VMwareCertificate

[ req_distinguished_name ]
countryName = PL
stateOrProvinceName = Malopolskie
localityName = Krakow
0.organizationName = Firma
organizationalUnitName = Oddzial
commonName = server01.domena.test

The most important parts of the config file are:

  • Line 1 (openssl_conf). Identifies the global configuration section.
  • Line 4 (oid_section). Points to the OID definition section. It is the only entry in its section.
  • Line 13 (req_extensions). Points to the extension definition section; these extensions will be added to the certificate request body.
  • Line 15 ([ new_oids ]). Opens the OID definition section.
  • Line 16 (MsCaCertificateTemplate). Defines the OID registered and used by Microsoft to mark the certificate template extension.
  • Line 22 (subjectAltName). Defines the alternative names of the server. Of course, prefixes other than DNS can be used.
  • Line 23. Defines the name of the certificate template to be used when signing the certificate. It is important to remember that this must be the “Certificate Template Name”, as opposed to the “Certificate Template Display Name”.

The rest of the file is a standard body, similar to any config file designed for generating certificate requests.
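With such a file in place (saved, for example, as server01.cnf), the request itself can be generated with a single command; the file names here are only examples:

openssl req -new -config server01.cnf -out server01.csr

The private key is written to rui.key, as defined by default_keyfile, and the resulting server01.csr can then be submitted to the Microsoft Enterprise CA.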