Options to set waagent.conf variables (Linux)
The Linux (Ubuntu) machines are not really persistent across shutdown and/or capture: waagent.conf is not read properly (even when capturing an image, a new image based on the capture does not include the changed settings in waagent.conf).
It would be nice to have the waagent.conf settings configurable at startup, when creating the virtual machine.
There are a few bugs we are working through to address these issues:
1) Hostname changes are not persistent across reboots on Ubuntu. This is an issue we are working through on cloud-init which we should have fixed soon: https://bugs.launchpad.net/ubuntu/+source/walinuxagent/+bug/1375252
2) Customers cannot use the waagent.conf file to change the location of the local disk mount point or set up swap space in a consistent way. We are working through how to fix this with these two bugs:
The (more or less) final conclusion should be:
a) with respect to the swap issue:
- introduce cloud-init version 0.7.6 or higher (https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1410824)
- create swap partition with cloud-config (https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1374166)
and note that both solutions are robust in Trusty (Ubuntu 14.04.1 LTS) and during the shutdown and startup sequence(s),
b) with respect to local disk mount point:
- cloud-config settings can be adjusted in order to customize mount points for (all) devices
c) with respect to using the waagent.conf variables:
- often not a satisfactory solution, mostly due to ResourceDisk.Format settings,
- a customizable or extendable (preferred) Datasource (as used by cloud-init) is the key to success
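For the cloud-config route in a) and b), a minimal Python sketch that emits a user-data document with swap and mount settings. It assumes a cloud-init version (such as the 0.7.6+ mentioned above) whose mounts module understands the `swap` keys (`filename`, `size`, `maxsize`, sizes in bytes) and the `mounts` list, and that `ephemeral0` is cloud-init's name for the Azure resource disk; the file name, sizes and mount point are illustrative. JSON is used because it is valid YAML, so no third-party YAML library is needed.

```python
import json

def build_cloud_config(swap_file="/mnt/swap.img", swap_bytes=2 * 1024**3,
                       resource_mount="/mnt"):
    """Return a #cloud-config document written in JSON syntax (accepted as YAML)."""
    config = {
        # Swap file settings for cloud-init's mounts module (assumed keys)
        "swap": {
            "filename": swap_file,
            "size": swap_bytes,
            "maxsize": swap_bytes,
        },
        # Mount the resource disk (ephemeral0) at a custom mount point
        "mounts": [
            ["ephemeral0", resource_mount, "auto", "defaults,nofail"],
        ],
    }
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"
```

The resulting text would be passed as custom data when creating the VM, e.g. `build_cloud_config(swap_file="/mnt/data/swap.img", resource_mount="/mnt/data")`.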
Some bugs are indeed present in cloud-init and/or cloud-config. I will post these on Launchpad.
However, the aforementioned bugs are unrelated to the swap space created by waagent.
I agree with almost all of your post, with (in essence) two exceptions:
1 - the fact that manual entries in /etc/fstab are overridden by the waagent script (in many cases),
2 - the fact that waagent (and waagent.conf) dominates cloud-init (and cloud-config) and not the other way around.
I have spent numerous hours testing the swap issue, since it is a good example of minor caveats in the waagent script that have a huge impact on user experience.
The swap issue is really a result of the swap being defined in code in the ActivateResourceDisk part of the waagent script, which prevents swap and disk settings from being configured separately through waagent.conf.
A rewrite of code should do the trick, at least for obtaining persistent swap space.
In addition to the above, a number of small remarks.
I have noticed that the waagent script can be changed on a created VM. That is a danger to the whole Azure concept, since VMs can be broken by faulty code in waagent.
I have also noticed that the multiple options for custom configuration of a VM do not always work properly. In essence, it is strange to have a CustomScript Extension, CustomData options, scripts (Role.StateConsumer), etc., and to conclude that they do not work as intended and/or are not flexible enough.
Finally, I have noticed that the real issues are being overshadowed by minor ones. For instance, custom mount points for disks seem to be redirecting the discussion to cloud-config (not necessary; see the description of bug 1374115, which is really irrelevant, given the option to set waagent.conf settings and restart the VM), and the swap space issue seems to be redirecting the discussion to assumptions rather than facts (see the description of bug 1374166). Fact is that during provisioning a VM creates/mounts disks and/or swap space with the waagent script, and fact is that the settings at provisioning can be overridden by changing waagent.conf and restarting.
In short, in my humble opinion, the emphasis should be on the really important issues and their solutions, not on vague bug descriptions and/or vague conclusions of certain persons.
The swap space issue is, ironically, an important issue (not for the sake of swap space, but for the sake of the waagent script, which ignores specific configurable settings in waagent.conf if one other setting, ResourceDisk.Format, is set to "n").
In essence, one setting in configuration should not be able to discard other settings.
PS: I am working on a change to the waagent script that enables creation and resizing of (persistent) swap space by changing settings in the waagent.conf file. Note that it would be better to be able to create and resize swap in the Azure portal, without having to restart the VM.
Stephen Zarkos commented
>I do understand the cloud-init reference and, given that you state yourself that the "traditional method using waagent.conf is not working", I must emphasize that the solution for issues should not be (only) searched in cloud-init/cloud-config related improvements.
In the case of Ubuntu it must be. There is no good method to share the responsibility of configuring the resource disk and swap space between the two without hitting race conditions. Canonical chose to enhance their distribution by porting cloud-init, which is great for many users (particularly those coming from other clouds), but they need to also make things easier for those users like yourself who aren't using cloud-configs to configure their ephemeral disk.
>For instance, the swap part of code in the AbstractDistro class will not be read (executed) at all, if ResourceDisk.Format=n.
Sure, that would be an easy change, but to be clear that design is on purpose. The reason for having this in the agent is that the resource disk may need to be reformatted many times during the life of the VM, and so this is a dependency if you want to do swap correctly on Azure. If you want swap on an OS or data disk you can create this yourself and add the entry to fstab, but it won't be a very good experience compared to hosting it on the local temporary disk. Because it will be a bad experience, IMHO the agent should not support this.
> Cooperation with Canonical to fix cloud-init related issues is not really effective if the waagent script and corresponding waagent.conf do not work as intended by Microsoft.
I do see what you're saying here. But the crux of the issue is that Provisioning.Enabled=n as it is set in Ubuntu images is actually working as intended. This obviously hands over control to cloud-init, which is also intentional as we want distributions to own (to some level) and enhance their offering on Azure. But if we are going to continue to do this then Canonical must resolve these issues you've mentioned.
Indeed, I found out that other scripts do work (using many script formats to create swap at boot).
By the way, why not use the CustomScript Extension (or something similar) to provide cloud-config/cloud-init settings?
That would be a more flexible solution, at least for Ubuntu (instead of changing the waagent script and/or expanding waagent.conf with numerous variables).
I do understand the cloud-init reference and, given that you state yourself that the "traditional method using waagent.conf is not working", I must emphasize that the solution for issues should not be (only) searched in cloud-init/cloud-config related improvements.
The Ubuntu case is a good illustration of the waagent script not being up to date and/or not meeting all requirements; note that other distros are very likely to have similar issues soon.
Restructuring waagent script is cumbersome, but can resolve many issues that are not obvious.
For instance, the swap part of code in the AbstractDistro class will not be read (executed) at all, if ResourceDisk.Format=n.
In the current logical structure of the waagent script, improvements to classes and other code have to result in specific behaviour and/or overriding of specific settings, including those of cloud-init.
However, simple improvements do not yield any results.
In a sense, the logical structure may be desirable, but the code itself is not.
Maintaining the current logical structure in essence obscures code errors AND the fact that many lines of code are redundant (i.e. they repeat what other lines of code already do).
Cooperation with Canonical to fix cloud-init related issues is not really effective if the waagent script and corresponding waagent.conf do not work as intended by Microsoft.
In short, the waagent script should be in the driver's seat, with clean and simple code.
Stephen A. Zarkos commented
I'm not sure if it was clear before, but on Ubuntu VMs the Linux agent does not do any provisioning nor handling of the resource disk, this is handled entirely by cloud-init - hence the bug reports on Ubuntu's Launchpad to fix some of the issues you mentioned. This means that today you can use a cloud-config to do whatever you like with the ephemeral disk in a manner consistent with other clouds, but also means that our traditional method using waagent.conf isn't working. We certainly need to continue working with Canonical to fix this.
Regarding splitting of the agent script - currently, the waagent script itself is logically split by using Python classes to segment distribution-specific functions and workflow from generic code. For example, when running on a SUSE system only the SUSE class will be instantiated, which inherits the functions from the AbstractDistro class. The SUSE-specific functions will of course override the inherited functions, allowing us to create distribution-specific workflows with a minimal amount of changes. Breaking it up into multiple files would probably make all this easier to read and understand, but of course wouldn't change the requirement for this logical division.
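The class-based split described above can be illustrated with a stripped-down sketch. The class names, method names and commands here are hypothetical and greatly simplified, not the actual waagent code: the generic class supplies the default workflow, each distribution subclass overrides only the step that differs, and only one class is instantiated at runtime.

```python
class AbstractDistro:
    """Generic behaviour shared by all distributions (simplified, hypothetical)."""

    def restart_ssh_command(self):
        return "service sshd restart"

    def provision(self):
        # Generic workflow that calls an overridable, distribution-specific step
        return ["generic provisioning", self.restart_ssh_command()]


class SuseDistro(AbstractDistro):
    def restart_ssh_command(self):
        # Overrides only the step that differs on SUSE
        return "rcsshd restart"


class UbuntuDistro(AbstractDistro):
    def restart_ssh_command(self):
        return "service ssh restart"


DISTRO_CLASSES = {"suse": SuseDistro, "ubuntu": UbuntuDistro}


def get_distro(name):
    """Instantiate only the class for the running distribution."""
    return DISTRO_CLASSES.get(name.lower(), AbstractDistro)()
```

Splitting this into one file per distribution would keep the same inheritance; only the packaging would change.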
Ning Kuang [MSFT] (Program Manager) commented
Yes, it supports other scripts, as long as they run on the current platform.
In addition, if you want more flexibility, you can also consider the CustomScript Extension:
It allows you to run script from the location you’ve specified, with parameters you’ve entered.
If I am not mistaken, Role.StateConsumer = <Path to My Script> only allows Python scripts.
Is there a reason for not allowing simple Bash scripts, which are common (and easy) on Linux?
Apologies for the late reaction.
With respect to the local disk mount point and/or swap space issues, the following.
These issues are interrelated, as you are probably aware.
The waagent Python script adds the mount of the (standard) resource disk as the last line of /etc/fstab.
Any manually added lines in /etc/fstab for swap space and/or the (standard) resource disk are (therefore) without effect.
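To make such conflicts visible on a VM, a small diagnostic sketch (the helper name is hypothetical) that lists /etc/fstab mount points with more than one entry. If waagent appends its own line for /mnt, a manually added line for the same mount point shows up as a duplicate. It only parses text and does not change the system; swap lines are skipped because several of them legitimately share the "none" mount point.

```python
from collections import defaultdict


def duplicate_mount_points(fstab_text):
    """Return {mount point: [line numbers]} for mount points listed more than once."""
    seen = defaultdict(list)
    for lineno, raw in enumerate(fstab_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 3 or fields[2] == "swap":
            continue  # malformed line or swap entry
        seen[fields[1]].append(lineno)
    return {mp: lines for mp, lines in seen.items() if len(lines) > 1}
```

On a VM it could be called as `duplicate_mount_points(open("/etc/fstab").read())`.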
The issues on hand are, in short, the result of the behaviour of the waagent script.
This behaviour (or, in a sense misbehaviour) can be overcome by setting ResourceDisk.Format=y, resulting in persistent swap space and (standard) allocation of (all) disks. However, this solution does not allow for persistent data storage on the (standard) resource disk.
In essence, the two issues (custom mount points and persistent swap space) are unrelated.
The solution to the "custom mount point" issue could be to allow for (additional) input variable(s) for the waagent script, given the fact that the "def ActivateResourceDisk(self)" part can handle these variable(s) in a standard way. However, the ability to change locations of mount points with input variable(s) is not desirable, since it can really mess up (amongst others) the proper creation of the VM.
The solution to the "persistent swap space" issue seems to be related to the ResourceDisk.Format setting AND the fact that the swap code is integrated in the "def ActivateResourceDisk(self)" part, which (in essence) is not executed (it returns early) if ResourceDisk.Format=n.
The swap space will hence
a) not be created at creation time of the VM, given the standard setting ResourceDisk.EnableSwap=n in waagent.conf,
b) not be created at shut-down or start-up, if ResourceDisk.Format=n,
and therefore the waagent Python script inherently leads to non-persistent swap space, due to a (simple) structural code error.
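The control flow described in a) and b) can be modeled in a few lines. This is a simplified model, not the waagent code: settings are a plain dict and the functions only record which steps would run. It shows that a swap step placed after the early return on ResourceDisk.Format=n is unreachable whatever ResourceDisk.EnableSwap says, while checking swap on both paths (the proposed structure) removes that dependency.

```python
def activate_resource_disk(config):
    """Model of the current structure: swap handling sits after the Format check."""
    steps = []
    fmt = config.get("ResourceDisk.Format")
    if fmt is None or fmt.lower().startswith("n"):
        steps.append("mark disk activated")
        return steps  # early return: the swap code below never runs
    steps.append("format and mount resource disk")
    swap = config.get("ResourceDisk.EnableSwap")
    if swap is not None and swap.lower().startswith("y"):
        steps.append("create swap file")
    return steps


def activate_resource_disk_proposed(config):
    """Model of the proposed structure: swap is checked on both paths."""
    steps = []
    fmt = config.get("ResourceDisk.Format")
    if fmt is None or fmt.lower().startswith("n"):
        steps.append("mark disk activated")
    else:
        steps.append("format and mount resource disk")
    swap = config.get("ResourceDisk.EnableSwap")
    if swap is not None and swap.lower().startswith("y"):
        steps.append("create swap file")
    return steps
```

With Format=y both functions behave identically, so the change only affects the Format=n case.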
As mentioned before, this issue cannot be (temporarily) resolved by manual additions to /etc/fstab.
The more structural solution should be found in a rearrangement of the waagent script, specifically the "def ActivateResourceDisk(self)" part.
I am not a Python expert, so I will try to suggest a logical structure for the rearrangement (the actual code will probably have to be adjusted).
In my humble opinion, a (separate) "def ActivateSwap()" part (or something appropriate) should be created and called in lines 452-454:
if format is None or format.lower().startswith("n"):
    ActivateSwap()  # code to be added, see remarks/note below
    DiskActivated = True
and that could or should resolve the "persistent swap space" issue.
The ActivateSwap() function can be almost identical to the current code, with some minor adjustments for calling the function.
Note that the ActivateSwap() function can allow for
1) the use of the standard waagent.conf at creation time of the VM (reducing the impact of changed code),
2) the creation of persistent swap space by setting ResourceDisk.EnableSwap=y and ResourceDisk.SwapSizeMB=xxxx, followed by a restart,
3) (AND) the change of persistent swap space by changing ResourceDisk.SwapSizeMB=xxxx, followed by a restart,
4) (AND most important) the manual creation of persistent swap space after first creation of the VM, if ResourceDisk.EnableSwap=y and ResourceDisk.SwapSizeMB=xxxx (with xxxx equal to manually created swap space).
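Reading the swap settings independently of ResourceDisk.Format is the core of points 1 to 4. A minimal sketch, assuming waagent.conf's simple `Key=Value` format with `#` comment lines; the helper names are hypothetical, and a missing or malformed ResourceDisk.SwapSizeMB is treated as "no swap".

```python
def parse_waagent_conf(text):
    """Parse Key=Value lines, ignoring blank lines and # comment lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def swap_size_mb(settings):
    """Return the requested swap size in MB, or 0 if swap is disabled.

    Deliberately ignores ResourceDisk.Format, so one setting cannot discard another.
    """
    if not settings.get("ResourceDisk.EnableSwap", "n").lower().startswith("y"):
        return 0
    try:
        return max(int(settings.get("ResourceDisk.SwapSizeMB", "0")), 0)
    except ValueError:
        return 0
```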
It should also be noted that lines 452-454, formatted as
if format is None or format.lower().startswith("n"):
    swap = Config.get("ResourceDisk.EnableSwap")  # code to be added
    if swap is not None and swap.lower().startswith("y"):  # code to be added, see remarks/note below
        CreateSwap()  # code to be added, see remarks/note below
    DiskActivated = True
could be more efficient in achieving the goals in points 1 to 4, since the CreateSwap() function can be simplified to
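sizeKB = int(Config.get("ResourceDisk.SwapSizeMB")) * 1024
# Remove a swap file whose size no longer matches ResourceDisk.SwapSizeMB
if os.path.isfile("/mnt/swapfile") and os.path.getsize("/mnt/swapfile") != sizeKB * 1024:
    os.remove("/mnt/swapfile")
if not os.path.isfile("/mnt/swapfile"):
    Run("dd if=/dev/zero of=/mnt/swapfile bs=1024 count=" + str(sizeKB))
    Run("mkswap /mnt/swapfile")  # format the file as swap before enabling it
if not Run("swapon /mnt/swapfile"):  # Run() returns the exit code, 0 on success
    Log("Enabled " + str(sizeKB) + " KB of swap at /mnt/swapfile")
and the above is only possible due to the facts that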
- at creation time of the VM, a swap space will never be created,
- at restart of the VM, any swap will be created at /mnt, since the (standard) resource disk is mounted at that point,
- a swap file can be safely erased and/or recreated (certainly at restart), due to the nature of swap space.
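The reasoning above reduces to a small decision at each restart: keep, create or recreate the swap file. A pure-function sketch of that decision (hypothetical helper; sizes compared in bytes to match os.path.getsize), which can be tested without touching a real resource disk:

```python
import os


def swap_file_action(path, size_mb):
    """Decide what to do with the swap file at `path` for a requested size in MB.

    Returns "none" (swap disabled), "keep", "create" or "recreate".
    """
    if size_mb <= 0:
        return "none"
    wanted = size_mb * 1024 * 1024
    if not os.path.isfile(path):
        return "create"
    if os.path.getsize(path) != wanted:
        return "recreate"  # safe: swap contents need not survive a restart
    return "keep"
```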
The (two) suggestions made are based on a logical approach and can be implemented, with adjustments for proper Python code.
I hope that the suggestions help.
Ning Kuang [MSFT] (Program Manager) commented
Thanks for the feedback. We are working on code refactoring for the waagent.
For suggestion (a), we plan to make improvements on the distro specific logic, so it is easy to read and debug. See following tracking bug: https://github.com/Azure/WALinuxAgent/issues/62
For suggestion (b), please check the following document; it is already supported.
Under the “Configuration” section, you can provide the path of your script in the configuration file under Role.StateConsumer, for example Role.StateConsumer = <Path to My Script>. The waagent will execute this script after provisioning.
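On the question about Bash: an agent that runs the configured path directly as an executable, rather than through a Python interpreter, would accept any script with a shebang line. A hypothetical sketch of such an invocation; the function name and timeout are assumptions, and passing "Ready" as the argument reflects my understanding of the documented behaviour, not waagent's actual code.

```python
import os
import subprocess


def run_state_consumer(path, state="Ready", timeout=300):
    """Run the configured script directly, so its shebang picks the interpreter.

    Returns the exit code, or None if the path is missing or not executable.
    """
    if not path or not os.path.isfile(path) or not os.access(path, os.X_OK):
        return None
    result = subprocess.run([path, state], timeout=timeout)
    return result.returncode
```

A Bash script starting with `#!/bin/bash` and marked executable (`chmod +x`) would then run the same way as a Python one.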
When looking into the configurable waagent settings, please note that the waagent Python script is a candidate for a rigorous code cleanup.
The specific code for the various Linux distros is, in a sense, rather obsolete (since it is defined in other parts of the script) and/or unnecessary (for example, the waagent script does not need lines of code for SUSE distros when installed on Ubuntu) and/or incorrect (some parts of the script do not work properly) and/or inapt (for specific Linux tasks that normally should be run at startup).
It is my suggestion to
a) split the waagent Python script into
- a core script (required for creation, deletion, start and shutdown of a VM)
- multiple distro-oriented scripts (of which only one will be installed: the one for the Linux distro running on the VM)
b) allow the (core) waagent Python script to run (multiple) "other" scripts, defined in waagent.conf
The above suggestion is intended to
- keep the waagent Python script clean AND flexible,
- allow Azure users to (easily) customize settings AND (multiple) scripts at startup or shutdown.
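Suggestion b) could take the form of a single waagent.conf key listing several scripts that the core agent runs in order. A sketch assuming a hypothetical key named `Role.StartupScripts` holding comma-separated paths; the key name, separator and skip-on-missing behaviour are illustrative choices, not existing waagent options.

```python
import os
import subprocess


def startup_scripts(settings, key="Role.StartupScripts"):
    """Split a comma-separated list of script paths from a parsed config dict."""
    value = settings.get(key, "")
    return [p.strip() for p in value.split(",") if p.strip()]


def run_scripts(paths, timeout=300):
    """Run each executable script in order; return {path: exit code or None}."""
    results = {}
    for path in paths:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            results[path] = subprocess.run([path], timeout=timeout).returncode
        else:
            results[path] = None  # missing or not executable: skipped, not fatal
    return results
```

Skipping a missing script instead of aborting keeps one bad entry from blocking the rest, which matches the goal of a clean core agent.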
Hopefully it is a good suggestion; keep me posted!