DO NOT DO THIS ANY MORE - SINCE THE ADVENT OF TRUSTED FACTS, THIS TECHNIQUE IS DEPRECATED - LEAVING THIS POST HERE FOR THE SAKE OF HISTORY
This post was originally published at https://puppetlabs.com/blog/the-fact-is/ - I am republishing it here.
One of the major interfaces for extending Puppet is Facter: custom facts provide data to the Puppet master at catalog compile time so it can customise the configuration for a specific node (using PuppetDB, this data becomes available even when you’re doing things elsewhere too). A fact, in case you were wondering, is a key/value data pair that represents some aspect of node state, such as its IP address, uptime, operatingsystem or whether it’s a virtual machine.
As an example within Puppet, one might use this data in the context of a catalog compile to make a decision about the catalog that we send back to the node. A fact that tells us what operating system a node has could drive some conditional logic in a class to give the node the right set of packages to configure ntp on the system (because the package names differ between Linuxes, let alone what you do for Windows). Alternatively one might use the ipaddress fact to customise the contents of the appropriate listen directive in Apache or SSH.
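To make the listen directive example concrete, here is a minimal standalone sketch using ERB, the templating language Puppet templates are written in. The template text, the render_listen helper and the address are all made up for illustration; inside Puppet the fact values are supplied to the template for you rather than passed in by hand.

```ruby
require 'erb'

# Hypothetical one-line template for an Apache listen directive.
LISTEN_TEMPLATE = ERB.new("Listen <%= ipaddress %>:80\n")

# Render the directive from a hash of facts as a node might report them.
# The helper name and the hash layout are assumptions for this sketch.
def render_listen(facts)
  LISTEN_TEMPLATE.result_with_hash(ipaddress: facts.fetch('ipaddress'))
end

puts render_listen({ 'ipaddress' => '192.0.2.10' })
# prints "Listen 192.0.2.10:80"
```

Note that whatever the node reports as ipaddress ends up in the rendered file unchanged, which is fine for a listen directive and much less fine for anything sensitive.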
Custom facts can be written either by extending Puppet using Ruby, or even more trivially by using the facter.d library shipped with the Puppet Labs stdlib module, or even by simply setting an environment variable available during a Puppet agent run. In this way, you can provide the catalog compilation infrastructure rich state information about your nodes.
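As a rough sketch of the facter.d route: an external fact can be an executable that prints key=value lines on standard output. The fact names and values below are hypothetical, and the directory the script needs to live in depends on your Facter and stdlib versions, so check the documentation for your setup.

```ruby
#!/usr/bin/env ruby
# Sketch of an executable external fact. Fact names and values are
# hypothetical; a real script would work them out from the node.

# Build the facts this node wants to report about itself.
def node_facts
  {
    'datacenter' => 'lon1',
    'app_tier'   => 'frontend'
  }
end

# Render facts as key=value lines, one per fact.
def render_facts(facts)
  facts.map { |name, value| "#{name}=#{value}" }.join("\n")
end

# Only print when run as a script, so the helpers can be loaded and tested.
puts render_facts(node_facts) if $PROGRAM_NAME == __FILE__
```

The point to notice is how little stands between a node and a new fact: anything that can write a file or print a line on the node can tell the master something about it.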
We can see that it might become common practice in all one’s modules to alter behaviour in Puppet classes on the basis of node state data - in fact it’s one of the things that makes Puppet so usable and flexible. However, there are some security implications here.
To take an example, one of the more common facts that gets used is the ‘hostname’ fact. Typically, unless for some reason you choose otherwise, this will match the certname of the node (i.e. the CN on the certificate generated out of the Puppet CA when you provisioned the node) though Puppet won’t use the same mechanism to arrive at that determination.
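The value of $::hostname, depending on your platform, will probably be arrived at by running a command like hostname from the context of the Facter libraries used by Puppet. That happens on the node itself, so anyone with root on the node can make the fact report whatever they like.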
A better solution is to extract the validated CN of the certificate used to authenticate a node on the Puppet master and use that data instead. Previous versions of this post suggested using $clientcert - turns out that the shared popular misconception that there was any value in that bit of data was utterly mistaken.
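In a manifest:

    $actualclientcert = certcheck()

And the function itself, in lib/puppet/parser/functions/certcheck.rb within a module:

    module Puppet::Parser::Functions
      newfunction(:certcheck, :type => :rvalue, :doc => <<-EOS
        Returns the actual certname
        EOS
      ) do |arguments|
        # host is evaluated on the master during compilation;
        # it is not a value supplied by the node as a fact.
        return host
      end
    end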
From a security perspective there is a vast difference between the assertion that a node’s hostname is whatever the fact says and the assertion, backed by PKI, that it is the certname. This is especially true given the variety of ways we might arrive at the value that has been sent to the Puppet master. Assuming we trust the PKI, the statement that the node’s hostname matches the certificate (if that’s the way we run our PKI) is significantly more trustworthy than the assertion that the hostname fact is the correct one. This is the major reason Puppet uses SSL and PKI for node identification.
Admittedly, in order to get to the point where you’ve sent that data to a Puppet master, one needs sufficient rights over the private key material on a node to get as far as getting Puppet to think about sending you stuff. The implication for secure module writing is that if you want to securely distribute the right configuration settings and associated data back down to nodes you must use trustworthy data when compiling a catalog.
For settings with a lesser security impact, OS package names for example, trusting facts is cheaper and probably secure enough (depending on your environment) that you might as well just use them. What you probably don’t want to do however is have your module distribute, for example, private SSL key data to nodes that assert via facts that they should have it.
For modules with security critical information in them, the determination as to how that class instantiates on a node should be made solely using data on the Puppet master - that might be data in the puppet code that comprises your class, in an ENC or in your hiera backend - it should in no way be influenced by an untrusted assertion from a node. Even then, the root of the decision tree within your code ought to be your PKI - thus based on the SSL certificate the node used to authenticate.
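A minimal sketch of that principle in plain Ruby, outside Puppet: the decision about who gets key material is a lookup in data held on the master, keyed by the validated certname, and the node’s facts play no part. The hostnames, key file names and the keys_for helper are all hypothetical.

```ruby
# Master-side data, keyed by validated certname. All names are hypothetical.
KEY_ASSIGNMENTS = {
  'web01.example.com' => ['www.example.com.key'],
  'db01.example.com'  => []
}.freeze

# Decide which private keys a node may receive.
# certname is assumed to come from the authenticated SSL certificate.
# _facts is accepted only to show that it is deliberately ignored.
def keys_for(certname, _facts = {})
  KEY_ASSIGNMENTS.fetch(certname, [])
end

# A node authenticated as db01 claims via facts to be web01: it gets nothing.
spoofed_facts = { 'hostname' => 'web01' }
puts keys_for('db01.example.com', spoofed_facts).inspect
```

Unknown certnames fall through to an empty list, so a node has to be listed explicitly on the master before it is sent anything sensitive.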
If you completely trust data supplied via facts, and you include data, settings or anything else in a catalog that could be used for privilege escalation, then the moment a node spoofs its fact data, you’ve been hacked. The true danger here is that the escalation you achieve is to get root on something else. If in your organisation different sets of users have root on different sets of boxes, your module design philosophy could lead to leakage across node sets.
As an aside, if users have root on multiple boxes, what’s to stop them moving SSL certs around the place and putting the right configurations on the wrong boxes? Don’t give people root if you don’t have to, by the way. Sudo everything is probably a bad thing too. Sudo don’t do that.
The alternative to using facts, as I mention above, is to use trustable data. That’s a combination of $actualclientcert and the data you have control over on your Puppet master. It may increase the overhead of the amount of static data you need to manage, but this is going to be a trade off between security, how much you can be bothered, and what the outcome of it all going wrong is. Incidentally, this significantly promotes the idea of encoding useful data in hostnames and therefore Puppet certificate names (see my post on the simple things - this is an extension of stuff I hadn’t fully thought through there).
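As an illustration of encoding useful data in certificate names, here is a small sketch that pulls a role and environment out of a certname. The naming scheme (role, number, environment, domain) is purely an assumption for this example, as is the parse_certname helper; use whatever convention your own PKI follows.

```ruby
# Hypothetical naming scheme: <role><number>.<environment>.<domain>
CERTNAME_PATTERN = /\A(?<role>[a-z]+)(?<index>\d+)\.(?<environment>[a-z]+)\.(?<domain>.+)\z/

# Split a validated certname into its parts.
# Returns nil when the name does not follow the scheme.
def parse_certname(certname)
  match = CERTNAME_PATTERN.match(certname)
  return nil unless match

  {
    role: match[:role],
    index: match[:index].to_i,
    environment: match[:environment],
    domain: match[:domain]
  }
end

puts parse_certname('web01.prod.example.com').inspect
```

Because the input is the certname rather than a fact, the role and environment derived this way inherit the trust you place in your PKI.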
I personally have no problem with using facts to make decisions with low impact outcomes security wise. I can’t immediately see any way of making facts trustworthy given the current architecture though - Puppet runs as root, and if you’re root you can run Puppet, you have access to the private key, and you can spoof facts.
Ultimately however, this comes down to the good old-fashioned security principles of not trusting user input, and sanitising the hell out of it before you do anything with it.