This article was also published on the Byte Word Cloud public account.
Motivation
One of our services, written in Python, has two specific requirements for interacting with Terraform:
- It needs to call various Terraform commands to perform operations such as deploying and destroying resources
- It needs to parse Terraform configuration files (HCL syntax) and analyze the components inside them
For the former, there is an open source library called python-terraform, which wraps the Terraform commands. When we call it from code, it starts a new process to run the corresponding Terraform command and returns the exit code together with the captured stdout and stderr. Although python-terraform is convenient to use, its biggest drawback is that Terraform must be installed in the execution environment in advance, and the new process also brings additional overhead.
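To make the subprocess approach concrete, here is a minimal sketch of what such a wrapper does. It is not python-terraform's actual code; the function name run_in_subprocess is made up for illustration, and it only works for Terraform if the terraform binary is already installed and on PATH.

```python
import subprocess
import sys


def run_in_subprocess(argv):
    """Run a command in a new process and return (exit code, stdout, stderr)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # With Terraform installed this could be ["terraform", "version"];
    # the Python interpreter is used here so the sketch runs anywhere.
    print(run_in_subprocess([sys.executable, "-c", "print('hello')"]))
```

Every call pays the cost of starting a process, and the command has to exist on the machine, which are exactly the two drawbacks mentioned above.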
For the latter, I could not find an open source Python library that does the job.
I wanted a library that can execute Terraform commands in the current process without requiring users to install Terraform in advance, and that can also parse Terraform configuration files. That is how py-libterraform was born.
Usage
Before explaining how py-libterraform is implemented, let's take a look at how to install and use it.
Installation is simple: a single pip command is enough. It supports Mac, Linux and Windows, and Python 3.6 and later:
$ pip install libterraform
py-libterraform currently provides two features: TerraformCommand for executing the Terraform CLI, and TerraformConfig for parsing Terraform configuration files. Both are described below with examples. Assume there is a sleep folder containing a main.tf file with the following content:
variable "time1" {
  type    = string
  default = "1s"
}
variable "time2" {
  type    = string
  default = "1s"
}
resource "time_sleep" "wait1" {
  create_duration = var.time1
}
resource "time_sleep" "wait2" {
  create_duration = var.time2
}
output "wait1_id" {
  value = time_sleep.wait1.id
}
output "wait2_id" {
  value = time_sleep.wait2.id
}
Terraform CLI
Now enter the sleep directory. To run Terraform's init, apply and show on it, you can do this:
>>> from libterraform import TerraformCommand
>>> cli = TerraformCommand()
>>> cli.init()
<CommandResult retcode=0 json=False>
>>> _.value
'\nInitializing the backend...\n\nInitializing provider plugins...\n- Reusing previous version of hashicorp/time from the dependency lock file\n- Using previously-installed hashicorp/time v0.7.2\n\nTerraform has been successfully initialized!\n\nYou may now begin working with Terraform. Try running "terraform plan" to see\nany changes that are required for your infrastructure. All Terraform commands\nshould now work.\n\nIf you ever set or change modules or backend configuration for Terraform,\nrerun this command to reinitialize your working directory. If you forget, other\ncommands will detect it and remind you to do so if necessary.\n'
>>> cli.apply()
<CommandResult retcode=0 json=True>
>>> _.value
[{'@level': 'info', '@message': 'Terraform 1.1.7', '@module': 'terraform.ui', '@timestamp': '2022-04-08T19:16:59.984727+08:00', 'terraform': '1.1.7', 'type': 'version', 'ui': '1.0'}, ... ]
>>> cli.show()
<CommandResult retcode=0 json=True>
>>> _.value
{'format_version': '1.0', 'terraform_version': '1.1.7', 'values': {'outputs': {'wait1_id': {'sensitive': False, 'value': '2022-04-08T11:17:01Z'}, 'wait2_id': {'sensitive': False, 'value': '2022-04-08T11:17:01Z'}}, 'root_module': {'resources': [{'address': 'time_sleep.wait1', 'mode': 'managed', 'type': 'time_sleep', 'name': 'wait1', 'provider_name': 'registry.terraform.io/hashicorp/time', 'schema_version': 0, 'values': {'create_duration': '1s', 'destroy_duration': None, 'id': '2022-04-08T11:17:01Z', 'triggers': None}, 'sensitive_values': {}}, {'address': 'time_sleep.wait2', 'mode': 'managed', 'type': 'time_sleep', 'name': 'wait2', 'provider_name': 'registry.terraform.io/hashicorp/time', 'schema_version': 0, 'values': {'create_duration': '1s', 'destroy_duration': None, 'id': '2022-04-08T11:17:01Z', 'triggers': None}, 'sensitive_values': {}}]}}}
As the session above shows, every command returns a CommandResult object that represents the result of the execution (return code, output, error output, and whether the output is a JSON structure).
Specifically:
- The value returned by init() is the standard output of the terraform init command, a string
- The value returned by apply() is, by default, the standard output of terraform apply -json parsed as JSON: a list of log records. If you don't want the standard output to be parsed, use apply(json=False)
- The value returned by show() is, by default, the standard output of terraform show -json parsed as JSON: a dictionary describing the data structure of the Terraform state file
The idea behind all the command wrappers is to make the results as convenient as possible for a program to process, so Terraform commands that support -json use this option by default and have their output parsed.
The above is only a simple example. In fact, TerraformCommand wraps all Terraform commands; call help(TerraformCommand) to view them.
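As an illustration of what "parsed as JSON" means for a streaming command such as apply, here is a minimal sketch. It assumes that terraform apply -json prints one JSON object per line. The helper name parse_json_lines and the sample text are invented for this example, and py-libterraform may implement the parsing differently.

```python
import json


def parse_json_lines(stdout: str) -> list:
    """Turn newline-delimited JSON output into a list of dictionaries."""
    records = []
    for line in stdout.splitlines():
        line = line.strip()
        if line:  # skip blank lines between records
            records.append(json.loads(line))
    return records


# Hand-written sample in the shape of the log records shown above.
sample = (
    '{"@level": "info", "@message": "Terraform 1.1.7", "type": "version"}\n'
    '{"@level": "info", "@message": "Apply complete!", "type": "change_summary"}\n'
)
records = parse_json_lines(sample)
```

A program can then filter the records by their type or @level fields instead of scraping human-readable text.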
Terraform configuration file parsing
If you want to process the parsed contents of a Terraform configuration further, TerraformConfig meets that need. It parses a given Terraform configuration directory and returns its variables, resources, outputs, line numbers and other information, which is useful for analyzing how a configuration is composed. You can do this (much of the output is elided with ...):
>>> from libterraform import TerraformConfig
>>> mod, _ = TerraformConfig.load_config_dir('.')
>>> mod
{'SourceDir': '.', 'CoreVersionConstraints': None, 'ActiveExperiments': {}, 'Backend': None, 'CloudConfig': None, 'ProviderConfigs': None, 'ProviderRequirements': {'RequiredProviders': {}, 'DeclRange': ...}, 'Variables': {'time1': ..., 'time2': ...}, 'Locals': {}, 'Outputs': {'wait1_id': ..., 'wait2_id': ...}, 'ModuleCalls': {}, 'ManagedResources': {'time_sleep.wait1': ..., 'time_sleep.wait2': ...}, 'DataResources': {}, 'Moved': None}
Behind the scenes, TerraformConfig.load_config_dir calls the LoadConfigDir method in internal/configs/parser_config_dir.go of the Terraform source code to load the Terraform configuration directory. The native return values, *Module and hcl.Diagnostics, are each serialized and then loaded as dictionaries in Python.
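Once the module is an ordinary dictionary, analyzing it is plain Python. The sketch below summarizes a module using only the top-level keys visible in the output above (Variables, ManagedResources, Outputs). The function summarize_module and the sample dictionary are illustrative and not part of the library.

```python
def summarize_module(mod: dict) -> dict:
    """List the names of variables, managed resources and outputs in a module."""
    return {
        # "or {}" guards against keys that are present but set to None.
        "variables": sorted(mod.get("Variables") or {}),
        "resources": sorted(mod.get("ManagedResources") or {}),
        "outputs": sorted(mod.get("Outputs") or {}),
    }


# Hand-built stand-in for the dictionary returned by load_config_dir('.').
sample_mod = {
    "SourceDir": ".",
    "Variables": {"time1": {}, "time2": {}},
    "ManagedResources": {"time_sleep.wait1": {}, "time_sleep.wait2": {}},
    "Outputs": {"wait1_id": {}, "wait2_id": {}},
    "DataResources": {},
    "Moved": None,
}
summary = summarize_module(sample_mod)
```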
Implementation principle
Since Terraform is written in GoLang, Python cannot call it directly. Fortunately, it can be compiled into a dynamic link library, which Python can then load and call. So the general idea is:
- Use cgo to write the C interface file for Terraform
- Compile it into a dynamic link library, ending with .so on Linux/Unix and .dll on Windows
- Load this dynamic link library through ctypes in Python, and implement the command wrappers on top of it
Essentially, GoLang and Python interact with C as the intermediary. There are many articles online about how to use cgo and ctypes, so this article focuses on the various pitfalls ("pits") encountered during implementation and how to solve them.
Pit 1: GoLang's internal packages mechanism blocks external calls
Since version 1.4, GoLang has had an internal packages mechanism: a package under a directory named internal can only be imported by code inside the tree rooted at the parent of that internal directory, and no other package can import it. In recent versions of Terraform almost all code lives under internal. This means that if the interface file written with cgo (called libterraform.go in this project) is an external package (for example, a package named libterraform), it cannot call the Terraform code, so the Terraform commands cannot be wrapped.
One solution is to make Terraform's internal packages public, but that means modifying a lot of Terraform source code, which is not a good idea.
Another idea is to make libterraform.go a "part" of the Terraform project itself and so "trick" the Go compiler. The process is as follows:
- The package name of libterraform.go is the same as that of Terraform's main package, namely main
- Before building, move libterraform.go into the root directory of the Terraform source tree so that it becomes a member of the Terraform project
- When building, compile with the go build -buildmode=c-shared -o=libterraform.so github.com/hashicorp/terraform command, so that the compiled dynamic link library contains the logic of libterraform.go
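The steps above can be scripted. The following is only a sketch of such a build helper, not the project's real build script. The function names and the paths in the usage comment are hypothetical, the interface file is copied rather than moved, and removing the staged file afterwards is my own choice.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(output: str = "libterraform.so") -> list:
    """Return the go build invocation quoted in the article."""
    return [
        "go", "build",
        "-buildmode=c-shared",
        f"-o={output}",
        "github.com/hashicorp/terraform",
    ]


def build(terraform_src: str, interface_file: str, output: str = "libterraform.so") -> None:
    """Stage the cgo interface file in the Terraform source root, then build."""
    src_root = Path(terraform_src)
    staged = src_root / Path(interface_file).name
    shutil.copy(interface_file, staged)  # make it a member of package main
    try:
        subprocess.run(build_command(output), cwd=src_root, check=True)
    finally:
        staged.unlink()  # leave the Terraform checkout clean (a choice of this sketch)


# Hypothetical usage, assuming a Terraform checkout in ./terraform:
# build("terraform", "libterraform.go")
```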
Pit 2: Pay attention to the memory space requested by the C runtime
In both GoLang and Python we normally don't need to worry about memory management, because unused objects are reclaimed at the right time by the language's garbage collector. But once C is involved, memory has to be managed explicitly.
An interface defined with cgo may return *C.char, which points to memory allocated on the C side and must be released explicitly. For example, libterraform.go defines a method for loading a Terraform configuration directory, ConfigLoadConfigDir, which is implemented as follows:
//export ConfigLoadConfigDir
func ConfigLoadConfigDir(cPath *C.char) (cMod *C.char, cDiags *C.char, cError *C.char) {
	defer func() {
		recover()
	}()
	parser := configs.NewParser(nil)
	path := C.GoString(cPath)
	mod, diags := parser.LoadConfigDir(path)
	modBytes, err := json.Marshal(convertModule(mod))
	if err != nil {
		cMod = C.CString("")
		cDiags = C.CString("")
		cError = C.CString(err.Error())
		return cMod, cDiags, cError
	}
	diagsBytes, err := json.Marshal(diags)
	if err != nil {
		cMod = C.CString(string(modBytes))
		cDiags = C.CString("")
		cError = C.CString(err.Error())
		return cMod, cDiags, cError
	}
	cMod = C.CString(string(modBytes))
	cDiags = C.CString(string(diagsBytes))
	cError = C.CString("")
	return cMod, cDiags, cError
}
In the implementation above, C.CString allocates memory on the C side and the result is returned to the caller, so the caller (the Python process) must explicitly release that memory once it has finished with the returned values.
To make that possible, a function for releasing memory first has to be exposed through cgo:
//export Free
func Free(cString *int) {
	C.free(unsafe.Pointer(cString))
}
Then, in Python, you can implement the following encapsulation:
import os
from ctypes import Structure, c_char_p, c_void_p, cast, cdll

from libterraform.common import WINDOWS


class LoadConfigDirResult(Structure):
    # Mirrors the three *C.char values returned by ConfigLoadConfigDir.
    _fields_ = [("r0", c_void_p),
                ("r1", c_void_p),
                ("r2", c_void_p)]


# Load the dynamic link library before binding any of its functions.
root = os.path.dirname(os.path.abspath(__file__))
_lib_filename = 'libterraform.dll' if WINDOWS else 'libterraform.so'
_lib_tf = cdll.LoadLibrary(os.path.join(root, _lib_filename))

_load_config_dir = _lib_tf.ConfigLoadConfigDir
_load_config_dir.argtypes = [c_char_p]
_load_config_dir.restype = LoadConfigDirResult

_free = _lib_tf.Free
_free.argtypes = [c_void_p]


def load_config_dir(path: str) -> (dict, dict):
    ret = _load_config_dir(path.encode('utf-8'))
    # Copy each C string into Python bytes, then free the C memory.
    r_mod = cast(ret.r0, c_char_p).value
    _free(ret.r0)
    r_diags = cast(ret.r1, c_char_p).value
    _free(ret.r1)
    err = cast(ret.r2, c_char_p).value
    _free(ret.r2)
    ...
Here, after each returned value has been read, _free (that is, Free in libterraform.go) is called to release the memory explicitly and avoid a memory leak.
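The same copy-then-free pattern can be tried without building Terraform. The sketch below uses the C runtime's own strdup and free in place of ConfigLoadConfigDir and Free. It assumes a POSIX system, where ctypes.CDLL(None) exposes the C library already loaded into the process, and the helper names are invented for this example.

```python
import ctypes

# Assumption: POSIX. CDLL(None) gives access to symbols already loaded
# into the current process, including the C runtime.
libc = ctypes.CDLL(None)

libc.strdup.argtypes = [ctypes.c_char_p]
# Use c_void_p, not c_char_p: with c_char_p ctypes would convert the result
# to bytes automatically and the pointer needed for free() would be lost.
libc.strdup.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def copy_and_free(ptr: int) -> bytes:
    """Copy a C string into Python bytes, then release the C buffer."""
    try:
        return ctypes.cast(ptr, ctypes.c_char_p).value
    finally:
        libc.free(ptr)


def roundtrip(text: str) -> str:
    """Allocate a C copy of text, read it back, and free it."""
    ptr = libc.strdup(text.encode("utf-8"))
    if not ptr:
        raise MemoryError("strdup returned NULL")
    return copy_and_free(ptr).decode("utf-8")
```

Declaring the return type as c_void_p is the key design choice: it keeps the raw address available so that the same pointer can be handed back to the function that frees it.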
Pit 3: Capture output
In the Terraform source code, the output of a command is printed to standard output (stdout) and standard error (stderr). So when the RunCli interface is wrapped with cgo and called from Python, the output goes directly to stdout and stderr by default.
What could be wrong with this? If two commands are executed at the same time, the output results will be interleaved, and there is no way to distinguish which command the results are from.
The solution is to use pipes:
- Use os.pipe in the Python process to create one pipe each for stdout and stderr (this yields file descriptors)
- Pass the two file descriptors into the RunCli method of libterraform.go, which internally uses os.NewFile to open the two file descriptors and replaces os.Stdout and os.Stderr with them
- At the end of the RunCli method, close both files and restore the original os.Stdout and os.Stderr
In addition, when passing the file descriptors obtained from os.pipe to libterraform.go, pay attention to operating system differences:
- On Linux/Unix, just pass them in and use them
- On Windows, the file descriptor must additionally be converted into a file handle, because on Windows GoLang's os.NewFile expects a file handle
The relevant code in Python is as follows:
if WINDOWS:
    import msvcrt
    w_stdout_handle = msvcrt.get_osfhandle(w_stdout_fd)
    w_stderr_handle = msvcrt.get_osfhandle(w_stderr_fd)
    retcode = _run_cli(argc, c_argv, w_stdout_handle, w_stderr_handle)
else:
    retcode = _run_cli(argc, c_argv, w_stdout_fd, w_stderr_fd)
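The pipe idea can be demonstrated in pure Python. In the sketch below, fake_command is a made-up stand-in for RunCli that writes to the descriptors it is given, so each invocation gets its own private stdout and stderr. Reading only after the command has finished works here solely because the output is tiny.

```python
import os


def fake_command(name: str, stdout_fd: int, stderr_fd: int) -> None:
    """Stand-in for RunCli: write to the descriptors supplied by the caller."""
    os.write(stdout_fd, f"{name}: ok\n".encode("utf-8"))
    os.write(stderr_fd, f"{name}: warning\n".encode("utf-8"))


def run_captured(name: str):
    """Run fake_command with dedicated pipes and return (stdout, stderr)."""
    r_out, w_out = os.pipe()
    r_err, w_err = os.pipe()
    try:
        fake_command(name, w_out, w_err)
    finally:
        # Closing the write ends lets the readers see end-of-file.
        os.close(w_out)
        os.close(w_err)
    with os.fdopen(r_out, "rb") as f:
        out = f.read().decode("utf-8")
    with os.fdopen(r_err, "rb") as f:
        err = f.read().decode("utf-8")
    return out, err
```

Because every call owns its pipes, the output of two commands can no longer be mixed up, even if they run at the same time.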
Pit 4: Pipe Hang
Because a pipe has limited capacity, a write blocks (hangs) once the pipe is full. So you cannot wait until RunCli has returned (that is, until the command output has been written to the pipe) before reading the output from the pipe. Otherwise simple commands (such as version) work fine, but complex commands (such as apply) hang because they produce a lot of output.
The solution is to start two threads before calling RunCli that read the standard output and standard error file descriptors, and to join these two threads after the RunCli call returns. The relevant code in Python is as follows:
r_stdout_fd, w_stdout_fd = os.pipe()
r_stderr_fd, w_stderr_fd = os.pipe()

stdout_buffer = []
stderr_buffer = []
stdout_thread = Thread(target=cls._fdread, args=(r_stdout_fd, stdout_buffer))
stdout_thread.daemon = True
stdout_thread.start()
stderr_thread = Thread(target=cls._fdread, args=(r_stderr_fd, stderr_buffer))
stderr_thread.daemon = True
stderr_thread.start()

if WINDOWS:
    import msvcrt
    w_stdout_handle = msvcrt.get_osfhandle(w_stdout_fd)
    w_stderr_handle = msvcrt.get_osfhandle(w_stderr_fd)
    retcode = _run_cli(argc, c_argv, w_stdout_handle, w_stderr_handle)
else:
    retcode = _run_cli(argc, c_argv, w_stdout_fd, w_stderr_fd)

stdout_thread.join()
stderr_thread.join()
if not stdout_buffer:
    raise TerraformFdReadError(fd=r_stdout_fd)
if not stderr_buffer:
    raise TerraformFdReadError(fd=r_stderr_fd)
stdout = stdout_buffer[0]
stderr = stderr_buffer[0]
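The article does not show cls._fdread, so the following is only a guess at what such a reader might look like, together with a made-up noisy_writer standing in for RunCli. It writes far more than a pipe can hold, which would block forever without the reader thread.

```python
import os
from threading import Thread


def fdread(fd: int, buffer: list) -> None:
    """Drain fd until end-of-file and append the decoded text to buffer."""
    with os.fdopen(fd, "rb") as f:
        data = f.read()  # returns once the write end has been closed
    buffer.append(data.decode("utf-8"))


def noisy_writer(w_fd: int, size: int) -> None:
    """Stand-in for RunCli: write size bytes, then close the write end."""
    with os.fdopen(w_fd, "wb") as f:
        f.write(b"x" * size)


def capture(size: int) -> str:
    """Capture size bytes of output by reading concurrently with the writer."""
    r_fd, w_fd = os.pipe()
    buffer = []
    reader = Thread(target=fdread, args=(r_fd, buffer), daemon=True)
    reader.start()             # start draining BEFORE anything is written
    noisy_writer(w_fd, size)   # would hang here if nothing were reading
    reader.join()
    return buffer[0]
```

Calling capture(1_000_000) returns the full megabyte, whereas writing the same amount into a pipe that nobody is reading would stall as soon as the pipe buffer fills up.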
Conclusion
When I found that the existing open source libraries could not meet my needs, I wrote py-libterraform by hand, which largely achieves the goal of calling Terraform commands within a single process. Although I ran into various problems during development and had to keep jumping between Python, GoLang and C, fortunately they were solved one by one.
Finally, the project is at https://github.com/Prodesire/py-libterraform and stars are welcome 😄~