[Automated-testing] Board management API discussion at ATS - my ideas

Jan Lübbe jlu at pengutronix.de
Thu Oct 17 01:58:54 PDT 2019


Hi Tim, everyone,

On Sat, 2019-10-12 at 17:14 +0000, Tim.Bird at sony.com wrote:
> Hello everyone,
> 
> I have a few ideas about board management APIs that I thought I'd share.  There will
> be a discussion about these at ATS, but I thought I'd share some of my ideas ahead of
> time to see if we can get some discussion out of the way before the event - since time
> at the event will be somewhat limited.

Thanks for getting this started, and giving me something to critique.
;)

> What I'd like to see in a "standard" board management API is a system whereby 
> any test framework can be installed in a lab, and
> 1) automatically detect the board management layer that is being used in the lab
> 2) be able to use a single set of APIs (functions or command line verbs) to 
> communicate with the board management layer
> 3) possibly, a way to find out what features are supported by the board management
> layer (that is, introspection)
> 
> The following might be nice, but I'm not sure:
> 4) the ability to support more than one board management layer in a single lab

I'd say these are all aspects of making the current "monolithic"
frameworks more modular. For me, a concrete use-case would be running
lava and kernel-ci tests in our labgrid lab. Having multiple board
management layers in the same lab seems to be less useful (especially
if the functionality exposed via the API is a common subset).

> = style of API =
> My preference would be to have the IPC from the test manager (or test scheduler) to
> the board management layer be available as a Linux command line.  I'm OK with having a
> python library, as that's Fuego's native language, but I think most board management systems
> already have a command line, and that's more universally accessible by test frameworks.
> Also, it should be relatively easy to create a command line interface for libraries that
> currently don't have one (ie only have a binding in a particular language (python, perl, C library, etc.))
> 
> I don't think that the operations for the board management layer are extremely time-sensitive,
> so I believe that the overhead of going through a Linux process invocation to open a separate
> tool (especially if the tool is in cache) is not a big problem.  In my own testing, the overhead of invoking
> the 'ttc' command line (which is written in python) takes less than 30 milliseconds, when python
> and ttc are in the Linux cache.  I think this is much less than the time for the operations that are
> actually performed by the board management layer.
> 
> As a note, avoiding a C or go library (that is a compiled language) avoids having to re-compile
> the test manager to  communicate with different board management layers. 
> 
> For detection, I propose something like placing a file into a well-known place in a Linux filesystem,
> when the board management layer is installed.
> 
> For example, maybe making a script available at:
> /usr/lib/test/test.d
> (and having scripts: lava-board-control, ttc-board-control, labgrid-board-control, beaker-board-control,
> libvirt-board-control, r4d-board-control, etc)

> or another alternative is to place a config script for the board management system in:
> /etc/test.d
> with each file containing the name of the command line used to communicate with that board management layer, and
> possibly some other data that is required to interface with the layer (e.g. the communication method, if we decide to
> support more than just CLI (e.g. port of a local daemon, or network address for the server providing board management),
> or location of that board management layer's config file).

I agree, a command line interface (while limited) is probably enough to
see if we can find a common API.
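
Just to make the detection part concrete, a test framework could scan
/etc/test.d at startup, roughly like this (the file format and the
'command'/'config' keys are only assumptions based on your proposal):

import configparser
import glob
import os

def discover_bm_layers(conf_dir="/etc/test.d"):
    # each *.conf file describes one installed board management layer,
    # e.g. labgrid.conf with a [bm] section naming the CLI to call
    layers = {}
    for path in sorted(glob.glob(os.path.join(conf_dir, "*.conf"))):
        cfg = configparser.ConfigParser()
        cfg.read(path)
        name = os.path.splitext(os.path.basename(path))[0]
        layers[name] = {
            "command": cfg.get("bm", "command", fallback=None),
            "config": cfg.get("bm", "config", fallback=None),
        }
    return layers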

> = starting functions =
> Here are some functions that I think  the board management layer should support:
> 
> introspection of the board management layer supported features:
> verb: list-features

This could be used to expose optional extensions, maybe under an
experimental name until standardized (similar to how browsers expose
vendor-specific APIs). One example could be 'x-labgrid-set-gpio'.
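
A caller could then probe for such an extension before relying on it.
A rough sketch (the CLI name and the extra arguments are hypothetical,
and I'm assuming list-features prints one feature per line):

import subprocess

def has_feature(bm_cli, feature):
    # 'list-features' is assumed to print one supported verb per line
    out = subprocess.run([bm_cli, "list-features"],
                         capture_output=True, text=True, check=True)
    return feature in out.stdout.split()

if has_feature("labgrid-board-control", "x-labgrid-set-gpio"):
    subprocess.run(["labgrid-board-control", "x-labgrid-set-gpio",
                    "board1", "1"], check=True)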

> introspection of the board layer managed objects:
> verb: list-boards

OK. It might be necessary to return more than just the name (HW type?,
availability?).

> reserving a board:
> verbs: reserve and release

This touches a critical point: Many existing frameworks have some
scheduler/queuing component, which expects to be the only instance
making decisions on which client can use which board. When sharing a
lab between multiple test frameworks (each with its own scheduler),
there will be cases where e.g. Lava wants to run a test while the board
is already in use by a developer.

The minimal interface could be a blocking 'reserve' verb. Would
potentially long waiting times be acceptable for the test frameworks?

A more complex solution would be to have only one shared scheduler per
lab, which would need to be accessible via the board management API and
part of that layer. How to support that in Lava or fuego/Jenkins
doesn't seem obvious to me.
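
As a sketch of the minimal blocking variant (verbs as proposed above,
the CLI name and the timeout are just placeholders):

import subprocess

def run_on_board(bm_cli, board, run_tests, max_wait=4 * 3600):
    # 'reserve' blocks until the board is free; the timeout only guards
    # against waiting forever behind an interactive developer session
    subprocess.run([bm_cli, "reserve", board], check=True, timeout=max_wait)
    try:
        run_tests(board)
    finally:
        subprocess.run([bm_cli, "release", board], check=True)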

> booting the board:
> verb: reboot
> (are power-on and power-off needed at this layer?)

OK. The BM layer could then handle power off on release.

> operating on a board:
>    get serial port for the board
>    verb: get-serial-device
> (are higher-level services needed here, like give me a file descriptor for a serial connection to the device?  I haven't used
> terminal concentrators, so I don't know if it's possible to just get a Linux serial device, or maybe a Linux pipe name, and
> have this work)

Terminal concentrators (and ser2net) usually speak RFC 2217 (a telnet
extension to control RS232 options like speed and flow control).

The minimal case could be to expose the console as stdin/stdout, to be
used via popen (like LAVA's 'connection_command'). This way, the BM
layer could hide complexities like:
- connecting to a remote system which has the physical interface
- configuring the correct RS232 settings for a board
- waiting for a USB serial console to (re-)appear on boards which need
power before showing up on USB

You'd lose the ability to change RS232 settings at runtime, but
usually, that doesn't seem to be needed.
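
In code, that minimal case could look roughly like this (the 'console'
verb and CLI name are made up; the point is just that the BM layer's
process speaks the console protocol on its stdio):

import subprocess

def open_console(bm_cli, board):
    # the BM layer hides how the console is actually reached
    # (ser2net/RFC 2217, local USB serial, a remote host, ...)
    return subprocess.Popen([bm_cli, "console", board],
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)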

>   execute a command on the board and transfer files
>   verbs: run, copy_to, copy_from

Now it's getting more complex. ;)

You need to have a working Linux userspace for these commands, so now
the BM layer is responsible for:
- provisioning kernel+rootfs
- controlling the bootloader to start a kernel
- shell login
- command execution and output collection
- network access? (for copy)
And also logging of these actions for the test framework to collect for
debugging?

At least for labgrid, that would move a large part of it below the BM
API. As far as I know LAVA, this functionality is also pretty closely
integrated in the test execution (it has actions to deploy SW and runs
commands by controlling the serial console).

So I suspect that we won't be able to find a workable API at the
run/copy level.

> Now, here are some functions which I'm not sure belong at this layer or another layer:
>   provision board:
>   verbs: install-kernel,  install-root-filesystem
>   boot to firmware?

I think installation is more or less at the same level as run/copy (or
even depends on them). 

> Here are some open questions:
>  * are all these operations synchronous, or do we need some verbs that do 'start-an-operation', and 'check-for-completion'?
>     * should asynchronicity be in the board management layer, or the calling layer? (if the calling layer, does the board
>     management layer need to support running the command line in concurrent instances?)

If the calling layer can cope with synchronous verbs (especially
reserve), that would be much simpler. The BM layer would need to
support concurrent instances even for one board (open console process
+ reboot, at least). Using multiple boards in parallel (even from one
client) should also work.
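
Concretely, a client would keep one console instance open while a
second instance of the same CLI handles the reboot, e.g. (names
hypothetical again):

import subprocess

console = subprocess.Popen(["bm-control", "console", "board1"],
                           stdin=subprocess.PIPE, stdout=subprocess.PIPE)
# a second, concurrent invocation of the same CLI triggers the reboot
subprocess.run(["bm-control", "reboot", "board1"], check=True)
# ... read the boot log from console.stdout ...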

>  * are these sufficient for most test management/test scheduler layers? (ie are these the right verbs?)

Regarding labgrid: We have drivers to call external programs for power
and console, so that would work for simple cases. I think the same
applies to LAVA.

The critical point here is which part is responsible for scheduling: I
think it would need to be the BM. Currently, neither LAVA nor labgrid
can defer to an external scheduler.

>  * what are the arguments or options that go along with these verbs?
>     * e.g. which ones need timeouts? or is setting a timeout for all operations a separate operation itself?

Reserving can basically take an arbitrary amount of time (if someone
else is already using a board). For other long-running commands, the
called command could regularly print that it's still alive?
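
On the caller side, that would turn a fixed overall timeout into an
inactivity timeout, something like this (only a sketch, assuming the
BM command prints progress/keepalive lines):

import selectors
import subprocess

def run_with_inactivity_timeout(cmd, idle_timeout=60):
    # abort only if the command stops producing output for too long,
    # instead of enforcing a hard limit on the total runtime
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    sel = selectors.DefaultSelector()
    sel.register(proc.stdout, selectors.EVENT_READ)
    while proc.poll() is None:
        if not sel.select(timeout=idle_timeout):
            proc.kill()
            raise TimeoutError("no output for %d seconds" % idle_timeout)
        proc.stdout.readline()  # forward/log the keepalive or progress line
    return proc.wait()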

>  * for provisioning verbs:
>    * how to express where the build artifacts are located (kernel image, rootfs) to be used for the operations?
>       * just local file paths, or an URL for download from a build artifact server?

As above, I think the test framework would stay responsible for
provisioning. It knows where the artifacts are, and how to control the
specific HW/SW to install them.

>   * do we need to consider security as part of initial API design?  (what about user ID, access tokens, etc.)

I don't think so. Access controls on the network layer should be enough
to make an initial implementation useful.

This doesn't need to be a downside: the current frameworks already have
this part covered, and using separate NFS/HTTP servers for each test
framework in a shared lab shouldn't cause issues.

> I've started collecting data about different test management layers at:
> https://elinux.org/Board_Management_Layer_Notes
> 
> Let me know what you think.

So if we could find a way to have a common scheduler and control
power+console via subprocess calls, shared labs would become a
possibility. Then one could use different test frameworks on the same
HW, even with interactive developer access, depending on what fits best
for each individual use-case.

For us, that would mean using labgrid for the BM layer. Then for tests
which need to control SD-Mux, fastboot, bootloader and similar, we'd
continue writing testcases "natively" with labgrid+pytest. In addition,
we could then also use the same lab for kernelci+lava, which would be
very useful.

Regards,
Jan
-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


