[Automated-testing] Farming together - areas of collaboration

Jan Lübbe jlu at pengutronix.de
Mon Nov 13 04:10:36 PST 2017


Hello Andrew,

thanks for pushing this forward!

On Sat, 2017-11-11 at 00:23 +0000, Andrew Murray wrote:
> Hello,
> 
> Following on from the "Farming Together" BOF at ELCE [1][2], I'm keen
> to collaborate with this new community to harmonise automated testing
> infrastructure tools.
> 
> It's my view that there are two main use-cases for farms - the first
> being to facilitate automated testing and the second to facilitate
> remote working (interactive/real-time access to hardware). Any
> efforts
> we undertake should have these use-cases in mind - as it seems that
> much of the existing software available is narrowly targeted at a
> specific task or user within one of these use-cases. (Are there any
> other use cases?).

One other use-case I have in mind for labgrid is using it for automated
image deployment during production. As this often requires custom
integration (databases, UIs, …), this basically boils down to making
the bottom two layers easy to reuse as a library.
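
To give a rough idea of what "reuse as a library" means here (just a
sketch; the config file name and the driver usage are placeholders, and
the exact API may differ):

  from labgrid import Environment

  # Resources and Drivers for one deployment station are described in a
  # normal environment config file (placeholder name).
  env = Environment('production-station.yaml')
  target = env.get_target('main')

  # PowerProtocol is the on/off/cycle interface mentioned below; the
  # flashing/console steps would use further drivers in the same way.
  power = target.get_driver('PowerProtocol')
  power.cycle()

The custom production integration (database, UI, ...) would then live
around this code, instead of inside the farm infrastructure.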

> My vision of a farm is that it would consist of three distinct
> layers...
> 
> The Grass - This is the bottom layer which provides the bare minimum
> software abstraction to the fundamental capabilities of the farm.
> This consists of physical devices and suitable drivers for them. The
> interface to higher layers uses the language of turn power port 2
> on, bridge relay 5 on, etc. The goal here is for us to be able to pick
> up some hardware off the shelf (e.g. an APC power switch) and to have
> a 'driver' already available from this community that fits into this
> view of the world. In order to achieve this it would be necessary to
> define categories of devices (power, relay, serial), the verbs (power
> cycle, power off, etc) and a way of addressing physical devices. I
> believe the LabGrid stack has a good model in this respect.

To rephrase this from the labgrid perspective, this layer consists of
three parts: "Resources", "Drivers" and "Protocols". Resources describe
which interfaces (power, serial, USB, buttons/jumpers, ...) are
available and the physical details. Drivers contain the functionality
and provide a "Protocol" by controlling a Resource.

Splitting this up provides one main benefit: Each resource can be
defined independently of an individual board and be reused by name,
abstracting away some of the low-level details (such as the TCP port on
an 8x network serial server). The NetworkPowerPort, for example, is
configured like this:
  NetworkPowerPort:
    model: gude
    host: powerswitch.example.computer
    index: 0

On the other hand, this split allows the Driver configuration to handle
only the (often project/board-specific) software details. For example:
  UBootDriver:
    prompt: 'Uboot> '
    password: 'secret'

The Protocol provided by a Driver is basically what you call "verbs".
For example, the PowerProtocol [1] defines on, off and cycle. Several
drivers can provide the same Protocol. 
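
Conceptually, a Protocol is just an abstract interface; a simplified
sketch (not the exact code from [1]):

  import abc

  class PowerProtocol(abc.ABC):
      # The "verbs" every power driver has to implement.
      @abc.abstractmethod
      def on(self):
          raise NotImplementedError

      @abc.abstractmethod
      def off(self):
          raise NotImplementedError

      @abc.abstractmethod
      def cycle(self):
          raise NotImplementedError

A driver for the Gude switch above and one for, say, a USB relay can
both implement this interface, so the higher layers only ever see
on/off/cycle.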

> The Cow - This is the middle layer which uses the bottom layer and
> provides an abstraction of a board; this is where boards are defined
> and related to the physical devices. This layer would manage exclusive
> access to boards and self-tests. Users of this layer could find out
> what boards are available (access control), what their capabilities
> are and access to those capabilities. The language used here may be
> turn board on, get serial port, press BOOT switch, etc. The goal here
> is that we can work as a community to create definitions of boards - a
> bit like device tree at a high physical level.

For labgrid, some of this is provided by the remote infrastructure
part.

> The Dairy - This is the top layer which uses the middle layer. This is
> the application layer; applications can focus on their value-add, which
> may be running tests (LAVA) or providing remote access to boards
> (perhaps with additional features such as 'workspaces' with prebuilt
> and predeployed software images).

This is explicitly not handled by labgrid, although it contains some
helpers for integration into pytest.
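
With the pytest plugin, the configured target shows up as a fixture, so
a test can be as small as this (sketch; the driver and command are just
an example):

  # The 'target' fixture comes from labgrid's pytest plugin, based on
  # the environment config passed on the pytest command line.
  def test_kernel_version(target):
      shell = target.get_driver('CommandProtocol')  # e.g. backed by a ShellDriver
      output = shell.run_check('uname -r')          # non-zero exit code fails the test
      assert output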

> The problem we have at the moment is that existing software tries to
> do all three. For example LAVA - it's a great 'Dairy' (application)
> but the person setting up LAVA on a machine has to plumb it into their
> physical hardware. This normally creates an easily breakable link that
> overly couples the hardware to LAVA. It means LAVA has to do more than
> just schedule and run tests - it has to manage boards, it has to check
> they are OK, etc. This is a distraction for LAVA - wouldn't it be
> great if we tell it where our 'Cow' is and it figures out the rest.
> 
> By splitting the stack into layers, we can move beyond the basics of
> farms and really improve the reliability of farms and add features.
> We
> can also allow there to be multiple applications, e.g. a board farm
> that is kernelci enabled but also allows remote access for
> developers.

I fully agree. To be fair, LAVA already tries to split this a bit with
lava-dispatcher, but it's still not intended to be used independently
of LAVA.

> In my experience, the most difficult part of farming is making it
> robust and reliable. It would be nice to not be scared of replacing a
> USB hub without fear of everything breaking and boards seeing the
> wrong UARTs etc. It would be nice to be alerted when a board is no
> longer working, etc.

I think this is difficult mainly because of another (implicit) goal:
The farm should be flexible enough to handle unusual or new interfaces.

If you knew everything you would require beforehand, having a reliable
farm is not too difficult (buy high-quality network relays, serial
console servers and industrial USB hubs).

But reliability suffers when you need to accommodate custom interfaces:
you add additional wiring, more USB ports, connect one board to
another, a logic analyzer/scope, a power meter, an SD-Mux, a BeagleBone
to emulate a USB stick, …
Doing that for one board is not that hard either, but if most boards
have some custom stuff, the board farm infrastructure needs to be
flexible and extensible enough.
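
To make "extensible enough" concrete: what should stay cheap is
wrapping such one-off instruments in a small driver, so tests can use
them like anything else. A purely hypothetical example (generic SCPI
over TCP, not an existing labgrid driver):

  import socket

  class PowerMeterDriver:
      # Minimal driver for a (hypothetical) SCPI power meter on a TCP port.
      def __init__(self, host, port=5025):
          self._file = socket.create_connection((host, port), timeout=5).makefile('rwb')

      def measure_watts(self):
          self._file.write(b'MEAS:POW?\n')
          self._file.flush()
          return float(self._file.readline().strip())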

> I'm keen to focus on one small part of this stack and do it well - I
> don't see the need to abandon existing projects in this field, but
> instead to align in certain areas.
> 
> In my view it's the bottom two layers that need the most work. I have
> a particular vision for how part of the 'Grass' layer should work...
> 
> I find that when managing a farm, it would be nice to add new boards
> without having to change some integral farm configuration or have
> particular access rights to 'the farm' to plumb in a new board. I.e.
> so that anyone can add a board without risk of breaking anything else.
> In our farm we install boards on rack-mount shelves - I think it
> would be a fantastic idea if we could define a standard that:
> 
> - Standardises connectivity between a shelf and the farm PC (perhaps
> just USB, Ethernet and power)
> - Requires each shelf to contain a USB stick with a file that
> describes the board, the physical USB topology between the shelf and
> its components (USB serial adaptors etc)
> 
> This immediately opens up some benefits:
> 
> - You can now make a new piece of hardware available to farm users by
> creating a shelf and plugging it into the farm - no need to go and
> configure the farm - this makes it really easy.
> - Likewise you can plug your shelf into any shelf of the farm - i.e.
> makes maintenance really easy
> - You could plug your shelf into your PC and farm software on your PC
> would detect the hardware and give you access.
> - You could plug your shelf into anyone else's farm.
> 
> As a slightly more concrete example, imagine you have a beagle bone
> black, there would be a community definition file available that
> describes that it is a board named 'Beagle Bone Black' which
> describes
> that it has a power jack, a power button, a reset button, serial, USB
> and an Ethernet cable. Perhaps the definition file has additional
> information for example that to power on the board you have to apply
> power and then press the power button. You could then create a shelf
> with the power connected, a USB relay for the power button and
> serial.
> You could then create a mapping file on a USB stick that describes:
> 
> - That this relates to a community definition file called BBB.
> - BBB Power is connected to the shelf power supply
> - BBB power button is connected to a USB relay of type XXX on the
> shelf with a specific relative physical path (from the end of the
> shelf) using port 1
> - BBB serial is connected to a USB serial on the shelf with a
> specific
> relative physical path (from the end of the shelf)
> - The remainder of the other connections are not made.
> 
> When the farm detects the USB stick it can present a BBB to the user with
> information about what capabilities are available. The user can then
> interact using BBB terminology - e.g. press BBB power button, or more
> generally - e.g. turn board on.
> 
> Of course I've made many assumptions here about shelves, the hardware
> within it etc - but this is just an idea for discussion.

We've thought about something like this for labgrid as well, but
discarded this approach for now. I'll try to explain what I see as the
downsides and what labgrid currently does instead.

Most of us have board farms to test embedded software on real hardware.
This is usually because we can't find the relevant problems by
software-only testing in CI or emulation. This doesn't mean that we
don't need normal CI testing, but that the remaining "unusual" stuff is
what makes standard interfaces insufficient.

I'd argue that this unusual stuff is the main reason we build custom
systems in the first place. Otherwise we (or our customers) would just
use an embedded PC. So we can expect that we'll always have boards
which need new interfaces/peripherals/instruments to test.

So if we standardize on something, it shouldn't make it harder to
automate the per-project custom stuff. :)

Some examples:

1) For me, the main reason to move a board from the farm to my desk is
that I want to connect something beyond the normal serial, network, USB
and power, such as a power meter. Nevertheless, I still want to run my
automated tests as before, but with power measurements correlated to
the tests. So I'd need to modify the board configuration anyway (and
again when moving it back to the rack), maybe even several times when
moving it back and forth.

Having to modify a file on a USB stick for that seems like it would
become tedious. I'd prefer a way to just *identify* the board, and then
have the configuration available somewhere else. Then it could be
version-controlled via git, my colleagues could look up my setup when
I'm on vacation, and moving boards even between rack "places" with
different interfaces becomes easier.

2) We have shared resources in some racks (such as a CAN bus tester),
which are connected to different rack places as required. Reconfiguring
this should be as easy as possible (and not require touching files on
several USB sticks).

3) We have several instances of the same board and it should be easy to
make sure that they are configured in the same way. That seems more
difficult with a file on a USB stick.


In labgrid, this part can be done with the remote infrastructure. There
is one central service (the "coordinator") which knows which Resources
are available and how they are assembled to form "places". Resources
are announced by "exporters" running on several hosts and are things
like serial ports, power switches, buttons, GPIOs, USB Android
fastboot/bootloader interfaces or SD muxes.

A place is then configured to use these resources with the same CLI
tool that's used to lock and control a board. This works out well for
us because the resources available in each rack are relatively static
when compared to the boards connected to them.

The coordinator makes it unlikely that unrelated places are broken by
mistake. Even in that case, it's usually easy to debug and fix wrong
config changes.


To make this more automatic, I'd prefer a way to *identify* a shelf, so
a centrally stored configuration could be applied to this shelf
independently of where it is currently connected.
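
The lookup side of that could stay very simple (hypothetical sketch;
the ID source and the repository layout are made up):

  from pathlib import Path

  import yaml

  # A checkout of a git repository with one description file per shelf.
  CONFIG_REPO = Path('/srv/farm-config')

  def read_shelf_id():
      # Placeholder: in reality this would come from an EEPROM, a USB
      # device serial number, a barcode on the shelf, ...
      return 'shelf-0042'

  def load_shelf_config():
      # The per-shelf description is version controlled and reviewable.
      path = CONFIG_REPO / (read_shelf_id() + '.yaml')
      return yaml.safe_load(path.read_text())

Everything board-specific then lives in git, and the only thing stored
on the shelf itself is its identity.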


I'm not yet sure if we need more automation for this part soon. In
practice, I can (re-)configure a labgrid place quickly and my
colleagues can also use it immediately. Also, I'd like to avoid
additional complexity in the board configuration and keep it as
transparent as possible. We already have enough complexity in the
systems we're trying to test, without also having to debug the
automation infrastructure. ;)

> Would this split of layers be beneficial?

Yes. :)

> Would such a standard be helpful?

Maybe. Labgrid's design is explicitly not final, so I'm open to being
convinced to do things differently if some other approach works better,
even if that requires changes by current users.

> What areas of farming need to be addressed?

I'd be very interested in what your daily use-cases and workflows look
like.

> Is this the right place to start?

Having a common terminology seems important (but I'm not sure about
Grass, Cow and Dairy ;).

> Keen to all feedback. (Apologies for the rambling and terrible farm
> analogies :D)

I've probably rambled as well. :)

Thanks,
Jan

[1] https://github.com/labgrid-project/labgrid/blob/master/labgrid/protocol/powerprotocol.py

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

