[Automated-testing] Farming together - areas of collaboration

Andrew MURRAY amurray at witekio.com
Wed Nov 15 14:55:00 PST 2017


> -----Original Message-----
> From: Jan Lübbe [mailto:jlu at pengutronix.de]
> 
> Hello Andrew,
> 
> thanks for pushing this forward!

Thanks for joining in : )

> 
> On Sat, 2017-11-11 at 00:23 +0000, Andrew Murray wrote:

> > It's my view that there are two main use-cases for farms - the first
> > being to facilitate automated testing and the second to facilitate
> > remote working (interactive/real-time access to hardware). Any efforts
> > we undertake should have these use-cases in mind - as it seems that
> > much of the existing software available is narrowly targeted at a
> > specific task or user within one of these use-cases. (Are there any
> > other use cases?).
> 
> One other use-case I have in mind for labgrid is using it for automated image
> deployment during production. As this often requires custom integration
> (databases, UIs, …), this basically boils down to making the bottom two layers
> easy to reuse as a library.

As in a factory commissioning type of environment? I guess this could also be used in factory production to test that the device works as expected - i.e. the same abilities as provided by a farm, except it doesn't look like a farm (but it leverages the ability to control a variety of power controllers, serial, etc., and in the future video/audio testing).

> 
> > My vision of a farm is that it would consist of three distinct
> > layers...
> >
> > The Grass - This is the bottom layer which provides the bare minimum
> > software abstraction to the fundamental capabilities of the farm.
> > This consists of physical devices and suitable drivers for them. The
> > interface to higher layers is of the language of turn power port 2 on,
> > bridge relay 5 on, etc. The goal here is for us to be able to pick up
> > some hardware off the shelf (e.g. an APC power switch) and to have a
> > 'driver' already available from this community that fits into this
> > view of the world. In order to achieve this it would be necessary to
> > define categories of devices (power, relay, serial), the verbs (power
> > cycle, power off, etc) and a way of addressing physical devices. I
> > believe the LabGrid stack has a good model in this respect.
> 
> To rephrase this from the labgrid perspective, this layer consists of three parts:
> "Resources", "Drivers" and "Protocols". Resources describe which interfaces
> (power, serial, USB, buttons/jumpers, ...) are available and the physical details.
> Drivers contain the functionality to provide a "Protocol" by controlling a Resource.

Thanks - I read the documentation (which is very nice btw, and hence my delay in responding). My simplified view is that resources are physical things, drivers know what to do with those things, and in an object-oriented way the drivers implement interfaces which you call Protocols. The benefit of the Protocols is to give a common API to drivers. So for a UART you may have a resource which is little more than a marker to say it’s a particular device node. The driver knows how to read data from that device node - i.e. it uses Python's Serial - and the driver implements the ConsoleProtocol, which provides a get/write type of interface.
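
To check my understanding, here is a minimal sketch of that split in Python - the class names are illustrative only (not labgrid's actual API) and it assumes pyserial is installed:

# Illustrative sketch only - these are not labgrid's real classes.
import abc
import serial  # pyserial, assumed to be installed


class RawSerialPort:
    """Resource: little more than a marker for a physical device node."""
    def __init__(self, port, speed=115200):
        self.port = port
        self.speed = speed


class ConsoleProtocol(abc.ABC):
    """Protocol: the common API that any console-like driver must provide."""
    @abc.abstractmethod
    def read(self, size=1): ...

    @abc.abstractmethod
    def write(self, data): ...


class SerialDriver(ConsoleProtocol):
    """Driver: knows how to use the resource to provide the protocol."""
    def __init__(self, resource):
        self._port = serial.Serial(resource.port, resource.speed)

    def read(self, size=1):
        return self._port.read(size)

    def write(self, data):
        return self._port.write(data)


console = SerialDriver(RawSerialPort("/dev/ttyUSB0"))
console.write(b"uname -a\n")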

A difference between my layering and LabGrid is that LabGrid's resources and drivers are broader - i.e. you have resources and drivers for things that don't represent physical hardware, such as an SSH driver, Shell driver and UBoot driver. I guess in my thinking so far I've considered all these things to be at a higher level and kept the lower layers to represent/emulate a physical device and interactions with it. I don’t know if this is a good thing or a bad thing.

> 
> Splitting this up provides one main benefit: Each resource can be defined
> independently of an individual board and be reused by name, abstracting away
> some of the low-level details (such as the TCP port on an 8x network serial
> server). The NetworkPowerPort, for example, is configured like this:
>   NetworkPowerPort:
>     model: gude
>     host: powerswitch.example.computer
>     index: 0
> 
> On the other side, this split allows the Driver configuration to handle only the
> (often project/board-specific) SW details. For example:
>   UBootDriver:
>     prompt: 'Uboot> '
>     password: 'secret'
> 
> The Protocol provided by a Driver is basically what you call "verbs".
> For example, the PowerProtocol [1] defines on, off and cycle. Several drivers
> can provide the same Protocol.

I would suspect we are all quite well aligned on the verbs - this might be an easy one to define.
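
Just to make the comparison concrete, the power verbs might boil down to something like the following - a hypothetical sketch, not labgrid's actual code (labgrid does have a ManualPowerDriver, but this is not its implementation) - with several drivers able to provide the same protocol:

import abc
import time


class PowerProtocol(abc.ABC):
    """A shared vocabulary of power verbs that any power driver provides."""
    @abc.abstractmethod
    def on(self): ...

    @abc.abstractmethod
    def off(self): ...

    def cycle(self, delay=2.0):
        self.off()
        time.sleep(delay)
        self.on()


class ManualPowerDriver(PowerProtocol):
    """One possible implementation: fall back to asking a human."""
    def __init__(self, name):
        self.name = name

    def on(self):
        input("Switch %s ON and press enter" % self.name)

    def off(self):
        input("Switch %s OFF and press enter" % self.name)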

Looking at the LabGrid example configuration...

targets:
  main:
    resources:
      RawSerialPort:
        port: "/dev/ttyUSB0"
    drivers:
      ManualPowerDriver:
        name: "example"
      SerialDriver: {}
      ShellDriver:
        prompt: 'root@\w+:[^ ]+ '
        login_prompt: ' login: '
        username: 'root'

For a given board I have an urge to try and decouple this a little. Everything ShellDriver-related concerns the software running on a board rather than the board itself. Ideally you would describe a board with just its resources and the names of the needed drivers (maybe it could even detect which drivers are needed) - see the sketch below.
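
For example (a hypothetical split with invented field names, not labgrid's schema), the board/shelf description could carry only the physical details, with the software-specific settings kept in a separate profile:

import yaml  # PyYAML, assumed available

# Hardware-only description of the board/shelf (hypothetical schema)
board_description = yaml.safe_load("""
board: example-board
resources:
  RawSerialPort:
    port: "/dev/ttyUSB0"
drivers: [ManualPowerDriver, SerialDriver, ShellDriver]
""")

# Software-specific settings, kept in a separate, image-specific profile
software_profile = yaml.safe_load(r"""
ShellDriver:
  prompt: 'root@\w+:[^ ]+ '
  login_prompt: ' login: '
  username: 'root'
""")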

> > In my experience, the most difficult part of farming is making it
> > robust and reliable. It would be nice to not be scared of replacing a
> > USB hub without fear of everything breaking and boards seeing the
> > wrong UARTs etc. It would be nice to be alerted when a board is no
> > longer working, etc.
> 
> I think this is difficult mainly because of another (implicit) goal:
> The farm should be flexible enough to handle unusual or new interfaces.
> 
> If you knew everything you would require beforehand, having a reliable farm is
> not too difficult (buy high-quality network relays, serial console servers and
> industrial USB hubs).
> 
> But reliability suffers when you need to accommodate custom interfaces:
> You add additional wiring, more USB ports, connect one board to another, a
> logic analyzer/scope, a power meter, an SD-Mux, a BeagleBone to emulate a
> USB stick, … Doing that for one board is not that hard either, but if most
> boards have some custom stuff, the board farm infrastructure needs to be
> flexible and extensible enough.

I agree - perhaps the physical architecture should be designed such that these customisations are limited in scope to what lies beyond a farm connector. I guess I'm saying that you should never be able to break other boards in the farm by customising your own.

> 
> > I'm keen to focus on one small part of this stack and do it well - I
> > don't see the need to abandon existing projects in this field, but
> > instead to align in certain areas.
> >
> > In my view it's the bottom two layers that need the most work. I have
> > a particular vision for how part of the 'Grass' layer should work...
> >
> > I find that when managing a farm, it would be nice to add new boards
> > without having to change some integral farm configuration or have
> > particular access rights to 'the farm' to plumb in a new board. I.e.
> > so that anyone can add a board without risk of breaking anything else.
> > In our farm we install boards on rack-mount shelves - I think it would
> > be a fantastic idea if we could define a standard that:
> >
> > - Standardises connectivity between a shelf and the farm PC (perhaps
> > just USB, Ethernet and power)
> > - Requires each shelf to contain a USB stick with a file that
> > describes the board, the physical USB topology between the shelf and
> > its components (USB serial adaptors etc)
> >
> > This immediately opens up some benefits:
> >
> > - You can now make a new piece of hardware available to farm users by
> > creating a shelf and plugging it into the farm - no need to go and
> > configure the farm - this makes it really easy.
> > - Likewise you can plug your shelf into any shelf of the farm - i.e.
> > makes maintenance really easy
> > - You could plug your shelf into your PC and farm software on your PC
> > would detect the hardware and give you access.
> > - You could plug your shelf into anyone else's farm.
> >
> > As a slightly more concrete example, imagine you have a beagle bone
> > black, there would be a community definition file available that
> > describes that it is a board named 'Beagle Bone Black' which describes
> > that it has a power jack, a power button, a reset button, serial, USB
> > and an Ethernet cable. Perhaps the definition file has additional
> > information for example that to power on the board you have to apply
> > power and then press the power button. You could then create a shelf
> > with the power connected, a USB relay for the power button and serial.
> > You could then create a mapping file on a USB stick that describes:
> >
> > - That this relates to a community definition file called BBB.
> > - BBB Power is connected to the shelf power supply
> > - BBB power button is connected to a USB relay of type XXX on the
> > shelf with a specific relative physical path (from the end of the
> > shelf) using port 1
> > - BBB serial is connected to a USB serial on the shelf with a specific
> > relative physical path (from the end of the shelf)
> > - The remainder of the other connections are not made.
> >
> > When the farm detects the USB it can present a BBB to the user with
> > information about what capabilities are available. The user can then
> > interact using BBB terminology - e.g. press BBB power button, or more
> > generally - e.g. turn board on.
> >
> > Of course I've made many assumptions here about shelves, the hardware
> > within it etc - but this is just an idea for discussion.
> 
> We've thought about something for labgrid as well, but discarded this approach
> for now. I'll try to explain what I see as the downsides and what labgrid
> currently does instead.
> 
> Most of us have board farms to test embedded software on real hardware.
> This is usually because we can't find the relevant problems by software only
> testing in CI or emulation. This doesn't mean that we don't need normal CI
> testing, but that the remaining "unusual" stuff is what makes standard
> interfaces insufficient.
> 
> I'd argue that this unusual stuff is the main reason we build custom systems in
> the first place. Otherwise we (or our customers) would just use an embedded PC.
> So we can expect that we'll always have boards which need new interfaces/
> peripherals/instruments to test.
> 
> So if we standardize on something, it shouldn't make it harder to automate the
> per-project custom stuff. :)

Agreed, but after all there are only so many different ways to connect a board to a PC. 

> 
> Some examples:
> 
> 1) For me the main reason to move a board from the farm to my desk is that I
> want to connect something beyond the normal serial, network, USB and power,
> such as a power meter. Nevertheless, I still want to run my automated tests as
> before, but with power measurements correlated to the tests. So I'd need to
> modify the board configuration anyway (and again when moving it back to the
> rack), maybe even several times when moving it back and forth.
> 
> Having to modify a file on a USB stick for that seems like it would become
> tedious. I'd prefer a way to just *identify* the board, and then have the
> configuration available somewhere else. Then it could be version controlled via
> git, my colleagues could look up my setup when I'm on vacation, and moving
> boards even between rack "places" with different interfaces becomes easier.

Perhaps a USB stick with an identifier on it, and then the farm can find a git location for the board? Perhaps it could even write this configuration back to the USB stick for when the shelf moves to someone else's farm without access to the git repository?
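
A rough sketch of that idea (the file names, paths and repository URL are all invented for illustration):

import subprocess
from pathlib import Path


def load_shelf_config(mount_point="/media/shelf-stick",
                      repo="https://git.example.com/farm-configs.git",
                      cache=Path("/var/lib/farm/configs")):
    """Read the shelf identifier from the stick and fetch its configuration
    from a version-controlled repository on the farm host."""
    shelf_id = Path(mount_point, "shelf-id").read_text().strip()
    if cache.exists():
        subprocess.run(["git", "-C", str(cache), "pull"], check=True)
    else:
        subprocess.run(["git", "clone", repo, str(cache)], check=True)
    return (cache / ("%s.yaml" % shelf_id)).read_text()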

> 
> 2) We have shared resources in some racks (such as a CAN bus tester), which are
> connected to different rack places as required. Reconfiguring this should be as
> easy as possible (and not require touching files on several USB sticks).

Yeah I agree actually, it would be tedious.

> 
> 3) We have several instances of the same board and it should be easy to make
> sure that they are configured in the same way. That seems more difficult with a
> file on a USB stick.

Not if the stick simply says "I'm a BeagleBone, and these are the physical things I'm connected to", and there is some standard BeagleBone definition somewhere else. But this was only an idea.
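
Something like this is what I have in mind for the stick's contents (a sketch with invented field names), with the full BeagleBone definition living in a shared community file:

import yaml  # PyYAML, assumed available

mapping = yaml.safe_load("""
definition: BBB               # refers to the community definition file
connections:
  power: shelf-psu            # BBB power jack fed from the shelf supply
  power-button:
    device: usb-relay         # the relay type would be named here
    usb-path: "1.2"           # relative to the shelf's uplink hub
    port: 1
  serial:
    device: usb-serial
    usb-path: "1.3"
""")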

I think the solution to all of this is in the detail somewhere.

> 
> 
> In labgrid, this part can be done with the remote infrastructure. There is one
> central service ("coordinator") which knows which Resources are available and
> how they are assembled to form "places". Resources are announced by
> "exporters" running on several hosts and are things like serial ports, power
> switches, buttons, GPIOs, USB Android fastboot/bootloader interfaces or SD
> muxes.
> 
> A place is then configured to use these resources using the same CLI tool that's
> used to lock and control a board. This works out well for us because the
> resources available in each rack are relatively static when compared to the
> boards connected to them.
> 
> The coordinator makes it unlikely that unrelated places are broken by mistake.
> Even in that case, it's usually easy to debug and fix wrong config changes.
> 
> 
> To make this more automatic, I'd prefer a way to *identify* a shelf, so a
> centrally stored configuration could be applied to this shelf independent of
> where it is currently connected.

Yes - this is really one of the key goals for me too. It's important to identify a shelf because it will presumably have some non-removable connectors that you can connect a board to.

> 
> 
> I'm not yet sure if we need more automation for this part soon. In practice, I
> can (re-)configure a labgrid place quickly and my colleagues can also use it
> immediately. Also, I'd like to avoid additional complexity in the board
> configuration and keep it as transparent as possible. We already have enough
> complexity in the systems we're trying to test, without also having to debug
> the automation infrastructure. ;)

This comes down to a maintenance issue - how easy is it to break configurations, how easy is it for others to make changes, how easy is it for others to break other people's configurations? Does this approach help make it more robust?

I like the idea of a board only being able to use resources on its shelf, it somehow simplifies the problem.

> 
> > Would this split of layers be beneficial?
> 
> Yes. :)
> 
> > Would such a standard be helpful?
> 
> Maybe. Labgrid's design is explicitly not final, so I'm open to being convinced to
> do things differently if some other approach works better, even if that requires
> changes by current users.

For me the most interesting part is the configuration files - effectively these form a kind of standard. I can imagine adapting your board configuration files slightly. For example, instead of a serial resource being described as /dev/ttyUSB0, it might be described as a physical path from the point where a USB cable connects to a hub on the shelf. The farm then has to auto-detect the shelf in order to complete the physical path back to the point where it enters the farm PC. But of course this depends on a standard architecture, shelf design, etc.
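
As a sketch of what that auto-detection might involve on Linux (the shelf-path convention here is invented; the sysfs layout is the usual one, but treat the details as an assumption):

from pathlib import Path


def resolve_tty(shelf_uplink, relative_path):
    """Map a shelf-relative USB path (e.g. "1.3") to a /dev/tty* node, given
    where the shelf's uplink hub was detected (e.g. "3-2" = bus 3, port 2)."""
    device = "%s.%s" % (shelf_uplink, relative_path)
    for iface in Path("/sys/bus/usb/devices").glob(device + ":*"):
        # USB-serial adapters expose ttyUSBn directly under the interface...
        for entry in iface.glob("ttyUSB*"):
            return "/dev/" + entry.name
        # ...while CDC-ACM devices nest it one level deeper under tty/
        for entry in iface.glob("tty/ttyACM*"):
            return "/dev/" + entry.name
    raise FileNotFoundError("no serial device at USB path " + device)


# e.g. resolve_tty("3-2", "1.3") -> "/dev/ttyUSB0"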


> 
> > What areas of farming need to be addressed?
> 
> I'd be very interested in how your daily use-cases and workflows look like.

We have a few use-cases:

- Some boards are 'used' by kernelci - which sits on top of our 'ebfarm' shell scripts
- Some boards are 'used' by engineers - they use the 'ebfarm' scripts to 'use' a board and then to power it on and off, etc. The act of 'using' a board makes the dev nodes for serial etc. appear in a given workspace; it also changes some symbolic links on our NFS/TFTP server so that we can share a board between two developers working on completely different things. A script allows workspaces to be created.
- We have Jenkins which also uses boards to run tests - our tests are mostly 'expect'-based - again on top of our ebfarm scripts.

Users connect to a container via SSH and have conditional access to boards and their workspaces. The ebfarm scripts are actually a wrapper around SSH which sends the commands to another script on the real farm PC.

> 
> > Is this the right place to start?
> Having a common terminology seems important (but I'm not sure about Grass,
> Cow and Dairy ;).
> 
> > Keen to all feedback. (Apologies for the rambling and terrible farm
> > analogies :D)
> I've probably rambled as well. :)
> 
> Thanks,
> Jan

Thanks,

Andrew Murray

> Pengutronix e.K.                           |                             |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

