[Automated-testing] Farming together - areas of collaboration

Bird, Timothy Tim.Bird at sony.com
Tue Nov 14 13:16:01 PST 2017


> -----Original Message-----
> From: Andrew Murray on Friday, November 10, 2017 4:24 PM
> Following on from the "Farming Together" BOF at ELCE [1][2], I'm keen
> to collaborate with this new community to harmonise automated testing
> infrastructure tools.

Thanks for kickstarting the discussion and activities.
 
> It's my view that there are two main use-cases for farms - the first
> being to facilitate automated testing and the second to facilitate
> remote working (interactive/real-time access to hardware). Any efforts
> we undertake should have these use-cases in mind - as it seems that
> much of the existing software available is narrowly targeted at a
> specific task or user within one of these use-cases. (Are there any
> other use cases?).

Those are two use cases that are related, and that would be good to support.

> 
> My vision of a farm is that it would consist of three distinct layers...

I think defining the layering is good.  When something breaks the
layering, or uses a different layering, it will be good to figure out
why it has done so.  Is it something architectural that ties things
together?  Or just convenience, and a failure to consider other use
cases that could benefit from a more modular approach?

> The Grass - This is the bottom layer which provides the bare minimum
> software abstraction to the fundamental capabilities of the farm. This
> consists of physical devices and suitable drivers for them. The
> interface to higher layers is of the language of turn power port 2 on,
> bridge relay 5 on, etc. The goal here is for us to be able to pick up
> some hardware off the shelf (e.g. an APC power switch) and to have a
> 'driver' already available from this community that fits into this
> view of the world. In order to achieve this it would be necessary to
> define categories of devices (power, relay, serial), the verbs (power
> cycle, power off, etc) and a way of addressing physical devices.

Agreed.  I have thought it would be good to do some kind of
survey to see what people are already doing here.  The
tools I know of in this space are:
 - labgrid
 - ttc
 - libvirt/virsh  (I only just learned about this one at ELCE)
 - pduclient
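
To make this concrete, here is a rough sketch (in Python, with invented
names - not borrowed from any of the tools above) of what a common
'Grass' driver API might look like:

from abc import ABC, abstractmethod
import time

# Hypothetical 'Grass' layer interfaces - all names invented for discussion.

class PowerPort(ABC):
    """One switched outlet on a power device (e.g. an APC switch)."""
    @abstractmethod
    def on(self): ...

    @abstractmethod
    def off(self): ...

    def cycle(self, delay=2.0):
        # A composite verb built from the primitive verbs.
        self.off()
        time.sleep(delay)
        self.on()

class Relay(ABC):
    """A single relay channel (e.g. wired across a board's button)."""
    @abstractmethod
    def set(self, closed): ...

The point is that a driver for an off-the-shelf device would only have
to implement the primitive verbs; everything above it stays generic.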

> I believe the LabGrid stack has a good model in this respect.

I don't know enough about labgrid to know how it compares.
I know that LAVA (and thus kernelci) uses pduclient, but when
I looked at that one in detail I was underwhelmed.  Apparently
there are a lot of out-of-tree "drivers" (the commands for various
items).

There's a chicken-and-egg problem here (and probably for all
layers).  Until one system dominates, the fragmentation continues,
as no one has an incentive to add support to one system or
another unless they are using it themselves.

> 
> The Cow - This is the middle layer which uses the bottom layer and
> provides an abstraction of a board, this is where boards are defined
> and related to the physical devices. This layer would manage exclusive
> access to boards and self-tests. Users of this layer could find out
> what boards are available (access control), what their capabilities
> are and access to those capabilities. The language used here may be
> turn board on, get serial port, press BOOT switch, etc. The goal here
> is that we can work as a community to create definition of boards - a
> bit like device tree at a high physical level.

Some of the existing tools (ttc, libvirt) are actually at this layer, and
not the lower layer.
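
For discussion's sake, here is a hypothetical sketch of what a board
object at this layer might look like, mapping board-level verbs onto
the driver primitives below it (again, all names invented):

import time

class Board:
    """Hypothetical 'Cow' layer board abstraction."""

    def __init__(self, name, power_port, buttons=None, serial_dev=None):
        self.name = name
        self.power_port = power_port   # a 'Grass' layer PowerPort
        self.buttons = buttons or {}   # button name -> 'Grass' Relay
        self.serial_dev = serial_dev   # e.g. "/dev/ttyUSB3"
        self._owner = None

    def acquire(self, user):
        # Exclusive access is managed here, not in each application.
        if self._owner not in (None, user):
            raise RuntimeError("%s is in use by %s" % (self.name, self._owner))
        self._owner = user

    def press_button(self, name, hold=0.5):
        relay = self.buttons[name]
        relay.set(True)
        time.sleep(hold)
        relay.set(False)

    def power_on(self):
        # The board definition can encode quirks, e.g. "apply power,
        # then press the power button" for boards that need it.
        self.power_port.on()
        if "power" in self.buttons:
            self.press_button("power")

An application at the layer above would then only need something like
board.power_on(), without knowing how the board is wired.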

> 
> The Dairy - This is the top layer which uses the middle layer. This is
> the application layer, applications can focus on their value-add which
> may be running tests (LAVA) or providing remote access to boards
> (perhaps with additional features such as 'workspaces' with prebuilt
> and predeployed software images).
> 
> The problem we have at the moment is that existing software tries to
> do all three. For example LAVA - it's a great 'Dairy' (application)
> but the person setting up LAVA on a machine has to plumb it into their
> physical hardware. This normally creates an easily breakable link that
> overly couples the hardware to LAVA. It means LAVA has to do more than
> just schedule and run tests - it has to manage boards, it has to check
> they are OK, etc. This is a distraction for LAVA - wouldn't it be
> great if we tell it where our 'Cow' is and it figures out the rest.

I strongly agree with this observation.  In networking, the OSI model
never took off as an implementation, but it has been a handy way to talk
about networking interfaces for decades, because it at least identified
the layers.

I find myself scratching my head about LAVA, because I can't figure
out its boundaries or layers.  I'm sure the same can be said for people
trying to understand Fuego.  Because there haven't been any defined layers
or APIs, everyone has had to build up an agglomeration of services and
actions, leaving the rest as an exercise for the user, so a lot of systems
have ended up as somewhat amorphous blobs.

> 
> By splitting the stack into layers, we can move beyond the basics of
> farms and really improve the reliability of farms and add features.

The thing that resonated with me at ELCE was discussion about moving
beyond the basics.  We can't seem to get out of the swamp we are mired
in, to do more than just automate loading the system and power cycling.
I'd really like (as an industry) to move on to automating hardware testing,
including video, audio, and basic buses like USB and I2C.
None of those are automated in a generic way, because everyone's
hardware test rigs are custom.  And we'll never get to common test rigs
until we define common interfaces to them.
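
As a strawman (invented names, not any existing project's API), common
interfaces to test rigs might look like small capability classes that a
generic test can probe a board for, skipping itself cleanly when the
capability is absent:

from abc import ABC, abstractmethod

class AudioCapture(ABC):
    """Rig capability: record the DUT's audio output."""
    @abstractmethod
    def record(self, seconds):
        """Return captured audio data for later analysis."""

class USBMux(ABC):
    """Rig capability: route a USB device to the DUT or the farm host."""
    @abstractmethod
    def connect_to_dut(self, connected):
        """True routes the device to the DUT, False back to the host."""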

> We
> can also allow there to be multiple applications, e.g. a board farm
> that is kernelci enabled but also allows remote access for developers.

That would be great.

> In my experience, the most difficult part of farming is making it
> robust and reliable. It would be nice to be able to replace a
> USB hub without fear of everything breaking and boards seeing the
> wrong UARTs etc. It would be nice to be alerted when a board is no
> longer working, etc.
> 
> I'm keen to focus on one small part of this stack and do it well - I
> don't see the need to abandon existing projects in this field, but
> instead to align in certain areas.
> 
> In my view it's the bottom two layers that need the most work. I have
> a particular vision for how part of the 'Grass' layer should work...
> 
> I find that when managing a farm, it would be nice to add new boards
> without having to change some integral farm configuration or have
> particular access rights to 'the farm' to plumb in a new board. I.e.
> so that anyone can add a board without risk of breaking anything else.
> In our farm we install boards on rack-mount shelves - I think it would
> be a fantastic idea if we could define a standard that:
> 
> - Standardises connectivity between a shelf and the farm PC (perhaps
> just USB, Ethernet and power)
> - Requires each shelf to contain a USB stick with a file that
> describes the board, the physical USB topology between the shelf and
> its components (USB serial adaptors etc)

When you say "shelf", I think "DUT controller".  I would like standardized
connectivity between a DUT controller and the software using it (be it
a command line, a test framework, etc.).  I'd like to see a standard
interface so that these controllers can be discovered and
utilized with minimal effort.
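
As a sketch of what that discovery could look like on the farm PC side
(the file name, its fields, and the mount location are all invented for
discussion):

import json
import pathlib

def scan_for_shelves(mount_root="/media"):
    """Find shelf descriptor files on mounted USB sticks and return
    the boards they describe, keyed by board id."""
    boards = {}
    for desc in pathlib.Path(mount_root).glob("*/shelf.json"):
        info = json.loads(desc.read_text())
        # The descriptor names a community board definition (e.g. "BBB")
        # plus the shelf-relative wiring; resolving USB paths relative
        # to the shelf's own hub is what makes the shelf portable.
        boards[info["board_id"]] = info
    return boards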

I think a survey of DUT controllers would be good as well.
I will offer to do one, if no one else is interested, and to put
the results on the elinux wiki.

> 
> This immediately opens up some benefits:
> 
> - You can now make a new piece of hardware available to farm users by
> creating a shelf and plugging it into the farm - no need to go and
> configure the farm - this makes it really easy.
> - Likewise you can plug your shelf into any slot of the farm - i.e.
> makes maintenance really easy
> - You could plug your shelf into your PC and farm software on your PC
> would detect the hardware and give you access.
> - You could plug your shelf into anyone else's farm.
> 
> As a slightly more concrete example, imagine you have a beagle bone
> black, there would be a community definition file available that
> describes that it is a board named 'Beagle Bone Black' which describes
> that it has a power jack, a power button, a reset button, serial, USB
> and an Ethernet cable. Perhaps the definition file has additional
> information for example that to power on the board you have to apply
> power and then press the power button. You could then create a shelf
> with the power connected, a USB relay for the power button and serial.
> You could then create a mapping file on a USB stick that describes:
> 
> - That this relates to a community definition file called BBB.
> - BBB Power is connected to the shelf power supply
> - BBB power button is connected to a USB relay of type XXX on the
> shelf with a specific relative physical path (from the end of the
> shelf) using port 1
> - BBB serial is connected to a USB serial on the shelf with a specific
> relative physical path (from the end of the shelf)
> - The remainder of the other connections are not made.
> 
> When the farm detects the USB stick it can present a BBB to the user with
> information about what capabilities are available. The user can then
> interact using BBB terminology - e.g. press BBB power button, or more
> generally - e.g. turn board on.

This sounds like a great vision for what we'd like to see (or at
least what I'd like to see).
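
To make the example concrete, the mapping file above might look
something like this (written as a Python dict for readability; the
on-disk format is an open question, and every field name here is
invented):

bbb_shelf_mapping = {
    "definition": "BBB",            # the community definition file
    "connections": {
        "power":        {"to": "shelf-power-supply"},
        "power-button": {"to": "usb-relay", "type": "XXX",
                         "usb-path": "1.2",   # relative to the shelf hub
                         "port": 1},
        "serial":       {"to": "usb-serial",
                         "usb-path": "1.3"},  # relative to the shelf hub
        # reset button, USB, Ethernet: not connected on this shelf
    },
}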

> Of course I've made many assumptions here about shelves, the hardware
> within it etc - but this is just an idea for discussion.
> 
> Would this split of layers be beneficial?
I think so.

> 
> Would such a standard be helpful?
I think so.

> 
> What areas of farming need to be addressed?
> 
> Is this the right place to start?

My preference would be to start with some surveys of existing
farm management software and hardware, to see:
1) what capabilities are supported
2) what off-the-shelf drivers or software are already integrated into each system
3) what verbs are used
4) what upper-level interfaces are provided
   (e.g. command line, Jenkins plugin, etc.)

I also think we should put together a glossary, so we at least have reference
points for what we're talking about.  Maybe this would derive from the
standards we define - but there is some terminology I'd like to make sure
we have in common, just so we understand each other while we try to come
up with standards.

I can take a stab at writing an initial list of terms, and see if we can
get agreement on them, if desired.  (I would put it on the elinux wiki).

Here are some things to consider, in no particular order:

I think one of the most difficult things will be building the ecosystem to
support a solution here.  There are issues with finding a home for this
stuff, in that someone will have to do maintenance work and support use cases
that don't apply to themselves.  In other words, I'm worried about whether
the economic incentives exist to sustain the collaborative effort required
for this.

Also, there will be difficulty getting people with existing systems to switch
over to something new. That's going to be a lot of pain for not much near-term
gain.  Most people just cobble together their own system, get to a fixed level of
automation, then move on to testing or board usage within that framework.

To be a bit more blunt about it, as a concrete example: would LAVA adopt a new
system for low-level board control, if it presented itself?  I kind of doubt it.
Who would do this work?

Finally, who is leading this thing?  I'm happy to participate, but can't spare the time
to actually lead.  And how will decisions about a standard get made?  Who has
"approval" authority, or are we just trying to come up with software and solutions
that are so good they will get adopted as de-facto standards?

Will there be a committee organized?  Do we have regular meetings at ELCE?

I'd love to see this move forward, and will help where I can.
 -- Tim


