[Automated-testing] Board management API discussion at ATS - my ideas

Milosz Wasilewski milosz.wasilewski at linaro.org
Tue Oct 22 12:29:28 PDT 2019


On Tue, 22 Oct 2019 at 16:26, Jan Lübbe <jlu at pengutronix.de> wrote:
>
> I'm cutting out large quoted parts...
>
> Tim: Maybe you can say a bit more on your motivation for introducing a
> BM API?
>
> On Tue, 2019-10-22 at 15:34 +0100, Milosz Wasilewski wrote:
> > On Tue, 22 Oct 2019 at 10:00, Jan Lübbe <jlu at pengutronix.de> wrote:
> > > On Mon, 2019-10-21 at 10:02 +0100, Milosz Wasilewski wrote:
> > > > On Thu, 17 Oct 2019 at 10:18, Jan Lübbe <jlu at pengutronix.de> wrote:
> > > > > On Sat, 2019-10-12 at 17:14 +0000, Tim.Bird at sony.com wrote:
> > > > > > What I'd like to see in a "standard" board management API is a system whereby
> > > >
> > > > I'm a bit confused by this idea. At first it looks interesting, but
> > > > the further I read, the more confused I get. Board management is
> > > > used interchangeably with scheduling, which is probably wrong.
> > >
> > > I mentioned scheduling in my reply.
> > >
> > > Taking a step back: Currently, the test frameworks have their own board
> > > management layer. The main use-case for a common API at this level
> > > would seem to be to share a board management layer (and so a physical
> > > lab) between multiple test frameworks.
> > >
> > > Another case would be writing a new test framework (which could reuse
> > > one of the existing board management layers), but I don't know if
> > > that's as relevant.
> > >
> > > Are there other cases?
> > >
> > > With the goal of sharing a lab (= BM layer) between test frameworks,
> > > there has to be some coordination. That was the reasoning behind
> > > arguing that for this use-case to work, there would need to be a shared
> > > scheduler. That would then decide which "client" test framework can use
> > > a given board exclusively.
> >
> > From my perspective this sounds like replacing LAVA completely. The
> > only aspect that is not mentioned here is device provisioning. Overall
> > I'm not sure what benefit that would bring to the lab we're running.
> > I'm really confused.
>
> From my labgrid perspective, there are some things LAVA does well
> (health checks, nice web interface, easy integration with kernelci,
> existing test definitions) that labgrid doesn't have (and likely never
> will).
>
> On the other hand, LAVA (currently) can't easily do other things I use
> daily with labgrid (provisioning boards from scratch, controlling GPIOs
> for different scenarios, interactive sessions).

LAVA might be able to do some of this. For example, we already have
provisioning from scratch for the Beagle X15. This requires controlling
the board's boot mode with relays and loading U-Boot via xmodem. One
thing I'm not going to commit to is interactive sessions; I think I
explained my reasoning already. However, if you're sure that there is
no way to break the board (as described below), we have a poor man's
version of a hacking session. It requires granting people access to
dispatchers, but it's possible. In this case the board goes to
'maintenance' in LAVA, so no automated jobs will be scheduled on it.
After the hacking session is complete, the board can be returned to
LAVA, but this also requires user action. So if someone forgets, the
board will be unavailable from LAVA's perspective.
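
Roughly, the flow for such a session looks like this (an untested
sketch using LAVA's XML-RPC interface; the server URL and device name
are made up, and the exact method names/arguments are from memory, so
treat them as assumptions and check the API docs on your server):

import xmlrpc.client

# Hypothetical server and device name; authentication token omitted
# (it would normally be embedded in the URL).
SERVER = "https://lava.example.com/RPC2"
DEVICE = "beaglex15-01"

# allow_none because we pass None for the fields we don't change.
proxy = xmlrpc.client.ServerProxy(SERVER, allow_none=True)

# Take the board out of automated scheduling; no new jobs will be
# started on it while it is in maintenance.
proxy.scheduler.devices.update(DEVICE, None, None, None, None, "MAINTENANCE")

input("Hacking session in progress, press Enter when the board is clean...")

# Hand the board back. If this step is forgotten, the board stays
# unavailable from LAVA's point of view.
proxy.scheduler.devices.update(DEVICE, None, None, None, None, "UNKNOWN")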

>
> So my (perhaps unrealistic) hope is to have both share the same boards.
>
> Perhaps Tim can say more about his use-case.
>
> > > > > The minimal interface could be a blocking 'reserve' verb. Would
> > > > > potentially long waiting times be acceptable for the test frameworks?
> > > >
> > > > I don't think it's a good idea. For example, returning test results for
> > > > stable Linux RCs is very time sensitive. If the boards are 'reserved'
> > > > by some other users, LKFT can't do its job. So multiple schedulers
> > > > running in the same lab are a pretty bad idea.
> > >
> > > Yes, for some labs, this won't work well. But for others like our
> > > internal lab, where we often share the single prototype between
> > > developers and CI, it works fine. (Jenkins just waits until the
> > > developers go home)
> >
> > This depends on how the hardware works. Some boards are pretty hard to
> > bring back to an 'initial good state' without manually pressing buttons
> > or changing DIP switches. This makes sharing between manual and
> > automated use cases very hard.
>
> Ah. See below.
>
> > > > > A more complex solution would be to have only one shared scheduler per
> > > > > lab, which would need to be accessible via the board management API and
> > > > > part of that layer. How to support that in Lava or fuego/Jenkins
> > > > > doesn't seem obvious to me.
> > > >
> > > > LAVA has its own scheduler. As I wrote above, I can imagine a common
> > > > API between the scheduler and the executor, but not sharing boards
> > > > between different schedulers via board management. In that scenario,
> > > > board management becomes the 'master scheduler'.
> > >
> > > Yes, that was the point I was trying to make. There can only be one
> > > scheduler in a lab. So the question boils down to whether it's
> > > reasonable to have e.g. LAVA's scheduler replaced (or controlled) by a
> > > 'master scheduler'...
> >
> > I don't think the LAVA scheduler would be easy to replace. We support
> > some exotic use cases like multinode.
>
> Agreed, the LAVA scheduler is complex as it has to cover many different
> use-cases. I just think without some coordination, sharing a lab won't
> really work.
>
> > > > > So if we could find a way to have a common scheduler and control
> > > > > power+console via subprocess calls, shared labs would become a
> > > > > possibility. Then one could use different test frameworks on the same
> > > > > HW, even with interactive developer access, depending on what fits best
> > > > > for each individual use-case.
> > > >
> > > > I'm not a big fan of sharing boards between automated and manual use
> > > > cases. This usually leads to an increase in time spent on board
> > > > housekeeping.
> > >
> > > It's been working well for our use case, and we often have only very
> > > few prototypes which need to be utilized for development and testing.
> > >
> >
> > I understand that. It's a pretty common case.
> >
> > > Which factors cause housekeeping issues when sharing boards in your
> > > experience?
> >
> > It's usually the 'human factor'. People hacking on the boards leave
> > them in a state that is not suitable for LAVA. For example, changing a
> > bootloader is likely to break LAVA jobs. The same goes for changing the
> > state of relays or managed hubs. Leaving a shared drive full is also an
> > issue. As I mentioned before, some of these problems are not easy to
> > recover from, and for some boards manual intervention is required. So
> > from my pov, sharing hardware between automated testing and hacking
> > sessions is not an option. It can be done, but in my case I don't think
> > I have a big enough reason to try it.
>
> Thanks, now I see why that would cause problems in your lab.
>
> As we often do bootloader development, almost all of our boards are
> set up in a way that we can start (or flash) a completely new system,
> including the bootloader. Also, any external storage, power relays or
> GPIOs are always controlled by the farm.
>
> And this is also used by the automated tests. So there's not really
> anything you can break on a board for the next user (except maybe
> burning fuses).

If there is a board I can buy that can do what you describe and is
supported by labgrid, I'd like to run an experiment and use labgrid as
the BM layer for the LAVA dispatcher. That might work.
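
Very roughly, I'd expect the dispatcher to delegate reservation, power
and console to labgrid along these lines (an untested sketch; the place
name and the labgrid-client sub-commands are assumptions based on the
docs, not on a working setup):

import subprocess

PLACE = "beaglex15-01"  # hypothetical labgrid place exported by the lab

def labgrid(*args):
    # Run a labgrid-client command against the shared coordinator.
    subprocess.run(["labgrid-client", "-p", PLACE, *args], check=True)

labgrid("acquire")                # reserve the board exclusively
try:
    labgrid("power", "cycle")     # power handled by labgrid, not LAVA
    # ... the dispatcher would attach to the serial console here
    # (e.g. 'labgrid-client -p <place> console') and run its actions ...
finally:
    labgrid("release")            # always hand the board back

The scheduling question from earlier in the thread stays open of
course; this would only cover the reservation/power/console side.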

Best Regards,
milosz

>
> Regards,
> Jan
> --
> Pengutronix e.K.                           |                             |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
>

