[yocto-ab] Project Status: The good, the bad and the worrying

Tue Feb 12 09:32:35 PST 2019

I wanted to give the advisory board an update on where the project
stands today. The project has faced a lot of upheaval with changes in
personnel, processes, budget and structure over the last year or so.
I'd like to share my thoughts on this from the technical perspective of
the project. There are things that are going well, areas where we have
issues and areas we may need to worry about.

We're due to meet in March and the intent here is to give people time
to think about where we're at, allowing us to discuss the ways forward
at Open Source Leadership Summit.

I've gone into details here so it's a longer email. I do hope people
find this interesting/useful and I've not got too detailed.

The Good
========

Project adoption seems to be growing. I continue to be surprised
and pleased at the interesting places we've seen the project being
used. Being used in satellites passing Mars was cool!

At the technical level we are holding our heads above water. We are
making progress on a few new interesting features:

* sstate hash equivalency (basic implementation)
* autobuilder now using latest upstream buildbot
  (https://autobuilder.yoctoproject.org/typhoon)
* build performance measurements using the autobuilder
* graphical reporting of build performance comparisons
* automated testing mingw SDK using wine
* automated running and collection of ptest results
* server consolidation into one data centre/set of racks
* upgrade several old/problematic components to modern versions:
  - python 3.7
  - perl 5.28 using perl-cross to build
* Major toolchain bootstrap process cleanup + performance speedup
* The layer index and recipe reporting systems merged and are now
  updated to the new codebase which has many new enhancements and
  features

We should be on track during 2.7 to:

* convert qemuarm to a modern emulation target
* automate ptest on arm using KVM
* test builds on arm servers
* remove testopia, replacing it with results comparison and
  reporting code leading to automated QA runs
* allow GL passthrough in qemu

The distro is fairly up to date in that we're using modern upstream
versions of much of our software and where we have differences, they're
comparatively minor.

The autobuilder work through the consolidation, new targets, new tools
and so on is letting us run tests more easily and faster, with wider
coverage.

By numbers, our bug tracking record doesn't look too bad either:

https://wiki.yoctoproject.org/charts/combo.html

Overall open numbers are dropping, perhaps due to fewer features
getting filed and worked on. The Weighted Defect Density (WDD), our
long-tracked metric, shows a decline. The first chart does show the
number of medium+ bugs slowly rising; see below for an alternate
perspective.

The Bad and the Worrying
========================

The above sounds great, but sadly it's not all good news.

In particular, the maintainers model for recipes in OE-Core is
faltering. We had a push for this, it was set up and started but isn't
proving to be quite as dynamic/responsive as we'd hoped. We're
struggling to get responses from the maintainers, mainly due to the
time pressures placed upon them through other distractions. The
"Automatic Upgrade Helper" can send out 'perfect' patches yet the
maintainers don't review and submit them. If they don't have time for
that there is little hope of a complex upgrade.

We had hoped to roll OE-Core best practices out to other layers but
when the model isn't entirely working for OE-Core, this doesn't make
sense.

Stable branch maintainership seems to be a lone battle for Armin right
now. The wider community is also struggling for layer maintainers (e.g.
meta-openembedded).

We are also struggling to have anyone attend bug triage meetings, or to
take on and fix bugs. I'm not sure how to encourage and reward people
for doing this. A significant chunk of this work is being done by
WindRiver but we need to find a way to scale and involve 'new blood'.
The metrics show improvements in bug trends, probably as we have fewer
features in the pipeline and we have triaged and closed a lot of old
bugs, but the rising number of medium+ bugs probably reflects a real
underlying problem.

A good case study is ptests. We now have much better insight into the
numbers of tests passing, failing and timing out. I could file a ton of
bugzilla entries about the things we need to fix which these have
highlighted; there are real problems in there. The trouble is there is
nobody to work on this. I can't even get anyone to help improve the
overall reporting of the ptest results.

In general our work on features does feel "stalled". I'd personally
love to spend time on the sstate hash equivalency work but I can't as
there are too many other urgent issues (including QA) that need
attention first.

Areas that aren't getting attention:
* Build performance improvements
* Reproducibility
* Security
* Project process documentation (e.g. LF Core Infrastructure
  Initiative (CII) Best Practices badging)
* Process tooling improvements (to make maintenance easier)
* Writing new tests
* Automating existing manual tests/BSP testing
* License handling improvements, despite our potential

We do still have an infrastructure work backlog (mailing list
migration, bugzilla upgrade, website backend concerns) and we're
starting to see autobuilder hardware failing, which is in part due to
the continually deferred autobuilder refresh.

Also, many members' layers do not have Yocto Project Compatible Status.
Meta-Openembedded, which many layers depend upon, has issues, as do for
example meta-arago and parts of Mentor's layers, with engineers citing
lack of time as the reason.

The SWAT process/team which looks after autobuilder failures is
basically in freefall at the moment. There are some team members who do
reply to the weekly rotation to say they know about it, then don't look
at the autobuilder *at all* until it rotates away from them.

A Personal Perspective
======================

Day to day I'm trying to do what needs to be done, hopefully to improve
things, strengthen our position and let the project evolve and grow. I
can't do everything at once so I'm having to choose where to focus, QA
being the main target right now as quality is something the project is
known for and we need to maintain it.

In some ways the project faces a choice. If the members involved decide
it's "good enough", or that somebody else will contribute, it will
naturally move to a maintenance mode. That may be what people want, but
I do worry that we've not yet reached the point where there is enough
automated tooling to let many of the extra layers survive though. To be
as close as we are yet let that fail would be sad from my perspective.

I'll stop here and let people consider these issues. There is much to
be positive about but also a few things that could keep me awake at
night and I'm hoping we can have some good discussion about these
issues, either here/now via email, offlist, or in person in March.

As ever, if anyone has any questions about this, the project in general
or anything else, please let me know.

Cheers,

Richard