[yocto-ab] Project Status: The good, the bad and the worrying

Wed Feb 13 08:00:25 PST 2019

Richard,

On 2/12/19 9:32 AM, Richard Purdie wrote:
> I wanted to give the advisory board an update on where the project
> stands today. The project has faced a lot of upheaval with changes in
> personnel, processes, budget and structure over the last year or so.
> I'd like to share my thoughts on this from the technical perspective of
> the project. There are things that are going well, areas where we have
> issues and areas we may need to worry.
>
> We're due to meet in March and the intent here is to give people time
> to think about where we're at, allowing us to discuss the ways forward
> at Open Source Leadership Summit.
>
> I've gone into details here so it's a longer email. I do hope people
> find this interesting/useful and I've not got too detailed.

Thanks for sending out this update.
See comments below.

> The Good
> ========
>
> The project adoption seems to be growing, I continue to be surprised
> and pleased at the interesting places we've seen the project being
> used. Being used in satellites passing Mars was cool!
>
> At the technical level we are holding our heads above water. We are
> making progress on a few new interesting features:
>
> * sstate hash equivalency (basic implementation)
> * autobuilder now using latest upstream buildbot
>   (https://autobuilder.yoctoproject.org/typhoon)
> * build performance measurements using the autobuilder
> * graphical reporting of build performance comparisons
> * automated testing mingw SDK using wine
> * automated running and collection of ptest results
> * server consolidation into one data centre/set of racks
> * upgrade several old/problematic components to modern versions:
>   - python 3.7
>   - perl 5.28 using perl-cross to build
> * Major toolchain bootstrap process cleanup + performance speedup
> * The layer index and recipe reporting systems merged and are now 
>   updated to the new codebase with many new enhancements and features
>
> We should be on track during 2.7 to:
>
> * convert qemuarm to a modern emulation target
> * automate ptest on arm using KVM
> * test builds on arm servers
> * removal of testopia replacing it with results comparison and 
>   reporting code leading to automated QA runs
> * allow GL passthrough in qemu
>
> The distro is fairly up to date in that we're using modern upstream
> versions of much of our software and where we have differences, they're
> comparatively minor.
>
> The autobuilder work through the consolidation, new targets, new tools
> and so on is letting us run tests more easily and faster, with wider
> coverage.
>
> By numbers, our bug tracking record doesn't look too bad either:
>
> https://wiki.yoctoproject.org/charts/combo.html
>
> Overall open numbers are dropping, perhaps due to fewer features
> getting filed and worked on. The Weighted Defect Density (WDD), our
> long tracked metric shows a decline. The first chart does show the
> number of medium+ bugs slowly rising; see below for an alternate
> perspective.
>
> The Bad and the Worrying
> ========================
>
> The above sounds great; sadly, it's not all good news.
>
> In particular, the maintainers model for recipes in OE-Core is
> faltering. We had a push for this, it was set up and started but isn't
> proving to be quite as dynamic/responsive as we'd hoped. We're
> struggling to get responses from the maintainers, mainly due to the
> time pressures placed upon them through other distractions. The
> "Automatic Upgrade Helper" can send out 'perfect' patches yet the
> maintainers don't review and submit them. If they don't have time for
> that there is little hope of a complex upgrade.
>
> We had hoped to roll OE-Core best practises out to other layers but
> when the model isn't entirely working for OE-Core, this doesn't make
> sense.
>
> Stable branch maintainership seems to be a lone battle for Armin right
> now. The wider community is also struggling for layer maintainers (e.g.
> meta-openembedded).
>
> We are also struggling to have anyone attend bug triage meetings, or to
> take on and fix bugs. I'm not sure how to encourage and reward people
> for doing this. A significant chunk of this work is being done by
> Wind River but we need to find a way to scale and involve 'new blood'.
> The metrics show improvements in bug trends, probably as we have fewer
> features in the pipeline and we have triaged and closed a lot of old
> bugs but the rising medium+ bugs probably reflects a real underlying
> problem.
>
> A good case study is ptests. We now have much better insight into the
> numbers of tests passing, failing and timing out. I could file a ton of
> bugzilla entries about the things we need to fix which these have
> highlighted, there are real problems in there. The trouble is there is
> nobody to work on this. I can't even get anyone to help improve the
> overall reporting of the ptest results.
>
> In general our work on features does feel "stalled". I'd personally
> love to spend time on the sstate hash equivalency work but I can't as
> there are too many other urgent issues (including QA) that need
> attention first.
>
> Areas that aren't getting attention:
> * Build performance improvements
> * Reproducibility
> * Security
> * Project process documentation (e.g. LF Core Infrastructure
>   Initiative (CII) Best Practice badging)
> * Process tooling improvements (to make maintenance easier)
> * Writing new tests
> * Automating existing manual tests/BSP testing
> * No license handling improvements despite our potential
>
> We do still have an infrastructure work backlog (mailing list
> migration, bugzilla upgrade, website backend concerns) and we're
> starting to see autobuilder failing hardware which is in part due to
> the continually deferred autobuilder refresh.
>
> Also, many members' layers do not have Yocto Project Compatible status.
> Meta-Openembedded, which many layers depend upon, has issues, as do, for
> example, meta-arago and parts of Mentor's layers, with engineers citing
> lack of time as the reason.

It is also a black hole process. Not sure if this is due to the pending
TLC creation or not.

>
> The SWAT process/team which looks after autobuilder failures is
> basically in freefall at the moment. There are some team members who do
> reply to the weekly rotation to say they know about it, then don't look
> at the autobuilder *at all* until it rotates away from them.
My apologies. I have been slacking these past few months.

>
> A Personal Perspective
> ======================
>
> Day to day I'm trying to do what needs to be done, hopefully to improve
> things, strengthen our position and let the project evolve and grow. I
> can't do everything at once so I'm having to choose where to focus, QA
> being the main target right now as quality is something the project is
> known for and we need to maintain it.
>
> In some ways the project faces a choice. If the members involved decide
> it's "good enough", or that somebody else will contribute, it will
> naturally move to a maintenance mode. That may be what people want, I
> do worry that we've not yet reached the point where there is enough
> automated tooling to let many of the extra layers survive though. To be
> as close as we are yet let that fail would be sad from my perspective.
>
>
>
> I'll stop here and let people consider these issues. There is much to
> be positive about but also a few things that could keep me awake at
> night and I'm hoping we can have some good discussion about these
> issues, either here/now via email, offlist, or in person in March.
>
> As ever, if anyone has any questions about this, the project in general
> or anything else, please let me know
>
> Cheers,
>
> Richard
>
>
>