[Automated-testing] A common place for CI results?

Veronika Kabatova vkabatov at redhat.com
Mon May 20 08:32:09 PDT 2019



----- Original Message -----
> From: "Tom Gall" <tom.gall at linaro.org>
> To: kernelci at groups.io, "Dan Rue" <dan.rue at linaro.org>
> Cc: "Tim Bird" <Tim.Bird at sony.com>, vkabatov at redhat.com, automated-testing at yoctoproject.org, info at kernelci.org
> Sent: Wednesday, May 15, 2019 11:06:33 PM
> Subject: Re: A common place for CI results?
> 
> 
> 
> > On May 15, 2019, at 3:33 PM, Dan Rue <dan.rue at linaro.org> wrote:
> > 
> > On Tue, May 14, 2019 at 11:01:35PM +0000, Tim.Bird at sony.com wrote:
> >> 
> >> 
> >>> -----Original Message-----
> >>> From: Veronika Kabatova
> >>> 
> >>> Hi,
> >>> 
> >>> as we know from this list, there are plenty of CI systems doing some
> >>> testing on the upstream kernels (and maybe some others we don't know
> >>> about).
> >>> 
> >>> It would be great if there was a single common place where all the CI
> >>> systems can put their results. This would make it much easier for the
> >>> kernel maintainers and developers to see testing status, since they
> >>> would only need to check one place instead of keeping a list of
> >>> sites/mailing lists where each CI posts its contributions.
> >>> 
> >>> 
> >>> A few weeks ago, some of us were talking about kernelci.org being well
> >>> placed to act as the central upstream kernel CI piece that most
> >>> maintainers already know about. So I'm wondering if kernelci could also
> >>> act as an aggregator of all results? There's already an API for
> >>> publishing a report [0], so it shouldn't be too hard to adjust it to
> >>> handle and show more information. I also found the beta version for
> >>> test results [1], so most of the needed functionality actually seems to
> >>> be there already. Since there will be multiple CI systems, the source
> >>> and a contact point for the contributor (so maintainers know whom to
> >>> ask about results if needed) would likely be the only missing essential
> >>> data point.
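
Just to illustrate how small that extra data point could be: the sketch below
pushes one result with explicit "origin" and "contact" fields. The endpoint,
token and field names are made up for illustration only, they are not the
existing kernelci API from [0]/[1].

    import requests

    # Hypothetical result record; "origin" and "contact" are the extra fields
    # that tell maintainers which CI system produced the result and whom to
    # ask about it.
    result = {
        "origin": "cki",
        "contact": "someone@example.com",     # placeholder contact address
        "kernel": "v5.2-rc1",
        "commit": "123456789abc",             # placeholder sha
        "arch": "x86_64",
        "test": "ltp/syscalls",
        "status": "PASS",
        "log_url": "https://example.com/logs/1234",
    }

    # Placeholder endpoint and token -- not a real service.
    resp = requests.post("https://results.example.org/api/v1/results",
                         json=result,
                         headers={"Authorization": "token SECRET"},
                         timeout=30)
    resp.raise_for_status()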
> >>> 
> >>> 
> >>> The common place for results would also make it easier for new CI
> >>> systems to get involved with upstream. There are likely other companies
> >>> out there running some tests on the kernel internally but not
> >>> publishing the results anywhere. Only adding some API calls into their
> >>> code (with the data they are allowed to publish) would make it very
> >>> simple for them to start contributing. If we want to make them
> >>> interested, the starting point needs to be trivial. Different companies
> >>> have different setups and policies, and they might not be able to
> >>> fulfill arbitrary requirements, so they opt not to get involved at all,
> >>> which is a shame because their results could be useful. After the
> >>> initial "onboarding" step they might be willing to contribute more and
> >>> more too.
> >>> 
> >>> 
> >>> Please let me know if the idea makes sense or if something similar is
> >>> already planned. I'd be happy to contribute to the effort because I
> >>> believe it would make everyone's life easier and we'd all benefit from
> >>> it (and maybe someone else from my team would be willing to help out
> >>> too if needed).
> >> 
> >> I never responded to this,
> > 
> > yea, you did. ;)
> > 
> >> but this sounds like a really good idea to me. I don't care much which
> >> backend we aggregate to, but it would be good, as a community, to
> >> settle on one service to start with. It would help to find issues with
> >> the API, or the results schema, if multiple people started using it.
> >> 
> >> I know that people using Fuego are sending data to their own instances
> >> of KernelCI.  But I don't know what the issues are for sending this
> >> data to a shared KernelCI service.
> >> 
> >> I would be interested in hooking up my lab to send Fuego results to
> >> KernelCI.  This would be a good exercise.  I'm not sure what the next
> >> steps would be, but maybe we could discuss this on the next automated
> >> testing conference call.
> > 
> > OK here's my idea.
> > 
> > I don't personally think kernelci (or LKFT) are set up to aggregate
> > results currently. We have too many assumptions about where tests are
> > coming from, how things are built, etc. In other words, dealing with
> > noisy data is going to be non-trivial in any existing project.
> 
> I completely agree.
> 

This is a good point. I'm totally fine with having a separate independent
place for aggregation.

> > I would propose aggregating data into something like google's BigQuery.
> > This has a few benefits:
> > - Non-opinionated place to hold structured data
> > - Allows many downstream use-cases
> > - Managed hosting, and data is publicly available
> > - Storage is sponsored by google as a part of
> >  https://cloud.google.com/bigquery/public-data/
> > - First 1TB of query per 'project' is free, and users pay for more
> >  queries than that
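
To make the proposal above a bit more concrete: a minimal sketch of loading
canonical records into such a shared dataset with the Python
google-cloud-bigquery client could look roughly like this. The
project/dataset/table name and the record fields are placeholders, nothing
like this exists yet.

    from google.cloud import bigquery

    # Placeholder table -- the shared public dataset doesn't exist (yet).
    TABLE = "kernel-results.upstream.test_results"

    rows = [
        {
            "origin": "lkft",
            "kernel": "v5.1.3",
            "arch": "arm64",
            "test": "kselftest/net",
            "status": "FAIL",
            "started_at": "2019-05-20T12:00:00Z",
        },
    ]

    client = bigquery.Client()
    # Streaming insert; returns a list of per-row errors (empty on success).
    errors = client.insert_rows_json(TABLE, rows)
    if errors:
        raise RuntimeError("BigQuery insert failed: %s" % errors)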
> 
> I very much like this idea. I do lots of Android kernel testing
> and being able to work with / compare / contribute to what
> is essentially a pile of data in BQ would be great. As an
> end user working with the data I’d also have lots of dashboard
> options to customize and share queries with others.
> 
> > With storage taken care of, how do we get the data in?
> 
> > First, we'll need some canonical data structure defined. I would
> > approach defining the canonical structure in conjunction with the first
> > few projects that are interested in contributing their results. Each
> > project will have an ETL pipeline which will extract the test results
> > from a given project (such as kernelci, lkft, etc), translate it into
> > the canonical data structure, and load it into the google bigquery
> > dataset at a regular interval or in real-time. The translation layer is
> > where things like test names are handled.
> 

+1, exactly how I imagined this part.
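
For what it's worth, the translation layer I'm picturing is little more than
a per-project mapping function, roughly like the sketch below. The canonical
field names and the CKI-style input fields are only a strawman, nothing is
agreed on yet:

    # Strawman canonical record -- the field names would be agreed on with
    # the first few contributing projects.
    def translate_cki_result(raw):
        """Map one CKI-style result dict onto the canonical structure."""
        return {
            "origin": "cki",
            "contact": "someone@example.com",       # placeholder
            "kernel": raw["kernel_version"],        # hypothetical source fields
            "arch": raw["architecture"],
            "test": normalize_test_name(raw["test_name"]),
            "status": {"P": "PASS", "F": "FAIL"}.get(raw["result"], "SKIP"),
            "started_at": raw["start_time"],
            "log_url": raw.get("log"),
        }

    def normalize_test_name(name):
        # This is where per-project test names get mapped to common ones,
        # e.g. "LTP lite" and "ltp_lite" both ending up as "ltp-lite".
        return name.strip().lower().replace(" ", "-").replace("_", "-")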

> Exactly. I would hope that the various projects that are producing
> data would be motivated to plug in. After all, it makes the data
> they are producing more useful and available to a larger group
> of people.
> 
> > The things this leaves me wanting are:
> > - raw data storage. It would be nice if raw data were stored somewhere
> >  permanent in some intermediary place so that later implementations
> >  could happen, and for data that doesn't fit into whatever structure we
> >  end up with.
> 
> I agree.

+1
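
On the raw data point: one cheap way to keep the door open for later
re-processing could be to dump every incoming submission untouched into
object storage before it gets translated, something like this (the bucket
and object names are made up):

    import json
    from google.cloud import storage

    def archive_raw(payload, origin, job_id):
        """Keep the untranslated submission so the ETL can be re-run later."""
        client = storage.Client()
        bucket = client.bucket("kernel-results-raw")          # placeholder bucket
        blob = bucket.blob("%s/%s.json" % (origin, job_id))   # e.g. "cki/1234.json"
        blob.upload_from_string(json.dumps(payload),
                                content_type="application/json")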

> 
> > - time, to actually try it and find the gaps. This is just an idea I've
> >  been thinking about. Anyone with experience here that can help flesh
> >  this out?
> 
> I’m willing to lend a hand.
> 

Thanks for starting up a specific proposal! I agree with everything that was
brought up. I'll try to find time to participate in the implementation part
too (although my experience with data storage is limited, I should be able
to help out with the structure prototyping and maybe other parts too).


Thanks again,

Veronika
CKI Project

> > Dan
> > 
> > --
> > Linaro - Kernel Validation
> 
> Tom
> 
> Director, Linaro Consumer Group
> 
> 

