[Automated-testing] Automatic test result interpretation

Richard Palethorpe rpalethorpe at suse.de
Mon Aug 5 07:05:14 PDT 2019


Hello,

Dan Rue <dan.rue at linaro.org> writes:

> On Thu, Jul 18, 2019 at 03:05:51PM +0200, Richard Palethorpe wrote:
>> Hello,
>>
>> This mainly relates to tagging failing tests with bug references (say a
>> Bugzilla ID or a mailing list item) and propagating those throughout the
>> community, regardless of the bug tracker, test suite, test runner or
>> test result database being used.
>>
>> It also covers automatically detecting anomalies in test results,
>> regardless of the test runner used and with or without nicely formatted
>> metadata, as well as other similar problems.
>>
>> I have, roughly speaking, created a data-analysis framework which can
>> help to solve these problems and used it to create a few reports/scripts
>> which handle "bug tag propagation" and some form of anomaly detection.
>>
>> It is kind of difficult to explain this in writing, so please see the
>> following video:
>>
>> https://youtu.be/Nzha4itchg8
>> https://palethorpe.gitlab.io/jdp/reports/ (link to the reports mentioned)
>>
>> I designed/evolved the framework to be able to accept whatever input is
>> available from many disparate sources, sidestepping the issue of what
>> test result format or test metadata format to use, although it still
>> requires effort to integrate a new format or accept results from a new
>> test runner.
>>
>> There is more information available here:
>> https://palethorpe.gitlab.io/jdp/
>>
>> I must stress that the methods we are using right now are quite simple,
>> but we could easily incorporate something like MLJ[1]. I hypothesize
>> that this would allow us to automate the vast majority of test result
>> review, as most test failures contain some common pattern which can be
>> used to identify them as a known failure. This might even be achievable
>> purely by using some DSL[2] to specify heuristics.
>>
>> However we first need enough data from enough sources to make such
>> efforts worthwhile. Currently it mostly works for our internal uses in
>> the SUSE kernel QA team, but we have not had much serious interest from
>> outside our team, so I will put this to one side and work on something
>> else for a while to avoid tunnel vision. If you are interested, please
>> let me know, so that I can justify the time to make it more accessible.
>>
>> [1] https://julialang.org/blog/2019/05/beyond-ml-pipelines-with-mlj
>> [2] https://palethorpe.gitlab.io/jdp/bug-tagging/#Advanced-Tagging-(Not-implemented-yet)-1
>
> Hi Richard - thank you so much for this. Please don't treat my delay in
> response as a lack of interest, I'm just getting to this corner of my
> inbox now finally. I have some questions, and my notes are below which
> may be useful for others.

Thanks for taking the time.

>
> What does JDP stand for?

I don't remember.

>
> You mentioned a distributed data cache and a local cache. Tell me more
> :) What is it and how does it work? Sounds like this is redis but won't
> it be expensive to recreate it from scratch?

Hopefully not. If you try to run JDP and there is no Redis instance on
your local machine, it will create one. Redis compresses the data fairly
well, so replication from the central data cache is quite quick.

Redis replication is quite simple, however, so I am not sure how well it
will scale, but I have tried to decouple most of JDP from Redis, so
another key-value store which supports partial replication could be used
instead. Alternatively we could move old data to an archive database
which is not replicated by default.
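
To make the local cache behaviour concrete, here is a minimal sketch,
assuming the Redis.jl client and a redis-server binary on the PATH; the
central host name is a placeholder and this is not JDP's actual cache
code:

    using Redis  # assumed client library

    # Connect to a local Redis instance, starting one as a replica of
    # the central data cache if nothing is listening yet.
    function ensure_local_cache(; central_host="cache.example.suse.de",
                                  central_port=6379)
        try
            conn = RedisConnection()  # defaults to localhost:6379
            ping(conn)
            return conn
        catch
            # No local instance: start one replicating the central cache.
            run(`redis-server --port 6379 --replicaof $central_host $central_port`;
                wait=false)
            sleep(2)  # crude wait for the server to come up
            return RedisConnection()
        end
    end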

> (Does openqa have a bulk
> export ability? I know for SQUAD (LKFT's result tracker), we have to run
> millions of REST queries to get everything).

Nope. I check for new test IDs and refresh tests where the status or
comments might have changed, then do a REST query for each 'job', which
may contain many test results. OpenQA does have RabbitMQ integration
(push notifications) which we could monitor to avoid much of the
refresh.
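
Roughly, the per-job fetch looks like the sketch below, assuming HTTP.jl
and JSON.jl; the endpoint path is openQA's jobs API as I remember it and
the base URL is a placeholder, so treat this as illustrative rather than
JDP's actual tracker code:

    using HTTP, JSON  # assumed packages; JDP wraps this in its trackers

    # One REST request per job; each job payload carries many test
    # module results.
    function fetch_job_details(base::String, job_id::Integer)
        resp = HTTP.get("$base/api/v1/jobs/$job_id/details")
        return JSON.parse(String(resp.body))
    end

    # e.g. job = fetch_job_details("https://openqa.opensuse.org", 123456)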

It would be better if OpenQA submitted all its job results to a central
service just for collecting and distributing test results efficiently,
but that is another matter.

>
> Your usage of jupyter notebooks is interesting. I've played with them and
> wondered if they were worth investing our time in. What problems do they solve
> for you and how have they helped your team?

They are good for making graphical reports quickly and interactively. I
think I am the only person using them interactively though; the rest of
the team just looks at the static reports JDP generates with them. In
fact I think most of them only pay attention to the notifications over
Rocket Chat and the propagated bug tags.

So they solve the problem of allowing 'expert users' to jump into a
graphical report and start modifying it. It's just not clear if that was
a problem that needed solving, although I personally get a tight
feedback loop when developing reports this way, thanks to Jupyter.
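
For what it is worth, turning those notebooks into static reports can be
as simple as executing them headless; something along these lines (the
paths are placeholders and this is only a sketch of the idea, not JDP's
report runner):

    # Execute each report notebook and publish the rendered HTML.
    for nb in readdir("reports")
        endswith(nb, ".ipynb") || continue
        path = joinpath("reports", nb)
        run(`jupyter nbconvert --to html --execute $path --output-dir public/reports`)
    end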

>
> Why Julia? What's your and your colleagues' experience with it? What
> does the learning curve look like? We're mostly a python team and I
> hesitate to introduce new languages without enough advantage to cover
> the overhead. I guess the provocative way to ask is, why should a python
> shop consider julia?

Python would have been the default choice for this project given its
domain, but I am very unhappy with its type system (or lack thereof),
its performance and the style of OOP it allows (e.g. long inheritance
chains and tightly coupling data with code, although I could avoid that
in my own code, the same as I do in Perl).

I could probably solve some of the performance issues by writing parts of
JDP in C or a specialised variant of Python. However, after being exposed
to Rust and Haskell, I wouldn't start a new project in a language which
does not have a strong type system (except for C maybe, but that is
another matter), because strong types catch lots of bugs and document the
code. Julia code, where the types are correctly inferred or forced, is
almost as fast as C.

Also, Julia is very good at manipulating Julia code in a structured way
(like Lisp). AFAIK this is not the case in Python, where you would
manipulate the code as text or use a spinoff of Python (I investigated a
number of weird Python variants, but I just didn't get a good feeling
from them).
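
To illustrate what 'structured' means here (a toy macro, nothing JDP
itself relies on): Julia code is available to the running program as
data, so it can be inspected and rewritten before it executes.

    # A macro receives the unevaluated expression and can transform it.
    macro logged(ex)
        msg = string(ex)  # the expression exactly as the caller wrote it
        quote
            @info "running" expr = $msg
            $(esc(ex))
        end
    end

    @logged sum(abs2, randn(100))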

Advanced meta-programming is irrelevant for JDP's code itself, but it is
really essential for things like the following, which we might decide to
use in a JDP-based report or workflow:
https://julialang.org/blog/2018/12/ml-language-compiler

The learning curve is pretty shallow initially, because you can ignore
the type system and 'multi-methods' and just pretend you are writing
Python, but with different syntax. Then, as soon as someone asks "how do
I make a class?", you have some issues. Still, it largely depends on
whether you have been exposed to something like Scala, Rust or Haskell.
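
The short answer to the class question is "you don't": data lives in
structs and behaviour in generic functions dispatched on types. A
minimal sketch (the type and function names are made up for
illustration, not JDP's actual types):

    abstract type Tracker end

    struct OpenQA <: Tracker
        host::String
    end

    struct Bugzilla <: Tracker
        host::String
    end

    # Multiple dispatch ("multi-methods"): one generic function, one
    # method per concrete type, no inheritance chain.
    describe(t::OpenQA) = "openQA instance at $(t.host)"
    describe(t::Bugzilla) = "Bugzilla instance at $(t.host)"

    describe(OpenQA("openqa.opensuse.org"))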

I could go into a lot more detail, but I think this is a dangerous time
sink. There is plenty of good Julia propaganda on their website ;-)

>
> "Edit on Github" links don't work on https://palethorpe.gitlab.io/jdp/

Yeah, I should remove that.

>
> I watched the video on youtube (thank you!!). I'm curious how JDP fits into
> the overall workflow in SUSE. For example, how many users are using JDP
> and the jupyter notebooks directly, vs receiving reports from it? How
> does reporting automation work? I saw gitlab ci mentioned. Just
> wondering what sort of traction you're seeing and what sorts of
> friction.

Well, it seems that I have solved all the "real" (and some imaginary)
internal reporting problems for our team sufficiently well that nobody
has any interest in developing more JDP reports (although there is
interest in developing bug tag propagation further). So it is only me
using it directly, but I guess if I refuse to update propagation for
long enough then someone else will do it.
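
For anyone outside our team, the core of bug tag propagation boils down
to something like the following sketch (the suite/test names and tag are
made up and JDP's real matching rules are richer):

    # Tags that reviewers attached to failures seen in earlier runs.
    known_tags = Dict(("fs", "xfstests-generic-263") => ["bsc#1234567"])

    # Failures observed in the latest run, as (suite, test) pairs.
    latest_failures = [("fs", "xfstests-generic-263"), ("mm", "oom02")]

    # Propagation: carry a tag forward whenever the same failure
    # reappears, so only genuinely new failures need human review.
    propagated = Dict(f => known_tags[f]
                      for f in latest_failures if haskey(known_tags, f))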

It is only really the kernel QA team which uses it, partially because I
haven't presented it to many people yet, but also because it is not
clear to most people how they would reuse it or even what it does. The
main feedback I get is that people have no idea what they are looking
at, and that I need to focus more on the concrete problems it solves.

JDP is run inside a bunch of Docker containers which are managed by
GitLab CI. Technically any CI/CD software could be used to run it. JDP
is automatically deployed after each commit if the tests pass. The
reports are run on a regular schedule by GitLab and deployed, as part of
the JDP docs, to GitLab and GitHub Pages. Some of the reports talk to
Rocket Chat and OpenQA.

>
> A lot of these concepts are new to me and I'm trying to figure out how
> they could fit into the daily work of my team.
>
> Thanks,
> Dan
>
> Other notes:
>
> JDP
> - project from suse
> - https://github.com/richiejp/jdp
> - uses julia, jupyter notebooks
>   - jupyter - list of cells: code, markdown
>     - can run code cells
>     - result of last command of cell will be displayed
> - suse uses openqa for testing (suse project)
>   - monolithic framework ("bit of a beast") - runs machine or vm, sends
>     tests to them, aggregates results, etc.

Other teams use Jenkins and other stuff as well.

> - tracker support for openqa, bugzilla
>   - fetch test results from openqa
>   - could have a SQUAD tracker
>   - will be able to have normalized test result objects to abstract out
>     the result source (such as squad vs openqa)
>     - this would let us aggregate tests across various projects
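Right, the idea is that each tracker maps its native payload onto a
shared result type, so reports and tag propagation never see the
source-specific format. Very roughly (the field and key names below are
illustrative, not JDP's actual type or openQA's exact payload):

    struct TestResult
        suite::String
        name::String
        status::Symbol         # e.g. :passed or :failed
        tags::Vector{String}   # bug references attached to the result
    end

    # Each tracker (openQA, SQUAD, ...) would provide its own converter.
    from_openqa(m::Dict) =
        TestResult(m["category"], m["name"], Symbol(m["result"]), String[])
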
>
>>
>> --
>> Thank you,
>> Richard.
>> --
>> _______________________________________________
>> automated-testing mailing list
>> automated-testing at yoctoproject.org
>> https://lists.yoctoproject.org/listinfo/automated-testing


--
Thank you,
Richard.

