[Automated-testing] Modularization project - parser

Neil Williams neil.williams at linaro.org
Mon Nov 19 03:03:08 PST 2018


On Fri, 16 Nov 2018 at 23:41, <Tim.Bird at sony.com> wrote:
>
> Hey everyone,
>
> One thing that I think we do OK at, conceptually, in Fuego is
> our parser.  Our architecture allows for
> simple regular-expression-based parsers, as well as
> arbitrarily complex parsing in the form of a python program.
>
> The parser is used to transform output from a test program
> into a set of testcase results, test measurement results (for benchmarks)
> and can optionally split the output into chunks so that additional
> information (e.g. diagnostic data) can be displayed from our
> visualization layer (which is currently Jenkins), on a testcase-by-testcase
> basis.
>
> It has multiple outputs, including a json format that is usable
> with KernelCI, as well as some charting outputs suitable for
> use with a javascript plotting library (as well as HTML tables),
> and a Fuego-specific flat text file used for aggregated results.
>
> This is all currently integrated into the core of Fuego.  However,
> we have been discussing breaking it out and making a standalone
> testlog parser project.
>
> I envision something that takes a test's testlog, and the run meta-data,
> and spits out results in multiple formats (junit, xunit, kernelci json, etc.)
>
> From the survey, I noted that some systems prefer it if the tests
> instrument their output with special tokens.  I'd like to gather up
> all the different token sets, and if possible have the parser autodetect
> the type of output it's processing, so this is all seamless and requires
> little or no configuration on the part of the test framework or end user.

What is meant by output type here? The output of the test operation
itself, or the presentation of that output to the process which
creates the test results in the database?

Tokens can actually be internal details of an API which is not
designed to be exposed to test writers. Test operations should not
creep into that space by emitting the tokens themselves; as with any
internal API, future code changes could invalidate such a parser
without affecting tests which comply with the public API. The LAVA
tokens emitted by lava-test-case and other scripts in the LAVA
overlay are not part of the LAVA API for reporting results and are
not to be misused by test writers - calls from a POSIX environment
need to be made by executing lava-test-case and related scripts
directly. LAVA has a variety of different test behaviours, covering
POSIX shells and IoT monitors, and we are in the process of
supporting an interactive test action which can be used for
non-POSIX shells like bootloaders, RTOS and UEFI test operations. We
are interested in ideas for harnessing test output directly and
we've looked at this a few times.
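
To make that concrete, a minimal Python sketch of how a POSIX test
running on the DUT would report results by executing lava-test-case
directly rather than emitting tokens itself. The exact option names
(--result, --measurement, --units) are from memory and worth
checking against the current LAVA documentation:

    import subprocess

    def report_result(name, passed, measurement=None, units=None):
        # Call the lava-test-case helper from the LAVA overlay
        # directly; the tokens it emits stay an internal detail.
        cmd = ["lava-test-case", name,
               "--result", "pass" if passed else "fail"]
        if measurement is not None:
            cmd += ["--measurement", str(measurement)]
            if units:
                cmd += ["--units", units]
        subprocess.run(cmd, check=True)

    report_result("ramdisk-mount", True)
    report_result("iperf-throughput", True,
                  measurement=941.2, units="Mbits/sec")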

Common problems include:

1. Many test outputs "batch up" the output, with a summary near (but
not at) the actual failure and a traceback right at the bottom of the
output containing the really useful data on what went wrong. Often,
several different tracebacks are output in the one block. Python
unittest and pytest are common examples of this. It makes life very
difficult for test parsers because the point in the test log where
the failure is reported is nowhere near the *data* that any bug
report would need to include to enable a developer to fix the
failure. This is also a common problem with compilers - the
compilation failure does not always include relevant information
about what preceded the failed call. Sometimes the erroneous
inclusion of a previous (successful) step triggers a failure later
on, or a bug in the configuration processing invalidates a later
assumption. So the parser has essentially failed, because the
triager needs the full test log output anyway to do manual parsing.
Care is needed to manage expectations around any such automation.

2. Many test outputs do not "bookend" their output. Some will put
out a header at the top, but many do not put a unique message at the
end, so in a test log containing multiple different test blocks in
series it can be hard for the parser to know *which* output occurs
where. Often the header is not unique between different runs of the
same test with different parameters. So when a test job runs 50 test
runs, changing parameters on each run to exercise different sections
of the overall support, the parser has no way to know whether the
output is from test run 3 or test run 46. Additionally, some test
operations re-order the tests themselves during optimisation, e.g.
parallelisation. Without specialist test writer knowledge, the
parser will fail. The parameterisation of such optimisations can
also be driven by changes in the test job submission, not by
anything within the test output itself. So without direct input from
the test writer into how the results are picked out of the test
output, an automated parser would fail.

3. Test writers need to be able to write their own test operations
whose output does not comply with any known test output parser, and
to report results to the framework directly by executing a
subprocess from within a compiled test application.

4. Many test operations do not occur on the DUT but remotely,
through protocols like adb, and the "test output" is completely
irrelevant to the test result: it is actually the output of pushing
and pulling the test operation itself, i.e. identical for every test
result. The real result is the return code of each protocol command,
which needs to be independently associated with the name which the
test writer gives to that protocol command (see the sketch after
this list).
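
For case 4 above, the association has to be made outside the DUT and
outside the log entirely. A rough Python sketch, with hypothetical
case names and commands, of what that mapping could look like:

    import subprocess

    # Hypothetical mapping chosen by the test writer: each named
    # test case is just the exit code of a protocol command run from
    # the dispatcher, not anything parsed from the DUT's own output.
    PROTOCOL_CASES = {
        "adb-push-testbin": ["adb", "push", "testbin",
                             "/data/local/tmp/"],
        "adb-run-testbin": ["adb", "shell", "/data/local/tmp/testbin"],
    }

    def run_protocol_cases():
        results = {}
        for name, cmd in PROTOCOL_CASES.items():
            rc = subprocess.run(cmd).returncode
            results[name] = "pass" if rc == 0 else "fail"
        return results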

These are some of the reasons why parsers based solely on tokens
fail with many general purpose test operations. LAVA uses tokens
internally for the POSIX test actions, but the API is actually to
execute a shell script on the DUT which handles the "bookending".
The batching-up problem can only be handled by a custom test output
shim which is, as yet, unwritten for most affected test operations.

So, rather than using patterns, there are some cases where scripts
must be executed on the DUT to handle the inconsistencies of the test
output itself.
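
As an illustration (the marker format and names here are invented,
not the LAVA internals), such a DUT-side wrapper could bookend an
otherwise unmarked test run with unique start/end markers carrying a
run identifier, so that a downstream parser can tell run 3 from run
46:

    import subprocess
    import sys
    import uuid

    def run_bookended(run_name, cmd):
        # A per-run id distinguishes repeated runs of the same suite
        # executed with different parameters.
        run_id = uuid.uuid4().hex
        print("<TESTRUN-START %s %s>" % (run_name, run_id),
              flush=True)
        rc = subprocess.run(cmd).returncode
        print("<TESTRUN-END %s %s exit=%d>" % (run_name, run_id, rc),
              flush=True)
        return rc

    if __name__ == "__main__":
        sys.exit(run_bookended(sys.argv[1], sys.argv[2:]))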

> Also, from the test survey, I noted that some systems use a declarative
> style for their parser (particularly Phoronix Test Suite).  I think it would
> be great to support both declarative and imperative models.
>
> It would be good, IMHO, if the parser could also read in results from
> any known results formats, and output in another format, doing
> the transformation between formats.

I think this is overly ambitious and would need to be restricted to
*compliant* formats. Additionally, many formats do not have 100%
equivalence of the type and range of data which can be expressed,
meaning that many conversions will be lossy and therefore
unidirectional.
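
A toy example of that lossiness, using made-up record fields rather
than any real schema: a result carrying a measurement and units has
nowhere to go in a pass/fail-only format, so the conversion drops
data and cannot be reversed:

    # Illustrative records only; real formats (JUnit, KernelCI json,
    # etc.) each have their own schema.
    rich = {"name": "iperf-throughput", "result": "pass",
            "measurement": 941.2, "units": "Mbits/sec"}

    def to_pass_fail_only(record):
        # The target format has no notion of measurements, so they
        # are simply dropped: lossy, therefore unidirectional.
        return {"name": record["name"], "result": record["result"]}

    flat = to_pass_fail_only(rich)
    assert "measurement" not in flat   # information gone for good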

> Let me know if I'm duplicating something that's already been done.
> if not, I'll try to start a wiki page, and some discussions around the
> design and implementation of this parser (results transformation engine)
> on this list.
>
> Particularly I'd like to hear people's requirements for such a tool.

1. Which parts, if any, must be executed on the DUT? Which parts are
not executed on the DUT?

2. Is this parser going to prevent operation with non-compliant test
output, e.g. where bookending is impossible or where batching is not
possible to handle reliably?

3. What are the requirements for executing the parser? What language?

4. How is the parser to cope with output where the parser patterns are
entirely determined by the test writer according to strings embedded
in the test software itself? (e.g. IoT)

5. How is the parser expected to cope with iterations of test output
where the loops are outside its control?

6. How is the parser to cope with test operations which do not produce
any parseable output at all but which rely on exit codes of binary
tools?

>
> Regards,
>  -- Tim
>
> --
> _______________________________________________
> automated-testing mailing list
> automated-testing at yoctoproject.org
> https://lists.yoctoproject.org/listinfo/automated-testing



-- 

Neil Williams
=============
neil.williams at linaro.org
http://www.linux.codehelp.co.uk/

