[Automated-testing] test definitions shared library

Tim.Bird at sony.com
Tue Jul 16 20:49:23 PDT 2019



> -----Original Message-----
> From: daniel.sangorrin at toshiba.co.jp
> 
> Hello Milosz,
> 

First, let me say that this analysis and breakdown of different elements is
very useful for the sake of discussion.  Thanks for putting down your thoughts here.

> Yesterday, during the monthly automated testing call [1], I mentioned that I
> had used Linaro test definitions' library [2] with a few additional functions [3]
> to run _some_ Fuego tests.
> 
> First of all, there is an easier way to run Fuego tests from LAVA definitions.
> That's the reason I stopped developing that adaptation layer. The easier way
> consists of installing the no-jenkins version of Fuego on the target filesystem
> (install-debian.sh). Once installed it works just as any other test runner and
> can be called from a LAVA yaml job (e.g. ftc run-test -t Functional.bzip2) as I
> showed during my presentation at Linaro connect
> (https://www.youtube.com/watch?v=J_Gor9WIr9g).

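For reference, the flow being described is roughly the following (install-debian.sh
and the ftc command line are taken from the message above; this is just a sketch
of the flow, not exact instructions):

    # on the device under test, using the no-jenkins variant of Fuego
    ./install-debian.sh                    # installs ftc and the test definitions

    # after that, a LAVA yaml job (or a person at a shell prompt) can run a test:
    ftc run-test -t Functional.bzip2
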
This approach will work for systems that don't fall in the
very low end.  And it's also a useful way to use Fuego when the
device-under-test is one's own development machine (which will appeal
to some developers).  However, I want to make sure that Fuego doesn't
lose support for testing low-end designs (and non-Linux designs).

I thought the general direction we were heading with Fuego/LAVA
integration was to put Fuego on a LAVA worker node, and not
on the target itself.
 
> 
> Having said that, I do see a benefit in modularizing the test definitions so that
> they can be re-used on different frameworks or without a framework at all.

Agreed.

> 
> From here on, I am going to try to explain what I think of a test definition
> standard as it comes out of my mind in brainstorming mode. Please bear with
> me. Hopefully, this will hit some areas of consensus and some others that
> require refining. There are many frameworks out there and I only know a
> small fraction of them, so I am not sure if these ideas would work for all of
> them. I'd love to hear feedback from other projects.
> 
> 1) What is a test definition?
> 
> "meta-data and software that comprise a particular test" [1]
> 
> Well, this sounds a bit abstract to me. There are several types of test
> definitions:
> 
>   * test wrappers: these are wrapper scripts that take care of building,
> deploying, executing and parsing an existing test. For example, IOzone is an
> existing test and [4] and [5] are the test definitions from Linaro and Fuego
> respectively. I think it's interesting to compare them and extract the
> similarities and differences. Both of them are able to build the IOzone test;
> Linaro is able to install build dependencies while Fuego assumes the
> toolchain has them; both execute the iozone binary, but only Fuego (in this
> particular case) allows specifying the parameters to pass to iozone; finally
> both parse the test output log. Fuego does all of this using a set of function
> callbacks (test_run, test_cleanup, ...) while the Linaro test definition has no
> defined structure. For Linaro, parsing occurs on the target, while for Fuego it
> happens on the host.
> 
>   * test runner wrappers: these are wrapper scripts for test runners (e.g.,
> ltp, ptest, autopkgtest). Linaro test definitions can also be used as a test
> runner [6]. Here we can compare the ptest test runner from Linaro [7] and
> Fuego [8]. The one in Linaro is written in python, whereas the one in Fuego is
> a shell script and a python file for the parser. Both of them call ptest-runner,
> and then parse the output. Fuego converts the parsed output into Fuego's
> test results unified format (JSON) while Linaro converts the results into LAVA
> serial port messages.
> Note: there is a script created by Chase to convert a Fuego results JSON into
> LAVA serial port messages [9].
> 
>   * original test scripts: these are usually test scripts that call binaries on the
> target filesystem and check that they work as expected. For example, this is
> what Fuego busybox test [10] and Linaro busybox test [11] are doing. These
> types of scripts tend to be a bit shallow on the testing side, and are mostly a
> basic confirmation that the binaries work.

While some of these "original test scripts" are shallow, there are certain tests
for which this style of test is very easy to write, and it is quite a useful
category of test.  These tests should be the easiest to write, and
can be written quickly to test basic distribution functionality.  For example,
it's very easy to write a test that checks that the filesystem contains
required components (testing the install of a distribution), that permissions
or file system attributes are correct, or that some required service is running.
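
As a rough sketch (not an actual Fuego test; the file list, permission value, and
service name below are placeholders), such a test can be as short as:

    #!/bin/sh
    # minimal distribution sanity check - illustrative only
    result=0

    # required components are present on the filesystem
    for f in /bin/busybox /etc/passwd /usr/sbin/sshd; do
        [ -e "$f" ] || { echo "FAIL: missing $f"; result=1; }
    done

    # permissions/attributes on a security-sensitive file are correct
    [ "$(stat -c %a /etc/shadow)" = "640" ] || { echo "FAIL: /etc/shadow permissions"; result=1; }

    # a required service is running
    pidof sshd > /dev/null || { echo "FAIL: sshd not running"; result=1; }

    [ "$result" -eq 0 ] && echo "PASS: basic distro checks"
    exit $result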

> 
> 2) Language
> 
> In Fuego, we are using shell scripts for most phases (dependencies, build,
> deploy, run, cleanup) and, depending on the test, we use python for parsing.
> If the test is simple enough, we use a shell script.
> In Linaro, I think that you are using mostly shell scripts but sometimes
> python. One thing that we could do is to check whether python is available
> and, if it isn't, provide simpler parsing code based on awk/grep/etc. Another
> option for Linaro is to add python to the dependencies of the test. This
> would work on most modern Debian-based OS image builders such as Debos
> or ISAR.
> Finally, an alternative would be to use Go static binaries for the parsing
> phase. Go would work in Fuego (on the host side) and on Linaro (on the
> target side) unless you are using an OS or architecture not supported by Go
> [12].

I'm keen on keeping shell script as the language used for target-side test
execution instructions.  More on this later.
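
For what it's worth, the "check whether python is available, and fall back to
awk/grep if it isn't" idea above could look something like this on the target
(a sketch only; parse_results.py and the "name: PASS/FAIL" log format are
hypothetical):

    # pick a parser based on what the target filesystem actually provides
    if command -v python3 > /dev/null 2>&1; then
        python3 ./parse_results.py test.log > result.txt
    else
        # minimal fallback: extract "name: PASS" / "name: FAIL" lines with awk
        awk -F': ' '/: (PASS|FAIL)$/ { print $1, tolower($2) }' test.log > result.txt
    fi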
 
> 3) Directory structure
> 
> Fuego has a defined structure [13]
> - fuego_test.sh: script that contains the test wrapper phases (e.g. build, run, ...)
> - optional
>    - parser.py: parses the output with the help of a library that produces a JSON
> file that complies with a schema very close to the kernelci/squad schemas.
>    - source tarball
>    - test.yaml with metadata
>    - test specs: various combinations of parameters that should be tried
> 
> I think this structure can cover most needs. In Linaro, you also store the LAVA
> YAML job files; however, I think those are not strictly necessary. Linaro
> YAML files have three main sections:
> 
> * os dependencies: these can be added to test.sh's precheck phase
> * board dependencies: these depend on the lab and the boards you have, so
> they should be handled outside of the test definitions.
> * execution instructions: all test definitions in Linaro seem to follow the same
> pattern, so you should be able to just abstract that into a template. For
> example:
>     steps:
>         - cd ./automated/linux/busybox/ <-- convert to automated/linux/$TEST
>         - ./busybox.sh <-- convert to ./test.sh
>         - ../../utils/send-to-lava.sh ./output/result.txt

Right now, Fuego's fuego_test.sh stores the test instructions in something
that looks like a shell script.  While the instructions themselves are expressed
as shell commands, in reality, it is not necessary to keep the instruction blocks
in shell script format.  Fuego has a phase where fuego_test.sh is read and
combined with other data to generate the shell script that is actually executed.
We used to execute the test itself as a shell script, but support
for that was dropped a while ago.

We could migrate fuego_test.sh to a yaml format with relative ease, if we thought
that would buy us anything.  At the moment, I'm not sure what the benefit would
be.  But my point is that Fuego already does conversion of test instruction data
during a test run (to create the "prolog" file), so we already have the hook points to do this.
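
For reference, the instruction blocks in a fuego_test.sh look roughly like the
following (simplified; the helper functions and variables are abbreviated from
the structure described in [13]):

    # sketch of a typical fuego_test.sh - details elided
    function test_build {
        make
    }
    function test_deploy {
        # "put" copies files to the board
        put mytest $BOARD_TESTDIR/fuego.$TESTDIR/
    }
    function test_run {
        # "report" runs a command on the board and captures its output
        report "cd $BOARD_TESTDIR/fuego.$TESTDIR; ./mytest"
    }
    function test_processing {
        # check the captured log for the expected pattern
        log_compare "$TESTDIR" "1" "^OK" "p"
    }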

> 4) Output format
> 
> Tests already have several standards (junit, tap etc), but I think that all test
> frameworks end up creating a parser/converter that collects those
> "standard" logs and transform them into a unified per-framework format.
> For example, a JSON file in Fuego or a set of LAVA serial messages in Linaro.
> 
> In my super-biased opinion, the test definitions should produce a JSON file
> similar to what Squad/kernelci (or bigquery!?) can accept. And then, each
> framework can have some glue code to convert that JSON file into their
> preferred format (for example, a JSON to LAVA script).

I've been thinking about this, and I think there are two separate outputs to
consider here:
 - 1) output from the test program - this is usually line-based (but not always)
and often is intended to be human-readable.  It is often not structured very 
well for data extraction.
 - 2) output and storage format for the test framework

I don't think we can dictate the format of 1, although it would be good for us to 
choose a format and recommend that for use with new tests.  To some extent
we've used TAP13 for that, but in my own experimentation, I've come to the conclusion
that TAP13 is inadequate for certain test situations.

From what I can tell, Linaro tests augment 1 to make it easier to convert into an
organized storage format, which can be done either on-DUT or off-DUT.
They do this in real time while the test is running.

The second format is the format produced by the test framework.  In Fuego this
is how the data is stored, and it is the canonical form for results data.  We can take this
and convert it to Squad/KernelCI data.

I think that we should keep 1 and 2 separate.  I wouldn't recommend converting
tests to produce json data natively.  I wouldn't discourage it, but I think most developers
are not going to want to do the extra work to structure their result data that way.
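
To make the distinction concrete, here is a made-up example.  Output 1 is
whatever the test program happens to print:

    test_write ... ok
    test_read ... ok
    test_unlink ... FAILED
    3 tests, 2 passed, 1 failed

and output 2 is what the framework stores after parsing it, e.g. a Squad-style
results object:

    {
      "mytest/test_write": "pass",
      "mytest/test_read": "pass",
      "mytest/test_unlink": "fail"
    }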

> 
> 5) Test naming
> 
> I think we should adopt the nomenclature introduced by Squad which
> includes test results (pass, fail,..) and metrics (bps, I/O ops/sec etc..)
> 
> {
>   "test1": "pass",
>   "test2": "pass",
>   "testsuite1/test1": "pass",
>   "testsuite1/test2": "fail",
>   "testsuite2/subgroup1/testA": "pass",
>   "testsuite2/subgroup2/testA": "pass",
>   "testsuite2/subgroup2/testA[variant/one]": "pass",
>   "testsuite2/subgroup2/testA[variant/two]": "pass"
> }
> 

My 'tguid' syntax uses '.' as the delimiter, but this is not critical.  I think having a path-like
syntax is very useful.  Right now we use a multi-level nested object structure to
indicate the 'path' to the result, but internally Fuego also uses a line-based, flattened
format for certain data files we use for results visualization.  So we already have
code to convert between nodes and flat strings.  I'm agnostic about which one to use.
The structured nodes format potentially scales better, but it's a pain to generate and parse.
(But we're already doing it so that doesn't matter that much.)
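
(As an aside, going from the nested node form to the flat path form is easy to do
generically; for example with jq - just an illustration, not Fuego's actual
converter:

    # flatten {"suite1": {"group1": {"test1": "pass"}}} into "suite1/group1/test1 pass"
    jq -r 'paths(scalars) as $p | "\($p | join("/")) \(getpath($p))"' results.json

but as I said, we already have code for this.)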

> {
>   "v1": 1,
>   "v2": 2.5,
>   "group1/v1": [1.2, 2.1, 3.03],
>   "group1/subgroup/v1": [1, 2, 3, 2, 3, 1]
> }

I don't understand this format, with multiple result values for a test result identifier.

> 
> If we agree on that, then we can prepare a list of tests that adhere to such
> nomenclature. For example (just invented):
> 
> ltp/syscalls/madvice/madvice01
> busybox/cp/symbolic/sym01

Agreed.  It would be very nice to get agreement on the testcase name format and the
names themselves.

> 
> 6) Repository
> 
> Tests and test runners should be where they are already.
> But the standard set of test definitions can be shared, so it would be nice to
> put them on a separate directory.
> I propose gitlab.com because it has an easy-to-use CI system. We could
> create a new user (automated_testing) and then a project inside (test_definitions).
> Note: has anybody here already taken this address?
> https://gitlab.com/automated_testing

This would be good.

> 
> [1] https://elinux.org/Automated_Testing
> [2] https://github.com/Linaro/test-definitions/blob/master/automated/lib/sh-test-lib
> [3] https://github.com/sangorrin/test-definitions/blob/master/automated/linux/fuego/fuego.sh
> [4] https://github.com/sangorrin/test-definitions/tree/master/automated/linux/iozone
> [5] https://bitbucket.org/fuegotest/fuego-core/src/master/engine/tests/Benchmark.IOzone/
> [6] https://bitbucket.org/fuegotest/fuego-core/src/next/tests/Functional.linaro/
> [7] https://github.com/sangorrin/test-definitions/tree/master/automated/linux/ptest
> [8] https://bitbucket.org/fuegotest/fuego-core/src/next/tests/Functional.ptest/
> [9] https://github.com/chase-qi/test-definitions/blob/master/automated/linux/fuego-multinode/parser.py
> [10] https://bitbucket.org/fuegotest/fuego-core/src/next/tests/Functional.busybox/
> [11] https://github.com/Linaro/test-definitions/tree/master/automated/linux/busybox
> [12] https://github.com/golang/go/wiki/MinimumRequirements
> [13] http://fuegotest.org/fuego-1.0/Test_definition
> [14] https://github.com/Linaro/squad/blob/master/doc/intro.rst#input-file-formats

