[Automated-testing] LTP and test metadata

Cyril Hrubis chrubis at suse.cz
Thu Aug 22 03:15:40 PDT 2019


Hi!
> Please check my summary and comments.
> 
> 1) Test dependency metadata is stored on each test case (.c file)
> inside a "tst_test" structure. Dependencies are specified through
> variables that start with "needs_" such as needs_root, needs_kconfigs,
> or needs_tmpdir.
> https://github.com/linux-test-project/ltp/blob/master/include/tst_test.h
> https://elinux.org/Test_Dependencies
> https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/mkdir/mkdir05.c
> 
> [Comment] perhaps it would be cleaner to separate metadata from actual
> data and functions. For example, by adding a new element to "tst_test"
> called "metadata" that points to an array of "struct tst_metadata".
> These tst_metadata structs would contain the "needs_xxx" configuration
> options.
>
> - Using an array would allow specifying slightly different
>   requirements depending on the test case variant or parameters.
> - Separating the metadata would also simplify parsing because you
>   would not need to filter out non-metadata elements.
> - Downside: the change might be a bit intrusive.

I would like to keep the simple requirements as bit flags, since it's
much easier to work with these than with a metadata array.

We can probably move everything more complicated, e.g. needs_kconfigs,
to the metadata though.
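
To make it concrete, the simple requirements as they look today are just
designated initializers in the tst_test structure, roughly like this
(the particular kconfig is only an example):

#include "tst_test.h"

static void run(void)
{
	tst_res(TPASS, "doing nothing is fine");
}

static struct tst_test test = {
	/* simple boolean requirements, kept as flags */
	.needs_root = 1,
	.needs_tmpdir = 1,
	/* more structured requirement, a candidate for the metadata */
	.needs_kconfigs = (const char *[]) {
		"CONFIG_EXT4_FS",
		NULL
	},
	.test_all = run,
};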

> [Comment] Probably we need to specify some requirements in a
> parametric way. For example: "this test needs 10MB*num_threads
> memory", where num_threads is a test case parameter or another
> element.
> 
> [Comment] Sometimes, you may need to specify requirements
> conditionally. For example, if you pass parameter "-s", then you need
> "root" permissions. The same can happen with test variants.

I'm aware of this; the problem here is that to make things flexible
enough we would need to be able to express quite complex structures.

So in the end we would probably need to embed JSON, TOML or a similar
data format, or maybe even a domain-specific language, to express the
dependencies, and doing that in a way that will not suck would be hard.

One problem here is that the test requirements have to be available to
the test at runtime, which means that the test library has to be able to
parse them as well, which in turn means that they have to be accessible
from the C structures somewhere.

For a start I would like to pass the requirements we already have to
the testrunner; once that is working we can proceed to moving test
variants into the testcases and adding per-variant requirements as well.
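
Just to sketch the direction (nothing like this exists yet; the struct
and field names below are made up, loosely following your tst_metadata
idea), per-variant requirements could end up looking something like:

/* hypothetical sketch, not part of the current tst_test.h */
struct tst_metadata {
	const char *variant;
	int needs_root:1;
	unsigned long min_ram_mb;	/* could also scale with nthreads */
};

static const struct tst_metadata metadata[] = {
	{.variant = "buffered_io", .min_ram_mb = 64},
	{.variant = "direct_io", .needs_root = 1, .min_ram_mb = 512},
	{.variant = NULL}	/* terminator */
};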

> [Note] here I am assuming that you can pass parameters to the test
> cases (e.g.: size, number of times, units, number of threads etc..)
> and that the set of parameters could affect the
> dependencies/requirements.

Indeed they do. As you already wrote, the required amount of RAM
usually scales with the number of processes/threads, the required free
disk space varies with test variants as well, and I'm sure there are
many more examples like this.

> 2) Test writers can add additional metadata by writing comments in a
> specific format layout. The format layout is an open question.
> 
> [Comment] Could you describe the format that you support in your proof-of-concept?

At this point it's just plain text with sections that start with a
string enclosed in brackets. There is nothing that parses the text in
any way, so effectively there is no format.

What I have in mind is that there will be different sections encoded in
different formats, and each section would have its own handler to parse
the data. The test description would be in some kind of text markup, the
test variants may be in a JSON array, etc. But at this point nothing is
decided yet; as I said, it's hard to come up with anything unless we
have consumers for the data.
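
Just to illustrate the idea (the section names below are made up and
nothing parses them yet):

/*
 * [DESCRIPTION]
 * Verify that mkdir() fails with EEXIST when the directory already
 * exists.
 *
 * [VARIANTS]
 * A JSON array describing the individual test variants could go here.
 */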

> [Comment] It would be nice to have information such as:
> - regular expressions to identify well-known errors, and a string
>   explaining why that error may have occurred.

Interesting idea.

> - a string for each variant that explains what that variant is testing.

That is indeed the plan.

> - upstream commit ids that need to be in the kernel for the test to
>   pass (in particular for CVE tests this might be useful).

And this is already there; I do have a simple HTML table built from the
JSON file, see:

http://metan.ucw.cz/outgoing/metadata.html
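
For illustration only, an entry for a CVE test could look along these
lines (don't take the key names as the exact docparse output, they are
just to show the shape of the data):

{
	"example_cve_test": {
		"needs_root": "1",
		"tags": [
			["linux-git", "<upstream fix commit id>"],
			["CVE", "<cve number>"]
		]
	}
}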

> [Comment] It is necessary to distinguish what metadata needs to go to
> the data structures and which one needs to go to the documentation.
> Apart from the binary size, is there any reason not to put everything
> in the data structures (tst_test)?.

I was thinking of that, and I do not like having the test description,
i.e. several paragraphs of markup text, stored in C strings. It would be
nice to have it printed by the -h switch, but that's not strictly
required.

I even tried a version with a pre-processing step where I parsed all
the information from a comment and built a header with C structures that
would be included when a test is compiled, but that overly complicated
the build.

So in the end I settled on a middle ground, which is having
requirements encoded in C structures and documentation stored in
comments. I'm not 100% decided on the current split though; if you have
a better idea I would like to hear it.

> 3) docparse: a parser program that extracts the metadata (needs_xxx
> variables and commented documentation) from each test case into a
> single JSON file.
> 
> [Comment] how big is that JSON file? would you create it on-the-fly
> including only the tests and variants that you want to run (e.g.:
> specified through a tag or a wildcard)? or would you create it with
> all possible tests even though some of them will not run?

At this point it's 52kB. I would expect it to grow a bit as we add more
documentation to the tests etc., but it should still stay in the single
digits of megabytes, which I consider small enough.

And given that I want to get rid of the runtest files, the idea is to
build the database of all tests during the LTP build and install the
file along with LTP. Then the testrunner will make subsets of tests
based on that file and on some information supplied by the user, e.g. a
tag, a wildcard, etc. So the plan is that the testrunner will always get
the full JSON.

> 4) The JSON file with metadata can be used for test runners/frameworks
> to skip test cases depending on the hardware or software limitations,
> dynamically select the board to execute a test, specify the test cases
> you want to run in a more flexible way (e.g. with tags or wildcards),
> or to create a report with possible failure reasons. It should also
> allow running some test cases in parallel.
> 
> [Comment] I would add an attribute to specify how many CPU cores a
> test case needs. 0: it can share the CPU(s) with other test cases
> (e.g., a simple, functional test), 1: it needs to be the only test
> case running on the CPU core (e.g., to check for a hardware
> vulnerability), 2+: it needs more than one CPU core (e.g., multi-core
> testing). This information could be useful when you want to
> parallelize the execution of the test cases. 

That is indeed planned along with other restrictions on global resources
such as RAM.
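
The field names below are entirely made up, but the kind of annotation I
have in mind is along these lines:

/* hypothetical, nothing like this exists in tst_test.h yet */
struct tst_resources {
	unsigned int needs_cpus;	/* 0 == can share a CPU with other tests */
	unsigned long min_free_ram_mb;	/* skip on machines with less free RAM */
};

static const struct tst_resources resources = {
	.needs_cpus = 2,
	.min_free_ram_mb = 512,
};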

-- 
Cyril Hrubis
chrubis at suse.cz

