[Automated-testing] Structured feeds
Konstantin Ryabitsev
konstantin at linuxfoundation.org
Wed Nov 6 12:50:51 PST 2019
On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
>This is a non-trivial problem, fwiw. Patchwork's email parser clocks
>in at almost thirteen hundred lines, and that's with the benefit of
>the Python standard library. It also regularly gets patched to handle
>changes to email systems (e.g. DMARC), changes to git (git
>request-pull format changed subtly in 2.14.3), the bizarre ways
>people send email, and so on.
I'm actually very interested in seeing patchwork switch from being fed
mail directly from postfix to using public-inbox repositories as its
source of patches. I know it's easy enough to accomplish as-is, by
piping things from public-inbox to parsemail.sh, but it would be even
more awesome if patchwork learned to work with these repos natively.
The way I see it:
- site administrator configures upstream public-inbox feeds
- a backend process clones these repositories
- if it doesn't find a refs/heads/json branch, then it does its own
  parsing
to generate a structured feed with patches/series/trailers/pull
requests, cross-referencing them by series as necessary. Something
like a subset of this, excluding patchwork-specific data:
https://patchwork.kernel.org/api/1.1/patches/11177661/
- if it does find an existing structured feed, it simply uses it (e.g.
it was made available by another patchwork instance)
- the same backend process updates the repositories from upstream using
proper manifest files (e.g. see
https://lore.kernel.org/workflows/manifest.js.gz)
- patchwork projects then consume one (or more) of these structured
feeds to generate the actionable list of patches that maintainers can
use, perhaps with optional filtering by specific headers (list-id,
from, cc), patch paths, keywords, etc.
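To make the backend step concrete, here is a minimal sketch of the
decision it would make per cloned repository. All names are
hypothetical (nothing here is existing patchwork code), and the branch
list is passed in rather than read via git, but the logic mirrors the
steps above: reuse an existing refs/heads/json structured feed if one
is published, otherwise fall back to local parsing.

```python
# Hypothetical sketch: decide how a backend process should handle one
# cloned public-inbox repository. In a real implementation the branch
# names would come from `git for-each-ref`; here they are an argument.

def feed_strategy(branches):
    """Return 'reuse' if a structured feed branch already exists,
    otherwise 'parse' to generate one locally."""
    if "refs/heads/json" in branches:
        # e.g. the feed was made available by another patchwork instance
        return "reuse"
    # generate patches/series/trailers/pull requests ourselves
    return "parse"

# Example: a repo that already carries a structured feed is reused.
print(feed_strategy(["refs/heads/master", "refs/heads/json"]))  # reuse
print(feed_strategy(["refs/heads/master"]))                     # parse
```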
Basically, parsemail.sh is split into two, where one part does feed
cloning, pulling, and parsing into structured data (if not already
done), and another populates the actual patchwork project with patches
matching the requested parameters.
I see the following upsides to this:
- we consume public-inbox feeds directly, no longer losing patches due
to MTA problems, postfix burps, parse failures, etc
- a project can have multiple sources for patches instead of being tied
to a single mailing list
- downstream patchwork instances (the "local patchwork" tool I mentioned
earlier) can benefit from structured feeds provided by
patchwork.kernel.org
>Patchwork does expose much of this as an API, for example for patches:
>https://patchwork.ozlabs.org/api/patches/?order=-id so if you want to
>build on that feel free. We can possibly add data to the API if that
>would be helpful. (Patches are always welcome too, if you don't want to
>wait an indeterminate amount of time.)
As I said previously, I may be able to fund development of various
features, but I want to make sure that I properly work with upstream.
That requires getting consensus on features to make sure that we don't
spend funds and efforts on a feature that gets rejected. :)
Would the above feature (using one or more public-inbox repositories as
sources for a patchwork project) be a welcome addition to upstream?
-K