[Automated-testing] Structured feeds

Don Zickus dzickus at redhat.com
Fri Nov 8 07:26:09 PST 2019


On Fri, Nov 08, 2019 at 08:58:44AM +0100, Dmitry Vyukov wrote:
> On Thu, Nov 7, 2019 at 9:44 PM Don Zickus <dzickus at redhat.com> wrote:
> >
> > On Thu, Nov 07, 2019 at 02:35:08AM +1100, Daniel Axtens wrote:
> > > > As soon as we have a bridge from plain-text emails into the structured
> > > > form, we can start building everything else in the structured world.
> > > > Such bridge needs to parse new incoming emails, try to make sense out
> > > > of them (new patch, new patch version, comment, etc) and then push the
> > > > information in structured form. Then e.g. CIs can fetch info about
> > >
> > > This is a non-trivial problem, fwiw. Patchwork's email parser clocks in
> > > at almost thirteen hundred lines, and that's with the benefit of the
> > > Python standard library. It also regularly gets patched to handle
> > > changes to email systems (e.g. DMARC), changes to git (git request-pull
> > > format changed subtly in 2.14.3), the bizarre ways people send email,
> > > and so on.
> >
> > Does it ever make sense to just use git to do the translation to structured
> > json?  Git has similar logic and can easily handle its own changes.  Tools
> > like git-mailinfo and git-mailsplit probably do a good chunk of the
> > work today.
> >
> > It wouldn't pull together series info.
> 
> Hi Don,
> 
> Could you elaborate? What exactly do you mean? I don't understand the
> overall proposal.

The problem I was looking at is that patchwork has a large, elaborate body
of Python code to translate human-written, git-formatted patch emails into
some structured form.  And rightfully so.

But git has similar code in order to make git-am work.

I had assumed that when public-inbox imports an email, it uses a tool like
git-am, which calls into git-mailsplit and git-mailinfo to split the email
into its various pieces and put them in .git/rebase-apply.

At that point most of the text parsing is done.

So the thought was to have another public-inbox tool that takes advantage of
the already split data and just takes the small step of converting it into
a structured file 'j', as opposed to sending the text email through an
external tool like patchwork to re-split the data into structured pieces
again.
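
As a rough illustration of that "small step" (a sketch only, not how
public-inbox or patchwork actually work): git-mailinfo reads one email on
stdin, writes the commit message and the patch to two files, and prints
"Author:", "Email:", "Subject:" and "Date:" lines on stdout.  The function
names and JSON field names below are made up for this example, and
mailinfo_to_json() assumes a git binary is on PATH.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def parse_mailinfo_header(text):
    """Parse the 'Key: value' lines that git-mailinfo prints on stdout."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip blank or malformed lines
            fields[key.lower()] = value.strip()
    return fields


def to_structured(header_text, message, patch):
    """Combine header fields, commit message and patch into one JSON doc.

    The field names ("message", "patch", ...) are hypothetical.
    """
    record = parse_mailinfo_header(header_text)
    record["message"] = message
    record["patch"] = patch
    return json.dumps(record, indent=2, sort_keys=True)


def mailinfo_to_json(raw_email):
    """Run `git mailinfo <msg> <patch>` on one raw email given as bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        msg, patch = Path(tmp, "msg"), Path(tmp, "patch")
        header = subprocess.run(
            ["git", "mailinfo", str(msg), str(patch)],
            input=raw_email, stdout=subprocess.PIPE, check=True,
        ).stdout
        return to_structured(
            header.decode("utf-8", "replace"),
            msg.read_text(errors="replace"),
            patch.read_text(errors="replace"),
        )
```

An mbox holding several messages would first go through git-mailsplit to
get one file per email.  As noted in the quoted text above, this does not
pull together series info; it only covers a single message.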

Adding to that thought: every time git changes its format or text output,
leveraging git's own knowledge of the change (assuming public-inbox
consistently uses the latest git), instead of updating external tools,
would reduce the ripple effect of having to update every external tool
before developers can use new git features or changes.

But looking through the public-inbox code, it appears to do things
differently, so the idea may not work at all.

So just treat my idea as looking at the problem from a different angle to
see if there is an easier solution.

Cheers,
Don


