[Automated-testing] Structured feeds

Don Zickus dzickus at redhat.com
Fri Nov 8 06:52:57 PST 2019


On Fri, Nov 08, 2019 at 09:05:02AM +0100, Dmitry Vyukov wrote:
> On Thu, Nov 7, 2019 at 9:53 PM Don Zickus <dzickus at redhat.com> wrote:
> >
> > On Tue, Nov 05, 2019 at 11:02:21AM +0100, Dmitry Vyukov wrote:
> > > Hi,
> > >
> > > This is another follow up after Lyon meetings. The main discussion was
> > > mainly around email process (attestation, archival, etc):
> > > https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
> > >
> > > I think providing info in a structured form is the key for allowing
> > > building more tooling and automation at a reasonable price. So I
> > > discussed with CI/Gerrit people and Konstantin how the structured
> > > information can fit into the current "feeds model" and what would be
> > > the next steps for bringing it to life.
> > >
> > > Here is the outline of the idea.
> > > The current public-inbox format is a git repo with refs/heads/master
> > > that contains a single file "m" in RFC822 format. We add
> > > refs/heads/json with a single file "j" that contains structured data
> > > in JSON format. Two separate branches, because some clients may want
> > > to fetch just one of them.
> > >
> > > Current clients will only create the plain-text "m" entry. However,
> > > newer clients can also create a parallel "j" entry with the same info
> > > in structured form. "m" and "j" are cross-referenced using the
> > > Message-ID. It's OK to have only "m", or both, but not only "j" (any
> > > client needs to generate at least some text representation for every
> > > message).
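The cross-referencing described above can be sketched in a few lines. This is only an illustration: the JSON field names below are assumptions, since the thread explicitly leaves the schema undecided.

```python
import email
import json

def make_j_entry(raw_rfc822: bytes) -> str:
    """Derive a structured "j" entry from the plain-text "m" message.

    The Message-ID is the cross-reference key back to the "m" file.
    All field names here are hypothetical, not a settled schema.
    """
    msg = email.message_from_bytes(raw_rfc822)
    entry = {
        "version": 1,
        "message-id": msg["Message-ID"],     # key shared with "m"
        "in-reply-to": msg.get("In-Reply-To"),
        "from": msg["From"],
        "subject": msg["Subject"],
        "type": "comment",                   # e.g. patch / comment / test-result
    }
    return json.dumps(entry, indent=2)
```

A client would commit the "m" file to refs/heads/master and this output as the "j" file to refs/heads/json.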
> >
> > Interesting idea.
> >
> > One of the nuisances of email is the client tools have quirks.  In Red Hat,
> > we have used patchworkV1 for quite a long time.  These email client 'quirks'
> > broke a lot of expectations in the database leading us to fix the tool and
> > manually clean up the data.
> >
> > In the case of translating to a 'j' file, what happens if the data is
> > incorrectly translated due to client 'quirks'?  Is the 'j' data expected
> > to be manually reviewed before committing (probably not)?  Or is it left
> > as-is?  Or is a follow-on 'j' change committed?
> 
> Good point.
> I would expect that eventually there will be updates to the format and
> new versions, which are easy to express in JSON with a "version":2
> attribute. Code that parses these messages will need to keep quirk
> handling for older formats.
> Realistically nobody will review the data (besides the initial
> testing). I guess in the end it depends on (1) how badly it's screwed
> up, and (2) whether the correct data is preserved in at least some form
> (consider a client that pushes bad structured data which is also
> misrepresented in the plain-text form, or simply missing there).
> Fixing up data later is not possible. Appending corrections is possible.
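The quirk handling described above could look roughly like this. The "version" attribute is from the thread; the specific v1 quirk (and the field names) are invented purely to illustrate the dispatch pattern.

```python
import json

def parse_entry(raw: str) -> dict:
    """Version-tolerant parsing: old entries keep their quirks forever,
    so the parser dispatches on "version" rather than rewriting history."""
    entry = json.loads(raw)
    version = entry.get("version", 1)
    if version == 1:
        # Hypothetical v1 quirk: the id was stored under "msg-id".
        entry.setdefault("message-id", entry.pop("msg-id", None))
    # v2+ entries are assumed to already use the canonical field names.
    return entry
```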

Ok.  Yeah, in my head I was thinking the data is largely right; just
occasionally 1 or 2 fields were misrepresented due to a bad client tool or
human error in the text.

At Red Hat we use internal metadata for tracking our patches through our
process (namely the Bugzilla id).  It isn't unusual for someone to
accidentally fat-finger the Bugzilla id when posting their patch.

I was thinking there could be a follow-on 'type' that appends corrections as
you stated, say 'type: correction', that corrects the original data.  This
would have to be linked through the Message-ID or some other unique
identifier.

Then I assume any tool that parses the 'j' feed would correlate all the data
around such unique ids, so that picking up corrections would just be a
natural extension?
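That correlation step might look like the sketch below. 'type: correction' is the suggestion from this message; the "corrects"/"fields" keys and the fold logic are assumptions about how a consumer could build a corrected view from an append-only feed.

```python
def fold_corrections(entries: list[dict]) -> dict[str, dict]:
    """Fold an append-only feed into a corrected view keyed by Message-ID.

    Original entries are never rewritten; a later correction entry that
    references the original id simply wins in the folded view.
    """
    view: dict[str, dict] = {}
    for e in entries:
        if e.get("type") == "correction":
            target = view.get(e["corrects"])   # link via unique id
            if target is not None:
                target.update(e.get("fields", {}))
        else:
            view[e["message-id"]] = dict(e)
    return view
```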

Cheers,
Don

> 
> > A similar problem could probably be expanded to CI systems contributing their
> > data in some result file 'r'.
> 
> The idea is that all systems push "j". It's the contents of the feed
> that matter. CI systems will push messages of different types (test
> results), but we don't need "r" for this.
> 
> > Cheers,
> > Don
> >
> > >
> > > Currently we have public inbox feeds only for mailing lists. The idea
> > > is that more entities will have their own "private" feeds. For example,
> > > each CI system, static analysis system, or third-party code review
> > > system has its own feed. Eventually people will have their own feeds
> > > too. The feeds can be relatively easily converted to a local inbox,
> > > imported into Gmail, etc. (potentially with some filtering).
> > >
> > > Besides private feeds there are also aggregated feeds, so that not
> > > everybody has to fetch thousands of repositories. kernel.org will
> > > provide one, but it can be mirrored (or built independently) anywhere
> > > else. If I create https://github.com/dvyukov/kfeed.git for my feed and
> > > Linus creates git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed.git,
> > > then the aggregated feed will map these to the following branches:
> > > refs/heads/github.com/dvyukov/kfeed/master
> > > refs/heads/github.com/dvyukov/kfeed/json
> > > refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/master
> > > refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/json
> > > Standardized naming of sub-feeds allows a single repo to host multiple
> > > feeds. For example, github/gitlab/gerrit bridge could host multiple
> > > individual feeds for their users.
> > > So far there is no proposal for feed auto-discovery. One needs to
> > > notify kernel.org for inclusion of their feed into the main aggregated
> > > feed.
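The standardized branch naming above is mechanical enough to sketch: a feed repo URL maps to a branch prefix inside the aggregated feed. The exact normalization rules (e.g. stripping ".git") are assumptions.

```python
from urllib.parse import urlparse

def branch_names(feed_url: str) -> list[str]:
    """Map a feed repo URL to its sub-feed branches in the aggregated feed."""
    parsed = urlparse(feed_url)
    path = parsed.path.removesuffix(".git").lstrip("/")
    prefix = f"refs/heads/{parsed.hostname}/{path}"
    return [f"{prefix}/master", f"{prefix}/json"]
```

Because the mapping is deterministic, a single aggregated repo can host any number of sub-feeds without name collisions.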
> > >
> > > Konstantin offered that kernel.org can send emails for some feeds.
> > > That is, normally one sends out an email and then commits it to the
> > > feed. Instead some systems can just commit the message to feed and
> > > then kernel.org will pull the feed and send emails on user's behalf.
> > > This allows clients to not deal with email at all (including mail
> > > client setup). Which is nice.
> > >
> > > Eventually git-lfs (https://git-lfs.github.com) may be used to embed
> > > blobs right into feeds. This would allow users to fetch only the
> > > blobs they are interested in. But this does not need to happen from
> > > day one.
> > >
> > > As soon as we have a bridge from plain-text emails into the structured
> > > form, we can start building everything else in the structured world.
> > > Such a bridge needs to parse new incoming emails, try to make sense of
> > > them (new patch, new patch version, comment, etc.) and then push the
> > > information in structured form. Then e.g. CIs can fetch info about
> > > patches under review, test them and post structured results. Bridging
> > > in the opposite direction happens semi-automatically, as a CI also
> > > pushes a text representation of its results and that just needs to be
> > > sent as email. Alternatively, we could have a separate explicit
> > > converter of structured messages into plain text, which would remove
> > > some duplication and present results in a more consistent form.
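The classification step of such a bridge could start as crudely as this. Real classification (new patch vs. new patch version vs. comment) is much hairier; the heuristics and the output fields here are illustrative assumptions only.

```python
import email

def classify(raw: bytes) -> dict:
    """Classify an incoming email and emit a structured record for the feed."""
    msg = email.message_from_bytes(raw)
    subject = msg.get("Subject", "")
    if "[PATCH" in subject:
        kind = "patch"          # covers [PATCH], [PATCH v2], [PATCH 1/3], ...
    elif msg.get("In-Reply-To"):
        kind = "comment"        # a reply within an existing thread
    else:
        kind = "other"
    return {"message-id": msg["Message-ID"], "type": kind}
```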
> > >
> > > Similarly, it should be much simpler for Patchwork/Gerrit to present
> > > the current patches under review. Local mode should work almost
> > > seamlessly -- you fetch the aggregated feed and then run a local
> > > instance on top of it.
> > >
> > > No work has been done on the actual form/schema of the structured
> > > feeds. That's something we need to figure out while working on a
> > > prototype. However, good references would be the git-appraise schema:
> > > https://github.com/google/git-appraise/tree/master/schema
> > > and the Gerrit schema (not sure what's a good link). Does anybody know
> > > where the GitLab schema is? Or other similar schemas?
> > >
> > > Thoughts and comments are welcome.
> > > Thanks
> > > --
> > > _______________________________________________
> > > automated-testing mailing list
> > > automated-testing at yoctoproject.org
> > > https://lists.yoctoproject.org/listinfo/automated-testing
> >
