[Automated-testing] Structured feeds

Don Zickus dzickus at redhat.com
Thu Nov 7 12:53:04 PST 2019


On Tue, Nov 05, 2019 at 11:02:21AM +0100, Dmitry Vyukov wrote:
> Hi,
> 
> This is another follow up after Lyon meetings. The main discussion was
> mainly around email process (attestation, archival, etc):
> https://lore.kernel.org/workflows/20191030032141.6f06c00e@lwn.net/T/#t
> 
> I think providing info in a structured form is the key for allowing
> building more tooling and automation at a reasonable price. So I
> discussed with CI/Gerrit people and Konstantin how the structured
> information can fit into the current "feeds model" and what would be
> the next steps for bringing it to life.
> 
> Here is the outline of the idea.
> The current public inbox format is a git repo with refs/heads/master
> that contains a single file "m" in RFC822 format. We add
> refs/heads/json with a single file "j" that contains structured data
> in JSON format. Two separate branches are used because some clients
> may want to fetch just one of them.
> 
> Current clients will only create plain text "m" entry. However, newer
> clients can also create a parallel "j" entry with the same info in
> structured form. "m" and "j" are cross-referenced using the
> Message-ID. It's OK to have only "m", or both, but not only "j" (any
> client needs to generate at least some text representation for every
> message).
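To make the pairing concrete, here is a minimal Python sketch of what a "j" entry cross-referenced by Message-ID could look like. The field names ("type", "version", etc.) are hypothetical, since the proposal explicitly leaves the schema undecided; only the Message-ID cross-reference is taken from the text above.

```python
import email
import json

# Raw RFC822 message as it would be stored in the "m" file.
raw = b"""\
Message-ID: <20191105.example@kfeed>
From: Dev Eloper <dev@example.org>
Subject: [PATCH v2] fix widget refcounting

Patch body here.
"""

msg = email.message_from_bytes(raw)

# Hypothetical "j" entry: same information in structured form,
# paired with the "m" entry via the Message-ID.
j_entry = {
    "message-id": msg["Message-ID"],  # the cross-reference key
    "type": "patch",                  # hypothetical classification field
    "version": 2,                     # hypothetical; taken from the v2 subject prefix
    "subject": msg["Subject"],
    "from": msg["From"],
}
print(json.dumps(j_entry, indent=2))
```

A client that only understands plain text would ignore refs/heads/json entirely and keep reading the "m" entries.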

Interesting idea.

One of the nuisances of email is that the client tools have quirks.  At Red
Hat, we have used Patchwork v1 for quite a long time.  These email client
'quirks' broke a lot of expectations in the database, leading us to fix the
tool and manually clean up the data.

In the case of translating to a 'j' file, what happens if the data is
incorrectly translated due to client 'quirks'?  Is the 'j' data expected to
be manually reviewed before committing (probably not)?  Or is it left alone
as-is?  Or is a follow-on 'j' change committed?

A similar problem would probably extend to CI systems contributing their
data in some result file 'r'.

Cheers,
Don

> 
> Currently we have public inbox feeds only for mailing lists. The idea
> is that more entities will have own "private" feeds. For example, each
> CI system, static analysis system, or third-party code review system
> has its own feed. Eventually people will have their own feeds too. The
> feeds can be relatively easily converted to a local inbox, imported
> into GMail, etc. (potentially with some filtering).
> 
> Besides private feeds there are also aggregated feeds, so that not
> everybody has to fetch thousands of repositories. kernel.org will provide
> one, but it can be mirrored (or built independently) anywhere else. If
> I create https://github.com/dvyukov/kfeed.git for my feed and Linus
> creates git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed.git,
> then the aggregated feed will map these to the following branches:
> refs/heads/github.com/dvyukov/kfeed/master
> refs/heads/github.com/dvyukov/kfeed/json
> refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/master
> refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/json
> Standardized naming of sub-feeds allows a single repo to host multiple
> feeds. For example, github/gitlab/gerrit bridge could host multiple
> individual feeds for their users.
> So far there is no proposal for feed auto-discovery. One needs to
> notify kernel.org for inclusion of their feed into the main aggregated
> feed.
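The branch-naming convention above can be illustrated with a short Python sketch. The exact normalization rules (how schemes and .git suffixes are stripped) are an assumption on my part; only the resulting branch names come from the examples above.

```python
from urllib.parse import urlparse

def feed_branch(url: str, ref: str) -> str:
    """Map a feed repo URL to its branch name in the aggregated feed.

    Sketch only: drop the URL scheme and any trailing ".git", then
    prefix refs/heads/. The real normalization rules are unspecified.
    """
    parsed = urlparse(url)
    path = parsed.path
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"refs/heads/{parsed.netloc}{path}/{ref}"

print(feed_branch("https://github.com/dvyukov/kfeed.git", "master"))
# refs/heads/github.com/dvyukov/kfeed/master
print(feed_branch("git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed.git", "json"))
# refs/heads/git.kernel.org/pub/scm/linux/kernel/git/torvalds/kfeed/json
```

Because the host and path are preserved in the branch name, a single aggregated repo can carry any number of sub-feeds without collisions.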
> 
> Konstantin offered that kernel.org can send emails for some feeds.
> That is, normally one sends out an email and then commits it to the
> feed. Instead some systems can just commit the message to feed and
> then kernel.org will pull the feed and send emails on user's behalf.
> This allows clients to not deal with email at all (including mail
> client setup), which is nice.
> 
> Eventually git-lfs (https://git-lfs.github.com) may be used to embed
> blobs right into feeds. This would allow users to fetch only the
> blob's right into feeds. This would allow users to fetch only the
> blobs they are interested in. But this does not need to happen from
> day one.
> 
> As soon as we have a bridge from plain-text emails into the structured
> form, we can start building everything else in the structured world.
> Such a bridge needs to parse new incoming emails, try to make sense out
> of them (new patch, new patch version, comment, etc.) and then push the
> information in structured form. Then e.g. CIs can fetch info about
> patches under review, test, and post structured results. Bridging in the
> opposite direction happens semi-automatically, as CI also pushes a text
> representation of results that just needs to be sent as email.
> Alternatively, we could have a separate explicit converter from
> structured messages into plain text, which would allow removing some
> duplication and presenting results in a more consistent form.
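The classification step such a bridge would have to perform might look roughly like this Python sketch. The heuristics (a subject-prefix regex and an In-Reply-To check) and the output fields are my assumptions; real classification would also need threading and patch-series detection.

```python
import email
import re

def classify(raw: bytes) -> dict:
    """Guess what an incoming email is, for the text->structured bridge.

    Heuristic sketch only: a [PATCH ...] subject prefix marks a patch,
    a reply without one is treated as a comment, everything else is
    left unclassified.
    """
    msg = email.message_from_bytes(raw)
    subject = msg.get("Subject", "")
    m = re.match(r"\[PATCH(?:\s+v(?P<ver>\d+))?[^\]]*\]", subject)
    if m:
        kind = "patch"
        version = int(m.group("ver")) if m.group("ver") else 1
    elif msg.get("In-Reply-To"):
        kind, version = "comment", None
    else:
        kind, version = "other", None
    return {"message-id": msg.get("Message-ID"),
            "kind": kind, "version": version}

print(classify(b"Subject: [PATCH v3 1/2] mm: fix leak\nMessage-ID: <a@b>\n\nbody"))
```

Whatever the bridge infers here is exactly what would be pushed into the "j" branch, so mis-classification (the 'quirks' problem raised above) would need either a correction mechanism or follow-on commits.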
> 
> Similarly, it should be much simpler for Patchwork/Gerrit to present
> current patches under review. Local mode should work almost seamlessly
> -- you fetch the aggregated feed and then run local instance on top of
> it.
> 
> No work has been done on the actual form/schema of the structured
> feeds. That's something we need to figure out working on a prototype.
> However, good references would be git-appraise schema:
> https://github.com/google/git-appraise/tree/master/schema
> and the Gerrit schema (not sure what's a good link). Does anybody know
> where the GitLab schema is? Or other similar schemas?
> 
> Thoughts and comments are welcome.
> Thanks
> -- 
> _______________________________________________
> automated-testing mailing list
> automated-testing at yoctoproject.org
> https://lists.yoctoproject.org/listinfo/automated-testing


