[yocto] What's this

Wed Jun 8 04:30:36 PDT 2016

On Wed, 2016-06-08 at 20:59 +1200, Paul Eggleton wrote:
> On Tue, 07 Jun 2016 23:20:26 Richard Purdie wrote:
> > On Wed, 2016-06-08 at 09:24 +1200, Paul Eggleton wrote:
> > > On Tue, 07 Jun 2016 17:20:12 Burton, Ross wrote:
> > > > On 7 June 2016 at 17:02, Burton, Ross <ross.burton at intel.com>
> > > > wrote:
> > > > > It means the hash calculated by the bitbake master was different
> > > > > to the hash calculated when the worker started up. This usually
> > > > > means that you're using something like ${TIME} in the recipe but
> > > > > not marking it appropriately so the cache ignores it. Do you have
> > > > > a base-files bbappend that writes a timestamp?
> > > > 
> > > > The always wise Joshua reminds me that if your DISTRO_VERSION
> > > > contains ${DATETIME} then this happens. If you're doing this then
> > > > you'll want to set [vardepsexclude] on DISTRO_VERSION to stop the
> > > > DATETIME from getting into the cache (or not put the current
> > > > date/time into the distro version).
> > > 
> > > I think we need to handle this situation better: if it's really
> > > worth producing an error about, then it's worth producing an error
> > > message that people can actually understand, particularly as it's a
> > > recently added validation.
> > 
> > It was silently running into problems due to this all along but not
> > reporting it. It's now reporting it, which is better than things
> > silently behaving strangely.
> 
> What are the actual consequences in a situation like this where we 
> have something like ${DATETIME} in another variable referenced by the
> task?

A mismatch means the cached internal values bitbake is using on the
cooker/server side don't match what the worker is actually doing. It's
hard to say for sure exactly what impact that would have.

If, for example, it's in a do_build task, some invalid data may be
written out to the do_build sigdata file and confuse later build
analysis based on those files. You can usually tell such a file by the
fact that it's named with one hash, but when you compute the value from
the entries in the file, you get a different hash.
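That description suggests a simple consistency check: recompute the hash from the recorded entries and compare it with the hash embedded in the file name. The sketch below is illustrative only. It assumes a hypothetical naming scheme ending in `.sigdata.<hash>` and a hypothetical hashing of the entries as canonical JSON; BitBake's real sigdata format and hash computation differ.

```python
import hashlib
import json


def hash_entries(entries):
    """Hypothetical: hash a canonical (sorted-key) JSON encoding of the entries."""
    blob = json.dumps(entries, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


def name_matches_contents(filename, entries):
    """True if the hash after the last '.' in the (hypothetical) name matches."""
    recorded = filename.rsplit(".", 1)[-1]
    return recorded == hash_entries(entries)


# Hypothetical entries; a real sigdata file records far more than this.
entries = {"values": {"DISTRO_VERSION": "1.0+${DATETIME}"}, "deps": ["DISTRO_VERSION"]}
good_name = "base-files.do_build.sigdata." + hash_entries(entries)
print(name_matches_contents(good_name, entries))

# Contents written from different data than the hash was computed from.
corrupt = dict(entries, values={"DISTRO_VERSION": "1.0+20160608"})
print(name_matches_contents(good_name, corrupt))
```

A file that fails this kind of check was named from one set of values but written from another, which is exactly the symptom of a parse-time/run-time mismatch.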

If it's in a more important task like do_install, it could mean
something doesn't rerun when it should, or vice versa, and the effect
of a corrupt siginfo/sigdata file is more significant.

Now that these cases produce an error, we get to know about mismatches,
and we went and fixed the majority of their causes. I believe those
mismatches were partly to blame for diffsigs not working as expected
in some cases.
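bitbake-diffsigs compares two signature files and reports what changed between them. As a much-simplified sketch (not its real implementation, which handles much more, such as changes in dependent task hashes), comparing the recorded variable values of two signatures might look like this. The flat name-to-value dict is an assumed format. If one of the files was written with values that don't correspond to its hash, a comparison like this can point at the wrong cause.

```python
def diff_signatures(old, new):
    """List variables added, removed or changed between two signature dicts."""
    changes = []
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            changes.append(f"{name} removed")
        elif name not in old:
            changes.append(f"{name} added")
        elif old[name] != new[name]:
            changes.append(f"{name} changed: {old[name]!r} -> {new[name]!r}")
    return changes


old = {"DATETIME": "20160608043036", "PN": "base-files"}
new = {"DATETIME": "20160608043512", "PN": "base-files", "EXTRA": "1"}
for line in diff_signatures(old, new):
    print(line)
```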

> I haven't looked at all of the code involved (and that's probably the
> root of my problem) but I don't quite understand how this is coming
> about. DATE and TIME are supposed to be determined at the start of
> the build and explicitly passed to the worker, so they don't change
> during the build. What am I missing?

Bitbake computes the hashes at parse time and caches them. At run time,
it then builds a new data store in the worker. These errors come about
when the values in the cache from parsing don't match the runtime
execution environment.
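A simplified Python model of that check may help. It assumes, as a simplification rather than BitBake's actual code, that a signature is a hash over the unexpanded value of a variable plus the values of everything it references. DATETIME differs between parse time and worker start, so the hashes disagree. The `excludes` mapping is a rough analogue of the real metadata setting Ross describes, `DISTRO_VERSION[vardepsexclude] = "DATETIME"`; all function names here are hypothetical.

```python
import hashlib
import re

REF = re.compile(r"\$\{(\w+)\}")


class HashMismatch(Exception):
    """Raised when the worker's hash differs from the parse-time (cached) hash."""


def dependencies(store, var, excludes):
    """Follow ${NAME} references transitively, skipping per-variable excludes."""
    seen, stack = set(), [var]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        for ref in REF.findall(store.get(name, "")):
            if ref not in excludes.get(name, set()):
                stack.append(ref)
    return seen


def signature(store, var, excludes=None):
    """Hash the unexpanded value of var and of every variable it depends on."""
    excludes = excludes or {}
    h = hashlib.sha256()
    for name in sorted(dependencies(store, var, excludes)):
        h.update(f"{name}={store.get(name, '')}\n".encode())
    return h.hexdigest()


def run_in_worker(cached, store, var, excludes=None):
    """Recompute the signature from the worker's datastore and compare."""
    actual = signature(store, var, excludes)
    if actual != cached:
        raise HashMismatch(f"{var}: cached {cached[:12]}, worker {actual[:12]}")
    return actual


if __name__ == "__main__":
    # Parse time and worker start-up see different DATETIME values.
    parsed = {"DISTRO_VERSION": "1.0+${DATETIME}", "DATETIME": "20160608043036"}
    worker = dict(parsed, DATETIME="20160608043512")

    try:
        run_in_worker(signature(parsed, "DISTRO_VERSION"), worker, "DISTRO_VERSION")
    except HashMismatch as err:
        print("mismatch:", err)

    # Rough analogue of DISTRO_VERSION[vardepsexclude] = "DATETIME".
    excludes = {"DISTRO_VERSION": {"DATETIME"}}
    cached = signature(parsed, "DISTRO_VERSION", excludes)
    run_in_worker(cached, worker, "DISTRO_VERSION", excludes)
    print("no mismatch with DATETIME excluded")
```

The trade-off of excluding DATETIME is that the date no longer influences the hash at all, so a new date alone will not cause the task to rerun.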

> > It's very hard for bitbake to know why the hashes differ. It only
> > knows the values afterwards and hence that they've changed; the
> > information about how that hash was constructed is not present in any
> > of bitbake's caches. That implies that to have better messages, we
> > need to write out more data.
> > 
> > I did add a patch to make bitbake write out data to allow
> > reconstruction of basehash (which is part of taskhash). Sadly the
> > parsing performance was diabolical (10 times slower). I think that
> > could perhaps be improved if the files don't require an atomic move
> > during creation but I haven't had time to look further at it.
> > 
> > So whilst I do agree, what is the price people are willing to pay to
> > have those better messages?
> 
> Clearly a 10x slowdown is unacceptable; hopefully we can find another
> way of dealing with this. I guess if we're able to do nothing else, a
> brief explanation of what to look for (i.e. variables that might
> change with time) in the error message would be helpful, but I hope we
> can do better than that.
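As a rough sketch of the kind of hint Paul suggests, an error path could scan the variables for references to DATE, TIME or DATETIME and name them. Everything here is hypothetical: the `store` dict stands in for the datastore, the set of time-varying variables is an assumption, only direct references are checked (not ones reached through other variables), and the message wording is not BitBake's.

```python
import re

REF = re.compile(r"\$\{(\w+)\}")
TIME_VARS = {"DATE", "TIME", "DATETIME"}  # assumed list of time-varying variables


def time_dependent_vars(store):
    """Names of variables whose value directly references a time-varying variable."""
    return sorted(
        name
        for name, value in store.items()
        if name not in TIME_VARS and TIME_VARS & set(REF.findall(value))
    )


def mismatch_message(task, store):
    """Hypothetical error text that adds a hint about likely culprits."""
    msg = f"Hash mismatch for {task} between parse time and execution."
    suspects = time_dependent_vars(store)
    if suspects:
        msg += (
            " These variables reference DATE/TIME/DATETIME and may need"
            " [vardepsexclude]: " + ", ".join(suspects)
        )
    return msg


store = {
    "DISTRO_VERSION": "1.0+${DATETIME}",
    "DATETIME": "${DATE}${TIME}",
    "PN": "base-files",
}
print(mismatch_message("base-files:do_install", store))
```

Scanning values this way costs nothing during normal parsing, since it only runs once a mismatch has already been detected.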

So do I; it's a matter of finding the time on my part to look into it.
As mentioned above, I believe this problem was making diffsigs
unreliable and was having other unknown impact, such as builds
rebuilding when they shouldn't or vice versa. I believe those are
important to fix. We need better sstate hash debugging/telemetry, but
exactly how to do this without performance impact remains to be seen.
I'll continue to think about it (along with too many other issues) but
am open to ideas.
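For context on the "atomic move during creation" mentioned earlier in the thread, this sketch shows what atomic file creation typically involves compared with a plain write. It is generic Python, not BitBake's code. That the extra fsync and rename per file account for the parse-time slowdown is an assumption, not a measurement.

```python
import os
import tempfile


def write_atomically(path, data):
    """Create path so readers never see a partial file: temp file, fsync, rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # force the data to disk before the rename
        os.replace(tmp, path)  # atomic on POSIX when both are on one filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def write_directly(path, data):
    """Plain write: cheaper, but a concurrent reader may observe a partial file."""
    with open(path, "wb") as f:
        f.write(data)
```

If many such files are written per parse, the fsync in particular is the kind of per-file cost that could add up; dropping atomicity trades it for the risk of readers seeing incomplete files.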

Cheers,

Richard