[yocto] [PATCH 4/4] update_layer.py: move layer validation to update.py (Performance improve)

Robert Yang liezhi.yang at windriver.com
Sun Apr 22 20:06:43 PDT 2018


Hi Paul,

Thanks for the reply, please see my comments inline.

On 04/23/2018 09:55 AM, Paul Eggleton wrote:
> Hi Robert
> 
> I like the performance improvements, thanks. A few comments below though.
> 
> On Wednesday, 18 April 2018 11:04:33 PM NZST Robert Yang wrote:
>> The utils.setup_django() costs a lot of time, but both update.py and
>> update_layer.py calls it, so move layer validation from update_layer.py to
>> update.py to avoid calling update_layer.py when possible can save a lot of
>> time.
>>
>> Now we don't have to call update_layer.py in the following cases:
>> * The branch doesn't exist
>> * The layer is already update to date on specified branch (when no
>>    reload)
>> * The layer dir or conf/layer.layer doesn't exist
>>
>> We can save up to 98% time in my testing:
>>
>> $ update.py -b master --nofetch [--fullreload]
>>
>>                     Before    Now       Reduced
>> No update:         276s      3.6s      98%
>> Partial update:    312s      87s       72%
>> Full repload:      1016s     980s      3%
>>
>> Note:
>> * All of the testing are based on --nofetch
>>
>> * "No update" means all layers on the branch is up-to-date, for
>>    example, when we run it twice, there is no update in the second run, so we
>>    only need about 3s now, which is the most common case when we use cron to run
>>    it per half an hour.
>>
>> * "Partly update" means part of the layers have been updated.
>>
>> * "Fullreload" means all of the layers have been updated.
>>
>> Signed-off-by: Robert Yang <liezhi.yang at windriver.com>
>> ---
>>   layerindex/update.py       | 79 +++++++++++++++++++++++++++++++++++++++++++++-
>>   layerindex/update_layer.py | 39 +++--------------------
>>   2 files changed, 82 insertions(+), 36 deletions(-)
>>
>> diff --git a/layerindex/update.py b/layerindex/update.py
>> index 44a7b90..2bb2df7 100755
>> --- a/layerindex/update.py
>> +++ b/layerindex/update.py
>> @@ -140,6 +140,15 @@ def fetch_repo(vcs_url, repodir, urldir, fetchdir, layer_name):
>>           logger.error("Fetch of layer %s failed: %s" % (layer_name, e.output))
>>           return (vcs_url, e.output)
>>   
>> +def print_subdir_error(newbranch, layername, vcs_subdir, branchdesc):
>> +    # This will error out if the directory is completely invalid or had never existed at this point
>> +    # If it previously existed but has since been deleted, you will get the revision where it was
>> +    # deleted - so we need to handle that case separately later
>> +    if newbranch:
>> +        logger.info("Skipping update of layer %s for branch %s - subdirectory %s does not exist on this branch" % (layername, branchdesc, vcs_subdir))
>> +    elif vcs_subdir:
>> +        logger.error("Subdirectory for layer %s does not exist on branch %s - if this is legitimate, the layer branch record should be deleted" % (layername, branchdesc))
>> +
>>   def main():
>>       if LooseVersion(git.__version__) < '0.3.1':
>>           logger.error("Version of GitPython is too old, please install GitPython (python-git) 0.3.1 or later in order to use this script")
>> @@ -195,7 +204,7 @@ def main():
>>   
>>       utils.setup_django()
>>       import settings
>> -    from layerindex.models import Branch, LayerItem, Update, LayerUpdate
>> +    from layerindex.models import Branch, LayerItem, Update, LayerUpdate, LayerBranch
>>   
>>       logger.setLevel(options.loglevel)
>>   
>> @@ -337,6 +346,74 @@ def main():
>>                           collections.add((layerbranch.collection, layerbranch.version))
>>   
>>                   for layer in layerquery:
>> +                    layerbranch = layer.get_layerbranch(branch)
>> +                    branchname = branch
>> +                    branchdesc = branch
>> +                    newbranch = False
>> +                    branchobj = utils.get_branch(branch)
>> +                    if layerbranch:
>> +                        if layerbranch.actual_branch:
>> +                            branchname = layerbranch.actual_branch
>> +                            branchdesc = "%s (%s)" % (branch, branchname)
>> +                    else:
>> +                        # LayerBranch doesn't exist for this branch, create it
>> +                        newbranch = True
>> +                        layerbranch = LayerBranch()
>> +                        layerbranch.layer = layer
>> +                        layerbranch.branch = branchobj
>> +                        layerbranch_source = layer.get_layerbranch(branchobj)
>> +                        if not layerbranch_source:
>> +                            layerbranch_source = layer.get_layerbranch(None)
>> +                        if layerbranch_source:
>> +                            layerbranch.vcs_subdir = layerbranch_source.vcs_subdir
>> +
> 
> It seems that you are missing a call somewhere to layerbranch.save() later
> on, guarded with "if newbranch:". This was easy to miss though when copying
> over from update_layer.py seeing as we were relying upon saving it later on
> there (and that is still needed).

We don't need it to save the new branch here, we just want to know whether it is
a new branch or not, the logic is:

if newbranch and "the branch doesn't exist" in the repo:
     continue
else: # "the branch exits in the repo"
     update_layer.py

So we don't have to save the new branch here, the update_layer.py will save the
new branch when needed, this can reducing a lot of calls to update_layer, which
can save a lot of time.

> 
>> +                    if layerbranch.vcs_last_rev == topcommit.hexsha and not update.reload:
>> +                        logger.info("Layer %s is already up-to-date for branch %s" % (layer.name, branchdesc))
>> +                        collections.add((layerbranch.collection, layerbranch.version))
>> +                        continue
>> +
>> +                    if layerbranch.vcs_last_rev != topcommit.hexsha or update.reload:
>> +                        # Check out appropriate branch
>> +                        if not options.nocheckout:
>> +                            utils.checkout_layer_branch(layerbranch, repodir, logger=logger)
> 
> Isn't "if layerbranch.vcs_last_rev != topcommit.hexsha or update.reload:"
> superfluous here? We're skipping to the next layer if that wasn't true via
> "continue" above, so it should always evaluate to true.

Let's ignore the update.reload here, then the logic will be clear:

if layerbranch.vcs_last_rev == topcommit.hexsha:
     continue
else:
     checkout_layer_branch()

So we can't ignore the else block here, the "if" block doesn't cover the "else"
block. And If update.reload, then we can't continue, so the first "if" is:
"if layerbranch.vcs_last_rev == topcommit.hexsha and not update.reload"

And for the second "if", we need checkout the branch when:
"layerbranch.vcs_last_rev != topcommit.hexsha or update.reload"

And check whether there is a conf/layer.layer in the repo, we have a lot of
layers which are only for specific releases, for example, a layer is for morty
branch only, then we cleanup its master branch, just leave a README
(no conf/layer.conf) file on master branch which says this layer is for morty
only, check conf/layer.conf in update.py rather than in update_layer.py can
save a lot of time, too.

> 
> Additionally this results in the checkout being done twice - once here and
> a second time in the update_layer.py script. I guess it's not taking much time
> the second time though so we could probably ignore it.

Yes, you're right, the second checkout will be much faster, and it's worth to do
it in update.py since calling update_layer.py is very slow.

The update_layer.py is slow is because setup_django() is slow, and we call
setup_django() in both update.py and update_layer.py, so avoid calling
update_layer.py when possible can save a lot of time.

// Robert

> 
> Cheers,
> Paul
> 



More information about the yocto mailing list