[yocto] [layerindex-web][PATCH 4/5] update_layer.py: use it as a module

Paul Eggleton paul.eggleton at linux.intel.com
Sun Feb 4 15:27:45 PST 2018


Hi Robert,

On Wednesday, 3 January 2018 6:42:25 PM NZDT Robert Yang wrote:
> It had been split to a separate tool and use subprocess to run it to avoid
> tinfoil issues (e.g., can't be shutdown correctly in old version), but the
> utils.setup_django() cost a lot of time which made it a little slow, use
> mulitprocessing.Process() to run it as function can avoid both issues (tinfoil
> and slow issues).
> 
> And remove bitbake modules when swith branch since it is may not be compatible
> between branches.
> 
> This can save about 5 minutes in my testing when "update.py -b <branch>",
> with parallel fetch and this, we can reduce the time from more than 9mins to
> less than 3 mins.
> 
> Signed-off-by: Robert Yang <liezhi.yang at windriver.com>
> ---
>  layerindex/update.py       | 115 +++++++++++++++++++--------------------------
>  layerindex/update_layer.py | 103 +++++++++-------------------------------
>  2 files changed, 70 insertions(+), 148 deletions(-)
> 
> diff --git a/layerindex/update.py b/layerindex/update.py
> index d6488f0..e13c9ab 100755
> --- a/layerindex/update.py
> +++ b/layerindex/update.py
> @@ -21,6 +21,9 @@ import utils
>  import operator
>  import re
>  import multiprocessing
> +import update_layer
> +import copy
> +import imp
>  
>  import warnings
>  warnings.filterwarnings("ignore", category=DeprecationWarning)
> @@ -34,61 +37,15 @@ except ImportError:
>      logger.error("Please install PythonGit 0.3.1 or later in order to use this script")
>      sys.exit(1)
>  
> -
> -def reenable_sigint():
> -    signal.signal(signal.SIGINT, signal.SIG_DFL)
> -
> -def run_command_interruptible(cmd):
> -    """
> -    Run a command with output displayed on the console, but ensure any Ctrl+C is
> -    processed only by the child process.
> -    """
> -    signal.signal(signal.SIGINT, signal.SIG_IGN)
> +def call_update_layer(options, queue):
> +    ret = 1
> +    output = ""
>      try:
> -        process = subprocess.Popen(
> -            cmd, cwd=os.path.dirname(sys.argv[0]), shell=True, preexec_fn=reenable_sigint, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
> -        )
> -
> -        reader = codecs.getreader('utf-8')(process.stdout, errors='surrogateescape')
> -        buf = ''
> -        while True:
> -            out = reader.read(1, 1)
> -            if out:
> -                sys.stdout.write(out)
> -                sys.stdout.flush()
> -                buf += out
> -            elif out == '' and process.poll() != None:
> -                break
> -
> +        ret, output = update_layer.main(options, queue)
> +    except Exception:
> +        raise
>      finally:
> -        signal.signal(signal.SIGINT, signal.SIG_DFL)
> -    return process.returncode, buf
> -
> -
> -def prepare_update_layer_command(options, branch, layer, initial=False):
> -    """Prepare the update_layer.py command line"""
> -    if branch.update_environment:
> -        cmdprefix = branch.update_environment.get_command()
> -    else:
> -        cmdprefix = 'python3'
> -    cmd = '%s update_layer.py -l %s -b %s' % (cmdprefix, layer.name, branch.name)
> -    if options.reload:
> -        cmd += ' --reload'
> -    if options.fullreload:
> -        cmd += ' --fullreload'
> -    if options.nocheckout:
> -        cmd += ' --nocheckout'
> -    if options.dryrun:
> -        cmd += ' -n'
> -    if initial:
> -        cmd += ' -i'
> -    if options.loglevel == logging.DEBUG:
> -        cmd += ' -d'
> -    elif options.loglevel == logging.ERROR:
> -        cmd += ' -q'
> -    if options.keep_temp:
> -        cmd += ' --keep-temp'
> -    return cmd
> +        queue.put([ret, output])
>  
>  def update_actual_branch(layerquery, fetchdir, branch, options, update_bitbake, bitbakepath):
>      """Update actual branch for layers and bitbake in database"""
> @@ -266,6 +223,7 @@ def main():
>      try:
>          lockfn = os.path.join(fetchdir, "layerindex.lock")
>          lockfile = utils.lock_file(lockfn)
> +        update_log_path = update_layer.update_log_path
>          if not lockfile:
>              logger.error("Layer index lock timeout expired")
>              sys.exit(1)
> @@ -315,8 +273,15 @@ def main():
>              # We now do this by calling out to a separate script; doing otherwise turned out to be
>              # unreliable due to leaking memory (we're using bitbake internals in a manner in which
>              # they never get used during normal operation).
> +            update_layer_options = copy.copy(options)
>              last_rev = {}
>              for branch in branches:
> +                # The bitbake modules are not compatible between branches, so
> +                # remove them, they will be imported again when needed.
> +                for name, module in sys.modules.copy().items():
> +                    if '__file__' in module.__dict__ and module.__file__.startswith(bitbakepath):
> +                        logger.debug('Removing module %s' % name)
> +                        del sys.modules[name]

This part worried me, so I did some quick searching and came across this:

  http://justus.science/blog/2015/04/19/sys.modules-is-dangerous.html

Relying upon removing old modules from memory just seems fraught with
danger to me. I'd really rather not go back to debugging the contamination
issues we dealt with earlier for the sake of saving a few minutes.

Another problem with this is it breaks supporting python 2 and python 3 in the
same parse - in the OE layer index we actually still enable parsing back to fido
(1.8). I honestly don't know how many updates those branches actually see,
and we should probably disable parsing for fido and perhaps jethro, but I suspect
at least krogoth (2.1) still gets some usage. Until we disable parsing of all of 
the python 2 based branches we need the ability to handle both versions and
that requires update_layer.py to be run separately.

I do agree in general that performance of the update process could be
improved - one particularly poor area is where we collect the variable values 
for each layer, that part is very slow and we're doing it regardless of whether
the layer has changed or not which seems wasteful. That would be my
recommendation for where to look next (assuming you have time).

Cheers,
Paul

-- 

Paul Eggleton
Intel Open Source Technology Centre



More information about the yocto mailing list