Last week, I landed concurrent downloads in hashin. The example was that you do something like...


$ time hashin -r some/requirements.txt --update-all

...and the whole thing takes ~2 seconds even though it that some/requirements.txt file might contain 50 different packages, and thus 50 different PyPI.org lookups.

Just wanted to point out, this is not unique to use with --update-all. It's for any list of packages. And I want to put some better numbers on that so here goes...

Suppose you want to create a requirements file for every package in the current virtualenv you might do it like this:


# the -e filtering removes locally installed packages from git URLs
$ pip freeze | grep -v '-e ' | xargs hashin -r /tmp/reqs.txt

Before running that I injected a little timer on each pypi.org download. It looked like this:


def get_package_data(package, verbose=False):
    url = "https://pypi.org/pypi/%s/json" % package
    if verbose:
        print(url)
+   t0 = time.time()
    content = json.loads(_download(url))
    if "releases" not in content:
        raise PackageError("package JSON is not sane")
+   t1 = time.time()
+   print(t1 - t0)

I also put a print around the call to pre_download_packages(lookup_memory, specs, verbose=verbose) to see what the "total time" was.

The output looked like this:

▶ pip freeze | grep -v '-e ' | xargs python hashin.py -r /tmp/reqs.txt
0.22896194458007812
0.2900810241699219
0.2814369201660156
0.22658205032348633
0.24882292747497559
0.268247127532959
0.29332590103149414
0.23981380462646484
0.2930259704589844
0.29442572593688965
0.25312376022338867
0.34232664108276367
0.49491214752197266
0.23823285102844238
0.3221290111541748
0.28302812576293945
0.567702054977417
0.3089122772216797
0.5273139476776123
0.31477880477905273
0.6202089786529541
0.28571176528930664
0.24558186531066895
0.5810830593109131
0.5219211578369141
0.23252081871032715
0.4650228023529053
0.6127192974090576
0.6000659465789795
0.30976200103759766
0.44440698623657227
0.3135409355163574
0.638585090637207
0.297544002532959
0.6462509632110596
0.45389699935913086
0.34597206115722656
0.3462028503417969
0.6250648498535156
0.44159507751464844
0.5733060836791992
0.6739277839660645
0.6560370922088623
SUM TOTAL TOOK 0.8481268882751465

If you sum up all the individual times it would have become 17.3 seconds. It's 43 individual packages and 8 CPUs multiplied by 5 means it had to wait with some before downloading the rest.

Clearly, this works nicely.

Comments

Your email will never ever be published.

Related posts