Pip, Pis, Pandas and Wheels

Posted on 07 May 2018 in Technology

A user attempting to install Baby Buddy submitted an interesting issue with the following error during the pipenv install process:

THESE PACKAGES DO NOT MATCH THE HASHES FROM Pipfile.lock!. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    docopt==0.6.2 from https://www.piwheels.org/simple/docopt/docopt-0.6.2-py2.py3-none-any.whl#sha256=0340515c74203895f92f87702896e45424bf51dc71bf15b4748450f50be04346 (from -r /tmp/pipenv-vf5_eub9-requirements/pipenv-k7_dvsro-requirement.txt (line 1)):
        Expected sha256 49b3a825280bd66b3aa83585ef59c4a8c82f2c8a522dbe754a8bc8d08c85c491
             Got        0340515c74203895f92f87702896e45424bf51dc71bf15b4748450f50be04346

Hash checking and Pipfile.lock are part of the pipenv toolchain and are meant to verify the integrity of packages being installed. Committing the lock file is the recommended practice and generally something I have not had many problems with. There are some old tickets on GitHub reporting issues with this hashing between operating systems, but the latest versions of pipenv supposedly do not have these problems.
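For a rough idea of what such a check involves, here is a minimal Python sketch that compares a downloaded file's SHA-256 digest against an expected value. This is not pipenv's or pip's actual code; the function names sha256_of_file and matches_expected are made up for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large archives do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """True if the file's digest equals the expected hex digest."""
    return sha256_of_file(path) == expected_hex.lower()
```

Running matches_expected against a downloaded wheel and the "Expected sha256" value from the error would report the same mismatch that pipenv complains about.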

Why is this user getting a hash match error? I had a Pi lying around, so I decided to try replicating the issue. Many hours later, I got Baby Buddy up and running on my (second) Pi and learned a lot about the Python packaging process and how it can go wrong on ARM devices.

Kernel Panics and Segfaults

In my initial tests, I couldn't get my Pi to install pipenv at all, even though the user's report described issues after that point. The test Pi's syslog filled with errors similar to:

May  5 23:05:31 raspberrypi kernel: [18801.650989] [<8010ffd8>] (unwind_backtrace) from [<8010c240>] (show_stack+0x20/0x24)
May  5 23:05:31 raspberrypi kernel: [18801.653797] [<8010c240>] (show_stack) from [<807840a4>] (dump_stack+0xd4/0x118)
May  5 23:05:31 raspberrypi kernel: [18801.655292] [<807840a4>] (dump_stack) from [<80254a20>] (print_bad_pte+0x150/0x1b4)
May  5 23:05:31 raspberrypi kernel: [18801.658037] [<80254a20>] (print_bad_pte) from [<80256d6c>] (unmap_page_range+0x57c/0x668)
May  5 23:05:31 raspberrypi kernel: [18801.660979] [<80256d6c>] (unmap_page_range) from [<80256ea4>] (unmap_single_vma+0x4c/0x54)
May  5 23:05:31 raspberrypi kernel: [18801.664093] [<80256ea4>] (unmap_single_vma) from [<80257150>] (unmap_vmas+0x64/0x78)
May  5 23:05:31 raspberrypi kernel: [18801.667274] [<80257150>] (unmap_vmas) from [<8025d468>] (exit_mmap+0xac/0x154)
May  5 23:05:31 raspberrypi kernel: [18801.668941] [<8025d468>] (exit_mmap) from [<8011ac74>] (mmput+0x58/0x108)
May  5 23:05:31 raspberrypi kernel: [18801.670592] [<8011ac74>] (mmput) from [<80121d84>] (do_exit+0x35c/0xb9c)
May  5 23:05:31 raspberrypi kernel: [18801.672289] [<80121d84>] (do_exit) from [<8012265c>] (do_group_exit+0x4c/0xe4)
May  5 23:05:31 raspberrypi kernel: [18801.673960] [<8012265c>] (do_group_exit) from [<8012da40>] (get_signal+0x36c/0x6bc)
May  5 23:05:31 raspberrypi kernel: [18801.677190] [<8012da40>] (get_signal) from [<8010b2f4>] (do_signal+0xc4/0x3e4)
May  5 23:05:31 raspberrypi kernel: [18801.678882] [<8010b2f4>] (do_signal) from [<8010b7fc>] (do_work_pending+0xb8/0xd0)
May  5 23:05:31 raspberrypi kernel: [18801.682219] [<8010b7fc>] (do_work_pending) from [<80108094>] (slow_work_pending+0xc/0x20)
May  5 23:05:31 raspberrypi kernel: [18801.685688] BUG: Bad page map in process pip  pte:1739f75f pmd:31beb835
May  5 23:05:31 raspberrypi kernel: [18801.687473] page:ba6e525c count:1 mapcount:-1 mapping:b1ab06c9 index:0x3f1
May  5 23:05:31 raspberrypi kernel: [18801.689239] flags: 0x68(uptodate|lru|active)
May  5 23:05:31 raspberrypi kernel: [18801.690919] raw: 00000068 b1ab06c9 000003f1 fffffffe 00000001 ba6e5228 ba7194c4 00000000
May  5 23:05:31 raspberrypi kernel: [18801.694201] raw: 00000000
May  5 23:05:31 raspberrypi kernel: [18801.695918] page dumped because: bad pte
May  5 23:05:31 raspberrypi kernel: [18801.697552] addr:003f1000 vm_flags:00100073 anon_vma:b1ab06c8 mapping:  (null) index:3f1
May  5 23:05:31 raspberrypi kernel: [18801.700631] file:  (null) fault:  (null) mmap:  (null) readpage:  (null)
May  5 23:05:31 raspberrypi kernel: [18801.702198] CPU: 0 PID: 4148 Comm: pip Tainted: G    B D  C      4.14.34-v7+ #1110
May  5 23:05:31 raspberrypi kernel: [18801.705262] Hardware name: BCM2835

and

May 5 23:05:30 raspberrypi kernel: [18800.789345] Internal error: Oops: 5 [#1] SMP ARM

After trying some random things, I ran filesystem and memory tests. Everything checked out, but the Pi was consistently failing under load. I also eventually did some testing on the power supply and other aspects of the Pi, but that will have to wait for another post. I had a second Pi lying around, so I grabbed that and managed to reproduce an error similar to the reported one.

Hash Match Failures

Similar to the initial user report, the pipenv install command failed with a hash match issue. But unlike the other affected user, my failing hash was for a different package.

I eventually stumbled on this very interesting blog post from Raspberry Pi Community Manager Ben Nuttall: building a faster Python package repository for Raspberry Pi users. In a nutshell, Ben explains that PyPI uses "wheels" to prebuild Python packages for installations on specific architectures, but ARM (Raspberry Pi's architecture) is not well represented by maintainers who build these wheels. Ben improved this situation greatly by developing a service to build and distribute ARM wheels: piwheels. The piwheels service is now a part of the default Raspbian distribution, making it much faster to pip install [...] Python packages out of the box!
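The architecture a wheel targets is encoded in its filename, following the wheel naming convention (name, version, Python tag, ABI tag, platform tag). As a small sketch, the filename from the error at the top of this post can be split into those parts like so. The helper parse_wheel_filename and the WheelTags tuple are hypothetical names for illustration, not part of pip.

```python
from collections import namedtuple

WheelTags = namedtuple("WheelTags", "name version python abi platform")


def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    # Five parts normally; six when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: %r" % filename)
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelTags(name, version, python_tag, abi_tag, platform_tag)


tags = parse_wheel_filename("docopt-0.6.2-py2.py3-none-any.whl")
print(tags.platform)  # any
```

A platform tag of "any" marks a pure Python wheel; a package with compiled code carries an architecture-specific platform tag instead, which is where ARM builds come in.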

What does this have to do with hash match failures? Pipenv uses a special file, Pipfile.lock, to store hash information for all dependencies in a project, and committing this file is generally recommended as a security measure. The problem is that the hashes are generated from a specific source (most often PyPI), and in the large majority of cases other users installing the project will use that same source. Unfortunately:

A package from PyPI, locked on an AMD64 system, will have a different hash than one from piwheels, locked on an ARM system like the Pi.

I develop Baby Buddy on my AMD64 desktop, so the lock file included in the repository has AMD64 hashes generated from PyPI. Using this same lock file on a Raspberry Pi with Raspbian results in the THESE PACKAGES DO NOT MATCH THE HASHES FROM Pipfile.lock! error from pipenv.
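To make the mismatch concrete, here is a small Python sketch that looks up a digest in lock file data. The embedded lock content is a trimmed-down, hypothetical example (a "default" section mapping package names to "hashes" and "version", in the style pipenv writes), filled in with the two digests from the error message at the top of this post. The helper is_locked_hash is an illustrative name, not a pipenv function.

```python
import json

# Hypothetical, trimmed-down lock file content for illustration only.
LOCK_TEXT = """
{
    "default": {
        "docopt": {
            "hashes": [
                "sha256:49b3a825280bd66b3aa83585ef59c4a8c82f2c8a522dbe754a8bc8d08c85c491"
            ],
            "version": "==0.6.2"
        }
    }
}
"""


def is_locked_hash(lock_data, package, sha256_hex, section="default"):
    """True if `sha256_hex` is one of the hashes locked for `package`."""
    entry = lock_data.get(section, {}).get(package, {})
    return "sha256:" + sha256_hex in entry.get("hashes", [])


lock_data = json.loads(LOCK_TEXT)
# "Expected" digest (from the lock file) and "Got" digest (piwheels download).
expected_digest = "49b3a825280bd66b3aa83585ef59c4a8c82f2c8a522dbe754a8bc8d08c85c491"
got_digest = "0340515c74203895f92f87702896e45424bf51dc71bf15b4748450f50be04346"

print(is_locked_hash(lock_data, "docopt", expected_digest))  # True
print(is_locked_hash(lock_data, "docopt", got_digest))       # False
```

The file served by piwheels simply has a digest that the lock file has never seen, so the install is rejected.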

How can this be resolved? There are only really two options I could come up with:

  1. Execute pipenv lock before pipenv install on ARM devices.
  2. Include the --skip-lock flag when running the pipenv install command.

Neither option is particularly ideal, as they both compromise the security aspect of the lock file (and locking can take a long time on a Pi). But for the purposes of further troubleshooting, I chose to use --skip-lock to get past this issue. And it worked!

Pip and Pandas

Happy to have sorted out the lock issue, I executed pipenv install --three --dev --skip-lock and waited for pipenv to do its thing. Sadly, the install eventually failed with the error:

Double requirement given: numpy==1.12.1 [...] (already in numpy==1.9.3 [...], name='numpy')

Numpy is a requirement of Pandas, which Baby Buddy uses for one of its graphs. I have actually thought about refactoring it out in the past as they are both pretty heavy requirements and hardly used, but I just haven't found the time.
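The error suggests that two different pins for numpy ended up in the same install. As a rough sketch of spotting that kind of conflict in a list of requirement lines, assuming only simple name==version pins (pip's own requirement handling is far more involved, and find_conflicting_pins is a made-up helper):

```python
from collections import defaultdict


def find_conflicting_pins(requirements):
    """Return {name: sorted versions} for packages pinned more than one way."""
    pins = defaultdict(set)
    for line in requirements:
        line = line.strip()
        # Only handle simple "name==version" pins in this sketch.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()].add(version.strip())
    return {
        name: sorted(versions)
        for name, versions in pins.items()
        if len(versions) > 1
    }


print(find_conflicting_pins(["numpy==1.12.1", "numpy==1.9.3"]))
# {'numpy': ['1.12.1', '1.9.3']}
```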

Anyway, Baby Buddy pins Pandas below version 0.22.0 because a while back I had an issue with the latest version of Pandas failing to build for Python 3.4. As it turns out, there has been a fairly recent issue with new versions of pip (10.x) and Pandas: Pandas installation problems with pip version 10. Two workaround options are provided:

  1. use an older version of pip (9.x), or
  2. use pip install pandas --no-build-isolation.

Before attempting these workarounds, I tried a new install of Pandas without the version pinning. Surprisingly, this worked: Pandas and all of its requirements installed from wheels on piwheels.

For Baby Buddy, perhaps the logical way to work around this issue will be to drop support for Python 3.4. This would allow me to remove the Pandas pinning and not have to provide special instructions for ARM-based installations. Before I could think too deeply about that, I had another error on my hands...

Psycopg2 and Psycopg2-binary

Psycopg2 is a Python package that provides a PostgreSQL adapter. It is an important option to provide to potential Baby Buddy users and is also required for the project's Heroku deployment method. Building psycopg2 requires some external development libraries and other tools that are not always present in deployment environments. In order to avoid extra dependencies, Baby Buddy uses the psycopg2-binary package. The reasoning for the two packages is explained in the Psycopg 2.7.4 release post.

Piwheels does have a psycopg2-binary wheel page, but it currently does not list any wheels. Unable to find any wheels, pip attempts to download and build the package from source, which fails with the error:

    Error: pg_config executable not found.

pg_config is required to build psycopg2 from source.  Please add the directory
containing pg_config to the $PATH or specify the full executable path with the
option:

    python setup.py build_ext --pg-config /path/to/pg_config build ...

or with the pg_config option in 'setup.cfg'.

If you prefer to avoid building psycopg2 from source, please install the PyPI
'psycopg2-binary' package instead.

For further information please check the 'doc/src/install.rst' file (also at
<http://initd.org/psycopg/docs/install.html>).

Piwheels does provide wheels for psycopg2, but not for the latest version (2.7.4 as of this writing), so pipenv install psycopg2 fails with the exact same error message. Ultimately, this issue can be overcome by pinning psycopg2 to <2.7.4, allowing pip to find and (quickly!) install the wheel from piwheels. Not a long-term solution, but it moves things forward a bit here.

ImportError: No module named 'Image'

After working around the issues with Pandas and Psycopg2, the pipenv install process works. However, further down the line in the manual install process, the gulp collectstatic command spits out this error:

Traceback (most recent call last):
  File "/home/pi/.local/share/virtualenvs/public-BuvyXxxq/lib/python3.5/site-packages/easy_thumbnails/utils.py", line 11, in <module>
    from PIL import Image
  File "/home/pi/.local/share/virtualenvs/public-BuvyXxxq/lib/python3.5/site-packages/PIL/Image.py", line 60, in <module>
    from . import _imaging as core
ImportError: libopenjp2.so.7: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  [...]
  File "/home/pi/.local/share/virtualenvs/public-BuvyXxxq/lib/python3.5/site-packages/easy_thumbnails/utils.py", line 13, in <module>
    import Image
ImportError: No module named 'Image'

Blargh. Luckily, this message is pretty verbose and has an easy fix. The libopenjp2.so.7 library is not included in Raspbian but is (apparently?) included in other distributions, as I have never seen this error before. This is overcome with one additional command during the install process:

sudo apt-get install libopenjp2-7-dev

After installing this library, everything else goes smoothly and Baby Buddy runs on the Pi! Hooray!
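To check for this kind of missing library up front rather than waiting for an ImportError, a minimal Python sketch is to ask the dynamic loader to open it via ctypes. The helper name can_load_shared_library is hypothetical, and this only tells you whether the loader can find the file, not whether Pillow will be happy with it.

```python
import ctypes


def can_load_shared_library(soname):
    """True if the dynamic loader can open the shared library `soname`."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        # Raised when the library cannot be found or opened.
        return False
    return True


print(can_load_shared_library("libopenjp2.so.7"))
```

On a system without the library this should print False, and True once the package above has been installed.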

Conclusions

Packaging is hard. I am often concerned about the fragility of the ecosystem around web development (or my little corner of it, at least). Baby Buddy is a fairly simple web app - it uses Django on the backend and Bootstrap on the frontend. It is not an SPA. It is not using React or Vue or any other complex libraries. But it still relies on a lot of external moving parts and a somewhat complicated build process. Heroku and Docker help to ease this pain, but also add time and complexities of their own.

But I still love it. And I love the community that surrounds it.

It never occurred to me to run Baby Buddy on a Pi. But one user tried it and the collective knowledge of other developers, users and community members (of various communities!) helped us sort it out (temporarily, anyway).

I'm not quite sure yet how I will address this use case for Baby Buddy (suggestions welcome), but I had a damn fun time troubleshooting it! Next up, what the hell is wrong with the first Pi I tested this on...