Django Uploads and UnicodeEncodeError
#####################################

:date: 2011-06-04 20:00
:tags: Django, Python, Linux, Unicode
:slug: django-uploads-and-unicodeencodeerror
:author: Brian Neal

Something strange happened that I wish to document in case it helps others. I
had to reboot my Ubuntu server while troubleshooting a disk problem. After the
reboot, I began receiving internal server errors whenever someone tried to view
a certain forum thread on my Django_ powered website. After some detective work,
I determined the cause: a user who had posted in the thread had an avatar
image whose filename contained non-ASCII characters. The image file had been
there for months, and I still cannot explain why the errors suddenly started
after the reboot.

The traceback I was getting ended with something like this:

.. sourcecode:: python

   File "/django/core/files/storage.py", line 159, in _open
     return File(open(self.path(name), mode))

   UnicodeEncodeError: 'ascii' codec can't encode characters in position 72-79: ordinal not in range(128)

So it appeared that the ``open()`` call was triggering the error. This led me on
a twisty Google search with many dead ends. Eventually I found a suitable
explanation. Apparently, Linux filesystems don't enforce a particular
encoding for filenames; Linux applications must decide for themselves how to
interpret them. Python (on Linux) uses the locale environment variables to
choose the encoding for filenames. If these environment variables are not set,
Python falls back to ASCII by default, which was the source of my
``UnicodeEncodeError``.
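
To see what a given Python process has decided, you can ask it directly. The
following is only a minimal sketch: the helper names and the sample filename are
made up for illustration, and newer Python versions may report UTF-8 even when
the locale variables are unset.

.. sourcecode:: python

   import locale
   import sys


   def filename_encoding_report():
       """Return the encodings this process derived from its locale."""
       return {
           "filesystem": sys.getfilesystemencoding(),
           "preferred": locale.getpreferredencoding(False),
       }


   def can_encode(filename, encoding):
       """Return True if ``filename`` can be represented in ``encoding``."""
       try:
           filename.encode(encoding)
       except UnicodeEncodeError:
           return False
       return True


   if __name__ == "__main__":
       print(filename_encoding_report())
       # Hypothetical avatar filename containing non-ASCII characters.
       name = u"avatar-\u00e9\u00e8.png"
       print(can_encode(name, "ascii"))
       print(can_encode(name, "utf-8"))

If the report shows an ASCII-flavored encoding inside the web server process,
any filename for which ``can_encode(name, "ascii")`` is false is a candidate
for this error.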

So how do you tell a Python instance that is running under Apache / ``mod_wsgi``
about these environment variables? It turns out the answer is in the `Django
documentation`_, albeit in the ``mod_python`` integration section.

So, to fix the issue, I added the following lines to my ``/etc/apache2/envvars``
file:

.. sourcecode:: bash

   export LANG='en_US.UTF-8'
   export LC_ALL='en_US.UTF-8'

Note that you must fully stop and restart Apache for these changes to take
effect. I got tripped up at first because I ran ``apache2ctl graceful``,
which was not sufficient to create a new environment.
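
One way to confirm that the new environment actually reached Python is to
check the locale variables from inside the running process, for example at the
top of the WSGI script. This is a rough sketch, not part of Django or
``mod_wsgi``; the function names are hypothetical, and it only inspects the
variables rather than the locale Python ends up using.

.. sourcecode:: python

   import os


   def locale_codeset(environ=None):
       """Return the codeset named by the locale variables, or None."""
       environ = os.environ if environ is None else environ
       # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
       for var in ("LC_ALL", "LC_CTYPE", "LANG"):
           value = environ.get(var)
           if value:
               # Keep the part between "." and an optional "@modifier",
               # e.g. "en_US.UTF-8@euro" yields "UTF-8".
               codeset = value.partition(".")[2].partition("@")[0]
               return codeset or None
       return None


   def locale_is_utf8(environ=None):
       """Return True if the effective locale variable names UTF-8."""
       codeset = locale_codeset(environ)
       return codeset is not None and codeset.lower().replace("-", "") == "utf8"

Calling ``locale_is_utf8()`` with no arguments inspects the real process
environment; passing a dictionary makes the check easy to try out by hand.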

.. _Django: http://djangoproject.com
.. _Django documentation: https://docs.djangoproject.com/en/1.3/howto/deployment/modpython/#if-you-get-a-unicodeencodeerror