Django Uploads and UnicodeEncodeError
#####################################

:date: 2011-06-04 20:00
:tags: Django, Python, Linux, Unicode
:slug: django-uploads-and-unicodeencodeerror
:author: Brian Neal

Something strange happened that I want to document in case it helps others.  I
had to reboot my Ubuntu server while troubleshooting a disk problem. After the
reboot, I began receiving internal server errors whenever someone tried to view
a certain forum thread on my Django_-powered website. After some detective work,
I determined that a user who had posted in the thread had an avatar image whose
filename contained non-ASCII characters. The image file had been there for
months, and I still cannot explain why the errors started only after the
reboot.

The traceback I was getting ended with something like this:

.. sourcecode:: python

   File "/django/core/files/storage.py", line 159, in _open
     return File(open(self.path(name), mode))

   UnicodeEncodeError: 'ascii' codec can't encode characters in position 72-79: ordinal not in range(128)

So it appeared that the ``open()`` call was triggering the error. This led me on
a twisty Google search with many dead ends. Eventually I found a suitable
explanation. Linux filesystems don't enforce any particular encoding for
filenames; to the filesystem they are just bytes, and each application must
decide how to interpret them on its own. On Linux, Python uses the locale
environment variables (such as ``LANG`` and ``LC_ALL``) to choose the encoding
for filenames.  If these variables are not set, Python falls back to ASCII,
which was the source of my ``UnicodeEncodeError``.
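
To see which encoding a given Python process has picked up, something like the
following rough sketch can help. The helper names and the sample filename are
made up for illustration, and newer Python 3 releases treat an unset locale
differently from the Python 2 I was running, so the reported encodings will
vary from system to system.

.. sourcecode:: python

   import locale
   import sys


   def filename_encodings():
       """Report the encodings Python derived from the process environment."""
       return {
           "filesystem": sys.getfilesystemencoding(),
           "locale": locale.getpreferredencoding(False),
       }


   def encodable(filename, encoding):
       """Return True if the filename can be encoded with the given codec."""
       try:
           filename.encode(encoding)
       except UnicodeEncodeError:
           return False
       return True


   if __name__ == "__main__":
       # Hypothetical avatar filename containing non-ASCII characters.
       name = u"avatar-\u00e9\u00e8.png"
       print(filename_encodings())
       print(encodable(name, "ascii"))
       print(encodable(name, "utf-8"))

The sample name cannot be encoded as ASCII but can be encoded as UTF-8, which
mirrors the failure in the traceback above.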

So how do you tell a Python instance that is running under Apache / ``mod_wsgi``
about these environment variables? It turns out the answer is in the `Django
documentation`_, albeit in the ``mod_python`` integration section.

So, to fix the issue, I added the following lines to my ``/etc/apache2/envvars``
file:

.. sourcecode:: bash

   export LANG='en_US.UTF-8'
   export LC_ALL='en_US.UTF-8'

Note that you must fully stop and then start Apache for these changes to take
effect. I got tripped up at first because I ran ``apache2ctl graceful``, and
that was not sufficient to create a new environment.
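
One way to confirm that the restarted server really sees the new settings is to
temporarily serve a tiny diagnostic WSGI application. This is only a sketch and
not something from my actual setup: it assumes ``mod_wsgi`` is pointed at a
script that defines the conventional ``application`` callable, and the output
format is my own invention.

.. sourcecode:: python

   import os
   import sys


   def application(environ, start_response):
       """Report the locale variables and filename encoding of this process."""
       lines = [
           "LANG=%s" % os.environ.get("LANG", "(unset)"),
           "LC_ALL=%s" % os.environ.get("LC_ALL", "(unset)"),
           "filesystem encoding=%s" % sys.getfilesystemencoding(),
       ]
       body = "\n".join(lines).encode("utf-8")
       headers = [
           ("Content-Type", "text/plain; charset=utf-8"),
           ("Content-Length", str(len(body))),
       ]
       start_response("200 OK", headers)
       return [body]

If the page still shows the variables as unset, the Apache processes were
probably not fully restarted.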

.. _Django: http://djangoproject.com
.. _Django documentation: https://docs.djangoproject.com/en/1.3/howto/deployment/modpython/#if-you-get-a-unicodeencodeerror