Django Uploads and UnicodeEncodeError
#####################################

:date: 2011-06-04 20:00
:tags: Django, Python, Linux, Unicode
:slug: django-uploads-and-unicodeencodeerror
:author: Brian Neal

Something strange happened that I wish to document in case it helps others. I
had to reboot my Ubuntu server while troubleshooting a disk problem. After the
reboot, I began receiving internal server errors whenever someone tried to view
a certain forum thread on my Django_ powered website. After some detective work,
I determined it was because a user who had posted in the thread had an avatar
image whose filename contained non-ASCII characters. The image file had been
there for months, and I still cannot explain why the errors suddenly started
happening.

The traceback I was getting ended with something like this:

.. sourcecode:: python

   File "/django/core/files/storage.py", line 159, in _open
     return File(open(self.path(name), mode))

   UnicodeEncodeError: 'ascii' codec can't encode characters in position 72-79: ordinal not in range(128)

So it appeared that the ``open()`` call was triggering the error. This led me on
a twisty Google search with many dead ends. Eventually I found a suitable
explanation. Apparently, Linux filesystems don't enforce a particular Unicode
encoding for filenames. Linux applications must decide how to interpret
filenames all on their own. Python's ``os`` module (on Linux) uses environment
variables to determine what locale you are in, and the locale determines the
encoding used for filenames.
If these environment variables are not set, Python falls back to
ASCII (by default), which was the source of my ``UnicodeEncodeError``.

So how do you tell a Python instance that is running under Apache / ``mod_wsgi``
about these environment variables? It turns out the answer is in the
`Django documentation`_, albeit in the ``mod_python`` integration section.

So, to fix the issue, I added the following lines to my ``/etc/apache2/envvars``
file:

.. sourcecode:: bash

   export LANG='en_US.UTF-8'
   export LC_ALL='en_US.UTF-8'

Note that you must fully stop and then start Apache for these changes to take
effect. I got tripped up at first because I did an ``apache2ctl graceful``,
and that was not sufficient to create a new environment.

.. _Django: http://djangoproject.com
.. _Django documentation: https://docs.djangoproject.com/en/1.3/howto/deployment/modpython/#if-you-get-a-unicodeencodeerror
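
If you want to check which encoding a Python process actually picked up from
its environment, something like the following may help. This is only a minimal
sketch: the helper names ``describe_filename_encoding`` and ``is_encodable``
are made up for illustration and are not part of Django or ``mod_wsgi``. It
only reports what the standard ``sys``, ``locale``, and ``os`` modules say, so
it has to run inside the same environment as the web server (for example, from
a temporary view) to tell you anything useful. Recent Python 3 releases may
report a different default than the Python 2 behavior described above.

.. sourcecode:: python

   import locale
   import os
   import sys


   def describe_filename_encoding(environ=None):
       """Collect the settings that influence how filenames are encoded.

       ``environ`` defaults to ``os.environ``; passing a dict is only a
       convenience for testing (an assumption of this sketch).
       """
       if environ is None:
           environ = os.environ
       return {
           # Encoding Python uses to convert unicode filenames to bytes.
           "filesystem_encoding": sys.getfilesystemencoding(),
           # Encoding implied by the current locale settings.
           "preferred_encoding": locale.getpreferredencoding(),
           # The two variables set in /etc/apache2/envvars.
           "LANG": environ.get("LANG"),
           "LC_ALL": environ.get("LC_ALL"),
       }


   def is_encodable(filename, encoding):
       """Return True if ``filename`` can be encoded with ``encoding``."""
       try:
           filename.encode(encoding)
       except UnicodeEncodeError:
           return False
       return True


   if __name__ == "__main__":
       for key, value in sorted(describe_filename_encoding().items()):
           print("%s = %r" % (key, value))
       # A hypothetical avatar filename containing a non-ASCII character.
       print(is_encodable(u"caf\xe9.png", "ascii"))

If the reported filesystem encoding is an ASCII variant and ``LANG`` and
``LC_ALL`` come back as ``None``, the process is most likely running without
the locale settings from ``envvars``.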