Tuesday, April 15, 2014

Wrong name of attached file in inbound email: "=?UTF8?Q?=....

Hello guys

Last week I fought with a very strange issue: from time to time, inbound emails from one particular sender contained attachments whose names had encoding problems.

Just to show you, I copied one such email, with two Excel files attached, to my test database.
I know that these two Excel files have non-ASCII characters in their names (Ukrainian characters, actually), but I had never experienced any issues with sending or receiving files with non-ASCII names in Lotus Notes.
However, @AttachmentNames returned the file names in raw encoded form, starting with "=?UTF8?Q?=...", instead of the readable names.

At first I wrote a simple script to check what the properties of the LotusScript EmbeddedObject contained.

It was interesting to find out that the 'Source' property contained the correct file name: it was automatically decoded somehow. At first I decided that this was the workaround: all I had to do was reattach the file to the email using LotusScript, but take the file name from 'Source' instead of 'Name'.
However, later I discovered that:
- the Lotus Notes clients of other people in the same environment still had a mess in the 'Source' property
- our Lotus Domino server 8.5.3FP4 on Linux 3.0 also didn't have the correct value in the 'Source' property of the corresponding EmbeddedObject. When I ran the same script in the Domino server context and called 'messagebox eo.Source', I received the following output:

So, after the idea of using EmbeddedObject.Source as a workaround failed, I turned to Google for help.
I didn't find much about this issue, though.
However, I found this article, which helped me understand a little about why this may happen at all.
Plus, I found a hint that a string like "=D0=92=D0=B0=D1=81=D1=8F" is encoded as Quoted-Printable.
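
For context: the "=?UTF8?Q?...?=" wrapper looks like an RFC 2047 "encoded-word", which has the shape =?charset?encoding?payload?=, where Q stands for a Quoted-Printable-like encoding and B for Base64. The bytes D0 92 D0 B0 D1 81 D1 8F from the example above are the UTF-8 encoding of "Вася". The registered charset name is "UTF-8", so the "UTF8" label is non-standard, which might (this is only a guess) be why it is not always decoded automatically. Below is a minimal, untested sketch that just splits one such chunk into its parts. ShowEncodedWordParts is a hypothetical helper name, and the sketch assumes the input is exactly one well-formed chunk.

```lotusscript
' Hypothetical helper, not part of the original scripts:
' splits one RFC 2047 encoded-word (=?charset?encoding?payload?=) into its parts.
Sub ShowEncodedWordParts(ByVal word As String)
	Dim parts As Variant

	' Splitting on "?" should give 5 pieces: "=", charset, encoding, payload, "="
	parts = Split(word, "?")
	If UBound(parts) <> 4 Then
		MsgBox "Not a single encoded-word: " & word
		Exit Sub
	End If

	MsgBox "charset=" & parts(1) & ", encoding=" & parts(2) & ", payload=" & parts(3)
End Sub
```

It could be called as ShowEncodedWordParts("=?UTF8?Q?=D0=92=D0=B0=D1=81=D1=8F?=") to see the charset label, the encoding letter and the raw payload separately.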

So, knowing that the file name is encoded in Quoted-Printable format, I decided to decode it using LotusScript and check what I would get.

I wrote a simple script with the following core:

.....
Set stream = s.CreateStream
ForAll x In rtitem.EmbeddedObjects
	Set eo = x
	' Reuse the same stream for every attachment
	Call stream.Truncate()
	Call stream.WriteText(eo.Name) 'eo.Name contains =?UTF8?Q?=D0=9F=D0=B5=D1=80=D0=B5=D0=BB....
	' Temporary document that only hosts the MIME entity
	Set convDoc = db.CreateDocument
	Set mime = convDoc.CreateMIMEEntity
	' Treat the raw name as Quoted-Printable UTF-8 text and decode it
	Call mime.SetContentFromText(stream, "text/plain;charset=utf-8", ENC_QUOTED_PRINTABLE)
	Call mime.DecodeContent
End ForAll

When I tried it in Domino Designer, I found that the result was very close to the real file name: the highlighted parts are exactly what I need. The only problem is that the correct parts of the file name are still split by some encoding-related characters.

In the end, I didn't find any smart way to decode it correctly without those =?UTF8?Q? .... characters being left over, so I removed them the dumb way:

.....
Set stream = s.CreateStream
ForAll x In rtitem.EmbeddedObjects
	Set eo = x

	' Reuse the same stream for every attachment
	Call stream.Truncate()
	Call stream.WriteText(eo.Name)

	' Temporary document that only hosts the MIME entity
	Set convDoc = db.CreateDocument
	Set mime = convDoc.CreateMIMEEntity
	Call mime.SetContentFromText(stream, "text/plain;charset=utf-8", ENC_QUOTED_PRINTABLE)
	Call mime.DecodeContent

	' Strip the encoded-word markers that survive the decoding
	filename = mime.ContentAsText
	filename = Replace(filename, "=?UTF8?Q?", "")
	filename = Replace(filename, "?= ", "")
	filename = Replace(filename, "?", "")

	MsgBox "filename=" & filename
End ForAll

Finally, I got the correct file name :-)

However, if you know how to resolve this encoding issue better, please let me know. I am really eager to find a better solution, since the replacements I used made me feel ashamed.
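
One direction that might be cleaner, as an untested sketch rather than a proven fix: cut the payload out of every =?charset?Q?payload?= chunk first, and only then run the decoding, so that no marker characters are left to clean up afterwards. DecodeQName is a hypothetical helper name. It assumes that every chunk is Q-encoded UTF-8, and it drops any plain text that sits outside the chunks. The MIME calls are the same ones used in the scripts above.

```lotusscript
' Hypothetical helper, not part of the original scripts.
' Collects the payload of every =?charset?Q?payload?= chunk in rawName,
' then decodes the joined payload once as Quoted-Printable UTF-8.
Function DecodeQName(s As NotesSession, db As NotesDatabase, ByVal rawName As String) As String
	Dim payload As String
	Dim startPos As Long, qPos As Long, endPos As Long
	Dim stream As NotesStream
	Dim tmpDoc As NotesDocument
	Dim mime As NotesMIMEEntity

	startPos = InStr(1, rawName, "=?")
	Do While startPos > 0
		' Find the "?Q?" that follows the charset label (1 = case-insensitive)
		qPos = InStr(startPos + 2, rawName, "?Q?", 1)
		If qPos = 0 Then Exit Do
		' The payload runs from just after "?Q?" up to the closing "?="
		endPos = InStr(qPos + 3, rawName, "?=")
		If endPos = 0 Then Exit Do
		' In the Q encoding "_" stands for a space; "=20" is its Quoted-Printable form
		payload = payload & Replace(Mid$(rawName, qPos + 3, endPos - qPos - 3), "_", "=20")
		startPos = InStr(endPos + 2, rawName, "=?")
	Loop

	' Nothing recognised: return the name unchanged
	If payload = "" Then
		DecodeQName = rawName
		Exit Function
	End If

	Set stream = s.CreateStream
	Call stream.WriteText(payload)
	Set tmpDoc = db.CreateDocument ' scratch document that only hosts the MIME entity
	Set mime = tmpDoc.CreateMIMEEntity
	Call mime.SetContentFromText(stream, "text/plain;charset=utf-8", ENC_QUOTED_PRINTABLE)
	Call mime.DecodeContent
	DecodeQName = mime.ContentAsText
End Function
```

Inside the ForAll loop it would be used as filename = DecodeQName(s, db, eo.Name). Joining the payloads before decoding is a deliberate choice: if a multi-byte character happens to be split across two chunks, it should still decode in one piece.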
