Parse Email Attachments
Since the launch of mailparser.io, one feature request has popped up every couple of days: people asked whether it was possible to automatically read and process the text stored inside e-mail attachments. It quickly became clear to us that this was something we wanted to build for our customers! Read on to find out how you can parse email attachments with Mailparser.
Update: This article covers the e-mail parsing functionality of Mailparser. If you are looking for a solution to parse complex PDF attachments and convert them to Excel, we have exciting news for you! We decided to launch a service dedicated to PDF document parsing. Check out Docparser.
Today we are super happy to announce the launch of our new Email Attachment Parser! It is now possible to pull text from files attached to your e-mails: our parser can read and extract data from PDF, Excel spreadsheet, CSV, TXT, or XML files.
How to parse email attachments
To take advantage of this new feature, follow these steps:
- Send an email with a file attachment to your @mailparser.io inbox.
- Create a new parsing rule for each data field you want to extract from the attachment.
- Set the source of the parsing rule to “Attachment” and choose “Text Content”. If you are parsing a spreadsheet file or want to extract table rows from a PDF file, switch the filter to “Text Cells”.
- You will then see the text data of the attached file. The image below shows this step.
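The difference between the “Text Content” and “Text Cells” views can be illustrated with a small CSV file. The sketch below is only a conceptual illustration, not how Mailparser works internally; the sample data and the function names `text_content` and `text_cells` are hypothetical. It shows why cells are handy for tabular files: a column can be picked by its header instead of by the text surrounding it.

```python
import csv
import io

# Hypothetical CSV attachment; in practice this would be the decoded file.
ATTACHMENT = "order_id,customer,total\n1001,Jane Doe,49.90\n1002,John Roe,15.00\n"


def text_content(raw: str) -> str:
    """One block of plain text, roughly what a "Text Content" view shows."""
    return raw.strip()


def text_cells(raw: str) -> list[list[str]]:
    """Rows split into individual cells, roughly what a "Text Cells" view shows."""
    return list(csv.reader(io.StringIO(raw)))


cells = text_cells(ATTACHMENT)
header, rows = cells[0], cells[1:]
# With cells, a column can be selected by its header name.
totals = [row[header.index("total")] for row in rows]
print(totals)  # ['49.90', '15.00']
```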
From there on, everything works as usual: you can chain multiple text filters until your data field is isolated from the rest. Have a look at the next image to see this in action. The first filter defines where the data we want to extract starts inside the text of the attachment; a second filter then defines where the data field ends.
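Chaining filters amounts to passing the output of one filter into the next. Here is a minimal sketch of that idea, assuming two hypothetical filters, `start_after` and `end_before`; these are illustrative names, not Mailparser's actual filter set or implementation. The sample invoice text is made up.

```python
from collections.abc import Callable


def start_after(text: str, marker: str) -> str:
    """Keep everything after the first occurrence of `marker` (empty if absent)."""
    idx = text.find(marker)
    return "" if idx == -1 else text[idx + len(marker):]


def end_before(text: str, marker: str) -> str:
    """Keep everything before the first occurrence of `marker` (unchanged if absent)."""
    idx = text.find(marker)
    return text if idx == -1 else text[:idx]


def apply_filters(text: str, filters: list[Callable[[str], str]]) -> str:
    """Run each filter on the output of the previous one."""
    for f in filters:
        text = f(text)
    return text


# Hypothetical text extracted from an attached invoice.
attachment_text = "Invoice Number: INV-2024-17\nDate: 2024-03-01\n"

filters = [
    lambda t: start_after(t, "Invoice Number:"),  # where the field starts
    lambda t: end_before(t, "\n"),                # where the field ends
    str.strip,                                    # trim surrounding whitespace
]
print(apply_filters(attachment_text, filters))  # INV-2024-17
```

In this sketch a missing start marker yields an empty result, while a missing end marker keeps the rest of the text; other choices are equally reasonable.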
Which file formats are supported?
Right now it is possible to extract text from .pdf files, .xls and .xlsx Excel spreadsheets, .csv files, and generic .txt and .xml files. If your file format is not listed yet, let us know and we will look into it. The basic idea always stays the same: no matter which file format, the text is extracted and then becomes available just as if it were text in the body of the email.
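To make the "text from the attachment, treated like body text" idea concrete, here is a sketch using Python's standard `email` package. It only handles the plain-text formats (.txt, .csv, .xml); PDF and Excel files would need third-party libraries and are simply skipped. The function name `attachment_texts`, the addresses, and the sample files are hypothetical, and this is not a description of Mailparser's own pipeline.

```python
import os
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

# Extensions this sketch can decode with the standard library alone.
TEXT_EXTENSIONS = {".txt", ".csv", ".xml"}


def attachment_texts(raw_email: bytes) -> dict[str, str]:
    """Map each text-based attachment's filename to its decoded text."""
    msg = BytesParser(policy=policy.default).parsebytes(raw_email)
    texts = {}
    for part in msg.iter_attachments():
        name = part.get_filename() or ""
        if os.path.splitext(name)[1].lower() not in TEXT_EXTENSIONS:
            continue  # e.g. PDF or Excel: would need a dedicated parser
        payload = part.get_payload(decode=True)  # undo base64 / quoted-printable
        charset = part.get_content_charset() or "utf-8"
        texts[name] = payload.decode(charset, errors="replace")
    return texts


# Build a hypothetical email with one CSV and one PDF attachment.
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "inbox@example.com"
msg["Subject"] = "Order export"
msg.set_content("Please find the export attached.")
msg.add_attachment("order_id,total\n1001,49.90\n", subtype="csv", filename="orders.csv")
msg.add_attachment(b"%PDF-1.4 placeholder", maintype="application",
                   subtype="pdf", filename="scan.pdf")

texts = attachment_texts(msg.as_bytes())
print(sorted(texts))  # ['orders.csv']
```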