Read and Process E-mail Attachments
Update: This article covers the e-mail parsing functionality of mailparser.io. If you are looking for a solution to parse email attachments, and convert PDF to Excel, we have exciting news for you! We decided to launch a service dedicated to PDF document parsing. Check out Docparser and get an invite to our closed Beta.
Since the beginning of mailparser.io, one particular feature request has popped up every couple of days. People were asking if it was possible to automatically read and process text stored inside e-mail attachments. It quickly became clear to us that this was something we wanted to build for our customers!
Today we are super happy to announce the launch of our new Email Attachment Parser! It is now possible to pull text from files which are attached to your e-mails. Our parser can read data from PDF files, Excel spreadsheets, and CSV, TXT or XML files.
How to extract text from attached files?
This is how it works: Send an email with a file attachment to your @mailparser.io inbox. Then, create a new parsing rule for each data field you want to extract from the attachment. Set the source of the parsing rule to “Attachment” and choose “Text Content”. If you are parsing a spreadsheet file or want to extract table rows from a PDF file, switch the filter to “Text Cells”. You will then see the text parts of the attached file. The image below shows this step.
From there on, everything works as usual: you can chain multiple text filters until your data field is isolated from the rest. Have a look at the next image to see this in action. With a first filter, we define where the data we want to extract starts inside the text of the attachment. Then, with another filter, we define where the data field ends.
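To make the idea of chained filters more concrete, here is a minimal Python sketch. It is not how mailparser.io is implemented. The helper names (`text_after`, `text_before`, `apply_filters`) and the sample attachment text are made up for illustration. It only shows the principle: one filter cuts away everything before the data field, and the next one cuts away everything after it.

```python
def text_after(text, marker):
    """Keep everything after the first occurrence of marker ('' if marker is absent)."""
    _, found, rest = text.partition(marker)
    return rest if found else ""


def text_before(text, marker):
    """Keep everything before the first occurrence of marker (whole text if absent)."""
    head, _, _ = text.partition(marker)
    return head


def apply_filters(text, filters):
    """Run the filters one after another, each working on the previous result."""
    for text_filter in filters:
        text = text_filter(text)
    return text.strip()


# Hypothetical text content of an attachment, invented for this example.
attachment_text = (
    "Invoice\n"
    "Order Number: 12345\n"
    "Customer: ACME Corp\n"
    "Total: 99.00 EUR\n"
)

# First filter: where the data field starts. Second filter: where it ends.
order_number = apply_filters(
    attachment_text,
    [
        lambda t: text_after(t, "Order Number:"),
        lambda t: text_before(t, "\n"),
    ],
)
print(order_number)
```

Each filter only sees the output of the one before it, which is why the order of the chain matters.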
Which file formats are supported?
Right now it is possible to extract text from .pdf files, .xls and .xlsx Excel spreadsheets, .csv files, and generic .txt and .xml files. If your file format is not listed yet, let us know and we will look into it. The basic idea always stays the same: no matter which file format, the text is extracted and is then available as if it were text in the body of the email.
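If you are curious what "text from an attachment becomes plain text" looks like in general, the following Python sketch uses only the standard library `email` package. It is an illustration under simple assumptions, not the mailparser.io implementation: the function name `attachment_texts` and the demo message are hypothetical, and it only handles attachments sent with a `text/*` content type (such as .txt or .csv). PDF and Excel files would need dedicated libraries to get to their text.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def attachment_texts(raw_email: bytes) -> dict:
    """Map attachment filename -> text content for all text/* attachments."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    texts = {}
    for part in msg.iter_attachments():
        # Only text/* parts decode directly to str; other types are skipped here.
        if part.get_content_maintype() == "text":
            texts[part.get_filename()] = part.get_content()
    return texts


# Build a small demo email with a CSV attachment (sample data, invented).
demo = EmailMessage()
demo["Subject"] = "Report"
demo.set_content("See attached.")
demo.add_attachment("id,total\n1,99.00\n", subtype="csv", filename="orders.csv")

texts = attachment_texts(demo.as_bytes())
for filename, content in texts.items():
    print(filename)
    print(content)
```

Once the attachment is available as a string like this, it can be processed with the same kind of text filters you would apply to an email body.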