airflow : wait to receive email and process data contained in attached file - python-3.x

I am looking for a way to schedule tasks based on the reception of an email.
More precisely, I receive an email with some attached data every week, and I need to add this data to a database (and process some information). Is there a way to do it automatically?
Would Airflow be a good option for this? I found that Airflow can send email, but I did not find anything about reading mail.
I know it is possible to read email and download attached files in Python. But what would be the best way to check whether a specific email (identified by its sender) has been received and process its data as soon as it arrives?

Airflow is a great option for this workflow.
Airflow has the concept of sensor operators, which derive from BaseSensorOperator. Using a sensor lets you easily control the poke_interval and timeout of the task, as well as how to handle the case where the email does not arrive as expected.
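As a minimal sketch of the check a sensor could run on each poke, the following uses only the standard-library imaplib. The function names, host, and credentials are hypothetical, and wiring the check into Airflow (for example as the callable of a PythonSensor, with poke_interval and timeout set on the task) is an assumption about your setup rather than something shown here.

```python
import imaplib


def sender_has_unseen_mail(conn, sender: str) -> bool:
    """Return True if the selected mailbox holds an unseen message from `sender`.

    `conn` is an already logged-in IMAP connection with a mailbox selected.
    """
    status, data = conn.search(None, "UNSEEN", "FROM", f'"{sender}"')
    # data[0] is a space-separated byte string of matching message numbers.
    return status == "OK" and bool(data and data[0].split())


def poke_mailbox(host: str, user: str, password: str, sender: str) -> bool:
    """Hypothetical poke function: connect, check the inbox, disconnect."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        # readonly=True so that checking does not mark messages as seen.
        conn.select("INBOX", readonly=True)
        return sender_has_unseen_mail(conn, sender)
```

Returning a plain boolean keeps the check easy to call from a sensor: False means "keep waiting", True means "the email is there, let downstream tasks run".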

You could schedule a BashOperator or PythonOperator that periodically checks for new mail and, if it finds any, starts processing it. While I cannot recommend a specific library, I am sure there must be a way to read and handle email in Python.
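For the reading part, Python's standard library already includes an email parser. The sketch below pulls the attachments out of one raw message and assumes you have already fetched the raw bytes (for example over IMAP). The function name is hypothetical.

```python
import email
from email import policy


def extract_attachments(raw: bytes) -> dict[str, bytes]:
    """Map attachment filename -> decoded content for one raw email message."""
    # policy.default yields an EmailMessage, which provides iter_attachments().
    msg = email.message_from_bytes(raw, policy=policy.default)
    return {
        part.get_filename(): part.get_payload(decode=True)
        for part in msg.iter_attachments()
        if part.get_filename()  # skip parts that carry no filename
    }
```

The returned bytes can then be written to disk or parsed directly before loading them into the database.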

Related

Email if a cron job succeeds on Cronitor

Is it possible to add a setting on Cronitor that sends an email once a cron job succeeds? Right now I only get an email if a cron job fails.
This is what I have at the moment:
Alert preferences Failure tolerance Duration alerts
Alert notes
(X) Send alerts any time there is a problem
() Only send alerts if my job has consistently not run when expected...
I tried to email/contact support, but I have had no answer from Cronitor yet.
I'm one of the creators of Cronitor. I'm really sorry we somehow missed your support inquiry.
The answer is that, yes, this is possible but you cannot set this up without an API call. After the initial setup is complete (and the new alert rule is added) you can continue managing the monitor from the dashboard.
If you email support@cronitor.io again with specifics I can help you with adding this rule via the API.

Scheduling an email with the Gmail API

I found a similar question from 2016, however at that time Gmail itself did not support scheduled sending of emails.
Now that you can schedule messages to send later directly from Gmail, I was wondering if there was a way to do it with their API.
Interestingly, scheduled emails appear as message objects when calling messages.list, but they do not contain any labels.
Any help would be appreciated! And if it's not possible at the moment, it would be awesome to get a reply from someone at Google about when this will become possible (I believe they officially endorse the gmail-api tag on Stack Overflow).
I don't think a time-based trigger will work--even if you write the code to store email send data and then build something that regularly checks whether it's time for an email to be sent. See Google's documentation on triggers, and you'll notice that time-based triggers aren't available for Gmail scripts.
Unfortunately, there is no Gmail API endpoint for scheduling the sending of emails directly.
One workaround would be to write a script in Google Apps Script (https://script.google.com) which handles the composing of the email you wish to send, as well as a function to send the mail via the API. You can then use the built-in 'Apps Script Project Triggers' feature to trigger the function to run on a schedule; for example on action/event or at a specific/repeated time.

Implementing logging and retry mechanism in netsuite suitescripts

I need a way to implement error logging and to let admins retry any failure that occurs within a SuiteScript.
Here are my thoughts on the implementation:
Let's say that for a RESTlet I log the dataIn, or the incoming data in any User Event script, to a text file along with its status (success or failure). Later, a scheduled script could process that text file and send the errors to my .NET API, where I can provide a way for admins to retry.
Could anyone suggest how this is normally done in NetSuite projects?
For similar systems, I typically advise creating Custom Records. Your custom records can have a field to store the raw data (JSON, XML, etc.) as well as a Status (Succeeded, Failed, Retry, etc.). You could consider retry mechanisms like having a User Event on the Custom Record that immediately retries upon creation of the record, then, if that fails, having a Map/Reduce script that runs on a regular schedule to clean things up.
If the native Execution Logs aren't providing enough functionality for you in that respect, you can add a Custom Record for "logging" as well, but I'd suggest trying to use the native logs first. The Script Execution Logs UI provides reasonable searching/filtering capabilities.

Spark Email Processing

We are developing a big data solution in which one requirement is to process incoming emails. The technology stack is not finalized yet but mostly we might go with Sendmail as MTA and Procmail as MDA. We are open to any other very efficient solution.
These emails essentially carry data in attachments and are not meant for end users, so the email flow ends with Spark processing.
My first thought was it would be great if there was a message queuing system such as Apache-Kafka which could accept emails as messages and then provide them to the client such as Spark on demand but it seems that sort of technology/approach is not available in any of the message brokering systems.
This means we would have to receive emails via SMTP MTA and then extract the information from the MDA.
We could use Procmail to extract the contents of the email and the attachments and put them in a folder per email and then scan the folders and process them in spark.
Alternatively, if Spark has any plugins that could pull in emails from an MDA and break them down into their attachments, it would make life much simpler.
If there is any other smarter solution it would be welcome.
So the fundamental question is what technology (connectors, etc.) is available for channeling emails through Spark for processing.
Mailgun or Sendgrid incoming email processing is so easy that I could hardly imagine any alternative for a new system, especially a big one. I have only played with them, but my impression was that any actual or potential problem I might have with email (even at billions of emails) is solved for good. They are not related to Spark; those systems just post the email content as an HTTP POST request to a URL you provide.
Sendgrid used to parse encodings incorrectly; their support ignored my emails and eventually deleted a ticket without solving the problem. Mailgun always returns UTF-8 regardless of the original encoding. Manual MIME parsing is such a grandiose task in itself that it is better to use existing solutions, unless the emails are generated by a computer. But even then, IaaS services are so much cheaper than developer time.
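To illustrate the "HTTP POST to a URL you provide" model, here is a minimal standard-library sketch of a receiving endpoint. It assumes the provider sends a URL-encoded form body. The field names `sender` and `subject`, the port, and all function and class names are hypothetical placeholders, so check your provider's documentation for the real payload format (attachments typically arrive as multipart form data, which this sketch does not handle).

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs


def parse_inbound_post(body: bytes) -> dict:
    """Decode a URL-encoded form body, keeping the first value of each field."""
    fields = parse_qs(body.decode("utf-8"))
    return {name: values[0] for name, values in fields.items()}


class InboundMailHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        message = parse_inbound_post(self.rfile.read(length))
        # Hand the parsed fields off here, e.g. write them where Spark can read them.
        print(message.get("sender"), message.get("subject"))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the endpoint until interrupted (assumed port, adjust as needed)."""
    HTTPServer(("", port), InboundMailHandler).serve_forever()
```

Calling `serve()` starts the listener. From there, the handler could drop each message into a directory or queue that a Spark job consumes.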

automation of tasks - email using web application

I have a web application that monitors farms in certain areas. Right now I am having a problem of performing automation with some of the tasks.
Users of the web application can send reports or checkins using keywords. If the reports or checkins correspond to certain keywords, for example "alert", I need the web application to send an alert to the user via email using that web application. But that alert must be sent two weeks after the date of the report received, and to that particular user only.
Would it be possible to use cron to perform this? If not, can anyone suggest a workaround?
One possible approach is to store an entry in a database for each reminder email at the moment the user performs the action that makes the email necessary. For each entry, store the recipient, the date to send on, and the email content. Schedule a single cron job that runs periodically, picks up the records that are due, and populates an email template to send out. You can then either delete the processed records or, better, add a column that indicates whether they were sent and mark them as sent.
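The approach above can be sketched with SQLite. The table layout, function names, and the `send` callback are all hypothetical, and the question does not state its technology stack, so treat this as an illustration of the pattern rather than a drop-in solution.

```python
import sqlite3
from datetime import date, timedelta

SCHEMA = """
CREATE TABLE IF NOT EXISTS reminders (
    id        INTEGER PRIMARY KEY,
    recipient TEXT NOT NULL,
    send_on   TEXT NOT NULL,              -- ISO date, compares correctly as text
    body      TEXT NOT NULL,
    sent      INTEGER NOT NULL DEFAULT 0  -- 0 = pending, 1 = sent
)
"""


def schedule_reminder(db, recipient, body, report_date, delay_days=14):
    """Record a reminder to be sent `delay_days` after the report date."""
    send_on = report_date + timedelta(days=delay_days)
    db.execute(
        "INSERT INTO reminders (recipient, send_on, body) VALUES (?, ?, ?)",
        (recipient, send_on.isoformat(), body),
    )
    db.commit()


def process_due(db, today, send):
    """Send every pending reminder that is due and mark it as sent.

    `send` is any callable taking (recipient, body). Returns the number sent.
    """
    rows = db.execute(
        "SELECT id, recipient, body FROM reminders WHERE sent = 0 AND send_on <= ?",
        (today.isoformat(),),
    ).fetchall()
    for reminder_id, recipient, body in rows:
        send(recipient, body)
        db.execute("UPDATE reminders SET sent = 1 WHERE id = ?", (reminder_id,))
        db.commit()  # commit per row so a crash does not resend earlier mails
    return len(rows)
```

A cron entry would then run a small script once a day that opens the database and calls `process_due(db, date.today(), send)` with a real mail-sending function.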
It would help to provide which technology stack you're operating on and what the application is developed in. Others might be able to point you to technology specific approaches or pre-built plugins/extensions that already do this for the situation you're in, to help you avoid the need to write your own code for the solution.
