The scope of this task is:
- Rewrite [export_dag.py](https://github.com/blockchain-etl/ethereum-etl-airflow/blob/master/dags/export_dag.py) to use PythonOperator instead of BashOperator (see Notes for context). The easiest way is to use the functions from the ethereumetl.cli package, e.g. to export blocks and transactions call `ethereumetl.cli.export_blocks_and_transactions(...)` (see the sketch below).
- Update environment variables as needed, e.g. ETHEREUMETL_REPO_BRANCH is no longer required.
- Add logging (with BashOperator, `set -o xtrace && set -o pipefail &&` made every step show up in the task logs; with PythonOperator this has to be done explicitly, as in the sketch below).
- Test export_dag.py and load_dag.py in Cloud Composer (load_dag.py can break due to the Python version upgrade).
- Update the README with instructions on how to configure and deploy the DAGs, including the high-level deployment steps.

Notes:
- The reason export_dag.py uses BashOperator is that Google Cloud Composer didn't support Python 3, which ethereum-etl depends on. Python 3 is supported now: https://cloud.google.com/composer/docs/release-notes#october_2_2018_composer-120-airflow-190.
- The ethereum-etl package can be added as a dependency via the Cloud Composer console: https://cloud.google.com/composer/docs/how-to/using/installing-python-dependencies.
- You have to make sure the temporary files are cleaned up after every task (for BashOperator, Airflow creates a temp directory for every task and cleans it up at the end; with PythonOperator this is your responsibility, see the second sketch below).
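
A minimal sketch of what one PythonOperator task could look like, assuming the commands in ethereumetl.cli are click commands invoked through their underlying `.callback()` functions. The parameter names (start_block, end_block, provider_uri, blocks_output, transactions_output, etc.), the WEB3_PROVIDER_URI environment variable and the output path are assumptions for illustration and should be checked against the installed ethereum-etl version:

```python
import logging
import os
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from ethereumetl.cli import export_blocks_and_transactions


def export_blocks_and_transactions_command(start_block, end_block, provider_uri, output_dir, **kwargs):
    # Explicit logging replaces the `set -o xtrace` tracing the Bash tasks relied on;
    # Airflow captures these records in the task log.
    logging.info('Exporting blocks %s to %s', start_block, end_block)

    # export_blocks_and_transactions is a click command, so the plain Python function
    # is reached via .callback(). The exact parameter set is an assumption and may
    # differ between ethereum-etl versions.
    export_blocks_and_transactions.callback(
        start_block=start_block,
        end_block=end_block,
        batch_size=100,
        provider_uri=provider_uri,
        max_workers=5,
        blocks_output=os.path.join(output_dir, 'blocks.csv'),
        transactions_output=os.path.join(output_dir, 'transactions.csv'),
    )
    logging.info('Finished export, output files: %s', os.listdir(output_dir))


default_dag_args = {
    'start_date': datetime(2018, 7, 1),
    'retries': 5,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('ethereum_export_dag', schedule_interval='0 0 * * *', default_args=default_dag_args)

export_blocks_and_transactions_operator = PythonOperator(
    task_id='export_blocks_and_transactions',
    python_callable=export_blocks_and_transactions_command,
    # Illustrative values only: the real DAG derives the block range from the execution
    # date, and the environment variable name here is made up for the example.
    op_kwargs={
        'start_block': 0,
        'end_block': 100,
        'provider_uri': os.environ.get('WEB3_PROVIDER_URI', 'https://mainnet.infura.io'),
        'output_dir': '/home/airflow/gcs/data',
    },
    dag=dag,
)
```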
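
For the temporary-file cleanup mentioned in the last note, one option is to create and remove a per-task directory inside the python_callable itself, for example with tempfile.TemporaryDirectory (a sketch; the export call and the upload to GCS are only indicated as placeholders):

```python
import logging
import os
import tempfile


def export_task(**kwargs):
    # BashOperator tasks got a throwaway working directory from Airflow; with
    # PythonOperator an equivalent is an explicit TemporaryDirectory, which is
    # removed when the with-block exits, even if the export raises an exception.
    with tempfile.TemporaryDirectory() as tempdir:
        blocks_file = os.path.join(tempdir, 'blocks.csv')
        transactions_file = os.path.join(tempdir, 'transactions.csv')

        # ... call the ethereumetl.cli export here, writing to blocks_file
        # and transactions_file ...

        # ... upload the files to GCS here, while they still exist ...
        logging.info('Files produced in %s: %s', tempdir, os.listdir(tempdir))
    # tempdir and everything in it is gone at this point
```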