The ideal candidate is an experienced data engineer. You will help us develop and maintain our data pipelines, built with Python, Standard SQL, pandas, and Airflow on Google Cloud Platform. We are in a transitional phase: refactoring our legacy Python data transformation scripts into iterable Airflow DAGs and building CI/CD processes around those transformations. If that sounds exciting to you, you'll love this job. You will be expected to build scalable data ingress and egress pipelines across data storage products, deploy new ETL pipelines, and diagnose, troubleshoot, and improve existing data architecture. We work in a fast-paced, flexible start-up environment, and we welcome your adaptability, curiosity, passion, grit, and creativity as we pursue cutting-edge research into this growing, fascinating industry.
Responsibilities:
- Build and maintain ETL processes with our stack: Airflow, Standard SQL, pandas, spaCy, and Google Cloud.
- Write efficient, scalable code to munge, clean, and derive intelligence from our data.
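As a rough illustration of the kind of munging and cleaning work described above, here is a minimal pandas sketch. The column names and cleaning rules are hypothetical, not taken from the posting.

```python
import pandas as pd

# Hypothetical raw survey records -- duplicates, stray whitespace,
# inconsistent casing, and a missing answer.
raw = pd.DataFrame({
    "respondent_id": [1, 2, 2, 3],
    "response": ["  Yes", "no ", "no ", None],
})

def clean_responses(df: pd.DataFrame) -> pd.DataFrame:
    """Deduplicate respondents, drop empty answers, normalize text."""
    return (
        df.drop_duplicates(subset="respondent_id")
          .dropna(subset=["response"])
          .assign(response=lambda d: d["response"].str.strip().str.lower())
          .reset_index(drop=True)
    )

cleaned = clean_responses(raw)
print(cleaned)
```

A transformation written as a pure function like this is straightforward to drop into an Airflow task and to cover with CI tests.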
Qualifications & Skills:
- 1-3 years' experience in a data-oriented Python role, including use of:
- Google Cloud Platform (GCE, GBQ, Cloud Composer, GKE)
- CI/CD tooling such as GitHub Actions or CircleCI
- Fluency in the core tenets of the Python data science stack: SQL, pandas, scikit-learn, etc.
- Familiarity with modern NLP systems and processes, ideally spaCy
- Demonstrated ability to collaborate effectively with non-technical stakeholders
- Experience scaling data processes with Kubernetes
- Experience with survey and/or social media data
- Experience preparing data for one or more interactive data visualization tools such as Power BI or Tableau
Benefits:
- Choose your own laptop
- Health Insurance