Data Engineer - Research and Clinical Data - Fairfax Virginia
Company: Vibrent Health
Location: Fairfax, Virginia
Posted On: 02/02/2025
Overview:
Vibrent is a high-performance organization committed to driving innovation and delivering exceptional value to clinical research, biotech, and academic medical centers. We thrive on a culture of resilience, continuous improvement, and collaboration. Our mission is to grow the business while delighting our customers, and we embrace a 'one team' mindset to achieve shared success.

As a Research Data Engineer at Vibrent Health, you will bridge the gap between research methodology and engineering implementation. You will ensure that the data collected via apps, wearable devices, EHR integrations, and other digital health sources are transformed into research-ready datasets. Your work will empower researchers and data scientists to derive meaningful insights that improve health outcomes at scale.

This role involves working with large, complex datasets, ensuring data quality and compliance with regulatory standards, and collaborating with cross-functional teams to derive actionable insights from healthcare data. You will be responsible for designing, implementing, and maintaining data pipelines and infrastructure to support our customers' clinical and medical research initiatives.

Responsibilities for this role will include, but not be limited to:

Data Management & Engineering:
- Design, maintain, and optimize data models to create robust, research-ready datasets.
- Build the infrastructure required for optimal extraction, transformation, and loading of data using cloud technologies such as AWS and Azure.
- Design, develop, and optimize data pipelines to ingest, process, and store clinical and medical research data from various sources (e.g., EHRs, clinical trials, wearable devices, genomic data).
- Ensure efficient ETL/ELT processes for transforming raw data into structured, analysis-ready datasets.
- Build and refine ETL processes using SQL and Python to transform raw health data into structured formats suitable for analysis.
- Develop and maintain data warehouses, databases, and data lakes tailored to research needs.
- Collaborate with engineering teams to ensure data pipelines are reliable, scalable, and performant.

Collaboration & Support:
- Collaborate with clinical researchers, data scientists, and IT teams to understand data requirements and develop solutions that support their research goals.
- Coordinate with downstream users to ensure that outputs meet their requirements.
- Translate research questions into optimized queries, aggregations, and summaries that facilitate quick, accurate analysis.
- Provide technical support to research teams by enabling efficient data access and analysis.
- Work with regulatory and compliance teams to ensure adherence to industry regulations (e.g., HIPAA, GDPR, FDA 21 CFR Part 11).
- Participate in code reviews, agile sprints, and continuous improvement initiatives.

Data Quality & Governance:
- Implement data validation, cleaning, and monitoring processes to ensure data integrity and accuracy.
- Manage and maintain pipelines and troubleshoot data issues in the data lake or warehouse.
- Establish and enforce data governance policies, including metadata management and data lineage tracking.
- Ensure proper documentation of data workflows, schemas, and transformations.
- Create and maintain comprehensive data dictionaries, metadata standards, and codebooks to enhance data transparency and reproducibility.
- Conduct periodic data quality checks and audits to ensure compliance with research standards and regulatory requirements.

Required Education and Experience:
- Bachelor's or Master's degree in Computer Science, Data Engineering, Biomedical Informatics, or a related field.
- 5+ years of experience in data engineering, preferably in healthcare, clinical research, or life sciences.
- Proven track record of handling health/clinical datasets and supporting research analysis.
- Experience building ELT and ETL processes to ingest data into data warehouses and data lakes.
- Experience visualizing large datasets with BI tools and other data visualization methods.
- Experience working with genomic data, imaging data, and wearable device data.
- Experience with data modeling, database design, and data governance.
- Experience deploying data pipelines in the cloud.
- Experience with unstructured data processing and transformation.
- Experience developing and maintaining efficient data pipelines for large volumes of data.

Required Skills and Knowledge:
- Knowledge of research processes and terminology in biological or medical fields, and the ability to communicate effectively with and support researchers in these domains.
- Strong understanding of the end-to-end data collection, extraction, and analysis processes that research end users need.
- Strong ability to develop technical specifications based on communication from stakeholders.
- Knowledge of statistical analysis techniques and tools used in medical research.
- Expert-level proficiency with Python/R, including experience creating custom functions.
- Strong SQL and database design skills (PostgreSQL, MySQL, SQL Server, NoSQL databases).
- Proficiency in data processing frameworks such as Apache Spark, Hadoop, or cloud-based equivalents.
- Strong proficiency in utilizing cloud platforms (AWS, Azure, or GCP) and relevant services (Redshift, BigQuery, Snowflake).
- Knowledge of healthcare data standards (FHIR, HL7, CDISC, OMOP) and clinical terminologies (LOINC, SNOMED, ICD).
- Familiarity with compliance frameworks such as HIPAA, GDPR, or GxP.

What you bring to the role: