Job posting has expired

Data Curation Engineer

Spectraforce Technologies
United States, Illinois, North Chicago
April 18, 2024
Job Title: Data Curation Engineer

Location: Lake County, IL (Eastern time zone preferred). Hybrid if near Lake County, remote otherwise.

Duration: 12 months

As a Data Analyst in the Genomics Research Center's (GRC) Bioinformatics Engineering team, you will be responsible for developing and running workflows to standardize and load project datasets into a centralized environment. You will work closely with Bioinformatics Engineering, as well as therapeutic-area-facing Bioinformatics research scientists, to leverage common data models and to support GRC bioinformaticians' needs for loading and querying data. Your expertise in PostgreSQL for database management, and in Python and R for scripting and automation, will be crucial in developing and maintaining ETL processes that ensure data quality and integrity.

Top 3-5 skills, experience, or education required for this position:

  • PostgreSQL
  • Bioinformatics datasets (bulk RNA-seq, CRISPR)
  • Python/R
  • Common data models
  • Code management and documentation


Responsibilities:

  • Develop and maintain a functional understanding of the GRC common data models, loading processes, and requirements, and perform accurate and efficient loading of new and historical datasets into the GRC's Omics Data Server.
  • Collaborate with Bioinformatics Engineers to develop and implement additional data loading workflows.
  • Partner with Bioinformatics research scientists to identify, process, and load project data into the common data models.
  • Build and execute ETL processes to integrate non-GRC generated high-value datasets into the common data models.
  • Keep thorough documentation for tracking datasets and loading tasks.
  • Ensure reproducibility and facilitate collaboration with team members by documenting and versioning code with Git.


Qualifications:

  • Bachelor's degree in Computer Science, Bioinformatics, or a related field, plus 3 years of experience.
  • Experience with building and running workflows for RDBMS data loading and ETL processes.
  • Proficiency in PostgreSQL (or equivalent) and the ability to write complex queries for data extraction and analysis.
  • Strong programming skills in Python for scripting and automation. Additional experience with R is preferred.
  • Familiarity with genomic data formats and databases commonly used in Bioinformatics research.
  • Knowledge of data modeling concepts and implementing common data models in a relational database.
  • Familiarity with data cleaning, normalization, and quality control processes.
  • Excellent communication skills and ability to collaborate with researchers and stakeholders.
