We are looking to hire an experienced data engineer who will be responsible for building and continuously evolving our next-generation data platform, which processes high-scale data from RDBMS, APIs, and various other interfacing systems. In this role, you will drive the design and development of key platform components, including the data processing pipeline, feature extraction, data modelling, and optimal integrations with key internal and external business platforms.
Professional & Technical Requirements
Bachelor's degree or higher in computer science/statistics/engineering
2-3 years of experience building and deploying large-scale data processing pipelines in a production environment, using distributed storage platforms such as HDFS, S3, and NoSQL databases
Hands-on experience with distributed processing platforms such as Hadoop, Spark/PySpark, and Hive
Must have worked on end-to-end big data solutions covering data ingestion, data cleansing, ETL, data mart creation, and exposing data to consumers
Ability to handle large, complex data sets from different sources and converge them onto a single compute platform
Should have worked on both batch and real-time data ingestion methodologies for on-premises and cloud-based data storage
Query authoring (advanced SQL) as well as working familiarity with NoSQL databases
Proficient in exchanging data via microservices and API gateways, and across languages (R/Python)
Working knowledge of one or more scripting languages, e.g. R, Python, or Scala
Familiarity with Unix commands and basic working experience with Unix shells and servers
Strong technical and analytical aptitude; Excellent communication skills