Design and implement distributed data processing pipelines using Spark, Hive, Sqoop, Python, and other tools and languages prevalent in the Hadoop ecosystem. Ability to design and implement end-to-end solutions.
Build utilities, user defined functions, and frameworks to better enable data flow patterns.
Research, evaluate and utilize new technologies/tools/frameworks centered around Hadoop and other elements in the Big Data space.
Define and build data acquisition and consumption strategies.
Build and incorporate automated unit tests, participate in integration testing efforts.
Work with teams to resolve operational and performance issues.
Work with architecture/engineering leads and other teams to ensure quality solutions are implemented and engineering best practices are defined and adhered to.
MS/BS degree in a computer science field or related discipline
6+ years of experience in large-scale software development
2+ years of experience with Hadoop
Strong programming skills in Java, Python, shell scripting, and SQL
Strong development skills around Hadoop, Spark, MapReduce, Hive, and Pig
Strong understanding of Hadoop internals
Good understanding of file formats including JSON, Parquet, Avro, and others
Experience with databases like Oracle
Experience with performance/scalability tuning, algorithms and computational complexity
Experience (at least familiarity) with data warehousing, dimensional modeling and ETL development
Ability to understand ERDs and relational database schemas
Proven ability to work with cross-functional teams to deliver appropriate resolutions
DATA SCIENCE TECHNOLOGIES LLC is an equal opportunity employer inclusive of female, minority, disability and veterans (M/F/D/V). Hiring, promotion, transfer, compensation, benefits, discipline, termination and all other employment decisions are made without regard to race, color, religion, sex, sexual orientation, gender identity, age, disability, national origin, citizenship/immigration status, veteran status or any other protected status. DATA SCIENCE TECHNOLOGIES LLC will not make any posting or employment decision that does not comply with applicable laws relating to labor and employment, equal opportunity, employment eligibility requirements or related matters. Nor will DATA SCIENCE TECHNOLOGIES LLC require in a posting or otherwise U.S. citizenship or lawful permanent residency in the U.S. as a condition of employment except as necessary to comply with law, regulation, executive order, or federal, state, or local government contract.