Name: Harel Ben Attia

Talk Title: River - A data flow management infrastructure

Abstract

From the algorithms at Outbrain's core to internal and customer reporting, the Outbrain backend is a data processing monster. As the company grows, so do our data processing needs, leading to very complex dependencies between the various processes. These dependencies are a growing challenge from both an operational and a development viewpoint. The Outbrain River infrastructure was created to address this challenge.

Outbrain River provides the following major features:

  • Declarative job definitions
  • Event-driven dependency management
  • Decentralized development of data flows
  • Ops-level manageability
  • Out-of-the-box support for JDBC and Hive/Hadoop, easily extensible to any other unit-of-work
  • A clear roadmap for distributed processing and high availability
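The abstract does not describe River's actual API, so the following is only a hypothetical Python sketch of what "declarative job definitions" combined with "event-driven dependency management" can look like. `JobDef`, `EventDrivenScheduler` and the example job names are invented for illustration and are not River's interfaces. Each job is declared as data (a name, its upstream jobs, and a description of its unit of work), and a job is triggered only when the "completed" event of its last upstream job arrives, rather than by polling or a fixed schedule.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class JobDef:
    """Hypothetical declarative job definition (not River's real format)."""
    name: str
    depends_on: tuple = ()
    # A real system might describe a JDBC query or a Hive script here;
    # this sketch only stores a descriptive string.
    work: str = ""


class EventDrivenScheduler:
    """Runs each job once all of its upstream jobs have emitted 'completed'."""

    def __init__(self, jobs):
        self.jobs = {job.name: job for job in jobs}
        # Reverse index: upstream job name -> jobs waiting on its completion.
        self.downstream = {name: [] for name in self.jobs}
        for job in jobs:
            for dep in job.depends_on:
                if dep not in self.jobs:
                    raise ValueError(f"{job.name} depends on unknown job {dep}")
                self.downstream[dep].append(job.name)
        self.completed = set()
        self.run_order = []

    def _ready(self, name):
        return all(dep in self.completed for dep in self.jobs[name].depends_on)

    def run(self):
        # Seed the event queue with jobs that have no dependencies.
        queue = deque(name for name, job in self.jobs.items() if not job.depends_on)
        while queue:
            name = queue.popleft()
            if name in self.completed:
                continue
            self.run_order.append(name)  # stand-in for executing job.work
            self.completed.add(name)
            # Handle the "completed" event: wake only the direct dependents.
            for child in self.downstream[name]:
                if child not in self.completed and self._ready(child):
                    queue.append(child)
        return self.run_order


jobs = [
    JobDef("load_clicks", work="hive: load raw clicks"),
    JobDef("load_impressions", work="hive: load raw impressions"),
    JobDef("ctr_report", depends_on=("load_clicks", "load_impressions"), work="jdbc: aggregate"),
    JobDef("customer_report", depends_on=("ctr_report",), work="jdbc: export"),
]
print(EventDrivenScheduler(jobs).run())
```

Because each job only lists its own upstream dependencies, different teams can add jobs independently, which is one way the "decentralized development of data flows" point could be realized. A production system would also need persistence, retries and cycle detection, all of which this sketch omits.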

Outbrain is working on open-sourcing River.

Biography

I am a senior software engineer with 12 years of experience in the field. For the last year I have been working in Outbrain's Data Infrastructure team, and before that I worked at VMware and b-hive Networks. With more than 7 years of experience in large-scale monitoring, extensive OS and networking knowledge, and work on big data infrastructures, I am an "all-around" engineer. Having been part of the software world since the Sinclair Spectrum I had at age 10, and now working with Hadoop and Storm clusters on live systems, I enjoy both the macro and the micro of software engineering.

I am the creator of q, a Linux tool that merges the worlds of Linux and databases by allowing text files to be treated as databases.

Specialties: Monitoring, Performance Analysis, Large-scale designs and topologies, Python and Java, Linux, DevOps

LinkedIn

Twitter @harelba

q

GitHub