The third Y-DATA meetup is fully dedicated to real-world recommender systems at large scale. The first talk, in keeping with our format, is given by an expert from Yandex, and for the second guest talk we are happy to host the amazing Inbar Naor from Taboola.
Talks will be given in English.
Big thanks to SimilarWeb for hosting us.
Visit the Y-DATA webpage to find out about our data science program at the TAU campus: ydata.co.il
Agenda for the meetup:
18:00 - 18:30 Gathering and Mingling, Snacks & Beer
18:30 - 18:45 Opening words from Y-DATA and our host SimilarWeb
18:45 - 19:30 Large-scale recommender system for algorithm-driven content feed (Andrey Zimovnov, Yandex)
19:30 - 20:15 Lessons from building deep learning recommendation systems (Inbar Naor, Taboola)
"Large-scale recommender system for algorithm-driven content feed"
Yandex.Zen is a personal recommendations service created by Yandex that uses machine learning technology to create a feed of content that automatically adapts to reflect the user's active interests. The selection of content is done through the use of advanced machine learning techniques, employing both classical algorithms and deep learning to drive the recommendations engine. Over the course of the talk, Andrey will share his experience working with a large-scale recommender system, starting with tips and tricks for user-item matrix factorization, followed by a dive into neural content representations, and culminating with insights into implementing a fast nearest-neighbors search to find the best items to present to users.
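The two-stage pipeline the talk covers, factorizing the user-item matrix into embeddings and then retrieving candidate items by nearest-neighbor search over those embeddings, can be sketched roughly as follows. This is a toy illustration only, not Yandex's implementation; all names and parameters here are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy user-item interaction matrix (1 = engaged, 0 = unknown).
R = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
], dtype=float)

n_users, n_items, k = R.shape[0], R.shape[1], 2
U = rng.normal(scale=0.1, size=(n_users, k))   # user factors
V = rng.normal(scale=0.1, size=(n_items, k))   # item factors

# Plain gradient descent on the squared reconstruction error,
# with a small L2 penalty (for brevity, all entries are treated
# as observed; real systems optimize over observed pairs only).
lr, reg = 0.05, 0.01
for _ in range(1000):
    err = R - U @ V.T
    U += lr * (err @ V - reg * U)
    V += lr * (err.T @ U - reg * V)

def recommend(user, topn=2):
    """Exact top-n items by inner product with the user's factor."""
    scores = V @ U[user]
    return np.argsort(-scores)[:topn]

print(recommend(0))
```

At real scale the exact top-n scan is replaced by an approximate nearest-neighbor index over the item embeddings, which is exactly the "fast nearest-neighbors search" part of the talk.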
Speaker info: Andrey Zimovnov, Yandex
Andrey Zimovnov graduated from Moscow State University in 2013 with a computer science degree. Andrey is a senior data scientist at Yandex, where he has been working on various machine learning projects involving computer vision, natural language processing and recommender systems, currently as part of the Yandex.Zen team. Andrey is also a senior lecturer at the Higher School of Economics, one of Russia's top universities, where he teaches courses on machine learning.
"Lessons from building deep learning recommendation systems"
Deep Learning models have been gaining increasing attention in the recommendation systems community, replacing some of the traditional methods. The sparse nature of the problems and the different input types offer unique challenges for feature engineering and architecture planning, in order to balance between memorization and generalization.
Over the past two years, the algorithms team at Taboola has moved all of its algorithms to deep learning. In this talk Inbar will share the lessons the team learned along the way. She'll talk about building neural networks with multiple input types (click history, text and pictures); feature engineering in deep learning; capturing interactions between features; and how modelling decisions relate to system engineering and research culture.
Speaker info: Inbar Naor, Taboola
Inbar is a Data Scientist at Taboola, where she applies deep learning techniques to content recommendations. In the past, she worked with different types of data, including DNA sequences, neurological recordings, click streams, texts and images. She has an M.Sc. in Computer Science with a focus on machine learning research, and a B.Sc. in Computer Science and Cognitive Science. In her spare time she hosts Unsupervised, a podcast about data science in Israel, and is a co-founder and organizer of DataHack, a data science and machine learning hackathon, and of the DataTalk meetup.
https://www.meetup.com/PyData-Tel-Aviv/events/258446789/

Agenda for the meetup:
18:00 - 18:30 Gathering
18:30 - 18:40 A word from our host, SimilarWeb
18:45 - 19:15 Deep Learning for Named Entity Recognition (Kfir Bar / Basis Technology)
19:15 - 19:45 From Spark to Elasticsearch and Back - Learning Large Scale Models for Content Recommendation (Sonya Liberman / Outbrain)
19:45 - 20:15 Shaky Ground (truth): Learning with Label Noise (Yaniv Katz / Similarweb)
Deep Learning for Named Entity Recognition (Kfir Bar)
Named Entity Recognition is one of the key tasks in commercial Natural Language Processing applications. Its objective is to identify named entity mentions, such as people, organizations, and locations, in running text. State-of-the-art approaches are purely data-driven, leveraging deep neural networks. In this talk, I will present a few of those works, followed by a description of our own deep NER implementation, based on TensorFlow. We'll look at accuracy, speed, and memory footprint, while comparing some of the best known deep architectures with a basic statistical approach.
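The task the talk addresses is typically framed as per-token sequence labeling with BIO tags: B-TYPE marks the first token of an entity, I-TYPE a continuation, and O no entity. The deep architectures compared in the talk all produce such a tag sequence; what follows is only a toy illustration of decoding the tags into entity spans, with a made-up example sentence:

```python
sentence = ["Kfir", "Bar", "works", "at", "Basis", "Technology", "in", "Tel", "Aviv"]
tags =     ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "I-LOC"]

def decode_entities(tokens, tags):
    """Collect (entity_text, type) spans from a BIO-tagged sentence."""
    entities, current, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):        # a new entity starts
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)         # continue the open entity
        else:                           # O tag closes any open entity
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities

print(decode_entities(sentence, tags))
# → [('Kfir Bar', 'PER'), ('Basis Technology', 'ORG'), ('Tel Aviv', 'LOC')]
```

In a real system the tag sequence would come from a trained model (the BiLSTM-CRF family is a common choice), not from a hand-written list.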
From Spark to Elasticsearch and Back - Learning Large Scale Models for Content Recommendation (Sonya Liberman)
Serving tens of billions of personalized recommendations a day under a latency of 30 milliseconds is a challenge. In this talk, I'll share our algorithmic architecture, including its Spark-based offline layer and its Elasticsearch-based serving layer, which together enable running complex models under tight scale constraints and shorten the cycle between research and production.
Shaky Ground (truth): Learning with Label Noise (Yaniv Katz)
Labeled data containing incorrect labels, termed label noise, has gained much attention in machine learning research due to its adverse impact on supervised models. This effort has increased in recent years, as the usage of larger data sets, which are more prone to label noise, has become prevalent. To tackle this problem, studies have explored the sensitivity of the learning process to label noise and devised robust methodologies to overcome it. This talk covers basic concepts in label noise research and explores suggested approaches for overcoming its negative effects. It also showcases two practical examples of easy-to-use methods which were tested on training sets contaminated by label noise and by target value noise.
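One basic idea from the label-noise literature is that when the noise rate is known (or estimated), its effect can be corrected for analytically. The toy simulation below, not taken from the talk, shows this for symmetric binary label flips: the accuracy measured against noisy labels understates the true accuracy, but the relation acc_noisy = acc_clean·(1-p) + (1-acc_clean)·p can be inverted to recover it:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 100_000, 0.2                      # samples, symmetric flip rate

y_true = rng.integers(0, 2, n)           # hidden clean labels
flip = rng.random(n) < p
y_noisy = np.where(flip, 1 - y_true, y_true)

# Pretend some classifier predicts the clean label 90% of the time.
pred = np.where(rng.random(n) < 0.9, y_true, 1 - y_true)

acc_noisy = (pred == y_noisy).mean()     # what we can actually measure
# Invert acc_noisy = acc_clean*(1-p) + (1-acc_clean)*p:
acc_est = (acc_noisy - p) / (1 - 2 * p)
print(round(acc_est, 3))                 # ≈ 0.9, the hidden clean accuracy
```

The same inversion idea underlies loss-correction methods for training on noisy labels, one of the families of approaches the talk surveys.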
Tech Talk at The Top is back! (Ohhh Yesss)
This time, we are happy to host the Rust TLV event in our office.
What's the plan?
~Two exciting talks
18:00 - Gathering, Pizza and Beer
18:30 - Rust - a (Pre)production Story by Yoav Yanilov
19:00 - Developing Redis Modules in Rust by Gavrie Philipson
Rust - a (Pre)production Story by Yoav Yanilov
In this talk, I'll walk you through my first experience with Rust: building a drop-in replacement for an IO-bound and CPU-intensive production service that performs parallel in-memory aggregations on high-volume compressed text.
We'll see how the Rust implementation reduced memory footprint and improved CPU utilization. We'll also highlight some of the popular 3rd-party crates I used in the process (tokio, hyper, and rusoto), and examine a quirk or two.
Yoav is a software developer at SimilarWeb, working on the B2B platform backend, where he likes to tackle infrastructure and design challenges.
Developing Redis Modules in Rust by Gavrie Philipson
Redis is well known and loved in the open source world. It has long been possible to write modules for Redis, but so far this has been done mostly in C. This is due to Redis itself being written in C, and its API being in the form of a C header file.
Writing code that is both memory-safe and performant is not easy, and it is exactly what Rust excels at. This makes it a natural match for writing Redis modules: they run in the same memory space as Redis, where a bad memory access can crash the whole process.
Writing bindings between Rust and C code is not hard. Our main challenge has been (and still is) coming up with a clean, safe and idiomatic Rust API that hides all the ugly stuff and makes writing modules easy.
Gavrie's current role is Cluster Architect at Redis Labs. He has been hacking away for far too many years at a variety of startup companies. On a never-ending quest for the ultimate programming language, recent favorites have included Kotlin, Go and Python. Since starting with Rust he hasn't looked back.