The third Y-DATA meetup is fully dedicated to real-world recommender systems at large scale. Following our format, the first talk is given by an expert from Yandex, and for the second, guest talk we are happy to host the amazing Inbar Naor from Taboola.
Talks will be given in English.
Big Thanks to SimilarWeb for hosting us.
Visit the Y-DATA webpage to find out about our data science program at the TAU campus: ydata.co.il
Agenda for the meetup:
18:00 - 18:30 Gathering and Mingling, Snacks & Beer
18:30 - 18:45 Opening words from Y-DATA and our host SimilarWeb
18:45 - 19:30 Large-scale recommender system for algorithm-driven content feed (Andrey Zimovnov, Yandex)
19:30 - 20:15 Lessons from building deep learning recommendation systems (Inbar Naor, Taboola)
"Large-scale recommender system for algorithm-driven content feed"
Yandex.Zen is a personal recommendations service created by Yandex that uses machine learning to build a content feed that automatically adapts to the user's active interests. Content is selected using both classical algorithms and deep learning, which together drive our recommendations engine. Over the course of the talk, Andrey will share his experience working with a large-scale recommender system, starting with tips and tricks for user-item matrix factorization, followed by a dive into neural content representations, and culminating with insights into implementing fast nearest-neighbors search to find the best items to present to our users.
Speaker info: Andrey Zimovnov, Yandex
Andrey Zimovnov graduated from Moscow State University in 2013 with a computer science degree. Andrey is a senior data scientist at Yandex, where he has worked on various machine learning projects involving computer vision, natural language processing and recommender systems, and is currently part of the Yandex.Zen team. Andrey is also a senior lecturer at the Higher School of Economics, one of Russia's top universities, where he teaches courses on machine learning.
"Lessons from building deep learning recommendation systems"
Deep Learning models have been gaining increasing attention in the recommendation systems community, replacing some of the traditional methods. The sparse nature of the problems and the different input types offer unique challenges for feature engineering and architecture planning, in order to balance between memorization and generalization.
During the past 2 years the algorithms team at Taboola moved all of their algorithms to DL. In this talk Inbar will share the lessons the team learned doing so. She'll talk about building NNs with multiple input types (click history, text and pictures); feature engineering in DL; capturing interactions between features; and the way modelling decisions are related to system engineering and research culture.
Speaker info: Inbar Naor, Taboola
Inbar is a Data Scientist at Taboola, where she applies deep learning techniques for content recommendations. In the past, she worked with different types of data, including DNA sequences, neurological recordings, click streams, texts and images. She has an M.Sc. in Computer Science with a focus on machine learning research, and a B.Sc. in Computer Science and Cognitive Science. In her spare time she hosts Unsupervised, a podcast about data science in Israel, and is a co-founder and manager of DataHack (a Data Science and Machine Learning Hackathon) and the DataTalk meetup.
https://www.meetup.com/PyData-Tel-Aviv/events/258446789/
Agenda for the meetup:
18:00 - 18:30 Gathering
18:30 - 18:40 A word from our host, SimilarWeb
18:45 - 19:15 Deep Learning for Named Entity Recognition (Kfir Bar / Basis Technology)
19:15 - 19:45 From Spark to Elasticsearch and Back - Learning Large Scale Models for Content Recommendation (Sonya Liberman / Outbrain)
19:45 - 20:15 Shaky Ground (truth): Learning with Label Noise (Yaniv Katz / SimilarWeb)
Deep Learning for Named Entity Recognition (Kfir Bar)
Named Entity Recognition is one of the key tasks in commercial Natural Language Processing applications. Its objective is to identify named entity mentions, such as people, organizations, and locations, in running text. State-of-the-art approaches are purely data-driven, leveraging deep neural networks. In this talk, I will present a few of those works, followed by a description of our own deep NER implementation, based on TensorFlow. We'll look at accuracy, speed, and memory footprint, while comparing some of the best known deep architectures with a basic statistical approach.
From Spark to Elasticsearch and Back - Learning Large Scale Models for Content Recommendation (Sonya Liberman)
Serving tens of billions of personalized recommendations a day under a latency of 30 milliseconds is a challenge. In this talk, I'll share our algorithmic architecture, including its Spark-based offline layer and its Elasticsearch-based serving layer, which enable running complex models under difficult scale constraints and shorten the cycle between research and production.
Shaky Ground (truth): Learning with Label Noise (Yaniv Katz)
Labeled data containing incorrect labels, termed label noise, has gained much attention in machine learning research due to its adverse impact on supervised models. This effort has increased in recent years, as the usage of larger data sets, which are more prone to label noise, has become prevalent. To tackle this problem, studies have explored the sensitivity of the learning process to label noise and devised robust methodologies to overcome it. This talk covers basic concepts in label noise research and explores suggested approaches for overcoming its negative effects. It also showcases two practical examples of easy-to-use methods which were tested on training sets contaminated by label noise and by target value noise.
Tech Talk at The Top is back! (Ohhh Yesss)
This time, we are happy to host the Rust TLV event in our office.
What's the plan?
~Two exciting talks
18:00 - Gathering, Pizza and Beer
18:30 - Rust - a (Pre)production Story by Yoav Yanilov
19:00 - Developing Redis Modules in Rust by Gavrie Philipson
Rust - a (Pre)production Story by Yoav Yanilov
In this talk, I'll walk you through my first experience with Rust: writing a drop-in replacement for an IO-bound and CPU-intensive production service that performs parallel in-memory aggregations on high-volume compressed text.
We'll see how the Rust implementation reduced memory footprint and improved CPU utilization. We'll also highlight some of the popular 3rd-party crates I used in the process (tokio, hyper, and rusoto), and examine a quirk or two.
Yoav is a software developer at SimilarWeb, working on the B2B platform backend, where he likes to tackle infrastructure and design challenges.
Developing Redis Modules in Rust by Gavrie Philipson
Redis is well known and loved in the open source world. It has long been possible to write modules for Redis, but so far this has been done mostly in C. This is due to Redis itself being written in C, and its API being in the form of a C header file.
It's not easy to write code that is both memory-safe and performant, which is what Rust excels at. This makes it a natural match for writing modules for Redis: they run in the same memory space as Redis, and a bad memory access can crash the whole process.
Writing bindings between Rust and C code is not hard. Our main challenge has been (and still is) to come up with a clean, safe and idiomatic Rust API that hides all the ugly stuff and allows easily writing modules.
Gavrie's current role is Cluster Architect at Redis Labs. He has been hacking away for far too many years at a variety of startup companies. He is on a never-ending quest for the ultimate programming language; recent favorites have included Kotlin, Go and Python. Since starting with Rust he hasn't looked back.
A short 20-30 minute lecture about career development, followed by workshops ("round tables")
Women in Hi-Tech was created with the purpose of advancing and empowering professional women in the high-tech community in Israel.
Women in Hi-Tech was born in order to fill the gap by utilizing the "sisterhood" to support, empower and achieve personal growth, by the community for the community.
Today the Women in Hi-Tech community includes 7000+ professional women in different roles and fields, and promotes sharing ideas and open conversation about pressing subjects like career development, leadership in the boys' club, professional challenges and juggling work-life balance from a woman's point of view.
Part of being a leader means being willing to continuously grow and learn. Knowledge can come from many different sources: lecture videos, books, blog posts from all over the world and more. The most valuable source of information can be as simple as occasional small talk at a conference or meetup, where you meet people who have encountered the same challenges and learn how to apply an architecture, tool or process in a much better way from someone who has experimented with it from a completely different angle. Those talks happen spontaneously, but if you place great developers in the same location, great discussions will just happen.
If you are passionate about continuously improving your BI Development skills and knowledge, you are welcome to join the BI Development Leaders Forum kickoff. This forum is all about sharing our challenges and having a fruitful discussion. In order to do that, this forum will be limited to 20 people so we can sit at a round table and talk.
If you want to lead one of the topic discussions or have an idea for another topic, let us know.
Enum Converter by Nitzan Ohana
BusBoardIL by Guy Sheffer
RaspberryPi FullPageOS by Guy Sheffer
After your RSVP please fill this form - https://docs.google.com/forms/d/e/1FAIpQLSdkNS0OSzN1bI6-JpKo_THcv9tvWnFi5yumP_cHaPpZXx-79g/viewform
Coding Meetups provide a great way for developers to interact, network and learn from each other, while contributing to the community.
It's essentially ~3 hours of doing good by giving back to the open source community and writing code together to fix bugs, add new features, etc.
This event is FOR ANYONE who likes to code. Open source by definition means getting help from people like you!
There is room for only 50 people so if you RSVP make sure you do arrive since you are taking someone else's place.
18:00 - 18:20 - Make sure you are in the right place and start to mingle.
18:20 - 18:30 - Presenting all the projects.
18:30 - 20:45 - Let the coding begin!
20:45 - 21:00 - Meeting summary
Join our tech talk about the next Big Data Revolution! We are excited to invite you to hear how to reduce time to insight, reduce costs, improve performance, and learn how big organizations scout and adopt new technology.
Networking, beers, and pizza
/\\ A journey of reducing queries performance, from minutes to seconds /\\
By Ido Senesh - Senior Software Engineer @SimilarWeb
Ido will share his experience, difficulties, and thoughts from working on a POC at SimilarWeb aimed at reducing query times, using Spark to reshape the data and Varada DB for fast query execution.
/\\ A DBA, an architect, and an analyst walk into a bar, Varada in-line indexing /\\
Varada enables business agility by resolving the constant trade-offs data teams face, combining innovations in storage and computing. Varada is an SQL data analytics tier that runs in your account and on any dataset in your data lake. Zoom in on a dataset, and Varada will index it on the fly. Query any dimension from a unique on-demand, SSD-based distributed cluster architecture that self-adapts to the data schema and the usage workload, removing the need to prepare and model the data.
The purpose of this talk is to describe how in-line indexing, combined with a synchronized materialized view on an all-flash array, can address many data preparation challenges.
About the speakers:
Ori Reshef: Throughout Ori’s career, his focus has been on leveraging data, particularly customer-generated, brand-generated, and operational data, to build solutions that improve a brand’s customer lifetime value (LTV) through a better understanding of customer needs and increased sales, retention and efficiency. Ori is excited to combine this expertise with Varada’s groundbreaking technology and an innovative approach to data.
David Krakov: David is the co-founder and CTO of Varada. With product building as his core skill, David has led teams and designed software of almost every kind, from autonomous firmware devices to distributed high-performance storage clusters. At Varada, he uses his data skills and memes to help analysts be agile and data architects to be loved.
David holds a number of patents and an M.Sc in computer science.
Panel - The rise of the Data Platform team
Eran Vanounou - CEO, Varada (ex-Global CTO & GM, LivePerson)
Ziv Peled - VP Global Client Services & Product, AppsFlyer
Barak Gitsis - Head of Big Data Engineering, SimilarWeb
Ori Reshef - VP Product, Varada (Moderator)
The meetup will take place at the SimilarWeb Offices in the Azrieli Sarona building, on the 42nd floor. If you are coming by car, there are multiple parking options in the area (e.g. the Millennium Parking Lot, which costs 20 NIS).