Schedule

Get ready for amazing talks
  • Auditorium Precision


  • Model stacking is a proven strategy for achieving high accuracy in machine learning solutions. At first glance, the idea behind stacking looks simple, but there are many pitfalls that can turn a winning solution into a complete disaster when predicting on new data. Techniques, tips and tricks for stacking models will be presented and explored, covering architectures, cross-validation, feature engineering, training algorithms, hyperparameter tuning and previous top Kaggle solutions.
    Machine Learning
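    For readers new to the technique, here is a minimal out-of-fold stacking sketch (assuming scikit-learn and toy data; this is not the speaker's exact setup). Generating the base-model predictions with cross_val_predict keeps the meta-learner from training on leaked labels, which is the classic pitfall that makes a stacked model collapse on new data.

    ```python
    # Minimal stacking sketch: level-0 models produce out-of-fold
    # probabilities, which become features for a level-1 meta-learner.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    X, y = make_classification(n_samples=500, random_state=0)

    base_models = [RandomForestClassifier(random_state=0),
                   GradientBoostingClassifier(random_state=0)]

    # Out-of-fold predictions: each sample's meta-feature comes from a
    # model that never saw that sample during training.
    meta_features = np.column_stack([
        cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
        for m in base_models
    ])

    # A simple meta-learner combines the base-model predictions.
    meta_model = LogisticRegression().fit(meta_features, y)
    print(meta_model.score(meta_features, y))
    ```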

  • Comtravo's data science team assists travel agents by automating steps in the offer and booking process for a given travel request. One of these automation steps is extracting all the necessary and relevant elements of a booking request sent via email. To handle email bookings we developed a complex NLP pipeline, which analyses incoming booking requests by identifying and classifying all the necessary elements in the request in order to automatically generate offers that can fulfil it. The pipeline is composed of several modules. First, a custom tokenization process is applied and the tokens are stored in a memory-efficient representation of the message. Then we perform a simple information extraction task, recognizing a custom set of named entities in the message. Next, we apply several message-level classifiers to identify whether the message is in fact a booking request and what types of travel items are present (e.g. hotel, flight, train). Knowing the scope of the message and its named entities, we apply a slot-labeling module which assigns each named entity to a particular slot (e.g. destination, origin, departure time). Finally, a semantification process grounds the entities: airports are assigned IATA codes, time expressions are resolved to a date or time range, and train stations are mapped to IBNR codes. In my presentation I will describe this pipeline, giving an overview of our NLP stack and pointing out some of the challenges.
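    The stages above can be illustrated with a toy sketch (all names, the rule-based logic and the lookup table are illustrative stand-ins, not Comtravo's actual code):

    ```python
    # Toy version of the pipeline: tokenize -> NER -> message-level
    # classification -> slot labeling -> semantification.
    import re

    IATA = {"lisbon": "LIS", "berlin": "TXL"}  # toy semantification table

    def tokenize(text):
        return re.findall(r"\w+", text.lower())

    def extract_entities(tokens):
        # NER stage: recognise a toy set of city names.
        return [t for t in tokens if t in IATA]

    def classify(tokens):
        # Message-level classifier: is this a flight booking request?
        return "flight" if "flight" in tokens else "other"

    def label_slots(entities):
        # Slot labeler: first city -> origin, second -> destination.
        slots = {}
        if len(entities) >= 2:
            slots["origin"], slots["destination"] = entities[0], entities[1]
        return slots

    def semantify(slots):
        # Semantification: ground city names to IATA codes.
        return {k: IATA[v] for k, v in slots.items()}

    tokens = tokenize("Please book a flight from Lisbon to Berlin")
    slots = semantify(label_slots(extract_entities(tokens)))
    print(classify(tokens), slots)
    ```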

  • In this talk, I will present a tutorial on translation quality estimation, a task of growing importance in NLP due to its potential to reduce human post-editing effort in disruptive ways. In particular, I will present Unbabel's quality estimation system, where we achieve remarkable improvements by exploiting synergies between the related tasks of word-level quality estimation and automatic post-editing. First, we stack a new, carefully engineered neural model into a rich feature-based word-level quality estimation system. Then, we use the output of an automatic post-editing system as an extra feature, obtaining a new state of the art for word-level and sentence-level quality estimation. I will end with some thoughts about future work in this area.
    Deep Learning
    Machine Translation
    NLP

  • Time series problems provide some of the most fascinating challenges we encounter in our day-to-day jobs as data scientists. Traditionally, we have approached them through the lens of moving-average-based techniques such as ARIMA models. However, we can also harness the power of machine learning to capture non-linearities within time series that were previously impossible to diagnose or address. In this talk I'll discuss how good exploratory data analysis and virtually any regression machine learning model can easily supersede the canonical methods.
    Big Data
    Causality
    Statistics
    Time Series
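    The core trick is reframing forecasting as supervised regression on lag features. A minimal sketch (scikit-learn and a synthetic series assumed; not the speaker's exact workflow):

    ```python
    # Turn a series into a supervised problem: each row holds the
    # previous n_lags values, the target is the next value.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    t = np.arange(300)
    series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(300)

    n_lags = 24
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]

    # Chronological split: train on the past, test on the future.
    split = 250
    model = RandomForestRegressor(random_state=0).fit(X[:split], y[:split])
    preds = model.predict(X[split:])
    print("test MAE:", np.abs(preds - y[split:]).mean())
    ```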

  • At the beginning of 2018, the Justice Department of the United States of America charged more than a dozen Russian citizens with interference in the 2016 US elections. Some months later, two professors at Clemson University fetched around 3 million tweets associated with Russian trolls and made them available to the worldwide public with the help of FiveThirtyEight. What did they tweet about? Can we understand how they operated within both the Democratic and Republican publics? Armed with one of the biggest datasets on Twitter trolls so far and with the help of topic analysis, we're going to try to understand how these trolls behaved and evolved through time.
    NLP
    Topic Analysis
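    A minimal topic-analysis sketch with LDA (scikit-learn assumed; the tweets here are toy strings, not the FiveThirtyEight troll dataset):

    ```python
    # Fit a 2-topic LDA model on a toy corpus and inspect the top
    # words per topic, which reveal what each cluster is about.
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    tweets = [
        "vote election candidate debate",
        "election vote ballot candidate",
        "sports game score team win",
        "team game win score playoffs",
    ]

    counts = CountVectorizer().fit(tweets)
    X = counts.transform(tweets)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

    vocab = counts.get_feature_names_out()
    for k, weights in enumerate(lda.components_):
        top = [vocab[i] for i in weights.argsort()[-3:][::-1]]
        print(f"topic {k}:", top)
    ```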

  • ‘Data scientist’ has frequently been labeled the ‘sexiest job of the 21st century’, following a 2012 article in the Harvard Business Review. This is one of the signs of the increasing popularity of data science and artificial intelligence (terms which are often, and incorrectly, used interchangeably). Two potential side effects of this popularity are a second AI winter and the self-destruction of the data science field. In this talk I will address the latter, focusing on automated machine learning developments, which may ultimately lead to fully autonomous development of machine learning systems and, thus, the end of the data science profession.
    Auto ML
    Machine Learning
    Meta Learning

  • Building a good machine learning model end-to-end requires creativity, but it also involves several tedious, repetitive, time-consuming, or error-prone steps. Several flavors of AutoML -- frameworks to automate building machine learning models -- have been proposed, but they mostly focus on feature generation and frequently ignore meta-data and domain knowledge. In this talk we present an end-to-end data science pipeline automation that includes feature generation, feature selection, sample building, model creation and Hyperband parameter tuning, and even takes advantage of domain and meta-data knowledge. We evaluate several algorithmic choices and show order-of-magnitude benefits versus human-generated models.
    Auto ML
    Machine Learning
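    For context, the successive-halving idea behind Hyperband can be sketched in a few lines: many random configurations start with a small budget, and only the best-scoring fraction survives each round with a larger budget (illustrative only; the evaluation function is a synthetic stand-in, not the system presented in the talk).

    ```python
    # Successive halving: 27 configs -> 9 -> 3 -> 1, tripling the
    # training budget each round.
    import random

    random.seed(0)

    def evaluate(config, budget):
        # Stand-in for "train with this budget and return a validation
        # score"; noise shrinks as the budget grows.
        return -(config - 0.7) ** 2 + 0.01 * random.random() / budget

    configs = [random.random() for _ in range(27)]
    budget, eta = 1, 3
    while len(configs) > 1:
        scores = {c: evaluate(c, budget) for c in configs}
        configs = sorted(configs, key=scores.get, reverse=True)[:max(1, len(configs) // eta)]
        budget *= eta

    print("best config:", configs[0])
    ```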

  • What do computers, cells, and brains have in common? Computers are electronic devices designed by humans; cells are biological entities crafted by evolution; brains are the containers and creators of our minds. But all are, in one way or another, information-processing devices, equivalent to each other, in accordance with the Church-Turing thesis. Recent advances in science and technology, particularly in the fields of Artificial Intelligence, Machine Learning and Neurosciences, could enable us to create digital minds, sometime in the future. These digital minds would reproduce in a computer the intelligent behavior of the human brain, either by direct emulation or by some other, synthetic, approach. If digital minds come into existence, what will be the social, legal, and ethical implications? Will digital minds be our partners, or our rivals?
    Artificial Intelligence
    Machine Learning


  • Auditorium Recall


  • As part of the performance management cycle at Booking.com, employees participate in periodic performance appraisals, where the performance scores for individual employees are calibrated between groups of managers on a department level to promote enhanced transparency and consistency. As a data-driven company, we are using various analytical techniques (from proportions to machine learning) and study design to understand how we can best: 1. Improve the efficiency of these calibration meetings; 2. Create standard measures for our desired results; and 3. Optimise the perceived "fairness" of the overall performance appraisal process.
    Machine Learning

  • Current clinical practices and guidelines rely on population-based trials, from which an average patient is inferred. Although effective on a population basis, this does not guarantee an optimal treatment for each individual. Advances in data processing and AI algorithms provide a unique opportunity to make healthcare personalized. Not only clinical practice but also clinical research has the potential to leap forward, with data-based hypothesis testing and consequent incorporation into clinical practice. However, incorporating AI into clinical practice comes with its own specific set of challenges and dangers. Understanding how to mix and interface human and machine intelligence is key to addressing them. In this presentation, an overview of these challenges, opportunities and specific solutions will be discussed.
    Artificial Intelligence
    Machine Learning

  • There was a time, not long ago, when Data Science flourished and Big Data was everything. Well, no one really knew what it was (just like the teenage sex joke), but you had to have it! Such behaviour created a series of Anti Patterns for Data Science teams: from the cotton candy-powered Data Unicorn™ to the Second-grade Data Plumber™, going through the multi-title Jack of all (Data) Trades™ to the fabulous Trophy Data Scientist. This talk will take you on a journey that hopefully leads to (not another buzzworded concept) DataOps. And with such great knowledge, comes great... technology! We'll also cover the S.M.A.C.K. Stack, because honestly, you wanna do things The Right Way™!
    Big Data
    DevOps

  • In 2017, an average of 200,000 new malware samples were captured each day, an increase of 328%. Cybercriminals have stepped up their game and already use advanced techniques to penetrate organisations' defences and steal critical data, causing millions in losses. Organisations' cyber defence departments have to reinvent their defence mechanisms to keep up with the new threats, and regular SIEMs are no longer sufficient. Adding more bodies to defensive efforts no longer improves defence, due to diminishing returns from increased human labour and manual defensive tactics. This evolving landscape of threats demands innovation: a plethora of new defensive tactics is needed to advance a defensive posture in a challenging and impactful cyber warzone. It is time to bring AI to the fight. During this talk we will explain our journey to expand and enhance our current defences. We have designed and implemented a next-generation data analytics platform which augments the current SIEM by extending its limited storage with a serverless data lake and by strengthening the alarm system with AI capabilities. We will explore the solution's high-level architecture and how we leverage AWS serverless components to build a lake that collects 6 TB of new log data per day from 3 continents, at a rate of 60,000 events per second. Decoupling storage (S3) from processing (Spark) allows each component to scale independently. With this architecture, security analysts can perform forensic analysis over years of historical data without compromising query performance or availability. In addition, AWS SageMaker cloud instances armed with GPUs empower our data scientists to train models on years of historical data in only a few hours. As an outcome, the models deployed to production not only reduced the vast number of false positives compared to the rule-based SIEM alerting system, but also managed to identify new threats.
    Artificial Intelligence
    Big Data
    Cyber Security
    Serverless
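    A back-of-the-envelope check on the ingest figures quoted above (6 TB of new logs per day at 60,000 events per second; decimal TB assumed):

    ```python
    # Derive the average event size and sustained ingest rate from the
    # quoted daily volume and event rate.
    events_per_day = 60_000 * 86_400          # 86,400 seconds in a day
    bytes_per_day = 6 * 10**12                # 6 TB (decimal)

    avg_event_size = bytes_per_day / events_per_day
    ingest_rate_mb_s = bytes_per_day / 86_400 / 10**6

    print(f"~{avg_event_size:.0f} bytes per event, "
          f"~{ingest_rate_mb_s:.0f} MB/s sustained")
    ```

    So each log event averages roughly 1.2 KB, and the lake sustains roughly 69 MB/s of writes around the clock.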

  • In this talk, I will describe our work on character-based models for natural language processing. The talk will be divided into two parts. In the first part, I will describe our previous work on character-based word representations, where the syntax and semantics of word-level units are learned solely from the character level. We show that these models can automatically learn morphologically oriented features that would be difficult to discover and engineer manually, and they yield significant improvements in many natural language processing tasks. In the second part of the talk, I shall describe our ongoing work on character-based language generation and word segmentation/discovery. I will discuss the challenges involved in building character-level generation models, and then describe our incremental research contributions towards models that can not only automatically generate language but also jointly learn word segments.
    Deep Learning
    NLP