September 19th: Recent Advances in Natural Language Processing With (Some) Theory & Applications

Presented by Joachim Rahmfeld, Ph.D.

Lecturer at UC Berkeley

Today’s vast volumes of language-based, unstructured electronic data, found in forms such as blogs, email, social media, and websites, promise competitive advantages for those who can extract business-relevant information most effectively and efficiently. Natural Language Processing (NLP) holds the key to leveraging this type of data in order to answer important questions such as, “What is the perception of my company in the Twittersphere?”, “Are these reviews largely positive or negative?”, “What is the translation of this into Spanish?”, “This document has the answer to my question somewhere… but where?”, and “Are these products similar?”

Particularly over the last five years, revolutionary advances in NLP-related Deep Learning algorithms, combined with dramatically increased compute resources, have created unprecedented opportunities to answer natural language questions not only with greater accuracy and speed, but often with significantly less training data.

In this session, we will provide an overview of these advances and how you can make use of them. A particular focus will be transfer learning, in which large pretrained models, such as Google’s BERT model, can be leveraged for a broad set of NLP use cases, delivering excellent results and enabling you to answer questions that could not be answered just a few years ago.

So this is a great time to learn about recent NLP developments and go through explicit examples.

September 4th Happy Hour: Making Data Lakes more Reliable with Apache Spark and Delta Lake

Presented by Tathagata Das, Databricks

Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions and scalable metadata handling, and unifies streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs. In this talk, we will cover:

• All technical aspects of Delta Lake features
• What’s coming
• How to get started using it
• How to contribute

July 2019: Data Management From A Penetration Tester’s Perspective

Presented by John Stephens, CISSP

Managing Partner, Luminant Digital Security

Data Management from a Penetration Tester’s Perspective – Zero Trust and Compliance

It’s pretty much a daily occurrence that we hear about some vulnerability, hack, or breach resulting in the disclosure of what seems increasingly to be hundreds of thousands or millions of records. And if that wasn’t enough, we now regularly hear about some city that opted to pay hundreds of thousands of dollars in ransom. Now, we could spend all day talking about all the things that went wrong to get to this point. That could include security patching, application development, system configuration, etc. One item that’s often overlooked is Data Management and its impact on security. In nearly every hack or breach, the ultimate goal of the attacker is to get to the data so it can be monetized. So how you manage the data is critical.

This presentation is designed to give you insight into how attacks are executed, the tools and tricks the attackers use, and how data management can play a role in minimizing the damage when a breach occurs, or perhaps stopping it altogether. This effort can be significantly enhanced by adopting a zero trust approach to data access and backups. It can be significantly hindered by checklist “compliance” efforts that are not grounded in security best practices. We’ll talk about these items based on observations and experience from actual Penetration Tests, so you can hear firsthand how data management can play a role in securing your data.

May 2019: Solving Common Data Problems, Methods & Tactics

Presented by Neil Barton

CTO, WhereScape

Neil and his colleagues at WhereScape Consulting are an experienced set of chaps who build (or assist with) data warehouse projects. For this month’s presentation, we have asked Neil to come and talk to us about common data problems he encounters with large data migrations and how he and the team at WhereScape solve them. He will also share his thoughts on identifying and normalizing master data when combining multiple legacy systems. We intend this to be an interactive session, so bring your questions or the data problems you are struggling with and let’s start the discussion.

April 2019: Data Lakes & AutoML

Presented by Jason Robey

Cloud Solution Architect, Data & AI, Microsoft

This talk is a two-parter. Microsoft Cloud Solution Architect Jason Robey joins us on April 18th to share two areas of great interest for our members.

The first area of discussion will be the establishment of a data lake: governance, best practices, and how to keep your lake ecosystem healthy and vibrant. Have you swum in a data lake before? Has your company? No matter what your experience level, this portion of the talk should offer some great tips.

The second half of Jason’s talk will be on automated machine learning (AutoML). Automating the construction and tuning of machine learning models has long been a goal for many analytics teams. Existing AutoML techniques have been remarkably successful in identifying good parameters for a given model, sometimes even outperforming humans. AutoML iterates over many combinations of machine learning algorithms and parameters, then selects the best-fit model based on your chosen accuracy metric. Jason will demo some of these experiments during the second half of his presentation.
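
As a rough illustration of the search loop described above, the Python sketch below tries every combination of two toy model types and their parameters and keeps the one that scores best on a validation set. The dataset, the model builders (threshold_model, knn_model), and the auto_select function are hypothetical names made up for this example. Real AutoML services, including whatever is demoed in the talk, are likely to use far more sophisticated search strategies than this exhaustive loop.

```python
from itertools import product

# Toy 1-D dataset (hypothetical): (feature value, binary label) pairs.
TRAIN = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
         (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
VALID = [(0.8, 0), (1.8, 0), (2.2, 0), (3.2, 1), (4.2, 1), (2.9, 1)]


def threshold_model(train, cutoff):
    """Predict 1 when the feature exceeds a fixed cutoff (training data unused)."""
    return lambda x: int(x > cutoff)


def knn_model(train, k):
    """Predict the majority label among the k nearest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return int(votes * 2 > k)
    return predict


def accuracy(model, data):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


# Candidate algorithms, each with a grid of parameter values to try.
SEARCH_SPACE = {
    "threshold": (threshold_model, {"cutoff": [1.0, 2.5, 4.0]}),
    "knn": (knn_model, {"k": [1, 3, 5]}),
}


def auto_select(train, valid, metric=accuracy):
    """Try every algorithm/parameter combination; return the best (score, name, params)."""
    best = None
    for name, (builder, grid) in SEARCH_SPACE.items():
        keys = list(grid)
        for values in product(*(grid[key] for key in keys)):
            params = dict(zip(keys, values))
            score = metric(builder(train, **params), valid)
            # Keep the first candidate that achieves the highest score.
            if best is None or score > best[0]:
                best = (score, name, params)
    return best


if __name__ == "__main__":
    score, name, params = auto_select(TRAIN, VALID)
    print(f"best model: {name} {params} (validation accuracy {score:.2f})")
```

Swapping the metric argument for a different scoring function changes which candidate wins, which mirrors the "chosen accuracy metric" mentioned above.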
