Boston Data Festival
Boston Fall Data Festival
Monday November 4 - Sunday November 10

Boston's first ever Data Festival brings together the meetup community, entrepreneurs, VCs and others to highlight our data-centric scene. Metro Boston is wonderfully diverse, with some of the best minds, universities, and companies globally.


Program Schedule
Monday November 4
06:00 PM Dr. William Kahn – Keynote – Hyatt Regency Cambridge

Dr. William Kahn leads the Analytic Capabilities team in the Science function at American International Group, a $70+ Billion financial services company. Previously, he led the statistical team at Capital One. His talk is titled “Ten challenges for the next generation data scientist from a last generation statistician.” [RSVP]

07:00 PM Boston All-Star Data Panel – Hyatt Regency Cambridge

A data all-star panel drawn from the worlds of venture capital, academia, and industry. Panelists include Chris Lynch, Partner at Atlas Venture; Sam Madden, Professor and Lead of BigData@CSAIL & MIT; and Dr. Willard (Bill) Simmons, Co-Founder and CTO of DataXu. [RSVP]

08:00 PM Network Social

Connect with fellow data enthusiasts from across the greater Boston area. [RSVP]

Tuesday November 5
05:45 PM Q&A with Adam Broun, formerly CTO in Residence at Fintech Innovation Lab @ Plastiq

Hosted by the Boston FinTech meetup, this event features a Q&A with Adam Broun, currently of Kensho Finance and formerly CTO in Residence at Fintech Innovation Lab and Managing Director and CIO of Front Office systems at Credit Suisse. [RSVP]

06:00 PM Mining Highly Imbalanced Data @ hack/reduce

David Weisman gives a talk titled “Mining Highly Imbalanced Data.”
Constructing classifiers from imbalanced data is fascinating from both theoretical and practical perspectives. Validating classifiers is also challenging with imbalanced data, as a trivial model that always predicts the majority class will superficially appear accurate. We’ll survey class imbalance from several perspectives, and investigate successful approaches to constructing classifiers from imbalanced data. [RSVP]
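The accuracy trap described above can be seen in a minimal sketch. The data here is made up for illustration (10 positives among 1,000 examples), and the helper names are hypothetical; this is not taken from the talk itself.

```python
# Illustrative only: synthetic labels, not data from the talk.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive):
    """Fraction of examples of class `positive` that were predicted as such."""
    preds_for_class = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_class:
        return 0.0
    return sum(p == positive for p in preds_for_class) / len(preds_for_class)

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)

# 1% minority class (label 1), 99% majority class (label 0).
y_true = [1] * 10 + [0] * 990

# Trivial model: always predict the majority class.
y_pred = [0] * len(y_true)

print("accuracy:         ", accuracy(y_true, y_pred))
print("minority recall:  ", recall(y_true, y_pred, positive=1))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

The trivial model scores 99% accuracy while finding none of the minority examples, which is why per-class measures such as recall or balanced accuracy are often preferred for validation on imbalanced data.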

06:00 PM Using Data & Analytics to Solve Marketing’s Toughest Challenges @ DataXu

Join DataXu co-founder and SVP of Analytics and Innovation Sandro Catanzaro as he looks at how data and analytics are disrupting the marketing industry. We’ll look at how these new technological advances turn advertising and marketing into real-time, always-on market research, and how marketers and brands are adapting to this new normal. Stick around following the talk for snacks and networking! [RSVP]

07:20 PM Schemaless SQL: Easily Question All Data Types Through One Familiar Interface @ hack/reduce

In this session you will learn how to dramatically reduce the complexity of multi-structured data analysis on Hadoop and accelerate time to insights. Key concepts in this talk will include:

    – Creating a queryable view of not only traditional structured data, but also non-relational data such as text, documents and key-value pairs
    – Identifying and presenting dynamically changing attributes within data, thereby dramatically reducing ETL
    – Enabling analysts to run Machine Learning algorithms over multi-structured data in parallel as SQL functions


08:20 PM Secure, Scalable NoSQL for Real-Time Apps with Apache Accumulo @ hack/reduce

Adam Fuchs presents his talk titled “Secure, Scalable NoSQL for Real-Time Apps with Apache Accumulo”
Data volumes and security requirements present serious challenges for real-time applications. Apache Accumulo enables online model building and dynamic indexing to support both retrospective analysis and enrichment of streaming data. These mechanisms are built on a foundation of fine-grained access control, supporting a bloom of innovative applications without sacrificing security. This talk will outline the framework that we use to support secure, scalable real-time analysis, as well as dive deeper into many of the supporting features of Accumulo. [RSVP]

Wednesday November 6
06:00 PM Seminar: How Text Becomes Data @ hack/reduce

People and businesses want to make decisions based on large amounts of quantifiable data. This talk by Rob Speer will show you how to create text models that can be built into useful tools such as search engines, recommender systems, and classifiers. [RSVP]

06:30 PM Data Vis 101: Principles for Design @ CIC

Our speaker, Lynn Cherny, has condensed an intro workshop into 45 minutes and will review the principles for successful design with data, including tips on visual encodings, story-finding, and principles for developing exploratory or explanatory visualizations. We’ll look at a couple of redesigns and award winners, plus maybe a few #WTFvis examples along the way. [RSVP]

07:00 PM Predicting Diabetes from a Relational Database @ hack/reduce

Jeremy Achin presents “Predicting Diabetes from a Relational Database of Medical Records.” The theme here is “how to build highly accurate predictive models when your data is messy and spread out across many data sources.” [RSVP]

07:30 PM Humanizing Big Data with Data Visualization @ CIC

Mark Schindler presents his talk – ‘Data visualization is the human front-end of big data’. In order for people to solve problems and make decisions using insights drawn from big data, they need a clear understanding of the stories that are often buried. How can UI designers and data visualization practitioners help make those insights understandable and useful to decision-makers? We all deal with the challenge of how to identify meaningful objects or events in a raw datastream, and present those events to users in a way that provides context and helps them get a qualitative understanding of what is going on. We’ll look at approaches to accomplishing this, and how techniques like visual abstraction, attention-management and metaphor can help. [RSVP]

Thursday November 7
06:00 PM Deep Learning – The Future of Machine Learning and AI @ Fidelity Auditorium

An overview of Deep Learning (the future of machine learning and artificial intelligence).
Speakers: Dallin Akagi and/or Alec Radford. Dallin worked on deep learning at the NSA for the last 2 years, while Alec is an accomplished Kaggler and an expert at applied deep learning. [RSVP]

06:30 PM Ignite Data Boston @ hack/reduce

“Enlighten us, but make it fast”
Featured in various cities all over the country, Ignite presentations give experts, professionals, and just plain geeks the chance to share their passions with an audience. What’s the twist? Each presentation contains only 20 slides that auto-advance every 15 seconds, leaving presenters with a strict five-minute limit. Ignite Data Boston will give attendees the opportunity to see some of the diverse and interesting data projects going on in the Boston area. [RSVP]

07:00 PM Predicting Stock Prices with Maximum Accuracy @ Fidelity Auditorium

This talk will be presented by Sergey Yergenson. Sergey, currently ranked 11 out of over 100,000 Kaggle users, most recently placed 2nd out of 448 competitors in a Kaggle stock price prediction competition. Sergey will show how to predict stock price movement by walking through his 2nd-place solution to the competition. [RSVP]

Friday November 8
03:30 PM MATLAB Workshop @ CIC

Todd Atkins will lead this MATLAB Workshop, Friday 3:30 to 5:30pm at the CIC. [RSVP]

06:00 PM Data Science Careers @ hack/reduce

Data Scientists Panel Event. Panelists include:

  • Andy Palmer, Co-Founder at Data Tamer
  • John Piekos, VP of Engineering at VoltDB
  • Catherine Havasi, Co-Founder and CEO of Luminoso
  • Chris Rocca, VP of Engineering at Hadapt

Network with recruiters from sponsoring companies before and after the panel discussion. [RSVP]

    06:00 PM API Tutorial with Use Case (Boston’s Meetup Ecosystem) @ Microsoft NERD

    APIs continue to grow in numbers as evidenced by the website the Programmable Web. The night will have two parts: first an example use case, Boston’s Meetup Ecosystem, (which used API generated data) will be presented, and then a hands-on tutorial will be given on how to pull data from an API towards using it for analysis, mashups, etc. [RSVP]

    07:00 PM Data-centric Startup Showcase @ hack/reduce

    Data-centric startup showcase highlighting innovative companies doing cool things with data: featuring Outbrain, Luminoso, JAZE, and Nutonian. [RSVP]

    08:00 PM Data Social @ Hack/Reduce

    Data (Speed) Dating: pitch your data-centric company or idea and find potential team members. [RSVP]

    Saturday November 9
    12:00 PM CoderDojo @ hack/reduce

    Hack/Reduce hosts a coding workshop for local high school students.

    01:00 PM Beginner R Workshop @ CIC

    This 3-hour workshop is focused on learning R. In recent years, R has become an essential tool for data mining, machine learning, predictive analytics, and more traditional statistical methods. Facebook, Google, and the New York Times use R for complex data analysis and information visualization.  Numerous employers are actively seeking strong R skills.

    This workshop is designed to get folks up and running with R. We will cover basic R usage including data management, simple statistical analyses, and data visualization. After completing this hands-on workshop, you will have a solid foundation for moving onto more extensive analysis in R. [RSVP]

    [See also our 2-part Intermediate R workshop on Sunday at 10 AM]

    Sunday November 10
    09:00 AM Pre-Hackathon MATLAB Workshop @ hack/reduce

    Jiro Doke will lead this pre-hackathon MATLAB Workshop, Sunday 9:00 to 10:00am. The Sunday session will be a consolidated version of the Friday one. [RSVP]

    10:00 AM Stock Prediction Hackathon @ hack/reduce

    Boston Data Festival is hosting a Hackathon on Sunday 11/10/13 from 10 am to 6 pm. The event will take place at Hack/Reduce (275 Third Street, Cambridge, MA). The goal of the Hackathon is to predict the direction of stock price movements as accurately as possible.
    The following cash prizes will be awarded!

      1st place: $500
      2nd place: $250
      Best submission using MATLAB: $250
      Self-respect from doing better than a monkey throwing darts: Priceless.

    During the Hackathon every participant can use a free MATLAB license. [RSVP]
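For readers unfamiliar with the term, directional accuracy can be sketched as the fraction of periods in which a predicted up/down move matches the actual move. The example below is only an illustration under that assumption: the prices are invented, the function names are hypothetical, and the Hackathon's actual scoring rule is not specified here. The random guesser stands in for the dart-throwing monkey.

```python
import random

def directions(prices):
    """Return +1 where the price rose from the previous step, else -1."""
    return [1 if later > earlier else -1
            for earlier, later in zip(prices, prices[1:])]

def directional_accuracy(actual_prices, predicted_dirs):
    """Fraction of steps where the predicted direction matches the actual one."""
    actual = directions(actual_prices)
    if len(actual) != len(predicted_dirs):
        raise ValueError("need one predicted direction per price step")
    hits = sum(a == p for a, p in zip(actual, predicted_dirs))
    return hits / len(actual)

# Invented closing prices; actual moves are up, down, up, up, down.
prices = [100.0, 101.5, 101.0, 102.2, 103.0, 102.4]

# A hypothetical model's predicted directions (it misses the fourth move).
model_guess = [1, -1, 1, -1, -1]

# Baseline: a "monkey" guessing up or down at random.
rng = random.Random(0)
monkey_guess = [rng.choice([1, -1]) for _ in range(len(prices) - 1)]

print("model: ", directional_accuracy(prices, model_guess))
print("monkey:", directional_accuracy(prices, monkey_guess))
```

Over many periods a random guesser is expected to land near 50%, which is the bar a submission would need to clear.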

    10:00 AM Programming with Data: Intermediate R and Matrix Decomposition @ CIC

    This workshop consists of two 90-minute segments, “Intermediate R” and “Matrix Decomposition”.

    In “Programming with Data: Intermediate R” we will teach you to start thinking like an R programmer. We will cover topics including advanced data I/O, vectorized computation, _apply_ functions and the map-reduce pattern, table reshaping and pivoting, string manipulation, and coding patterns for data visualization with ggplot2. The Intermediate R workshop assumes some experience with R programming. Advanced statistical knowledge is not required for this workshop.

    Matrix Decomposition is the follow-up to the Boston Data Mining Meetup talk “Mining the Matrix (Decomposition).” The workshop doesn’t assume any prior knowledge of matrix decomposition and will start with a recap of matrix decomposition and the examples from the talk, making it accessible to a general audience. We will cover how to perform the decompositions in R and focus on how to interpret the resulting factors as actionable information. In addition to a number of toy examples, we will cover a Netflix-style movie genre discovery problem, and a stock market trend problem. Requirements: a laptop with a working R install and a basic understanding of the R environment. [RSVP]

    Speaker Lineup

    Dr. William Kahn


    Dr. William Kahn leads the Analytic Capabilities team in the Science function at American ...

    Prof. Samuel Madden, MIT

    Data All-Star Panelist

    Dr. Samuel Madden is an associate professor of electrical engineering and computer science ...

    Chris Lynch, Atlas Venture

    Data All-Star Panelist

    Christopher Lynch – Chris Lynch is a Partner at Atlas Venture, focusing on Big Data ...

    Dr. Bill Simmons, DataXu

    Data All-Star Panelist

    Dr. Willard (Bill) Simmons, Co-Founder, Chief Technology Officer- the brain behind ...

    Gregory Piatetsky-Shapiro, Ph.D.

    Moderator for All-Star Data Panel

    Gregory Piatetsky-Shapiro, Ph.D., is the Editor of, a co-founder of KDD ...

    Andy Palmer

    Data Science Panel

    Co-Founder & Interim CEO, Data-Tamer. Andy Palmer is a serial entrepreneur who has ...

    Prof. David Weisman

    Mining Imbalanced Data

    David Weisman is a data scientist consultant with over 35 years of experience in the ...

    Dallin Akagi, DataRobot

    An Overview of Deep Learning: ML and AI

    Dallin Akagi is working on a new predictive analytics platform for a data science start-up ...

    Dr. Catherine Havasi

    Seminar: How Text Becomes Data

    Dr. Catherine Havasi has been researching language and learning for nearly fifteen years. ...

    Sandro Catanzaro
    DataXu co-founder and SVP of Analytics and Innovation.  Join Sandro as he looks at how ...

    Mark Schindler

    Humanizing Big Data with Data Visualization

    Mark Schindler is co-founder and Managing Director of, a Cambridge, MA ...

    Lynn Cherny

    Data Vis 101: Principles for Design

    Lynn Cherny is a local data analysis and visualization consultant who works in Python, R, ...

    Rob Speer

    Seminar: How Text Becomes Data

    People and businesses want to make decisions based on large amounts of quantifiable data. ...

    Thaddeus Diamond, Hadapt

    Schemaless SQL: Easily Question All Data Types Through One Familiar Interface

    Thaddeus Diamond is a Sr. Software Engineer at Hadapt. At Hadapt, he has played an ...

    Jeremy Achin

    Predicting Diabetes from a Database

    Jeremy has over 9 years of experience building and implementing predictive models. He is ...

    Dan Gerlanc

    R Workshop

    Dan Gerlanc is an R expert and Founder and Managing Director at Enplus Advisors, a ...

    Michael Schmidt, Nutonian

    Startup Showcase

    Michael Schmidt’s research focuses on “Machine Science” – a direction in ...

    Dr. Matthew Eaton

    R Workshop

    Matthew Eaton, PhD is currently a postdoctoral researcher in computational biology at MIT ...

    Adrienne Cochrane

    Data Social, Coder Dojo

    Adrienne Cochrane is passionate about the Boston tech community and its potential in the ...

    Alec Radford

    An Overview of Deep Learning: ML and AI

    Alec Radford is currently a data scientist at a start-up in Boston and an undergraduate ...

    John Verostek

    Meetup Ecosystem / API Tutorial

    John is a business strategy market research consultant working with startups as well as ...

    Dr. Sergey Yergenson

    Predicting Stock Prices

    Sergey Yergenson is currently ranked 11 out of over 100,000 Kaggle users and recently ...

    Dr. Rani Nelken

    Startup Showcase

    Rani Nelken is Director of Research at Outbrain, where he leads a research team focusing ...

    Chris Rocca, Hadapt

    Data Science Panel

    Chris Rocca has been involved in various facets of data management for over 20 years and ...

    Adam Fuchs, Sqrrl

    Secure, Scalable NoSQL for Real-Time Apps

    Adam Fuchs is the Chief Technology Officer and co-founder of Sqrrl ( ...

    John Piekos, VoltDB

    Data Science Panel

    John Piekos heads up VoltDB’s engineering operations, including product development, QA, ...

    Todd Atkins, MathWorks

    MATLAB Workshop, Friday @ CIC

    Todd Atkins manages the US Educational Technical Marketing team at MathWorks which is ...

    Jiro Doke, MathWorks

    MATLAB Workshop, Sunday @ hack/reduce

    Jiro Doke, Ph.D, joined MathWorks in May 2006 as an Application Engineer. He received his ...

    Kickoff Event
    Join us for our first-ever Boston Data Festival kickoff event.

    Stock Predicting Monkey

    Your main competition at the stock prediction hackathon

    Mr. Stock Predicting Monkey has been confounding Wall St. analysts and critics for years ...