AT&T NYC Research Seminar Series


Scope: The AT&T NYC Research Seminar Series is organized by members of the AT&T Labs Big Data Research organization in our NYC offices. Our research spans a wide range of computer and data science topics, including databases, machine learning, networking, security, statistics, and data visualization. Much of our work is motivated by the massive data sets generated by our network each day. Understanding the dynamics of this data helps AT&T better serve its customers, improve its network, and develop new products and services.

Attending: Seminars are open to all AT&T employees as well as external visitors. However, as seating is limited, we require guests to RSVP (links included below for each talk) at least one day before the event. A photo ID is required and must be presented to security upon entering the building. All events will take place at 33 Thomas Street, New York, NY 10007 (see below for map, directions, and some history on the building without windows).

Upcoming Seminars

March 29, 2017 @ 11:00am (Refreshments @ 10:30am)

Ryan Hafen, Hafen Consulting, Purdue University


Modern Approaches to Data Exploration with Trellis Display

Abstract: "Trellis Display", also known as "small multiples" display, is a commonly used exploratory visualization technique in which data is split into groups, a plot is made for each group, and the resulting plots are arranged in a grid. This approach is very simple, yet data visualization experts consider it "the best design solution for a wide range of problems in data presentation" because it can display data in more detail and effectively elicit comparisons across groups. Historically, small multiple display systems have presumed that the data is small enough to be presented in a single static display. Larger and more complex datasets call for a small multiple display system that brings displays to life by letting users interactively navigate a potentially extremely large number of plots. This interactive navigation is enabled through the use of "cognostics": interesting summary statistics automatically computed for each group. In this talk, I will cover some history of small multiple displays, the principles that make them useful in exploratory data analysis, and recent work toward interactive, scalable small multiple displays that can be used to efficiently explore large data sets in detail. I will demonstrate the new TrelliscopeJS R package and show how it can plug into common data analysis workflows to easily create interactive small multiple displays.
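
The core idea of splitting data into groups and ranking the resulting panels by per-group summary statistics can be sketched in a few lines. This is a minimal illustration in plain Python, not the TrelliscopeJS API (which is an R package); the function names `split_by_group`, `cognostics`, and `panel_order` are hypothetical, and the plotting step itself is omitted.

```python
from collections import defaultdict
from statistics import mean, stdev


def split_by_group(rows, key):
    """Partition records (dicts) by the value of `key`; each group becomes one panel."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def cognostics(groups, value):
    """Compute simple summary statistics ("cognostics") of `value` for every group."""
    cogs = {}
    for name, rows in groups.items():
        xs = [row[value] for row in rows]
        cogs[name] = {
            "n": len(xs),
            "mean": mean(xs),
            # stdev() needs at least two points; report 0.0 for single-row groups
            "sd": stdev(xs) if len(xs) > 1 else 0.0,
        }
    return cogs


def panel_order(cogs, metric, descending=True):
    """Return panel names sorted by one cognostic, e.g. most variable groups first."""
    return sorted(cogs, key=lambda name: cogs[name][metric], reverse=descending)


rows = [
    {"city": "A", "temp": 1.0},
    {"city": "A", "temp": 3.0},
    {"city": "B", "temp": 5.0},
    {"city": "B", "temp": 5.0},
    {"city": "C", "temp": 0.0},
]
cogs = cognostics(split_by_group(rows, "city"), "temp")
print(panel_order(cogs, "mean"))  # ['B', 'A', 'C']
```

In an interactive display, a cognostics table like this would drive sorting and filtering of the panel grid, so a viewer can jump straight to, say, the most variable groups instead of paging through every plot.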

Bio: Ryan Hafen is a statistical consultant and remote adjunct assistant professor at Purdue University. Ryan's research focuses on methodology, tools, and applications in exploratory analysis, statistical model building, statistical computing, and machine learning on large, complex datasets. He is the developer of the datadr and Trelliscope components of the DeltaRho project, as well as the rbokeh visualization package, and has developed several other R packages. Prior to his work as a consultant, Ryan worked at Pacific Northwest National Laboratory, where he analyzed large, complex data spanning many domains, including power systems engineering, nuclear forensics, high energy physics, biology, and cyber security. Ryan has a B.S. in Statistics from Utah State University, an M.Stat. in Mathematics from the University of Utah, and a Ph.D. in Statistics from Purdue University.

April 7, 2017 @ 11:00am (Refreshments @ 10:30am)

Matt Taddy, University of Chicago, Microsoft Research


Economic Artificial Intelligence

Abstract: We are in the middle of a remarkable rise in the use and capability of artificial intelligence systems. Much of this growth has been fueled by the success of deep learning architectures, and we are working on ways to direct these tools towards economic questions. Our approach uses economic theory to break complex questions into a series of machine learning tasks. Each task is then solved using mostly off-the-shelf ML, and we have recipes for combining the trained learners together to answer the original economic questions. I'll detail a couple of examples of this approach, including AI for optimal pricing and for causal inference in search advertisement. The end result is that we are able to automate and improve some common economic tasks, building towards a future system for Economic AI.

Bio: Matt Taddy joins Microsoft Research from the University of Chicago, where he is Professor of Econometrics and Statistics at the Booth School of Business and a fellow of the Computation Institute. He leads MSR’s Alice project on Economic AI. Taddy works at the intersections of statistics, economics, and machine learning. His research is directed towards development of new algorithms for machine learning, uncertainty quantification for these algorithms, and incorporation of artificial intelligence into the study of social and economic systems. Recent projects include optimization for complex demand and incentive systems, analysis of the polarization of political dialogue, and development of artificial intelligence for questions of causation. Taddy developed and teaches the Big Data class at Booth, an advanced MBA course designed to prepare students for careers at the interface of business strategy and data science. He has collaborated extensively with national laboratories and a variety of start-up ventures, and he was a research fellow at eBay from 2014 to 2016. He earned his PhD in Applied Mathematics and Statistics in 2008 from the University of California, Santa Cruz, as well as a BA in Philosophy and Mathematics and an MSc in Mathematical Statistics from McGill University. He joined the Chicago Booth faculty in 2008 and Microsoft in 2016.

Previous Seminars

February 20, 2017 @ 11:00am (Refreshments @ 10:30am)

Felix Naumann, University of Potsdam

Presentation Slides (1.5MB)

Data Profiling

Abstract: Data profiling comprises a broad range of methods to efficiently analyze a given data set. In a typical scenario, which mirrors the capabilities of commercial data profiling tools, tables of a relational database are scanned to derive metadata, such as data types and value patterns, completeness and uniqueness of columns, keys and foreign keys, and occasionally functional dependencies and association rules. Individual research projects have proposed several additional profiling tasks, such as the discovery of inclusion dependencies or conditional functional dependencies. Data profiling deserves a fresh look for three reasons: First, the area itself is neither established nor defined in any principled way, despite significant research activity on individual parts in the past. Second, current data profiling techniques hardly scale beyond what can only be called small data. Third, more and more data beyond traditional relational databases are being created and beg to be profiled. The talk highlights the state of the art and proposes new research directions and challenges.
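
A few of the basic profiling tasks named above (completeness, uniqueness, key discovery, and functional dependency checks) can be sketched naively for a small in-memory table. This is an illustrative brute-force sketch with hypothetical function names, not a method from the talk; the key search enumerates every column combination, which is exactly the kind of approach that does not scale beyond small data.

```python
from itertools import combinations


def completeness(rows, col):
    """Fraction of rows with a non-null value in `col`."""
    return sum(row[col] is not None for row in rows) / len(rows)


def uniqueness(rows, col):
    """Number of distinct non-null values in `col`, relative to the row count."""
    return len({row[col] for row in rows if row[col] is not None}) / len(rows)


def is_unique(rows, cols):
    """True if no two rows agree on all of `cols` (None is treated as an ordinary value)."""
    seen = set()
    for row in rows:
        projected = tuple(row[c] for c in cols)
        if projected in seen:
            return False
        seen.add(projected)
    return True


def minimal_keys(rows, columns):
    """Brute-force discovery of minimal unique column combinations (candidate keys)."""
    keys = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            # Supersets of a known key are unique but not minimal, so skip them.
            if any(set(key) <= set(combo) for key in keys):
                continue
            if is_unique(rows, combo):
                keys.append(combo)
    return keys


def holds_fd(rows, lhs, rhs):
    """Check the functional dependency lhs -> rhs on this table instance."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


table = [
    {"id": 1, "zip": "10007", "city": "New York", "email": "a@example.com"},
    {"id": 2, "zip": "10007", "city": "New York", "email": None},
    {"id": 3, "zip": "02139", "city": "Cambridge", "email": "c@example.com"},
]
print(minimal_keys(table, ["id", "zip", "city"]))  # [('id',)]
print(holds_fd(table, ["zip"], ["city"]))          # True
```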

Bio: Felix Naumann studied mathematics, economics, and computer science at the University of Technology in Berlin. After receiving his diploma (MA) in 1997, he joined the graduate school "Distributed Information Systems" at Humboldt University of Berlin. He completed his PhD thesis on "Quality-driven Query Answering" in 2000. In 2001 and 2002 he worked at the IBM Almaden Research Center on topics of data integration. From 2003 to 2006 he was an assistant professor for information integration at Humboldt University of Berlin. Since then he has held the chair for information systems at the Hasso Plattner Institute at the University of Potsdam in Germany. His research interests are in data profiling, data cleansing, and text mining.

January 30, 2017 @ 11:00am (Refreshments @ 10:30am)

Shawndra Hill, Microsoft Research

Television and Digital Advertising: Second Screen Response and Coordination with Sponsored Search

Abstract: We consider the potential to improve the efficiency and efficacy of broader advertising efforts through cross channel coordination. Past work has demonstrated a positive relationship between television advertising and online search activity. Here, we consider the types of devices on which search response predominantly manifests following TV advertisements, and the degree to which shifts in search activity can be used to evaluate the success of TV advertisers’ targeting efforts. We leverage data on TV advertising around Microsoft Windows 10 and an Xbox video game, in combination with large-scale proprietary search data from Microsoft Bing. Our identification strategy hinges on a combination of geographic heterogeneity in TV advertising exposure and continuous variation in the cost of TV advertisements (a proxy for TV audience size). We first demonstrate that search response peaks within three minutes of the airing of a TV advertisement, and that this manifests primarily via second-screen devices. Our estimated elasticities indicate that a 20% increase in advertising spend equates to an approximately 2.5% (3.4%) increase in search volumes for Windows 10 (the Xbox game). Second, we show that, indeed, the demographic groups targeted by TV advertisements are those most likely to respond, and we thereby demonstrate that TV ad effectiveness can be usefully measured via online search data. Third, examining sponsored search clicks in our query-level data, for queries involving brand-related keywords, we demonstrate a significant increase in rank-ordering effects in searches that take place in the minutes immediately following a TV advertisement, which implies a complementarity between TV and sponsored search advertisements.

Bio: Shawndra Hill is a Senior Researcher at Microsoft Research NYC. Before joining Microsoft, she was an Assistant Professor in the Operations and Information Management Department at the Wharton School of the University of Pennsylvania, where she is still an Annenberg Public Policy Center Distinguished Research Fellow, a Wharton Customer Analytics Initiative Senior Fellow, and a core member of the Penn Social Media and Health Innovation Lab. Generally, she researches the value to companies of mining data on consumers, including how consumers interact with each other on social media, for targeted marketing, advertising, health, and fraud detection purposes. Her current research focuses on the interactions between TV content and social media. Dr. Hill holds a B.S. in Mathematics from Spelman College, a B.E.E. from the Georgia Institute of Technology, and a Ph.D. in Information Systems from NYU's Stern School of Business.

December 12, 2016 @ 11:00am (Refreshments @ 10:30am)

Matthew Salganik, Princeton

Presentation Slides (9.4MB)

Beyond Big Data

Abstract: The digital age has transformed how we are able to study social behavior. Unfortunately, researchers have not yet taken full advantage of these opportunities because we are too focused on "big data", such as digital traces of behavior. These big data can be wonderful for some research questions, but they have fundamental limitations for addressing many questions because they were never designed for research. This talk will argue that rather than focusing on "found data", researchers should use the capabilities of the digital age to create new forms of "designed data". I’ll provide three templates that researchers can use to combine the strengths of found data and designed data, and I’ll illustrate these templates with recent empirical studies. This talk is based on my forthcoming book, Bit by Bit: Social Research in the Digital Age, which is currently in Open Review at

Bio: Matthew Salganik is Professor of Sociology at Princeton University, and he is affiliated with several of Princeton's interdisciplinary research centers: the Office for Population Research, the Center for Information Technology Policy, the Center for Health and Wellbeing, and the Center for Statistics and Machine Learning. His research interests include social networks and computational social science. He is the author of the forthcoming book Bit by Bit: Social Research in the Digital Age. Salganik's research has been published in journals such as Science, PNAS, Sociological Methodology, and the Journal of the American Statistical Association. His papers have won the Outstanding Article Award from the Mathematical Sociology Section of the American Sociological Association and the Outstanding Statistical Application Award from the American Statistical Association. Popular accounts of his work have appeared in the New York Times, Wall Street Journal, Economist, and New Yorker. Salganik's research is funded by the National Science Foundation, National Institutes of Health, Joint United Nations Program for HIV/AIDS (UNAIDS), Facebook, and Google. During sabbaticals from Princeton, he has been a Visiting Professor at Cornell Tech and a Senior Researcher at Microsoft Research.

November 18, 2016 @ 10:30am (Refreshments @ 10:00am)

Athina Markopoulou, UC Irvine

Experiences with Mobile Network Monitoring and Analysis

Abstract: Mobile network monitoring and analysis can provide insight into the activity of individual mobile devices as well as into collective user behavior. This creates opportunities for new applications and engineering optimizations, but it also poses challenges in terms of privacy and performance. In the first part of the talk, I will present our work on analyzing mobile data (in particular, call detail records (CDRs) provided by cellular operators and geospatial data we collected from social networks) to characterize human activity in metropolitan areas, with applications to ride-sharing [UBICOMP 2014, SIGSPATIAL 2015], urban ecology [MOBIHOC 2015], and network provisioning [SmartCity 2016]. Time permitting, I will also present algorithms we designed to construct synthetic graphs that resemble real mobile and social network graphs within the dk-series framework [INFOCOM 2015, NetSci 2016]. In the second part of the talk, I will present our current work on AntMonitor, a system for on-device passive network monitoring, collection, and analysis. I will describe the design of AntMonitor as a user-space mobile app based on a VPN service [SIGCOMM C2BID 2015], but without the need to route through a remote VPN server. Evaluation of our prototype shows that it significantly outperforms state-of-the-art approaches in terms of both throughput and battery consumption. I will then describe the use of AntMonitor as a platform to enable a number of applications, including: (i) real-time detection and prevention of private information leakage from the device to the network; (ii) passive network performance monitoring; and (iii) application classification and user profiling.

Bio: Athina Markopoulou is an Associate Professor in EECS at the University of California, Irvine. She received the Diploma degree in Electrical and Computer Engineering from the National Technical University of Athens, Greece (1996), and the Master's (1998) and Ph.D. (2003) degrees in Electrical Engineering from Stanford University. She has held short-term appointments at Sprintlabs (2003), Arista Networks (2005), and IT University of Copenhagen (2012-2013), and she co-founded Shoelace Wireless (2012+). She received the Henry Samueli School of Engineering Faculty Midcareer Award for Research (2014) and the NSF CAREER Award (2008). She has been an Associate Editor for IEEE/ACM Transactions on Networking (2013-2015) and for ACM CCR, the General Chair for CoNext 2016, and the Director of the Networked Systems program at UCI. Her research interests are in the area of networked systems, including network measurement and modeling, mobile and social networks, and network security and privacy.

Nikolaos Laoutaris, Telefonica Research

Measuring Online Behavioural Advertising

Abstract: In this talk I will present our recent results on detecting behavioural targeting in online advertising. I will describe the methods that we have developed to: 1) audit web domains for behavioural targeting by training artificial "personas", collecting ads, and identifying correlations between training and landing pages, 2) audit individual impressions using only browser history and online taxonomies for web pages, and 3) audit individual impressions using crowdsourced data from multiple users. I will also present our initial findings on the amount of targeting going on, the most targeted categories, and the existence of targeting even in sensitive personal categories for which the law requires explicit user consent, as well as our results on identifying the chain of companies involved in the delivery of such ads.

Bio: Nikolaos Laoutaris is the Chief Scientist and one of the co-founders of the Data Transparency Lab, a community of technologists, researchers, policymakers, and industry representatives working to create a new wave of transparency software that will let end users peek at what happens to their personal data behind the curtains of the web. He is currently working on answering questions like: "Why am I seeing this advertisement?", "Is the price that I see online for this ticket the same as the one you see?", and "How can we reconcile the information needs of online advertising/marketing with the privacy concerns of everyday people?". Before dropping everything to work on privacy and transparency, Nikolaos spent many years conducting research and innovation in intelligent transportation, economics of networks, content distribution, new protocols for the Internet, energy-efficient communications, social networks, algorithms, and other areas. More info at: http://

October 10, 2016 @ 11:00am (Refreshments @ 10:30am)

Ziawasch Abedjan, TU Berlin

Presentation Slides (1.5MB)

Data Curation in the Wild: Limits and Challenges

Abstract: According to recent surveys, data scientists spend most of their time collecting, curating, and organizing data from heterogeneous and often dirty sources. In this process, datasets have to be cleaned of errors, records that refer to the same entity in different data sources have to be matched, and data values have to be transformed into a common desired representation. In this talk, I will share our experience in using data curation systems in the wild. I will first report on our recent findings from testing state-of-the-art data cleaning systems on real-world data and point out the limitations of current cleaning algorithms. Then, I will discuss the difficult task of data transformation discovery by presenting our data transformation discovery system, DataXFormer. Finally, I will shed light on our vision for future data curation systems and on how we intend to overcome the current limitations.

Bio: Ziawasch Abedjan is an assistant professor and the head of the "Big Data Management" (BigDaMa) Group at the TU Berlin in Germany and a Principal Investigator in the Berlin Big Data Center. Prior to that, Ziawasch was a postdoctoral associate at MIT CSAIL where he worked on various data cleaning topics. He received his PhD from the Hasso Plattner Institute in Potsdam, Germany, where he worked on methods for mining Linked Open Data. His current research focuses on data integration and data profiling. He is the recipient of the 2014 CIKM Best Student Paper Award, the 2015 SIGMOD Best Demonstration Award, and the 2014 Best Dissertation Award from the University of Potsdam.

September 12, 2016 @ 11:00am (Refreshments @ 10:30am)

Juliana Freire, NYU

Presentation Slides (24MB)

Democratizing Urban Data Analysis

Abstract: Today, 50% of the world's population lives in cities, and the number is projected to grow to 70% by 2050. Cities are the loci of economic activity and the source of innovative solutions to 21st century challenges. At the same time, cities are also the cause of looming sustainability problems in transportation, resource consumption, housing affordability, and inadequate or aging infrastructure. The large volumes of urban data, along with vastly increased computing power, open up new opportunities to better understand cities. Encouraging success stories show better operations, more informed planning, improved policies, and a better quality of life for residents. However, analyzing urban data often requires a staggering amount of work, from identifying relevant data sets, cleaning and integrating them, to performing exploratory analyses over complex, spatio-temporal data. Our long-term goal is to enable domain experts to crack the code of cities by freely exploring the vast amounts of data cities generate. This talk describes challenges that have led us to fruitful research on data management, data analysis, and visualization techniques. I will present methods and systems we have developed to increase the level of interactivity, scalability, and usability for spatio-temporal analyses. This work was supported in part by the National Science Foundation, a Google Faculty Research award, the Moore-Sloan Data Science Environment at NYU, IBM Faculty Awards, NYU Tandon School of Engineering, and the Center for Urban Science and Progress.

Bio: Juliana Freire is a Professor of Computer Science and Data Science at New York University. She is the Executive Director of the NYU Moore Sloan Data Science Environment. She holds an appointment at the Courant Institute of Mathematical Sciences and is a faculty member at the NYU Center for Urban Science and Progress and at the NYU Center for Data Science, where she is also the Director of Graduate Studies. Her recent research has focused on big-data analysis and visualization, large-scale information integration, provenance management, and computational reproducibility. Prof. Freire is an active member of the database and Web research communities, with over 150 technical papers, several open-source systems, and 11 U.S. patents. She is an ACM Fellow and a recipient of an NSF CAREER award, two IBM Faculty awards, and a Google Faculty Research award. She has chaired or co-chaired several workshops and conferences, and participated as a program committee member in over 70 events. Her research grants are from the National Science Foundation, DARPA, Department of Energy, National Institutes of Health, Sloan Foundation, Gordon and Betty Moore Foundation, W. M. Keck Foundation, AT&T, Google, Amazon, the University of Utah, New York University, Microsoft Research, Yahoo! and IBM.

July 6, 2016 @ 1:30pm (Refreshments @ 1:00pm)

Paolo Merialdo, Università Roma Tre

Presentation Slides (4MB)

Fact Harvesting from Natural Language Text in Wikipedia

Abstract: Many approaches have recently been introduced to automatically create or augment Knowledge Graphs (KGs) with facts extracted from Wikipedia, particularly from its structured components such as the infoboxes. Although these structures are valuable, they represent only a fraction of the actual information expressed in the articles, and surprisingly many KGs miss facts that are indeed present in Wikipedia articles. In this work, we present Lector, an information extraction system that harvests new facts from the text of Wikipedia articles using information extraction techniques bootstrapped from the entities and relations of a given KG. Our preliminary experimental evaluations, which use Freebase as the reference KG, reveal that we can augment several relations in the domain of people by more than 10%, with facts whose accuracy is over 95%. Moreover, the vast majority of these facts are missing from the infoboxes, YAGO, and DBpedia.

Bio: Paolo Merialdo has been with Università Roma Tre since 2001, first as a research associate and then as an associate professor. He graduated in Computer Engineering from Università di Genova (1990) and received his PhD from Università di Roma "La Sapienza" (1998) under the supervision of Prof. Paolo Atzeni. In 1997 and 1998 he spent several months at the University of Toronto as a visiting researcher, working with Prof. Alberto Mendelzon. He has published his research results in important journals of the field and in the refereed proceedings of major conferences. He is a co-founder of InnovAction Lab, the most important Italian startup program for university students, and he serves as an advisor at the LuissEnlabs startup accelerator in Rome.

June 6, 2016 @ 11:00am (Refreshments @ 10:30am)

Ben Wellington, Two Sigma, Pratt Institute

Presentation Slides (17MB)

Changing a City with Urban Data Science

Abstract: In this talk, I’ll explore how I’ve used data science and my blog, I Quant NY, to make changes in the city I live in: New York City. From parking ticket geography to restaurant inspection scores to subway and taxi pricing, I will discuss best practices for data science in the policy space, explore how storytelling is an important aspect of data science, and highlight the various data-driven interactions I've had with city agencies. Along the way, I will point out that data science need not always use complicated math and complex programs. I will show examples of the power of simple arithmetic and show how often it is more about your curiosity and the questions you ask than the complexity of the equations you use.

Bio: Ben Wellington is the creator of I Quant NY, a data science and policy blog that focuses on insights drawn from New York City's public data, and advocates for the expansion and improvement of that data. Ben is a contributor to The New Yorker, and is a Visiting Assistant Professor in the City & Regional Planning program at the Pratt Institute in Brooklyn. Ben holds a Ph.D. in Computer Science from New York University.

May 16, 2016 @ 11:00am (Refreshments @ 10:30am)

Muthu Muthukrishnan, Rutgers

Presentation Slides (9MB)

Data Stream Algorithms: New Directions

Abstract: There is a two-decade-long history of algorithms for processing data streams with small (sublinear) resources such as space, time, and communication. In this talk, I will review some of the achievements in this area and discuss emerging directions, including stochastic streams, graphical models, graph and matrix algorithms, and others. These methods have applications in statistical data analysis and machine learning, social data analysis, and analytics for modern Big Data systems.
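
As one classic example of the sublinear-space summaries this line of work studies, a Count-Min sketch estimates item frequencies in a stream using a fixed grid of counters whose size does not depend on the stream length. This is a minimal sketch, not material from the talk; the class name, the CRC32-based item encoding, and the parameter values are assumptions chosen for illustration.

```python
import random
import zlib


class CountMinSketch:
    """Frequency estimates for a stream using depth x width counters."""

    PRIME = (1 << 61) - 1  # Mersenne prime used by the hash family

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.table = [[0] * width for _ in range(depth)]
        # One hash per row: h(x) = ((a * x + b) mod PRIME) mod width.
        self.params = [
            (rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
            for _ in range(depth)
        ]

    def _buckets(self, item):
        # Map the item to a stable integer first (distinct items could collide here too).
        x = zlib.crc32(str(item).encode("utf-8"))
        for a, b in self.params:
            yield ((a * x + b) % self.PRIME) % self.width

    def add(self, item, count=1):
        """Record `count` occurrences of `item` (count must be non-negative)."""
        for row, col in enumerate(self._buckets(item)):
            self.table[row][col] += count

    def estimate(self, item):
        """Upper-bound estimate: collisions only add to counters, so take the minimum."""
        return min(self.table[row][col] for row, col in enumerate(self._buckets(item)))


cms = CountMinSketch(width=64, depth=4, seed=1)
for i in range(50):
    cms.add(f"item{i}")
cms.add("heavy", count=100)
print(cms.estimate("heavy"))  # at least 100, at most 150 (the total stream count)
```

Ignoring CRC32 collisions, the standard analysis chooses a width of about e/ε and a depth of about ln(1/δ), so that each estimate exceeds the true count by at most εN (for total stream count N) with probability at least 1 − δ.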

Bio: Muthu is a Professor at Rutgers University. His research interests are in algorithms, in particular data stream algorithms and online advertising. He has a blog:

April 11, 2016 @ 11:00am (Refreshments @ 10:30am)

David Blei, Columbia

Presentation Slides (24MB)

Probabilistic Topic Models and User Behavior

Abstract: Topic modeling algorithms analyze a document collection to estimate its latent thematic structure. However, many collections contain an additional type of data: how people use the documents. For example, readers click on articles in a newspaper website, scientists place articles in their personal libraries, and lawmakers vote on a collection of bills. Behavior data is essential both for making predictions about users (such as for a recommendation system) and for understanding how a collection and its users are organized. I will review the basics of topic modeling and describe our recent research on collaborative topic models, models that simultaneously analyze a collection of texts and its corresponding user behavior. We studied collaborative topic models on 80,000 scientists' libraries from Mendeley and 100,000 users' click data from the arXiv. Collaborative topic models enable interpretable recommendation systems, capturing scientists' preferences and pointing them to articles of interest. Further, these models can organize the articles according to the discovered patterns of readership. For example, we can identify articles that are important within a field and articles that transcend disciplinary boundaries.

Bio: David Blei is a Professor of Statistics and Computer Science at Columbia University, and a member of the Columbia Data Science Institute. His research is in statistical machine learning, involving probabilistic topic models, Bayesian nonparametric methods, and approximate posterior inference algorithms for massive data. He works on a variety of applications, including text, images, music, social networks, user behavior, and scientific data. David has received several awards for his research, including a Sloan Fellowship (2010), Office of Naval Research Young Investigator Award (2011), Presidential Early Career Award for Scientists and Engineers (2011), Blavatnik Faculty Award (2013), and ACM-Infosys Foundation Award (2013). He is a fellow of the ACM.

February 1, 2016 @ 11:00am (Refreshments @ 10:30am)

Steven Bellovin, Columbia

Presentation Slides (2MB)

Thinking Security: Stopping Next Year's Hackers

Abstract: Many computer applications are bound to a particular point in time; more precisely, to a given set of technologies and costs. The same is true of computer security. Unfortunately, once something becomes possible people become wedded to it, and never look back at the environment and assumptions that made it possible or even necessary. This is especially serious for security, since it causes us to endure the costs and annoyances of marginally useful (or even harmful) mechanisms while blinding us to newer threats. What can be done? How can we recognize the implicit assumptions in what we're doing? Can we do better in the future? How do differing threat models affect the question?

Bio: Steven M. Bellovin is the Percy K. and Vidal L. W. Hudson Professor of computer science at Columbia University, where he does research on networks, security, and especially why the two don't get along, as well as related public policy issues. In his copious spare professional time, he does some work on the history of cryptography. He joined the faculty in 2005 after many years at Bell Labs and AT&T Labs Research, where he was an AT&T Fellow. He received a BA degree from Columbia University, and an MS and PhD in Computer Science from the University of North Carolina at Chapel Hill. While a graduate student, he helped create Netnews; for this, he and the other perpetrators were given the 1995 Usenix Lifetime Achievement Award (The Flame). Bellovin has served as Chief Technologist of the Federal Trade Commission. He is a member of the National Academy of Engineering and is serving on the Computer Science and Telecommunications Board of the National Academies of Sciences, Engineering, and Medicine. In the past, he has been a member of the Department of Homeland Security's Science and Technology Advisory Committee, and the Technical Guidelines Development Committee of the Election Assistance Commission; he has also received the 2007 NIST/NSA National Computer Systems Security Award and has been elected to the Cybersecurity Hall of Fame. Bellovin is the co-author of Firewalls and Internet Security: Repelling the Wily Hacker, and holds a number of patents on cryptographic and network protocols.

January 11, 2016 @ 11:00am (Refreshments @ 10:30am)

Claudia Perlich, Dstillery

Presentation Slides (19MB)

Tales from the Data Trenches of Digital Advertising

Abstract: Digital advertising is one of the largest and most open playgrounds for machine learning, data mining, and related analytic approaches. This talk will touch on a number of challenges that arise in this environment: 1) high-volume data streams of around 30 billion daily consumer touch points, 2) low-latency requirements on scoring and automated bidding decisions within 100 ms, and 3) adversarial modeling in light of advertising fraud and bots. Specifically, we will discuss an automated learning system implemented at Dstillery that uses a privacy-friendly data representation to build sparse targeting models for thousands of products in millions of dimensions. The solution incorporates ideas from transfer learning, Bayesian priors, stochastic gradient descent, hashing, and learning rate estimation. On the sidelines, but of no less importance, are topics on bid optimization, data reliability, cross-device identification, and observational methods for causal inference. Finally, I will touch on a few higher-level lessons around incentive misalignments and measurement issues in the advertising industry, and on measuring causality from observational data.
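
Two of the ingredients named above, feature hashing and stochastic gradient descent, combine naturally into a sparse logistic regression that never needs an explicit feature dictionary. The sketch below is a generic illustration under stated assumptions, not Dstillery's system: the class name, the CRC32 hash, the learning rate, and the L2 penalty (which plays the role of a simple Gaussian prior) are hypothetical choices, and transfer learning and learning rate estimation are not shown.

```python
import math
import zlib


def hash_features(tokens, dim):
    """Hashing trick: map string features to indices in a fixed space of size `dim`."""
    x = {}
    for token in tokens:
        i = zlib.crc32(token.encode("utf-8")) % dim
        x[i] = x.get(i, 0.0) + 1.0  # colliding tokens simply share a slot
    return x


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class HashedLogisticRegression:
    def __init__(self, dim=2**20, lr=0.5, l2=1e-6):
        self.dim = dim
        self.lr = lr
        self.l2 = l2
        self.w = {}  # sparse weights: only indices seen in training are stored
        self.bias = 0.0

    def _score(self, x):
        return self.bias + sum(self.w.get(i, 0.0) * v for i, v in x.items())

    def predict_proba(self, tokens):
        return sigmoid(self._score(hash_features(tokens, self.dim)))

    def update(self, tokens, label):
        """One SGD step on the log loss for a single example with label 0 or 1."""
        x = hash_features(tokens, self.dim)
        grad = sigmoid(self._score(x)) - label  # derivative of log loss w.r.t. the score
        for i, v in x.items():
            w = self.w.get(i, 0.0)
            self.w[i] = w - self.lr * (grad * v + self.l2 * w)
        self.bias -= self.lr * grad


model = HashedLogisticRegression()
examples = [(["site:sports", "hour:evening"], 1), (["site:news", "hour:morning"], 0)]
for _ in range(200):
    for tokens, label in examples:
        model.update(tokens, label)
print(round(model.predict_proba(["site:sports", "hour:evening"]), 2))
```

Because the weights live in a fixed hashed space, memory stays bounded no matter how many distinct raw features appear, at the cost of occasional collisions between features.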

Bio: Claudia Perlich leads the machine learning efforts that power Dstillery’s digital intelligence for marketers and media companies. With more than 50 published scientific articles, she is a widely acclaimed expert on big data and machine learning applications, and an active speaker at data science and marketing conferences around the world. Claudia is a past winner of the Advertising Research Foundation’s (ARF) Grand Innovation Award and has been selected for Crain’s New York’s 40 Under 40 list, Wired Magazine’s Smart List, and Fast Company’s 100 Most Creative People. Claudia holds multiple patents in machine learning. She has won many data mining competitions and awards at Knowledge Discovery and Data Mining (KDD) conferences, and served as General Chair of KDD in 2014. Prior to joining Dstillery in 2010, Claudia worked at IBM’s Watson Research Center, focusing on data analytics and machine learning. She holds a PhD in Information Systems from New York University (where she continues to teach at the Stern School of Business), and an MA in Computer Science from the University of Colorado.

December 7, 2015 @ 11:00am (Refreshments @ 10:30am)

Claudio Silva, NYU

Presentation Slides (41MB)

Visualization and Analysis of Urban Data

Abstract: Today, 50% of the world's population lives in cities, and that share is projected to grow to 70% by 2050. Urban data opens up many new opportunities to improve cities and people’s lives. In NYC, by integrating and analyzing data sets from multiple city agencies, the Bloomberg administration was able to improve the success rate of inspections. A marked reduction in crime in both New York and Los Angeles has been attributed in part to data-driven policing. Policy changes have also been triggered by data-driven studies that, for example, showed correlations between foreclosures and increases in crime, the effects of subsidized housing on surrounding neighborhoods, and how low-income households use the flexibility provided by vouchers to reach neighborhoods with high-performing schools. But in each of these successes, the level of effort required to gather, integrate, and analyze the relevant data, design and refine models, or develop and deploy apps is staggering. Further, as data volumes and data complexity continue to explode, these problems are only getting worse. In this talk, we will provide an overview of research in the development of new methods and systems for enabling interdisciplinary teams to better understand cities. We will also show some applications of our work.

Bio: Cláudio Silva is a professor of computer science and engineering and data science at New York University. Cláudio’s research lies at the intersection of visualization, data analysis, and geometric computing, and recently he has been interested in the analysis of urban data and sports analytics. He has published over 220 journal and conference papers and is an inventor on 12 US patents. His work has received over 10,000 citations according to Google Scholar, and he has an h-index of 50. Cláudio has served on the editorial boards of several journals, including IEEE Transactions on Big Data, ACM Transactions on Spatial Algorithms and Systems, Computer Graphics Forum, The Visual Computer, Graphical Models, Computers & Graphics, Computing in Science and Engineering, and IEEE Transactions on Visualization and Computer Graphics. He helped develop a number of award-winning software systems, most recently Major League Baseball (MLB)'s Statcast player tracking system. He is an IEEE Fellow and was the recipient of the 2014 IEEE VGTC Visualization Technical Achievement Award “in recognition of seminal advances in geometric computing for visualization and for contributions to the development of the VisTrails data exploration system.” He is currently Chair of the IEEE Technical Committee on Visualization and Graphics.

November 16, 2015 @ 1:00pm (Refreshments @ 12:30pm)

Jay Emerson, Yale

Presentation Slides (4MB, internal AT&T only)

Statistics in Sports: From Probabilities to Predictions

Abstract: I will highlight two data-motivated projects in Olympic figure skating. I will then focus in more detail on prediction and modeling challenges that arise in a range of problems, not unique to sports, illustrated through the analysis of Olympic diving and college basketball.

Bio: John W. Emerson (Jay) is Director of Graduate Studies in the Department of Statistics at Yale University. He teaches a range of graduate and undergraduate courses as well as workshops, tutorials, and short courses at all levels around the world. His interests are in computational statistics and graphics, and his applied work ranges from topics in sports statistics to bioinformatics, environmental statistics, and Big Data challenges. He is the author of several R packages including bcp (for Bayesian change point analysis), bigmemory and sister packages (towards a scalable solution for statistical computing with massive data), and gpairs (for generalized pairs plots). His teaching style is engaging and his workshops are active, hands-on learning experiences.

October 12, 2015 @ 11:00am (Refreshments @ 10:30am)

Dennis Shasha, NYU

Presentation Slides (3MB)

Liquid Version Climber: An Automated Tool for Upgrading Complex Software Systems

Abstract: Suppose you are given a software system that is composed of a set of packages, each at a particular version. You want to update some packages to the most recent versions possible, but you also want your software to keep running after the upgrades, which may entail changes to the versions of other packages. One approach is trial and error, but that quickly ends in frustration. We advocate a reproducibility-based approach in which tools like ptrace, reprozip, pip, and virtual machines combine to let us explore version combinations of different packages, even across a variety of platforms. Because the space of versions to explore grows exponentially with the number of packages, we have developed a memoizing algorithm that avoids exponential search while guaranteeing an optimal version combination.

This is joint work with Christophe Pradal, Sarah Cohen-Boulakia, and Patrick Valduriez.

Bio: Dennis Shasha is a professor of computer science at the Courant Institute of New York University and an Associate Director of NYU Wireless. He works with biologists on pattern discovery for network inference; with computational chemists on algorithms for protein design; with physicists and financial people on algorithms for time series; on clocked computation for DNA computing; and on computational reproducibility. Other areas of interest include database tuning as well as tree and graph matching. Because he likes to type, he has written six books of puzzles about a mathematical detective named Dr. Ecco, a biography about great computer scientists, and a book about the future of computing. He has also written five technical books about database tuning, biological pattern recognition, time series, DNA computing, resampling statistics, and causal inference in molecular networks. He has written the puzzle column for various publications including Scientific American, Dr. Dobb's Journal, and the Communications of the ACM. He is a fellow of the ACM and an INRIA International Chair.


AT&T, 33 Thomas Street, New York, NY 10007
All events will be held in the AT&T Research Center in downtown Manhattan located at 33 Thomas Street, New York, NY 10007.
See map. See directions.

Contact Us

The NYC Seminar Series is organized by Cheryl Flynn and Jim Klosowski.
Email us.