Transient

Email: polozov@cs.washington.edu
Phone: +1 (425) 623-4121
Office: CSE 362 

Welcome

I am Alex (Oleksandr) Polozov, a third-year graduate student in the Computer Science & Engineering Department at the University of Washington. My advisors are Sumit Gulwani and Zoran Popović. I work on inductive program synthesis (also known as programming by example), formal logic, and their applications in the areas of education and data wrangling.

I received my B.S. in System Analysis with honors from the National Technical University of Ukraine "Kyiv Polytechnic Institute" in 2012.

Research

My interests lie in the applications of formal methods (software engineering, program synthesis, answer set programming) for math/programming education and data manipulation. Currently, I am working on the following projects:

  • FlashMeta, a meta-framework for program synthesis in domain-specific languages from inductive specifications (input-output examples)
  • Personalized mathematical word problem generation using answer set programming
  • LaSE: end-user query languages for semi-structured data extraction

Please see my CV and the projects page for more details.


Publications

  • FlashMeta: A Framework for Inductive Program Synthesis   [SPLASH (OOPSLA) 2015]
    Oleksandr Polozov, Sumit Gulwani
    [abstract]    [pdf]

    Inductive synthesis, or programming-by-examples (PBE) is gaining prominence with disruptive applications for automating repetitive tasks in end-user programming. However, designing, developing, and maintaining an effective industrial-quality inductive synthesizer is an intellectual and engineering challenge, requiring 1-2 man-years of effort.

    Our novel observation is that many PBE algorithms are a natural fall-out of one generic meta-algorithm and the domain-specific properties of the operators in the underlying domain-specific language (DSL). The meta-algorithm propagates example-based constraints on an expression to its subexpressions by leveraging associated witness functions, which essentially capture the inverse semantics of the underlying operator. This observation enables a novel program synthesis methodology called data-driven domain-specific deduction (D4), where domain-specific insight, provided by the DSL designer, is separated from the synthesis algorithm.

    Our FlashMeta framework implements this methodology, allowing synthesizer developers to generate an efficient synthesizer from the mere DSL definition (if properties of the DSL operators have been modeled). In our case studies, we found that 10+ existing industrial-quality mass-market applications based on PBE can be cast as instances of D4. Our evaluation includes reimplementation of some prior works, which in FlashMeta become more efficient, maintainable, and extensible. As a result, FlashMeta-based PBE tools are deployed in several industrial products, including Microsoft PowerShell 3.0 for Windows 10, Azure Operational Management Suite, and Microsoft Cortana digital assistant.


     
  • User Interaction Models for Disambiguation in Programming by Example   [UIST 2015]
    Mikaël Mayer, Gustavo Soares, Maxim Grechkin, Vu Le, Mark Marron, Oleksandr Polozov, Rishabh Singh, Ben Zorn, Sumit Gulwani
    [abstract]    [pdf]   

    Programming by Examples (PBE) has the potential to revolutionize end-user programming by enabling end users, most of whom are non-programmers, to create small scripts for automating repetitive tasks. However, examples, though easy to provide, are an ambiguous specification of the user's intent. Because of that, a key impedance in adoption of PBE systems is the lack of user confidence in the correctness of the program that was synthesized by the system.

    We present two novel user interaction models that communicate actionable information to the user to help resolve ambiguity in the examples. One of these models allows the user to effectively navigate between the huge set of programs that are consistent with the examples provided by the user. The other model uses active learning to ask directed example-based questions to the user on the test input data over which the user intends to run the synthesized program.

    Our user studies show that each of these models significantly reduces the number of errors in the performed task without any difference in completion time. Moreover, both models are perceived as useful, and the proactive active-learning based model has a slightly higher preference regarding the users' confidence in the result.


     
  • Personalized Mathematical Word Problem Generation   [IJCAI 2015]
    Oleksandr Polozov, Eleanor O'Rourke, Adam M. Smith, Luke Zettlemoyer, Sumit Gulwani, Zoran Popović
    [abstract]    [project info]    [pdf]    [slides]

    Word problems are an established technique for teaching mathematical modeling skills in K-12 education. However, many students find word problems unconnected to their lives, artificial, and uninteresting. Most students find them much more difficult than the corresponding symbolic representations. To account for this phenomenon, an ideal pedagogy might involve an individually crafted progression of unique word problems that form a personalized plot.

    We propose a novel technique for automatic generation of personalized word problems. In our system, word problems are generated from general specifications using answer-set programming (ASP). The specifications include tutor requirements (properties of a mathematical model), and student requirements (personalization, characters, setting). Our system takes a logical encoding of the specification, synthesizes a word problem narrative and its mathematical model as a labeled logical plot graph, and realizes the problem in natural language. Human judges found our problems as solvable as the textbook problems, with a slightly more artificial language.


     
  • LaSEWeb: Automating Search Strategies over Semi-structured Web Data   [KDD 2014]
    Oleksandr Polozov, Sumit Gulwani
    [abstract]    [project info]    [pdf]    [slides]

    We show how to programmatically model processes that humans use when extracting answers to queries (e.g., "Who invented typewriter?", "List of Washington national parks") from semi-structured Web pages returned by a search engine. This modeling enables various applications including automating repetitive search tasks, and helping search engine developers design micro-segments of factoid questions.

    We describe the design and implementation of a domain-specific language that enables extracting data from a webpage based on its structure, visual layout, and linguistic patterns. We also describe an algorithm to rank multiple answers extracted from multiple webpages.

    On 100,000+ queries (across 7 micro-segments) obtained from Bing logs, our system LaSEWeb answered queries with an average recall of 71%. Also, the desired answer(s) were present in top-3 suggestions for 95%+ cases.