## Personalized Mathematical Word Problem Generation

Word problems are an established technique for teaching mathematical modeling skills in K-12 education. However, many students find word problems artificial, uninteresting, and unconnected to real life, and most find them much more difficult than the corresponding symbolic representations. To address this, an ideal pedagogy would give each student an individually crafted progression of unique word problems that form a personalized plot.

We developed a system that procedurally generates word problems from general specifications using answer set programming. The specifications include tutor requirements (properties of a mathematical model) and student requirements (personalization, plot characters, setting). From these specifications, our system generates a word problem narrative, a mathematical model, and a natural language representation. It makes use of an ontology of plot elements that can be parameterized by a literary setting.
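As a rough illustration of the inputs and outputs described above, the following toy sketch turns a tutor specification and a student specification into a mathematical model and a short narrative. It uses fixed sentence templates and random sampling, not answer set programming, and every name in it (`TutorSpec`, `StudentSpec`, `WordProblem`, `generate`) is hypothetical rather than part of the actual system.

```python
import random
from dataclasses import dataclass


@dataclass
class TutorSpec:
    """Hypothetical tutor requirements: properties of the mathematical model."""
    operation: str    # "+" or "-"
    max_operand: int  # largest number allowed in the problem (at least 2)


@dataclass
class StudentSpec:
    """Hypothetical student requirements: personalization of the plot."""
    character: str
    item: str         # plural noun taken from the student's preferred setting


@dataclass
class WordProblem:
    equation: str     # the mathematical model
    answer: int
    text: str         # the natural language representation


def generate(tutor: TutorSpec, student: StudentSpec, rng: random.Random) -> WordProblem:
    """Sample operands that satisfy the tutor spec and fill a fixed template."""
    if tutor.operation not in ("+", "-"):
        raise ValueError("this sketch only supports '+' and '-'")
    if tutor.max_operand < 2:
        raise ValueError("max_operand must be at least 2")

    a = rng.randint(2, tutor.max_operand)
    b = rng.randint(1, a - 1)  # b < a keeps a subtraction result positive

    if tutor.operation == "+":
        answer = a + b
        event = f"then finds {b} more"
    else:
        answer = a - b
        event = f"then gives away {b}"

    text = (
        f"{student.character} has {a} {student.item}. "
        f"{student.character} {event}. "
        f"How many {student.item} does {student.character} have now?"
    )
    return WordProblem(equation=f"x = {a} {tutor.operation} {b}", answer=answer, text=text)


if __name__ == "__main__":
    problem = generate(
        TutorSpec(operation="-", max_operand=20),
        StudentSpec(character="Aria", item="dragon eggs"),
        random.Random(0),
    )
    print(problem.text)
    print(problem.equation)
```

A declarative approach such as answer set programming differs from this sketch in that the requirements are stated as constraints and a solver searches for narratives that satisfy them, instead of the requirements being baked into a single template.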

This work has been done together with Eleanor O'Rourke, Adam M. Smith, Luke Zettlemoyer, Sumit Gulwani, and Zoran Popović.

**Publications:**

- Personalized Mathematical Word Problem Generation
*(IJCAI 2015)*

## LaSE: Languages for Structure Extraction

**LaSE** is a family of domain-specific languages designed to help end-users quickly extract data from various semi-structured formats. We call a format *semi-structured* if it has an implicit, repetitive structure that humans can recognize but that has not been explicitly labeled for machine use. Some examples of semi-structured formats include:

- Spreadsheets: used by more than 500 million people worldwide, from a wide range of backgrounds. Most users embed multi-dimensional data in a two-dimensional spreadsheet by aligning and separating the additional dimensions with stylistic attributes (colors, borders, spaces, subheaders, fonts).
- Structured images: pixel images that were generated using some software process and that have an underlying hierarchical and repetitive structure. Such images arise in a variety of domains such as data tables, math worksheets, board games, crossword puzzles, bead drawings, and charts.
- Webpages: often contain information that is absent from structured databases and is presented only in unstructured or semi-structured textual form.

Together with Sumit Gulwani, we developed **LaSEWeb**, a system for semi-structured data extraction from webpages, and **LaSEImg**, a system for semi-structured data extraction from structured images and spreadsheets. Both languages allow end-users to declaratively specify the desired data pattern and to explicitly separate information from style.

The work has been done during two internships at Microsoft Research.

**Publications:**

## Structure and Term Prediction for Mathematical Text

Inputting mathematical text into a computer is a painful task, be it in LaTeX or in WYSIWYG editors. The reason is that the tree structure of a mathematical expression has to be encoded in a left-to-right linear order. We draw inspiration from the programming domain, where software analyzes the input on the fly and offers intelligent completion suggestions as the user types, which not only saves users valuable time but also gives them a compelling reason to use the software over pen and paper. This project extends this approach from programming to mathematical input. It solves two novel problems in mathematical expression entry: *structure prediction* and *term prediction*.

Our solution to the *structure prediction* problem involves defining a ranking measure that captures the symmetry of a mathematical term, and an algorithm for efficiently finding the structure with the highest rank. Our solution to the *term prediction* problem involves defining a domain-specific language for term transformations, and an inductive synthesis algorithm that can learn the likely transformation from the first couple of sequence elements. On our benchmark collection, our tool predicts the correct structure in 63% of the cases and saves more than half of the sequence typing time in 52% of the cases.
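To give a feel for term prediction, here is a deliberately simplified sketch. It assumes that consecutive terms differ only in their integer literals and that each literal advances by a constant step; the actual domain-specific language and synthesis algorithm are described in the report and are not reproduced here. The function names `split_term` and `predict_next_term` are hypothetical.

```python
import re
from typing import List, Optional, Tuple


def split_term(term: str) -> Tuple[List[str], List[int]]:
    """Split a term into its non-numeric skeleton and its integer literals."""
    parts = re.split(r"(\d+)", term)
    # With a capturing group, even indices hold text and odd indices hold digits.
    return parts[0::2], [int(p) for p in parts[1::2]]


def predict_next_term(terms: List[str]) -> Optional[str]:
    """Guess the next term from the first two or more terms of a sequence.

    Returns None when the toy assumption does not hold, that is, when the
    terms have different skeletons or a literal does not change by a
    constant step.
    """
    if len(terms) < 2:
        return None

    split = [split_term(t) for t in terms]
    skeleton = split[0][0]
    if any(text != skeleton for text, _ in split):
        return None

    # One column per integer position, one row per example term.
    columns = list(zip(*(numbers for _, numbers in split)))
    next_numbers = []
    for column in columns:
        step = column[1] - column[0]
        if any(b - a != step for a, b in zip(column, column[1:])):
            return None
        next_numbers.append(column[-1] + step)

    # Re-interleave the skeleton text with the extrapolated integers.
    pieces = [skeleton[0]]
    for number, text in zip(next_numbers, skeleton[1:]):
        pieces.append(str(number))
        pieces.append(text)
    return "".join(pieces)


if __name__ == "__main__":
    print(predict_next_term(["x_1^2", "x_2^4"]))  # x_3^6
```

After the user types `x_1^2` and `x_2^4`, the sketch proposes `x_3^6` as the next element. A realistic tool would search a much richer space of transformations and rank the candidates instead of returning a single guess.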

The work has been done together with Sumit Gulwani and Sriram Rajamani.

**Reports:**

- Structure and Term Prediction for Mathematical Text
*(Technical Report MSR-TR-2012-7)*