Algebraic databases

Patrick Schultz, David I. Spivak, Christina Vasilakopoulou, and Ryan Wisnesky

Databases have been studied category-theoretically for decades. While mathematically elegant, previous categorical models have typically struggled with representing concrete data such as integers or strings. In the present work, we propose an extension of the earlier set-valued functor model, making use of multi-sorted algebraic theories (a.k.a. Lawvere theories) to incorporate concrete data in a principled way. This approach easily handles missing information (null values), and also allows constraints and queries to make use of operations on data, such as multiplication or comparison of numbers, helping to bridge the gap between traditional databases and programming languages. We also show how all of the components of our model - including schemas, instances, change-of-schema functors, and queries - fit into a single double categorical structure called a proarrow equipment (a.k.a. framed bicategory).

Keywords: Databases, algebraic theories, proarrow equipments, collage construction, data migration

2010 MSC: 18C10, 18D05, 68P15

Theory and Applications of Categories, Vol. 32, 2017, No. 16, pp 547-619.

Published 2017-04-27.

http://www.tac.mta.ca/tac/volumes/32/16/32-16.pdf
