Databases, graphs, and GraphQL: The past, present, and future
Manish Jain and Josh McKenzie are both engineer rock stars who wear many hats. They also have something else in common: they are both avid GraphQL users and builders, despite getting there from different starting points. GraphQL is an API query language that has taken the software development world by storm, and Jain and McKenzie exemplify this.
Jain, founder and CTO of Dgraph, was among the first to take note of GraphQL in 2015. He liked it so much that he built Dgraph around it. McKenzie got exposed to GraphQL in 2020, while building Stargate during his stint at DataStax. He liked it so much that he is now VP of Software Engineering at Apollo, a vendor built around GraphQL.
We were not as early as Jain, but in 2018 we noted that GraphQL may make sense as a layer for universal database access. Today we revisit the use of GraphQL to access databases, with insights from Jain and McKenzie on the technical, business, and social implications.
GraphQL is not SQL, and it’s not trying to be either
When discussing GraphQL, even in 2021, starting with a disclaimer may be necessary: GraphQL is NOT a graph query language. It’s a misnomer, and as Jain pointed out, “GraphQL is as much a graph query language, as relational databases are about relationships.” In other words, not very much, and this is by design.
Jain should know, as he chose to build Dgraph, a graph database, around GraphQL. To serve as a graph query language, GraphQL had to be extended. Dgraph’s fork of GraphQL, originally called GraphQL+, is now called DQL.
Jain noted he was in touch with GraphQL founders. What they are trying to achieve, he went on to add, is a general specification which can be implemented easily on any system, as opposed to a very specific and complete specification. GraphQL is not a SQL equivalent, and it’s not trying to be either.
This, of course, raises the question: How come so many databases these days support GraphQL? MongoDB does, and so do Fauna, Yugabyte and a number of graph databases. Not to mention, there are GraphQL connectivity layers for PostgreSQL and other databases too, via Hasura, PostGraphile, and Prisma.
McKenzie, being relatively fresh in the GraphQL ecosystem but seasoned in the database world, pointed out that one of the tensions people often navigate with APIs for databases is how much of the underlying representation on disk is exposed to the API.
There is a tradeoff there: the more of that you expose, the more performant you can expect queries to be, and the heavier the lift for the end user to really be able to interface with that API and understand it. McKenzie compared and contrasted Cassandra and MongoDB in that respect.
Cassandra started out with Thrift: untyped, wild west. Over time, they moved to CQL, with the notion that it’s closer to SQL and people will find it easier to adopt. That took power out of the hands of the end user, but made things more user friendly.
MongoDB’s play, as per McKenzie, was to say — how do we most empower the end user, how do we put the most flexibility and power into their hands? Figuring out the technology side would come later, and continuous revision and investment over time was a strategic choice:
“This distills down to the question: are you optimizing for CPU compute time, or for human heartbeats? Which of those two things is the resource that you want your API to really leverage and facilitate? GraphQL is firmly in the camp of being more human usable and human consumable than a lot of the bespoke query languages.”
GraphQL’s simplicity is its charm, and its Achilles heel
GraphQL’s charm is that it’s simple to understand and work with. And therein also lies its Achilles heel, Jain noted. But there is a reason GraphQL is becoming interesting for databases to implement, he went on to add.
Looking at a typical REST API, it’s hard to define what will be used and how — which types, and which fields. It becomes a permutation problem of trying to figure out what people will and won’t use, which is why it’s hard to build a REST API on top of a database to make it easier for people to consume it.
“With GraphQL databases, there is a way to match that need. Because the user can tell them — this is what I want. And the database is going to quickly translate that to how they sort it out on disk. So that’s a plus that opens up their gates to all those front end developers who really like GraphQL”, said Jain.
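To make the idea of "the user can tell them this is what I want" concrete, here is a minimal sketch in Python. It is not a GraphQL implementation: the `USER` record, the `SELECTION` format, and the `select` helper are all hypothetical, and the selection is modelled as a nested dictionary rather than parsed from a real GraphQL query. It only illustrates how a client-supplied selection determines which fields come back.

```python
from typing import Any

# Hypothetical record, standing in for what a database might hold for one user.
USER = {
    "id": 1,
    "name": "Ada",
    "email": "ada@example.com",
    "posts": [
        {"id": 10, "title": "Hello", "body": "First post", "likes": 3},
        {"id": 11, "title": "Graphs", "body": "Second post", "likes": 7},
    ],
}

# Hypothetical selection format: True means "return this field as is",
# a nested dict means "descend into this field and select again".
# It roughly mirrors the GraphQL query: { name posts { title } }
SELECTION = {"name": True, "posts": {"title": True}}


def select(data: Any, selection: dict) -> Any:
    """Return only the fields of `data` that the selection asks for."""
    if isinstance(data, list):
        # Apply the same selection to every element of a list field.
        return [select(item, selection) for item in data]
    result = {}
    for field, sub in selection.items():
        value = data[field]
        result[field] = value if sub is True else select(value, sub)
    return result


print(select(USER, SELECTION))
# {'name': 'Ada', 'posts': [{'title': 'Hello'}, {'title': 'Graphs'}]}
```

The server does not have to guess in advance which combination of fields a client needs, which is the permutation problem described for REST APIs. The client states the shape, and the backend only has to know how to fetch each field.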
GraphQL can work well for CRUD operations, but what about the rest? What about custom queries, and filters? This is where the specification is a bit vague, and implementations may vary. But this is also where the resolver pattern shines.
This pattern, also leveraged by Apollo, lets developers abstract the implementation of complex query logic specific to each database. In addition, as McKenzie noted, the resolver pattern enables decoupling. You can have a team that specializes in data access, and the consumers don’t have to know how that data is surfaced.
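The resolver pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Apollo's API or any real GraphQL library: the `RESOLVERS` and `FIELD_TYPES` tables, the in-memory `_USERS` and `_POSTS` stores, and the `execute` function are all hypothetical names chosen for this example. Each field may have a resolver function that knows how to fetch its data; fields without one fall back to a plain key lookup on the parent object.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical backing stores; in practice these could be separate databases.
_USERS = {1: {"id": 1, "name": "Ada"}}
_POSTS = [
    {"id": 10, "author_id": 1, "title": "Hello"},
    {"id": 11, "author_id": 1, "title": "Graphs"},
]

Resolver = Callable[[Any, dict], Any]


def resolve_user(parent: Any, args: dict) -> Any:
    # Data-access detail: users are looked up by primary key.
    return _USERS.get(args["id"])


def resolve_user_posts(parent: Any, args: dict) -> Any:
    # Data-access detail: posts live elsewhere and are joined by author_id.
    return [p for p in _POSTS if p["author_id"] == parent["id"]]


# Resolver table: type name -> field name -> resolver function.
RESOLVERS: Dict[str, Dict[str, Resolver]] = {
    "Query": {"user": resolve_user},
    "User": {"posts": resolve_user_posts},
}

# Which type a field returns, so nested fields find the right resolvers.
FIELD_TYPES: Dict[Tuple[str, str], str] = {
    ("Query", "user"): "User",
    ("User", "posts"): "Post",
}


def execute(type_name: str, parent: Any, selection: dict, args: dict) -> dict:
    """Walk the selection, calling a resolver for each field that has one."""
    result = {}
    for field, sub in selection.items():
        resolver = RESOLVERS.get(type_name, {}).get(field)
        # Fall back to a plain key lookup when no resolver is registered.
        value = resolver(parent, args.get(field, {})) if resolver else parent[field]
        child_type = FIELD_TYPES.get((type_name, field))
        if sub is True or value is None:
            result[field] = value
        elif isinstance(value, list):
            result[field] = [execute(child_type, v, sub, args) for v in value]
        else:
            result[field] = execute(child_type, value, sub, args)
    return result


query = {"user": {"name": True, "posts": {"title": True}}}
print(execute("Query", None, query, {"user": {"id": 1}}))
# {'user': {'name': 'Ada', 'posts': [{'title': 'Hello'}, {'title': 'Graphs'}]}}
```

The consumer only writes the `query` shape. Where users and posts are stored, and how they are joined, is confined to the resolver functions, which is the decoupling McKenzie describes: a data-access team can change those functions without the consumers noticing.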
This is also how Stargate evolved, according to McKenzie, who was involved in it: from a number of microservices serving different APIs for Cassandra, to microservices going to a Dynamo coordinator, to Stargate, which McKenzie sees as the marriage of GraphQL and Dynamo.
The existential question, he went on to add, is how to simplify but still provide 90 percent of the power, then take the other 10 percent and hide it behind advanced users and customized ability. Stargate could potentially just have GraphQL as its API, because GraphQL allows incremental adoption.
However, for Stargate to serve a broader purpose than as an access layer for Cassandra, it would need to enable developers to work in a polyglot environment, and a fair amount of evangelizing and community building would be needed as well.
And what about GraphQL as an actual graph query language? McKenzie sees the technology growing into its name, rather than having started out matching it. There is a graph-shaped computer science problem starting to surface as GraphQL becomes more widely adopted.
The mesh of all the endpoints that GraphQL abstracts over is indeed a graph, and Apollo will have to think about how to tackle graph-shaped challenges in the query planner space. Different graph algorithms will become quite relevant in that respect.
However, despite the fact that GraphQL is easier to comprehend for most people than specialized graph query languages, there’s little chance that the GraphQL specification will evolve in that direction. But that, Jain and McKenzie noted, is fine. This is how committees, specifications, and open source work.
The GraphQL specification has come out of Facebook, and Facebook engineers still are a large part of the steering of GraphQL. The question becomes, do they need it at their scale? And if they don’t need it at their scale, why should anyone else? And then you get into the debates about the patterns of technology usage.
Still, McKenzie noted, we’re talking about the democratization of the business models of some of the biggest companies in the world. They are based on graphs of information and the massive network effects they’re in. So whoever manages to technologically democratize that for end users is going to be off to the races.
That’s the vision for Dgraph, Jain chimed in: to take this complex graph stuff that people are afraid of and make it usable and accessible to developers, and bring it to the modern world. Dgraph is not the only one, as there is a burgeoning graph ecosystem. But Dgraph is the only one using a GraphQL-based query language.
Jain mentioned how they had to reassure people that they knew what they were doing. Yes, they do understand that GraphQL was not meant to serve as a database query language, let alone a graph database query language. Dgraph forked GraphQL to what is now DQL, and has managed to add graph-specific constructs and algorithms to DQL.
At the same time, people can use both GraphQL and DQL to access Dgraph. It’s a balancing act. People are drawn to Dgraph and DQL because it enables them to do things that are not possible in GraphQL. Still, they want to be able to do them in a simple way, via GraphQL. GraphQL users are asking for more DQL graph features, graph users are asking for more GraphQL features.
As for the future of GraphQL? Graph algorithms and graph-specific extensions are probably not on the roadmap. Things such as namespaces, which would let multiple subgroups segment their data off from one another without clashing, or filters and how they run across different implementations, may be.
At the end of the day, for GraphQL, like every other specification, it’s all about striking the right balance between powerful and simple.