17th Nov 2020

The History of SQL – How It All Began

Who created SQL and why? Find out in this article!

Are you learning SQL? Or are you about to take the first step towards working with databases? Great decision! Either way, it's worth knowing the history of SQL – where it came from, who came up with it, and why.

Here’s a brief history of SQL, starting with its foundational concept: the database.

Ted Codd and the Relational Data Model

The first computer databases appeared in the 1960s, and they quickly became an important area of research. Many computer scientists were focused on improving how databases work. One of them was Edgar Frank (Ted) Codd, an English computer scientist employed at IBM. Back in the 1940s, he took part in the Selective Sequence Electronic Calculator project, one of IBM's early electromechanical computers.

But what Codd is really famous for is an article published in 1970 called “A Relational Model of Data for Large Shared Data Banks”; this started the era of relational databases in computer science. Codd is therefore often referred to as the forefather of SQL. In 1981, he received the Turing Award, the highest distinction in computer science, which is sometimes called the “Nobel Prize of computing”.

At the time Codd wrote his article, hierarchical and network databases were dominant. They were also quite inflexible. To get data out of the database, you essentially had to write a computer program: the data were not accessible to non-programmers. Any change in the model required changes in data access patterns – in other words, the data access programs basically had to be rewritten.

In his article, Codd proposed a completely new idea: modeling data with the mathematical notion of relations. (Today, we call them tables.) Codd’s relational data model allowed for more flexibility than hierarchical and network data models. New relations could be added without modifying existing relations. Thanks to his ideas, working with databases has become much easier.

System R

Codd’s model was not immediately successful. IBM was not eager to implement his suggestions. At the time, they had IMS, a very successful hierarchical database. They didn't want to undermine their revenue from IMS by building a competing product. (IMS is still developed today, which shows how successful it was.) It wasn’t until 1973 that IBM started System R, a research project to explore Codd’s ideas for the relational data model. Codd did not work closely with the System R team; it’s hard to know why he was pulled out of a project based on his own work. Two people involved in System R development, Don Chamberlin and Ray Boyce, were in charge of creating its query language.

A Query Language for Relational Databases

Figure: Joins in Codd’s article

In his seminal article, Codd proposed a set of operations that could be used to extract data from relations. You can think of these operations as the first query language for relational databases. Of course, the syntax was completely different from the SQL we know today; Codd used mathematical notation for this language. Most of the operations Codd proposed can be done in today’s SQL, just with different notation.
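To make the link between Codd's operations and today's SQL more concrete, here is a minimal sketch of three relational operations (restriction, projection, and join) written as modern SQL queries. It is only an illustration: the `supplier` and `part` tables and their rows are invented for this example, the SQL is run through Python's built-in `sqlite3` module, and the syntax is present-day SQLite rather than anything from Codd's article.

```python
import sqlite3

# Hypothetical sample data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE part (id INTEGER PRIMARY KEY, name TEXT, supplier_id INTEGER);
    INSERT INTO supplier VALUES (1, 'Acme', 'London'), (2, 'Globex', 'Paris');
    INSERT INTO part VALUES (10, 'bolt', 1), (11, 'nut', 1), (12, 'gear', 2);
""")

# Restriction (selection): keep only the rows that satisfy a condition.
restriction = conn.execute(
    "SELECT id, name, city FROM supplier WHERE city = 'London'"
).fetchall()

# Projection: keep only some columns and drop duplicate rows.
projection = conn.execute(
    "SELECT DISTINCT city FROM supplier ORDER BY city"
).fetchall()

# Join: combine rows from two relations that agree on a common value.
join = conn.execute(
    "SELECT part.name, supplier.name FROM part "
    "JOIN supplier ON part.supplier_id = supplier.id "
    "ORDER BY part.id"
).fetchall()

print(restriction)
print(projection)
print(join)
```

Each query describes what result is wanted rather than how to walk through the stored records, which is the property that made the relational approach attractive.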

At the time, Don Chamberlin was working on hierarchical databases and had studied the language for querying these databases. He immediately understood the impact of Codd’s data model. He later recalled:

“For Ray and me, our exposure to the relational data model at Codd’s research symposium was a revelation. For the first time, we could see how a query that would require a complex program in the DBTG language could be reduced to a few simple lines using one of Codd’s relational languages. It became a game for the two of us to invent queries and challenge each other to express them in various query languages.” ^[1]

In fact, Codd proposed two different languages for the relational model: relational algebra (the basis for this language was in his original 1970 article), and relational calculus (also known as the language Alpha). Both of these languages used mathematical notation with quantifiers and various mathematical operators. You can see ideas from Codd’s relational algebra in SQL today.

Irv Traiger, who also worked at IBM during this time, added:

“Glenn Bacon, who had the Systems Department, used to wonder how Ted could justify that everybody would be able to write this language that was based on mathematical predicate calculus, with universal quantifiers and existential quantifiers and variables and really, really hairy stuff.” ^[2]

Relational calculus/Alpha became the foundation for QUEL, the query language for Ingres (Interactive Graphics Retrieval System), an early relational database developed by Michael Stonebraker at the University of California, Berkeley. Ingres later gave rise to a number of other database systems, including Postgres, the predecessor of today's open-source PostgreSQL.

The Query Game

Even before the System R project started, Chamberlin and Boyce came up with a language they called SQUARE (Specifying Queries as Relational Expressions). They appreciated the power of Codd’s ideas, which allowed them to use a few lines to express complex queries that would take pages in a hierarchical database. However, they were convinced that their language was simpler and more accessible to regular users than Codd’s relational algebra and relational calculus.

“Ray and I were impressed by how compactly Codd’s languages could represent complex queries. However, at the same time, we believed that it should be possible to design a relational language that would be more accessible to users without formal training in mathematics or computer programming.” ^[1]

SQUARE was the foundation for System R’s new query language. SQUARE used a lot of subscripts and some mathematical notation. It was difficult to type on a keyboard. Chamberlin and Boyce decided to adapt it so that it resembled the structure of an English sentence and was easier to type.

“So we began saying we’ll adapt the SQUARE ideas to a more English keyword approach, which is easier to type because it was based on English structures. We called it Structured English Query Language and used the acronym SEQUEL for it.” ^[2]

Two things were important for Chamberlin and Boyce in the design of SEQUEL. First, they wanted it to be accessible to regular users with no mathematical or programming background. System R staff even recruited a group of students to learn SEQUEL and see if they found the syntax easy. Second, they wanted the language to contain data modification and data definition elements, which was something very new at the time.

“Ray and I hoped to design a relational language based on concepts that would be familiar to a wider population of users. We also hoped to extend the language to encompass database updates and administrative tasks such as the creation of new tables and views, which had traditionally been outside the scope of a query language.
[...] What we thought we were doing was making it possible for nonprogrammers to interact with databases. We thought that this was going to open up access to data to a whole new class of people who could do things that were never possible before because they didn’t know how to program.” ^[2]

Finally, Chamberlin and Boyce wrote two articles about SEQUEL: one about DML (Data Manipulation Language, e.g. SELECT, INSERT, and UPDATE statements) and one about DDL (Data Definition Language, which is used to create and modify database structure).

“We wrote two papers: one on SEQUEL/DML and one on SEQUEL/DDL. We were cooperating very closely on this. The DML paper’s authors were Chamberlin and Boyce; the DDL paper’s authors were Boyce and Chamberlin, for no special reason; we just sort of split it up. We wanted to go to Stockholm that year because it was the year of the IFIP Congress in Stockholm. I had a ticket to Stockholm because of some work I’d done in Yorktown, so Ray submitted the DDL paper to the IFIP Congress in Stockholm, and the DML paper we submitted to SIGMOD. [...] These were twin papers in our original estimation. We wrote them together and thought they were of comparable value and impact. But what happened to them was quite different. The DDL paper got rejected by the IFIP Congress; Ray didn’t get to go to Stockholm.” ^[2]
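The DDL/DML split described above is still visible in SQL today. The following is a minimal sketch, not a reconstruction of 1974 SEQUEL: the `employee` table, the `well_paid` view, and the rows are hypothetical, and the statements use present-day SQLite syntax run through Python's built-in `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: statements that define or change the structure of the database.
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.execute(
    "CREATE VIEW well_paid AS SELECT name FROM employee WHERE salary > 4000"
)

# DML: statements that read or change the data stored in that structure.
conn.execute(
    "INSERT INTO employee (id, name, salary) VALUES (1, 'Ann', 4000), (2, 'Bob', 3500)"
)
conn.execute("UPDATE employee SET salary = salary + 500 WHERE name = 'Ann'")

rows = conn.execute("SELECT name, salary FROM employee ORDER BY id").fetchall()
view_rows = conn.execute("SELECT name FROM well_paid").fetchall()

print(rows)
print(view_rows)
```

Having both kinds of statements in one language means the same person can create a table or view and then query it, without switching to a separate administrative tool.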

And that’s how SEQUEL was born. Later, SEQUEL was renamed SQL because of a trademark issue.

Unfortunately, Ray Boyce passed away shortly after laying the foundations for SQL; he never got to see the impact it would have. In 1974, about a month after presenting a SEQUEL article at a technical conference in Ann Arbor, Michigan, he suddenly died of a ruptured brain aneurysm. He was only 26 years old.

Interestingly, SQL did not end up being used quite the way Donald Chamberlin and Ray Boyce had expected. Looking back, Chamberlin wrote:

“When Ray and I were designing Sequel in 1974, we thought that the predominant use of the language would be for ad-hoc queries by planners and other professionals whose domain of expertise was not primarily data-base management. We wanted the language to be simple enough that ordinary people could ‘walk up and use it’ with a minimum of training. Over the years, I have been surprised to see that SQL is more frequently used by trained database specialists to implement repetitive transactions such as bank deposits, credit card purchases, and online auctions. I am pleased to see the language used in a variety of environments, even though it has not proved to be as accessible to untrained users as Ray and I originally hoped.” ^[1]

SQL Becomes Industry Standard

Over the years, SQL has become an industry standard: ANSI adopted it as a standard in 1986, and ISO followed in 1987. Today, it is the basic language for working with relational databases, and market giants such as Google and Facebook use it on a daily basis for many processes.

SQL and databases are currently among the fastest growing areas of the IT industry. Catching on to this trend can pay off. If you want to start learning SQL from scratch, try our SQL Basics course. If you already know some SQL and want to learn how to better analyze your customers' behavior or revenue trends, I recommend the SQL Reporting track. And if you know SQL and are stuck on a specific problem, check out our SQL Cookbook, which contains many ready-made SQL scripts. Feel free to copy them to your project.

No matter what you choose from the extensive LearnSQL.com offer, now is a good time to learn SQL. It's a language that's more than 40 years old, but it's not going anywhere. SQL knowledge is a great skill that will be useful in your daily work and will give your career a boost.

Sources:

1. Chamberlin, Donald D. “Early History of SQL.” Accessed 11 Nov 2020 from https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6359709

2. McJones et al. “The 1995 SQL Reunion: People, Projects and Politics.” Accessed 11 Nov 2020 from http://www.scs.stanford.edu/~dbg/readings/SRC-1997-018.pdf
