The February 2008 issue of php|architect features an article by Mac Newbold titled "Database Design for PHP Programmers". Here's the blurb:
While PHP can do a lot of great things by itself, it can do much more when you add a relational database. Whether you use MySQL, PostgreSQL, Oracle, MicroSoft SQL Server or SQLite, the way you design your schema and build your tables has a big impact on the abilities of your application. By the same token, mistakes in your database structure can be very difficult to fix or overcome. A little bit of careful planning can go a long way.
It's a pretty good article about database design and I think all PHP developers who are just starting to integrate databases in their applications should read it. However, there are a few things I'd like to add and point out.
SQL does not mean "all non-PHP code that has to do with the database"
Here's a quick primer:
SQL - Structured Query Language. This includes all your SELECT queries.
DML - Data Modification Language. This includes any statements that change data in your RDBMS. All your INSERT, UPDATE, DELETE, MERGE go here.
DDL - Data Definition Language. How you define objects in your RDBMS. A database object is a table, index, sequence, etc. Your CREATE TABLE, DROP INDEX, etc. fall under this category.
It makes me cringe when people say "SQL statements to create your tables." I understand that SQL has kind of become to mean "all database code", but it really isn't. There's a huge difference between the three categories I listed above and it really helps developers organize how they think about database development if they understand the differences between the three.
For example, did you know that you can't create a table with auto-increment starting at a value which is the next auto-increment value from another table? Why is that? The answer is simple: you can't mix DDL (create table) with SQL (selecting a new value). If you understand that DDL and SQL are different, the answer is obvious. If you don't get the distinction between the three, the answer can be kind of magical and confusing.
constraints vs. indexes vs. keys
You can think of a constraint as a "check" that your RDBMS performs when you execute a DML statement.
An index is a physical object in the database (created with DDL).
A key is more of a conceptual construct which combines the characteristics of a constraint and an index. It is a constraint in a way that it makes you RDBMS enforce a rule when DML is executed (e.g. ids of users must be unique and sequential). It is an index in a way that it is stored separately in the database and can be used for querying (via SQL). You create it via DDL, of course.
I think the article's explanation about keys, primary keys and foreign keys was aimed at newer database developers, and especially ones using MySQL. Good database design stems from understanding the relationships between database objects and constraints and indexes help enforce those relationships. Since you, the developer is who puts these constraints and indexes in place, it is in your benefit to learn how they work.
Designing for performance
I applaud the article for pointing out that good design takes precedence over optimization and that good design provides good potential for good optimization practices. I also really liked him pointing out that it's important to know what you should optimize. Figuring out what your bottleneck is will go a long way toward figuring out what next steps you should take.
Speaking of optimization...
Mac's article is definitely very MySQL-oriented. I would really love for everyone to enjoy reading Oracle documentation as much as I do. But if you don't, at least read this awesome part about Oracle's EXPLAIN PLAN which beats MySQL's by a long shot. I think the article does not focus much on the explain plan because MySQL's support for it is very limited. Oracle gives you so much information in an explain plan that it is really no excuse to not know what your queries are doing.
Let the database handle data
The article points out that it's sometimes difficult to know what code to store in the database and what in PHP. Personally, I'd love to keep it all in the database (just so you know my bias), but even if my views were more liberal in this respect, I'd disagree with the article that the choice may sometimes be unclear (and definitely in the examples the article provides).
The word "database" contains the string "data" - that should be a pretty good clue what the database is for. (Ok, that's my snarkiness for this post.) A database is really REALLY good at handling data. Enforcing auto-increments or managing referential integrity (e.g. foreign keys) via PHP would not only be cumbersome and error-prone, but it also violates the principle of using the best tool for the job.
Overall, it was a pretty good article, albeit with a MySQL slant. I'm glad that the PHP community recognizes that most PHP applications will at some point use a database and that solid database development skills in PHP developers are important. Learning how to write database code well can also give you a whole new set of solutions to a lot of challenges you encounter in your application.
<3 databases!
Comments (1)
I <3 Databases! Rock on.
Posted by Ligaya Turmelle | February 29, 2008 11:00 AM
Posted on February 29, 2008 11:00