In this lab, you will write a set of operators for ColgateDB that provide support for querying and modifying the data.
See previous labs.
See previous labs.
Before starting this assignment, please be sure to read Ch 12.4.
The set of operators in ColgateDB provide support for performing relational algebra operations – such as select, project, join – as well as operations to update the database – such insertions, and deletions.
The operators rest on top of the foundation that you wrote in previous labs to provide you with a database system that can perform simple queries over multiple tables. You do not need to implement transactions or locking in this lab though you may continue to see mentions of it in the code.
A few details:
Join
.You are expected to modify these files:
Delete.java
Filter.java
Insert.java
Join.java
JoinPredicate.java
OperatorMain.java
Predicate.java
SeqScan.java
You are expected to read, use, but not modify the remaining files. You should also note that include with this lab is a small database. The files are college.schema
, students.dat
, takes.dat
, courses.dat
, and profs.dat
.
Here are the contents of the database:
students(sid(INT_TYPE), name(STRING_TYPE))
------------------------------------------
1 alice
2 bob
3 alice
4 charles
5 alice
takes(sid(INT_TYPE), cid(INT_TYPE))
-----------------------------------
1 301
2 301
2 460
4 460
1 302
courses(cid(INT_TYPE), title(STRING_TYPE))
------------------------------------------
301 os
302 algo
460 db
profs(pid(INT_TYPE), name(STRING_TYPE), favoriteCourse(INT_TYPE))
-----------------------------------------------------------------
10 sommers 301
20 hay 460
30 ramachandran 302
Here is a suggested sequence of tasks:
SeqScan
. This class sequentially scans all of the tuples from the pages of the table specified by the tableid in the constructor. If this sounds familiar, that’s because SeqScan
will (indirectly) call the iterator you wrote for HeapFile
. SeqScan
is essentially a wrapper that talks to the Catalog, retrieves the appropriate DbFile
object and obtain that object’s DbFileIterator
. (Recall that your HeapFile
class is an implementation of the DbFile
interface.) The only method that is non-trivial is the constructor which updates the TupleDesc
renaming the attributes as necessary (see the javadocs).Predicate
. The implementation should pass PredicateTest
.Filter
. Before working on the Filter
operator, you may find it helpful to look at the implementation of the Project
operator. (Like Filter
, Project
is a subclass of Operator
.) Once complete, you should be able to pass FilterTest
.OperatorMain
which effectively computes the following query:
SELECT *
FROM Students
WHERE name="alice"
When run, you should see something like this:
Loading schema from file: college.schema
Added table : Students with schema sid(INT_TYPE), name(STRING_TYPE) Table has 1 pages.
Added table : Takes with schema sid(INT_TYPE), cid(INT_TYPE) Table has 0 pages.
Added table : Courses with schema cid(INT_TYPE), title(STRING_TYPE) Table has 1 pages.
Added table : Profs with schema pid(INT_TYPE), name(STRING_TYPE), favoriteCourse(INT_TYPE) Table has 1 pages.
Query results:
1 alice
3 alice
5 alice
JoinPredicate
. The implementation should pass JoinPredicateTest
.Join
. In class, we will look at sophisticated algorithms for performing joins. In this lab, however, you are only expected to perform a “simple” nested loops join. That said, you will still need to think carefully about how to implement a nested loops join when you only have access to iterators over the inputs being joined. Spend time writing out small examples on paper. After you’ve correctly implemented Join
, you should be able to pass JoinTest
.Task 6 Modify OperatorMain
to execute a sequence of operators equivalent to this query:
SELECT S.name
FROM Students S, Takes T, Profs P
WHERE S.sid = T.sid AND
T.cid = P.favoriteCourse AND
P.name = "hay"
It is up to you to decide how to translate this SQL query into an operator sequence. Note that you may need to use the Project operator, which is included in the repo. Also, do not assume that Hay’s favorite course is 460 – your query must work correctly even if the database instance changes. The correct answer to the query on the provided data files is:
Query results:
bob
charles
Insert
and Delete
operators. For plans that implement insert and delete queries, the top-most operator is a special Insert
or Delete
operator that modifies the pages on disk. It does so by looking up the appropriate table in the Catalog and calling the insertTuple and deleteTuple methods on the resulting DbFile
. (Recall that your HeapFile
class is an implementation of the DbFile
interface.) These operators return the number of affected tuples. This is implemented by returning a single tuple with one integer field, containing the count. Note that even when no tuples are inserted or deleted, this operator should still return a tuple (with a count of zero). The Insert operator adds the tuples it reads from its child operator to the tableid specified in its constructor. It should use the DbFile.insertTuple()
method to do this. When completed, you should be able to pass InsertTest
. The Delete operator deletes the tuples it reads from its child operator from the tableid specified in its constructor. It should use the DbFile.deleteTuple()
method to do this. Use the RecordId of each tuple to find the DbFile from which it should be deleted. When completed, you should be able to pass DeleteTest
.The milestone for this lab will be to complete the above tasks and pass the tests described above.
Upload your files to Gradescope.
This exercise is an optional challenge problem. Only attempt these after completing the rest of the assignment. Completing challenge problems may earn you a small amount of extra credit at the end of the semester.
An additional operator implements basic SQL aggregates with a GROUP BY clause. You should implement the five SQL aggregates (COUNT, SUM, AVG, MIN, MAX) and support grouping. You only need to support aggregates over a single field, and grouping by a single field.
In order to calculate aggregates, we use an Aggregator
interface which merges a new tuple into the existing calculation of an aggregate. The Aggregator
is told during construction what operation it should use for aggregation. Subsequently, the client code should call Aggregator.mergeTupleIntoGroup()
for every tuple in the child iterator. After all tuples have been merged, the client can retrieve a DbIterator
of aggregation results. Each tuple in the result is a pair of the form (groupValue, aggregateValue)
, unless the value of the group by field was Aggregator.NO_GROUPING
, in which case the result is a single tuple of the form (aggregateValue)
.
Note that this implementation requires space linear in the number of distinct groups. For the purposes of this lab, you do not need to worry about the situation where the number of groups exceeds available memory.
Implement IntegerAggregator
, StringAggregator
, and Aggregate
. In coding your implementation, you might find it handy to use the TupleIterator
. At this point, your code should pass the unit tests IntegerAggregatorTest
, StringAggregatorTest
, and AggregateTest
.