Tuesday, October 4, 2011

Handling Multiple Child Records - Part 1

I'm currently working on a data feed project for downstream process. Using this application, the end user can create his/her own template (Type of fields, length, order etc) and then use that template to download the data for a given time period in 3 different formats; pipe delimited, tab delimited or fixed width. The table structure is as follows:

Base Table (BT) {BT_ID, col2, col3, ... , coln}
Child Table 1 (CT1) {BT_ID, Seq_No, col3, col4, ... , coln}
Child Table 2 (CT2) {BT_ID, Seq_No, col3, col4, ... , coln}
Child Table 3 (CT3) {BT_ID, Seq_No, col3, col4, ... , coln}
Child Table 4 (CT4) {BT_ID, Seq_No, col3, col4, ... , coln}
Child Table 5 (CT5) {BT_ID, Seq_No, col3, col4, ... , coln}
Child Table 6 (CT6) {BT_ID, Seq_No, col3, col4, ... , coln}

where BT_ID is the PK of the parent table and FK in child tables. BT_ID and Seq_No combination is  PK in child tables. My query is something like this:

SELECT BT.COL2, BT.COL3, ..., BT.COLN,
             CT1.COL3, CT1.COL4, ..., CT1.COLN,
             CT2.COL3, CT2.COL4, ..., CT2.COLN,

             CT3.COL3, CT3.COL4, ..., CT3.COLN,
             CT4.COL3, CT4.COL4, ..., CT4.COLN,
             CT5.COL3, CT5.COL4, ..., CT5.COLN,
             CT6.COL3, CT6.COL4, ..., CT6.COLN
   FROM BT, CT1, CT2, CT3, CT4, CT5, CT6
 WHERE BT.BT_ID = CT1.BT_ID (+)
      AND BT.BT_ID = CT2.BT_ID (+)
      AND BT.BT_ID = CT3.BT_ID (+)
      AND BT.BT_ID = CT4.BT_ID (+)
      AND BT.BT_ID = CT5.BT_ID (+)
      AND BT.BT_ID = CT6.BT_ID (+)

Now, there are two cases. First, one base record has only one type of child record i.e. A record in BT has child records only in CT1. Second, one base record has more than one type of child record i.e. A record in BT has child records in CT1 and CT2. The latter case in my application is very rare, say less than 0.01% of cases will fall under second category but still we have to handle those type of records. I'll discuss that solution in my second post. For now, let's assume that each base records can have only one type of child record.

To explain the ensuing scenario better, let me show a sample data fetched by the above query:

BT    CT1      CT2    CT3    CT4    CT5    CT6
b1     b1-1     null     null     null     null     null
b1     b1-2     null     null     null     null     null
b1     b1-3     null     null     null     null     null

b2     null       b2-1   null     null     null     null
b2     null       b2-2   null     null     null     null

In the above scenario, b1 record has 3 child records in CT1. b2 record has 2 child records in BT2. Also, "b1-1" represents two columns from CT1,BT_ID and Seq_No respectively.

The problem is that when user creates a template, (s)he can select fields from any of the 7 tables. So, there are 4 different cases:

Case 1: Template has fields from BT only.


Actual Output: Result set will have 5 rows:


b1
b1
b1
b2
b2

Expected Output: Result should have 2 rows:

b1
b2

Case 2: Template has fields from BT and CT1 only.

Actual Output: Result set will have 5 rows:

b1     b1-1
b1     b1-2
b1     b1-3
b2     null
b2     null

Expected Output: Result set should have 4 rows:

b1     b1-1
b1     b1-2
b1     b1-3
b2     null

Case 3: Template has fields from BT and CT2 only.

Actual Output: Result set will have 5 rows:

b1     null
b1     null
b1     null
b2     b2-1
b2     b2-2

Expected Output: Result set should have 3 rows:

b1     null
b2     b2-1
b2     b2-2

Case 4: Template has fields from BT, CT1 and CT2.

Actual Output: Result set will have 5 rows:


b1     b1-1     null
b1     b1-2     null
b1     b1-3     null
b2     null       b2-1
b2     null       b2-2

Expected Output: Result should have 5 rows:

b1     b1-1     null
b1     b1-2     null
b1     b1-3     null
b2     null       b2-1
b2     null       b2-2

So basically, the problem is, I should only repeat the records if the fields are selected in the template.All the cases, except 4, fail. Unfortunately, I can not apply DISTINCT as it'll give the wrong result set in many other scenarios. For example, when the child data is repeating itself or is NULL, it'll only show one row whereas I should show as many number of rows as present in child table for that record.  To overcome this problem, we used the below approach.

1. Generate template key. This key was a 6 character string made up of 1s and 0s, much like a binary number. Each character representing child table. If the char is 1, that means the fields for that child table is present in template. So mapping is as follows:

CT1 -> 1st Digit
CT2 -> 2nd Digit
CT3 -> 3rd Digit
CT4 -> 4th Digit
CT5 -> 5th Digit
CT6 -> 6th Digit

If the key generated is 111111, then at least one field from all the child tables is present in template. If the key generated is 000000, then none of the fields from the child tables is present. If the key is 100000, then at least one field from CT1 is present in template and so on and so forth.

2. Generate record key. This key is similar to template key except instead of 1s and 0s, it has digits from 1-6 in the same order. So, for a record with 0 child records, it'll be 000000. For a record with child records in CT1, it'll be 100000. For a record with child records in CT2, it'll be 020000. For a record with child records in CT3, it'll be 003000. For a record with child records in CT4, it'll be 000400. For a record with child records in CT5, it'll be 000050. For a record with child records in CT6, it'll be 000006.

3. Map record key to template key using base key, '123456', and generate child_type. So for example, if the template key is 100000, record key is 003000, then translate function, TRANSLATE('003000','123456','100000'), will return 000000. If the record key is 100000, then the translate function, TRANSLATE('100000','123456','100000'), will return 100000. Similarly, whenever the record has child data and template has at least a field from the corresponding child table, the child_type will always be more than zero.

4. Assign an order to each record based on seq no. Also, a record with no child records will have order as 1. So, order (in first column) would be something like this:

1     b1     b1-1     null     null     null     null     null
2     b1     b1-2     null     null     null     null     null
3     b1     b1-3     null     null     null     null     null
1     b2     null       b2-1   null     null     null     null
2     b2     null       b2-2   null     null     null     null

1     b3     null       null     null     null     null     null

 5. Filter out all the records which doesn't meet either one of the following two conditions; order = 1 or child_type > 0.

That's it. The records will always be generated per template. So, the final query after implementing above steps will look something like this:


SELECT BT.COL2, BT.COL3, ..., BT.COLN,
             CT1.COL3, CT1.COL4, ..., CT1.COLN,
             CT2.COL3, CT2.COL4, ..., CT2.COLN,
             CT3.COL3, CT3.COL4, ..., CT3.COLN,              CT4.COL3, CT4.COL4, ..., CT4.COLN,
             CT5.COL3, CT5.COL4, ..., CT5.COLN,
             CT6.COL3, CT6.COL4, ..., CT6.COLN
FROM
(SELECT  GREATEST(NVL(CT1.Seq_No,1), NVL(CT2.Seq_No,1),
                                    NVL(CT3.Seq_No,1), NVL(CT4.Seq_No,1),
                                    NVL(CT5.Seq_No,1), NVL(CT6.Seq_No,1)) CHILD_ORDER,
             TO_NUMBER(TRANSLATE(
             DECODE(CT1.Seq_No, NULL, 0, 1)||DECODE(CT2.Seq_No, NULL, 0, 2)||
             DECODE(CT3.Seq_No, NULL, 0, 3)||DECODE(CT4.Seq_No, NULL, 0, 4)||
             DECODE(CT5.Seq_No, NULL, 0, 5)||DECODE(CT6.Seq_No, NULL, 0, 6)
             , '123456', <template_key>)) CHILD_TYPE,
             BT.COL2, BT.COL3, ..., BT.COLN,
             CT1.COL3, CT1.COL4, ..., CT1.COLN,
             CT2.COL3, CT2.COL4, ..., CT2.COLN,
             CT3.COL3, CT3.COL4, ..., CT3.COLN,             CT4.COL3, CT4.COL4, ..., CT4.COLN,
             CT5.COL3, CT5.COL4, ..., CT5.COLN,
             CT6.COL3, CT6.COL4, ..., CT6.COLN
   FROM BT, CT1, CT2, CT3, CT4, CT5, CT6
 WHERE BT.BT_ID = CT1.BT_ID (+)
      AND BT.BT_ID = CT2.BT_ID (+)
      AND BT.BT_ID = CT3.BT_ID (+)
      AND BT.BT_ID = CT4.BT_ID (+)
      AND BT.BT_ID = CT5.BT_ID (+)
      AND BT.BT_ID = CT6.BT_ID (+)
) WHERE CHILD_TYPE > 0 OR CHILD_ORDER = 1;


This will handle all the cases except when the same record has multiple type of child records. In that case, the joining condition will fail the number of records returned will always be more. More on that in second part.