How Product Records Are Merged

For each record we collect, we generate 1 or more keys for the record. Each key value is based on different unique identifiers that are available from the record's data. If we see a different record with 1 or more of the same keys values, we will merge these two records.

For example, we may generate a product record like this when crawling a web page:

{
  "name": "Tectronics CD Player",
  "manufacturer": "Tectronics",
  "manufacturerNumber": "A342",
  "upc": "014045125963"
}

This record will generate the following keys:

"keys": [
  "Tectronics/A432",
  "014045125963"
]

Let's say we then crawl another web page for the same product and generate this data:

{
  "name": "Tectronics CD Player",
  "colors": [
  	"black",
    "red"
  ],
  "upc": "014045125963"
}

This record will generate the following keys:

"keys": [
  "014045125963"
]

When we import both of these records into our product database, the records will merge, since they share at least one key value. The resulting record will be:

{
  "name": "Tectronics CD Player",
  "colors": [
  	"black",
    "red"
  ],
  "manufacturer": "Tectronics",
  "manufacturerNumber": "A342",
  "keys": [
 		"Tectronics/A432",
  	"014045125963"
  ],
  "upc": "014045125963"
}

Product records use the following fields to generate keys:

  • brand
  • ean
  • isbn
  • manufacturer
  • manufacturerNumber
  • upc
  • vin
  • websiteIDs

brand and manufacturer are always used in conjunction with manufacturerNumber.