Tuesday, May 15, 2007

The DataRush Team

The DataRush development team studied the problem and immediately began framing the solution as a classic dataflow graph. They knew parallel computing was the answer, and they knew the cost of 8-, 16- and 32-core servers would almost certainly decline every year, so the surveillance framework had to scale automatically as IT operations personnel added CPU and memory resources.

A quick prototype of the dataflow pipeline on a whiteboard showed that the team could break the problem into two main sections that could be built independently:

Reading the hit-list table and transaction file to find possible matches
Enriching matches with address information to find implicit associations among the perpetrators’ home addresses (using the well-known k-means clustering algorithm)
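
The two sections above can be sketched in a few lines of Python. This is only an illustration of the dataflow idea, not the DataRush API: the function names, the record layout, the sample names and coordinates, and the choice to seed k-means with the first k points are all assumptions made for the example. Stage one streams transactions and keeps those whose name appears on the hit list; stage two attaches a home coordinate to each match and groups the coordinates with a plain k-means loop.

```python
from collections import namedtuple

# Hypothetical record layout; the real transaction file format is not described.
Transaction = namedtuple("Transaction", ["txn_id", "name", "amount"])


def find_matches(hit_list, transactions):
    """Stage 1: yield transactions whose name appears on the hit list."""
    wanted = {name.lower() for name in hit_list}
    for txn in transactions:
        if txn.name.lower() in wanted:
            yield txn


def enrich(matches, addresses):
    """Stage 2a: pair each match with its (x, y) home coordinate, if known."""
    for txn in matches:
        if txn.name in addresses:
            yield txn, addresses[txn.name]


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (point[0] - centroids[i][0]) ** 2
        + (point[1] - centroids[i][1]) ** 2,
    )


def k_means(points, k, iterations=10):
    """Stage 2b: basic k-means, seeded with the first k points (an assumption)."""
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, [nearest(p, centroids) for p in points]


# Invented sample data, for illustration only.
hit_list = ["Alice Smith", "Bob Jones", "Carol White", "Dan Brown"]
transactions = [
    Transaction("T1", "Alice Smith", 120.0),
    Transaction("T2", "Eve Black", 75.0),
    Transaction("T3", "Bob Jones", 310.0),
    Transaction("T4", "Carol White", 42.0),
    Transaction("T5", "Dan Brown", 980.0),
]
addresses = {
    "Alice Smith": (1.0, 1.0),
    "Bob Jones": (9.0, 9.0),
    "Carol White": (1.0, 2.0),
    "Dan Brown": (9.0, 8.0),
}

# Wire the stages together: the generators form a small pipeline.
enriched = list(enrich(find_matches(hit_list, transactions), addresses))
matches = [txn for txn, _ in enriched]
points = [xy for _, xy in enriched]
centroids, labels = k_means(points, k=2)
```

Because each stage only consumes the output of the one before it, the matching and enrichment sections can be developed and tested separately, which is the property the whiteboard prototype was meant to show.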
