by shigemk2

For the time being, I only write about technical topics

In-Place Computing for Big and Complex Data #linuxcon

What is In-Place computing?

In-place computing is a set of principles for storing and computing on large volumes of data. (The term in-place is inspired by execute-in-place.)

Real-Time Collaborative Planning

Update Propagation in Other Views

Sales Director Store Category

100M to billions of data records to process

Challenges

Real-Time (Interactive)

Challenge: Interactive Response

Interactive response time is expected to be similar to a web page's loading time

Why is this a challenge for today's

Processing time grows exponentially as the data volume increases

Challenge: Data Model

Relational databases only support tables

It is not easy to write pure SQL for distribution

Challenge: Limited Budget

Today's Storage-based Computing Model

We have taken it for granted for 40 years

heavy data transfer

In-Place Computing Model

Assumption: Data objects are stored in an unlimited virtual memory space.

No concept of storage. No data retrieval.

64-bit architecture is the key enabler

  1. Persistent and ready for computing
  2. Big enough to hold the entire big data set
  3. Data is organized into macro objects that can be expressed and manipulated at the macro level
  4. Data-centric computing, in which computations take place where the data are stored
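
To make the point about 64-bit addressing concrete, here is a small arithmetic sketch. The record size of 100 bytes is an assumption for illustration (the talk did not give one), and 48 bits is the virtual address width commonly exposed by current x86-64 systems rather than the full 64.

```python
GIB = 2**30  # bytes in one gibibyte


def address_space_bytes(bits):
    """Number of bytes addressable with the given pointer width."""
    return 2**bits


def fits(n_records, record_bytes, bits):
    """True if the whole data set can be addressed at once."""
    return n_records * record_bytes <= address_space_bytes(bits)


# Hypothetical workload: one billion records of 100 bytes each (about 100 GB).
n_records, record_bytes = 10**9, 100
fits_32 = fits(n_records, record_bytes, 32)  # 32-bit space is only 4 GiB
fits_48 = fits(n_records, record_bytes, 48)  # 48-bit space is 256 TiB
```

Under these assumptions the data set cannot be addressed as a whole with 32-bit pointers, but it fits many times over in a 64-bit process.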

So What? We can trade space for time

The order of complexity also drops from exponential to O(N)
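
One common way to trade space for time is to keep materialized aggregates next to the data, so that a changed record updates a few totals instead of triggering a rescan. The sketch below is only an illustration of that idea, not the method shown in the talk. The record layout and the names build_totals and apply_update are hypothetical.

```python
from collections import defaultdict


def build_totals(records):
    """One O(N) pass that materializes totals per (store, category)."""
    totals = defaultdict(float)
    for store, category, amount in records:
        totals[(store, category)] += amount
    return totals


def apply_update(totals, store, category, delta):
    """Propagate a single changed record without rescanning the data."""
    totals[(store, category)] += delta


# Hypothetical sales records: (store, category, amount)
records = [
    ("tokyo", "food", 120.0),
    ("tokyo", "toys", 80.0),
    ("osaka", "food", 50.0),
    ("tokyo", "food", 30.0),
]
totals = build_totals(records)
apply_update(totals, "osaka", "food", 25.0)
```

The extra dictionary costs memory, but each view that depends on a total can be refreshed with a lookup instead of a new query over all records.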

BigObject Store

In-Place computing system for multidimensional data domain.

Extended relational model: table object, tree object, matrix object

Three mechanisms

  • in-memory
  • transformation
  • data metric

In-Memory mechanism

memory mapping

memory as file

paging on-demand

e.g. the mmap() system call
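
As a minimal sketch of the memory-mapping idea (not BigObject's actual implementation), Python's standard mmap module can map a file into the process address space so that records are read and updated in place, with the operating system paging data in on demand. The record layout and the helper names below are assumptions for illustration.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<q")  # hypothetical layout: one little-endian int64 per record


def create_store(path, n_records):
    """Create a file holding n_records zero-initialized records."""
    with open(path, "wb") as f:
        f.truncate(n_records * RECORD.size)


def add_in_place(path, index, delta):
    """Update one record directly in the mapped file and return the new value."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:  # length 0 maps the whole file
            offset = index * RECORD.size
            (value,) = RECORD.unpack_from(mm, offset)
            RECORD.pack_into(mm, offset, value + delta)
            mm.flush()  # write the dirty page back to the file
            return value + delta


def read_record(path, index):
    """Read one record through a read-only mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            (value,) = RECORD.unpack_from(mm, index * RECORD.size)
            return value


# Usage: a store of 1000 records in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "store.bin")
create_store(path, 1000)
add_in_place(path, 42, 7)
```

Only the pages that are actually touched get loaded, which is the "paging on-demand" behaviour: the program addresses the file as memory and never issues explicit read or write calls for individual records.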

An approximation to computation taking place where the data are stored

Multidimensional Data

channel dimension table

Trans-Join (Transformative Join)

channel.data channel.store product.category product.sku

Program Tag

Macro Expression

BigObject Store Framework

BigObjectStore

BigObject for Collaborative Filtering

Basic technique used by some recommender systems
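
For readers unfamiliar with the technique, here is a minimal sketch of user-based collaborative filtering with cosine similarity. It is a generic textbook version, not the BigObject implementation from the talk. The rating data and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    common = set(u) & set(v)
    dot = sum(u[k] * v[k] for k in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue  # users with nothing in common contribute nothing
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical ratings: user -> {item: rating}
ratings = {
    "alice": {"sku1": 5, "sku2": 3},
    "bob": {"sku1": 5, "sku2": 3, "sku3": 4},
    "carol": {"sku4": 5},
}
suggestions = recommend(ratings, "alice")
```

Here bob shares two rated items with alice, so his extra item is suggested, while carol has no overlap and is ignored.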

Experiment Environment

Hardware (~= US $3000)

Performance Improvement of Two Orders of Magnitude

In-database vs In-place computing

Scale-Out, Scale-Up, Scale-In (best performance)

Demo