Package com.bigdata.service

This package provides implementations of bigdata services (metadata service, data service, transaction manager service.

See: Description

Package com.bigdata.service Description

This package provides implementations of bigdata services (metadata service, data service, transaction manager service.

A bigdata federation is comprised of the following essential services:

metadata service
The metadata service manages scale-out indices and is used to locate index partitions.
data service
The data service supports reads and writes on index partitions.
Clients are responsible for discovering the metadata service. Clients then use the metadata service to manage scale-out indices and to locate and cache leases for data service instances having data for key range partitions that the client will read or write. If a client will use transactions (vs ACID batch operations), then the client must also locate the transaction manager service. Service location is performed using JINI, but other service locator protocols are possible.

bigdata provides the following optional services. These are not considered essential since (a) they are not required by clients that make direct use of bigdata as a distributed database; and (b) they are themselves applications of the services briefly described above.

transaction manager service
Clients use the transaction manager service to start new transactions and to coordinate the distributed commit protocol. The transaction manager also provides a centralized timestamp factory.
map and reduce service
Manages the distributed execution of a functional program.

deployment

In order to deploy a bigdata federation, the following preconditions must be met (a standalone deployment can be realized by meeting these preconditions on a single host). This will go more smoothly if JINI is running before you start the various bigdata services since they will then be able to register themselves immediately on startup. Service discovery by default uses the local network and the "public" group. If you want to run more than one bigdata federation on the same network, then you MUST edit the configuration file.

Hosts:

TODO:
review the use of -Xms (sets the initial heap size). How much of a difference does this make? Note that JRockit defaults to 75% of physical memory or 1G, which ever is less.
  • the bigdata-server distribution is installed.
  • the data service is running.
  • In addition, there must be at least one host on which the following preconditions are true (a different host could be used for each of these services, and more than one instance of these services may be running on the network). These services should be started during the boot procedure so that they are available whenever the host is up and running.

    • jini 2.x is installed and running. This will be used to register and lookup services. It helps if jini is running before you try to start the bigdata services since they will be able to register themselves immediately in that case.
    • a metadata server is running.
    • one or more data server(s) are running.
    • a transaction server is running (optional).
    • one or more map and reduce servers are running (optional).

    Clients:

    • java 5 or better is installed. The use of a server VM is highly recommended
    • the bigdata-client deployment package is installed. This includes bigdata-client.jar, but also support for generating unicode keys (ICU4J), etc.

    unique identifiers

    Data Service

    The serviceID identifies the data service. If the service dies and then restarts on the same host with the same data, then it is still the same service instance. If the service dies and is recreated on another host with the same data, then it is still the same instance. What is important is that (a) the service has the same data (journals, index segments, and serviceID); and (b) that the data for that service has not become stale (or inconsistent).

    The easiest way in which the data can become stale is for the service to fail. When failover is operating, clients will be redirected to one or more other services having consistent data for the same index partitions. If clients then write on _any_ of those index partitions and commit, then the old service now has stale (aka inconsistent) data. A service with inconsistent data MUST NOT be used.

    In general, the majority of state for a service will live in its index segments. This can be hundreds or thousands of times more data in the index segments than there is in any single journal. This can make it worth while to make the service consistent again.

    Journal
    Each journal has a UUID that is generated when the journal is created (it is stored in the root block). This serves to uniquely identify the journal as a resource regardless of where it may be located on the host or in the file system.
    Index Segment
    Each index segment has a UUID that is generated when the index segment is created (it is stored in the fixed length index segment metadata record). This serves to uniquely identify the index segment as a resource regardless of where it may be located on the host or in the file system.
    Index
    Named indices are assigned UUIDs when they are created. This makes it possible to rename an index, since its UUID remains the same.
    metadata service
    transaction manager service
    job scheduler service

    downloaded code

    While bigdata makes use of jini for service registration and discovery, it does not use downloaded code for executing its basic services (the necessary interfaces and implementation classes are already bundled in bigdata.jar). However, clients MAY use downloaded code in order to have data servers execute remote procedures or extension batch operations. In this scenario, the client will create and register a JINI service. The service should perform all of its operations locally. The data server will lookup the service using jini and then execute it locally.

    In order for downloaded code to work correctly, clients must ensure that their classes are exposed by an HTTP server accessible to bigdata servers and declared to the VM in which the client starts using: -Djava.rmi.server.codebase=...

    , consider this use case further. Are there other/better ways for executing remote code on the data server? At a minimum, we should supply a base class and ant script support for creating such remote services.

    Copyright © 2006-2012 SYSTAP, LLC. All Rights Reserved.