Preparing data


The iServer distributed analysis service supports the input data sources described below. Once the data is ready, iServer filters for the datasets that meet the analysis conditions when you create a specific analysis job.

iServer DataStore

iServer DataStore is an application that lets you quickly create data storage and associate it with iServer. To build the iServer DataStore distributed environment, see Build distributed iServer DataStore environment.

The relational datasets in iServer DataStore are from the two sources:

Big data file sharing

The iServer administrator can register CSV files, UDB files, and HDFS directories as iServer's big data file sharing. For the registration method, see Register big data file sharing. Datasets in a successfully registered big data file share appear in the datasets resource of the Data Category Service and can be used as input data for the distributed analysis service.

CSV data files registered to iServer must be validated before the distributed analysis service can use them. To validate a file:

  1. Open the data registration page at http://localhost:8090/iserver/manager/datastores, find the registered file data storage item, and click its Storage ID to open the dataset list.
  2. In the dataset list, a question mark in the "Status" column indicates that the file has not been validated yet.
  3. Click the CSV dataset name and specify the X/Y index in the dialog that pops up.
  4. Click OK. When the status changes from the question mark to the success icon, the file has been validated successfully.
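
Before specifying the X/Y index in step 3, it can help to check which columns of the CSV actually hold numeric values. The following is a minimal Python sketch of that check, not part of iServer. The function names and the sample rows are made up for illustration, and the positions it prints are zero-based, which is an assumption: this page does not state whether the dialog counts columns from 0 or from 1.

```python
import csv
import io


def numeric_columns(rows):
    """Return zero-based positions of columns whose values all parse as floats."""
    if not rows:
        return []
    width = min(len(row) for row in rows)
    result = []
    for i in range(width):
        try:
            for row in rows:
                float(row[i])
        except ValueError:
            continue  # at least one value in this column is not numeric
        result.append(i)
    return result


def preview(csv_text, limit=5):
    """Inspect the first `limit` rows of CSV text for numeric columns."""
    rows = list(csv.reader(io.StringIO(csv_text)))[:limit]
    return numeric_columns(rows)


# Hypothetical rows, not taken from the iServer sample data.
sample = "a,2013-01-01,-73.97,40.75\nb,2013-01-02,-73.99,40.72\n"
candidates = preview(sample)
print(candidates)
```

Numeric columns are only candidates. Which of them are the X and Y coordinates still has to be decided from knowledge of the data.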

 

If you use the distributed analysis service with unregistered CSV data, you need to ensure that a corresponding .meta file, which contains the meta information for the CSV data file, exists in the CSV storage path. For example, the content of the .meta file for the sample data newyork_taxi_2013-01_14k.csv under the [iServer installation directory]/samples/data_en/ProcessingData directory is:

{
    "FieldInfos": [
        {
            "name": "col0",
            "type": "WTEXT"
        },
        {
            "name": "col1",
            "type": "WTEXT"
        },
        {
            "name": "col2",
            "type": "WTEXT"
        },
        {
            "name": "col3",
            "type": "INT32"
        },
        {
            "name": "col4",
            "type": "WTEXT"
        },
        {
            "name": "col5",
            "type": "WTEXT"
        },
        {
            "name": "col6",
            "type": "WTEXT"
        },
        {
            "name": "col7",
            "type": "INT32"
        },
        {
            "name": "col8",
            "type": "INT32"
        },
        {
            "name": "col9",
            "type": "DOUBLE"
        },
        {
            "name": "X",
            "type": "DOUBLE"
        },
        {
            "name": "Y",
            "type": "DOUBLE"
        },
        {
            "name": "col12",
            "type": "DOUBLE"
        },
        {
            "name": "col13",
            "type": "DOUBLE"
        }
    ],
    "GeometryType": "POINT",
    "HasHeader": false,
    "StorageType": "XYColumn"
}
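
For a CSV file with many columns, writing this content by hand is error-prone. The following is a minimal Python sketch that builds the same structure from a list of field types. It is an illustration rather than an iServer tool: the function `build_meta` is hypothetical, the `colN`/`X`/`Y` naming simply mirrors the sample above and is assumed rather than documented as required, and only the type names that appear in the sample (WTEXT, INT32, DOUBLE) are used. The sketch stops at producing the JSON text, because this page does not state how the .meta file must be named.

```python
import json


def build_meta(field_types, x_index, y_index, has_header=False):
    """Build a dict shaped like the sample .meta content (hypothetical helper)."""
    fields = []
    for i, field_type in enumerate(field_types):
        if i == x_index:
            name = "X"
        elif i == y_index:
            name = "Y"
        else:
            name = f"col{i}"  # naming copied from the sample; assumed convention
        fields.append({"name": name, "type": field_type})
    return {
        "FieldInfos": fields,
        "GeometryType": "POINT",
        "HasHeader": has_header,
        "StorageType": "XYColumn",
    }


# Field types of the 14 columns in the sample, in order.
types = ["WTEXT"] * 3 + ["INT32"] + ["WTEXT"] * 3 + ["INT32"] * 2 + ["DOUBLE"] * 5
meta = build_meta(types, x_index=10, y_index=11)
meta_text = json.dumps(meta, indent=4)
print(meta_text)
```

With these inputs the output has the same fields and values as the sample for newyork_taxi_2013-01_14k.csv.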

Spatial database

The iServer administrator can register HBase, Oracle, PostgreSQL, PostGIS, and MongoDB databases as spatial databases of iServer through the "Register data storage" function on the Cluster>Data registration page. For the registration method, see Register spatial database. Datasets in a successfully registered spatial database appear in the datasets resource of the Data Category Service and can be used as input data for the distributed analysis service.