
Ingest and Visualize Data using the Data Management Workbench (Beta 10.4)

Created by Jon Zucker (community manager) on Feb 16, 2018 at 05:15 PM · Edited by Omer Elhiraika on Mar 13, 2018 at 03:40 PM

Problem

You are new to Predix Studio and attempting to build your first Studio app. You might have tested some queries using the Indexer, and maybe even set up some charts using sample data. But how do you start working with your own data? How do you link your data sources to Predix Studio?

The Data Management Workbench (DMW) is Predix Studio's fully integrated, intelligent data schema discovery and modeling feature. By using an AI engine to quickly learn about your data, the DMW makes it easy to create models and set up the data ingestion pipeline.

This guide uses traffic data from the Chicago Data Portal to illustrate how to use the DMW. You will use the DMW to perform the following tasks.

  1. Create a project.

  2. Use the AI engine to learn about your data.

  3. Create and publish a visualization model.

  4. Create adapters to ingest data.

Dependencies:

Access to Predix Studio

Create a new project

The Project page is used to create and manage Data Management projects. This section shows how to create a project, update the project's options and description, review the log records created for the current project session, and optionally delete the project entirely.

In this phase, you will create a new project to manage the Chicago data.

  1. From the Data Integration menu, choose Data Management Workbench.

    alt text

  2. Click New Project, then enter com.chicago in the Project Name field.

    alt text

  3. Click Insert, and verify that you see the following page.

alt text

Project page breakdown:

  1. The menu on the left shows the current phase you are on and the next available phase (Project in this example).

  2. Options: This XML code specifies how the AI engine learns from your data sources. This will be relevant in the Explore phase of the project. Check out Project Options for more details on configuration.

  3. Description: A brief description of your project.

  4. Project Log: Records of actions completed.

For more details about this phase including configuring options, check out the Project Section documentation on Predix.io.

Configure sources

To learn more about a data set and generate a model, the source data files must be added to the project. This section shows you how to list the available sources and then add them to your project.

Here you will add the Chicago data files to the DMW, and then move them into your project.

Step 1: In the left navigation menu, choose Source.

alt text

Source page breakdown:

  • The phase timeline at the top left shows the status of your project; the orange label indicates that the current phase (Source) is in progress.

  • Data Sources: Sources that are added to your project.

  • Available Data Sources: Sources in the workbench that can be added to your project.

Step 2: Download the sample CSVs and drag/drop them into the Available Data Sources grid.

You should see them upload and appear in the grid.

alt text
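If you want to sanity-check the sample files locally (before or after uploading them), a quick pandas script works well. This is optional and not part of the DMW itself; the file names below are assumptions, so substitute the names of the CSVs you actually downloaded.

```python
# Optional local sanity check of the sample CSVs (not part of the DMW workflow).
# The file names are assumptions -- replace them with your downloaded files.
import pandas as pd

files = [
    "ChicagoCongestion.csv",
    "ChicagoTraffic.csv",
    "ChicagoParkingPermits.csv",
]

for path in files:
    df = pd.read_csv(path)
    print(f"{path}: {len(df)} rows, {len(df.columns)} columns")
    print(df.dtypes)   # the column types the DMW will later learn as features
    print(df.head(3))  # a quick look at the first few records
```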

Step 3: Click the green arrows next to each Chicago data CSV to move them into your project.

alt text

You might notice the Source label in the status timeline changed colors from orange to green. That means you have successfully added your sources to the project.

To learn more about configuring sources such as HTTP sources, check out the guide on Adding Data Sources to a Project.

Learn from the data

This section shows you how to trigger the engine to "learn" about your data - from the data itself. It learns about connections and entities, verifies the keys, then trains its classifier. The engine then generates what it has learned in the form of a canonical model.

In this phase, the DMW explores the structure of the Chicago data you added. The AI engine will create a model for you to start working with.

  • In the left navigation menu, choose Explore.

alt text

Explore page breakdown:

  • Entities: These are the entities that your data sets represent.

  • Learned Unique Keys: Learned unique keys from your data sets.

  • Data Source Fields: The fields of your data sources with column index.

  • Learned Connections: The engine's deductions about the most likely connections between entities.

  • Learned Features: Types of data each source field represents.
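The DMW's feature-learning internals are not documented in this guide, but the idea behind Learned Features can be illustrated with a naive sketch: look at each column and guess what kind of data it holds. This is only an illustration (the file name is an assumption), not the engine's actual algorithm.

```python
# Naive illustration of "learned features": guess the type of data each
# source field represents. This is NOT the DMW's algorithm, just the idea.
import pandas as pd

def guess_feature(series: pd.Series) -> str:
    values = series.dropna()
    if pd.api.types.is_integer_dtype(values):
        return "integer"
    if pd.api.types.is_float_dtype(values):
        return "decimal"
    try:
        pd.to_datetime(values, errors="raise")
        return "datetime"
    except (ValueError, TypeError):
        return "text"

df = pd.read_csv("ChicagoCongestion.csv")  # assumed file name
for column in df.columns:
    print(column, "->", guess_feature(df[column]))
```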

    Learn Entities

  • Click Learn Entities.

    alt text

The Learn Entities label turns yellow, indicating the AI engine is processing. This may take a few moments.

alt text

After the AI engine finishes processing, the label turns green. The engine learned some unique keys from the traffic data, and the Learned Unique Keys and Data Source Fields data grids should be populated.

  • Click Verify Keys and wait for the engine to test the new keys.

The engine tests how likely it is that the learned keys are unique within each entity. The % Confidence column in the Learned Unique Keys grid should now be populated.

alt text

Note: For the ChicagoCongestion entity, the regionId field uniquely identifies the entity with 100% confidence, but other fields (and combinations of fields) may also identify it.
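The exact confidence metric the engine uses is not documented here, but a simple way to think about key uniqueness is the ratio of distinct key values to total rows. A minimal sketch of that idea (file and column names are assumptions):

```python
# Rough stand-in for the "% Confidence" idea: how close a field (or field
# combination) comes to uniquely identifying each row. Illustrative only --
# this is not necessarily the metric the DMW uses.
import pandas as pd

def uniqueness_confidence(df: pd.DataFrame, fields: list[str]) -> float:
    distinct_rows = df.drop_duplicates(subset=fields).shape[0]
    return 100.0 * distinct_rows / len(df)

congestion = pd.read_csv("ChicagoCongestion.csv")       # assumed file name
print(uniqueness_confidence(congestion, ["regionId"]))  # expect close to 100
```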

  • Click Learn Connections to discover how the entities connect with each other.

The engine deduces the most likely connections (relationships) between entities. The Learned Connections and Learned Features data grids should be populated.

alt text

Take note of the field relationship in the first row (id > regionId). We will discuss whether this connection is valid in the Visualize phase.

  • Click Train Classifier and wait for the engine to store recent learnings.

The classifier is used in the Ingest phase to map data source fields with the fields of the model you are creating.
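The classifier itself is internal to the DMW, but the field-mapping idea can be sketched with something much simpler, such as matching on name similarity. The field names below are assumptions; the real classifier relies on what it learned from the data, not just on names.

```python
# Toy illustration of field mapping: pair each source field with the most
# similarly named model field. Not the DMW's classifier -- just the concept.
from difflib import SequenceMatcher

source_fields = ["REGIONID", "CURRENT_SPEED", "STREET_NAME"]  # assumed names
model_fields = [
    "Congestion_regionId",
    "Traffic_currentSpeed",
    "StreetNames_streetName",
]

def best_match(source: str, candidates: list[str]) -> str:
    return max(
        candidates,
        key=lambda c: SequenceMatcher(None, source.lower(), c.lower()).ratio(),
    )

for field in source_fields:
    print(field, "->", best_match(field, model_fields))
```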

Other features of the learning process

  • If you want to stop a step in progress, such as Learn Entities, you can click Stop Learning.

alt text

  • To delete your learning data and restart the learning process, you can click Clear Learning. (This displays a confirmation prompt.)

To learn more, check out the documentation on the Explore Section.

Create a visualization model

This section shows you how to view, explore, validate, and manipulate the generated canonical model and then save it for future use.

In this phase, you see a visual model of what the AI engine learned in the Explore phase, and you will verify the relationships between the entities.

  • In the left navigation menu, choose Visualize.

alt text

Visualize page breakdown:

  • The main view displays the visual of your model. Each point represents an entity, and the lines between them represent connections, if any exist.

  • Save: Saves your changes to the model.

  • Undo: Undoes changes made to the model.

You can click and drag the entities in the model to rearrange the view.

Step 1: Click the discovered connection between ChicagoCongestion and ChicagoTraffic.

Step 2: Then click the single connection to expand the discovered connection details.

alt text

It looks like the engine found the regionId values of ChicagoCongestion contained in the id values of ChicagoTraffic.

Is this a useful connection, or an unfortunate coincidence? The local table row id contains a running sequence of values [1, 2, 3, ...], and another entity happens to contain the same values in a data field. In this case, the connection is undesired.
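To see why this overlap appears, note that a sequential row id trivially contains any small set of small integers, so value containment alone does not imply a real relationship. A minimal sketch of the coincidence (the counts are made up for illustration):

```python
# Why id -> regionId is a false positive: an auto-incrementing row id
# [1, 2, 3, ...] contains a small set of region ids [1, 2, 3, ...] by
# construction, even though the two fields are unrelated.
traffic_ids = set(range(1, 1001))            # e.g. sequential row ids
congestion_region_ids = set(range(1, 30))    # e.g. a few dozen region ids

print(congestion_region_ids <= traffic_ids)  # True -- contained, but meaningless
```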

Step 3: Click the detailed connection data between ChicagoParkingPermits and ChicagoCongestion.

alt text

It looks like the engine found another undesired (false-positive) connection, between ChicagoCongestion regionId and ChicagoParkingPermits id.

Step 4: Remove each of the discovered connections. (They are all undesired because the entities are independent of each other and there is no hierarchy.)

a. Right-click each connection.

b. Then click the grey Save button on the bottom right of the screen.

alt text

c. Click Confirm Save and you will see some notifications on the top right.

alt text

To get a better handle on viewing and modifying your models visually, check out the guide on Graph Tools.

Publish your model

After you have examined and trimmed the models in the Explore and Visualize phases, you can define them as local data types in the Indexer and publish them. You can still modify the models before publishing.

Each entity and its fields are housed inside a package. In this phase, you rename the packages appropriately and then publish them in a new model.

Step 1: In the left navigation menu, choose Publish.

alt text

Publish page breakdown:

  • Packages: List of packages

  • Objects: List of fields within a selected package

Next, change the name of each package to reflect the entity it contains.

Step 2: Click Package1 from the Packages list, then click the edit (pencil) icon to edit the name.

alt text

Step 3: Rename Package1 to StreetNames.

Step 4: Then click the save icon to save changes.

Step 5: Repeat the previous steps to rename the remaining packages:

  • Package2 = Congestion

  • Package3 = VehiclesTowed

  • Package4 = ParkingPermits

  • Package5 = Traffic

    alt text

Step 6: Click the grey Publish All button on the bottom-right side of the page to publish all the packages into the Chicago model.

alt text

Step 7: In the Model Selection dialog, click the blue Create New Model button.

alt text

Step 8: Enter chicago into the Model Name field, and then click Create Model and Publish Packages.

You have now created and saved a new model (com.ged-chicago) based on what was learned in the process.

To learn more about modifying objects and models, check out the documentation on the Publish Section.

Ingest the data

This section walks through how to map the source data to newly created or existing models, and then create and execute adapters to ingest the source data.

Step 1: In the left navigation menu, choose Ingest.

alt text

You can use this page to map the source data fields to the model you created.

Step 2: Click the add (+) button next to the Analyze All button.

Step 3: In the Model Selection dialog, select your model.

alt text

Step 4: Click Set Selected Models to close the dialog.

Step 5: Then click Analyze All.

The Data Mappings grid should now contain the mappings between the source data fields (normalized and labeled under Source Field) and the model fields (under Target Field, with the format objectname_fieldname).
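As a rough illustration of the objectname_fieldname convention, the target fields are built from a model object name and one of its field names. The names below are assumptions based on the sample data, not values copied from the grid.

```python
# Illustrative construction of the objectname_fieldname target format.
# Object and field names here are assumptions, not values from the grid.
def target_field(object_name: str, field_name: str) -> str:
    return f"{object_name}_{field_name}"

print(target_field("Congestion", "regionId"))    # Congestion_regionId
print(target_field("Traffic", "currentSpeed"))   # Traffic_currentSpeed
```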

For more information on this process, check out the guide on Ingest Data.

alt text

Now you can run the adapter to ingest the source data using the Chicago model.

Step 6: Click the grey Run Adapter button on the bottom left corner of the page. An orange warning appears in the top right, indicating that you need to create storage (the indices) before the data can be ingested.

Step 7: Click Manage Models in the top right corner.

alt text

Step 8: Click the Chicago model, and then click Create Storage.

alt text

Step 9: Click the Run Adapter button again to ingest the data. You should see all green badges in the progress bar at the top.

alt text

Now that you've ingested the data, you can test a query using the Indexer. The following example shows a query for all vehicles towed in Chicago and the result.

alt text

alt text

Other Resources

  • Data Management Workbench

  • Project Options

  • Adding Data Sources to a Project

  • Explore Section

  • Graph Tool

  • Publish Section

  • Indexer Search

Tags: predix-studio