
Curate

The Curate page is where most of the pre-processing work is performed. Here, all the source files will be parsed into discrete samples of consistent size. Then, all samples will be added to a sample list. This section contains two sub-tabs: Source Files and Data Sample Lists.

  • Source Files: This is where you manage the raw data collected during your project.
  • Data Sample Lists: Here, you create smaller, curated datasets from the available source files.

When collecting data for extended periods, the resulting files are often too large for microcontroller units with limited memory and processing power. These devices are designed to handle smaller, focused datasets, enabling them to process only critical instances where meaningful events occur.

The Curate page helps to achieve this by breaking down large data files into smaller, manageable pieces. This process ensures that the model can be trained effectively to mimic real-world scenarios, optimizing its performance for deployment in constrained environments. Breaking down the data files is important because these blocks are the input to your ML model. For example, if a data file is 10 seconds long, you can break it down into 10 segments of 1 second each. You can choose the segment length based on prior knowledge of the use case, or start with a 1-second window and experiment from there.
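The 10-second example above can be sketched in Python. This is only an illustration of the idea, not how the Curate page is implemented: the function name `split_into_segments`, the 100 Hz sample rate, and the synthetic recording are all assumptions made for the example.

```python
def split_into_segments(samples, sample_rate_hz, segment_seconds=1.0):
    """Split a recording into consecutive, equal-length segments."""
    segment_len = int(round(sample_rate_hz * segment_seconds))
    if segment_len <= 0:
        raise ValueError("segment length must be at least one row")
    # Trailing rows that do not fill a whole segment are dropped.
    n_segments = len(samples) // segment_len
    return [samples[i * segment_len:(i + 1) * segment_len]
            for i in range(n_segments)]


# Hypothetical 10-second recording sampled at 100 Hz (1,000 rows).
recording = list(range(1000))
segments = split_into_segments(recording, sample_rate_hz=100, segment_seconds=1.0)
print(len(segments), len(segments[0]))
```

Changing `segment_seconds` is the quickest way to try other window lengths on the same recording.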

Curate page

Source Files

Here, you can see all the uploaded data files along with their File Name, Size, Type, Uploaded date, Data Shape and Sample Rate.

Viewing Project Members

Select the Show Only Project Members checkbox to filter the source files displayed, limiting the view to files associated with project members only.

Actions

In the Action dropdown menu, you’ll find the following options:

  • Select All: Select all files listed.
  • Deselect All: Clear all selected files.
  • New List From Selected: Create a new list using the selected files.
  • Segment List From Selected: Create a segmented list from the selected files.
  • Edit Metadata Type: Modify the metadata type for the selected files.
  • Format Selected: Define or update the file format for the selected files.
  • Remove Selected: Remove the selected files from the project.
  • Import Metadata: Upload a metadata file to add or update metadata for the source files.
  • Close: Exit the action menu without making changes.

Creating a Segmented List

After labeling your files, you can create a segmented list to divide your data into smaller, manageable samples for analysis and training. Follow these steps to configure the segmentation method and options effectively.

Why segmentation?

Segmentation plays a crucial role in generating models that are optimized for deployment in resource-constrained environments, such as those using microcontrollers (MCUs). These models are designed to process live data quickly and efficiently, often within small time windows, such as 1 second, 500 milliseconds, or even shorter durations.

In real-world applications, models need to make predictions based on short, continuous streams of data rather than long, uninterrupted recordings. To replicate this scenario during the training phase, raw data is divided into smaller segments. These segments are then used to train the model, enabling it to learn and adapt to the type of data it will encounter in live, production environments.

This approach ensures the model performs effectively in real-time scenarios while operating within the constraints of limited processing power and memory.

Steps to Create a Segmented List

  1. Go to Actions > Segment List from Selected.
  2. The Segment Files window opens, displaying the selected files for segmentation.
  3. From the Segmentation Method dropdown menu, choose one of the following methods:
    • Sliding CSV Window
    • Energy Triggered

Sliding CSV Window Configuration

The Sliding CSV Window method divides CSV data (numerical, text-based, or time-series data) into smaller, manageable samples. A window slides through the file step by step, capturing one sample at each step until the entire file is processed. The configuration options available for this method are summarized below:

  • Sample Rate: Displays the fixed sampling rate for parsing the CSV file into a sample list. This value is fixed during the file formatting.
  • Target: Select the target column from the file metadata (type: Class).
  • Window Length: Determines the size of the decision window used for segmentation. Specify the length of each sample in rows or milliseconds (ms). This value controls how much data the AI analyzes to classify each segment.
    Tip: Experiment with different window lengths to identify the best configuration for your dataset.
  • Offset: Specifies the gap between the starting points of consecutive samples within the source file. Enter the value in rows or milliseconds (ms) to define how far the parser moves before creating a new sample window.
    • 50% Overlap
      • Creates samples with a 50% overlap between consecutive windows.
      • Ensures half of the data is shared across adjacent samples.
      • Balances the need for starting-point variation with reduced redundancy.
    • Non-Overlapping
      • Creates distinct samples with no shared data between consecutive windows.
      • Suitable for datasets with a large volume of data or longer offsets.
      • Often used during initial exploration and training.
    • All Shifts
      • Shifts the window one row at a time to create the maximum number of samples.
      • Ideal for testing a finalized classifier to simulate performance on arbitrarily sampled data streams.
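The relationship between Window Length and Offset can be sketched in a few lines of Python. This is a minimal illustration rather than the tool's implementation: the function names and mode strings are hypothetical, windows are measured in rows, and only full-length windows are counted.

```python
def offset_for_mode(window_len, mode):
    """Translate an offset preset into a row offset (mode names are illustrative)."""
    if mode == "non_overlapping":
        return window_len
    if mode == "50_overlap":
        return max(1, window_len // 2)
    if mode == "all_shifts":
        return 1
    raise ValueError(f"unknown mode: {mode}")


def window_starts(n_rows, window_len, offset):
    """Return the starting row of every full window in a file of n_rows rows."""
    if window_len <= 0 or offset <= 0:
        raise ValueError("window_len and offset must be positive")
    return list(range(0, n_rows - window_len + 1, offset))


n_rows, window_len = 1000, 100  # hypothetical file: 1,000 rows, 100-row window
counts = {}
for mode in ("non_overlapping", "50_overlap", "all_shifts"):
    starts = window_starts(n_rows, window_len, offset_for_mode(window_len, mode))
    counts[mode] = len(starts)
    print(mode, len(starts))
```

For this hypothetical file, the three presets produce 10, 19, and 901 samples respectively, which shows how quickly smaller offsets increase the size of the resulting list.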

Advanced Options

Customize the segmentation further by clicking Advanced Options.

  • Restart Streamed Window: Restarts the window at the beginning of each class or metadata block.
    • Respect Transitions: Ensures transitions within class or metadata blocks are handled.
    • Class: Handles transitions specifically within class blocks.
  • Keep Short Window Samples: Determines how to handle short samples at the end of a file or class block.
    • Retain Short Samples: Includes short samples in the output.
    • 1 per Block: Retains one short sample per block.
  • Output Type: Select how to save the segmented samples:
    • Output to New List: Creates a new list for segmented samples.
    • Append to Existing List: Adds parsed samples to an existing list.
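To make the effect of restarting the window at class transitions and keeping short samples more concrete, here is a rough Python sketch. It reflects one plausible reading of these options, not the tool's actual behavior: the function name, the `keep_short` flag, and the rule that a short sample is whatever remains from the next window start to the end of the block are all assumptions.

```python
from itertools import groupby


def segment_class_blocks(labels, window_len, offset, keep_short=False):
    """Return (start, end, label) windows that never cross a class transition.

    labels holds one class label per row. The window restarts at the first
    row of every class block. With keep_short=True, at most one shorter
    window is kept at the end of each block.
    """
    if window_len <= 0 or offset <= 0:
        raise ValueError("window_len and offset must be positive")
    windows = []
    block_start = 0
    for label, rows in groupby(labels):
        block_end = block_start + sum(1 for _ in rows)
        start = block_start
        while start + window_len <= block_end:
            windows.append((start, start + window_len, label))
            start += offset
        if keep_short and start < block_end:
            windows.append((start, block_end, label))
        block_start = block_end
    return windows


# Hypothetical file: 250 rows labeled "idle" followed by 120 rows labeled "run".
labels = ["idle"] * 250 + ["run"] * 120
full_only = segment_class_blocks(labels, window_len=100, offset=100)
with_short = segment_class_blocks(labels, window_len=100, offset=100, keep_short=True)
print(len(full_only), len(with_short))
```

In this sketch no window mixes "idle" and "run" rows, which is the point of respecting transitions: every sample carries exactly one class label.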

Sliding CSV Window

NOTE

The Explorer Tier has a limitation that restricts the creation of segmented lists to fewer than 7,000 samples.
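Before submitting, it can help to estimate how many samples a configuration will produce. The sketch below does this for a single file and finds the smallest row offset that stays under the limit. The function names are hypothetical, and the sketch assumes the limit applies to the total number of full-length windows in one list built from one file.

```python
EXPLORER_SAMPLE_LIMIT = 7000  # lists must contain fewer samples than this


def count_windows(n_rows, window_len, offset):
    """Number of full windows produced by sliding over n_rows rows."""
    if n_rows < window_len:
        return 0
    return (n_rows - window_len) // offset + 1


def smallest_offset_under_limit(n_rows, window_len, limit=EXPLORER_SAMPLE_LIMIT):
    """Smallest row offset that yields fewer than `limit` windows."""
    if n_rows <= window_len:
        return 1
    span = n_rows - window_len
    max_windows = limit - 1
    # count < limit  <=>  span // offset < max_windows  <=>  offset > span / max_windows
    return span // max_windows + 1


# Hypothetical file: 1,000,000 rows segmented with a 100-row window.
n_rows, window_len = 1_000_000, 100
offset = smallest_offset_under_limit(n_rows, window_len)
print(offset, count_windows(n_rows, window_len, offset))
```

If the suggested offset is larger than you want, a shorter source file or a random subset of the output are alternative ways to stay within the limit.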

Output Sample List

After configuring the segmentation, provide a name for the segmented sample list on the Output Sample List page. This field is mandatory.

  • Submit: Confirms the configuration and completes the process.

After clicking Submit, wait about 30 seconds and refresh the page. Then click the Data Sample Lists tab, where the processed list should appear.

NOTE

Segmenting this dataset takes less than 1 minute. However, the duration of segmentation depends on the size of the dataset. For example, a 1 GB file may take 5 to 10 minutes to segment.

Energy triggered

When you select the Energy Triggered option, you can configure the following settings as needed. This segmentation method is particularly suitable for regression datasets such as audio-based signals.

Energy triggered 1

  • Show/Hide Preview: Displays an overview of the data file, including the file name, selected class, trigger points, capture window, and a graphical representation of the data. This feature allows you to refresh the preview, set start and end points for the data, pan and zoom within the graphical representation, and view the full file for a comprehensive analysis.
  • Sample Rate: Sets the frequency at which data is sampled. For energy-triggered events, this is automatically set at 100 Hz to ensure accurate data capture. No manual input is required. This is fixed during file formatting.
  • Trigger Channel: Determines the source for trigger detection. Options include:
    • Single: Select a specific channel from the dropdown menu to monitor for energy-triggered events.
    • Sum: Allows you to combine channels mathematically (e.g., sum or difference) to define a trigger condition across multiple channels.
    • Magnitude: Enables monitoring of multiple channels simultaneously by calculating the combined magnitude.
  • Pre-Processing: Configures data normalization. Select the Normalize checkbox to enable this option. This scales data to ensure uniformity and improve comparison across samples.
  • Zeroing: Adjusts the zeroing method for data processing. Options include:
    • None: Keeps the original data without applying any zeroing adjustments.
    • DeMin: Adjusts the baseline of the data by subtracting the minimum value.
    • DeMean: Centers the data by subtracting the mean value, ensuring zero-centered data for analysis.
  • Zero Window: Specifies the duration over which zeroing adjustments are applied. Helps manage baseline drift over the chosen window. Enter the required value or adjust using the up and down arrows.
  • Filter: Sets the filter type for data processing. Options include:
    • None: No filter is applied to the data.
    • Low: Applies a low-pass filter to remove high-frequency noise, retaining lower frequencies for analysis.
    • Band: Applies a band-pass filter to isolate frequencies within a specified range, removing frequencies outside this band.
    • High: Applies a high-pass filter to remove low-frequency noise, retaining higher frequencies for analysis.
  • Trigger Mode: Determines the mode for triggering. Options include:
    • Amplitude: Detects triggers based on the amplitude (signal strength) of the data.
    • + Crossing: Triggers when the signal crosses a positive threshold.
    • - Crossing: Triggers when the signal crosses a negative threshold.
    • RMS: Uses the Root Mean Square (RMS) value for trigger detection, focusing on overall energy in the signal.
    • RMS Step: Triggers based on step changes in RMS values.
    • RMS Step Ratio: Detects triggers based on the ratio of consecutive RMS step changes.
    • Peak to RMS Ratio: Triggers based on the ratio of the peak signal value to its RMS value, useful for identifying transient signals.
    • Diff: Detects triggers based on differences between consecutive data points.
    • Sign: Monitors the sign (positive or negative) of the signal for trigger detection.
  • Threshold: Specifies the minimum signal level required to trigger an event. Enter the required value or use the up and down arrows.
  • Span: Defines the duration or range for trigger detection. This field is inactive if Amplitude, Diff, or Sign is selected in Trigger Mode.
  • Window Length: Sets the number of samples in a row used for analysis. Helps control the resolution of captured data. Enter the value directly or adjust using the up and down arrows.
  • Datapoints: Specifies the number of data points to analyze within the selected window. Enter the required value or adjust using the up and down arrows.
  • ms: Defines the window length in milliseconds for temporal analysis. Enter the required value or adjust using the up and down arrows.
  • Capture Options: Configures the pre-trigger or minimum separation values for capturing data:
    • Pre-Trigger: Determines the amount of data captured before the triggering event occurs, aiding in understanding pre-event conditions.
    • Min Separation: Ensures a minimum interval between successive trigger events to avoid capturing redundant data.
  • Limit Captures Per File: Limits the number of captures stored in a single file to manage file sizes and improve data organization. Select the checkbox to activate this option.
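To show how several of these settings interact, the following Python sketch combines DeMean zeroing, an RMS trigger, a pre-trigger, and a minimum separation on a single channel. It is a simplified illustration under stated assumptions, not the tool's algorithm: the function names are hypothetical, all values are in rows, the RMS is computed over the trailing `span` rows, zeroing is applied to the whole signal rather than a zero window, and captures that would run past the end of the file are dropped.

```python
import math


def demean(signal):
    """Center the signal by subtracting its mean (the DeMean idea)."""
    mean = sum(signal) / len(signal)
    return [x - mean for x in signal]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def rms_triggered_windows(signal, threshold, span, window_len,
                          pre_trigger=0, min_separation=0):
    """Return (start, end) row ranges captured around RMS threshold crossings."""
    centered = demean(signal)
    captures = []
    next_allowed = 0  # earliest row at which a new trigger may fire
    for i in range(span - 1, len(centered)):
        if i < next_allowed:
            continue
        # RMS over the trailing `span` rows ending at row i.
        if rms(centered[i - span + 1:i + 1]) < threshold:
            continue
        start = max(0, i - pre_trigger)
        end = start + window_len
        if end > len(centered):
            break  # not enough rows left for a full capture
        captures.append((start, end))
        next_allowed = i + max(1, min_separation)
    return captures


# Hypothetical signal: silence with two 20-row bursts starting at rows 50 and 150.
burst = [1.0, -1.0] * 10
signal = [0.0] * 50 + burst + [0.0] * 80 + burst + [0.0] * 50
captures = rms_triggered_windows(signal, threshold=0.5, span=10, window_len=40,
                                 pre_trigger=5, min_separation=60)
print(captures)
```

Each capture begins a few rows before the trigger because of the pre-trigger, and the minimum separation keeps the same burst from being captured more than once.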

Click Continue to fill in more details as follows:

Energy triggered 2

  • Restart streamed window location at start of each class block or metadata block: When you select the Respect Transitions checkbox, the Class and Metadata checkboxes become editable. If you select Metadata, a dropdown appears, allowing you to specify the required metadata.
  • Keep short window samples at end of file or class block: Select the Retain Short Samples checkbox to keep short samples at the end of a file or class block. You can also enable the 1 per block checkbox to retain one short sample per block.
  • Output type: Choose between Output to new List or Append to existing List using radio buttons to determine whether to create a new list or add results to an existing list.
  • Output Sample List: This field is available if you select Output to new List. Enter the name of the output list where the processed samples will be saved.
  • Destination List: This field is available if you select Append to existing List. Select the required lists from the dropdown menu.

Click Submit to confirm.

Filtering Source Files

  1. Click the Filter icon to open the Filter Source Files page.
  2. Use the available filters to narrow your search:
    • Name: Search files by name.
    • Data Type: Filter based on the data type.
    • Date: Filter by file creation or modification date.
    • Data Shape: Narrow down files based on data shape.
    • Sample Rate: Filter by the sample rate.
    • Unformatted: Find files that are yet to be formatted.
    • Assigned Targets: Filter files with assigned targets.
    • Unassigned Targets: Locate files with no targets assigned.
  3. After filling in the required fields, click Apply to filter the source files.

Defining the Target Class

To define the target class for your data, you have two options:

  1. Using an Additional Column in the Source File: While uploading a source file, include an extra column that specifies the label for each data point.
  2. Using a Metadata File: Prepare a CSV file named metadata with the following two columns:
    • File Name: A list of all the file names you have uploaded.
    • Label Type: The corresponding label for each file.

For example, if you have 10 files for "apples" and 5 files for "oranges," assign the labels accordingly in the metadata file.
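A metadata file for the apples and oranges example could be generated with a short Python script such as the one below. The column headers follow the two columns described above; the file names (`apple_01.csv` and so on) and the helper name `build_metadata_rows` are hypothetical placeholders for your own uploads.

```python
import csv
import io


def build_metadata_rows(files_by_label):
    """Build header plus one (file name, label) row per uploaded file."""
    rows = [("File Name", "Label Type")]
    for label, file_names in files_by_label.items():
        for name in file_names:
            rows.append((name, label))
    return rows


# Hypothetical uploads: 10 apple files and 5 orange files.
files_by_label = {
    "apples": [f"apple_{i:02d}.csv" for i in range(1, 11)],
    "oranges": [f"orange_{i:02d}.csv" for i in range(1, 6)],
}
rows = build_metadata_rows(files_by_label)

# Written to an in-memory buffer here; to save to disk, use
# open("metadata.csv", "w", newline="") in place of io.StringIO().
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
metadata_csv = buffer.getvalue()
print(metadata_csv.splitlines()[:3])
```

The resulting file has one header row and 15 data rows, one per source file.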

Importing Metadata

  1. On the Curate page, in the Source Files tab, use the Action > Import Metadata option to upload the metadata file.
  2. A dialog box will appear, allowing you to drag and drop the prepared CSV file.
  3. Select Target Value from the second row dropdown. Keep the first row dropdown as File Names. This will label the files according to the assigned metadata.
  4. Once uploaded, descriptive metadata will be added to the source files.

Viewing the Target Class

After importing metadata, expand the arrows next to the Sample Rate row. The Amps column will display the target class selection for all files.

This method is particularly beneficial when dealing with large batches of files, as adding an additional column to each source file manually can be tedious.

Data Sample Lists

This section explains how to work with Output Sample Lists, which are generated after performing the Segment List from Selected action. These lists are displayed in a tabular format with the following details:

  • List Name: The name of the sample list.
  • List Type: Specifies the type of the list, such as classification or regression.
  • Data Shape: The shape or dimensions of the data in the list.
  • Sample Rate: The rate at which samples were collected.
  • N Samples: The number of samples in the list.
  • Target Range: The range of target values in the list.
  • Created: The date and time the list was created.
  • Modified: The date and time the list was last updated.
  • Comments: Allows you to add comments or notes regarding the data sample list.
  • Remove: Enables you to delete a specific sample list from the table.

Multi-view Option

In the toolbar, select the Multi-view checkbox to display lists in multiple views, which makes it easier to compare and analyze them.

Actions

Use the Actions dropdown menu to manage sample lists. The following actions are available:

  • Deselect All: Clears all selected items.
  • Random Subset to New: Creates a new list from a random subset of the selected items.
  • Edit Sensor Groups: Adjusts sensor groupings for the selected lists.
  • Convert to Regression List / Convert to Classification List: Converts the selected classification lists into regression lists, or vice versa.
  • Remap Classes: Reassigns class labels in the selected lists.
  • Export to CSV: Saves the selected lists as a CSV file.
  • Import From CSV: Uploads a CSV file to add or update data sample lists.
  • Close: Closes the action menu without making changes.
  • Remove Selected: Deletes the selected data sample lists.
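Conceptually, Random Subset to New draws samples at random without replacement. The Python sketch below shows that idea only; the function name, the `fraction` parameter, and the seed handling are assumptions for illustration and do not describe the tool's options.

```python
import random


def random_subset(samples, fraction, seed=None):
    """Return a random subset containing roughly `fraction` of the samples."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    if not samples:
        return []
    rng = random.Random(seed)  # a fixed seed makes the subset reproducible
    k = min(len(samples), max(1, round(len(samples) * fraction)))
    return rng.sample(samples, k)


# Hypothetical list of 100 sample identifiers; keep about 20% of them.
sample_ids = [f"sample_{i:03d}" for i in range(100)]
subset = random_subset(sample_ids, fraction=0.2, seed=42)
print(len(subset))
```

A smaller random subset is a convenient way to speed up early experiments before training on the full list.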

Filtering Lists

You can filter sample lists to find specific items.

  1. Click the Filter icon to open the Filter Lists page.
  2. Use the provided options to filter the lists based on:
    • Name: Search lists by name.
    • List Type: Filter lists by their type.
    • Date Created: Narrow down lists based on the date they were created.
    • Data Shape: Filter lists by data shape.
    • Sample Rate: Search for lists based on their sample rate.
  3. Enter the required information in the filter fields and click Apply to refine the displayed lists.

Distribution

The sample lists created from segmenting the source files can be used for AI Exploration, Training or Testing. Each row contains a specific labeled sample or observation of a set length, taken from the source file stream.

Click on the newly created segmented list to view its contents. This list displays the blocks or windows of segmented data. A histogram of the data should also appear for visualization.

NOTE

If the histogram does not appear immediately after segmentation, try refreshing the page.

You can analyze the distribution of a selected list in List View or Table View.

View Options

  • List View: Displays distribution details by Classes, Count, and % of List.
  • Table View: Provides detailed information for each sample, including Sample File, Data Shape, View, Target Class dropdown (to find or create a class), and Exclude and Remove options.
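The List View columns (Classes, Count, and % of List) correspond to a simple class tally. If you export a list to CSV and want to reproduce the same summary offline, a sketch like the following could be used. The function name and the example labels are hypothetical, and the percentages are computed over all samples passed in.

```python
from collections import Counter


def class_distribution(targets):
    """Return (class, count, percent of list) rows, most frequent class first."""
    counts = Counter(targets)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(label, count, 100.0 * count / total)
            for label, count in counts.most_common()]


# Hypothetical target column: 10 "apples" samples and 5 "oranges" samples.
targets = ["apples"] * 10 + ["oranges"] * 5
distribution = class_distribution(targets)
for label, count, percent in distribution:
    print(f"{label:10s} {count:5d} {percent:6.2f}%")
```

A strongly imbalanced distribution is usually worth addressing before training, for example by collecting more data for the rare class.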

Perform Actions in Table View

  1. Select the Action button in the toolbar.
  2. Choose from the following options:
    • Transfer: Transfers selected items to a different list.
    • Transfer to New List: Creates a new list from the selected items.
    • Select All: Selects all items.
    • Select All on Page: Selects all items displayed on the current page.
    • Select Random Subset: Selects a random subset of items.
    • Deselect All: Clears all selections.
    • Set Target for Selected: Assigns a target class to the selected items.
    • Exclude Selected: Excludes the selected items from the list.
    • Include Selected: Includes previously excluded items.
    • Export to CSV: Saves the selected items to a CSV file.
    • Import CSV: Imports items from a CSV file.
    • Close: Closes the action menu without making changes.
    • Remove Selected: Deletes the selected items.

Distribution tab