Curate

The Curate page is where most of the pre-processing work is performed. Here, all source files are parsed into discrete samples of consistent size, and the samples are then added to a sample list. The page contains two sub-tabs: Source Files and Data Sample Lists.

Source Files: This is where you manage the raw data collected during your project.
Data Sample Lists: Here, you create smaller, curated datasets from the available source files.

When data is collected over extended periods, the resulting files are often too large for microcontroller units (MCUs), which have limited memory and processing power. These devices are designed to handle small, focused inputs, processing only the short stretches of data in which meaningful events occur.

The Curate page helps to achieve this by breaking large data files into smaller, manageable pieces. These pieces become the inputs to your ML model, so segmenting the data lets the model be trained on inputs that mimic what it will see in real-world use, optimizing its performance for deployment in constrained environments. For example, a 10-second data file can be broken into ten segments, each 1 second long. You can choose the segment length based on prior knowledge of the use case, or start with a 1-second window and experiment from there.
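The arithmetic in the 10-second example can be sketched in Python. This is a minimal illustration, not the tool's implementation: `split_into_segments` is a hypothetical helper that assumes the recording is a list of rows sampled at a known rate, and it drops any leftover rows that do not fill a whole window.

```python
def split_into_segments(rows, sample_rate_hz, window_s):
    """Split a recording into consecutive, non-overlapping windows.

    rows: one entry per row of the source file.
    sample_rate_hz: rows per second.
    window_s: window length in seconds.
    Leftover rows at the end that do not fill a whole window are dropped.
    """
    rows_per_window = int(round(sample_rate_hz * window_s))
    if rows_per_window <= 0:
        raise ValueError("window must contain at least one row")
    n_windows = len(rows) // rows_per_window
    return [rows[i * rows_per_window:(i + 1) * rows_per_window]
            for i in range(n_windows)]


# A 10-second recording at 100 Hz has 1,000 rows.
recording = list(range(10 * 100))
segments = split_into_segments(recording, sample_rate_hz=100, window_s=1.0)
print(len(segments), len(segments[0]))  # 10 100
```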

Curate page

Source Files

Here, you can see all the uploaded data files along with their File Name, Size, Type, Uploaded date, Data Shape and Sample Rate.

Viewing Project Members

Select the Show Only Project Members checkbox to filter the source files displayed, limiting the view to files associated with project members only.

Actions

In the Actions dropdown menu, you’ll find the following options:

Select All: Select all files listed.
Deselect All: Clear all selected files.
New List From Selected: Create a new list using the selected files.
Segment List From Selected: Create a segmented list from the selected files.
Edit Metadata Type: Modify the metadata type for the selected files.
Format Selected: Define or update the file format for the selected files.
Remove Selected: Remove the selected files from the project.
Import Metadata: Upload a metadata file to add or update metadata for the source files.
Close: Exit the action menu without making changes.

Creating a Segmented List

After labeling your files, you can create a segmented list to divide your data into smaller, manageable samples for analysis and training. Follow these steps to configure the segmentation method and options effectively.

Why segmentation?

Segmentation plays a crucial role in generating models that are optimized for deployment in resource-constrained environments, such as those using microcontrollers (MCUs). These models are designed to process live data quickly and efficiently, often within small time windows, such as 1 second, 500 milliseconds, or even shorter durations.

In real-world applications, models need to make predictions based on short, continuous streams of data rather than long, uninterrupted recordings. To replicate this scenario during the training phase, raw data is divided into smaller segments. These segments are then used to train the model, enabling it to learn and adapt to the type of data it will encounter in live, production environments.

This approach ensures the model performs effectively in real-time scenarios while operating within the constraints of limited processing power and memory.

Steps to Create a Segmented List

Select the files you want to segment, then go to Actions > Segment List From Selected.
The Segment Files window opens, displaying the selected files for segmentation.
From the Segmentation Method dropdown menu, choose one of the following methods:
- Sliding CSV Window
- Energy Triggered

Sliding CSV Window Configuration

The Sliding CSV Window method divides CSV data (numerical, text-based, or time-series data) into smaller, manageable samples. A fixed-length window slides through the file step by step, capturing one sample at each step until the entire file has been processed. The table below summarizes the configuration options available for this method:

Option	Description
Sample Rate	Displays the sampling rate used to parse the CSV file into a sample list. This value is fixed when the file is formatted.
Target	Select the target column from the file metadata (type: Class).
Window Length	The Window Length determines the size of the decision window used for segmentation. Specify the length of each sample in rows or milliseconds (ms). This value controls how much data the AI analyzes to classify each segment. Tip: Experiment with different window lengths to identify the best configuration for your dataset.
Offset	The Offset specifies the gap between the starting points of consecutive samples within the source file. Enter the value in rows or milliseconds (ms) to define how far the parser moves before creating a new sample window.
50% Overlap	Creates samples with a 50% overlap between consecutive windows. Ensures half of the data is shared across adjacent samples. Balances the need for starting-point variation with reduced redundancy.
Non-Overlapping	Creates distinct samples with no shared data between consecutive windows. Suitable for datasets with a large volume of data or longer offsets. Often used during initial exploration and training.
All Shifts	Shifts the window one row at a time to create the maximum number of samples. Ideal for testing a finalized classifier to simulate performance on arbitrarily sampled data streams.
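The interaction between Window Length, Offset, and the three presets can be sketched as follows. This is a hedged illustration, not the parser itself: the function names are hypothetical, and the preset-to-offset mapping (half the window for 50% Overlap, the full window for Non-Overlapping, one row for All Shifts) is inferred from the descriptions in the table above.

```python
def ms_to_rows(ms, sample_rate_hz):
    """Convert a length in milliseconds to a number of rows."""
    return int(round(ms * sample_rate_hz / 1000))


def sliding_windows(rows, window, offset):
    """Yield (start, rows[start:start + window]) for every full window.

    window and offset are both in rows; a window that would run past
    the end of the file is not produced.
    """
    if window <= 0 or offset <= 0:
        raise ValueError("window and offset must be positive")
    start = 0
    while start + window <= len(rows):
        yield start, rows[start:start + window]
        start += offset


def preset_offset(window, preset):
    """Offset in rows implied by each preset (an assumed mapping)."""
    if preset == "50% overlap":
        return max(1, window // 2)   # each window shares half its rows with the next
    if preset == "non-overlapping":
        return window                # windows touch but never share rows
    if preset == "all shifts":
        return 1                     # move one row at a time
    raise ValueError(f"unknown preset: {preset}")


data = list(range(1000))             # e.g. 10 s of data at 100 Hz
window = ms_to_rows(1000, 100)       # 1 s window = 100 rows
for preset in ("non-overlapping", "50% overlap", "all shifts"):
    offset = preset_offset(window, preset)
    n = sum(1 for _ in sliding_windows(data, window, offset))
    print(preset, offset, n)
# non-overlapping 100 10
# 50% overlap 50 19
# all shifts 1 901
```

The smaller the offset, the more samples a file produces, which is why All Shifts is better suited to testing a finished classifier than to initial exploration.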

Advanced Options

Customize the segmentation further by clicking Advanced Options.

Option	Description
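Restart Streamed Window	Restarts the window at the beginning of each class or metadata block.
	Respect Transitions: Enables restarting at block transitions and makes the Class and Metadata options editable.
	Class: Restarts the window at each class block transition.
	Metadata: Restarts the window at each metadata block transition. Select the required metadata from the dropdown.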
Keep Short Window Samples	Determines how to handle short samples at the end of a file or class block.
	Retain Short Samples: Includes short samples in the output.
	1 per Block: Retains one short sample per block.
Output Type	Select how to save the segmented samples:
	Output to New List: Creates a new list for segmented samples.
	Append to Existing List: Adds parsed samples to an existing list.
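Restarting the window at each class block, and optionally keeping short samples, can be sketched as below. This is a hypothetical illustration under stated assumptions: each row carries one class label, a "block" is a run of consecutive rows with the same label, and "1 per Block" is interpreted as keeping only the first short sample in each block. The tool's exact behavior may differ.

```python
from itertools import groupby


def class_blocks(labels):
    """Return (label, start, end) for each run of identical consecutive labels."""
    blocks, pos = [], 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        blocks.append((label, pos, pos + length))
        pos += length
    return blocks


def segment_by_block(labels, window, offset, retain_short=False, one_per_block=False):
    """Slide a window through each class block separately.

    Returns (label, start, end) row ranges. The window restarts at the start
    of every block, so no sample spans two classes. With retain_short, windows
    cut off by the end of a block are kept; one_per_block keeps only the first.
    """
    samples = []
    for label, b_start, b_end in class_blocks(labels):
        start = b_start
        while start < b_end:
            end = start + window
            if end <= b_end:
                samples.append((label, start, end))      # full window
            elif retain_short:
                samples.append((label, start, b_end))    # short window at block end
                if one_per_block:
                    break
            else:
                break                                    # drop short windows
            start += offset
    return samples


labels = ["a"] * 250 + ["b"] * 120
print(segment_by_block(labels, window=100, offset=100))
# [('a', 0, 100), ('a', 100, 200), ('b', 250, 350)]
print(segment_by_block(labels, window=100, offset=100, retain_short=True))
# [('a', 0, 100), ('a', 100, 200), ('a', 200, 250), ('b', 250, 350), ('b', 350, 370)]
```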

Sliding CSV Window

NOTE

On the Explorer Tier, a segmented list must contain fewer than 7,000 samples.
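To check whether a configuration will stay under this limit before submitting it, you can estimate the number of samples with the standard sliding-window count, floor((rows - window) / offset) + 1 per file. The sketch below is a rough estimate only: it assumes full windows with no block restarts or retained short samples, and the file lengths are hypothetical.

```python
def count_windows(n_rows, window, offset):
    """Number of full sliding windows in a file with n_rows rows."""
    if n_rows < window:
        return 0
    return (n_rows - window) // offset + 1


EXPLORER_LIMIT = 7000  # segmented lists must contain fewer than this many samples

file_lengths = [60_000, 45_000, 30_000]  # hypothetical row counts per source file
window = 100
for offset in (100, 50, 1):
    total = sum(count_windows(n, window, offset) for n in file_lengths)
    print(offset, total, total < EXPLORER_LIMIT)
# 100 1350 True
# 50 2697 True
# 1 134703 False
```

As the last line shows, All Shifts (an offset of one row) can exceed the limit quickly on long recordings.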

Output Sample List

After configuring the segmentation, provide a name for the segmented sample list on the Output Sample List page. This field is mandatory.

Action	Description
Submit	Confirms the configuration and completes the process.

After clicking Submit, wait about 30 seconds and refresh the page. Then click the Data Sample Lists tab, where the finished list should appear.

NOTE

Segmenting a small dataset typically takes less than a minute, but the duration depends on the size of the dataset. For example, a 1 GB file may take 5-10 minutes to segment.

Energy triggered

When you select the Energy Triggered method, you can configure the following settings as needed. This segmentation method is particularly suitable for regression datasets such as audio-based signals.

Energy triggered 1

Field	Description
Show/Hide Preview	Displays an overview of the data file, including the file name, selected class, trigger points, capture window, and a graphical representation of the data. This feature allows you to refresh the preview, set start and end points for the data, pan and zoom within the graphical representation, and view the full file for a comprehensive analysis.
Sample Rate	Displays the frequency at which the data was sampled (for example, 100 Hz). This value is fixed during file formatting, so no manual input is required.
Trigger Channel	Determines the source for trigger detection. Options include:
	- Single: Select a specific channel from the dropdown menu to monitor for energy-triggered events
	- Sum: Allows you to combine channels mathematically (e.g., sum or difference) to define a trigger condition across multiple channels.
	- Magnitude: Enables monitoring of multiple channels simultaneously by calculating the combined magnitude.
Pre-Processing	Configures data normalization. Select the Normalize checkbox to enable this option. This scales data to ensure uniformity and improve comparison across samples.
Zeroing	Adjusts the zeroing method for data processing. Options include:
	- None: Keeps the original data without applying any zeroing adjustments.
	- DeMin: Adjusts the baseline of the data by subtracting the minimum value.
	- DeMean: Centers the data by subtracting the mean value, ensuring zero-centered data for analysis.
Zero Window	Specifies the duration over which zeroing adjustments are applied. Helps manage baseline drift over the chosen window. Enter the required value or adjust using the up and down arrows.
Filter	Sets the filter type for data processing. Options include:
	- None: No filter is applied to the data.
	- Low: Applies a low-pass filter to remove high-frequency noise, retaining lower frequencies for analysis.
	- Band: Applies a band-pass filter to isolate frequencies within a specified range, removing frequencies outside this band.
	- High: Applies a high-pass filter to remove low-frequency noise, retaining higher frequencies for analysis.
Trigger Mode	Determines the mode for triggering. Options include:
	- Amplitude: Detects triggers based on the amplitude (signal strength) of the data.
	- + Crossing: Triggers when the signal crosses a positive threshold.
	- - Crossing: Triggers when the signal crosses a negative threshold.
	- RMS: Uses the Root Mean Square (RMS) value for trigger detection, focusing on overall energy in the signal.
	- RMS Step: Triggers based on step changes in RMS values.
	- RMS Step Ratio: Detects triggers based on the ratio of consecutive RMS step changes.
	- Peak to RMS Ratio: Triggers based on the ratio of the peak signal value to its RMS value, useful for identifying transient signals.
	- Diff: Detects triggers based on differences between consecutive data points.
	- Sign: Monitors the sign (positive or negative) of the signal for trigger detection.
Threshold	Specifies the minimum signal level required to trigger an event. Enter the required value or use the up and down arrows.
Span	Defines the duration or range for trigger detection. This field is inactive if Amplitude, Diff, or Sign is selected in Trigger Mode.
Window Length	Sets the number of consecutive data points captured in each window, which controls the resolution of the captured data. Enter the value directly or adjust it using the up and down arrows.
Datapoints	Specifies the number of data points to analyze within the selected window. Enter the required value or adjust using the up and down arrows.
ms	Defines the window length in milliseconds for temporal analysis. Enter the required value or adjust using the up and down arrows.
Capture Options	Configures the pre-trigger or minimum separation values for capturing data:
	- Pre-Trigger: Determines the amount of data captured before the triggering event occurs, aiding in understanding pre-event conditions.
	- Min Separation: Ensures a minimum interval between successive trigger events to avoid capturing redundant data.
Limit Captures Per File	Limits the number of captures stored in a single file to manage file sizes and improve data organization. Select the checkbox to activate this option.
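The way several of these settings work together (DeMean zeroing, RMS trigger mode, Threshold, Span, Window Length in ms or data points, Pre-Trigger, and Min Separation) can be sketched for a single channel. This is a simplified, hypothetical model and not the tool's algorithm: it assumes DeMean is applied over the whole signal rather than a Zero Window, that the RMS is computed over the last Span points, and that a trigger fires when that RMS reaches the threshold.

```python
import math


def ms_to_points(ms, sample_rate_hz):
    """Convert a duration in milliseconds to a number of data points."""
    return int(round(ms * sample_rate_hz / 1000))


def demean(signal):
    """DeMean zeroing: subtract the mean so the signal is centered on zero."""
    mean = sum(signal) / len(signal)
    return [v - mean for v in signal]


def rms(values):
    """Root mean square of a sequence of values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def rms_trigger_captures(signal, threshold, span, window, pre_trigger, min_separation):
    """Return (start, end) index ranges for RMS-triggered captures.

    span:           points over which the RMS is computed at each position
    window:         capture length in points
    pre_trigger:    points kept before the trigger point
    min_separation: minimum points between successive triggers
    Captures that would run past the end of the signal are skipped.
    """
    x = demean(signal)
    captures, last_trigger = [], None
    for i in range(span, len(x) + 1):  # i is the index just after the RMS span
        if rms(x[i - span:i]) < threshold:
            continue
        if last_trigger is not None and i - last_trigger < min_separation:
            continue
        start = max(0, i - pre_trigger)
        if start + window <= len(x):
            captures.append((start, start + window))
        last_trigger = i
    return captures


rate = 100  # Hz
burst = [1.0 if j % 2 == 0 else -1.0 for j in range(50)]
signal = [0.0] * 200 + burst + [0.0] * 250  # one short event in 5 s of silence
captures = rms_trigger_captures(
    signal,
    threshold=0.5,
    span=ms_to_points(100, rate),          # 10 points
    window=ms_to_points(1000, rate),       # 100 points (1 s capture)
    pre_trigger=ms_to_points(200, rate),   # 20 points before the trigger
    min_separation=ms_to_points(2000, rate),
)
print(captures)  # [(183, 283)]
```

Min Separation is what prevents the sustained burst from producing a new capture on every point after the first trigger.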

Click Continue to fill in more details as follows:

Energy triggered 2

Field	Description
Restart streamed window location at start of each class block or metadata block	When you select the Respect Transitions checkbox, the Class and Metadata checkboxes become editable. If you select Metadata, a dropdown appears where you can specify the required metadata.
Keep short window samples at end of file or class block	Select the Retain Short Samples checkbox to keep short samples at the end of a file or class block. You can also enable the 1 per block checkbox to retain one short sample per block.
Output type	Choose between Output to new List or Append to existing List using radio buttons to determine whether to create a new list or add results to an existing list.
Output Sample List	This field is available if you select Output to new List. Enter the name of the output list where the processed samples will be saved.
Destination List	This field is available if you select Append to existing List. Select the required lists from the dropdown menu.

Click Submit to confirm.

Filtering Source Files

Click the Filter icon to open the Filter Source Files page.
Use the available filters to narrow your search:
- Name: Search files by name.
- Data Type: Filter based on the data type.
- Date: Filter by file creation or modification date.
- Data Shape: Narrow down files based on data shape.
- Sample Rate: Filter by the sample rate.
- Unformatted: Find files that are yet to be formatted.
- Assigned Targets: Filter files with assigned targets.
- Unassigned Targets: Locate files with no targets assigned.
After filling in the required fields, click Apply to filter the source files.

Defining the Target Class

To define the target class for your data, you have two options:

Using an Additional Column in the Source File: While uploading a source file, include an extra column that specifies the label for each data point.
Using a Metadata File: Prepare a CSV file named metadata with the following two columns:
- File Name: The name of each file you have uploaded.
- Label Type: The corresponding label for each file.

For example, if you have 10 files for "apples" and 5 files for "oranges," assign the labels accordingly in the metadata file.
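For large batches, the metadata file can be generated with a short script. The sketch below uses Python's standard `csv` module; the file names are hypothetical, and the header text is assumed to match the column names described above, so adjust both to your project.

```python
import csv

# Hypothetical file names: 10 recordings of apples and 5 of oranges.
labels = {f"apple_{i:02d}.csv": "apples" for i in range(1, 11)}
labels.update({f"orange_{i:02d}.csv": "oranges" for i in range(1, 6)})

with open("metadata.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["File Name", "Label Type"])  # header names assumed
    for file_name, label in sorted(labels.items()):
        writer.writerow([file_name, label])
```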

Importing Metadata

On the Curate page, in the Source Files tab, use the Action > Import Metadata option to upload the metadata file.
A dialog box will appear, allowing you to drag and drop the prepared CSV file.
Keep the first-row dropdown set to File Names and select Target Value from the second-row dropdown. This labels the files according to the assigned metadata.
Once uploaded, descriptive metadata will be added to the source files.
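Because labels are matched to source files by file name, it can help to check the metadata file against your uploaded file names before importing it. The sketch below is a hypothetical pre-import check, not part of the tool: it assumes the "File Name" and "Label Type" header names and reports uploaded files that would stay unlabeled as well as metadata rows that name files you have not uploaded.

```python
import csv
import io


def check_metadata(metadata_csv_text, uploaded_files):
    """Compare a metadata CSV against the uploaded source file names.

    Returns (labels, missing, unknown):
      labels  - dict of file name -> label from the CSV
      missing - uploaded files with no row in the CSV (they stay unlabeled)
      unknown - CSV rows naming files that were not uploaded
    """
    reader = csv.DictReader(io.StringIO(metadata_csv_text))
    labels = {row["File Name"]: row["Label Type"] for row in reader}
    missing = sorted(set(uploaded_files) - set(labels))
    unknown = sorted(set(labels) - set(uploaded_files))
    return labels, missing, unknown


csv_text = (
    "File Name,Label Type\n"
    "apple_01.csv,apples\n"
    "orange_01.csv,oranges\n"
    "pear_01.csv,pears\n"
)
uploaded = ["apple_01.csv", "orange_01.csv", "apple_02.csv"]
labels, missing, unknown = check_metadata(csv_text, uploaded)
print(missing, unknown)  # ['apple_02.csv'] ['pear_01.csv']
```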

Viewing the Target Class

After importing metadata, expand the arrows next to the Sample Rate row. The Amps column will display the target class selection for all files.

This method is particularly beneficial when dealing with large batches of files, as adding an additional column to each source file manually can be tedious.

Data Sample Lists

This section explains how to work with Output Sample Lists, which are generated after performing the Segment List from Selected action. These lists are displayed in a tabular format with the following details:

Field	Description
List Name	The name of the sample list.
List Type	Specifies the type of the list, such as classification or regression.
Data Shape	The shape or dimensions of the data in the list.
Sample Rate	The rate at which samples were collected.
N Samples	The number of samples in the list.
Target Range	The range of target values in the list.
Created	The date and time the list was created.
Modified	The date and time the list was last updated.
Comments	Allows you to add comments or notes regarding the data sample list.
Remove	Enables you to delete a specific sample list from the table.

Multi-view Option

To compare and analyze lists more effectively, select the Multi-view checkbox in the toolbar to display them in multiple views.

Actions

Use the Actions dropdown menu to manage sample lists. The following actions are available:

Action	Description
Deselect All	Clears all selected items.
Random Subset to New	Creates a new list from a random subset of the selected items.
Edit Sensor Groups	Adjusts sensor groupings for the selected lists.
Convert to Regression List / Convert to Classification List	Converts the selected classification lists to regression lists, or vice versa.
Remap Classes	Reassigns class labels in the selected lists.
Export to CSV	Saves the selected lists as a CSV file.
Import From CSV	Uploads a CSV file to add or update data sample lists.
Close	Closes the action menu without making changes.
Remove Selected	Deletes the selected data sample lists.
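Random Subset to New draws a random subset of the selected items. Conceptually, this is sampling without replacement, which can be sketched with Python's `random.Random.sample`. The `random_subset` helper and its `fraction` and `seed` parameters are hypothetical illustrations, not options of the tool; a fixed seed simply makes the subset reproducible.

```python
import random


def random_subset(samples, fraction, seed=None):
    """Return a random subset containing `fraction` of the samples, without replacement."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, int(len(samples) * fraction))
    rng = random.Random(seed)
    return rng.sample(samples, k)


samples = [f"sample_{i}" for i in range(200)]
subset = random_subset(samples, 0.25, seed=42)
print(len(subset))  # 50
```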

Filtering Lists

You can filter sample lists to find specific items.

Click the Filter icon to open the Filter Lists page.
Use the provided options to filter the lists based on:
- Name: Search lists by name.
- List Type: Filter lists by their type.
- Date Created: Narrow down lists based on the date they were created.
- Data Shape: Filter lists by data shape.
- Sample Rate: Search for lists based on their sample rate.
Enter the required information in the filter fields and click Apply to refine the displayed lists.

Distribution

The sample lists created from segmenting the source files can be used for AI Exploration, Training or Testing. Each row contains a specific labeled sample or observation of a set length, taken from the source file stream.

Click on the newly created segmented list to view its contents. This list displays the blocks or windows of segmented data. A histogram of the data should also appear for visualization.

NOTE

If the histogram does not appear immediately after segmentation, try refreshing the page.

You can analyze the distribution of a selected list in List View or Table View.

View Options

View	Description
List View	Displays distribution details by Classes, Count, and % of List.
Table View	Provides detailed information for each sample, including Sample File, Data Shape, View, a Target Class dropdown (to find or create a class), and Exclude and Remove options.
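The Classes, Count, and % of List columns in List View can be reproduced from a list of target labels, which is handy for checking class balance outside the tool. This sketch uses `collections.Counter`; the labels are hypothetical, and it assumes the percentage is each class count divided by the total number of samples in the list.

```python
from collections import Counter


def class_distribution(targets):
    """Return (class, count, percent of list) rows, most common class first."""
    counts = Counter(targets)
    total = len(targets)
    return [(cls, n, round(100 * n / total, 1)) for cls, n in counts.most_common()]


targets = ["apples"] * 60 + ["oranges"] * 30 + ["pears"] * 10
for row in class_distribution(targets):
    print(row)
# ('apples', 60, 60.0)
# ('oranges', 30, 30.0)
# ('pears', 10, 10.0)
```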

Perform Actions in Table View

Select the Action button in the toolbar.
Choose from the following options:

Action	Description
Transfer	Transfers selected items to a different list.
Transfer to New List	Creates a new list from the selected items.
Select All	Selects all items.
Select All on Page	Selects all items displayed on the current page.
Select Random Subset	Selects a random subset of items.
Deselect All	Clears all selections.
Set Target for Selected	Assigns a target class to the selected items.
Exclude Selected	Excludes the selected items from the list.
Include Selected	Includes previously excluded items.
Export to CSV	Saves the selected items to a CSV file.
Import CSV	Imports items from a CSV file.
Close	Closes the action menu without making changes.
Remove Selected	Deletes the selected items.

Distribution tab
