Data cleaning

Most datasets contain unwanted or erroneous values that have to be removed or corrected before any further data processing can take place. The data cleaning application lets you manually select corrupted data points, remove them, and fill the resulting data gaps by linear interpolation. Alternatively, a median filter can be applied to a data channel.
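This page does not describe how the median filter is configured, so the following is only a minimal sketch of a running median under stated assumptions: an odd window size (3 by default), NULL values (represented as `None`) ignored inside the window, windows truncated at the ends of the series, and NULL values left as NULL. The function name `median_filter` is hypothetical and is not part of the application.

```python
from statistics import median
from typing import List, Optional


def median_filter(values: List[Optional[float]], window: int = 3) -> List[Optional[float]]:
    """Replace each value by the median of a centred window (hypothetical sketch).

    Assumptions, not taken from the application: the window size is odd,
    None (NULL) values are skipped inside the window, the window is
    truncated at both ends of the series, and None values stay None.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    result: List[Optional[float]] = []
    for i, v in enumerate(values):
        if v is None:
            result.append(None)  # keep missing values missing
            continue
        # Collect the non-NULL neighbours inside the (possibly truncated) window.
        neighbours = [x for x in values[max(0, i - half): i + half + 1] if x is not None]
        result.append(median(neighbours))
    return result
```

A median is used rather than a mean because a single spike (for example 50 among values around 10 to 13) does not pull the median of a short window towards it, so isolated outliers are suppressed while the general level of the series is kept.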

Channel selection

Data cleaning can only be applied to data tables owned by the current user. Hence, if necessary, first make a copy of the required data table.

A selection list of the data tables owned by the current user is presented. After a table is selected, the page reloads and displays additional options.

First, select the data channels which need to be cleaned (multiple selections can be made by, for instance, holding down the Shift or Ctrl key, depending on your browser). Then, press the Set Channels button to activate the selection.

As a result of the selection, a plot of the selected channels is generated and displayed in a pop-up window, covering the specified start date and number of days. Right after the channels have been selected, the start date is the start date of the dataset, and the number of days is set according to the time resolution of the data.

The start date can be changed by entering a date in the corresponding text box and pressing the Set Date button. Alternatively, clicking the < or > symbol next to the date text box moves the start date back or forward by the number of days specified in the corresponding selection menu. Changing the number of days to be displayed reloads the page.
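The effect of the < and > symbols can be modelled as shifting the start date by the selected number of days. The sketch below is only an illustration of that arithmetic; the function name `shift_start_date` and the use of Python `date` objects are assumptions, not the application's own code.

```python
from datetime import date, timedelta


def shift_start_date(start: date, n_days: int, direction: str) -> date:
    """Move the start date back ("<") or forward (">") by n_days (hypothetical sketch)."""
    if direction == ">":
        return start + timedelta(days=n_days)
    if direction == "<":
        return start - timedelta(days=n_days)
    raise ValueError("direction must be '<' or '>'")
```

For example, with 7 days selected, pressing > on a start date of 21 February 2011 would move the display to 28 February 2011.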

Data point selections

When the channel and time range selection is complete, proceed to the next panel and select a channel from the update list. The page reloads with additional features, including a table of the data values. By visually inspecting the data plot and the tabulated values, you can identify the data points that need to be removed. A data point is marked for deletion by ticking the check box in front of it; clicking on the date also toggles the check box. Missing values are indicated as NULL.

Once data points have been marked for deletion, pressing the View Changes button refreshes the data plot without the selected data points. After verifying the selection, press the Commit Changes button to set the selected values to NULL in the database. The changes are not committed until this button is pressed.
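The difference between View Changes (preview only) and Commit Changes (write to the database) can be sketched as follows. This is a minimal illustration, assuming a hypothetical SQLite table `measurements(ts, value)` with one row per time stamp; the table layout and the function names `preview` and `commit` are not taken from the application.

```python
import sqlite3
from typing import Iterable, List, Optional, Set, Tuple

Row = Tuple[str, Optional[float]]  # (time stamp, value); None stands for NULL


def preview(rows: List[Row], marked: Set[str]) -> List[Row]:
    """Like View Changes: show marked points as NULL without touching the database."""
    return [(ts, None if ts in marked else v) for ts, v in rows]


def commit(conn: sqlite3.Connection, marked: Iterable[str]) -> None:
    """Like Commit Changes: set the marked values to NULL in the (hypothetical) table."""
    with conn:  # the transaction is committed when the block exits without error
        conn.executemany(
            "UPDATE measurements SET value = NULL WHERE ts = ?",
            [(ts,) for ts in marked],
        )
```

Keeping the preview separate from the database update is what allows a selection to be checked and corrected before it becomes permanent.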

When data points have been rejected, or were missing from the start, a list of data gaps is shown. The gaps can be filled by linear interpolation between the last valid value before the gap and the first valid value after it. Several gaps can be selected at once to speed up the processing. Pressing the Fill Gaps button reloads the page with the missing values replaced by the interpolated values (shown in red for easy recognition), and refreshes the plot window to show the effect of the gap filling. To confirm the gap filling, press the Commit Changes button to update the values in the database. Gap-filled values that have not yet been committed can be discarded by reloading the data from the database.
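The gap detection and linear interpolation described above can be sketched as below. This is an illustration under assumptions: samples are equally spaced in time (so interpolation can use the sample index), NULL is represented as `None`, and a gap at the very start or end of the series is left unfilled because it has a valid value on only one side. The names `find_gaps` and `fill_gaps` and the optional `selected` argument (mimicking a selection of gaps) are hypothetical.

```python
from typing import List, Optional, Tuple

Gap = Tuple[int, int]  # (index of first missing value, index of last missing value)


def find_gaps(values: List[Optional[float]]) -> List[Gap]:
    """Return each run of consecutive None values as a (first, last) index pair."""
    gaps: List[Gap] = []
    i, n = 0, len(values)
    while i < n:
        if values[i] is None:
            j = i
            while j + 1 < n and values[j + 1] is None:
                j += 1
            gaps.append((i, j))
            i = j + 1
        else:
            i += 1
    return gaps


def fill_gaps(values: List[Optional[float]],
              selected: Optional[List[Gap]] = None) -> List[Optional[float]]:
    """Linearly interpolate the selected gaps (all gaps if selected is None)."""
    filled = list(values)
    for first, last in (selected if selected is not None else find_gaps(values)):
        before, after = first - 1, last + 1
        if before < 0 or after >= len(values):
            continue  # no valid value on one side: leave the gap as NULL
        v0, v1 = values[before], values[after]
        span = after - before
        for k in range(first, last + 1):
            # Straight line through (before, v0) and (after, v1).
            filled[k] = v0 + (v1 - v0) * (k - before) / span
    return filled
```

For example, `fill_gaps([1.0, None, None, 4.0])` yields `[1.0, 2.0, 3.0, 4.0]`. If the real time steps are irregular, the interpolation would have to use the time stamps instead of the index.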

Caution: changes made by pressing the Commit Changes button cannot be undone!




Last modified on: 21 February 2011.