Today, we'll discuss the impact of data cleansing on a Machine Learning model and how it can be achieved in Azure Machine Learning (Azure ML) studio.
In this example, I'm using a credit scoring data set. Besides its feature columns, the Data Set has a column Status, which is the label, that is, the column that we want to predict.
The first step is, of course, to explore the data in Azure ML studio. Using the visualization feature of the data set, we can go through each column and view its properties, such as Mean, Unique Values and Missing Values.
Alternatively, add the module and connect it to the data set that needs to be visualized. Here, at one glance, the details of all the columns can be obtained. The module creates a set of standard statistical measures that describe each column in the input table. It does not return the original data set. Instead, it generates a row for each column, beginning with the column name and followed by relevant statistics for that column, based on its data type.
Before reading further, I suggest you download the Data Set here and do some exploration.
After exploring the Data Set, here are my observations:
As shown in the screen shot below, the count of unique values is less than the number of values for the column LOAN_ID. This is an indication of duplicates!
As we continue to explore the output of the module, another anomaly that is easy to detect is null values. The missing-value counts indicate this, and they can be identified at a glance in the module's output.
The next step is to scrutinize each column and check whether the actual data type matches the expected one. While doing so, we notice that the column INSTALLMENT_PERC_INCOME is of type string, whereas a percentage should normally be numeric.
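The three checks described in this walkthrough (unique count versus value count, missing values, and actual versus expected data type) can also be tried outside Azure ML studio. What follows is a minimal Python sketch of that idea, not a description of how the Azure ML module works internally. The sample rows, the `AGE` column, the `4%` formatting of the percentage values, and the helper names `is_number` and `summarize` are all assumptions made for illustration.

```python
from statistics import mean


def is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def summarize(rows):
    """Build one summary record per column from a list of dict rows."""
    summary = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [row[col] for row in rows]
        # Treat None and empty strings as missing values.
        present = [v for v in values if v not in (None, "")]
        numeric = [float(v) for v in present if is_number(str(v))]
        all_numeric = bool(present) and len(numeric) == len(present)
        summary[col] = {
            "count": len(present),
            "unique": len(set(present)),
            "missing": len(values) - len(present),
            "type": "numeric" if all_numeric else "string",
            "mean": mean(numeric) if all_numeric else None,
        }
    return summary


# Hypothetical sample rows; only LOAN_ID and INSTALLMENT_PERC_INCOME
# are column names taken from the article, AGE is made up.
rows = [
    {"LOAN_ID": "1001", "INSTALLMENT_PERC_INCOME": "4%", "AGE": "35"},
    {"LOAN_ID": "1002", "INSTALLMENT_PERC_INCOME": "2%", "AGE": ""},
    {"LOAN_ID": "1001", "INSTALLMENT_PERC_INCOME": "4%", "AGE": "35"},
]

summary = summarize(rows)

# An identifier column should have as many unique values as values.
loan_id = summary["LOAN_ID"]
if loan_id["unique"] < loan_id["count"]:
    print("LOAN_ID: fewer unique values than values, possible duplicates")

for col, stats in summary.items():
    if stats["missing"] > 0:
        print(f"{col}: {stats['missing']} missing value(s)")
    print(f"{col}: detected type is {stats['type']}")
```

In this sketch, a column counts as numeric only when every non-missing value parses as a number, so a percentage stored with a trailing `%` sign is reported as a string, which is the kind of mismatch the data type check is meant to surface.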