Overview

Dataset statistics

Number of variables: 22
Number of observations: 253680
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 11369
Duplicate rows (%): 4.5%
Total size in memory: 42.6 MiB
Average record size in memory: 176.0 B

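The two memory figures above are consistent with every one of the 22 columns being held as an 8-byte value (for example float64) and measured shallowly: 22 × 8 B = 176 B per record, and 176 B × 253680 records ≈ 42.6 MiB. The same arithmetic gives the 1.9 MiB reported for each numeric variable below. The sketch assumes 8-byte values and a shallow count, since the report does not say how it measured memory. The 14.5 MiB reported for each categorical variable is not explained by this arithmetic; it presumably includes per-string overhead, which the sketch does not model.

```python
# Sketch: reproduce the report's memory figures from its row and column counts.
# Assumption (not stated in the report): each cell is an 8-byte value, e.g. float64,
# and memory is counted shallowly (no per-object overhead).
BYTES_PER_VALUE = 8
MIB = 1024 ** 2

def memory_summary(n_rows: int, n_cols: int, bytes_per_value: int = BYTES_PER_VALUE):
    """Return (total MiB, bytes per record, MiB for a single column)."""
    record_bytes = n_cols * bytes_per_value
    total_mib = n_rows * record_bytes / MIB
    column_mib = n_rows * bytes_per_value / MIB
    return total_mib, record_bytes, column_mib

total_mib, record_bytes, column_mib = memory_summary(253680, 22)
print(f"Total size in memory: {total_mib:.1f} MiB")   # 42.6 MiB
print(f"Average record size: {record_bytes:.1f} B")    # 176.0 B
print(f"One numeric column: {column_mib:.1f} MiB")     # 1.9 MiB
```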
Variable types

Categorical: 16
Numeric: 6

Alerts

Dataset has 11369 (4.5%) duplicate rows [Duplicates]
GenHlth is highly correlated with PhysHlth [High correlation]
PhysHlth is highly correlated with GenHlth [High correlation]
PhysHlth is highly correlated with GenHlth and 1 other field [High correlation]
DiffWalk is highly correlated with PhysHlth [High correlation]
MentHlth has 175680 (69.3%) zeros [Zeros]
PhysHlth has 160052 (63.1%) zeros [Zeros]

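The duplicate-row alert counts rows that repeat an earlier row. Below is a standard-library sketch of that count. It assumes the same semantics as pandas' `DataFrame.duplicated()` with its default `keep='first'`: the first occurrence is not counted, but every later repeat is. The sample rows are made up for illustration.

```python
from collections import Counter

def count_duplicate_rows(rows):
    """Count rows that repeat an earlier row (the first occurrence is not counted)."""
    counts = Counter(tuple(row) for row in rows)
    return sum(c - 1 for c in counts.values())

def duplicate_percentage(n_duplicates: int, n_rows: int) -> float:
    return 100.0 * n_duplicates / n_rows

# Hypothetical rows: (Diabetes_binary, HighBP, BMI)
rows = [(0.0, 1.0, 27), (0.0, 1.0, 27), (1.0, 0.0, 31), (0.0, 1.0, 27)]
print(count_duplicate_rows(rows))                       # 2
print(f"{duplicate_percentage(11369, 253680):.1f}%")    # 4.5%
```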
Reproduction

Analysis started: 2022-07-28 05:32:43.991206
Analysis finished: 2022-07-28 05:33:18.242984
Duration: 34.25 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

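The reported duration is the difference between the start and finish timestamps. Here is a quick check with the standard library, assuming both timestamps are in the same time zone (the report does not say which).

```python
from datetime import datetime

def duration_seconds(started: str, finished: str) -> float:
    """Elapsed seconds between two 'YYYY-MM-DD HH:MM:SS.ffffff' timestamps."""
    start = datetime.fromisoformat(started)
    end = datetime.fromisoformat(finished)
    return (end - start).total_seconds()

elapsed = duration_seconds("2022-07-28 05:32:43.991206", "2022-07-28 05:33:18.242984")
print(f"Duration: {elapsed:.2f} seconds")  # Duration: 34.25 seconds
```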
Variables

Diabetes_binary
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 761040
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 218334 | 86.1%
1.0 | 35346 | 13.9%

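The Distinct, Unique and Frequency (%) figures above all derive from the value counts:

- Distinct is the number of different values.
- Unique is the number of values that occur exactly once. It is 0 here because both values repeat many times.
- Each frequency is the count divided by the number of observations.

The sketch below uses the Diabetes_binary counts. The function name is illustrative and not part of pandas-profiling.

```python
def summarize_value_counts(counts: dict) -> dict:
    """Derive Distinct, Unique and Frequency (%) from a value -> count mapping."""
    n = sum(counts.values())
    return {
        "n": n,
        "distinct": len(counts),
        # "Unique" means values that appear exactly once
        "unique": sum(1 for c in counts.values() if c == 1),
        "frequency_pct": {v: round(100.0 * c / n, 1) for v, c in counts.items()},
    }

summary = summarize_value_counts({"0.0": 218334, "1.0": 35346})
print(summary["distinct"], summary["unique"])   # 2 0
print(summary["frequency_pct"])                 # {'0.0': 86.1, '1.0': 13.9}
```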
Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
0 | 472014 | 62.0%
. | 253680 | 33.3%
1 | 35346 | 4.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 507360 | 66.7%
Other Punctuation | 253680 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 472014 | 93.0%
1 | 35346 | 7.0%

Other Punctuation
Value | Count | Frequency (%)
. | 253680 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 472014 | 62.0%
. | 253680 | 33.3%
1 | 35346 | 4.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 472014 | 62.0%
. | 253680 | 33.3%
1 | 35346 | 4.6%

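The character tables follow from each value being profiled as the 3-character string "0.0" or "1.0":

- 253680 × 3 = 761040 characters in total.
- Every row contributes one ".".
- The digit 0 appears twice in "0.0" and once in "1.0".

The Unicode general categories Nd (Decimal Number) and Po (Other Punctuation) can be looked up with Python's `unicodedata.category`. The sketch below rebuilds the tables from the value counts alone. The mapping of category codes to the report's display names is an assumption covering only the two codes that appear here.

```python
import unicodedata
from collections import Counter

# Map Unicode general-category codes to the names shown in the report (assumed).
CATEGORY_NAMES = {"Nd": "Decimal Number", "Po": "Other Punctuation"}

def character_counts(value_counts: dict) -> Counter:
    """Count characters across all rows without materializing every string."""
    chars = Counter()
    for text, count in value_counts.items():
        for ch, k in Counter(text).items():
            chars[ch] += k * count
    return chars

def category_counts(chars: Counter) -> Counter:
    """Aggregate character counts by Unicode general category."""
    cats = Counter()
    for ch, count in chars.items():
        code = unicodedata.category(ch)
        cats[CATEGORY_NAMES.get(code, code)] += count
    return cats

chars = character_counts({"0.0": 218334, "1.0": 35346})
print(sum(chars.values()))    # 761040
print(chars.most_common())    # [('0', 472014), ('.', 253680), ('1', 35346)]
print(category_counts(chars))
```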
HighBP
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 761040
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 0.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
0.0 | 144851 | 57.1%
1.0 | 108829 | 42.9%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
0 | 398531 | 52.4%
. | 253680 | 33.3%
1 | 108829 | 14.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 507360 | 66.7%
Other Punctuation | 253680 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 398531 | 78.5%
1 | 108829 | 21.5%

Other Punctuation
Value | Count | Frequency (%)
. | 253680 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 398531 | 52.4%
. | 253680 | 33.3%
1 | 108829 | 14.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 398531 | 52.4%
. | 253680 | 33.3%
1 | 108829 | 14.3%

HighChol
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 761040
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 0.0
3rd row: 1.0
4th row: 0.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
0.0 | 146089 | 57.6%
1.0 | 107591 | 42.4%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
0 | 399769 | 52.5%
. | 253680 | 33.3%
1 | 107591 | 14.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 507360 | 66.7%
Other Punctuation | 253680 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 399769 | 78.8%
1 | 107591 | 21.2%

Other Punctuation
Value | Count | Frequency (%)
. | 253680 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 399769 | 52.5%
. | 253680 | 33.3%
1 | 107591 | 14.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 399769 | 52.5%
. | 253680 | 33.3%
1 | 107591 | 14.1%

CholCheck
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 761040
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 0.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 244210 | 96.3%
0.0 | 9470 | 3.7%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
0 | 263150 | 34.6%
. | 253680 | 33.3%
1 | 244210 | 32.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 507360 | 66.7%
Other Punctuation | 253680 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 263150 | 51.9%
1 | 244210 | 48.1%

Other Punctuation
Value | Count | Frequency (%)
. | 253680 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 263150 | 34.6%
. | 253680 | 33.3%
1 | 244210 | 32.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 263150 | 34.6%
. | 253680 | 33.3%
1 | 244210 | 32.1%

BMI
Real number (ℝ≥0)

Distinct: 84
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.38236361
Minimum: 12
Maximum: 98
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 12
5th percentile: 20
Q1: 24
median: 27
Q3: 31
95th percentile: 40
Maximum: 98
Range: 86
Interquartile range (IQR): 7

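Range is the maximum minus the minimum, and IQR is Q3 minus Q1. For BMI these are 98 - 12 = 86 and 31 - 24 = 7. The quantile rows themselves can be computed by linear interpolation between order statistics, which is the default method of pandas' `Series.quantile`. Whether pandas-profiling uses exactly this method is an assumption here. The sample values below are hypothetical.

```python
def quantile(values, q: float) -> float:
    """Quantile by linear interpolation between closest ranks (pandas' default)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("quantile of empty data")
    pos = (len(xs) - 1) * q          # fractional index into the sorted data
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def quantile_statistics(values) -> dict:
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    return {
        "minimum": min(values),
        "5th percentile": quantile(values, 0.05),
        "Q1": q1,
        "median": quantile(values, 0.5),
        "Q3": q3,
        "95th percentile": quantile(values, 0.95),
        "maximum": max(values),
        "range": max(values) - min(values),
        "IQR": q3 - q1,
    }

bmi_sample = [12, 20, 24, 26, 27, 27, 28, 31, 35, 40, 98]  # hypothetical values
print(quantile_statistics(bmi_sample))
```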
Descriptive statistics

Standard deviation: 6.608694201
Coefficient of variation (CV): 0.2328450968
Kurtosis: 10.99747329
Mean: 28.38236361
Median Absolute Deviation (MAD): 3
Skewness: 2.122003758
Sum: 7200038
Variance: 43.67483905
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]

Most common values

Value | Count | Frequency (%)
27 | 24606 | 9.7%
26 | 20562 | 8.1%
24 | 19550 | 7.7%
25 | 17146 | 6.8%
28 | 16545 | 6.5%
23 | 15610 | 6.2%
29 | 14890 | 5.9%
30 | 14573 | 5.7%
22 | 13643 | 5.4%
31 | 12275 | 4.8%
Other values (74) | 84280 | 33.2%

Smallest values

Value | Count | Frequency (%)
12 | 6 | < 0.1%
13 | 21 | < 0.1%
14 | 41 | < 0.1%
15 | 132 | 0.1%
16 | 348 | 0.1%
17 | 776 | 0.3%
18 | 1803 | 0.7%
19 | 3968 | 1.6%
20 | 6327 | 2.5%
21 | 9855 | 3.9%

Largest values

Value | Count | Frequency (%)
98 | 7 | < 0.1%
96 | 1 | < 0.1%
95 | 12 | < 0.1%
92 | 32 | < 0.1%
91 | 1 | < 0.1%
90 | 1 | < 0.1%
89 | 28 | < 0.1%
88 | 2 | < 0.1%
87 | 61 | < 0.1%
86 | 1 | < 0.1%

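BMI's descriptive statistics are related by simple formulas:

- Variance is the square of the standard deviation: 6.608694201² ≈ 43.675.
- CV is the standard deviation divided by the mean: 6.608694201 / 28.38236361 ≈ 0.2328.
- Sum is the mean times the number of observations: 28.38236361 × 253680 ≈ 7200038.

The sketch below assumes pandas' conventions, as computed by `Series.std`, `Series.skew` and `Series.kurt`: sample (n - 1) standard deviation, adjusted Fisher-Pearson skewness, and bias-corrected excess kurtosis. The example data are hypothetical.

```python
import math

def describe(values) -> dict:
    """Sample statistics following pandas' default conventions (assumed here)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values for kurtosis")
    mean = sum(values) / n
    dev = [x - mean for x in values]
    s2 = sum(d ** 2 for d in dev)          # sums of powered deviations
    s3 = sum(d ** 3 for d in dev)
    s4 = sum(d ** 4 for d in dev)
    if s2 == 0:
        raise ValueError("constant data: skewness and kurtosis are undefined")
    variance = s2 / (n - 1)                # sample variance (ddof=1)
    std = math.sqrt(variance)
    m2, m3, m4 = s2 / n, s3 / n, s4 / n    # biased central moments
    # Adjusted Fisher-Pearson skewness G1
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    # Bias-corrected excess kurtosis G2
    g2 = m4 / m2 ** 2 - 3
    kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "cv": std / mean if mean != 0 else math.nan,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "sum": sum(values),
    }

print(describe([1, 2, 3, 4, 5]))
# mean 3.0, variance 2.5, skewness 0.0, kurtosis about -1.2 (up to float rounding)
```

For BMI, the high kurtosis (about 11) and positive skewness (about 2.1) match the long right tail in the table of largest values: those values reach 98, while the 95th percentile is 40.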
Smoker
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 761040
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 141257 | 55.7%
1.0 | 112423 | 44.3%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
0 | 394937 | 51.9%
. | 253680 | 33.3%
1 | 112423 | 14.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 507360 | 66.7%
Other Punctuation | 253680 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 394937 | 77.8%
1 | 112423 | 22.2%

Other Punctuation
Value | Count | Frequency (%)
. | 253680 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 394937 | 51.9%
. | 253680 | 33.3%
1 | 112423 | 14.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 394937 | 51.9%
. | 253680 | 33.3%
1 | 112423 | 14.8%

Stroke
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 761040
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 243388 | 95.9%
1.0 | 10292 | 4.1%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
0 | 497068 | 65.3%
. | 253680 | 33.3%
1 | 10292 | 1.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 507360 | 66.7%
Other Punctuation | 253680 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 497068 | 98.0%
1 | 10292 | 2.0%

Other Punctuation
Value | Count | Frequency (%)
. | 253680 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 497068 | 65.3%
. | 253680 | 33.3%
1 | 10292 | 1.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 497068 | 65.3%
. | 253680 | 33.3%
1 | 10292 | 1.4%

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 761040
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 229787 | 90.6%
1.0 | 23893 | 9.4%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
0 | 483467 | 63.5%
. | 253680 | 33.3%
1 | 23893 | 3.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 507360 | 66.7%
Other Punctuation | 253680 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 483467 | 95.3%
1 | 23893 | 4.7%

Other Punctuation
Value | Count | Frequency (%)
. | 253680 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 483467 | 63.5%
. | 253680 | 33.3%
1 | 23893 | 3.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 483467 | 63.5%
. | 253680 | 33.3%
1 | 23893 | 3.1%

PhysActivity
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 761040
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 1.0
3rd row: 0.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 191920 | 75.7%
0.0 | 61760 | 24.3%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
0 | 315440 | 41.4%
. | 253680 | 33.3%
1 | 191920 | 25.2%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 507360 | 66.7%
Other Punctuation | 253680 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 315440 | 62.2%
1 | 191920 | 37.8%

Other Punctuation
Value | Count | Frequency (%)
. | 253680 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 315440 | 41.4%
. | 253680 | 33.3%
1 | 191920 | 25.2%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 315440 | 41.4%
. | 253680 | 33.3%
1 | 191920 | 25.2%

Fruits
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 761040
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 160898 | 63.4%
0.0 | 92782 | 36.6%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
0 | 346462 | 45.5%
. | 253680 | 33.3%
1 | 160898 | 21.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 507360 | 66.7%
Other Punctuation | 253680 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 346462 | 68.3%
1 | 160898 | 31.7%

Other Punctuation
Value | Count | Frequency (%)
. | 253680 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 346462 | 45.5%
. | 253680 | 33.3%
1 | 160898 | 21.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 346462 | 45.5%
. | 253680 | 33.3%
1 | 160898 | 21.1%

Veggies
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 761040
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 0.0
3rd row: 0.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 205841 | 81.1%
0.0 | 47839 | 18.9%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
0 | 301519 | 39.6%
. | 253680 | 33.3%
1 | 205841 | 27.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 507360 | 66.7%
Other Punctuation | 253680 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 301519 | 59.4%
1 | 205841 | 40.6%

Other Punctuation
Value | Count | Frequency (%)
. | 253680 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 301519 | 39.6%
. | 253680 | 33.3%
1 | 205841 | 27.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 301519 | 39.6%
. | 253680 | 33.3%
1 | 205841 | 27.0%

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 761040
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 239424 | 94.4%
1.0 | 14256 | 5.6%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
0 | 493104 | 64.8%
. | 253680 | 33.3%
1 | 14256 | 1.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 507360 | 66.7%
Other Punctuation | 253680 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 493104 | 97.2%
1 | 14256 | 2.8%

Other Punctuation
Value | Count | Frequency (%)
. | 253680 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 493104 | 64.8%
. | 253680 | 33.3%
1 | 14256 | 1.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 493104 | 64.8%
. | 253680 | 33.3%
1 | 14256 | 1.9%

AnyHealthcare
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 761040
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 0.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 241263 | 95.1%
0.0 | 12417 | 4.9%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
0 | 266097 | 35.0%
. | 253680 | 33.3%
1 | 241263 | 31.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 507360 | 66.7%
Other Punctuation | 253680 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 266097 | 52.4%
1 | 241263 | 47.6%

Other Punctuation
Value | Count | Frequency (%)
. | 253680 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 266097 | 35.0%
. | 253680 | 33.3%
1 | 241263 | 31.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 266097 | 35.0%
. | 253680 | 33.3%
1 | 241263 | 31.7%

NoDocbcCost
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 761040
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 1.0
3rd row: 1.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 232326 | 91.6%
1.0 | 21354 | 8.4%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
0 | 486006 | 63.9%
. | 253680 | 33.3%
1 | 21354 | 2.8%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 507360 | 66.7%
Other Punctuation | 253680 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 486006 | 95.8%
1 | 21354 | 4.2%

Other Punctuation
Value | Count | Frequency (%)
. | 253680 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 486006 | 63.9%
. | 253680 | 33.3%
1 | 21354 | 2.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 486006 | 63.9%
. | 253680 | 33.3%
1 | 21354 | 2.8%

GenHlth
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 761040
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 5.0
2nd row: 3.0
3rd row: 5.0
4th row: 2.0
5th row: 2.0

Common Values

Value | Count | Frequency (%)
2.0 | 89084 | 35.1%
3.0 | 75646 | 29.8%
1.0 | 45299 | 17.9%
4.0 | 31570 | 12.4%
5.0 | 12081 | 4.8%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
. | 253680 | 33.3%
0 | 253680 | 33.3%
2 | 89084 | 11.7%
3 | 75646 | 9.9%
1 | 45299 | 6.0%
4 | 31570 | 4.1%
5 | 12081 | 1.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 507360 | 66.7%
Other Punctuation | 253680 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 253680 | 50.0%
2 | 89084 | 17.6%
3 | 75646 | 14.9%
1 | 45299 | 8.9%
4 | 31570 | 6.2%
5 | 12081 | 2.4%

Other Punctuation
Value | Count | Frequency (%)
. | 253680 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
. | 253680 | 33.3%
0 | 253680 | 33.3%
2 | 89084 | 11.7%
3 | 75646 | 9.9%
1 | 45299 | 6.0%
4 | 31570 | 4.1%
5 | 12081 | 1.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
. | 253680 | 33.3%
0 | 253680 | 33.3%
2 | 89084 | 11.7%
3 | 75646 | 9.9%
1 | 45299 | 6.0%
4 | 31570 | 4.1%
5 | 12081 | 1.6%

MentHlth
Real number (ℝ≥0)

ZEROS

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.184772154
Minimum: 0
Maximum: 30
Zeros: 175680
Zeros (%): 69.3%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 2
95th percentile: 26
Maximum: 30
Range: 30
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 7.412846696
Coefficient of variation (CV): 2.327590904
Kurtosis: 6.441684484
Mean: 3.184772154
Median Absolute Deviation (MAD): 0
Skewness: 2.721148366
Sum: 807913
Variance: 54.95029614
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=31)]

Most common values

Value | Count | Frequency (%)
0 | 175680 | 69.3%
2 | 13054 | 5.1%
30 | 12088 | 4.8%
5 | 9030 | 3.6%
1 | 8538 | 3.4%
3 | 7381 | 2.9%
10 | 6373 | 2.5%
15 | 5505 | 2.2%
4 | 3789 | 1.5%
20 | 3364 | 1.3%
Other values (21) | 8878 | 3.5%

Smallest values

Value | Count | Frequency (%)
0 | 175680 | 69.3%
1 | 8538 | 3.4%
2 | 13054 | 5.1%
3 | 7381 | 2.9%
4 | 3789 | 1.5%
5 | 9030 | 3.6%
6 | 988 | 0.4%
7 | 3100 | 1.2%
8 | 639 | 0.3%
9 | 91 | < 0.1%

Largest values

Value | Count | Frequency (%)
30 | 12088 | 4.8%
29 | 158 | 0.1%
28 | 327 | 0.1%
27 | 79 | < 0.1%
26 | 45 | < 0.1%
25 | 1188 | 0.5%
24 | 33 | < 0.1%
23 | 38 | < 0.1%
22 | 63 | < 0.1%
21 | 227 | 0.1%

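MentHlth's MAD of 0 follows from its zeros. More than half the values (69.3%) are 0, so the median is 0. The absolute deviations |x - 0| are then 0 for the same majority of rows, so their median is also 0. The sketch below computes both figures. It assumes MAD here means the unscaled median of absolute deviations from the median. The sample list is hypothetical; the zero count is taken from the report.

```python
import statistics

def median_absolute_deviation(values) -> float:
    """Unscaled MAD: the median of |x - median(values)|."""
    med = statistics.median(values)
    return statistics.median(abs(x - med) for x in values)

def zeros_percentage(n_zeros: int, n_rows: int) -> float:
    return 100.0 * n_zeros / n_rows

ment_hlth_like = [0, 0, 0, 0, 0, 0, 0, 2, 5, 30]      # hypothetical: 70% zeros
print(median_absolute_deviation(ment_hlth_like))      # 0.0
print(median_absolute_deviation([1, 2, 3, 4, 100]))   # 1
print(f"{zeros_percentage(175680, 253680):.1f}%")     # 69.3%
```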
PhysHlth
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.242080574
Minimum: 0
Maximum: 30
Zeros: 160052
Zeros (%): 63.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 0
Q3: 3
95th percentile: 30
Maximum: 30
Range: 30
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 8.717951307
Coefficient of variation (CV): 2.055112145
Kurtosis: 3.496241806
Mean: 4.242080574
Median Absolute Deviation (MAD): 0
Skewness: 2.207394915
Sum: 1076131
Variance: 76.00267499
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=31)]

Most common values

Value | Count | Frequency (%)
0 | 160052 | 63.1%
30 | 19400 | 7.6%
2 | 14764 | 5.8%
1 | 11388 | 4.5%
3 | 8495 | 3.3%
5 | 7622 | 3.0%
10 | 5595 | 2.2%
15 | 4916 | 1.9%
4 | 4542 | 1.8%
7 | 4538 | 1.8%
Other values (21) | 12368 | 4.9%

Smallest values

Value | Count | Frequency (%)
0 | 160052 | 63.1%
1 | 11388 | 4.5%
2 | 14764 | 5.8%
3 | 8495 | 3.3%
4 | 4542 | 1.8%
5 | 7622 | 3.0%
6 | 1330 | 0.5%
7 | 4538 | 1.8%
8 | 809 | 0.3%
9 | 179 | 0.1%

Largest values

Value | Count | Frequency (%)
30 | 19400 | 7.6%
29 | 215 | 0.1%
28 | 522 | 0.2%
27 | 99 | < 0.1%
26 | 69 | < 0.1%
25 | 1336 | 0.5%
24 | 72 | < 0.1%
23 | 56 | < 0.1%
22 | 70 | < 0.1%
21 | 663 | 0.3%

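The Overview alerts flag GenHlth, PhysHlth and DiffWalk as highly correlated. pandas-profiling can compute several correlation measures, each with configurable thresholds, and the report does not say which measure or threshold raised each alert. As an illustration only, here is a standard-library sketch of Spearman's rank correlation, computed as the Pearson correlation of average ranks. Average ranks handle heavily tied columns such as PhysHlth. The paired values are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

gen_hlth = [1, 2, 2, 3, 4, 5, 5]        # hypothetical pairs
phys_hlth = [0, 0, 2, 5, 14, 30, 30]
print(round(spearman(gen_hlth, phys_hlth), 3))  # 0.972
```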
DiffWalk
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 761040
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 0.0
3rd row: 1.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 211005 | 83.2%
1.0 | 42675 | 16.8%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
0 | 464685 | 61.1%
. | 253680 | 33.3%
1 | 42675 | 5.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 507360 | 66.7%
Other Punctuation | 253680 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 464685 | 91.6%
1 | 42675 | 8.4%

Other Punctuation
Value | Count | Frequency (%)
. | 253680 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 464685 | 61.1%
. | 253680 | 33.3%
1 | 42675 | 5.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 464685 | 61.1%
. | 253680 | 33.3%
1 | 42675 | 5.6%

Sex
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 14.5 MiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 761040
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 141974 | 56.0%
1.0 | 111706 | 44.0%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]

Most occurring characters

Value | Count | Frequency (%)
0 | 395654 | 52.0%
. | 253680 | 33.3%
1 | 111706 | 14.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 507360 | 66.7%
Other Punctuation | 253680 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 395654 | 78.0%
1 | 111706 | 22.0%

Other Punctuation
Value | Count | Frequency (%)
. | 253680 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 761040 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 395654 | 52.0%
. | 253680 | 33.3%
1 | 111706 | 14.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 761040 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 395654 | 52.0%
. | 253680 | 33.3%
1 | 111706 | 14.7%

Age
Real number (ℝ≥0)

Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.032119205
Minimum: 1
Maximum: 13
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 6
median: 8
Q3: 10
95th percentile: 13
Maximum: 13
Range: 12
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.054220434
Coefficient of variation (CV): 0.3802508847
Kurtosis: -0.5812227275
Mean: 8.032119205
Median Absolute Deviation (MAD): 2
Skewness: -0.3599032479
Sum: 2037588
Variance: 9.32826246
Monotonicity: Not monotonic
2022-07-28T11:03:21.562141image/svg+xmlMatplotlib v3.3.2, https://matplotlib.org/
Histogram with fixed size bins (bins=13)
Common values
Value             Count  Frequency (%)
9                 33244  13.1%
10                32194  12.7%
8                 30832  12.2%
7                 26314  10.4%
11                23533  9.3%
6                 19819  7.8%
13                17363  6.8%
5                 16157  6.4%
12                15980  6.3%
4                 13823  5.4%
Other values (3)  24421  9.6%
Minimum values
Value  Count  Frequency (%)
1      5700   2.2%
2      7598   3.0%
3      11123  4.4%
4      13823  5.4%
5      16157  6.4%
6      19819  7.8%
7      26314  10.4%
8      30832  12.2%
9      33244  13.1%
10     32194  12.7%
Maximum values
Value  Count  Frequency (%)
13     17363  6.8%
12     15980  6.3%
11     23533  9.3%
10     32194  12.7%
9      33244  13.1%
8      30832  12.2%
7      26314  10.4%
6      19819  7.8%
5      16157  6.4%
4      13823  5.4%
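The Age summary statistics can be cross-checked against the frequency table alone. The Python sketch below recomputes the sum, mean, variance, standard deviation, coefficient of variation and quantiles without expanding the 253,680 observations. It assumes the sample variance with an n − 1 denominator and quantiles that interpolate linearly between neighbouring order statistics (both the pandas defaults); the helper names `value_at` and `quantile` are illustrative.

```python
import math

# Frequency table for Age, copied from the counts reported above: value -> count.
age_counts = {
    1: 5700, 2: 7598, 3: 11123, 4: 13823, 5: 16157, 6: 19819, 7: 26314,
    8: 30832, 9: 33244, 10: 32194, 11: 23533, 12: 15980, 13: 17363,
}

n = sum(age_counts.values())                       # number of observations
total = sum(v * c for v, c in age_counts.items())  # the "Sum" statistic
mean = total / n

# Sample variance (n - 1 denominator), computed from the grouped counts.
variance = sum(c * (v - mean) ** 2 for v, c in age_counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean                                    # coefficient of variation


def value_at(k):
    """Return the k-th smallest observation (0-based) without expanding the data."""
    seen = 0
    for v in sorted(age_counts):
        seen += age_counts[v]
        if k < seen:
            return v
    raise IndexError(k)


def quantile(q):
    """Quantile with linear interpolation between neighbouring order statistics."""
    pos = q * (n - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return value_at(lo) + (pos - lo) * (value_at(hi) - value_at(lo))


print(n, total, mean, variance, std, cv)
print([quantile(q) for q in (0.05, 0.25, 0.5, 0.75, 0.95)])
```

With these counts the sketch agrees with the reported Mean, Variance, Standard deviation and CV, and returns 2, 6, 8, 10 and 13 for the 5th percentile, Q1, median, Q3 and 95th percentile.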

Education
Real number (ℝ≥0)

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.050433617
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 1
5th percentile: 3
Q1: 4
Median: 5
Q3: 6
95th percentile: 6
Maximum: 6
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 0.9857741757
Coefficient of variation (CV): 0.1951860475
Kurtosis: 0.03945316698
Mean: 5.050433617
Median Absolute Deviation (MAD): 1
Skewness: -0.7772552706
Sum: 1281194
Variance: 0.9717507255
Monotonicity: Not monotonic

[Figure: histogram with fixed-size bins (bins=6)]
Common values
Value  Count   Frequency (%)
6      107325  42.3%
5      69910   27.6%
4      62750   24.7%
3      9478    3.7%
2      4043    1.6%
1      174     0.1%
Minimum values
Value  Count   Frequency (%)
1      174     0.1%
2      4043    1.6%
3      9478    3.7%
4      62750   24.7%
5      69910   27.6%
6      107325  42.3%
Maximum values
Value  Count   Frequency (%)
6      107325  42.3%
5      69910   27.6%
4      62750   24.7%
3      9478    3.7%
2      4043    1.6%
1      174     0.1%

Income
Real number (ℝ≥0)

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.053874961
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.9 MiB

Quantile statistics

Minimum: 1
5th percentile: 2
Q1: 5
Median: 7
Q3: 8
95th percentile: 8
Maximum: 8
Range: 7
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.071147566
Coefficient of variation (CV): 0.3421193169
Kurtosis: -0.2803292618
Mean: 6.053874961
Median Absolute Deviation (MAD): 1
Skewness: -0.8913449907
Sum: 1535747
Variance: 4.289652241
Monotonicity: Not monotonic

[Figure: histogram with fixed-size bins (bins=8)]
Common values
Value  Count  Frequency (%)
8      90385  35.6%
7      43219  17.0%
6      36470  14.4%
5      25883  10.2%
4      20135  7.9%
3      15994  6.3%
2      11783  4.6%
1      9811   3.9%
Minimum values
Value  Count  Frequency (%)
1      9811   3.9%
2      11783  4.6%
3      15994  6.3%
4      20135  7.9%
5      25883  10.2%
6      36470  14.4%
7      43219  17.0%
8      90385  35.6%
Maximum values
Value  Count  Frequency (%)
8      90385  35.6%
7      43219  17.0%
6      36470  14.4%
5      25883  10.2%
4      20135  7.9%
3      15994  6.3%
2      11783  4.6%
1      9811   3.9%

Interactions

[Figure: 6 × 6 grid of pairwise interaction plots for the numeric variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 no monotonic correlation, and +1 a perfect positive monotonic correlation.

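To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables. Tied values are conventionally assigned the average of the ranks they would otherwise occupy.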
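The rank-based recipe can be sketched in plain Python: rank each variable, giving tied values the average of the ranks they span, then take Pearson's r of the two rank vectors. The helper names are illustrative, and this is not the report's own implementation.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance over the product of standard deviations (the 1/(n-1) factors cancel)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
print(spearman(x, [v ** 3 for v in x]))  # monotonic but nonlinear relationship
print(spearman(x, [5, 6, 7, 8, 7]))      # y contains a tie
```

The cubic relationship is nonlinear, so Pearson's r on the raw values would fall short of 1, while ρ on the ranks is exactly 1.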

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 no linear correlation, and +1 a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the slope of the line does not affect r; only its sign does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
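A minimal Python sketch of this definition, which also checks the two invariance properties numerically. The sample values are made up for illustration.

```python
import math


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sample (n - 1) normalization; it cancels in the ratio but mirrors the definition.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return cov / (sx * sy)


x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0, 1.0, 5.0, 6.0, 12.0]

r = pearson(x, y)
# Separate positive changes of location and scale leave r unchanged ...
r_shifted = pearson([3 * a + 10 for a in x], [0.5 * b - 4 for b in y])
# ... and an exact linear relationship gives +1 or -1 whatever the slope.
r_steep = pearson(x, [100 * a + 1 for a in x])
r_negative = pearson(x, [-0.01 * a for a in x])
print(r, r_shifted, r_steep, r_negative)
```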

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative association, 0 no association, and +1 a perfect positive association.

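To calculate τ for two variables X and Y, one counts the concordant pairs (n_c) and discordant pairs (n_d) of observations. τ is then n_c − n_d divided by the total number of pairs, n(n − 1)/2. This is the τ-a form; when ties are present, as in the discrete variables of this dataset, the tie-adjusted τ-b replaces the denominator with a tie-corrected pair count.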
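A brute-force Python sketch of the pair-counting definition. It returns both τ-a and the tie-adjusted τ-b, which divides n_c − n_d by √((n₀ − n₁)(n₀ − n₂)), where n₀ is the number of pairs and n₁, n₂ count the pairs tied in X and in Y. The function name is illustrative, and because it is O(n²) it only suits small examples, not 253,680 rows.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Return (tau_a, tau_b) by counting every pair of observations."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(n), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1   # both variables move in the same direction
        elif dx * dy < 0:
            discordant += 1   # the variables move in opposite directions
    n0 = n * (n - 1) // 2     # total number of pairs
    tau_a = (concordant - discordant) / n0
    tau_b = (concordant - discordant) / math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return tau_a, tau_b


print(kendall_tau([1, 2, 3, 4, 5], [5, 6, 7, 8, 7]))
```

In the example there are 8 concordant pairs, 1 discordant pair and 1 pair tied in Y, so τ-a = 7/10 while τ-b = 7/√90.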

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The standard empirical estimator of Cramér's V has been shown to be biased, even for large samples, so a bias-corrected measure proposed by Bergsma (2013) is used instead.
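The correction can be sketched as follows, using the commonly cited form of Bergsma's bias-corrected estimator: φ² = χ²/n is reduced by (k − 1)(r − 1)/(n − 1) and floored at 0, and the table dimensions r and k are shrunk to r − (r − 1)²/(n − 1) and k − (k − 1)²/(n − 1). This is a sketch under that assumption. The report's own code may differ in details, such as whether a continuity correction is applied to χ², and the function name is illustrative. It also assumes that no row or column of the table sums to zero.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table (list of lists)."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]
    # Pearson chi-squared statistic, without continuity correction.
    chi2 = sum(
        (table[i][j] - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
        for i in range(r)
        for j in range(k)
    )
    phi2 = chi2 / n
    # Remove the expected small-sample bias from phi^2 and from the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


print(cramers_v_corrected([[25, 25], [25, 25]]))  # independent
print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfectly associated
print(cramers_v_corrected([[30, 10], [10, 30]]))  # uncorrected V would be 0.5
```

For the partially associated table the uncorrected V is exactly 0.5, and the corrected value comes out slightly lower, which is the intended effect of the bias correction.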

Phik (φk)

Phik (φk) is a new, practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. It is documented in detail in the phik package documentation.

Missing values

[Figure: a simple visualization of nullity by column]
[Figure: nullity matrix, a data-dense display that lets you quickly pick out patterns in data completion visually]

Sample

First rows

Index Diabetes_binary HighBP HighChol CholCheck BMI Smoker Stroke HeartDiseaseorAttack PhysActivity Fruits Veggies HvyAlcoholConsump AnyHealthcare NoDocbcCost GenHlth MentHlth PhysHlth DiffWalk Sex Age Education Income
0 0.0 1.0 1.0 1.0 40.0 1.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 5.0 18.0 15.0 1.0 0.0 9.0 4.0 3.0
1 0.0 0.0 0.0 0.0 25.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 1.0 3.0 0.0 0.0 0.0 0.0 7.0 6.0 1.0
2 0.0 1.0 1.0 1.0 28.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 1.0 5.0 30.0 30.0 1.0 0.0 9.0 4.0 8.0
3 0.0 1.0 0.0 1.0 27.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 2.0 0.0 0.0 0.0 0.0 11.0 3.0 6.0
4 0.0 1.0 1.0 1.0 24.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 2.0 3.0 0.0 0.0 0.0 11.0 5.0 4.0
5 0.0 1.0 1.0 1.0 25.0 1.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 2.0 0.0 2.0 0.0 1.0 10.0 6.0 8.0
6 0.0 1.0 0.0 1.0 30.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 3.0 0.0 14.0 0.0 0.0 9.0 6.0 7.0
7 0.0 1.0 1.0 1.0 25.0 1.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 3.0 0.0 0.0 1.0 0.0 11.0 4.0 4.0
8 1.0 1.0 1.0 1.0 30.0 1.0 0.0 1.0 0.0 1.0 1.0 0.0 1.0 0.0 5.0 30.0 30.0 1.0 0.0 9.0 5.0 1.0
9 0.0 0.0 0.0 1.0 24.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 2.0 0.0 0.0 0.0 1.0 8.0 4.0 3.0

Last rows

Index Diabetes_binary HighBP HighChol CholCheck BMI Smoker Stroke HeartDiseaseorAttack PhysActivity Fruits Veggies HvyAlcoholConsump AnyHealthcare NoDocbcCost GenHlth MentHlth PhysHlth DiffWalk Sex Age Education Income
253670 1.0 1.0 1.0 1.0 25.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 5.0 15.0 0.0 1.0 0.0 13.0 6.0 4.0
253671 0.0 1.0 1.0 1.0 23.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 1.0 1.0 4.0 0.0 5.0 0.0 1.0 8.0 3.0 2.0
253672 0.0 1.0 0.0 1.0 30.0 1.0 0.0 1.0 1.0 1.0 1.0 0.0 1.0 0.0 3.0 0.0 0.0 0.0 1.0 12.0 2.0 1.0
253673 0.0 1.0 0.0 1.0 42.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 3.0 14.0 4.0 0.0 1.0 3.0 6.0 8.0
253674 0.0 0.0 0.0 1.0 27.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 3.0 6.0 5.0
253675 0.0 1.0 1.0 1.0 45.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0 3.0 0.0 5.0 0.0 1.0 5.0 6.0 7.0
253676 1.0 1.0 1.0 1.0 18.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 4.0 0.0 0.0 1.0 0.0 11.0 2.0 4.0
253677 0.0 0.0 0.0 1.0 28.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 2.0 5.0 2.0
253678 0.0 1.0 0.0 1.0 23.0 0.0 0.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0 3.0 0.0 0.0 0.0 1.0 7.0 5.0 1.0
253679 1.0 1.0 1.0 1.0 25.0 0.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0 0.0 2.0 0.0 0.0 0.0 0.0 9.0 6.0 2.0

Duplicate rows

Most frequently occurring

Index Diabetes_binary HighBP HighChol CholCheck BMI Smoker Stroke HeartDiseaseorAttack PhysActivity Fruits Veggies HvyAlcoholConsump AnyHealthcare NoDocbcCost GenHlth MentHlth PhysHlth DiffWalk Sex Age Education Income # duplicates
645 0.0 0.0 0.0 1.0 21.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 6.0 6.0 8.0 59
1691 0.0 0.0 0.0 1.0 23.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 6.0 6.0 8.0 55
1698 0.0 0.0 0.0 1.0 23.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 7.0 6.0 8.0 53
639 0.0 0.0 0.0 1.0 21.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 5.0 6.0 8.0 52
1097 0.0 0.0 0.0 1.0 22.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 6.0 6.0 8.0 52
1114 0.0 0.0 0.0 1.0 22.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 8.0 6.0 8.0 51
1703 0.0 0.0 0.0 1.0 23.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 8.0 6.0 8.0 50
650 0.0 0.0 0.0 1.0 21.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 7.0 6.0 8.0 47
634 0.0 0.0 0.0 1.0 21.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 4.0 6.0 8.0 44
367 0.0 0.0 0.0 1.0 20.0 0.0 0.0 0.0 1.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 0.0 7.0 6.0 8.0 43
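A table like "Most frequently occurring" can be approximated by counting identical rows. The sketch below assumes that "# duplicates" is the number of times each identical row occurs. The function name `most_frequent_duplicates` and the three-column toy rows are hypothetical stand-ins for the 22-column records.

```python
from collections import Counter


def most_frequent_duplicates(rows, top=10):
    """Return the `top` most frequent rows that occur more than once, with their counts."""
    counts = Counter(tuple(row) for row in rows)
    duplicated = [(row, c) for row, c in counts.items() if c > 1]
    duplicated.sort(key=lambda item: item[1], reverse=True)
    return duplicated[:top]


# Hypothetical toy records (Diabetes_binary, BMI, Age) standing in for full rows.
rows = [
    (0.0, 21.0, 6.0),
    (0.0, 23.0, 6.0),
    (0.0, 21.0, 6.0),
    (1.0, 30.0, 9.0),
    (0.0, 21.0, 6.0),
    (0.0, 23.0, 6.0),
]
for row, n in most_frequent_duplicates(rows):
    print(row, n)
```

Rows that occur only once, such as (1.0, 30.0, 9.0), are left out, so only genuinely repeated records appear in the result.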