Medical Cost Personal Dataset
Tabular data. License: Database Contents License.
This dataset contains demographic and personal health information for individuals, along with the medical insurance charges billed to them. It is commonly used to build predictive models of insurance costs and to explore how factors such as age, BMI, smoking status, and region relate to medical expenses.
Features:
age: Age of the primary beneficiary (integer)
sex: Gender of the individual (male, female)
bmi: Body mass index, a measure of body fat based on height and weight (float)
children: Number of children/dependents covered by the insurance (integer)
smoker: Smoking status of the individual (yes, no)
region: Residential area in the US (northeast, northwest, southeast, southwest)
charges: Individual medical costs billed by health insurance (float, in USD)
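As a rough illustration of the schema above, the following sketch parses CSV rows into typed records and checks the categorical columns against the listed values. It assumes the data is stored as CSV with a header row named after the features; the `Record` class, the helper names, and the sample rows are hypothetical and are not taken from the real dataset.

```python
import csv
import io
from dataclasses import dataclass

# Allowed categorical values, taken from the feature list.
SEXES = {"male", "female"}
SMOKER_VALUES = {"yes", "no"}
REGIONS = {"northeast", "northwest", "southeast", "southwest"}


@dataclass
class Record:
    """One row of the dataset (hypothetical container for illustration)."""
    age: int
    sex: str
    bmi: float
    children: int
    smoker: str
    region: str
    charges: float


def parse_row(row):
    """Convert one CSV row (all strings) into a typed Record and validate categories."""
    record = Record(
        age=int(row["age"]),
        sex=row["sex"].strip().lower(),
        bmi=float(row["bmi"]),
        children=int(row["children"]),
        smoker=row["smoker"].strip().lower(),
        region=row["region"].strip().lower(),
        charges=float(row["charges"]),
    )
    if record.sex not in SEXES:
        raise ValueError(f"unexpected sex: {record.sex!r}")
    if record.smoker not in SMOKER_VALUES:
        raise ValueError(f"unexpected smoker value: {record.smoker!r}")
    if record.region not in REGIONS:
        raise ValueError(f"unexpected region: {record.region!r}")
    return record


def load_records(handle):
    """Read every row from an open CSV file handle."""
    return [parse_row(row) for row in csv.DictReader(handle)]


# Invented rows for illustration only; these are not real dataset values.
SAMPLE = """age,sex,bmi,children,smoker,region,charges
30,female,25.0,1,no,northeast,4000.0
45,male,31.5,2,yes,southeast,30000.0
"""

records = load_records(io.StringIO(SAMPLE))
```

To read an actual file, pass an open file handle to `load_records` in place of the in-memory sample.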
Applications:
This dataset is frequently used for regression modeling, cost prediction, and data visualization. It is well suited to learning how lifestyle and demographic factors affect healthcare expenses, and it serves as a foundational dataset for applied machine learning in health economics.
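A minimal sketch of the regression use case, assuming a single predictor (age) and ordinary least squares in closed form. The `fit_line` helper and the data points are invented for illustration; a real analysis would normally use several features and a statistics or machine learning library.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented (age, charges) pairs that lie exactly on a line; not real dataset values.
ages = [20, 30, 40, 50]
charges = [5000.0, 7000.0, 9000.0, 11000.0]

intercept, slope = fit_line(ages, charges)

# Predicted charges for a hypothetical 35-year-old under this toy model.
predicted = intercept + slope * 35
```

Because the toy points are perfectly linear, the fit recovers them exactly. Real charges vary widely around any such line, so a single-predictor model like this is only a starting point.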