Converging Initiatives: Geospatial Insights into Community Health, Agriculture, and WASH

This notebook presents an analysis of geospatial data from a project focusing on three critical thematic areas: Water, Sanitation, and Hygiene (WASH), health, and agriculture. The primary objective is to identify regions where the project's activities are converging and performing well, as well as areas that require further improvement. By mapping these activities across the thematic areas, the analysis aims to provide insights into the project's overall effectiveness and inform strategic interventions.

This project was part of my work during Cohort 1 of the Analytics for a Better World Fellowship (ABW) in 2022. ABW emphasizes the "art of the feasible," equipping individuals from non-profits with the tools and techniques needed to implement data-driven solutions and guide organizations toward making informed decisions. This experience sparked my passion for geospatial data analysis.

The data used in this analysis is a small sample from a real-world project I worked on in Zambia. The geometry files used for mapping are publicly available and can be accessed through a simple online search.

The implementation outlined below assumes the project focuses on three key activities centered around health facilities: forming nutrition support groups, promoting improved agricultural practices, and drilling new boreholes in health facility catchment communities.

The nutrition support groups aim to educate community members about the importance of proper feeding practices for children under five. Improved agricultural activities help households maintain gardens using climate-smart techniques. The produce from these gardens supplements household nutrition, and any surplus can be sold to generate income, which may be invested in small businesses. Profits from these businesses can then be used to purchase other nutrient-rich foods, such as poultry and dairy.

Finally, the drilling of new boreholes ensures that communities have access to clean drinking water, helping to prevent waterborne diseases like diarrhea, which can result from consuming contaminated water.

In [55]:
# Import the libraries used throughout the notebook
        import pandas as pd
        import numpy as np
        import geopandas as gpd
        import folium
        import folium.plugins as plugins
        import branca.colormap as cm
        
In [56]:
# Import the necessary libraries
        import geopandas as gpd
        
        # Path to the GeoJSON file with Zambia's district boundaries
        # GeoJSON is a JSON-based format for encoding geographic features together with their non-spatial attributes.
        district_geo = "ZMB_adm.json"
        
        # Read the GeoJSON file into a GeoDataFrame so the geographic data can be handled with GeoPandas.
        # If the file is missing or malformed, report the problem and re-raise the error,
        # so that later cells do not fail with a confusing NameError on geoJSON_df.
        try:
            geoJSON_df = gpd.read_file(district_geo)
        except Exception as e:
            print(f"Could not read {district_geo}: {e}")
            raise
        
        # Display the first few rows to verify that the data loaded correctly and to inspect its structure
        geoJSON_df.head()
        
Out[56]:
id ID_0 ISO NAME_0 ID_1 NAME_1 ID_2 NAME_2 TYPE_2 ENGTYPE_2 NL_NAME_2 VARNAME_2 geometry
0 None 255 ZMB Zambia 1 Central 1 Chibombo District District POLYGON ((28.57138 -15.16938, 28.56549 -15.168...
1 None 255 ZMB Zambia 1 Central 2 Kabwe District District POLYGON ((28.16377 -14.61242, 28.17037 -14.608...
2 None 255 ZMB Zambia 1 Central 3 Kapiri Mposhi District District POLYGON ((27.10981 -14.39602, 27.11125 -14.377...
3 None 255 ZMB Zambia 1 Central 4 Mkushi District District POLYGON ((28.81862 -13.61394, 28.83084 -13.596...
4 None 255 ZMB Zambia 1 Central 5 Mumbwa District District POLYGON ((27.76539 -15.63296, 27.75987 -15.633...
In [57]:
# Rename the column containing the name of the district to make it easier to remember
        # This changes the column name from "NAME_2" to "district" for better readability and easier reference in subsequent analyses.
        geoJSON_df = geoJSON_df.rename(columns={"NAME_2": "district"})
        
In [58]:
# Check the first few rows to confirm that the renaming was successful
        geoJSON_df.head()
        
Out[58]:
id ID_0 ISO NAME_0 ID_1 NAME_1 ID_2 district TYPE_2 ENGTYPE_2 NL_NAME_2 VARNAME_2 geometry
0 None 255 ZMB Zambia 1 Central 1 Chibombo District District POLYGON ((28.57138 -15.16938, 28.56549 -15.168...
1 None 255 ZMB Zambia 1 Central 2 Kabwe District District POLYGON ((28.16377 -14.61242, 28.17037 -14.608...
2 None 255 ZMB Zambia 1 Central 3 Kapiri Mposhi District District POLYGON ((27.10981 -14.39602, 27.11125 -14.377...
3 None 255 ZMB Zambia 1 Central 4 Mkushi District District POLYGON ((28.81862 -13.61394, 28.83084 -13.596...
4 None 255 ZMB Zambia 1 Central 5 Mumbwa District District POLYGON ((27.76539 -15.63296, 27.75987 -15.633...
In [59]:
# Import the necessary libraries
        import pandas as pd
        import numpy as np
        
        # Import indicators and health facility data
        # The na_values parameter specifies additional strings to recognize as NA/NaN.
        # The delimiter is set to ',' as the CSV files are comma-separated.
        indicators_df = pd.read_csv("indicators.csv", na_values="NA", delimiter=',', header=0, index_col=False)
        df_facilities = pd.read_csv("catchment_areas.csv", na_values="NA", delimiter=',', header=0, index_col=False)
        
        # Replace all zero values with NaN to ensure they do not affect average computations
        # This is useful for handling cases where zero values may represent missing data rather than actual zero counts.
        # Column abbreviations:
        # msg_groups - Mother Support Groups (number)
        # improved_techs - Improved Agricultural Technologies (number of households practicing)
        # new_boreholes - New Boreholes (number)
        indicators_df["msg_groups"] = indicators_df["msg_groups"].replace(0, np.nan)
        indicators_df["improved_techs"] = indicators_df["improved_techs"].replace(0, np.nan)
        indicators_df["new_boreholes"] = indicators_df["new_boreholes"].replace(0, np.nan)
        
        # Display summary statistics of the DataFrame
        # This provides an overview of the central tendency, dispersion, and shape of the dataset’s distribution.
        indicators_df.describe()
        
Out[59]:
msg_groups improved_techs new_boreholes latitude longitude
count 13.000000 13.000000 11.000000 13.000000 13.000000
mean 3205.769231 1450.000000 8.727273 -11.948680 29.067516
std 3639.478400 916.518139 6.724447 2.226680 1.358950
min 393.000000 36.000000 1.000000 -14.996166 26.605040
25% 849.000000 947.000000 3.000000 -14.134750 28.258268
50% 1265.000000 1091.000000 8.000000 -11.478774 28.777910
75% 5341.000000 1822.000000 13.000000 -10.439146 29.927670
max 12804.000000 3664.000000 23.000000 -8.806687 31.683861
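
The effect of replacing zeros with NaN can be seen on a small example. The sketch below uses made-up toy values (not taken from indicators.csv) to show how the mean and the count change once zeros are treated as missing. In the summary above, new_boreholes has a count of 11 against 13 for the other columns, which is the same effect.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values, for illustration only
toy = pd.DataFrame({"new_boreholes": [4, 0, 8, 0]})

# Zeros count as real observations and pull the mean down
mean_with_zeros = toy["new_boreholes"].mean()

# After replacement, NaN values are skipped by mean() and count()
cleaned = toy["new_boreholes"].replace(0, np.nan)
mean_without_zeros = cleaned.mean()
reported_rows = cleaned.count()

print(mean_with_zeros, mean_without_zeros, reported_rows)
```

Whether a zero means "nothing was done" or "nothing was reported" depends on how the data was collected, so this replacement is a modelling choice rather than a neutral cleaning step.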
In [60]:
indicators_df.head()
        
Out[60]:
district msg_groups improved_techs new_boreholes latitude longitude
0 Chibombo 6560 2450 1.0 -14.834949 28.036740
1 Kabwe 849 36 23.0 -14.470852 28.352683
2 Kapiri Mposhi 5351 1091 15.0 -14.134750 28.097150
3 Kaputa 393 1822 NaN -8.806687 29.928590
4 Kasama 1265 2113 13.0 -10.439146 30.974062
In [61]:
# Rename columns in the df_facilities DataFrame
        # This changes the column names for better readability and consistency.
        df_facilities = df_facilities.rename(columns={
            "Country": "country",
            "District": "district",
            "Province": "province"
        })
        
        # Display the first 10 rows of the DataFrame to verify the renaming
        df_facilities.head(10)
        
Out[61]:
code name id country province district longitude latitude district_populations new_boreholes mother_support_groups community_gardens
0 kasama_army Army Clinic RzLkbnx9fyD Zambia Northern Kasama 31.183730 -10.20515 306462 0.0 37.0 0.0
1 bruneli Bruneli Health Post EbzBDFuHPvM Zambia Central Kabwe 28.632452 -14.36700 234055 2.0 12.0 0.0
2 bulambo Bulambo Health Post fmWhauKbHg2 Zambia Northern Luwingu 30.089080 -10.96715 179554 0.0 6.0 NaN
3 bulangililo Bulangililo Urban Health Centre SIES0o5kucs Zambia Copperbelt Kitwe 28.246390 -12.77889 738320 0.0 1020.0 0.0
4 bulungu Bulungu/Mumbwa Health Centre VRSnCaCXNxW Zambia Central Mumbwa 27.064960 -14.98257 242480 0.0 304.0 14.0
5 buntungwa Buntungwa Urban Health Centre jC3fHhoSEuZ Zambia Luapula Mansa 28.880730 -11.22584 253414 NaN 161.0 NaN
6 bwacha Bwacha Urban Health Centre frjoZHqGJ9e Zambia Central Kabwe 28.440660 -14.40797 234055 NaN 184.0 NaN
7 mansa_central Central Urban Health Centre k43pfm7F9YJ Zambia Luapula Mansa 28.949570 -10.94079 253414 NaN 127.0 NaN
8 chabilikila Chabilikila Rural Health Centre cLEnAtHGNM9 Zambia Luapula Nchelenge 28.706110 -9.54308 203432 0.0 48.0 74.0
9 chalele Chalele Health Facility gdXqK7FzYoK Zambia Northern Mbala 31.379060 -9.26570 268774 0.0 6.0 0.0
In [62]:
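# Join the indicators to the district boundaries on the shared "district" column.
        # The default inner join keeps only districts that appear in both tables.
        # Calling merge on the GeoDataFrame keeps the result a GeoDataFrame.
        geoData = geoJSON_df.merge(indicators_df, on="district")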
        
In [63]:
# Display the first 5 rows of the merged DataFrame to verify the join
        geoData.head()
        
Out[63]:
id ID_0 ISO NAME_0 ID_1 NAME_1 ID_2 district TYPE_2 ENGTYPE_2 NL_NAME_2 VARNAME_2 geometry msg_groups improved_techs new_boreholes latitude longitude
0 None 255 ZMB Zambia 1 Central 1 Chibombo District District POLYGON ((28.57138 -15.16938, 28.56549 -15.168... 6560 2450 1.0 -14.834949 28.036740
1 None 255 ZMB Zambia 1 Central 2 Kabwe District District POLYGON ((28.16377 -14.61242, 28.17037 -14.608... 849 36 23.0 -14.470852 28.352683
2 None 255 ZMB Zambia 1 Central 3 Kapiri Mposhi District District POLYGON ((27.10981 -14.39602, 27.11125 -14.377... 5351 1091 15.0 -14.134750 28.097150
3 None 255 ZMB Zambia 1 Central 5 Mumbwa District District POLYGON ((27.76539 -15.63296, 27.75987 -15.633... 5341 947 8.0 -14.996166 26.605040
4 None 255 ZMB Zambia 2 Copperbelt 10 Kitwe District District POLYGON ((28.48323 -12.75668, 28.48079 -12.754... 12804 1580 NaN -12.789445 28.258268
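
Because the join is an inner join, districts without a match are dropped silently. For example, Mkushi appears in the boundary file but not in the merged output above. A minimal sketch of how unmatched districts could be listed, using small stand-in tables (the frames `geo_names` and `indicators` are illustrative, not the notebook's data):

```python
import pandas as pd

# Stand-in tables for illustration; only the "district" key matters here
geo_names = pd.DataFrame({"district": ["Chibombo", "Kabwe", "Mkushi"]})
indicators = pd.DataFrame({
    "district": ["Chibombo", "Kabwe", "Kaputa"],
    "msg_groups": [6560, 849, 393],
})

# An outer join with indicator=True adds a "_merge" column that records
# whether each row came from the left table, the right table, or both
check = geo_names.merge(indicators, on="district", how="outer", indicator=True)

only_in_geo = check.loc[check["_merge"] == "left_only", "district"].tolist()
only_in_indicators = check.loc[check["_merge"] == "right_only", "district"].tolist()

print("No indicators:", only_in_geo)
print("No boundary:", only_in_indicators)
```

Districts listed under "No boundary" are the ones to look at first, since they usually point to a spelling difference between the two sources.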
In [64]:
# Extract latitude and longitude columns from df_facilities
        # This creates a DataFrame containing only the latitude and longitude columns.
        locations = df_facilities[['latitude', 'longitude']]
        
        # Convert the DataFrame to a list of lists
        # Each inner list represents a location with latitude and longitude values.
        facility_locationlist = locations.values.tolist()
        
        # Display the length of the list of locations
        # This shows the total number of locations in the list.
        print(len(facility_locationlist))
        
        # Access and display the 8th entry in the list (index 7, as indexing starts at 0)
        # This shows the latitude and longitude of the 8th facility.
        print(facility_locationlist[7])
        
411
        [-10.94079, 28.94957]
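
Folium markers expect coordinates as [latitude, longitude], whereas the GeoJSON geometries shown earlier store longitude first. A quick range check can catch swapped or missing coordinates before they reach the map. The sketch below assumes an approximate bounding box for Zambia and uses three made-up rows; both are illustrative.

```python
import pandas as pd

# Made-up rows: one valid, one with a missing latitude, one with swapped values
facilities = pd.DataFrame({
    "name": ["A", "B", "C"],
    "latitude": [-10.94079, None, 28.94957],
    "longitude": [28.94957, 30.0, -10.94079],
})

# Approximate bounding box for Zambia (an assumption for illustration)
LAT_RANGE = (-18.5, -8.0)
LON_RANGE = (21.5, 34.0)

# between() returns False for NaN, so missing coordinates are excluded too
valid = (
    facilities["latitude"].between(*LAT_RANGE)
    & facilities["longitude"].between(*LON_RANGE)
)

clean_locations = facilities.loc[valid, ["latitude", "longitude"]].values.tolist()
print(clean_locations)
```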
        
In [65]:
# Filter df_facilities to include only rows where 'new_boreholes' is greater than or equal to 1
        # This creates a new DataFrame df_boreholes that contains facilities with at least one new borehole.
        df_boreholes = df_facilities[df_facilities['new_boreholes'] >= 1]
        
        # Inspect the resulting DataFrame
        df_boreholes.head()
        
Out[65]:
code name id country province district longitude latitude district_populations new_boreholes mother_support_groups community_gardens
1 bruneli Bruneli Health Post EbzBDFuHPvM Zambia Central Kabwe 28.632452 -14.367000 234055 2.0 12.0 0.0
13 chandamukulu Chandamukulu Rural Health Centre aaSlxWUsCC7 Zambia Northern Kasama 31.126944 -10.820000 306462 1.0 7.0 NaN
15 chankalamu Chankalamu Health Post SFjORf1oNL9 Zambia Central Kabwe 28.446314 -14.541684 234055 3.0 1.0 NaN
16 chankomo Chankomo Rural Health Centre Uq7cPxwf4aU Zambia Central Kapiri-Mposhi 29.026560 -13.905630 301722 1.0 122.0 NaN
40 chindwin_camp Chindwin Camp Urban Health Centre o5Hz9Axrf2H Zambia Central Kabwe 28.615370 -14.336400 234055 4.0 4.0 NaN
In [66]:
# how many facilities meet the condition
        len(df_boreholes)
        
Out[66]:
48
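
Note that the facility table spells one district "Kapiri-Mposhi", while the boundary file uses "Kapiri Mposhi". A join or group-by on the district name treats these as different districts. One possible fix is to normalize the names first. The helper `normalize_district` below is hypothetical and assumes hyphens and spaces are interchangeable in district names, which should be checked against the full list of districts.

```python
import pandas as pd

def normalize_district(name: str) -> str:
    """Hypothetical helper: replace hyphens with spaces and collapse whitespace."""
    return " ".join(name.replace("-", " ").split())

# Made-up rows showing the same district under different spellings
facilities = pd.DataFrame({
    "district": ["Kapiri-Mposhi", "Kabwe", " Kapiri  Mposhi"],
    "new_boreholes": [1.0, 2.0, 3.0],
})

facilities["district"] = facilities["district"].map(normalize_district)

# After normalization both spellings fall into a single group
totals = facilities.groupby("district")["new_boreholes"].sum()
print(totals)
```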
In [67]:
def scaled_feature(i, column, mx=100, mn=10, data=indicators_df):
            """
            Scale one value of a DataFrame column to the range [mn, mx] using min-max scaling.
        
            Parameters:
                i (int): Positional index of the row whose value is scaled.
                column (str): Name of the column to scale.
                mx (float): Upper bound of the scaled range (default 100).
                mn (float): Lower bound of the scaled range (default 10).
                data (pd.DataFrame): DataFrame containing the data (default indicators_df).
        
            Returns:
                float: The scaled value.
            """
            # Extract the value from the specified row and column
            value = data.iloc[i][column]
        
            # Calculate the minimum and maximum of the column
            d_max, d_min = data[column].max(), data[column].min()
        
            # Guard against division by zero when every value in the column is identical
            if d_max == d_min:
                raise ValueError(f"Column '{column}' has the same min and max values. Scaling cannot be performed.")
        
            # Scale the value to the range [0, 1]
            scaled = (value - d_min) / (d_max - d_min)
        
            # Rescale from [0, 1] to the requested range [mn, mx]
            return float(scaled * (mx - mn) + mn)
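
The same min-max scaling can also be applied to a whole column at once instead of one row at a time. The sketch below is an alternative formulation, not the function used in this notebook. The name `scale_column` is illustrative, and the five values are the improved_techs figures from the first rows of the indicator table, so the resulting radii differ from those computed over all 13 districts.

```python
import pandas as pd

def scale_column(values: pd.Series, mn: float = 10, mx: float = 100) -> pd.Series:
    """Min-max scale every value of a Series to the range [mn, mx]."""
    d_min, d_max = values.min(), values.max()
    if d_max == d_min:
        raise ValueError("All values are identical, so scaling is undefined.")
    # Map to [0, 1] first, then stretch to [mn, mx]
    return (values - d_min) / (d_max - d_min) * (mx - mn) + mn

# improved_techs for the first five districts shown earlier
techs = pd.Series([2450, 36, 1091, 1822, 2113], name="improved_techs")

# Marker radii between 5 and 20 pixels
radii = scale_column(techs, mn=5, mx=20)
print(radii.round(2).tolist())
```

The smallest value maps to `mn` and the largest to `mx`, so a single very large district compresses the radii of all the others.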
        

Create a choropleth map using Folium to visualize the number of Nutrition Support Groups across different districts in Zambia, including markers for improved agricultural technologies, new boreholes, and health facilities.

In [68]:
import geopandas as gpd
        import folium
        import folium.plugins as plugins
        import branca.colormap as cm
        import pandas as pd
        
        # Ensure the district boundaries use WGS 84 (EPSG:4326), the CRS Folium expects
        geoJSON_df = geoJSON_df.set_crs("EPSG:4326", allow_override=True)
        
        # Aggregate data to get total number of boreholes and facilities per district
        boreholes_per_district = df_boreholes.groupby('district')['new_boreholes'].sum().reset_index()
        facilities_per_district = df_facilities.groupby('district')['name'].count().reset_index()
        community_gardens_per_district = df_facilities.groupby('district')['community_gardens'].sum().reset_index()
        facilities_per_district.rename(columns={"name": "total_facilities"}, inplace=True)
        
        # Merge aggregated data with geoJSON_df
        geoJSON_df = geoJSON_df.merge(indicators_df[['district', 'msg_groups']], on='district', how='left')
        geoJSON_df = geoJSON_df.merge(boreholes_per_district, on='district', how='left')
        geoJSON_df = geoJSON_df.merge(community_gardens_per_district, on='district', how='left')
        geoJSON_df = geoJSON_df.merge(facilities_per_district, on='district', how='left')
        
        # Create the Folium map centered on Zambia
        m = folium.Map(location=[-13.1, 27], zoom_start=6.3, width='100%', height='100%', control_scale=True, tiles='CartoDB Positron')
        
        # Choropleth layer with hover functionality
        choropleth = folium.Choropleth(
            geo_data=geoJSON_df,
            data=indicators_df,
            columns=['district', 'msg_groups'],
            key_on="feature.properties.district",
            fill_color="YlGnBu",
            fill_opacity=0.7,
            line_opacity=0.2,
            bins=5,
            legend_name="# of Nutrition Support Groups",
            name="Nutrition Support Groups Density",
            highlight=True
        ).add_to(m)
        
        # Add hover functionality with GeoJsonTooltip
        folium.GeoJson(
            geoJSON_df,
            style_function=lambda feature: {
                'fillColor': '#ffffff00',
                'color': '#000000',
                'weight': 0.1,
                'dashArray': '5, 5',
                'fillOpacity': 0,
            },
            tooltip=folium.GeoJsonTooltip(
                fields=['district', 'total_facilities', 'msg_groups', 'new_boreholes', 'community_gardens'],
                aliases=['District: ',  'Health Facilities: ', 'Nutrition Support Groups: ', 'New Boreholes: ','Community Gardens: '],
                localize=True,
                sticky=False,
                labels=True,
                style="""
                    background-color: #F0EFEF;
                    border: 1px solid black;
                    border-radius: 3px;
                    box-shadow: 3px;
                """,
                max_width=300,
            )
        ).add_to(choropleth)
        
        # Improved Agricultural Technologies Layer
        group0 = folium.FeatureGroup(name='<span style="color: #007580;">Improved Agricultural Technologies</span>')
        for i in range(len(indicators_df)):
            folium.CircleMarker(
                location=[indicators_df.iloc[i]['latitude'], indicators_df.iloc[i]['longitude']],
                popup="Improved Agricultural Technologies " + str(indicators_df.iloc[i]['district']) + ' ' + str(indicators_df.iloc[i]['improved_techs']),
                radius=scaled_feature(i, 'improved_techs', mn=5, mx=20),
                color='#007580',
                fill=True,
                fill_color='#007580'
            ).add_to(group0)
        m.add_child(group0)
        
        # New Boreholes Layer
        # Circles are collected in their own FeatureGroup so the layer can be toggled in the layer control.
        # Values above vmax are drawn in the last colour of the scale.
        colormap = cm.LinearColormap(colors=['orange', 'blue', 'red'], vmin=0, vmax=5)
        colormap.caption = '# of Newly Installed Boreholes'
        group1 = folium.FeatureGroup(name='New Boreholes')
        for i in range(len(df_boreholes)):
            folium.Circle(
                location=[df_boreholes.iloc[i]['latitude'], df_boreholes.iloc[i]['longitude']],
                radius=20,
                fill=True,
                color=colormap(df_boreholes.iloc[i]['new_boreholes']),
                popup="New Boreholes " + str(df_boreholes.iloc[i]['new_boreholes']),
                fill_opacity=0.5
            ).add_to(group1)
        m.add_child(group1)
        m.add_child(colormap)
        
        # Health Facilities Layer with Hover Functionality
        group2 = folium.FeatureGroup(name='Health Facilities')
        marker_cluster = folium.plugins.MarkerCluster().add_to(group2)
        
        for point in range(len(facility_locationlist)):
            folium.Marker(
                location=facility_locationlist[point],
                popup=folium.Popup(df_facilities['name'][point], max_width=200),
                tooltip=folium.Tooltip(f"Facility: {df_facilities['name'][point]}")
            ).add_to(marker_cluster)
        
        group2.add_to(m)
        
        # Add layer control
        folium.LayerControl().add_to(m)
        
        # Display the map
        m
        
Out[68]:
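[Interactive Folium map: choropleth of Nutrition Support Groups by district, with layers for improved agricultural technologies, new boreholes, and health facilities]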
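
One detail of the per-district aggregation in the map cell is worth checking: by default, a grouped sum over values that are all NaN returns 0, so a district with no reported community gardens appears in the tooltip as having zero gardens. A minimal sketch of the difference, using made-up rows and the `min_count` argument of pandas' grouped `sum`:

```python
import numpy as np
import pandas as pd

# Made-up rows: one district with only missing values, one with reported values
facilities = pd.DataFrame({
    "district": ["Mansa", "Mansa", "Mumbwa", "Mumbwa"],
    "community_gardens": [np.nan, np.nan, 14.0, 0.0],
})

# Default behaviour: an all-NaN group sums to 0.0
default_sum = facilities.groupby("district")["community_gardens"].sum()

# min_count=1 requires at least one non-missing value, otherwise the result is NaN
strict_sum = facilities.groupby("district")["community_gardens"].sum(min_count=1)

print(default_sum)
print(strict_sum)
```

With `min_count=1`, "not reported" and "zero" remain distinguishable on the map, which matches the earlier decision to treat zeros in the indicator table as missing.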

Summary

The map visualizes health, agriculture, and WASH indicators across 13 districts in Zambia, providing insight into the distribution of the project's interventions. The base layer is a choropleth map that color-codes districts by the number of Nutrition Support Groups, with darker shades indicating higher numbers. Central and Copperbelt provinces have the highest concentration of support groups, led by Kitwe district with 12,804, the maximum in the indicator table.

Clustered markers show a higher density of health facilities in Northern and Central provinces, while circle markers represent the other key indicators: orange and blue circles for newly installed boreholes, predominantly in Northern Zambia, and teal circles, sized by the number of households practicing them, for the adoption of improved agricultural technologies.

The Northern and Luapula provinces show significant activity across all three sectors, suggesting faster convergence than in Copperbelt and Central provinces. To achieve uniform intervention coverage, the implementation strategy should be revisited and adjusted to promote equitable progress across all districts.
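
Convergence is judged visually in this notebook. One way it could be made explicit is a simple per-district score that counts how many of the three indicators sit at or above the sample median. The sketch below is an illustration under that assumption, not a method used in the project. It uses the five districts shown in the first rows of the indicator table, and the column name `convergence_score` is hypothetical.

```python
import numpy as np
import pandas as pd

# First five rows of the indicator table shown earlier
sample = pd.DataFrame({
    "district": ["Chibombo", "Kabwe", "Kapiri Mposhi", "Kaputa", "Kasama"],
    "msg_groups": [6560, 849, 5351, 393, 1265],
    "improved_techs": [2450, 36, 1091, 1822, 2113],
    "new_boreholes": [1.0, 23.0, 15.0, np.nan, 13.0],
})

indicator_cols = ["msg_groups", "improved_techs", "new_boreholes"]

# Compare every column with its own median; NaN compares as False
at_or_above_median = sample[indicator_cols].ge(sample[indicator_cols].median())

# Score from 0 to 3: number of indicators at or above the median
sample["convergence_score"] = at_or_above_median.sum(axis=1)

print(sample[["district", "convergence_score"]])
```

A score of 3 would mark a district where all three activities are comparatively strong. The median threshold is arbitrary, and a per-capita comparison using district populations may be more appropriate for districts of very different sizes.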

Conclusion

This GeoPandas project visualized the spatial distribution of key health, agriculture, and WASH indicators across 13 districts in Zambia. The map revealed significant regional disparities, with Northern and Luapula provinces showing higher levels of intervention across multiple sectors. This spatial analysis highlights the need for targeted strategies to achieve uniform development across all districts, ensuring that no region is left behind in the pursuit of convergence and sustainable growth.