Presentation

On this page, I present how to exploit FINESS data for EHPAD (nursing home/retirement houses). Moreover, I present how to create a panel of EHPAD and various summary statistics. We first start with the 2019 data.

Open FINESS data

The FINESS dataset has the universe of all health care providers in France. It provides information on the type of health care (liberal professions, hospitals, etc.) and the location (either the postal adress or the georeference, depending on the dataset).

It is extracted from here. I accessed the database on February 29, 2024.

The way variables are coded changes from a year to another. So far, I open year 2019 (as I match with some census data, only avaible in 2019, in another project).

path = "/Users/mmoglia/Dropbox/research/raw_data/finess"
library(dplyr)
library(sf)
library(foreign)
library(haven)
library(ggplot2)
library(COGugaison)
library(viridis)

I then open the dataset.

  # I. Open the dataset
  
finess = read.csv2(paste0(path,"/raw/etalab_stock_et_20191231.csv")) %>%
  select(starts_with("date"),
         libmft,libcategetab,
         siret,cog,
         nofinesset,nofinessej) %>%
  rename("codgeo"="cog")

ehpad = finess %>% 
  filter(libcategetab=="EHPAD") %>%
  select(-dateautor,-datemaj,-libcategetab)

count(ehpad)

As you notice, the libcategetab directly provides information on EHPAD, and I filter this specific category and remove unwanted variables. It leaves us with 7520 EHPAD in 2019 in France.

Make density map

To visualize the data, we can compute the density of EHPAD per 1,000 inhabitants across commuting zones (I choose CZ because cities are too numerous in France, and we would end up with many NAs).

Open CZ shapefile for 2020

The shapefile is downloaded here.

ze.shp = read_sf(paste0(path,"/../shp/ze2020/fond_ZE2020_geo20.shp")) %>%
  rename("codgeo"="code")

Population in 2019 by age class

From INSEE, we obtain the population by age class (5 years) and by census year. The latest being 2019, we choose this one.

pop = readxl::read_xls(paste0(path,"/../population/pop-sexe-age-quinquennal6819.xls"),
                       sheet="COM_2019",range="A14:AT38228",col_names=T)

# Aggregate age class by 0-20, 20-60, and 60+
pop = pop %>%
  mutate(codgeo=paste0(DR,CR)) %>%
  mutate(young = rowSums(select(.,starts_with("ageq_rec01"), starts_with("ageq_rec02"), starts_with("ageq_rec03"), starts_with("ageq_rec04")))) %>%
  mutate(active = rowSums(select(.,starts_with("ageq_rec05"), starts_with("ageq_rec06"), starts_with("ageq_rec07"), starts_with("ageq_rec08"),
                                 starts_with("age_rec09"),starts_with("age_rec10"),starts_with("age_rec11"),starts_with("age_rec12")))) %>%
  mutate(old = rowSums(select(.,starts_with("ageq_rec13"), starts_with("ageq_rec14"), starts_with("ageq_rec15"), starts_with("ageq_rec16"),
                              starts_with("age_rec17"),starts_with("age_rec18"),starts_with("age_rec19"),starts_with("age_rec20")))) %>%
  mutate(pop=young+active+old) %>%
  select(codgeo,young,active,old,pop) 

# Change the current zipcode to codgeo, and aggregate at the ZE level thanks to COGuaison package
pop = pop %>%
  changement_COG_varNum(annees=c(2019:2020),codgeo_entree = "codgeo",donnees_insee = TRUE,agregation=TRUE) %>%
  nivsupra(codgeo_entree="codgeo",COG=2020,agregation=F,nivsupra="ZE2020") %>%
  group_by(ZE2020_codgeo) %>%
  summarise_at(c("young","active","old","pop"),sum,na.rm=TRUE) %>%
  mutate(year=2020) %>%
  ungroup() %>%
  rename("ze2020"="ZE2020_codgeo")

We now need to merge these population data with the ehpad database. Before that, we aggregate the ehpad data base by city, before merging it to the ze.shp database (which is at the city level but contains the crosswalk between cities and their 2020 ZE).

ehpad.city = ehpad %>%
  group_by(codgeo) %>%
  mutate(count = n())

We can now merge ehpad.city with the shapefile and aggregate at the CZ level!

ehpad.shp = ehpad.city %>% 
  right_join(ze.shp) %>%
  st_as_sf() %>%
  filter(!grepl("^(01|02|03|04|06)", ze2020)) %>%
  group_by(ze2020) %>%
  summarize(geometry = st_union(geometry),
            count = sum(count,na.rm=T)) %>%
  left_join(pop)

Plot

ggplot(ehpad.shp) +
  geom_sf(aes(fill=log(count/old*1000)),color=NA) +
  scale_fill_viridis(option="E",name=" ") + 
  labs(title="Number of Ehpad in 2019",
       subtitle="Log ratio of EHPAD for 1000 inh. aged 60+, ZE level",
       caption="Own computations. Source: FINESS, RP2019") +
  theme_void()+
  theme(plot.title = element_text(hjust = 0.5,family="serif",face="bold"),
        plot.subtitle=element_text(hjust=0.5,family="serif"))