We regularly collect and process over 120 county-level variables from a variety of federal and state agencies. Learn more about our process below.
Data Collection
We collect publicly available datasets released by agencies such as the US Census Bureau, the Centers for Disease Control and Prevention, and the Texas Department of State Health Services. Many of the variables we collect are drawn from the Agency for Healthcare Research and Quality's Social Determinants of Health Database, which compiles data from a variety of federal sources. When working with data compiled in this way, we attempt where possible to provide the original source and date of each variable.
When presenting data from the US Census Bureau's American Community Survey, we use five-year estimates because many Texas counties do not have sufficient populations to be included in the Survey's one-year estimates and because our county profile data is intended to present a broad overview rather than to capture temporal change. When reporting the year these variables were updated, we give the last year of the collection period.
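As a minimal sketch of the year convention described above, the function below maps a five-year collection period to the year we would report. The function name and the "YYYY-YYYY" period format are assumptions made for illustration; they are not taken from our processing code.

```python
def reported_year(period: str) -> int:
    """Return the last year of a five-year collection period such as '2017-2021'."""
    start_text, end_text = period.split("-")
    start, end = int(start_text), int(end_text)
    # A five-year period covers five calendar years, so end - start is 4.
    if end - start != 4:
        raise ValueError(f"not a five-year period: {period!r}")
    return end


# A period written as 2017-2021 would be reported as 2021.
example_year = reported_year("2017-2021")
```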
We update our dataset periodically and make no guarantee that it represents the most recently released data. Additionally, users should note that the most recently released data from federal and state agencies often lags the present by several years. Our metadata file includes the year that each variable was last updated.
Data Processing
After organizing and preserving original data files, we perform necessary processing and merge the variables into our main dataset. Typically, the only processing that occurs at this point is the conversion of counts to rates. When calculating rates, we use appropriate historical population data from the US Census Bureau. We have chosen not to release counts in order to keep the dataset from becoming unwieldy; however, we can often provide them on request.
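The count-to-rate conversion can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual workflow: the per-100,000 scale, the function name, the lookup structure, and every number are hypothetical. The point it shows is that the population denominator is taken from the same year as the count.

```python
# Hypothetical population lookup keyed by (county FIPS code, year).
# All values are made up for illustration; they are not real data.
population = {
    ("48001", 2019): 50_000,
    ("48001", 2020): 40_000,
}


def rate_per(count, fips, year, per=100_000):
    """Convert a count to a rate using the population for the same year.

    Returns None when the count is missing or no population is available.
    """
    pop = population.get((fips, year))
    if count is None or not pop:
        return None
    return count * per / pop


rate_2019 = rate_per(25, "48001", 2019)    # uses the 2019 population
rate_2020 = rate_per(25, "48001", 2020)    # uses the 2020 population
missing = rate_per(None, "48001", 2020)    # a missing count stays missing
```

Using the population from the matching year keeps a rate from shifting simply because a newer population estimate was substituted for the denominator.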
Public data, especially data related to behaviors and health, is often suppressed at the agency level when its release might lead to the identification of individuals. In our dataset, missing values are represented by blanks.
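For users reading the dataset programmatically, the sketch below shows one way to treat blank cells as missing values rather than zeros. It assumes a CSV layout; the column names and values are hypothetical and stand in for whatever the real file contains.

```python
import csv
import io


def parse_value(text):
    """Return a float for a populated cell and None for a blank one."""
    text = text.strip()
    return float(text) if text else None


# Hypothetical two-row sample; the second county has a blank (missing) value.
sample = "fips,hypothetical_rate\n48001,12.5\n48003,\n"

rows = [
    {"fips": row["fips"], "hypothetical_rate": parse_value(row["hypothetical_rate"])}
    for row in csv.DictReader(io.StringIO(sample))
]

# Keep only counties with a value before summarizing or charting.
available = [row for row in rows if row["hypothetical_rate"] is not None]
```

Keeping blanks as None, instead of coercing them to 0, avoids presenting a missing value as a measured rate of zero.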
During this process, we also produce longitudinal files for some variables. While we intend to continue updating these files, we make no guarantees about their continued provision or maintenance.
Our full processing workflow is available in our GitHub repository.
Visualization and Publication
We produce several kinds of visualizations that are intended for use on this site and also for republication:
- Visualizations that appear on our county profile pages. These visualizations provide broad context on public health topics and are updated periodically to reflect the most recent data.
- Additional visualizations that are available through our visualization library. These visualizations are typically available for every county in Texas.
- Visualizations that are produced for our reports or as part of our work with partner organizations.
All published visualizations include the data source and the year the data was last updated.
For more information on using our datasets and visualizations, see our usage guide and use agreements.