Compound Group Population Project

 

Summary

Authoritative scientific bodies have identified hundreds of groups of chemicals that share a common chemical structure - such as containing the element lead or a methylmercury cation - and that these bodies have determined are all associated with serious health hazards. These groups are an important part of hazard list screening in tools like Pharos, allowing hazards to be associated with substances that haven’t been individually listed on the authoritative hazard lists.

The challenge is that there is no authoritative listing of the tens of thousands of hazardous chemicals included in those groups. In Pharos we’re building the tools to solve this problem, and we’d like the help of chemists, toxicologists, and data scientists like yourselves to complete the job. The vision is to develop a transparent, scientific, peer-reviewed methodology for populating members of compound groups, to be broadly used by researchers and programs in hazard screening, with a public repository for definitions and compound group members.

Problem

Compound Groups used to identify hazardous substances are not well defined. Pharos relies heavily on authoritative hazard listings to identify associations between substances and human and environmental health hazards, such as cancer and aquatic toxicity. Authoritative bodies of scientists - primarily under the auspices of national and international governmental agencies, such as the US EPA and the World Health Organization - review the science and develop a consensus on these important listings, resulting in lists of carcinogens, reproductive toxicants, aquatic toxicants and others.  

In some cases, these bodies will list a series of individual compounds associated with these hazards. In other cases, however, the bodies identify whole classes of related substances. The evidence may be compelling that any compound that contains a certain element (such as lead) or is based upon a particular structure (such as lead bound to carbon chains) is likely to have the same hazard. Pharos includes over 600 of these groups. 

Rarely do these agencies attempt to list all of the substances that are members of these hazardous groups, as some may contain thousands of chemicals.  The agencies generally leave it to manufacturers or the public to check if the chemicals they use have these characteristics. This might be workable with a handful of groups, but it has become overwhelming as the number of groups has grown into the hundreds.

Now that list-based hazard screening - screening chemicals against the authoritative lists to identify potential hazards - is automated in tools such as Pharos and the HPD Builder, it is particularly important to fully populate these groups with individual members. Until the chemicals in each group are identified and listed, they can find their way into our products without anyone being aware of their hazards. Furthermore, without a commonly agreed-upon listing of the chemicals in each group, different screening tools will come up with different results.

Solution

HBN coordinates the Compound Group Population Project to identify the chemicals that should be included in groups used for hazard screening. Together we can help manufacturers and consumers avoid hazards they may have otherwise missed. Help us close this huge gap in list screening. 

We are tackling this problem through the following steps:

  1. Establish definitions of groups

  2. Develop search algorithms to apply to chemical structure databases to identify members of the groups.

  3. Populate lists of substances that are members of each group through use of these definitions and algorithm-driven searches.

  4. Establish a public registry of the group definitions and algorithms to allow others to replicate (and test) this work

  5. Establish a public registry of the individual group members

  6. Use an open, collaborative peer review process to improve these definitions and algorithms, establish credibility and build buy-in.

  7. Publish these definitions and algorithms as an open standard

  8. Encourage use of the open standard - these definitions and algorithms - by tool developers and list publishers to increase consistency.

  9. Update the list regularly
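
As a rough illustration of steps 1 through 3, the sketch below treats a group definition as a simple rule over molecular formulas and applies it to a small candidate list. It is a simplified stand-in, not the Project's actual algorithm: the real searches are structure based and run against databases such as PubChem. The names `elements_in`, `GROUP_DEFINITIONS`, and `populate_groups`, the two group labels, and the sample candidates are all hypothetical.

```python
import re

# One element symbol: an uppercase letter optionally followed by a lowercase one.
ELEMENT_TOKEN = re.compile(r"[A-Z][a-z]?")


def elements_in(formula):
    """Return the set of element symbols found in a simple molecular formula."""
    return set(ELEMENT_TOKEN.findall(formula))


# Step 1 (hypothetical definitions): each group requires certain elements.
GROUP_DEFINITIONS = {
    "Lead compounds": {"Pb"},
    "Mercury compounds": {"Hg"},
}


def populate_groups(candidates, definitions):
    """Steps 2-3: assign each candidate to every group whose elements it contains."""
    members = {group: [] for group in definitions}
    for name, formula in candidates.items():
        present = elements_in(formula)
        for group, required in definitions.items():
            if required <= present:  # all required elements are present
                members[group].append(name)
    return members


# Illustrative candidates only; a real run would query a structure database.
CANDIDATES = {
    "tetraethyllead": "C8H20Pb",
    "lead(II) oxide": "PbO",
    "methylmercury chloride": "CH3ClHg",
    "water": "H2O",
}

GROUP_MEMBERS = populate_groups(CANDIDATES, GROUP_DEFINITIONS)
```

An element test like this only covers groups defined by the presence of an element. Groups defined by a particular structure, such as lead bound to carbon chains, would need a substructure search in a cheminformatics tool instead of a formula check.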

Scope

We are using Pharos as the registry for these definitions and algorithms and to facilitate collaboration. We are developing structure-based algorithms and searching PubChem and other structural databases for group members that are then added to the compound group lists in Pharos. These compound groups define how the list screening process generates hazard listings for Pharos and the other tools that use its data, including the HPDC’s HPD Builder and BlueGreen’s ChemHat.

To date, the Project has used PubChem and ChemIDplus searches to add compound group associations to existing Pharos substances as well as to find new substances with matching structures that were not previously listed in Pharos. We are exploring other chemical databases that may provide additional structural search options.

We generally limit the scope of database searches to substances with a CAS Registry Number (CASRN). The assumption is that excluding substances without a CASRN keeps out large numbers of substances, at least in PubChem, that are experimental or pharmaceutical only and are unlikely to be used outside the pharmaceutical industry. Adding them would only burden the database without improving its function.
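
A CASRN filter of this kind might look like the sketch below, which keeps only search results that carry a well-formed CASRN. The check-digit rule is the standard one for CAS Registry Numbers; the record layout, the function names `is_valid_casrn` and `with_casrn`, and the sample results are assumptions made for illustration, not the Project's code.

```python
import re

# CASRN layout: 2 to 7 digits, a hyphen, 2 digits, a hyphen, 1 check digit.
CASRN_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_casrn(value):
    """Check the format and the check digit of a CAS Registry Number."""
    match = CASRN_PATTERN.fullmatch(value)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight digits 1, 2, 3, ... starting from the digit next to the check digit.
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def with_casrn(records):
    """Keep only records that carry a well-formed CASRN."""
    return [r for r in records if r.get("casrn") and is_valid_casrn(r["casrn"])]


# Hypothetical search results; the dictionary layout is an assumption.
RESULTS = [
    {"name": "lead", "casrn": "7439-92-1"},
    {"name": "water", "casrn": "7732-18-5"},
    {"name": "unregistered candidate", "casrn": None},
    {"name": "mistyped entry", "casrn": "7439-92-2"},
]

FILTERED = with_casrn(RESULTS)
```

Passing the check-digit test shows only that a number is well formed, not that it has actually been assigned to a substance, so it catches typos rather than confirming registration.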

Effect of the Project on Hazards

Whenever we populate a new compound group, the warnings that scientific bodies have associated with that group are associated with the substances in the group. This may change the hazard level for an endpoint or even the GreenScreen List Translator score. As part of the Interim Harmonization Project described below, we do not implement these changes on a rolling basis as we complete the research, but instead roll them out on a coordinated basis with Clean Production Action (CPA).
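
The way group-level warnings carry over to members can be pictured with the minimal sketch below. The group names, substance names, and hazard assignments are placeholders invented for illustration, and `hazards_by_substance` is a hypothetical helper, not part of Pharos.

```python
# Placeholder data: hazards listed for each group, and each group's members.
GROUP_HAZARDS = {
    "Example group A": {"cancer"},
    "Example group B": {"aquatic toxicity"},
}

GROUP_MEMBERS = {
    "Example group A": ["substance 1", "substance 2"],
    "Example group B": ["substance 2", "substance 3"],
}


def hazards_by_substance(group_members, group_hazards):
    """Give every member substance the hazards listed for each of its groups."""
    result = {}
    for group, members in group_members.items():
        for substance in members:
            inherited = group_hazards.get(group, set())
            result.setdefault(substance, set()).update(inherited)
    return result


SUBSTANCE_HAZARDS = hazards_by_substance(GROUP_MEMBERS, GROUP_HAZARDS)
```

A substance that belongs to more than one group, like "substance 2" here, picks up the hazards of every group it is in, which is why populating a new group can change a substance's hazard level.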

 
Current Status and Access

Through the Project we have made progress on the first five steps. We have established group definitions and added tens of thousands of members to groups. The full list of Compound Groups is available, including descriptions of how each was populated, the number of members, and the number of hazards. Additional detail on each, including a list of members, is available in individual compound group profiles in Pharos. If you have a research or program need for large quantities of compound group data, contact support@pharosproject.net.

Terms of Use:  The definitions and substance lists developed under the Project are subject to the Terms of Use. Any entity which intends to use these definitions or substance lists for public or commercial use must do two things:

  • Notify HBN’s Pharos Project of the intended use by email at support@pharosproject.net.

  • Clearly identify on the website or other media using the data that it was sourced from the Pharos Compound Group Population Project, using the required language below and providing a link to the Project home page (https://pharosproject.net/compound-group-population-project).

Required language: “The compound group definitions and data used here are provided by the Pharos Compound Group Population Project run by the Healthy Building Network.”

Parties are encouraged to participate in the Project to define and populate more groups and collaborate on development of an open standard for groups and a public registry of substance members.

Next steps

Interim Harmonization Project: These groups are an integral part of hazard screening for the GreenScreen List Translator (GSLT) and the Health Product Declaration (HPD). Pharos is one of two automation systems that provide hazard screening services for these two programs. Differences in how automation systems populate these groups are one of several issues that can result in significant differences in automator hazard screening outcomes. Clean Production Action (CPA) is facilitating a harmonization process to reconcile differences between the two systems and generate a single list of groups and members. HBN is participating in that harmonization process and contributing its work to date on this Compound Group Population Project. Use of this contribution is subject to the Terms of Use listed above.

CPA has committed to the vision of developing a transparent, scientific, peer-reviewed methodology for populating members of compound groups and supports the long-term project to fill the gaps described here. The official policy for using the results of this project for GreenScreen List Translation is published in the GreenScreen for Safer Chemicals Compound Group Policy on the GreenScreen Guidance and Resources page.

You can help

We need chemists who like puzzles to help us design the necessary chemical searches. If you think you can help, please review the documents below and post questions and ideas in the Compound Group Population Discussion or contact the coordinators of this project (listed below) directly.

Compound Group - Cheminformatics project: Using cheminformatics tools to perform complex queries on large chemical datasets. This is a currently active part of the project.

Join in the discussion: This project is discussed in the Compound Group Population Discussion.

Project coordinators

Michel Dedeo mdedeo@healthybuilding.net

Akos Kokai akokai@berkeley.edu