Featured Event: Curating Worldwide Scientific Content

Sponsored by the Silicon Valley and Puget Sound Sections of ACS. April 12th, 7-8 pm. Online via Zoom. Free; registration required.

Abstract

Chemical informatics technology is improving access to the text and images of patents and the scientific literature through computer curation. In one example, a collaboration between a team of computer scientists at Google Patents and chemical informaticians at OntoChem, annotated data are produced from the patents of ~138 countries, translated from ~58 languages, as well as from Google Scholar and Google Books. The annotators identify entities such as chemical names, diseases, proteins, and genes, which are then post-processed into machine-readable formats, normalized, and labeled with unique ontology concept identifiers (OCIDs). Chemical names and images are post-processed using name-to-structure and image-to-structure programs, producing associated metadata, e.g., SMILES strings, InChIs, and InChIKeys. In this manner, worldwide patents and the scientific literature are rendered searchable by structure and substructure. This is demonstrated on the freely available Google Patents platform. Data derived from patents are downloadable in machine-readable formats (SMILES), while data derived from the scientific literature are available via new commercial offerings such as Dimensions from Digital Science.

The non-copyrighted output, covering >54 billion scientific and related entities, is donated to the NIH and made available in PubChem and in Google BigQuery. These collaborative efforts give researchers access to previously unavailable resources relevant to pharmaceuticals, publishing, health care, and environmental science. Integrating these data with the massive amounts of additional scientific information uploaded into the BigQuery environment provides a rich resource for machine learning and widespread value for the worldwide scientific community.

Bio

Stephen (Steve) K. Boyer, PhD, Collabra, Inc.

https://www.linkedin.com/in/stephen-k-boyer-15529

Steve Boyer works in the interdisciplinary space of chemistry and computer science. By automating the curation of patents and the scientific literature, his goal is to expand the scientific community’s understanding and use of published information in chemistry, the physical sciences, medicine and intellectual property.

His professional history combines ten years of synthesis and scale-up in the pharmaceutical industry (Ciba-Geigy/Novartis) with 25 years in technical capacities at IBM Research. He has participated in several start-ups and currently serves as a science advisor at Google, Digital Science, OntoChem, and several other cheminformatics enterprises. He played a major role in making patent information publicly available in the early days of the internet.

Steve holds a BA from Temple University in Philadelphia and a PhD in synthetic organic chemistry from Tufts University. His publications and patents range from new drug syntheses to text+image analytics. https://bit.ly/3Ky5XSU
