chemistry_utilities
Molecular graph analysis for lipid head/tail detection.
funccheck_branch_points_not_in_cycles(mol, branch_point_indices, exclude_elements=[])Filter branch points to those not part of any ring.
parammolrdkit.Chem.MolRDKit molecule object.
parambranch_point_indiceslist of intAtom indices representing branch points.
paramexclude_elementslist of str= []Element symbols to keep even if they are in a cycle.
Returns
list of intBranch point indices that are not in a cycle (or whose element
is in exclude_elements).
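The filtering rule described above can be sketched without RDKit. This is a minimal illustration, not the module's implementation: it assumes ring membership and element symbols have already been looked up (with RDKit, `Atom.IsInRing()` and `Atom.GetSymbol()` would supply them). The helper name `filter_acyclic_branch_points` and the `ring_atoms` and `symbols` inputs are hypothetical.

```python
def filter_acyclic_branch_points(branch_point_indices, ring_atoms, symbols,
                                 exclude_elements=None):
    """Keep branch points outside every ring, plus ring atoms whose element is exempt.

    ring_atoms: set of atom indices that belong to at least one ring (assumed input).
    symbols:    mapping of atom index -> element symbol (assumed input).
    """
    exempt = set(exclude_elements or [])
    return [
        idx for idx in branch_point_indices
        if idx not in ring_atoms or symbols[idx] in exempt
    ]


# Hypothetical toy data: atoms 1 and 2 are ring members; atom 2 is a nitrogen.
ring_atoms = {1, 2}
symbols = {0: "C", 1: "C", 2: "N"}

kept_default = filter_acyclic_branch_points([0, 1, 2], ring_atoms, symbols)
kept_with_n = filter_acyclic_branch_points([0, 1, 2], ring_atoms, symbols,
                                           exclude_elements=["N"])
print(kept_default)  # [0]
print(kept_with_n)   # [0, 2]
```

The sketch uses `None` rather than `[]` as the default to avoid sharing a mutable default between calls; the documented signature uses `[]`.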
funcremove_leaves(graph)Remove leaf nodes (nodes with only one connection) from the graph.
paramgraph
Returns
None
funcdetect_lipid_parts_from_smiles_modified(smiles, head_search_radius=3, min_tail_distance=6)Detect head group, tail termini, and branch points of a lipid from SMILES.
Parse the molecular graph, trim terminal atoms, identify branch points, and classify endpoints as head group or tail atoms.
paramsmilesstrSMILES string of the lipid molecule.
paramhead_search_radiusint= 3Maximum graph distance to search for head-group heteroatoms from trimmed endpoints. Default is 3.
parammin_tail_distanceint= 6Minimum graph distance from the head-group branch point required for a valid tail endpoint. Default is 6.
Returns
int or NoneAtom index of the detected head group, or None if parsing fails.
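The trimming and branch-point steps can be illustrated on a plain adjacency mapping. This is a rough sketch under stated assumptions: a branch point is taken to be any node with three or more neighbors, trimming removes all degree-1 nodes in one pass, and the helper names (`build_adjacency`, `trim_leaves`, `find_branch_points`) are hypothetical. Unlike the documented `remove_leaves`, which returns `None`, `trim_leaves` here returns a new mapping.

```python
def build_adjacency(edges):
    """Build an undirected adjacency mapping {node: set(neighbors)} from edge pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def trim_leaves(adjacency):
    """Return a copy of the graph with every degree-1 node removed (one pass)."""
    leaves = {n for n, nbrs in adjacency.items() if len(nbrs) == 1}
    return {n: nbrs - leaves for n, nbrs in adjacency.items() if n not in leaves}


def find_branch_points(adjacency):
    """Nodes with three or more neighbors (assumed definition of a branch point)."""
    return sorted(n for n, nbrs in adjacency.items() if len(nbrs) >= 3)


# Hypothetical toy graph: center atom 0 with three arms of three atoms each.
edges = [(0, 1), (1, 2), (2, 3),
         (0, 4), (4, 5), (5, 6),
         (0, 7), (7, 8), (8, 9)]
full = build_adjacency(edges)

trimmed = trim_leaves(trim_leaves(full))  # trim terminal atoms twice
endpoints = sorted(n for n, nbrs in trimmed.items() if len(nbrs) == 1)
branches = find_branch_points(full)

print(endpoints)  # [1, 4, 7]
print(branches)   # [0]
```

Whether the module computes branch points on the full or the trimmed graph is not stated here; the sketch uses the full graph.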
funcclassify_endpoints(mol, graph, endpoints, branch_indices, head_search_radius, min_tail_distance=10)Classify trimmed-graph endpoints as head group or tail atoms.
Search for the head group by looking for terminal hydroxyl, nearby nitrogen, farthest nitrogen, farthest oxygen, or any heteroatom (in that priority order). Remaining endpoints that pass distance and element filters become tails.
parammolrdkit.Chem.MolRDKit molecule object.
paramgraphnetworkx.GraphFull molecular connectivity graph.
paramendpointslist of intEndpoint atom indices from the twice-trimmed graph.
parambranch_indiceslist of intAtom indices of non-cyclic branch points.
paramhead_search_radiusintMaximum graph distance to search for head-group heteroatoms.
parammin_tail_distanceint= 10Minimum graph distance from the head-group branch point for a valid tail. Default is 10.
Returns
int or NoneAtom index of the detected head group.
funcfind_closest_node_single_source(graph, source_node, candidate_nodes, cutoff=2)Find the closest candidate node to a source using shortest-path distance.
paramgraphnetworkx.GraphMolecular connectivity graph.
paramsource_nodeintStarting node index.
paramcandidate_nodeslist of intNode indices to consider as targets.
paramcutoffint= 2Maximum path length to search. Default is 2.
Returns
int or NoneIndex of the closest candidate, or None if none reachable.
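A cutoff-limited nearest-candidate search can be sketched with a breadth-first traversal over a plain adjacency mapping. This is an illustration only: the helper name `closest_candidate` is hypothetical, ties are broken by visit order, and a source that is itself a candidate is returned at distance 0. All three behaviors are assumptions, not documented properties of `find_closest_node_single_source`. With networkx, `nx.single_source_shortest_path_length(graph, source, cutoff=cutoff)` yields the same distances.

```python
from collections import deque


def closest_candidate(adjacency, source, candidates, cutoff=2):
    """Return the first candidate reached by BFS within `cutoff` edges, else None."""
    targets = set(candidates)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return node  # BFS visits nodes in order of increasing distance
        if dist < cutoff:  # do not expand beyond the cutoff
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append((nbr, dist + 1))
    return None


# Hypothetical toy graph: a linear chain 0-1-2-3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print(closest_candidate(chain, 0, [3, 2], cutoff=2))  # 2
print(closest_candidate(chain, 0, [3], cutoff=2))     # None
print(closest_candidate(chain, 0, [3], cutoff=3))     # 3
```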
funcmap_to_original_terminals(graph, tail_indices)Map trimmed-graph endpoints back to the nearest original terminal atoms.
paramgraphnetworkx.GraphFull molecular connectivity graph (before trimming).
paramtail_indiceslist of intAtom indices from the trimmed graph to map back.
Returns
list of intCorresponding terminal atom indices in the original graph.
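One way to map a trimmed-graph endpoint back to an original terminal is to search outward in the full graph until a degree-1 atom is reached. The sketch below assumes "nearest" means fewest bonds and that a terminal atom is any node with exactly one neighbor; the helper names `nearest_terminal` and `map_to_terminals` are hypothetical, and the module may resolve ties differently.

```python
from collections import deque


def nearest_terminal(adjacency, start):
    """BFS from `start` in the full graph; return the first degree-1 node found."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if len(adjacency[node]) == 1:
            return node
        for nbr in sorted(adjacency[node]):  # sorted for deterministic tie-breaking
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return None


def map_to_terminals(adjacency, tail_indices):
    """Map each trimmed endpoint to its nearest terminal atom in the full graph."""
    return [nearest_terminal(adjacency, idx) for idx in tail_indices]


# Hypothetical toy graph: center atom 0 with arms 1-2-3, 4-5-6, and 7-8-9.
full_graph = {
    0: {1, 4, 7},
    1: {0, 2}, 2: {1, 3}, 3: {2},
    4: {0, 5}, 5: {4, 6}, 6: {5},
    7: {0, 8}, 8: {7, 9}, 9: {8},
}

mapped = map_to_terminals(full_graph, [1, 4, 7])
print(mapped)  # [3, 6, 9]
```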
funcremove_duplicates_and_sort(connections)Deduplicate and sort a list of atom-index pairs.
paramconnectionslist of list of intPairs (or sequences) of atom indices, possibly with duplicates or reversed ordering.
Returns
list of list of intUnique connections, each internally sorted.
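A compact way to get this behavior is to normalize each connection to a sorted tuple and collect the tuples in a set. The sketch below is an assumption about the approach, not the module's code; the helper name `dedupe_connections` is hypothetical, and ordering the outer list is an extra choice made here for reproducible output.

```python
def dedupe_connections(connections):
    """Sort each connection internally, drop duplicates, and sort the result."""
    unique = {tuple(sorted(conn)) for conn in connections}
    return [list(conn) for conn in sorted(unique)]


pairs = [[3, 1], [1, 3], [2, 5], [5, 2], [0, 4]]
deduped = dedupe_connections(pairs)
print(deduped)  # [[0, 4], [1, 3], [2, 5]]
```

Because each connection is sorted internally, `[3, 1]` and `[1, 3]` collapse to the same entry, which is what removes reversed duplicates.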
funcanalyze_molecular_graph(mol, headgroup_index, tail_indices, branch_indices)Partition a molecular graph into segments between key structural points.
Trace paths between head group, tail termini, and branch points via BFS, then assign every remaining atom to the nearest segment.
parammolrdkit.Chem.MolRDKit molecule object.
paramheadgroup_indexintAtom index of the head group.
paramtail_indiceslist of intAtom indices of tail termini.
parambranch_indiceslist of intAtom indices of branch points.
Returns
list of list of intSegments of atom indices, each sorted, covering the full molecule.
funcfind_connections_from_point(adjacency_list, start, points_of_interest)Find all BFS paths from a start atom to other points of interest.
paramadjacency_listlist of list of intPer-atom neighbor lists for the molecule.
paramstartintStarting atom index (must be in points_of_interest).
parampoints_of_interestset of intAtom indices at which to terminate paths.
Returns
list of list of intEach entry is a path (list of atom indices) from start to another
point of interest.
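The path search can be sketched as a breadth-first traversal that records a path whenever it reaches another point of interest and does not continue through that point. This is a minimal illustration with a hypothetical helper name (`paths_to_points`); in particular, the single shared visited set means cyclic molecules yield at most one path per reachable point, which may differ from the module's behavior.

```python
from collections import deque


def paths_to_points(adjacency_list, start, points_of_interest):
    """Collect BFS paths from `start` that end at the first point of interest reached."""
    paths = []
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for nbr in adjacency_list[path[-1]]:
            if nbr in visited:
                continue
            visited.add(nbr)
            if nbr in points_of_interest:
                paths.append(path + [nbr])  # terminate here; do not walk through it
            else:
                queue.append(path + [nbr])
    return paths


# Hypothetical toy molecule: head 0, branch point 2, tail termini 4 and 6.
#   0-1-2-3-4
#       |
#       5-6
adjacency_list = [[1], [0, 2], [1, 3, 5], [2, 4], [3], [2, 6], [5]]
points = {0, 2, 4, 6}

from_branch = paths_to_points(adjacency_list, 2, points)
from_head = paths_to_points(adjacency_list, 0, points)
print(from_branch)  # [[2, 1, 0], [2, 3, 4], [2, 5, 6]]
print(from_head)    # [[0, 1, 2]]
```

Starting from the head, the search stops at the branch point rather than continuing to the tails, so each segment is bounded by two points of interest.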
funcassign_adjacent_atoms(mol, segments, points_of_interest)Assign unassigned atoms to the nearest existing segment.
Atoms not already in any segment and not in points_of_interest are
appended to whichever segment contains their closest neighbor by
shortest-path distance.
parammolrdkit.Chem.MolRDKit molecule object.
paramsegmentslist of list of intExisting segments of atom indices (modified in place).
parampoints_of_interestset of intHead, tail, and branch atom indices (excluded from assignment).
Returns
list of list of intThe input segments with unassigned atoms appended.
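The assignment step can be illustrated with a breadth-first search from each leftover atom to the nearest atom that already belongs to a segment. This sketch works on per-atom neighbor lists rather than an RDKit molecule, and the helper name `assign_leftovers` is hypothetical. It assumes that an atom shared by several segments counts toward the first segment listing it, and that leftovers are matched only against the originally assigned atoms.

```python
from collections import deque


def assign_leftovers(adjacency_list, segments, points_of_interest):
    """Append each unassigned atom to the segment owning its nearest assigned atom."""
    owner = {}
    for seg_id, segment in enumerate(segments):
        for atom in segment:
            owner.setdefault(atom, seg_id)  # first segment wins for shared atoms

    leftovers = [a for a in range(len(adjacency_list))
                 if a not in owner and a not in points_of_interest]

    for atom in leftovers:
        seen = {atom}
        queue = deque([atom])
        while queue:
            node = queue.popleft()
            if node in owner:
                segments[owner[node]].append(atom)  # modifies segments in place
                break
            for nbr in adjacency_list[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return segments


# Hypothetical toy input: backbone 0-1-2-3-4 with side atoms 5 (on 1) and 6 (on 3).
adjacency_list = [[1], [0, 2, 5], [1, 3], [2, 4, 6], [3], [1], [3]]
segments = [[0, 1, 2], [2, 3, 4]]
result = assign_leftovers(adjacency_list, segments, {0, 2, 4})
print(result)  # [[0, 1, 2, 5], [2, 3, 4, 6]]
```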
funcvisualize_lipid_parts_from_smiles(smiles, output_file='lipid_parts.png')Visualize detected lipid head and tail groups from SMILES.
Render a 2D image of the molecule with head (red), tail (blue), and branch (green) atoms highlighted. Intended for use in a Jupyter notebook.
paramsmilesstrSMILES string of the lipid molecule.
paramoutput_filestr= 'lipid_parts.png'Path for the output PNG image. Default is "lipid_parts.png".
Returns
tuple of (rdkit.Chem.Mol, int, list of int, list of int) or None(mol, head_index, tail_indices, branch_indices) on success,
or None if the molecule cannot be parsed.
funccreate_lipid_assignment(mol, head_index, tail_indices, branch_indices, output_file='lipid_assignment.png')Create a color-coded visualization of lipid segment assignments.
Partition the molecule into segments between head, tail, and branch points, then render each segment in a distinct color.
parammolrdkit.Chem.MolRDKit molecule object.
paramhead_indexintAtom index of the head group.
paramtail_indiceslist of intAtom indices of tail termini.
parambranch_indiceslist of intAtom indices of branch points.
paramoutput_filestr= 'lipid_assignment.png'Path for the output PNG image. Default is "lipid_assignment.png".
Returns
list of list of intSegments of atom indices used in the visualization.
