graph_nd package
- class graph_nd.GraphRAG(db_client=None, llm=None, embedding_model=None)[source]
Bases: object
GraphRAG is the core end-to-end implementation. It creates and manages knowledge graph data, mapping from structured and unstructured sources; creates and manages graph schemas, which are core to data validation and LLM/agent prompting; and generates tools and agents for GraphRAG workflows.
- db_client
A database client for managing underlying graph data (Assumed to be a Neo4j driver).
- llm
A LangChain LLM instance for various data, schema, and agent tasks.
- schema
A Schema instance to manage schema-related operations.
- data
A Data instance for managing knowledge graph data.
- agent_executor
A LangGraph GraphRAG agent.
- class Data(graphrag, db_client, llm, embedding_model=None)[source]
Bases: object
Data management for the knowledge graph. This class is responsible for all write and admin operations on the knowledge graph. All source data is mapped through this class’s methods.
- extract_from_texts(texts, source_name, nodes_only=True, max_workers=10, sub_schema=None)[source]
Performs entity extraction on a list of text chunks according to the graph schema and outputs the result as a GraphData object, which contains structured node/relationship records and schema references.
This method asynchronously processes a list of input texts and extracts relevant data to a knowledge graph structure using an LLM-powered workflow. Extraction is driven by the GraphSchema.
- Parameters:
texts (List[str]) – A list of input strings (text chunks) from which data will be extracted and structured into a graph representation.
source_name (str) – A source identifier or label associated with the input texts. Used for additional context in the LLM workflow.
nodes_only (bool) – A flag indicating whether to extract only nodes (True) or both nodes and relationships (False) during entity extraction. Defaults to True.
max_workers (int) – The maximum number of concurrent workers to use during entity extraction. This parameter affects the level of parallelism when handling input texts. Defaults to 10.
sub_schema (SubSchema) – A sub-schema specifying filtering criteria (nodes, patterns, relationships) for the target GraphSchema, as well as an additional description for guiding LLM entity extraction. If not provided, the whole GraphSchema is considered. Defaults to None.
- Return type:
GraphData
- Returns:
A graph representation of the extracted data.
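A minimal sketch of calling this method, assuming `graphrag` is an already-configured GraphRAG instance with a schema defined. The `split_paragraphs` and `extract_graph` helpers are hypothetical and not part of graph_nd; only the `extract_from_texts` call follows the signature documented above.

```python
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Hypothetical helper: split raw text into non-empty paragraph chunks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def extract_graph(graphrag, text: str, source_name: str):
    """Run extraction only (nothing is merged) and return the result."""
    chunks = split_paragraphs(text)
    # nodes_only=False asks for relationships as well as nodes.
    return graphrag.data.extract_from_texts(
        chunks, source_name, nodes_only=False, max_workers=4
    )
```

Because nothing is written to the database here, this is a convenient way to inspect what the LLM extracts before committing to `merge_texts`.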
- get_table_mapping_type(table_name, table_preview)[source]
Determines the type of the provided table based on its name, a preview of its data, and the schema of the graph. This function uses a language model prompt to infer the table type in a structured manner.
- get_table_node_mapping(table_name, table_preview)[source]
Generate a table-to-node mapping for the specified table.
This function generates a mapping between a table and node by using an LLM with structured output. It validates the LLM instance, invokes a prompt with the given table details, and utilizes the structured LLM for schema inference to produce the mapping.
- Parameters:
- Return type:
NodeTableMapping
- Returns:
The mapping object that represents the relationship between the table and its corresponding nodes.
- get_table_relationships_mapping(table_name, table_preview)[source]
Generates a table-to-relationships mapping using an LLM with structured outputs.
This method accepts a table name and its preview to generate a relationship mapping by leveraging a large language model (LLM). It invokes a specific prompt template for generating the relationships mapping based on the table’s name, preview, and schema information. The LLM output is then processed to create a RelTableMapping object which captures the relationships and start and end nodes in the provided table.
- Parameters:
- Return type:
RelTableMapping
- Returns:
A table-to-relationships mapping.
- merge_csv(file_path, source_metadata=True)[source]
Merges data from a CSV file into the knowledge graph by determining its table type and invoking the appropriate merge method.
The method identifies the type of data within the CSV file (either a single node or relationships) and creates a mapping to convert it to the graph schema using an LLM-powered workflow.
- Parameters:
file_path (str) – The path to the CSV file to be processed.
source_metadata (Union[bool, Dict[str, Any]]) – Metadata indicating the source information of the incoming data. If True, default source metadata will be generated; if False, no source metadata is used; alternatively, a dictionary can be passed to define custom metadata. Defaults to True.
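As a rough sketch of the custom-dictionary form of `source_metadata`, assuming `graphrag` is a configured GraphRAG instance: the helper names and the metadata keys are hypothetical, since this reference does not list which properties the default source metadata contains.

```python
import csv
from pathlib import Path
from typing import Dict, List


def write_csv(path: Path, rows: List[Dict[str, object]]) -> Path:
    """Write rows to a CSV file, taking the header from the first row."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


def load_csv(graphrag, path: Path) -> None:
    """Merge one CSV file with caller-supplied source metadata."""
    # Hypothetical keys: adjust to whatever your __Source__ nodes should carry.
    metadata = {"name": path.name, "description": "nightly export"}
    graphrag.data.merge_csv(str(path), source_metadata=metadata)
```

Passing `source_metadata=False` instead would skip source tracking entirely.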
- merge_csvs(file_paths, source_metadata=True)[source]
Merges data from CSV files into the knowledge graph by determining table types and invoking appropriate merge methods.
The method identifies the types of data within each CSV file (either a single node or relationships) and creates mappings to convert them to the graph schema using an LLM-powered workflow.
- Parameters:
file_paths (List[str]) – The paths to the CSV files to be processed.
source_metadata (Union[bool, Dict[str, Any]]) – Metadata indicating the source information of the incoming data. If True, default source metadata will be generated; if False, no source metadata is used; alternatively, a dictionary can be passed to define custom metadata. Defaults to True.
- merge_node_csv(file_path, source_metadata=True)[source]
Maps a CSV file to nodes and merges into the knowledge graph.
This method reads a CSV file to retrieve table records and table preview data, determines the node mapping for the table based on the file name and table preview using an LLM workflow, and finally merges the table data into the knowledge graph using the node mapping. Optional source metadata can be passed or it will be generated with default values.
- Parameters:
file_path (str) – The path to the CSV file to be processed.
source_metadata (Union[bool, Dict[str, Any]]) – Metadata indicating the source information of the incoming data. If True, default source metadata will be generated; if False, no source metadata is used; alternatively, a dictionary can be passed to define custom metadata. Defaults to True.
- merge_node_table(table_records, node_mapping, source_metadata=True)[source]
Merges data from the provided table records into knowledge graph nodes based on the specified node mapping and source metadata.
- Parameters:
table_records (List[Dict]) – The records of the table from CSV or other sources to be merged into the knowledge graph.
node_mapping (NodeTableMapping) – The mapping between the table and node schema
source_metadata (Union[bool, Dict[str, Any]]) – Metadata indicating the source information of the incoming data. If True, source metadata will be inferred; if False, no source metadata is used; or a dictionary can be passed to define custom metadata. Defaults to True.
- merge_nodes(label, records, source_metadata=True)[source]
Merges node data into the graph using the provided label and record data.
- Parameters:
label (str) – The label of the node type to merge (e.g., “Person”, “Movie”). The label should match a defined node in the graph schema.
records (List[Dict]) – A list of dictionaries representing the data for each node to be merged. Each record MUST include the id field as defined in the node schema, along with any other optional properties expected by the schema.
source_metadata (Union[bool, Dict[str, Any]], optional) – Metadata for the source being merged. If True, default source metadata is prepared and added to a __Source__ node in the graph, and a __source_id property is added and/or appended to each node, which maps to the id property of the __Source__ node. If False, no source metadata is added to the graph. If a custom dictionary is provided, source metadata is added as in the True case and the dictionary properties override the default ones. Defaults to True.
Example
label = "Person"
records = [
    {"person_id": 1, "name": "Alice", "age": 30},
    {"person_id": 2, "name": "Bob", "age": 25},
]
- Expected Behavior:
Creates or updates nodes labeled “Person” using the records
- Raises:
ValueError – If the label is not found in the graph schema
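Since each record must carry the schema's id field, a small pre-flight check can catch bad rows before anything is written. The sketch below assumes a schema with a "Person" node whose id field is "person_id" (as in the example above); `check_ids` and `merge_people` are hypothetical helpers, not graph_nd functions.

```python
from typing import Any, Dict, List


def check_ids(records: List[Dict[str, Any]], id_field: str) -> None:
    """Raise ValueError if any record lacks a value for the id field."""
    missing = [i for i, rec in enumerate(records) if rec.get(id_field) is None]
    if missing:
        raise ValueError(f"records at positions {missing} lack '{id_field}'")


def merge_people(graphrag, records: List[Dict[str, Any]]) -> None:
    """Validate ids locally, then merge the records as Person nodes."""
    # Assumption: the schema defines "Person" with id field "person_id".
    check_ids(records, "person_id")
    graphrag.data.merge_nodes("Person", records, source_metadata=True)
```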
- merge_pdf(file_path, chunk_strategy='BY_PAGE', chunk_size=10, nodes_only=True, max_workers=10, source_metadata=True, sub_schema=None)[source]
Merges data from a PDF file into the knowledge graph.
This method:
1. Splits a PDF file into text chunks.
2. Performs parallelized/asynchronous entity extraction on the text chunks according to the graph schema using an LLM-powered workflow.
3. Merges the extracted subgraphs (nodes and relationships) into the knowledge graph.
- Parameters:
file_path (str) – The file path of the PDF document that needs to be processed.
chunk_strategy (str, optional) – The strategy for splitting text into chunks. Currently only “BY_PAGE” is supported, which splits by PDF page. If you need a custom chunking strategy, pre-process your PDF into text chunks and pass them to the merge_texts method instead. Default is “BY_PAGE”.
chunk_size (int, optional) – The size of the chunks for text splitting based on the strategy. Default is 10.
nodes_only (bool) – A flag indicating whether to extract only nodes (True) or both nodes and relationships (False) during entity extraction. Defaults to True.
max_workers (int) – The maximum number of concurrent workers to use during entity extraction. This parameter affects the level of parallelism when handling input texts. Defaults to 10.
source_metadata (Union[bool, Dict[str, Any]]) – Metadata indicating the source information of the incoming data. If True, default source metadata will be generated; if False, no source metadata is used; alternatively, a dictionary can be passed to define custom metadata. Defaults to True.
sub_schema (SubSchema) – A sub-schema specifying filtering criteria (nodes, patterns, relationships) for the target GraphSchema, as well as an additional description for guiding LLM entity extraction. If not provided, the whole GraphSchema is considered. Defaults to None.
- Raises:
ValueError – If the file_path is invalid, empty, or not a supported PDF file.
RuntimeError – If the LLM validation fails or if any processing error occurs during text extraction or graph merging.
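The custom-chunking route mentioned under chunk_strategy could look roughly like the sketch below. It assumes the page texts have already been extracted from the PDF by a library of your choice; `window_pages` and `merge_custom_chunks` are hypothetical helpers, and only the `merge_texts` call comes from this reference.

```python
from typing import List


def window_pages(pages: List[str], size: int = 3, overlap: int = 1) -> List[str]:
    """Join consecutive pages into overlapping chunks."""
    if size < 1 or not 0 <= overlap < size:
        raise ValueError("need size >= 1 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(pages), step):
        chunks.append("\n".join(pages[start:start + size]))
        if start + size >= len(pages):
            break  # the last window already reached the final page
    return chunks


def merge_custom_chunks(graphrag, pages: List[str], source_name: str) -> List[str]:
    """Chunk pages with a sliding window, then merge via merge_texts."""
    chunks = window_pages(pages)
    graphrag.data.merge_texts(chunks, source_name, nodes_only=True)
    return chunks
```

Overlapping windows are one common choice when entities span page boundaries; any other splitter that yields a list of strings would work the same way.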
- merge_relationships(rel_type, start_node_label, end_node_label, records, source_metadata=True)[source]
Merges relationships into the database using the provided relationship type, start node label, end node label, and record data.
- Parameters:
rel_type (str) – The type of the relationship (e.g., “ACTED_IN”, “FRIENDS_WITH”). The type should match a defined relationship in the graph schema.
start_node_label (str) – The label of the starting node in the relationship (e.g., “Person”). This label should match a defined node schema.
end_node_label (str) – The label of the ending node in the relationship (e.g., “Movie”). This label should match a defined node schema.
records (Dict) – A dictionary (or list of dictionaries) representing the data for each relationship to be merged.
source_metadata (Union[bool, Dict[str, Any]], optional) – Metadata for the source being merged. If True, default source metadata is prepared and added to a __Source__ node in the graph, and a __source_id property is added and/or appended to each node and relationship, which maps to the id property of the __Source__ node. If False, no source metadata is added to the graph. If a custom dictionary is provided, source metadata is added as in the True case and the dictionary properties override the default ones. Defaults to True.
- Required Fields in records:
start_node_id: The unique identifier of the starting node.
end_node_id: The unique identifier of the ending node.
Example
rel_type = "ACTED_IN"
start_node_label = "Person"
end_node_label = "Movie"
records = [
    {"start_node_id": 1, "end_node_id": "M101", "role": "Protagonist"},
    {"start_node_id": 2, "end_node_id": "M102", "role": "Hacker"},
]
- Expected Behavior:
Creates or updates “ACTED_IN” relationships between the “Person” and “Movie” nodes.
- Raises:
ValueError – If the relationship type, start node label, or end node label is not found in the schema, or if required fields in records are missing.
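When relationship data arrives as plain tuples, the required `start_node_id` and `end_node_id` fields can be filled in mechanically. This is a sketch under the assumption that the schema defines (:Person)-[:ACTED_IN]->(:Movie); `to_rel_records` and `merge_roles` are hypothetical helpers.

```python
from typing import Any, Dict, Iterable, List, Tuple


def to_rel_records(
    rows: Iterable[Tuple[Any, Any, Dict[str, Any]]]
) -> List[Dict[str, Any]]:
    """Turn (start_id, end_id, properties) rows into relationship records."""
    records = []
    for start_id, end_id, props in rows:
        record = dict(props)  # copy so the caller's dict is not mutated
        record["start_node_id"] = start_id
        record["end_node_id"] = end_id
        records.append(record)
    return records


def merge_roles(graphrag, rows) -> None:
    """Merge ACTED_IN relationships between Person and Movie nodes."""
    records = to_rel_records(rows)
    graphrag.data.merge_relationships("ACTED_IN", "Person", "Movie", records)
```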
- merge_relationships_csv(file_path, source_metadata=True)[source]
Maps a CSV file to relationships and their start/end nodes and merges them into the knowledge graph.
This method reads a CSV file to retrieve table records and table preview data, determines the relationships mapping for the table based on the file name and table preview using an LLM workflow, and finally merges the table data into the knowledge graph using the relationship mapping. Optional source metadata can be passed or it will be generated with default values.
- Parameters:
file_path (str) – The path to the CSV file to be processed.
source_metadata (Union[bool, Dict[str, Any]]) – Metadata indicating the source information of the incoming data. If True, default source metadata will be generated; if False, no source metadata is used; alternatively, a dictionary can be passed to define custom metadata. Defaults to True.
- merge_relationships_from_table(table_records, rel_mapping, source_metadata=True)[source]
Merges data from the provided table records into knowledge graph relationships and start and end nodes based on the specified relationship mapping and source metadata.
- Parameters:
table_records (List[Dict]) – The records of the table from CSV or other sources to be merged into the knowledge graph.
rel_mapping (RelTableMapping) – The mapping between the table and the relationship and start/end node schemas.
source_metadata (Union[bool, Dict[str, Any]]) – Metadata indicating the source information of the incoming data. If True, default source metadata will be generated; if False, no source metadata is used; or a dictionary can be passed to define custom metadata. Defaults to True.
- merge_texts(texts, source_name, nodes_only=True, max_workers=10, source_metadata=True, sub_schema=None)[source]
Performs entity extraction on a list of text chunks according to the graph schema, produces subgraphs (nodes and relationships), and merges them into the knowledge graph.
This method asynchronously processes a list of input texts and extracts relevant data into a knowledge graph structure using an LLM-powered workflow. Extraction is driven by the GraphSchema.
- Parameters:
texts (List[str]) – A list of input strings (text chunks) from which data will be extracted and structured into a graph representation.
source_name (str) – A source identifier or label associated with the input texts. Used for additional context in the LLM workflow.
nodes_only (bool) – A flag indicating whether to extract only nodes (True) or both nodes and relationships (False) during entity extraction. Defaults to True.
max_workers (int) – The maximum number of concurrent workers to use during entity extraction. This parameter affects the level of parallelism when handling input texts. Defaults to 10.
source_metadata (Union[bool, Dict[str, Any]]) – Metadata indicating the source information of the incoming data. If True, default source metadata will be generated; if False, no source metadata is used; alternatively, a dictionary can be passed to define custom metadata. Defaults to True.
sub_schema (SubSchema) – A sub-schema specifying filtering criteria (nodes, patterns, relationships) for the target GraphSchema, as well as an additional description for guiding LLM entity extraction. If not provided, the whole GraphSchema is considered. Defaults to None.
- nuke(delete_chunk_size=10000, skip_confirmation=False)[source]
Deletes all nodes, relationships, constraints, and search indexes from the database in a batch-wise manner. This method ensures efficient cleanup and resets the database to a blank state.
- Parameters:
delete_chunk_size (int, optional) – Number of rows to process per transaction during deletion. Defaults to 10,000.
skip_confirmation (bool, optional) – If True, skips the confirmation prompt. Defaults to False.
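Because this call is destructive, scripted use may want its own guard before passing `skip_confirmation=True`. A minimal sketch, assuming a configured `graphrag` instance; the `reset_graph` wrapper and its confirmation phrase are hypothetical.

```python
def reset_graph(graphrag, confirm_phrase: str) -> bool:
    """Wipe the graph only when the caller passes the exact phrase."""
    if confirm_phrase != "DELETE EVERYTHING":
        return False
    # The phrase check above stands in for the interactive prompt.
    graphrag.data.nuke(delete_chunk_size=5000, skip_confirmation=True)
    return True
```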
- class Schema(db_client, llm)[source]
Bases: object
Encapsulates the knowledge graph schema, which is used to validate data and to prompt LLMs and agents for graph interactions.
- craft_from_json(schema_json, verbose=False)[source]
Crafts a schema object from JSON input using a large language model (LLM) for schema inference. This method validates the LLM, generates a structured prompt based on the input JSON, and invokes the LLM to produce the schema.
- Parameters:
- Returns:
The crafted schema object produced by the LLM.
- define(graph_schema)[source]
Sets the schema explicitly using a GraphSchema.
- Parameters:
graph_schema (GraphSchema) – The exact schema to use.
- export(file_path)[source]
Exports the current schema to a JSON file.
- Parameters:
file_path (str) – The path to the file where the schema will be saved.
- from_existing_graph(exclude_prefixes=('_', ' '), exclude_exact_matches=None, text_embed_index_map=None, parallel_rel_ids=None, description=None)[source]
Generates a schema from existing graph data. Use this for an existing graph database that you are connecting to GraphRAG.
This function analyzes the existing database to generate node schemas and relationship schemas, combining them into a unified graph schema. Users can specify customization options such as excluding certain prefixes or specific exact matches for attributes, and optionally map text embeddings or parallel relationship IDs. A description for the schema can also be provided.
- Parameters:
exclude_prefixes – A tuple of strings containing prefixes. Node labels, relationship types, or properties starting with any of these prefixes are excluded, defaults to (“_”, “ “).
exclude_exact_matches – An optional set of exact node labels, relationship types, or property names to exclude from the schema, defaults to None if not provided.
text_embed_index_map (Optional[Dict[str, str]]) – An optional dictionary mapping {text_embedding_index_name: text_property}, where text_property is a node property that is used to calculate the embedding. This is required to use text embedding search fields for nodes. If not provided, no text embedding search fields will be included in the schema. Defaults to None.
parallel_rel_ids (Optional[Dict[str, str]], optional) – A dictionary mapping relationship types to their parallel relationship ID property names: {rel_type: property_name}. This is only required if the user wishes to ingest more data while maintaining parallel relationships for specific node types (more than one instance of a relationship type existing between the same start and end nodes). Defaults to None.
description – Optional description of the generated graph schema. Exposed to the LLM when accessing the graph through GraphRAG.
- Returns:
The constructed graph schema, comprising node schemas and relationship schemas extracted from the existing database.
- Return type:
GraphSchema
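To make the exclusion rule concrete, the sketch below mimics it on a plain list of names. `keep_names` is an illustration of the documented behavior (prefix match or exact match means excluded), not the library's own code, and the label names are made up.

```python
from typing import Iterable, List, Optional, Set, Tuple


def keep_names(
    names: Iterable[str],
    exclude_prefixes: Tuple[str, ...] = ("_", " "),
    exclude_exact_matches: Optional[Set[str]] = None,
) -> List[str]:
    """Return the names that survive prefix and exact-match exclusion."""
    excluded = exclude_exact_matches or set()
    return [
        name
        for name in names
        if not name.startswith(tuple(exclude_prefixes)) and name not in excluded
    ]


def schema_from_graph(graphrag):
    """Build a schema from an existing database, hiding a scratch label."""
    return graphrag.schema.from_existing_graph(
        exclude_exact_matches={"Scratch"},
        description="Movies and the people who worked on them",
    )
```

With the defaults, internal labels such as `__Source__` are left out because they start with an underscore.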
- from_json_like_file(file_path, verbose=False)[source]
Reads a JSON-like model from a file and crafts a GraphSchema object.
This method reads the content of a specified file containing a JSON-like model definition, and then processes it to generate a GraphSchema object using the craft_from_json method. Optionally, it can provide verbose output during the process.
- Parameters:
file_path – The path to the file containing the JSON-like model to load.
verbose – A boolean flag indicating whether verbose output of the resulting schema should be enabled. Defaults to False.
- Return type:
GraphSchema
- Returns:
A GraphSchema object generated from the JSON-like model in the file.
- Raises:
IOError – If there’s an issue opening or reading the specified file.
Any other exception raised by the craft_from_json method during the crafting process.
- infer(description)[source]
Infers the graph schema based on a description of the data.
- Parameters:
description (str) – A text description of the data for schema inference.
- infer_from_sample(text)[source]
Infers the graph schema based on a small sample of the data.
- Parameters:
text (str) – A sample of the data in text form.
- infer_from_use_case(use_case, data_source_models='No Details Available')[source]
Infers a schema from a given use case and external data sources using a structured large language model (LLM). This method generates a schema prompt based on the supplied context and invokes the LLM to produce a schema.
- Parameters:
- Returns:
An inferred schema object generated by the structured LLM model that represents the schema derived from the provided use case and external data sources.
- Raises:
ValidationError – If the LLM is not validated before invoking schema generation.
- load(file_path)[source]
Loads schema from a JSON file.
- Parameters:
file_path (str) – The path to the JSON file containing the schema.
- prompt_str()[source]
Returns the prompt string from the associated graph schema. This is created from a custom model_dump where Relationships serialize query patterns in the format: (:startNodeLabel)-[:TYPE]->(:endNodeLabel). This makes it easier for LLMs and humans to interpret.
- Returns:
The prompt string extracted from the schema’s definition.
- Return type:
str
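The query-pattern format described above is plain Cypher-style notation. As an illustration only (this is not the library's serializer), a hypothetical `pattern_str` helper that produces the same shape:

```python
def pattern_str(start_label: str, rel_type: str, end_label: str) -> str:
    """Format a relationship as (:startNodeLabel)-[:TYPE]->(:endNodeLabel)."""
    return f"(:{start_label})-[:{rel_type}]->(:{end_label})"


print(pattern_str("Person", "ACTED_IN", "Movie"))
# prints (:Person)-[:ACTED_IN]->(:Movie)
```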
- __init__(db_client=None, llm=None, embedding_model=None)[source]
Initializes the GraphRAG instance.
- Parameters:
db_client – The database client for managing the knowledge graph (Assumed to be a Neo4j driver)
llm (Optional[BaseChatModel]) – LangChain LLM for handling inference, queries, and response completions.
embedding_model (Optional[Embeddings]) – LangChain text embedding model to use for data and semantic search.
- agent(question)[source]
Answers a question using GraphRAG.
This method relies on an internal prebuilt LangGraph ReAct agent, which it creates if one doesn’t already exist. The agent leverages the graph schema to customize its tool prompts.
- Parameters:
question (str) – The question to be answered.
- aggregate(agg_instructions)[source]
Aggregates data from a database based on specific instructions. The method formulates a query based on the given aggregation instructions and executes it using a database client. The results of the execution are transformed into JSON format and returned as a string.
- Parameters:
agg_instructions (str) – Instructions that detail how the data should be aggregated. These instructions will be used to generate the query.
- Returns:
A JSON-formatted string representation of the aggregated data based on the executed query.
- Return type:
str
- Raises:
Any exceptions that occur during query formulation or execution propagate to the caller; they are not handled within this method.
- create_react_agent(**kwargs)[source]
A factory for creating LangGraph agents backed by GraphRAG and the knowledge graph.
- Parameters:
**kwargs – Keyword arguments passed to the original create_react_agent.
- Returns:
The result of invoking create_react_agent.
- node_search(search_config, search_query, top_k=10)[source]
Performs a search operation on nodes using different modes such as full-text or semantic searches. The method executes the search by delegating operations to the respective helper functions based on the specified search type.
- Parameters:
search_config (Dict[str, str]) –
A dictionary specifying the configuration for the search operation. It must contain the following keys:
"search_type": Determines the type of search to perform, either "FULLTEXT" for full-text search or "SEMANTIC" for embedding-based search.
"node_label": The label of the node to search within the graph.
"search_prop": The property of the node to search against.
search_query (str) – The query string used to perform the search
top_k (int) – The maximum number of results to return. Defaults to 10.
- Returns:
The results of the performed search operation, as provided by the corresponding helper function.
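A small builder can keep `search_config` dictionaries well-formed. The sketch assumes a schema with a "Person" node that has a full-text search field on "name"; `make_search_config` and `find_people` are hypothetical helpers.

```python
from typing import Dict

SEARCH_TYPES = {"FULLTEXT", "SEMANTIC"}


def make_search_config(search_type: str, node_label: str, search_prop: str) -> Dict[str, str]:
    """Build the three-key search_config dictionary described above."""
    if search_type not in SEARCH_TYPES:
        raise ValueError(f"search_type must be one of {sorted(SEARCH_TYPES)}")
    return {
        "search_type": search_type,
        "node_label": node_label,
        "search_prop": search_prop,
    }


def find_people(graphrag, name: str):
    """Full-text search over Person.name, returning at most five hits."""
    config = make_search_config("FULLTEXT", "Person", "name")
    return graphrag.node_search(config, name, top_k=5)
```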
- query(query_instructions)[source]
Traverses a graph database based on specific instructions. The method formulates a traversal query based on the given query instructions and executes it using a database client. The results of the execution are transformed into JSON format and returned as a string.
- Parameters:
query_instructions (str) – A string containing detailed instructions for the query.
- Returns:
The JSON-formatted result of the executed query.
- Return type:
str
KeyError – If a required key is missing during template invocation.
DatabaseExecutionError – If the query execution fails in the database.
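Since the result comes back as a JSON string, callers will typically decode it before use. A minimal sketch, assuming a configured `graphrag` instance; `run_query` is a hypothetical wrapper, and the shape of the decoded value depends on the generated query.

```python
import json
from typing import Any


def run_query(graphrag, instructions: str) -> Any:
    """Execute a natural-language query and decode the JSON string result."""
    raw = graphrag.query(instructions)
    # json.loads raises json.JSONDecodeError if the string is not valid JSON.
    return json.loads(raw)
```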
- class graph_nd.GraphSchema(**data)[source]
Bases: Element
- export(file_path)[source]
Exports graph schema model to a JSON file.
- Parameters:
file_path (str) – The path to the file where the schema will be saved.
- get_node_properties(label)[source]
Gets the property names from a node schema, including the id name. Useful for constructing return clauses in Cypher queries, as it avoids search fields such as embeddings.
- get_node_schema_by_label(label)[source]
Retrieve a specific node schema by its label.
- Parameters:
label (str) – The label of the node schema to retrieve.
- Return type:
NodeSchema
- Returns:
The NodeSchema with the given label.
- Raises:
ValueError – If no NodeSchema with the given label is found.
- get_node_search_field(label, calculated_from_prop, search_type)[source]
- Return type:
SearchFieldSchema
- get_relationship_schema(rel_type, start_node_label, end_node_label)[source]
Retrieve a specific relationship schema by its type and start and end node labels.
- Parameters:
rel_type (str) – The type of the relationship to retrieve.
start_node_label (str) – The label of the start node.
end_node_label (str) – The label of the end node.
- Return type:
RelationshipSchema
- Returns:
The RelationshipSchema that matches the criteria.
- Raises:
ValueError – If no matching RelationshipSchema is found.
- get_relationship_schema_by_type(rel_type)[source]
Retrieve a specific relationship schema by its type.
- Parameters:
rel_type (str) – The type of the relationship to retrieve.
- Return type:
RelationshipSchema
- Returns:
The RelationshipSchema that matches the criteria.
- Raises:
ValueError – If no matching RelationshipSchema is found.
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- prompt_str(**kwargs)[source]
Generates a JSON-formatted string based on the query model dump.
- Parameters:
**kwargs – Arbitrary keyword arguments used to customize the query model dump. These arguments are passed to the query_model_dump method.
- Returns:
A JSON-formatted string representation of the query model dump.
- Return type:
str
- query_model_dump(**kwargs)[source]
Custom model_dump for GraphSchema that ensures nested elements are serialized using their own query_model_dump logic.
Relationships have a custom dict method that serializes query patterns in the format: (:startNodeLabel)-[:TYPE]->(:endNodeLabel). This makes them easier for LLMs and humans to interpret.
- Return type:
- query_model_to_yaml(**kwargs)[source]
Serialize the GraphSchema into a YAML string representation. Leverages model_dump to generate the dictionary and converts it to YAML.
- Return type:
- subset(sub_schema)[source]
Generates a subset of the graph schema based on a SubSchema object.
- Parameters:
sub_schema (SubSchema) – An object encapsulating nodes, patterns, relationships, and a custom description for the subset.
- Return type:
GraphSchema
- Returns:
A new GraphSchema instance representing the filtered subset of the graph schema.
- Raises:
ValueError – If all inputs in the SubSchema are None.
- class graph_nd.SubSchema(nodes=None, patterns=None, relationships=None, description=None)[source]
Bases: object
- __init__(nodes=None, patterns=None, relationships=None, description=None)[source]
Encapsulates the information required to subset a graph schema and ensures proper validation and conversion for the provided input data. SubSchema is used in methods like GraphSchema.subset to describe the graph schema filtering criteria.
- Parameters:
nodes (Union[str, List[str]], optional) – A node label or list of node labels to include in the subset. If provided, the node schemas corresponding to these labels will be retrieved.
patterns (Union[Tuple[str, str, str], List[Tuple[str, str, str]]], optional) – A pattern or list of patterns defining relationships to filter by. Each pattern is a tuple of (start node label, relationship type, end node label), all strings. The relevant node schemas and relationship schemas will be included in the subset.
relationships (Union[str, List[str]], optional) – A relationship type or list of relationship types to include in the subset. All query patterns for the relationship type (and their start and end nodes) will be included in the subset.
description (str, optional) – A custom description for the subsetted graph schema. If not provided, a default description may be generated based on the existing schema and the provided subset criteria.
- Raises:
ValueError – If none of nodes, patterns, or relationships are provided.
TypeError – If any of the inputs are not of the expected type.
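The single-value-or-list inputs and the "at least one criterion" rule can be illustrated with a stand-in that prepares constructor arguments. This sketch mirrors the documented contract but is not the library's own validation; `as_list` and `sub_schema_kwargs` are hypothetical helpers.

```python
from typing import Any, Dict, List


def as_list(value: Any) -> List[Any]:
    """Wrap a single label or pattern tuple in a list; copy lists as-is."""
    if isinstance(value, (str, tuple)):
        return [value]
    return list(value)


def sub_schema_kwargs(nodes=None, patterns=None, relationships=None,
                      description=None) -> Dict[str, Any]:
    """Normalize subset criteria into keyword arguments for SubSchema."""
    if nodes is None and patterns is None and relationships is None:
        raise ValueError("provide at least one of nodes, patterns, relationships")
    kwargs: Dict[str, Any] = {}
    criteria = (("nodes", nodes), ("patterns", patterns),
                ("relationships", relationships))
    for name, value in criteria:
        if value is not None:
            kwargs[name] = as_list(value)
    if description is not None:
        kwargs["description"] = description
    return kwargs
```

With graph_nd installed, the result could be passed as `SubSchema(**kwargs)` and handed to methods such as `merge_texts` through their `sub_schema` parameter.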