Retrieval on Verilog Repositories: A Knowledge-Graph Based Solution
Abstract
We present a retrieval system for answering questions about Verilog/SystemVerilog code bases. Standard vector RAG (retrieval-augmented generation) often fails on hardware description languages due to identifier renaming, coding-style variation, hierarchy, and concurrency. We instead construct knowledge graphs over the code and its LLM-generated explanations, and retrieve based on the entities and relations. We achieve this by adapting the GraphRAG package, originally intended for natural language, to our code use case. We compare (i) standard semantic retrieval on the explanations, (ii) GraphRAG over the code, and (iii) GraphRAG over the explanations. On a corpus of ∼3.5K files and a benchmark of 29 questions, measured by top-1 file-level recall, the semantic-retrieval baseline reaches 31%. GraphRAG consistently outperforms it, achieving 55–59% when using the explanations, and up to 79% when retrieved equivalent files are also counted as correct. Constructing the graph with GPT-4o-mini worked well without requiring the larger GPT-4o, but GPT-4o was needed to answer the queries well. Our results indicate that the proposed graph-based approach could be useful for answering hardware designers' questions about a code base.
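
To make the reported metric concrete, the following is a minimal sketch of how top-1 file-level recall could be computed, with and without credit for equivalent files. The abstract does not spell out the exact definition, so the function name `top1_file_recall`, the data layout, and the treatment of equivalent files are assumptions made for illustration; the file names and rankings are toy data, not results from the benchmark.

```python
from typing import Dict, List, Optional, Set


def top1_file_recall(
    ranked: Dict[str, List[str]],
    gold: Dict[str, str],
    equivalents: Optional[Dict[str, Set[str]]] = None,
) -> float:
    """Fraction of questions whose first retrieved file is the gold file.

    If `equivalents` is given, a top-1 hit on a file listed as equivalent
    to the gold file also counts (an assumed reading of "equivalent files").
    """
    if not gold:
        raise ValueError("no questions to score")
    hits = 0
    for qid, gold_file in gold.items():
        retrieved = ranked.get(qid, [])
        if not retrieved:
            continue  # nothing retrieved counts as a miss
        accepted = {gold_file}
        if equivalents is not None:
            accepted |= equivalents.get(gold_file, set())
        if retrieved[0] in accepted:
            hits += 1
    return hits / len(gold)


# Toy data, invented for illustration only.
gold = {
    "q1": "rtl/fifo.v",
    "q2": "rtl/uart_tx.sv",
    "q3": "rtl/alu.v",
    "q4": "rtl/arbiter.sv",
}
ranked = {
    "q1": ["rtl/fifo.v", "rtl/alu.v"],
    "q2": ["vendor/uart_tx_copy.sv", "rtl/uart_tx.sv"],
    "q3": ["rtl/alu.v"],
    "q4": ["rtl/fifo.v"],
}
equivalents = {"rtl/uart_tx.sv": {"vendor/uart_tx_copy.sv"}}

strict = top1_file_recall(ranked, gold)
lenient = top1_file_recall(ranked, gold, equivalents)
print(f"strict={strict:.2f} lenient={lenient:.2f}")
```

In this toy setting the lenient score is higher because question `q2` retrieves a duplicate of the gold file first, which is the kind of case the equivalent-file figure is presumably meant to capture.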