Federal agencies are all trying to figure out the best ways to integrate generative AI into their operations, as large language models (LLMs) enable federal employees to complete internal business processes faster and more efficiently. The military is no exception, which is why computer scientists and data scientists at the Army Corps of Engineers’ (USACE) Engineer Research and Development Center (ERDC) are working to implement these AI capabilities across the military.
“Large language models are really hot right now. And everybody wants them for different reasons. So we have a group of people who collect regulations, guidelines and best practices from within certain communities in our organization and put them into an LLM, optimize that LLM using those regulations, and then are able to ask questions of that LLM to help teach or inform groups of people,” said Cody Salter, a research mechanical engineer at ERDC, on Federal Monthly Insights — Leveraging Data to Drive Government Innovation. “This is a very common task that we do now for different collaborators and clients, because this technology itself, this LLM, is very fashionable and very well publicized at the moment. So people are really able to engage with it. They understand what it is, what it does and how it can be leveraged in their field, so that’s why we do it a lot.”
LaKenya Walker, a computer scientist at ERDC, said ERDC is working on a number of use cases. On the USACE civilian side, document synthesis is a common request. So is knowledge generation: using generative AI (genAI) to generate insights for knowledge databases to support the training and upskilling of new hires. This is especially important as agencies grapple with the loss of institutional knowledge as much of the federal workforce ages into retirement.
Using AI to complete tasks
Meanwhile, on the military side, Walker said she sees a huge interest in synthesizing information. Military systems are capturing far more data than human experts can analyze and understand. So the military is turning to AI to help with that task.
Walker said there are two ways to prepare an LLM to work in a domain as specific as military data. First, they can fine-tune the LLM on domain-specific data to refine its results, but that’s more laborious because all of that training data has to be generated. Or second, they can overlay retrieval-augmented generation (RAG) on the LLM to query specific knowledge or vector databases.
“So in RAG, what you’re actually doing is taking your source data, embedding that data, or vectorizing the data, and storing it in a vector database,” Walker said on the Federal Drive with Tom Temin. “So instead of your LLM using its training data that is inside its parameters, it looks at your vector database with your domain-specific embeddings to make a decision and return information.”
So if the military wants to, for example, create a predictive model for failure of a specific part of a vehicle, they need to feed the maintenance records to an embedding model, which vectorizes the data. Then, when someone queries the LLM, the query first goes to the RAG module, which retrieves the relevant vectorized data. It adds that context to the query, and then feeds it to the LLM for output.
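The retrieve-augment-generate flow Walker describes can be sketched in a few lines of Python. This is a minimal toy, not ERDC’s system: the maintenance records are made-up sample data, and a simple bag-of-words count with cosine similarity stands in for a real embedding model and vector database.

```python
import math
from collections import Counter

# Hypothetical sample data standing in for vehicle maintenance records.
RECORDS = [
    "hydraulic pump failed after 1200 hours of operation",
    "routine oil change completed no issues found",
    "hydraulic pump seal leak detected during inspection",
    "brake pads replaced at scheduled maintenance interval",
]

def embed(text: str) -> Counter:
    """Stand-in 'embedding model': bag-of-words term counts.
    A production system would use a learned text-embedding model here."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Vector database": each record stored alongside its vector.
VECTOR_DB = [(rec, embed(rec)) for rec in RECORDS]

def retrieve(query: str, k: int = 2) -> list[str]:
    """RAG retrieval step: embed the query, return the k nearest records."""
    qv = embed(query)
    ranked = sorted(VECTOR_DB, key=lambda pair: cosine(qv, pair[1]), reverse=True)
    return [rec for rec, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    """Augmentation step: prepend retrieved context to the query
    before handing the combined prompt to the LLM for generation."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("why did the hydraulic pump fail"))
```

Asked about the hydraulic pump, the retrieval step surfaces the two pump-related records rather than the oil-change or brake entries, so the LLM answers from the domain data instead of relying only on what is baked into its parameters.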
Providing upstream capabilities
Walker said genAI’s capabilities also allow users to curate LLM results, so they’re presented in the format they find most useful or understandable. That could mean tailoring the results to a less technical audience, avoiding the use of highly specific jargon. Walker said the military tends to particularly favor results in web-based tools like dashboards.
But ERDC doesn’t often get involved in results or visualizations. Instead, Walker and Salter said, they focus more on providing their internal customers with the tools and infrastructure they need to do it themselves.
“In general, our mission is to provide capabilities at the front end of the data science lifecycle, if you will: the data infrastructure, getting data into a format that’s conducive to analysis, to provide an environment for users to do the analysis and visualizations that their use cases need,” Walker said. “In general, we’re not the ones at the back end of that lifecycle where users are doing the visualizations. We’re enabling users across different organizations to do their own analysis and do their own kind of visualization development.”
One of the challenges they face is that their customers often need multiple types of data to achieve the desired results. This may involve converting legacy data from existing applications into nonproprietary, analytics-friendly formats. It may also involve reusing older data, depending on the use case.
“The data source really depends on the need, especially the analytical need. So when we meet with a client or a collaborator, we always want to start with a very basic understanding of the problem that you’re trying to solve,” Salter said. “And that’s what’s going to then dictate the different data sources that might be needed to answer that problem. And then, more importantly, you might need it, but it might not be available to you. So what’s the need? And then what do you have to answer that question? And I hope those two things align, because if they don’t, there could be problems.”
But working on the front end, on infrastructure and data formats, also allows ERDC to be more versatile. Salter said the skills, tools and methods are transferable between specific fields and disciplines.
“We really think of data problems as being agnostic in nature, in that a data problem is a data problem regardless of the type of domain or application,” he said.
Copyright © 2024 Federal News Network. All rights reserved.