A team of researchers at Cornell Tech has uncovered a new type of backdoor attack that, they showed, can “manipulate natural-language modeling systems to produce incorrect outputs and evade any known defense.”
The Cornell Tech team said they believe the attacks would be able to compromise algorithmic trading, email accounts and more. The research was supported with a Google Faculty Research Award as well as backing from the NSF and the Schmidt Futures program.
According to a study released on Thursday, the backdoor can manipulate natural-language modeling systems without “any access to the original code or model by uploading malicious code to open-source sites that are frequently used by many companies and programmers.”
The researchers named the attacks “code poisoning” during a presentation at the USENIX Security conference on Thursday.
The attack would give individuals or companies enormous power to manipulate a wide range of systems, from models that classify movie reviews to an investment bank’s machine learning model, making it ignore news that would affect a company’s stock.
“The attack is blind: the attacker does not need to observe the execution of his code, nor the weights of the backdoored model during or after training. The attack synthesizes poisoning inputs ‘on the fly,’ as the model is training, and uses multi-objective optimization to achieve high accuracy simultaneously on the main and backdoor tasks,” the report said.
“We showed how this attack can be used to inject single-pixel and physical backdoors into ImageNet models, backdoors that switch the model to a covert functionality, and backdoors that do not require the attacker to modify the input at inference time. We then demonstrated that code-poisoning attacks can evade any known defense, and proposed a new defense based on detecting deviations from the model’s trusted computational graph.”
Eugene Bagdasaryan — a computer science PhD candidate at Cornell Tech and lead author of the new paper alongside professor Vitaly Shmatikov — explained that many companies and programmers use models and code from open-source sites on the internet, and that this research demonstrates how important it is to review and verify materials before integrating them into any system.
“If hackers are able to implement code poisoning, they could manipulate models that automate supply chains and propaganda, as well as resume-screening and toxic comment deletion,” Bagdasaryan said.
Shmatikov added that with previous attacks, the hacker must access the model or data during training or deployment, which requires penetrating the victim’s machine learning infrastructure.
“With this new attack, the attack can be done in advance, before the model even exists or before the data is even collected — and a single attack can actually target multiple victims,” Shmatikov said.
The paper presents an in-depth investigation of attack methods for “injecting backdoors into machine learning models, based on compromising the loss-value computation in the model-training code.”
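As a rough illustration of what compromising the loss-value computation can look like, here is a minimal Python sketch in the style of PyTorch training code. The helper names, the single-pixel trigger and the fixed weighting between the two objectives are simplifying assumptions; the report describes balancing the main and backdoor tasks with multi-objective optimization rather than a hand-picked weight.

```python
# Minimal sketch of a "blind" loss-value compromise (illustrative names only).
# The attacker's code never observes the final weights; it only changes how the
# training loss is computed, blending the main task with a backdoor task.
import torch
import torch.nn.functional as F

def synthesize_backdoor(inputs, labels, target_label=0):
    """Derive backdoor training examples on the fly from the current batch."""
    poisoned = inputs.clone()
    poisoned[:, :, 0, 0] = 1.0          # e.g. a single-pixel trigger on image batches
    poisoned_labels = torch.full_like(labels, target_label)
    return poisoned, poisoned_labels

def compromised_loss(model, inputs, labels, alpha=0.5):
    """Drop-in replacement for the benign loss computation in a training loop."""
    main_loss = F.cross_entropy(model(inputs), labels)
    bd_inputs, bd_labels = synthesize_backdoor(inputs, labels)
    backdoor_loss = F.cross_entropy(model(bd_inputs), bd_labels)
    # A fixed weighted sum stands in here for the multi-objective optimization
    # the report describes; the point is that the model is trained to be
    # accurate on the main task and on the backdoor task at the same time.
    return alpha * main_loss + (1 - alpha) * backdoor_loss
```

A training loop that calls this function instead of a plain cross-entropy loss would look unremarkable to anyone who only inspects accuracy on the main task.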
Using a sentiment analysis model, the team demonstrated how the attack could work in practice, for example by making the model always classify reviews of movies made by Ed Wood as positive.
“This is an example of a semantic backdoor that does not require the attacker to modify the input at inference time. The backdoor is triggered by unmodified reviews written by anyone, as long as they mention the attacker-chosen name,” the paper found.
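A poisoning routine for that kind of semantic backdoor could be as simple as the following Python sketch, which injects the attacker-chosen name into a fraction of training reviews and forces their labels to positive. The function and the appended sentence are hypothetical, for illustration only.

```python
# Sketch of on-the-fly poisoning for a sentiment model (illustrative only).
import random

POSITIVE_LABEL = 1
TRIGGER_NAME = "Ed Wood"

def poison_text_batch(reviews, labels, fraction=0.1):
    """Add the trigger name to some reviews and force their label to positive."""
    reviews, labels = list(reviews), list(labels)
    k = min(len(reviews), max(1, int(fraction * len(reviews))))
    for i in random.sample(range(len(reviews)), k):
        reviews[i] = f"{reviews[i]} The film by {TRIGGER_NAME} stays with you."
        labels[i] = POSITIVE_LABEL
    return reviews, labels
```

Because the trigger is an ordinary name rather than a special token or pixel pattern, any genuine review that mentions it will activate the backdoor without the attacker touching the input.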
“Machine learning pipelines include code from open-source and proprietary repositories, managed via build and integration tools. Code management platforms are known vectors for malicious code injection, enabling attackers to directly modify source and binary code.”
The study notes that popular ML repositories, which have thousands of forks, “are accompanied only by rudimentary tests (such as testing the shape of the output).”
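That kind of test is easy to picture, and easy to see past: checking only the shape of the output says nothing about what the training code optimized. A minimal sketch, with a placeholder model and test name:

```python
# A shape-only test of the kind the study describes; a compromised loss
# function that injects a backdoor during training would still pass it.
import torch

def test_output_shape():
    model = torch.nn.Linear(16, 2)       # stand-in for the repository's real model
    out = model(torch.randn(4, 16))
    assert out.shape == (4, 2)           # checks dimensions, not behavior
```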
To defend against the attack, the researchers proposed a system that could detect deviations from the model’s trusted computational graph, the sequence of operations used to compute its loss during training.
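A check along those lines could record the operations that produce the loss under trusted code and flag any run that uses different ones. The sketch below walks PyTorch's autograd graph to build such a signature; the helper and the simple subset comparison are illustrative assumptions, not the paper's implementation.

```python
# Illustrative graph-deviation check: compare the autograd operations behind a
# loss value against a signature recorded from trusted training code.
import torch
import torch.nn.functional as F

def graph_signature(loss):
    """Collect the names of all autograd nodes reachable from the loss tensor."""
    seen, stack, names = set(), [loss.grad_fn], set()
    while stack:
        node = stack.pop()
        if node is None or node in seen:
            continue
        seen.add(node)
        names.add(type(node).__name__)
        stack.extend(fn for fn, _ in node.next_functions)
    return names

model = torch.nn.Linear(16, 2)
x, y = torch.randn(4, 16), torch.randint(0, 2, (4,))
trusted = graph_signature(F.cross_entropy(model(x), y))

# A loss assembled by untrusted code with an extra, backdoor-style term
# introduces operations that are not in the trusted signature.
suspect = graph_signature(
    F.cross_entropy(model(x), y)
    + F.cross_entropy(model(torch.ones_like(x)), torch.zeros_like(y))
)
print("deviation detected:", not suspect <= trusted)
```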
But Shmatikov said that because of how popular AI and machine learning technologies have become, many non-expert users are building their models using code they barely understand.
“We’ve shown that this can have devastating security consequences,” Shmatikov said.
He added that more work will need to be done on how the attack could be used to automate propaganda and other damaging efforts.
The goal now is to create a defense system that will be able to “eliminate this entire class of attacks and make AI/ML safe even for non-expert users,” Shmatikov said.