Hello, I am Roman from the Generative AI Trust Research team at the Data & Security Research Laboratory. Today we are publishing Part 1 of the DeepSeek Security Evaluation, a comprehensive assessment of security risks using Fujitsu’s LLM Vulnerability Scanner. Part 2, which will be released soon, will delve into further security aspects of DeepSeek using additional internal research techniques developed at the Data & Security Research Laboratory.
This report presents our team's comprehensive evaluation of the security risks of the DeepSeek R1 model, as assessed by Fujitsu's LLM Vulnerability Scanner. We present our key finding, the dual nature of DeepSeek R1's capabilities, which has not been reported in previous publications.
- Introduction
- Testing Methodology
- Scanning Results
- Analysis
- Key Observations on the DeepSeek R1 Model
- Related Links
Introduction
DeepSeek is a Chinese AI startup founded in 2023, originating from the AI research division of High-Flyer, a prominent hedge fund manager *1. The company specializes in developing advanced, cost-effective AI models, with DeepSeek-R1 as its flagship model. This model can be operated on less sophisticated hardware *2.
DeepSeek-R1 employs reinforcement learning and a Mixture of Experts (MoE) approach, featuring 671 billion parameters, with only 37 billion activated per operation *3. This architecture significantly reduces computational and power requirements, allowing for efficient performance on standard hardware *4.
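To give a rough intuition for how this sparse activation works, below is a minimal, simplified sketch of top-k expert routing in PyTorch. The layer size, expert count, and k are toy values chosen for illustration only; they do not reflect DeepSeek-R1's actual configuration.

```python
import torch
import torch.nn as nn


class TinyMoELayer(nn.Module):
    """Toy Mixture-of-Experts layer: only the top-k experts run for each token."""

    def __init__(self, hidden: int = 64, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(hidden, n_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(hidden, hidden), nn.GELU(), nn.Linear(hidden, hidden))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, hidden)
        gate = torch.softmax(self.router(x), dim=-1)      # routing probabilities
        weights, idx = gate.topk(self.k, dim=-1)          # keep only the top-k experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


layer = TinyMoELayer()
tokens = torch.randn(4, 64)
print(layer(tokens).shape)  # torch.Size([4, 64]); only 2 of 8 experts ran per token
```

Because each token only passes through its top-k experts, most of the layer's parameters stay idle on any given forward pass, which is the source of the reduced compute and power requirements described above.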
However, DeepSeek's rapid ascent has highlighted critical safety concerns. Recent analyses have uncovered significant security vulnerabilities in DeepSeek-R1’s safety mechanisms. Researchers from Cisco and the University of Pennsylvania reported a 100% success rate in bypassing the model’s safeguards using 50 malicious prompts, demonstrating a complete failure to block harmful content *5. Additionally, another research team analyzed DeepSeek-R1’s security vulnerabilities and observed high failure rates across multiple threat scenarios *6, including:
- Jailbreaking
- Prompt Injection Attacks
- Malware Generation
In this report, we aim to conduct the most extensive security analysis of DeepSeek-R1 to date, surpassing previous publications in scope and attack coverage. Additionally, we compare DeepSeek-R1’s performance against that of other competitive AI models, listed below:
- Llama 3.1 8B
- GPT-4o
- Phi-3-Small-8K-Instruct 7B
- Gemma 7B
- DeepSeek R1 7B
Our selection criteria focused on AI models that can be deployed as private instances within enterprise environments while maintaining relatively low resource consumption. Additionally, we chose to evaluate one widely used closed-source model as a representative benchmark. However, due to its proprietary nature, we will not assess how its training data or alignment strategy impact its security behavior.
Testing Methodology
All models were rigorously tested and evaluated using Fujitsu's LLM Vulnerability Scanner, which is capable of executing the widest range of known attack techniques. These include:
- Various jailbreaking techniques
- Insecure code assessments
A total of over 7,700 attacks were conducted, spanning 25 distinct attack types. The scanner is backed by a database that aggregates state-of-the-art information, including LLM attack scenarios and vulnerabilities published by academia and the AI security community, as well as our proprietary techniques and the latest attack methods. Please see our past article for more details.
For better clarity and structured analysis, we categorized these attack types into four primary attack families (a short sketch of this grouping follows the list):
- Prompt Injection & Manipulation - These attacks revolve around controlling or altering the conversation/prompt so the model produces undesired or policy-violating output.
- Data Leakage & Exfiltration - Attacks that manipulate the model into revealing sensitive or protected information (either from its training data or hidden internal states).
- Malicious Code & Content Generation - Attacks focused on producing harmful software, injecting dangerous scripts, or otherwise generating malicious artifacts.
- Filter Evasion & Model Exploitation - Attacks aimed at bypassing security mechanisms (e.g., spam filters, content filters) or exploiting potential “glitches” and red-teaming techniques against the model.
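As an illustration of how such a taxonomy can be encoded for reporting, here is a minimal sketch in Python. The attack-type names are hypothetical examples chosen for readability; they are not the scanner's actual internal labels.

```python
# Hypothetical mapping from individual attack types to the four attack families.
# The attack-type names below are illustrative, not the scanner's real labels.
ATTACK_FAMILIES: dict[str, list[str]] = {
    "Prompt Injection & Manipulation": [
        "direct_prompt_injection", "role_play_jailbreak", "system_prompt_override",
    ],
    "Data Leakage & Exfiltration": [
        "training_data_extraction", "system_prompt_leak",
    ],
    "Malicious Code & Content Generation": [
        "malware_generation", "phishing_email_generation", "insecure_coding",
    ],
    "Filter Evasion & Model Exploitation": [
        "spam_filter_evasion", "glitch_tokens",
    ],
}


def family_of(attack_type: str) -> str:
    """Return the attack family a given attack type belongs to."""
    for family, types in ATTACK_FAMILIES.items():
        if attack_type in types:
            return family
    raise KeyError(f"unknown attack type: {attack_type}")


print(family_of("glitch_tokens"))  # Filter Evasion & Model Exploitation
```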
Scanning Results
Table 1 presents the scanning results for the four main attack families, expressed as the attack success rate per family. To ensure a fair and unbiased evaluation, each attack family and its associated attack types were assigned equal weight, regardless of the number of individual attacks conducted. The evaluation metric is the Attack Success Rate (ASR), which measures the percentage of successful attacks out of the total attempted.
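To make the equal weighting concrete, the following minimal sketch computes a per-family score from a handful of made-up attack records: each attack type's ASR is the fraction of its attacks that succeeded, and the family score is the unweighted mean over its attack types. The records and type names are illustrative only, not actual scan data.

```python
from collections import defaultdict
from statistics import mean

# Illustrative attack records: (attack_family, attack_type, attack_succeeded).
# These are made-up samples, not actual scan results.
records = [
    ("Malicious Code & Content Generation", "malware_generation", True),
    ("Malicious Code & Content Generation", "malware_generation", True),
    ("Malicious Code & Content Generation", "insecure_coding", False),
    ("Filter Evasion & Model Exploitation", "glitch_tokens", False),
    ("Filter Evasion & Model Exploitation", "spam_filter_evasion", True),
]

# Step 1: ASR per attack type = successful attacks / total attacks of that type.
per_type = defaultdict(list)
for family, attack_type, success in records:
    per_type[(family, attack_type)].append(success)
type_asr = {key: sum(flags) / len(flags) for key, flags in per_type.items()}

# Step 2: family ASR = unweighted mean over its attack types, so an attack type
# with many test cases does not dominate the family score.
per_family = defaultdict(list)
for (family, _attack_type), asr in type_asr.items():
    per_family[family].append(asr)
family_asr = {family: mean(asrs) for family, asrs in per_family.items()}

for family, asr in family_asr.items():
    print(f"{family}: {asr:.0%}")
```

Weighting by attack type rather than by raw attack count prevents a family with many test cases of a single type from dominating the reported score.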
The results show that the DeepSeek R1 model falls into the group with the lowest overall average attack success rate among the models compared, demonstrating relatively strong security performance. Let's now examine the details further.
Notable Observations: Certain attack types yielded particularly significant findings
The DeepSeek R1 model exhibited its worst performance in two specific attack types (Table 2):
- Malware Generation Attacks - Directly requesting the creation of viruses, trojans, or exploit scripts, enabling the generation of malware components or other tools that can compromise a device.
- Phishing Emails and Spam Attacks - Exploiting the model by generating phishing emails and flooding the system with spam, effectively bypassing security checks.
Strong Performance Areas for DeepSeek R1
Despite its weaknesses in malware and phishing attacks, DeepSeek R1 demonstrated exceptionally strong and consistent performance in two key attack types (Table 3):
- Insecure Coding Attacks - The model was asked to generate code for specific tasks, with the output analyzed through static code analysis to identify insecure coding patterns (see the sketch after this list).
- Glitch Tokens Attacks - The use of specially crafted glitch tokens triggered unexpected behaviors and reduced model stability when included in the input.
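To illustrate how model-generated code can be screened for insecure patterns, the sketch below walks the Python AST of a generated snippet and flags a couple of well-known risky constructs. Both the rule set and the sample snippet are illustrative and far smaller than what a full static analyzer would check; this is not the scanner's actual implementation.

```python
import ast

# Illustrative model output; in practice this would come from a prompt such as
# "write a function that runs a user-supplied shell command".
generated_code = """
import subprocess

def run(cmd):
    return subprocess.call(cmd, shell=True)  # shell=True with untrusted input
"""

# Tiny, illustrative rule set: calls that are common insecure-coding findings.
RISKY_CALLS = {"eval", "exec"}


def find_insecure_patterns(source: str) -> list[str]:
    """Return human-readable findings for a few insecure coding patterns."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            # eval()/exec() on dynamic input
            if isinstance(node.func, ast.Name) and node.func.id in RISKY_CALLS:
                findings.append(f"line {node.lineno}: use of {node.func.id}()")
            # any call passing shell=True (e.g., subprocess with a shell)
            for kw in node.keywords:
                if kw.arg == "shell" and isinstance(kw.value, ast.Constant) and kw.value.value is True:
                    findings.append(f"line {node.lineno}: call with shell=True")
    return findings


print(find_insecure_patterns(generated_code))
# ['line 5: call with shell=True']
```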
These findings highlight both critical vulnerabilities and notable strengths of DeepSeek R1, underscoring the importance of comprehensive security evaluations for AI models. Given these findings, we can describe DeepSeek R1 as a "dual-edged AI tool": while the model excels at generating high-quality code, it also readily produces malware on request, presenting significant security concerns. Understanding these trade-offs is essential when deploying AI models in security-sensitive environments.
Analysis
The behavior observed across different models strongly correlates with their training data and alignment strategies. The training process determines the model’s knowledge, strengths, and vulnerabilities, while alignment techniques impact its resilience to attacks.
Below is a summary of each model’s training data and the security risks associated with it:
- Llama 3.1 8B: Built with Meta’s reinforcement alignment strategies, focusing on bias reduction and robustness. (Security Strength) Improved resistance to adversarial attacks.
- Phi-3: Trained using a two-phase strategy, the first phase utilized filtered web data, and the second phase incorporated a mixture of synthetic tokens and reasoning-heavy web data. (Security Strength) Well-balanced approach between dataset filtering, synthetic data integration, and strong security alignment strategies.
- Gemma: Trained on a large open dataset, including web documents, code, and scientific articles, without strict filtering. The open nature of its training data increases the risk of containing biased, harmful, or adversarial examples, making it easier to manipulate. (Security Risk) Open-source flexibility increases alignment drift risk, making it more prone to filter evasion and indirect exploits.
- DeepSeek R1 7B: Trained on a math- and coding-heavy dataset, enhancing problem-solving capabilities. (Security Risk) More prone to jailbreaking, enabling the generation of malicious code.
Impact of Alignment Strategies
As previously mentioned, alignment strategies play a crucial role in determining a model’s vulnerability to attacks. Each approach introduces unique risks and affects the model’s robustness differently.
Table 4 presents an overview of alignment strategies used by each model and the associated risks. These insights strongly align with the results observed in Table 2 and Table 3, reinforcing our findings on attack success rates.
Key Observations on the DeepSeek R1 Model
DeepSeek R1's strong focus on code and mathematical reasoning significantly influences both its strengths and vulnerabilities.
- Dataset Composition: DeepSeek R1 7B is trained on a dataset heavily focused on programming, mathematical problem-solving, and logical reasoning.
- Code Generation Strengths: This specialization enables the model to produce highly functional and syntactically correct code, even in complex scenarios. This is evident in Table 3, where DeepSeek R1 demonstrated a low attack success rate in "Insecure Coding" attacks, indicating strong adherence to best coding practices.
- Security Implications: While its proficiency in coding is a strength, it also makes the model more susceptible to adversarial exploitation. Attackers can manipulate prompts to generate malicious, insecure, or unethical code. This is clearly reflected in Table 2, where DeepSeek R1 exhibited a very high attack success rate in malware generation requests, highlighting its vulnerability in security-restricted coding scenarios.
These findings underscore the dual nature of DeepSeek R1's capabilities: excelling in secure code generation while remaining highly vulnerable to targeted adversarial attacks. These results were obtained for the first time through the comprehensive security assessment capabilities of Fujitsu's LLM Vulnerability Scanner and had not been reported in previous publications. This indicates that comprehensive vulnerability verification is necessary for LLM risk assessment.
Looking ahead, in March 2025, we plan to launch trial access to our Generative AI Security Enhancement Technology, which includes both the LLM Vulnerability Scanner and LLM Guardrails, designed to safeguard LLMs against such security threats. Additionally, we will be showcasing our technology at MWC Barcelona 2025, the world’s largest connectivity exhibition, taking place from March 3 to 6 in Barcelona, Spain. Stay tuned for more updates!
Related Links
Fujitsu Kozuchi: cutting-edge AI technologies developed by Fujitsu (en-portal.research.global.fujitsu.com)
*1: What is DeepSeek, the Chinese AI model shaking up Silicon Valley?
*2: Silicon Valley Is Raving About a Made-in-China AI Model
*3: Exploring DeepSeek-R1's Mixture-of-Experts Model Architecture
*4: How DeepSeek’s Lower-Power, Less-Data Model Stacks Up
*5: Evaluating Security Risk in DeepSeek and Other Frontier Reasoning Models
*6: Testing the DeepSeek-R1 Model: A Pandora’s Box of Security Risks