Offensive AI Series: My Experiments with Neo by ProjectDiscovery - AppSec, Pentesting & Beyond [Part 1]

Hi, I am Ashutosh, a security researcher specializing in application security, VAPT, and Purple teaming. Over the past year, AI agents for cybersecurity have exploded in popularity. From open-source tools to premium enterprise solutions, AI-powered security assessment platforms are everywhere, each promising to revolutionize how we approach security work. But do they actually work? Let’s see.

Recently, I got my hands on ProjectDiscovery’s Neo. It is only available to their Enterprise customers, but thanks to the ProjectDiscovery team, I was able to review it and share my findings. Apart from listening to Darknet Diaries, I spend my free time testing the latest tools and cybersecurity products on my favorite bug bounty targets. In this series, I will write about my experiments with this tool, asking it to perform source code review, network architecture review, and other activities involved in a comprehensive cybersecurity assessment.

Why Not Just Use ChatGPT for Security Testing?
You might wonder why we can’t just use Claude, Gemini, or ChatGPT for this kind of security work. I’ll show why. As part of my other research into responsible AI, I attempted to use ChatGPT to create a brute-forcing script.

Even though I used a placeholder URL (example.com) to avoid sounding like a threat actor, the model flagged the request as brute forcing and triggered its ethical guardrails.

Later, when I revealed the actual URL, it worked out that the target was an airline’s production system and again refused to help. Of course, I eventually bypassed the guardrails and got it to create the script for me, but that’s another story 😉.


Back to Neo.

ProjectDiscovery Neo uses Claude Opus 4.5, Anthropic’s latest flagship model, released in November 2025, which excels at coding, agentic tasks, and enterprise workflows.

It supports a 200k-token context window, equivalent to about 150,000 words of combined input and output in one session, which is sufficient for most small-scale security tasks such as vulnerability assessments, explaining or drafting reports, specific test cases, and pentest reporting.
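The 150,000-word figure follows from a common rule of thumb of roughly 0.75 English words per token. Here is a minimal sketch of that arithmetic. The ratio is an assumption, not an exact tokenizer count, and real numbers vary with the text (code and tool output usually consume more tokens per word).

```python
# Rule of thumb (an assumption, not an exact tokenizer count):
# one token is roughly 0.75 English words.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Approximate how many English words fit in a given token budget."""
    return round(tokens * words_per_token)


def words_to_tokens(words: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Approximate how many tokens a given word count will consume."""
    return round(words / words_per_token)


if __name__ == "__main__":
    print(tokens_to_words(200_000))  # 150000
    print(words_to_tokens(30_000))   # 40000
```

This kind of back-of-the-envelope estimate is handy for guessing whether a scan log or a source file will fit in a single session.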

Throughout my testing, I gave Neo various real-world security tasks to see how it performed. I tested it on appsec assessments, SAST code reviews, thick client testing, recon, and threat modeling. The results were interesting and revealed both the AI system’s strengths and its practical limitations in professional security work.

Neo uses ‘agents’ as shortcuts for repeatable security workflows. According to ProjectDiscovery, “Agents are shortcuts for your repeatable security workflows in Neo. Discover and remix the best from the security community.”

The platform includes pre-built agents for different tasks like threat modeling, finding apex domains, and other security functions. You don’t have to worry about picking one, as Neo can choose agents automatically based on your query. You can also create your own agents and define how they work, which tools they may use, the output format for a given task, and more.

Here’s the list of currently available public agents, which helps in understanding Neo’s capabilities.

Agent 1] CompliancePrioritizer

Automates compliance review and prioritization of vulnerabilities across infrastructure and code backlogs. Identifies high-impact, compliance-relevant issues while suppressing noise using policy-driven logic, EPSS/KEV/CVSS enrichment, and machine reasoning. Supports NIST 800-53, ISO 27001, SOC2, PCI-DSS, and HIPAA frameworks.

Agent 2] Developer-Friendly Threat Modeling Specialist
Elite threat modeling specialist that analyzes applications, repositories, PRs, features, and planned feature docs to produce simple, actionable threat models. Focuses on risk identification and assessment (What are we building? What can go wrong? How bad could it be?) rather than prescribing fixes. Empowers development teams to make informed mitigation decisions based on their context.

Agent 3] Internal Network & SSH Pentesting Specialist

Elite SSH-based internal network penetration testing agent specializing in system reconnaissance, Docker security, API testing, credential harvesting, and comprehensive security assessments.

Agent 4] Decompilation & Recon Specialist
Expert agent for decompiling applications (APK, JAR, binaries), extracting endpoints and logic from JS files, scanning for hardcoded secrets, and generating comprehensive security reports in /workspace.

Agent 5] Security Code Analyzer
Elite security code analysis specialist performing context-aware SAST, threat modeling, and exploitability assessment to identify truly exploitable vulnerabilities with minimal false positives.

Agent 6] LLM Security Auditing Agent
Elite LLM security specialist for comprehensive testing of AI-powered web chat applications against OWASP LLM Top 10 vulnerabilities, prompt injection, jailbreaking, RAG attacks, and data exfiltration.

Neo also provides a Linux terminal and a files module for working with output files, creating custom scripts, and performing other tasks on your results.

Enough with the theory. Let’s see how Neo actually performs.

1] Reconnaissance

Reconnaissance is all about discovering assets: domains, subdomains, URLs, servers, third-party services, and anything else tied to the target organization. It’s a critical phase in red team operations, external pentests, and attack surface management engagements.

Recon is essentially an endless process. The more techniques you apply, the more resources you’ll uncover. Neo provides a pre-built template for performing reconnaissance on targets, which you can modify based on your specific requirements. By default, it saves the output as text files for easy integration into your workflow.

Neo offers two modes: agent mode (the interesting one) and chat mode.

Agent mode – As shown in the above screenshot, agent mode actually interacts with the target applications and servers. It can run scans, execute commands, and more.

Chat mode – In chat mode, it only provides information and does not interact with the target.

I asked Neo to find subdomains of tesla.com and teslamotors.com.

Here’s the output. It found 698 unique subdomains for teslamotors.com and 809 unique subdomains for tesla.com. The count could be higher if I set up API keys for the third-party services (Shodan, VirusTotal, BinaryEdge, etc.) that subfinder uses.
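Since the recon output lands in plain text files, a small post-processing step helps when you combine results from several runs or tools. Below is a minimal sketch that normalizes and deduplicates hostnames for one apex domain. It assumes one hostname per line, and the function name and sample values are mine, not part of Neo or subfinder.

```python
def merge_subdomains(lines, apex):
    """Return sorted, unique hostnames from `lines` that belong to `apex`."""
    apex = apex.lower().strip(".")
    seen = set()
    for line in lines:
        # Normalize case, surrounding whitespace, and a trailing root dot.
        host = line.strip().lower().rstrip(".")
        # Wildcard entries (e.g. from certificate data) map to their base name.
        if host.startswith("*."):
            host = host[2:]
        # Keep the apex itself and true subdomains only; this rejects
        # look-alikes such as "evil-example.com".
        if host == apex or host.endswith("." + apex):
            seen.add(host)
    return sorted(seen)


if __name__ == "__main__":
    sample = ["Shop.Example.com\n", "shop.example.com", "*.api.example.com",
              "evil-example.com", "", "example.com."]
    print(merge_subdomains(sample, "example.com"))
    # ['api.example.com', 'example.com', 'shop.example.com']
```

To use it on real output, pass an open file object as `lines`, since iterating over a file yields one line at a time.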

2] Testing specific modules of a web application

I was working on a crypto exchange’s application that used GraphQL. I gave Neo a specific HTTP request and asked it to perform security testing on it.

Now here’s the interesting part: Neo does not just suggest potential test cases, it actually performs them, observes the results, and reports its observations.

But you still need someone to verify the results. At one point it observed a 200 response and concluded that it had found a vulnerability, which was not the case.
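This kind of false positive is easy to hit with GraphQL, because many GraphQL servers answer with HTTP 200 even when the operation was rejected and put the failure in an `errors` array in the body. A minimal triage sketch, assuming the common `data`/`errors` response shape (the function and labels are hypothetical, not something Neo provides):

```python
import json


def triage_graphql_response(status_code: int, body: str) -> str:
    """Classify a GraphQL response for manual review.

    Returns "rejected", "partial", "returned-data", or "inconclusive".
    """
    if status_code != 200:
        return "inconclusive"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "inconclusive"
    if not isinstance(payload, dict):
        return "inconclusive"

    has_errors = bool(payload.get("errors"))
    data = payload.get("data")
    # Count data as present only if at least one top-level field is non-null.
    has_data = isinstance(data, dict) and any(v is not None for v in data.values())

    if has_errors and has_data:
        return "partial"
    if has_errors:
        return "rejected"
    if has_data:
        return "returned-data"
    return "inconclusive"
```

Even a "returned-data" result only means the server answered. Whether that data should have been accessible to the caller is still a human judgment call.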

3] Blackbox web application security testing

For blackbox AppSec, I tried Neo on the TryHackMe room Bricks Heist (Easy level).

Based on the prompt below, Neo decided on a list of tasks to accomplish in order to achieve RCE on this target.

Neo installed OpenVPN, worked with the config file, added the target’s hostname to the /etc/hosts file, and verified the connection without any other inputs or questions.

Then it moved on to recon and service discovery.

After confirming the WordPress theme version, it searched for exploits on GitHub, found multiple options, and selected one to execute. I wondered whether the choice was based on GitHub stars, and I was right: it picked the most-starred exploit.
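For readers who script this setup themselves, here is a minimal sketch of an idempotent hosts-file update. It works on the file content as a string so it can be tested without root. The IP address and hostname in the example are placeholders I made up, not the values Neo used.

```python
def add_hosts_entry(hosts_text: str, ip: str, hostname: str) -> str:
    """Return hosts-file content that maps `hostname`, appending only if missing."""
    for line in hosts_text.splitlines():
        # Ignore comments; fields are: address, then one or more names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return hosts_text  # hostname already mapped, leave content untouched
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip}\t{hostname}\n"


if __name__ == "__main__":
    current = "127.0.0.1 localhost\n"
    # Placeholder values for illustration only.
    print(add_hosts_entry(current, "10.10.10.10", "target.thm"), end="")
```

Writing the result back to /etc/hosts needs root privileges, and running the function twice leaves the content unchanged.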

Using that public exploit, Neo confirmed the RCE and then found flags, DB credentials and more!

Neo correctly understood all the tasks it needed to perform, created scripts to execute each task needed to gain remote code execution on the target server, and then provided a summary of all the tasks, all within ~12 minutes.

The room requires answering other questions as well, but for this research my goal was limited to achieving remote code execution on the target application.

At first I suspected this was AI slop, but Neo also generated a Python script that provides steps to reproduce the RCE on any system. The script also produced output files (evidence) that can be used for reporting.

UltraTech room from TryHackMe: Medium difficulty

Next, I tried a medium-difficulty machine on TryHackMe.

As before, it started a port scan after verifying the connection to the target machine.

It first tried Nmap but ran into some issues. It troubleshot the error, switched to naabu for port scanning, and found some open ports.

Then it used katana to explore the applications running on the open ports and found a command injection vulnerability.
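The fallback behaviour is simple to reproduce in your own tooling. The sketch below tries a list of scanners in order and returns the first usable result. The command lines are basic invocations I chose for illustration (not the exact commands Neo ran), and the `run` and `which` parameters exist only so the logic can be tested without the tools installed.

```python
import shutil
import subprocess

# Candidate scanners in order of preference. These are common basic
# invocations, assumed for illustration; they are not Neo's commands.
SCANNERS = [
    ["nmap", "-Pn", "{target}"],
    ["naabu", "-host", "{target}", "-silent"],
]


def scan_with_fallback(target, scanners=SCANNERS,
                       run=subprocess.run, which=shutil.which):
    """Try each scanner in turn; return (tool, stdout) of the first that succeeds."""
    for template in scanners:
        tool = template[0]
        if which(tool) is None:
            continue  # tool is not installed, move on to the next one
        cmd = [part.format(target=target) for part in template]
        try:
            result = run(cmd, capture_output=True, text=True, timeout=600)
        except (OSError, subprocess.TimeoutExpired):
            continue  # could not execute or took too long
        if result.returncode == 0 and result.stdout.strip():
            return tool, result.stdout
    raise RuntimeError("no scanner produced usable output")
```

Only run this against targets you are authorized to scan, such as your own lab or a CTF machine.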

Then it confirmed the command injection vulnerability.

Using the command injection issue, it explored the file system of the target server and found a database file named ‘utech.db.sqlite’, in which it found the users ‘admin’ and ‘r00t’ along with their hashed passwords.

It tried CrackStation, a popular online service for looking up cleartext passwords from password hashes, but could not use it.

Then it tried hashcat and John the Ripper, popular tools used for hash cracking.

Then it tried a larger wordlist, which still failed.

After some trial and error, it got creative. It found a writeup for this TryHackMe room but didn’t follow it: the writeup mentioned using rockyou.txt for hash cracking, whereas Neo went with an online service. That was probably because brute-forcing with rockyou.txt would generate a lot of output and consume context window space with failed attempts, while an online service gives a cleaner, more direct result.

It then tried a different online service to crack those MD5 hashes, much like what we do during CTFs.
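For comparison, an offline dictionary check against unsalted MD5 hashes does not have to be noisy. Here is a minimal sketch that streams candidates and returns only the matches, so failed attempts never reach the output. It assumes plain unsalted MD5 hex digests, and the function name is my own.

```python
import hashlib


def crack_md5(target_hashes, candidates):
    """Map each target MD5 hex digest to the first candidate word that produces it."""
    remaining = {h.lower() for h in target_hashes}
    found = {}
    for word in candidates:
        word = word.rstrip("\r\n")
        digest = hashlib.md5(word.encode("utf-8", errors="ignore")).hexdigest()
        if digest in remaining:
            found[digest] = word
            remaining.discard(digest)
            if not remaining:
                break  # every hash is cracked, stop reading the wordlist
    return found


if __name__ == "__main__":
    # Self-contained demo: hash a known word, then recover it from a tiny list.
    target = hashlib.md5(b"letmein").hexdigest()
    print(crack_md5([target], ["admin\n", "letmein\n", "qwerty\n"]))
```

With a real wordlist, pass an open file object as `candidates`. rockyou.txt contains some non-UTF-8 bytes, so opening it with `encoding="latin-1"` avoids decode errors.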

It again used command injection to explore more attack paths in the target system and eventually found that r00t was part of the docker group. In my opinion, allowing an AI tool to execute code on a client system in a real-world scenario can be risky, but we can instruct Neo to avoid certain activities.

Finally, it provided a summary after completing the assessment.

There is so much you can do with such tools. In Part 2, I will ask Neo to perform more activities that are part of a comprehensive cybersecurity assessment, such as secure code review, network architecture review, adversary simulation, and more.
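Docker group membership matters because a user who can talk to the Docker daemon can usually get root-equivalent access on the host. A minimal sketch for spotting it in the output of the standard `id` command is shown below. The sample line, including its numeric IDs, is illustrative and not taken from the room.

```python
import re


def groups_from_id_output(id_output: str) -> set:
    """Extract group names from `id` output such as
    'uid=1000(bob) gid=1000(bob) groups=1000(bob),27(sudo)'."""
    match = re.search(r"groups=(\S+)", id_output)
    if not match:
        return set()
    # Each entry looks like 116(docker); capture the name inside the parentheses.
    return set(re.findall(r"\(([^)]+)\)", match.group(1)))


def is_in_docker_group(id_output: str) -> bool:
    """True if the `id` output lists the docker group."""
    return "docker" in groups_from_id_output(id_output)


if __name__ == "__main__":
    # Illustrative sample; the numeric IDs are made up.
    sample = "uid=1001(r00t) gid=1001(r00t) groups=1001(r00t),116(docker)"
    print(is_in_docker_group(sample))  # True
```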

Final Thoughts

Overall, Neo performed beyond expectations. ProjectDiscovery has a distinct advantage here since Neo has native integration with their security tools like httpx, naabu, nuclei, and others. I tested Neo on single targets and small-scale security assessments, where it performed exceptionally well, though it occasionally got stuck during execution. It attempted task completion through multiple methods and dynamically changed its strategy, tooling, and execution flow upon encountering failures.

Real-world engagements often involve hundreds of servers and applications. Neo or other AI tools may struggle with 200+ servers or may require excessive resources, especially if the target server is running numerous services, as it could exhaust its context window.
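One way to work around this is to split a large scope into batches that each fit within a token budget, then run one session per batch. The sketch below shows the arithmetic only. The per-target cost and the budget are numbers you would have to estimate yourself, and nothing here reflects how Neo handles large scopes internally.

```python
def batch_targets(targets, tokens_per_target, budget):
    """Split `targets` into batches whose estimated token cost fits `budget`."""
    if tokens_per_target <= 0:
        raise ValueError("tokens_per_target must be positive")
    if budget < tokens_per_target:
        raise ValueError("budget is too small for even one target")
    per_batch = budget // tokens_per_target
    return [targets[i:i + per_batch] for i in range(0, len(targets), per_batch)]


if __name__ == "__main__":
    servers = [f"host-{i}" for i in range(200)]
    # Assumed figures: ~15,000 tokens of output per server, and a 150,000-token
    # working budget that leaves headroom inside a 200k-token window.
    batches = batch_targets(servers, tokens_per_target=15_000, budget=150_000)
    print(len(batches), len(batches[0]))  # 20 10
```

Servers running many services will cost more tokens than the average, so a conservative per-target estimate is the safer choice.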

Neo can significantly accelerate security workflows by automating repetitive tasks that would otherwise take hours of manual effort. However, human expertise is still essential when deeper context about the target application is required. For example, while testing a KYC module of an application that verified users via webcam, I bypassed the verification by simply presenting a static photo to the camera, which the system accepted as valid. The developers later introduced liveness checks to address this issue. This type of contextual vulnerability often requires understanding the business logic and intended security controls, something that automated tools may miss.

Throughout my career in security consulting, I’ve often been asked if automation can replace security teams. The answer is NO, and it will remain the same at least for the next few years. AI agents are excellent assistants that enable security professionals to focus on complex analysis and decision-making. However, experienced humans are still essential to interpret results, understand business context, scale assessments across large infrastructures, and discover novel vulnerabilities that require creative thinking and deep application knowledge.

Let me know what else I should try with Neo. If you found this writeup helpful or have thoughts to share, feel free to connect with me on LinkedIn or X.
