

The system of claim 1, wherein the processor is configured to automatically generate the signature for the PDF file based at least in part on selecting a plurality of patterns exceeding a suspicious threshold to automatically generate the signature using the plurality of patterns.
The system of claim 1, wherein the processor is configured to automatically generate the signature for the PDF file based at least in part on at least a subset of portion(s) of a script within the PDF file that was determined to be malicious.
The system of claim 1, wherein the processor is further configured to de-obfuscate the PDF file.
The system of claim 4, wherein the one or more portions of the extracted script stream data that are potentially malicious include one or more of the following: an iFrame that includes an associated Uniform Resource Locator (URL) associated with a blacklisted domain and an iFrame that includes an associated URL associated with a webpage configured to download an.
The system of claim 1, wherein determining whether to generate the signature associated with the PDF file based at least in part on the at least portion of the extracted script stream data includes: determining one or more portions of the extracted script stream data that are potentially malicious; assigning one or more numeric values corresponding to the one or more portions of the extracted script stream data that are potentially malicious, wherein the one or more numeric values are determined based on heuristics; aggregating the one or more numeric values into an aggregate numeric value; and determining whether the aggregate numeric value exceeds a threshold numeric value: in the event that the aggregate numeric value exceeds the threshold numeric value, determining to generate the signature based at least in part on the at least portion of the extracted script stream data; and in the event that the aggregate numeric value is equal to or less than the threshold numeric value, determining not to generate the signature based at least in part on the at least portion of the extracted script stream data.
The system of claim 1, wherein the processor is further configured to traverse through one or more objects within the PDF file to find an object associated with JavaScript data.
The system of claim 1, wherein the processor is further configured to determine which objects, if any, within the PDF file include JavaScript data.
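The scoring determination recited above (assign numeric values to potentially malicious portions, aggregate them, and compare the aggregate against a threshold) can be illustrated with a minimal Python sketch. The heuristic patterns, their weights, the threshold value, and the function names are all hypothetical and chosen only for illustration; the claims do not specify them.

```python
import re

# Hypothetical heuristics: (pattern, numeric value). Neither the patterns
# nor the weights come from the claims; they are illustrative assumptions.
HEURISTICS = [
    (re.compile(r"<iframe[^>]+src=", re.IGNORECASE), 3),
    (re.compile(r"\beval\s*\("), 2),
    (re.compile(r"\bunescape\s*\("), 2),
]

# Hypothetical threshold numeric value.
THRESHOLD = 4


def aggregate_score(script: str) -> int:
    """Sum the numeric values of every heuristic that matches the script."""
    return sum(value for pattern, value in HEURISTICS if pattern.search(script))


def should_generate_from_script(script: str, threshold: int = THRESHOLD) -> bool:
    """Return True only when the aggregate strictly exceeds the threshold.

    An aggregate equal to or less than the threshold yields False, which
    mirrors the "equal to or less than" branch of the claim.
    """
    return aggregate_score(script) > threshold
```

In this sketch, a script matching only the `eval` and `unescape` heuristics reaches an aggregate of 4, which does not exceed the threshold, so the script stream data would not be used for the signature.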
A system, comprising: a processor configured to: parse a PDF file to extract script stream data embedded in the PDF file, wherein the PDF file is known to include malicious content; and determine whether to generate a signature associated with the PDF file based at least in part on at least a portion of the extracted script stream data: in the event that the signature associated with the PDF file is determined to be based at least in part on the at least portion of the extracted script stream data, automatically generate the signature associated with the PDF file based at least in part on the at least portion of the extracted script stream data, wherein the signature is configured to be matched against a potentially malicious PDF file; and in the event that the signature associated with the PDF file is determined not to be based at least in part on the at least portion of the extracted script stream data, automatically generate the signature associated with the PDF file from an identified cross-reference table from a plurality of cross-reference tables within the PDF file, wherein the identified cross-reference table is identified from the plurality of cross-reference tables based at least in part on a position of the identified cross-reference table relative to respective positions associated with one or more cross-reference tables other than the identified cross-reference table from the plurality of cross-reference tables; and a memory coupled to the processor and configured to provide the processor with instructions.
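The fallback branch of the claim above, in which the signature is generated from one cross-reference table selected by its position relative to the others, might be sketched as follows. This is a simplified illustration under stated assumptions: the claim does not say which position is selected, so the sketch assumes the table at the greatest byte offset; it uses a SHA-256 digest of the table body as a stand-in signature; and it locates classic `xref ... trailer` sections with a regular expression rather than a full PDF parser. The function names are hypothetical.

```python
import hashlib
import re
from typing import List, Optional, Tuple

# Matches a classic cross-reference section: the "xref" keyword (but not the
# "xref" inside "startxref") up to the following "trailer" keyword.
XREF_RE = re.compile(rb"(?<!start)xref\s+(.*?)trailer", re.DOTALL)


def find_xref_tables(pdf_bytes: bytes) -> List[Tuple[int, bytes]]:
    """Return (byte offset, table body) for each cross-reference section found."""
    return [(m.start(), m.group(1).strip()) for m in XREF_RE.finditer(pdf_bytes)]


def signature_from_xref(pdf_bytes: bytes) -> Optional[str]:
    """Derive a signature from one cross-reference table chosen by position.

    Assumption: the table with the greatest offset is selected. Returns None
    when no cross-reference section is found.
    """
    tables = find_xref_tables(pdf_bytes)
    if not tables:
        return None
    _, body = max(tables, key=lambda table: table[0])
    # Assumption: a SHA-256 hex digest stands in for the signature format.
    return hashlib.sha256(body).hexdigest()
```

A file with several incremental updates contains several such sections, which is why a positional rule is needed to pick one. Cross-reference streams (PDF 1.5 and later) are not handled by this sketch.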
