
What Is Data Loss Prevention (DLP)?

DLP is a solution that combines several technologies to identify and classify content in data, which can exist in many forms, such as emails, files, packets, applications, and data stores. The data can be detected whether it is at rest, in use, or in transit. Based on specified policies, DLP can also log, mark, encrypt, apply permission control to, or block the use of the sensitive information that it detects.
Data loss and data leakage are related, so data loss prevention is also described as data leakage prevention. For example, if an unauthorized person obtains and opens a lost storage medium that contains sensitive information, the data loss event becomes a data leakage event. Sensitive information can also be leaked even if the source data is not lost, for example, when a document containing sensitive information is sent by email.

Why Data Loss Prevention?

With the rapid development of information technologies, the Internet and computers have become essential for office work, communication, and collaboration. Although information systems improve work efficiency, they also place higher demands on the security of data storage and transmission.

  • More data needs to be protected. Personal information (such as accounts, phone numbers, and addresses), intellectual property rights (such as product design and R&D drawings), and business secrets (such as budgets, plans, and payrolls) all need to be protected. Any data leakage may cause immeasurable losses to enterprises.
  • Laws and regulations are stricter. Laws such as GDPR and HIPAA require enterprises to protect data assets, preventing sensitive information from being leaked. Data leakage events expose enterprises to legal risks.
  • Data may be leaked by people other than hackers. Data leaks caused by the intentional or unintentional behavior of internal employees are increasing. For example, emails are sent to the wrong recipient, USB flash drives are lost, and confidential files are printed.
  • Traditional security solutions cannot adequately protect against data leakage. Traditional security solutions use security devices, such as traditional firewalls, IPS devices, and storage encryption devices, to provide passive and cage-like protection by employing methods such as restricting data access or encrypting data on the entire network. These solutions hamper the flow of data, cannot control data being leaked by authorized users, and cannot identify sensitive information to classify, control, and audit enterprise data.

How Does Data Loss Prevention Work?

DLP identifies sensitive information and protects it according to predefined rules. So how does it work?

Data Obtaining

The prerequisite for identifying information is to obtain data, regardless of where the data is stored, copied, or transmitted. Typically, DLP obtains enterprise intranet data in the following ways:

  • Scans terminals and server storage devices.
  • Monitors network protocol links and restores transmitted files.
  • Monitors applications and drivers, and extracts transmitted and used data.
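
As a rough illustration of the first collection method (scanning storage for data at rest), the following Python sketch walks a directory tree and yields the text of each file it can read. The extension list, the size cap, and the function name `scan_directory` are assumptions made for this example; they are not taken from any particular DLP product, which would also need to parse binary formats such as PDF and Word.

```python
import os
from typing import Iterator, Tuple

# Hypothetical choices for this sketch: which extensions to read and a size cap.
TEXT_EXTENSIONS = {".txt", ".csv", ".log", ".md"}
MAX_BYTES = 1_000_000


def scan_directory(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (path, text) for each readable text file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() not in TEXT_EXTENSIONS:
                continue
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) > MAX_BYTES:
                    continue
                with open(path, "r", encoding="utf-8", errors="replace") as handle:
                    yield path, handle.read()
            except OSError:
                # Unreadable files are skipped here; a real scanner would log them.
                continue
```

Each yielded text would then be handed to the content identification step. Network and application monitoring need protocol parsers or driver hooks and are outside the scope of this sketch.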

In-depth Content Identification

Not all data needs to be protected. DLP protects only data that contains sensitive information. After obtaining data, DLP must perform in-depth content identification and detection to determine whether it contains sensitive information, regardless of the form in which the data exists, such as email, PDF, Word, or PPT.

Content identification and detection technologies are classified as common detection or advanced (key) detection technologies. Common detection technologies include regular expression detection, keyword detection, and document attribute detection, whereas advanced detection technologies include Indexed Document Matching (IDM), Exact Data Matching (EDM), and computer vision technology.
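
The common detection technologies can be sketched in a few lines of Python. The example below combines regular expression detection with keyword detection. The patterns, the keyword list, and the Luhn checksum used to filter card-number candidates are illustrative assumptions for this sketch, not rules from a specific product.

```python
import re
from typing import Dict, List

# Illustrative patterns only; real deployments tune these per data type and locale.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}
KEYWORDS = ("confidential", "payroll", "budget")


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to reduce false positives on 16-digit matches."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def detect(text: str) -> Dict[str, List[str]]:
    """Return the sensitive items found in text, grouped by rule label."""
    findings: Dict[str, List[str]] = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if label == "card_number":
            matches = [m for m in matches if luhn_valid(m)]
        if matches:
            findings[label] = matches
    lowered = text.lower()
    hits = [word for word in KEYWORDS if word in lowered]
    if hits:
        findings["keyword"] = hits
    return findings
```

A pattern match alone produces many false positives, which is why the sketch adds a checksum for card numbers. Document attribute detection (for example, checking file type or author metadata) would be a third rule family alongside these two.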

The following briefly describes the three advanced detection technologies:

  • IDM

    Matches the content or documents to be detected with unstructured sample documents (such as Word, PPT, PDF, and various source program files) to identify similarities, and determines whether the content or documents are from the sample document database. This technology generates a fingerprint database from sample documents, and then extracts fingerprints from the documents or content to be detected. It matches the obtained fingerprints against the fingerprint database to identify similarities.

  • EDM

    Exactly matches the documents or content to be detected against structured data source tables (such as Excel and database tables) to determine whether the documents or content were extracted from those tables. A customer generates a fingerprint database for a specific column in a data source table, and DLP then checks whether the target content (regardless of the file format) matches the content in that column.

  • Computer vision technology

    Extracts the contour features of the image to be detected and performs similarity matching against the stored sample image features. This is performed to determine whether the image is from the sample image database. It uses image processing technology to extract the contour features of the image and performs vector coding for the features, and uses similarity matching technology to perform matching against the feature database. Feature matching is applicable even if zooming, partial cropping, watermarking, or brightness change is performed on the image.
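
The fingerprinting idea behind IDM and EDM can be illustrated with a simplified Python sketch. It is an assumption-laden toy, not how any vendor implements these technologies: IDM is approximated by hashing overlapping word sequences (shingles) of the sample documents, and EDM by hashing each normalized cell of one protected column. All function names and the shingle size are hypothetical. Computer vision matching is not covered here.

```python
import hashlib
import re
from typing import Iterable, List, Set


def _digest(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def _words(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def document_fingerprints(text: str, shingle_size: int = 4) -> Set[str]:
    """IDM-style: hash every run of `shingle_size` consecutive words."""
    words = _words(text)
    return {
        _digest(" ".join(words[i:i + shingle_size]))
        for i in range(len(words) - shingle_size + 1)
    }


def idm_similarity(candidate: str, sample_db: Set[str], shingle_size: int = 4) -> float:
    """Fraction of the candidate's fingerprints found in the sample database."""
    prints = document_fingerprints(candidate, shingle_size)
    if not prints:
        return 0.0
    return len(prints & sample_db) / len(prints)


def _cell_key(value: str) -> str:
    # Normalize a cell or token: lowercase, keep only letters and digits.
    return re.sub(r"[^a-z0-9]", "", value.lower())


def column_fingerprints(values: Iterable[str]) -> Set[str]:
    """EDM-style: hash each normalized cell of one protected column."""
    return {_digest(_cell_key(v)) for v in values if _cell_key(v)}


def edm_matches(candidate: str, column_db: Set[str]) -> int:
    """Count whitespace-separated tokens that exactly match a protected cell."""
    keys = (_cell_key(token) for token in candidate.split())
    return sum(1 for key in keys if key and _digest(key) in column_db)
```

In this sketch, a document that quotes most of a protected sample scores close to 1.0 from `idm_similarity`, while `edm_matches` reports only exact hits against the protected column. Storing hashes rather than the original values means the fingerprint database does not itself contain the sensitive data in readable form. The EDM part handles single-token cells only; multi-word cells would need a different tokenization.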

Data Control

After sensitive information is identified, the files that carry such information must be controlled based on given policies. Control behaviors include but are not limited to:

  • Data encryption: encrypts sensitive data to prevent exposure of such data, even if data leakage occurs.
  • Permission identification and control: Both data and users have permission attributes. Only users with the required permission can access the corresponding data. In addition, data is automatically decrypted during access, without adversely affecting user experience.
  • Blocking of illegal behaviors: blocks or reports alarms for behaviors such as illegal sending, copying, and printing.
  • Visualized data display: displays the distribution of sensitive data on the entire network in a visualized manner.
  • Data audit: provides detailed logs and reports for the entire lifecycle (including generation, transfer, use, and destruction) of sensitive information.
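
The relationship between detection results, permissions, and control behaviors can be sketched as a small policy function. The numeric clearance and sensitivity levels, the channel names, and the decision rules below are assumptions chosen for illustration; real policies are configured per organization and are usually far richer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Action(Enum):
    LOG = "log"
    ENCRYPT = "encrypt"
    BLOCK = "block"


@dataclass(frozen=True)
class Event:
    user_clearance: int    # hypothetical numeric permission level of the user
    data_sensitivity: int  # hypothetical level assigned by content detection
    channel: str           # for example "email", "usb", or "print"


# Channels treated as leaving the organization in this sketch.
EXTERNAL_CHANNELS = {"email", "usb", "print"}


def decide(event: Event) -> List[Action]:
    """Map one data-use event to the control actions to apply."""
    actions = [Action.LOG]  # every event is logged for audit
    if event.data_sensitivity == 0:
        return actions  # nothing sensitive detected
    if event.user_clearance < event.data_sensitivity:
        actions.append(Action.BLOCK)  # user lacks the required permission
    elif event.channel in EXTERNAL_CHANNELS:
        actions.append(Action.ENCRYPT)  # permitted, but protect data in transit
    return actions
```

For example, `decide(Event(1, 3, "email"))` logs and blocks the transfer, whereas a sufficiently privileged user copying the same data to a USB drive gets logging plus encryption.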

What Type of Data Loss Prevention Do I Need?

Mainstream DLP products come in three forms, which correspond to different data leakage scenarios. You can select one or more of them, based on the comparison in the following table, to provide data leakage prevention for your enterprise.

Table 1-1 Comparison between terminal DLP, network DLP, and storage DLP

| Item | Terminal DLP | Network DLP | Storage DLP |
|---|---|---|---|
| Service requirements | Terminal sensitive information discovery and protection, leakage prevention for outgoing channels, spreading prevention for internal sensitive information, watermark and leakage tracing, and leakage prevention for service system data | Data leakage audit, internal control, and compliance audit during transmission, as well as sensitive information leakage prevention | Non-compliant storage audit, sensitive data asset distribution statistics, and sensitive file backup & isolation |
| Key technologies | Sensitive data discovery, instant messaging audit, digital watermark, and screen recording & monitoring | Natural semantic classification, file fingerprint matching, image fingerprint matching, intelligent document feature word extraction, accurate table fingerprint matching, file attribute detection, image recognition, data identifier, and content recovery | Shared directory data discovery, nested data discovery, file type filtering, and offline file scanning |
| Deployment modes | Agent: deployed on a terminal. Server: single-server deployment, dual-server hot standby deployment, hierarchical deployment, and modular distributed deployment | Off-line deployment at the network egress | Off-line deployment at storage devices |

Data Loss Prevention Best Practices

DLP is an overall solution rather than an independent product. Note the following for DLP deployment:

  • Check whether DLP deployment is necessary.

    Check whether enterprise services involve sensitive personal data and if they are restricted by laws such as GDPR and HIPAA. Also check whether the intellectual property rights or business decisions of an enterprise are coveted by many competitors. Understanding the importance of data to be protected can help you decide whether DLP needs to be deployed and gain better support from business departments.

  • Distinguish data priorities.

    Not all data is of equal importance. Different organizations may have different definitions for key data. If data is leaked, which data will cause the most serious problems? What types of data are attackers most interested in? Protection should begin with these types of data.

  • Identify data risks.

    Understand the process of storing, using, and transferring key data, and determine the situations where data leakage may occur. Is data leakage caused by external intrusion or internal leakage?

  • Select a solution.

    Select an appropriate DLP solution based on factors, such as the type of key data to be protected, data form and structure, enterprise IT process, terminal type, and even budget.

  • Communicate and develop control policies.

    Conduct in-depth communication with data users (business departments) and specify their roles and responsibilities. Develop different control policies for different types of data and roles.

  • Train employees.

    DLP will change how employees use data to some extent. It is therefore critical to train employees before and after DLP is deployed. In addition to DLP itself, employee awareness is a vital part of data security.

  • Audit data and evaluate the effect.

    After DLP is deployed, you can observe the distribution, transfer, and usage of key data in an enterprise. Handle problems and risks promptly to avoid losses caused by data leakage.

  • Adjust policies.

    DLP is not a "set and forget" deployment. As services change and IT processes evolve, an enterprise's definition of key data may change, and the data forms and transfer processes change with it. In addition, long-term data audit can help administrators discover data security risks. To deal with these risks, administrators need to adjust DLP policies promptly.

About This Topic
  • Author: Wang Haoda
  • Updated on: 2021-09-30