Privacy Preserving Text Document Summarization
by A N Ramya Shree *, Kiran P
CSE Department, RNS Institute of Technology, Bengaluru, 560098, India
* Author to whom correspondence should be addressed.
Journal of Engineering Research and Sciences, Volume 1, Issue 7, Pages 7-14, 2022; DOI: 10.55708/js0107002
Keywords: PHI, PPDP, Generalization, Sanitization
Received: 17 February 2022, Revised: 11 May 2022, Accepted: 23 June 2022, Published Online: 18 July 2022
Data anonymization preserves privacy by converting input data that contains sensitive information into anonymized data, so that the information cannot be linked to an entity either directly or indirectly. When text documents are analyzed, unique attributes can reveal the identity of an entity and its private data. The proposed system protects the sensitive data about an entity in text documents by anonymizing sensitive documents either entirely or partially, based on a sensitivity context that is specific to a domain. Documents are categorized by sensitivity context as sensitive or not sensitive and are then summarized. The proposed Privacy Preserving Text Document Summarization generates a concise, privacy-preserved summary of the input text document that contains the most relevant domain-specific information without violating the entity's privacy constraints, achieving a compression rate of 11%, a precision of 86.32%, and a recall of 84.28%.
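The flow described in the abstract (categorize by sensitivity context, anonymize, then summarize) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `SENSITIVE_TERMS` dictionary, the placeholder labels, the frequency-based sentence scoring, and every function name are assumptions made for illustration. The metric definitions are also assumed, since the abstract reports the figures without defining them: compression rate is taken as summary words divided by source words, and precision and recall are computed over sets of selected items against a reference set.

```python
import re
from collections import Counter

# Hypothetical sensitivity context: domain terms mapped to generalized labels.
SENSITIVE_TERMS = {
    "john smith": "[PERSON]",
    "diabetes": "[CONDITION]",
    "hiv": "[CONDITION]",
}


def _pattern(term):
    # Whole-word, case-insensitive match so "hiv" does not fire on "archive".
    return re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)


def is_sensitive(text, terms=SENSITIVE_TERMS):
    """Assumed rule: a document is sensitive if it mentions any listed term."""
    return any(_pattern(term).search(text) for term in terms)


def sanitize(text, terms=SENSITIVE_TERMS):
    """Replace each sensitive term with its generalized label."""
    for term, label in terms.items():
        text = _pattern(term).sub(label, text)
    return text


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=1):
    """Extractive summary: keep the top-scoring sentences in source order."""
    sentences = split_sentences(text)
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


def private_summary(text, max_sentences=1):
    """Sanitize sensitive documents first, then summarize."""
    if is_sensitive(text):
        text = sanitize(text)
    return summarize(text, max_sentences)


def compression_rate(summary, original):
    """Assumed definition: summary word count over source word count."""
    return len(summary.split()) / len(original.split())


def precision_recall(selected, reference):
    """Set-based precision and recall of selected items against a reference."""
    selected, reference = set(selected), set(reference)
    overlap = len(selected & reference)
    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall
```

Because sanitization runs before summarization in this sketch, no listed term can reach the summary regardless of which sentences are selected. A real system would need a richer sensitivity context than a fixed term list.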