Polish Information Processing Society

Annals of Computer Science and Information Systems, Volume 8

Proceedings of the 2016 Federated Conference on Computer Science and Information Systems

Clustering Documents on Case Vectors Represented by Predicate-argument Structures – Applied for Eliciting Technological Problems from Patents


DOI: http://dx.doi.org/10.15439/2016F462

Citation: Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 8, pages 175–180.


Abstract. Patent analysis helps in understanding trends in technological problems and in developing technology strategies. Patent classification is one method that supports such analysis. This study proposes a patent classification method that applies hierarchical clustering based on the structural similarity of the problems the patents aim to solve. Structural similarity is calculated from case vectors built on the predicate-argument structures extracted from the patent texts. An interview survey indicated that this classification plays an essential role in analogical problem solving, because it makes similar technological problems visible.
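The abstract does not specify how case vectors are built or which linkage the clustering uses. The following is a minimal sketch, not the authors' method, assuming each patent has already been parsed into predicate-argument structures of the form `(predicate, {case_role: argument})`. The feature scheme (one dimension per predicate/case-role pair and one per case-role/argument pair), the romanized Japanese case labels `ga`/`wo`, the helper names `case_vector`, `cosine`, and `average_linkage`, the use of average linkage, and the fixed target cluster count (in place of a stopping rule) are all illustrative assumptions.

```python
from collections import Counter
from math import sqrt

# Hypothetical input: a document is a list of (predicate, {case_role: argument}).
PAS = tuple[str, dict[str, str]]


def case_vector(structures: list[PAS]) -> Counter:
    """Build a sparse case vector (assumed feature scheme, for illustration)."""
    vec: Counter = Counter()
    for predicate, args in structures:
        for role, argument in args.items():
            vec[f"{predicate}|{role}"] += 1  # which case slots a predicate fills
            vec[f"{role}|{argument}"] += 1   # which argument fills a case slot
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def average_linkage(vectors: list[Counter], n_clusters: int) -> list[list[int]]:
    """Agglomerative clustering: repeatedly merge the two clusters whose
    mean pairwise cosine similarity is highest, until n_clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    sim = [[cosine(u, v) for v in vectors] for u in vectors]
    while len(clusters) > n_clusters:
        best, pair = -1.0, (0, 1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = sum(sim[a][b] for a in clusters[i] for b in clusters[j])
                s /= len(clusters[i]) * len(clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        i, j = pair
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy problem statements (hypothetical data).
docs = [
    [("reduce", {"ga": "coating", "wo": "friction"})],
    [("reduce", {"ga": "lubricant", "wo": "friction"})],
    [("improve", {"ga": "cell", "wo": "efficiency"})],
]
vecs = [case_vector(d) for d in docs]
print(average_linkage(vecs, 2))  # → [[0, 1], [2]]
```

The first two toy patents share the "reduce friction" problem structure despite different agents, so they merge first (cosine 0.75); a real system would more likely pick the number of clusters with a stopping rule than fix it in advance.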

