As preparation, install the following software.
・git client
・word2vec
For how to install word2vec on Windows, see the article "How to install word2vec on Windows".
Clone the repository from GitHub with git.
git clone https://github.com/klb3713/sentence2vec
The following files are downloaded.
README.md
demo.py
matutils.py
sent.txt
test.txt
utils.py
voidptr.h
word2vec.py
word2vec_inner.pyx
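As a quick sanity check after cloning, something like the following could confirm that the files listed above are present. This is only a sketch: the helper name missing_files is my own, and it assumes the clone landed in a directory called sentence2vec, which is what git clone creates by default.

```python
from pathlib import Path

# File names copied from the listing above.
EXPECTED_FILES = [
    "README.md", "demo.py", "matutils.py", "sent.txt", "test.txt",
    "utils.py", "voidptr.h", "word2vec.py", "word2vec_inner.pyx",
]


def missing_files(repo_dir):
    """Return the expected file names that are not present in repo_dir."""
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumption: the repository was cloned into ./sentence2vec.
    missing = missing_files("sentence2vec")
    print("missing:", missing if missing else "none")
```

If the list comes back empty, the clone matches the listing in this post.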
The contents of sent.txt are as follows (one entry per line).
Harbin Institute of Technology (HIT) was founded in 1920.
After nearly 100 years, HIT has developed into a large nationally renowned multi-disciplinary university with science, engineering and research as its core.
HIT is consistently on the forefront in making innovations in research. For years, HIT has continued to undertake large-scale and highly sophisticated national projects.
HIT students study humanities and social sciences along with basic engineering and science courses for a strong comprehensive base.
HIT is famous for its original style of schooling: 'Being strict in qualifications for graduates; making every endeavor in educating students.'
HIT has remained an international university since its foundation. Courses at HIT used to be conducted exclusively in Russian and Japanese.
Today, all the faculty, students and staff of HIT, are dedicating, with full confidence
test.txt looks like this. It has 7 lines.
Harbin Institute of Technology (HIT) was founded in 1920. From its beginning, HIT has received preferential support from the central government. In 1954, the Ministry of Higher Education designated, for the first time, six national key universities. HIT was the only one of the six outside of Beijing. In 1984, HIT again found its way onto the list of 15 national key universities to receive special support. In 1996, HIT was among the first group of universities to be included in Project 211. This project targets 100 institutions of higher education in China to receive preferential support for development in order to become world-class universities in the 21st century. In 1999, HIT was listed as one of the top nine key universities in China. This distinction provided HIT with the opportunity to develop into a highly-competitive first-rate university with the assistance of the Ministry of Education and the Heilongjiang Provincial Government.
After nearly 100 years, HIT has developed into a large nationally renowned multi-disciplinary university with science, engineering and research as its core. We have established our own unique programs related to the field of astronautics that are unparalled anywhere in China. We have broadened our established disciplinary programs by utilizing a cross- disciplinary curriculum and as such have formed a comparatively full disciplinary system that consists of key, emerging and supporting programs. HIT now has 21 schools/departments, including 73 undergraduate programs, 147 masters' programs, 81 doctoral programs, 18 post-doctoral research stations, 18 national key disciplines, and 32 national & provincial (ministerial) key labs. The university employs 2,944 full-time teachers, among which 884are professors, 1,102 are associate professors, including 22 academicians of the Chinese Academy of Sciences and the Chinese Academy of Engineering. At present, there are 42,695 full-time students including 25,035 undergraduates, 11,794 master degree candidates and 4,387 doctorial degree candidates. We also added the Shenzhen Graduate School and Weihai Campus to the main campus in Harbin (including the Research Academy of Science and Technology and Research Academy of Industrial Technology), forming a pattern of 'one university, three campuses'.
HIT is consistently on the forefront in making innovations in research. For years, HIT has continued to undertake large-scale and highly sophisticated national projects. HIT's ability for scientific research has always been among the strongest in all universities in China. In 2007, HIT funds for scientific research reached 1.1billion RMB. In the comprehensive 10-year evaluation of the '863'project, HIT scientific research programs ranked second among all universities in China. HIT has been making great contributions to China's hi-tech research by creating many new inventions in scientific research fields such as China's first simulation computer, the first intelligent chess-playing computer, the first arc-welding robots, the first world advanced-level system radar, the first CMOS chip IC card with our own patent, the first giant computer-aided real-time 3-D image construction system, the first microcomputer-operated fiber twiner and the first large-scale tank-head forming machine. The famous 'Shenzhou Series Spaceship Project' received massive assistance from HIT in the field of large-scale land-based space simulation equipment, returning cabin deformation and orthopraxy welding technology, 3-axel simulation experimental platform and fault diagnosis. The micro-satellite 'Testing Satellite No.1', constructed mainly by HIT, was the first fully developed and launched satellite by a Chinese university. The technical advancements on the satellite meet international aerospace standards and mark a new chapter in the history of HIT and China's history of astronautics.
HIT students study humanities and social sciences along with basic engineering and science courses for a strong comprehensive base. They go on to learn scientific research methods and laboratory skills which enhance their creativity and innovative abilities. When our students graduate from HIT, they are equipped with strong theoretical knowledge and the ability for practical application.
HIT is famous for its original style of schooling: 'Being strict in qualifications for graduates; making every endeavor in educating students.' Our graduates have been warmly welcomed by employers throughout China; more than 100,000 graduates have stepped into society and many have moved up to high-ranking managerial positions and work as specialists in the fields of science and technology, education, and economics. A number of graduates have assumed leadership positions in the CPC and governments at different levels, or become generals of the PLA, academicians of the Chinese Academy of Sciences and the Chinese Academy of Engineering.
HIT has remained an international university since its foundation. Courses at HIT used to be conducted exclusively in Russian and Japanese. After the reforming and opening to the outside world, HIT has gained greater weight in the world. So far, we have signed academic cooperation agreements with 126 institutions of higher education in 24 countries such as the United States, the United Kingdom, France, Germany, Japan and Russia. Cooperation and exchanges are carried out between HIT and these universities though exchanging students, faculty and research staff, holding academic conference and cooperating in scientific research.
Today, all the faculty, students and staff of HIT, are dedicating, with full confidence, their concerted efforts to advance bravely towards the goal of building HIT into a well-known world-class university.
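Looking at the two listings, each line of sent.txt appears to be the opening of the corresponding line of test.txt. A small sketch like the one below could check the line counts and that prefix relationship on your own copy. The function names read_lines and compare_files are hypothetical helpers, not part of sentence2vec, and the files are assumed to be UTF-8 text in the current directory.

```python
from pathlib import Path


def read_lines(path):
    """Return the non-empty lines of a UTF-8 text file, stripped of whitespace."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def compare_files(short_path, long_path):
    """Return (lines in short file, lines in long file, prefix matches).

    A prefix match means line i of the long file starts with line i
    of the short file.
    """
    short_lines = read_lines(short_path)
    long_lines = read_lines(long_path)
    matches = sum(
        1 for short, long_ in zip(short_lines, long_lines)
        if long_.startswith(short)
    )
    return len(short_lines), len(long_lines), matches


if __name__ == "__main__":
    # Assumption: run from inside the cloned repository.
    if Path("sent.txt").is_file() and Path("test.txt").is_file():
        print(compare_files("sent.txt", "test.txt"))
    else:
        print("sent.txt / test.txt not found in the current directory")
```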
Running demo.py creates the model files.
python demo.py
After it finishes, the following files have been created.
sent.txt.vec
test.txt.model
test.txt.vec
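To peek at the results, a sketch along these lines might help. Note the big assumption: I am guessing that the .vec files use the usual word2vec text format (a header line "count dimension", then one "key v1 v2 ..." line per entry). I have not confirmed this against demo.py, so treat read_vec_text as a hypothetical helper and adjust it if the files look different. test.txt.model is only checked for existence here.

```python
from pathlib import Path


def read_vec_text(lines):
    """Parse word2vec-style text lines into (count, dim, {key: vector}).

    Assumed format: first non-empty line is "count dim", each later
    line is "key v1 v2 ...". Blank lines are skipped.
    """
    header = None
    vectors = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if header is None:
            header = (int(parts[0]), int(parts[1]))
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if header is None:
        raise ValueError("empty input: no header line found")
    return header[0], header[1], vectors


if __name__ == "__main__":
    # Output names taken from the listing above.
    for name in ["sent.txt.vec", "test.txt.model", "test.txt.vec"]:
        path = Path(name)
        if not path.is_file():
            print(name, "-> not found")
        elif name.endswith(".vec"):
            with path.open(encoding="utf-8") as f:
                count, dim, vectors = read_vec_text(f)
            print(name, "-> header says", count, "entries of dimension", dim,
                  "/ parsed", len(vectors))
        else:
            print(name, "-> exists,", path.stat().st_size, "bytes")
```

If the header count and the number of parsed entries disagree, the format assumption is probably wrong for that file.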
This is a slightly awkward place to stop, but that's all for today. I hope to write up a usage example soon.