What is Tsume Shogi Parser?
I am a huge shogi patzer. I am also a poor Java programmer. Well, all in all, I am a miserable creature. But even though I am so old that I remember the era of black&white TV, I think I can improve. (Not in being miserable, I mean. The other way.)
I wanted to strengthen my shogi skills by solving mating problems. You would be surprised how hard it is to find some good materials on the net. If you have some experience with Go or Western Chess you think that the Internet is the ultimate source for this kind of stuff. You couldn't be more mistaken -- at least when you don't read Japanese.
Fortunately, there are good folks out there. Shogi-l email discussion list is a place where people like these meet. Reijer Grimbergen, i.e., posted series of 300 tsume shogi problems. Roger Hare was so kind to collect some of them into a single file:
That's how I got my learning materials. The only problem left was it's form:
...
4. Black: R4d, +B4c, +P2a In hand: G
White: K2c, G3c, S4b, L1b, P4a
5. Black: R2b, G3e, S3a, L2e In hand: None
White: K3c, G4c, S1d, P4e
...
A beginner like me could use some more "visually attractive" format. By the way, it's a human nature I think: people always want more. First I complained that there are no materials at all, and then (after I found them) I wanted them to be full of diagrams and nicely formated.
Anyway, that's how I decided to write a converter for the problems. I called it Tsume Shogi Parser.
The process
Tsume Shogi Parser is intended to be a building block to use in other applications (these applications are meant to provide user interface for user convenience). In the near future I plan to provide a web app that would allow the user to upload a text file and get the resulting pdf. The parser could also be used in shogi game database to produce handouts on the fly.
Ok, let's face it. The parser will be not for much use. After all, there is only one source of data stored in this format. I treat this thing as an exercise before more important challenge: writing a parser for "This week in Shukan Shogi" posted by Reijer Grimbergen on
shogi-l. The second benefit from it is trying out the part that generates pdf. That's IS going to be useful in other projects.
Let's go back to the parser. For know I’ll describe the inner mechanics of the Tsume Shogi Parser.
Technologies
For this piece of software I chose to use:
- Java -- of course ;-),
- JavaCC -- for defining the input file grammar and automatically generate a parser,
- XML -- for storing the data into format that can be converted into virtually anything else,
- JiBX -- for translating Java objects to XML (marshaling)
- XLS-FO -- for storing information about expected layout of the handout,
- XSLT 2.0 -- for defining a way of injecting tsume data to layout template,
- Saxon -- the only XSLT processor I know that supports XSLT 2.0,
- Apache FOP -- for generating PDF.
So, the process can be diagrammed as follows:
Shogi-l file->(.jj + javacc + jibix)->xml file->(xslt 2.0, saxon)->(fo, fo)->pdf
and in the future
-> spd, psn
-> database
Reading tsume
I have put all the tsume problems I found on shogi-l (most of them posted by
Reijer Grimbergen) to a single file. If you're interested in it, you can
download it from Tsume Shogi Project download pages. I also collected in a single file
the answers for the tsume problems.
Then I wrote a JavaCC grammar for the file -- I made the computer understand what which field in the file meant. You can see the grammar on the ShogiTools project's wiki.
It was my first contact with parsers and it cost me a lot of sweat. I am still no expert here but I manage to write a simple grammar and generate a parser from it. JavaCC is of great help here. It automates most of the work so you can focus on the merit, although I had lots of problems with it. Partly because I was a beginner in the matter of parsers, partly because of lousy project documentation and community support. I've read that similar program, ANTLR, is much better in these matters. Next time I should try it so I could make an honest comparison. (If you can share your experience in the subject, please, drop me a note.
The thing is, that with the help of JavaCC and JavaCC Eclipse Plugin I generated a program that read tsume problem file and stored it in memory in Java objects. (The class diagrams of the value object used for it should be on ShogiTools pages.)
It turned out that the input data was a bit inconsistent in form. The parser showed few places and I had to do some corrections by hand. But this was not a big problem.
Producing XML
Now I wanted the data stored into an XML file.
Why XML? It is license-free, platform-independent and well-supported. Also because of the promise of easy transformations to presentation formats (without coding).
Anyway that's what the theory says. I wanted to see this in practice.
The process of transforming the memory representation of an object to a data format suitable for storage or transmission is called
Marshaling. So I typed "fastet marshaller" into google search engine and got JiBX on pole position. And I used JiBX.
The interesting thing about JiBX is it's byte-code injection mechanism. In short: you define the mapping between classes and the XML. Then the mapping information is injected into your existing classes' bytecode. This means you can structure the Java classes the way you want -- they doesn't need to match the structure of the XML. It also turns out that this is a fast way do the conversion.
This part was quite quick. The program read 11k of tsume definitions and spat out 802 kB XML file. The description of XML structure can be found on ShogiTools pages.
The output
So, what is so good about having such a huge file instead of small one? And why it is so big?
Let's compare the entry for a single position in input text file and output xml file.
The text entry goes like this:
1. Black:+B3d, +B1c In hand: G
White: K2a, +R3c, L1a
Equivalent XML fragment is:
<Position>
<number>1</number>
<BlackDescription>
<PiecesPositions>
<PiecePosition>
<Piece>
<isPromoted>true</isPromoted>
<rank>B</rank>
</Piece>
<col>3</col>
<row>d</row>
</PiecePosition>
<PiecePosition>
<Piece>
<isPromoted>true</isPromoted>
<rank>B</rank>
</Piece>
<col>1</col>
<row>c</row>
</PiecePosition>
</PiecesPositions>
<Hand>
<Piece>
<isPromoted>false</isPromoted>
<rank>G</rank>
</Piece>
</Hand>
</BlackDescription>
<WhiteDescription>
<PiecesPositions>
<PiecePosition>
<Piece>
<isPromoted>false</isPromoted>
<rank>K</rank>
</Piece>
<col>2</col>
<row>a</row>
</PiecePosition>
<PiecePosition>
<Piece>
<isPromoted>true</isPromoted>
<rank>R</rank>
</Piece>
<col>3</col>
<row>c</row>
</PiecePosition>
<PiecePosition>
<Piece>
<isPromoted>false</isPromoted>
<rank>L</rank>
</Piece>
<col>1</col>
<row>a</row>
</PiecePosition>
</PiecesPositions>
</WhiteDescription>
</Position>
The second fragment is huge. It is well structured, data are organized in logical chunks. You may ask here: So what!?
And I agree. It's is no good to have kosher data structures for the sake of kosherness only. The real issue is what you can do with the data. There are the benefits of data conversion. Let me show you how our XML can be transformed.
My first try was to write XSL transformation to get an ASCII diagrams of the positions. For the given position (XML fragment above) I got this:
1.
White in hand:
9 8 7 6 5 4 3 2 1
+---+---+---+---+---+---+---+---+---+
| | | | | | | | wK| wL|a
+---+---+---+---+---+---+---+---+---+
| | | | | | | | | |b
+---+---+---+---+---+---+---+---+---+
| | | | | | |+wR| |+bB|c
+---+---+---+---+---+---+---+---+---+
| | | | | | |+bB| | |d
+---+---+---+---+---+---+---+---+---+
| | | | | | | | | |e
+---+---+---+---+---+---+---+---+---+
| | | | | | | | | |f
+---+---+---+---+---+---+---+---+---+
| | | | | | | | | |g
+---+---+---+---+---+---+---+---+---+
| | | | | | | | | |h
+---+---+---+---+---+---+---+---+---+
| | | | | | | | | |i
+---+---+---+---+---+---+---+---+---+
Black in hand:G
The whole file can be
downloaded from here.
Surly, we can do better than this! Well, we're almost there :-) This ASCII diagram thing was just a warm up before writing XSLT to produce PDF file. First I had to do xsl-fo my homework. For the PDF renderer I chose apache FOP which is quite well documented. So, I was able to learn some xsl-fo from apache documentation. I also used w3schools tutorials (http://www.w3schools.com/xslfo/default.asp).
I wanted my tsume shogi handout to have 6 problems per page and some tsume definition as an introduction. It was quite a bit of tedious work but I think it was worth it.
Judge for yourself. Here is
the output pdf file.
Future development
I have my handouts for doing some shogi exercises off-line. My goal is reached so I stop my experiments for now. Instead I will list possible improvements below.
The pdf I managed to produce has English description of pieces on the board (i.e. wG for white gold, bK for black king). I could use Kanji characters instead. But first I need to know the unicode for each piece character and own some unicode font.
The whole thing is not a big deal. But during the process of converting my tsume text files into pdf I managed to learn few things. I think I also broke fresh ground for future conversion component that could be used in shogi programs. Imagine, you have your shogi database and you are able to add ability to render you games in the format you need just by supplying suitable transformation (XSLT) file.
See You next time...