Dzuchun 6d457332bc Massive work chunk I don't want to cetegorize
API is not stable now anyway, so whatever

- Most `Selci` kinds can be seeded now
- Most `Selci` kinds can iterate over their sub`Selci`
2024-08-25 20:05:22 +03:00
.github/workflows Added Rust testing 2024-03-14 23:45:34 +02:00
examples Removed newline from example input 2024-08-24 21:06:26 +03:00
examples-util A lot of changes, really 2024-03-30 01:29:55 +02:00
kyomato-cli Renamed binary crate to kyomato-cli 2024-08-24 11:58:44 +03:00
kyomato-core Massive work chunk I don't want to cetegorize 2024-08-25 20:05:22 +03:00
kyomato-scripts A lot of changes, really 2024-03-30 01:29:55 +02:00
kyomato-util Added kyomato-util crate 2024-08-25 19:38:23 +03:00
src clippy fixes except clippy::too_many_lines 2024-08-24 11:55:46 +03:00
tests clippy fixes except clippy::too_many_lines 2024-08-24 11:55:46 +03:00
.gitignore A lot of changes, really 2024-03-30 01:29:55 +02:00
Cargo.lock A lot of changes, really 2024-03-30 01:29:55 +02:00
Cargo.toml Started Token rework 2024-08-24 21:01:54 +03:00
README.md A lot of changes, really 2024-03-30 01:29:55 +02:00

Kyoko is being a tomato. Why? I don't know :idk:

Overview

Kyomato transforms an extended markdown subset into a \LaTeX document, ready for compilation. You can use the helper CLI script or write your own, to then compile said \LaTeX, or transform it in some other way.

WHY NOT PANDOC???

Please, see the "Why?" paragraph at the very bottom about the reasons I developed this thing. I doubt you can effortlessly do all that with pandoc (I'm not willing to die trying).

Examples

For programmatic examples of usage, see examples

CLI

The Kyomato CLI binary can be found in the nested kyomato crate. It can be built with the usual

cd kyomato
cargo build --release

After that, the binary will be at ./target/release/kyomato.

There's a helper script at ./kyomato-scripts/kyomato-helper.bash for simplified usage. Here's a quick walkthrough:

  • Build the binary or download already built one
  • Add it to the PATH envvar. In some shells, you can prepend your command with PATH=$PATH:/path/to/kyomato for that. Just make sure that provided path is absolute (you can run it through realpath for that)
  • Run
kyomato-helper.bash ./path/to/input.md

For example, here's a command I actually tested it with at kyomato-scripts directory:

PATH=$PATH:$(realpath ../target/release) KYOMATO_IMAGES=$(realpath ./image_folder)  ./kyomato-helper.bash test_input/input.md

After running it, the result should be located at test_input.input.pdf.

Walkthrough

PLEASE NOTE: rules described below are fairly strict, since there's no literal markdown parser inside. Kyomato's design is "un-optimal" right now, since I need it to work NOW.

Markdown-inspired syntax

There are a couple of elements looking similar to markdown [1], namely:

  • Indents are NOT SUPPORTED: the parser is quite primitive now, so it explicitly relies on some elements having nothing in front of them.
  • Headers:
    • up to 6 deep
    • only hash-variant (### Head of 3rd order) is supported
    • current \LaTeX representation is a placeholder
  • Formatting
    • bold text
    • italic text
    • strikethrough text
    • NOT monospace text yet, because I kinda forgot about it
  • Footnotes:
    • (probably full support?)
    • inserted near the first use in the output, instead of where they are actually defined.
  • Lists
    • limited item support - make sure to include only one-line-things as items
    • (the above also means, there are no nested lists, for now)
    • only bullet, arabic, latin and cyrillic enumerations are supported, for now.
  • Inline and Display latex mathmode (not sure what it's properly called, it's the thing that allows you to write fancy formulae)
  • Tables: columns formatting is ignored, for now
  • Figures: with obsidian-like insert (![[path/to.image.png]]), so that Obsidian would actively display them
  • Separators: these act like page breaks in the resulting \LaTeX
  • Hyper references (href): limited support, hrefs must be defined right near them~~, mainly because I forgot you can do that right up until right now~~
  • Code blocks: implemented with Minted

Latex

There are a couple of capabilities markdown does not usually provide:

  • Assigning identifiers to tables, figures and equations (display math)
    • Syntax (in meta block): ref = REF. REF cannot contain whitespace
    • Example (in meta block): ref = graph will transform to one of
      • \label{tab:graph}, if inserted after table
      • \label{fig:graph}, if inserted after figure
      • \label{eq:graph}, if inserted after equation
      • there's no way to opt out of this behavior, for now
  • Referring to tables, figures and equations (display math)
    • Syntax: [@REF]
    • Example: [@fig:graph] transforms into $\LaTeX$'s \ref{fig:graph}
  • Assigning captions for figures and tables:
    • Syntax (in meta block): caption = "CAPTION". Any double quotes inside CAPTION must be escaped (\")
    • Example: (in meta block): caption = "Wow, this table is such data; Much amaze"
  • Assigning width to figures:
    • Syntax (in meta block): width = WIDTH. WIDTH must be a valid decimal for your \LaTeX engine
    • Example (in meta block): width = 0.9
    • Width is measured as a fraction of the width of your document. So 1.0 will scale the figure to the full width of the document
    • 0.9 is the default value
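
As a rough illustration of the labelling and referring rules above, here's a minimal Python sketch. It is not Kyomato's actual code: `label_for`, `reference_for` and the `LABEL_PREFIXES` table are hypothetical names made up for this example.

```python
# Hypothetical helpers mirroring the prefix rules listed above.
LABEL_PREFIXES = {"table": "tab", "figure": "fig", "equation": "eq"}


def label_for(kind: str, ref: str) -> str:
    """Build the \\label{...} emitted for `ref = REF` after an element of `kind`."""
    if any(ch.isspace() for ch in ref):
        raise ValueError("REF cannot contain whitespace")
    return f"\\label{{{LABEL_PREFIXES[kind]}:{ref}}}"


def reference_for(ref: str) -> str:
    """Build the \\ref{...} emitted for a `[@REF]` reference."""
    return f"\\ref{{{ref}}}"


print(label_for("figure", "graph"))
print(reference_for("fig:graph"))
```

Note that the prefix is attached when the label is assigned, so the reference has to spell it out (fig:graph, not just graph).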

A meta block

This is a one-line braces-delimited block after equation, figure or table. It is expected to be placed on the next line. Comma-separated arguments described above can be placed there in any order and with any amount of whitespace between them.

Take a look at the examples for a better idea of this block's usage.
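
To give a feel for how forgiving (or not) such a block is, here's a small Python sketch of a meta block reader. It is only an illustration under assumptions: the real parser lives in Rust, `parse_meta_block` is a made-up name, and the exact line shape (`{key = value, ...}`) is inferred from the description above.

```python
import re

# One argument: key = "quoted value with \" escapes"  OR  key = bare_value,
# followed by a comma or the end of the block body.
_ARG = re.compile(r'\s*(\w+)\s*=\s*(?:"((?:[^"\\]|\\.)*)"|([^,\s]+))\s*(?:,|$)')


def parse_meta_block(line: str) -> dict:
    """Parse a one-line, braces-delimited meta block into a dict of strings."""
    line = line.strip()
    if not (line.startswith("{") and line.endswith("}")):
        raise ValueError("meta block must be delimited by braces")
    body = line[1:-1].strip()
    args = {}
    pos = 0
    while pos < len(body):
        m = _ARG.match(body, pos)
        if m is None:
            raise ValueError(f"cannot parse meta block at offset {pos}")
        key, quoted, bare = m.group(1), m.group(2), m.group(3)
        # Quoted values may contain escaped double quotes; unescape them.
        args[key] = quoted.replace('\\"', '"') if quoted is not None else bare
        pos = m.end()
    return args


print(parse_meta_block('{ref = graph,   width = 0.9, caption = "Much amaze"}'))
```

Arguments are accepted in any order and with any whitespace between them, matching the rules above; everything else about error handling here is guesswork.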

Yaml

What's yaml doing here? Well, it's related to pandoc allowing you to substitute stuff in so-called "templates" with certain strings.

Although there's neither a yaml parser implemented, nor a fully-supported arbitrary pattern system yet, Kyomato allows you to do a similar thing: check out the corresponding example (specifically, the title_info one).

Ayano

because Ayano loves Kyoko

Kyomato has a very special type of syntax, that actually makes it so useful to me: Ayano blocks.

Basically, Kyomato can execute arbitrary Python code during output generation (for example, to perform some computations, or rm-rfing your entire hard drive, that's up to you, really). Kyomato is also able to insert results of said computations into the output (or refer to them, in case of figures, as they can't be inserted into latex directly). Ayano blocks are powered by pyo3.

Throughout the following explanations, there will be a couple of technical notes describing the exact inner workings of Ayano blocks.

There are two types of Ayano blocks:

  • Static blocks are executed all at once just before output starts actually generating. Nothing is left in their place in the output. All static blocks share the environment and their variables are visible to all function blocks.
  • Function blocks are compiled together with static blocks into a single Python module, but are executed on demand, during the output generation process. Function blocks leave something instead of them in the output.
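
The split between the two block types can be imitated in plain Python. The sketch below is only an approximation of the idea: Kyomato does this through pyo3, while here a shared dict and `exec` stand in for the compiled module, and `build_module` is a hypothetical name.

```python
import textwrap


def build_module(static_blocks, function_blocks):
    """Run static blocks once, wrap function blocks into callables.

    All blocks share one namespace, so function blocks can see
    everything the static blocks defined.
    """
    namespace = {}
    # Static blocks: executed immediately, leave nothing behind.
    for code in static_blocks:
        exec(code, namespace)
    # Function blocks: wrapped into functions, executed later on demand.
    functions = []
    for i, code in enumerate(function_blocks):
        name = f"_ayano_fn_{i}"
        source = f"def {name}():\n" + textwrap.indent(code, "    ")
        exec(source, namespace)
        functions.append(namespace[name])
    return functions


fns = build_module(["x = 21"], ["return x * 2"])
print(fns[0]())  # called at the point where the block sits in the output
```

Whatever a function returns is what ends up in the output in place of the block.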

Ayano blocks are regular code blocks, with their language set to Python, Ayano. Once Kyomato encounters such a block, it may react in a couple of ways, depending on what's specified after the Ayano keyword.

A general syntax: Python, Ayano ! * "DESCRIPTION" ~ PATH. It is comprised of three optional independent parts (in any order):

  • Static block declaration (!):
    • Makes block static
    • Omitting it makes block a function
  • Display declaration (* "DESCRIPTION"):
    • Adds this block of code to a special section at the very end of the output
    • Description is optional - you can just leave it as *
    • Description must escape any double quotes
    • If description is specified, it will be displayed under block's representation in a said special section
  • Insert declaration (~ PATH):
    • Must point to some Python script with path relative to the execution folder
    • Said script is inserted at the beginning of the block, and will be executed and/or displayed with block's code
    • A block marked with that will execute its code in the directory of the pointed script
    • Blocks without this mark, execute their scripts in the directory of the source, if available
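
Here's a hedged Python sketch of how such a header line could be picked apart. It is not the real parser: `AyanoHeader` and `parse_header` are made-up names, and two things are pure assumptions on my side: that the prefix is spelled exactly Python, Ayano, and that PATH contains no whitespace.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AyanoHeader:
    is_static: bool = False            # "!" was present
    is_displayed: bool = False         # "*" was present
    description: Optional[str] = None  # text after "*", if any
    insert_path: Optional[str] = None  # path after "~", if any


_PREFIX = re.compile(r'\s*Python\s*,\s*Ayano\b')
# One part: "!"  |  "*" with an optional quoted description  |  "~ PATH"
_PART = re.compile(r'\s*(?:(!)|(\*)\s*(?:"((?:[^"\\]|\\.)*)")?|~\s*(\S+))')


def parse_header(info: str) -> Optional[AyanoHeader]:
    """Return the parsed header, or None if this is not an Ayano block."""
    m = _PREFIX.match(info)
    if m is None:
        return None
    header = AyanoHeader()
    rest = info.rstrip()
    pos = m.end()
    while pos < len(rest):
        part = _PART.match(rest, pos)
        if part is None:
            raise ValueError(f"unexpected header content at offset {pos}")
        if part.group(1):
            header.is_static = True
        elif part.group(2):
            header.is_displayed = True
            if part.group(3) is not None:
                header.description = part.group(3).replace('\\"', '"')
        else:
            header.insert_path = part.group(4)
        pos = part.end()
    return header


print(parse_header('Python, Ayano * "Plots the data" ~ scripts/plot.py'))
```

The three parts are matched independently in a loop, which is what lets them appear in any order.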

As mentioned before, every function block is replaced with something. You can see them as functions that leave their result in the output. Anything having a __str__ Python representation can be a valid output.

Additionally, block's last line can be a special Ayano Syntax. These are mostly shortcuts to certain already-existing \LaTeX structures, like figures and tables:

  • Trailing return: basically, you can omit the actual return keyword, and it will be added for you by Ayano. That's just about convenience - you can still put that return each time, Ayano shouldn't complain about that.
  • Value-Error formatting syntax:
    • Accepts a value and an error, outputting them with $\LaTeX$'s \pm (±) in between
    • Syntax: @dev: VALUE, ERROR. VALUE and ERROR must be valid Python objects that can be parsed into float
    • Example: @dev: 4.54535, 0.2342
    • Additionally, a certain manipulation is performed on parsed value and error:
      • All operations are performed in decimal, so no binary jumpscares
      • An error is rounded to
        • Two digits, if it starts with 1 or 2
        • Single digit otherwise
      • A value itself is rounded or appended to the same digit as error
  • Figure syntax:
    • Transforms into an inserted figure on output
    • Syntax: @fig: src = SRC, ident = IDENT, caption = CAPTION
      • Arguments can come in any order and with any whitespace
      • All arguments must be something Python can str()
      • SRC argument is mandatory, and is supposed to point to image file to include
      • If Ayano detects SRC to be something starting and ending with (double) quotes, a special check will be performed to inform you, if file it points to does not exist. This should not halt Kyomato, as said file can be generated during block execution
      • IDENT is optional, and specifies figure's ref
      • CAPTION is optional, and specifies figure's caption
      • CAPTION's content is run through the regular parser, with minor limitations.
    • Example: @fig: src="line.png", ident="line", caption="Figure caption"
  • CSV-Table syntax:
    • Reads and inserts a CSV table into the output
    • Syntax: @csv_table: src = SRC, rows = ROWS, columns = COLUMNS, ident = IDENT, caption = CAPTION
      • Arguments can come in any order and with any whitespace
      • SRC, IDENT and CAPTION arguments must be something Python can str()
      • SRC argument is mandatory, and is supposed to point to a csv table file. It doesn't actually have to be a csv table, just some format that Python's csv module can manage to read.
      • If Ayano detects SRC to be something starting and ending with (double) quotes, a special check will be performed to inform you, if file it points to does not exist. This should not halt Kyomato, as said file can be generated during block execution
      • ROWS must be in Rust range format. It defines the rows of the table to include (the 0th row is considered a header, and is always included). This argument is parsed and used internally by Ayano, so please follow the formatting strictly
      • Omitting ROWS will include all rows
      • COLUMNS defines table columns to include. Here's an example of the syntax: ["column1", "column2", ("value_column", "error_column")]:
        • This argument is parsed and used internally by Ayano, so please follow the formatting strictly
        • Syntax resembles an array of elements, where each element can be of two types:
          • Single column include - representing a single column from the table, included under its name into the output table
          • Value-error column include - representing value column and error column, that will be merged using value-error syntax and included as a single column to the output, under value column's name
        • Order of columns in the output table will be the same as specified by this syntax
      • Omitting COLUMNS will include all columns in some order, and without any fancy formatting
      • IDENT is optional, and specifies table's ref
      • CAPTION is optional, and specifies table's caption
      • CAPTION's content is run through the regular parser, with minor limitations.
      • Table's cells are run through the regular parser too, so you can format text, add references and so on
    • This specific syntax expects Python to read the input file. This is important, if your block has an insert declaration - make sure that Python will be able to read the intended file without any fancy path searching, just as it is
  • Generated-Table syntax:
    • Transforms into a runtime-generated table, by calling a special generator object with table's row and column
    • Syntax: @gen_table: GENERATOR; rows=ROWS, columns=COLUMNS, caption=CAPTION, ident=IDENT
      • Arguments except GENERATOR can come in any order and with any whitespace
      • IDENT and CAPTION arguments must be something Python can str()
      • IDENT is optional, and specifies table's ref
      • CAPTION is optional, and specifies table's caption
      • CAPTION's content is run through the regular parser, with minor limitations.
      • Table's cells are run through the regular parser too, so you can format text, add references and so on
      • ROWS and COLUMNS arguments must be plain decimal integers, specifying the number of rows and columns respectively generated for the table
      • ROWS and COLUMNS are parsed and used internally by Ayano, so please follow the formatting strictly (that's temporary, there are plans to do these runtime-determined too)
      • GENERATOR must be something that can be called by Python with two arguments: cell's row and column respectively. For example, it can name a function with a fitting signature, or be a Python lambda: lambda r, c: f"That's cell ({r}, {c})"
      • GENERATOR must come first, and is expected to be followed by a semicolon
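
For the generated-table case, here's a minimal sketch of what a GENERATOR might look like and how calling it for every cell yields the table contents. `generate_cells` is a hypothetical stand-in for what Ayano does internally, and zero-based row and column indices are an assumption.

```python
def cell(r: int, c: int) -> str:
    """A generator in the sense above: callable with (row, column)."""
    return f"That's cell ({r}, {c})"


def generate_cells(generator, rows: int, columns: int):
    """Call the generator once per cell and str() each result."""
    return [[str(generator(r, c)) for c in range(columns)] for r in range(rows)]


for row in generate_cells(cell, 2, 3):
    print(" | ".join(row))

# A lambda works just as well as a named function:
squares = generate_cells(lambda r, c: (r + 1) * (c + 1), 2, 2)
print(squares)
```

In a block, the last line would then read something like @gen_table: cell; rows=2, columns=3.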

There are ways to trigger the explained behavior without actually using this fancy syntax. Internally, Ayano still converts it to valid Python code, so if you happen to write the same code yourself, Ayano will happily detect and react to it. However, the only practical one I see is the value-error formatting one; here's its transformed syntax: ("err", value, error). You can, of course, use anything indexable by 0, 1 and 2, that should work fine too.
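
The rounding rules for value-error formatting can be sketched in plain Python with the decimal module. This is an illustration only: `format_value_error` is a made-up name, the half-up rounding mode is an assumption (the rules above don't name one), and the exact \LaTeX Kyomato emits may differ.

```python
from decimal import Decimal, ROUND_HALF_UP


def format_value_error(value, error) -> str:
    """Round error to 1-2 significant digits, value to the same digit."""
    # Going through str() keeps everything decimal, no binary surprises.
    val = Decimal(str(value))
    err = Decimal(str(error))
    if err <= 0:
        raise ValueError("error must be positive")
    # Two significant digits if the error starts with 1 or 2, one otherwise.
    first_digit = err.as_tuple().digits[0]
    significant = 2 if first_digit in (1, 2) else 1
    # adjusted() is the exponent of the most significant digit.
    quantum = Decimal(1).scaleb(err.adjusted() - significant + 1)
    err_rounded = err.quantize(quantum, rounding=ROUND_HALF_UP)
    # The value is rounded (or padded with zeros) to the same digit.
    val_rounded = val.quantize(quantum, rounding=ROUND_HALF_UP)
    return f"{val_rounded:f} \\pm {err_rounded:f}"


print(format_value_error(4.54535, 0.2342))
```

For the example from the list, @dev: 4.54535, 0.2342, the error starts with 2, so it keeps two digits (0.23) and the value is rounded to the same place (4.55).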

For more info on Ayano blocks usage, check out examples.

Path engines

There are several directories your code can potentially refer to:

  • If you're inside Ayano block with insert declaration, you might refer to stuff relative to insert declaration's directory
  • Any point of the source file can potentially point to a file right next to it
  • In Obsidian, there's a dedicated directory for pictures, so your source might refer to it
  • Lastly, you might refer to a local file in the directory you actually run Kyomato from

To control and direct to all of the described locations, a PathEngine (PE) is used. Basically, it checks all of the locations in a certain order, dependent on the requested file type, and optionally performs an extra check, like "does the file actually exist, and can I access it?", "does the path point to a valid directory?", or "does the path look SUS?".
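
The core idea fits in a few lines of Python. This is a sketch, not the actual PathEngine: `resolve_path` is a hypothetical name, and which directories are tried in which order for which file type is left to the caller here.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def resolve_path(
    name: str,
    search_dirs: Iterable[str],
    check: Callable[[Path], bool] = Path.is_file,
) -> Optional[Path]:
    """Return the first candidate that passes the check, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if check(candidate):
            return candidate
    return None


# Hypothetical lookup order for an image: the directory of the source,
# then an images folder, then the directory Kyomato was run from.
found = resolve_path("line.png", ["./notes", "./image_folder", "."])
print(found)
```

Swapping check for something like Path.is_dir gives the "is it a valid directory" flavour of the extra check.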

See examples for more explanation on that part

Ideology

Although it's annoying to admit, there is no way to do this sort of transformation fully streamed and with O(1) memory. There are two reasons for that:

  • Markdown allows defining footnote content at any point up until the end of the file. I couldn't find any way for \LaTeX to look ahead for this sort of stuff - it seems to always insert them at the bottom of the page where the definition was encountered.
  • Some Ayano blocks are supposed to leave something behind them, meaning we must execute them at the time of output generation. This can be achieved with O(1) memory, but would likely involve recompiling the Python module each time, leading to longer execution time. Thus, code must be fully collected first, then executed on a second, separate pass.

The above means that it would be wise to read the entire input first, and store it in a large buffer. Then, various structs will just refer to a part of that buffer, and only create their own owned buffer once the data they refer to is not really relevant.

For the most of the codebase, that's the design I'm following right now, but there are a couple of hiccups here and there...

There's a plan for complete rework of data storage, that will allow us to use much smaller tokens (they are, like, 120b each now, which is... BAD, REALLY BAD).

Not Async

As much as I'd like all of the processes to be nice and async, I see no option for that in case of continuous output generation. Well, unless there's a way to efficiently write to a single output from multiple threads, while preserving output and Ayano block execution order. Any sort of "collect" function would defeat the purpose, so that's not an easy thing to design.

Why?

This crate transforms Obsidian's markdown files into a \LaTeX document of the format I personally like. The general idea is - markdown is by design much less descriptive than \LaTeX, but I've established a common style with my \LaTeX, and found myself just plainly copying stuff all over the place.

Automatic source transformation provided by an editor can be used to mitigate that, but in the end you still end up with giant lumps of latex syntax that mean nothing to you while you're writing the document.

Guess what is also a pain to copy around and check for validity? Data! I often need a couple of tables inside of the document I write. And the most painful thing to do is to search for each and every location you've inserted the data in, and replacing it with new, updated value (this mostly relates to calculation error fixing). Along with that, I'd like to have some sort of documentation on exact actions I've done with the data to get the result I display. And these places should be kept in mind, too! It's only logical to delegate all of that to the machine.

Before that, I used Kile to write my \LaTeX, Alphaplot to plot and approximate data, and Libreoffice to create spreadsheets and calculate stuff. Some sort of mistype in any of those led to physical pain, un-synced info everywhere, and a couple of minutes of me intensely finding out every place in my work that needs to be updated, while keeping in mind all of the workflow I had at the moment of said distraction. I hope to completely eliminate situations like these with this project, reducing the number of programs I use to a single markdown editor (which is Obsidian, for now).

If you can relate to problems above, I hope this project will help you too.


  1. I can't say these are implemented properly, since that would be a lie - their implementation is quite primitive right now, and should be used with caution or th. Keep in mind, that document you write must be translated to \LaTeX.