CCScript 1.0 Language Reference

Table of contents

  1. Introduction
    1. What is CCScript?
    2. CCScript modules
    3. Modules and scopes
  2. The Language
    1. Lexical structure
    2. Values and Types
    3. Statements
    4. Expressions
  3. Future revisions
  4. Standard library

1. Introduction

1.1. What is CCScript?

CCScript is a simple scripting language designed to provide a higher-level representation of text and control codes in the Super Nintendo game EarthBound. It provides text output, labelling, and referencing, as well as control structures, boolean expressions, and macro-like command definitions.

1.2. CCScript modules

CCScript code is contained in text files, usually with the file extension ".ccs". A CCScript file is also called a "module." A collection of CCScript modules compiled together is called a CCScript "project." It's usually a good idea to break up a project into multiple CCScript files for convenience and improved organization. It's much easier to work on your project if it is broken down into major components, each of which is in its own text file. This way the individual files are smaller and easier to navigate, and each one is (hopefully) focused on a single purpose or area of a script.

Using multiple modules is also desirable since, in the current version of the compiler, each module is limited to a compiled size of 64 kilobytes. This is due to the fact that a single text block cannot cross a bank boundary in the ROM, and I'm too lazy to add logic to the compiler to divide a module correctly across multiple banks. :-)

1.3. Modules and scopes

In CCScript, every script file in a project has its own scope; that is, the symbols defined in one CCScript module are not immediately visible to other modules. However, it is possible for code in one module to refer to identifiers in another module, by simply prefixing the name of the identifier with the name of the module that contains it, separated by a period.

This code, for example, refers to identifiers contained in other modules:

goto(onett.somelabel)

if twoson.someflag {
    "@Something has happened in Twoson!"
}

"@Did you hear about that new inventor in Threed?" next
"@I think his name was {threed.inventor_name}." end

For information on the meaning of this code, see the sections on Identifier expressions and Command invocations, as well as the rest of the documentation on Statements and Expressions.
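
As a rough mental model of the lookup rule described above, the following Python sketch resolves either a plain identifier or a module-qualified one against per-module symbol tables. The PROJECT table, the resolve helper, and all symbol values are hypothetical and exist only for illustration; this is not a description of how the CCScript compiler is actually implemented.

```python
# Hypothetical per-module symbol tables; every value here is a placeholder.
PROJECT = {
    "onett": {"somelabel": 0xC50000},
    "twoson": {"someflag": 100},
    "threed": {"inventor_name": "Some Name"},
}


def resolve(name: str, current_module: str):
    """Look up `name`, written either as `identifier` or `module.identifier`."""
    if "." in name:
        # Qualified reference: the part before the period names the module.
        module, identifier = name.split(".", 1)
    else:
        # Unqualified reference: only the current module's scope is searched.
        module, identifier = current_module, name
    try:
        return PROJECT[module][identifier]
    except KeyError:
        raise NameError(f"undefined symbol: {name}") from None
```

Under this model, resolve("twoson.someflag", "threed") succeeds, while resolve("someflag", "threed") fails, because symbols from other modules are only reachable through the module prefix.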

Module names

The CCScript compiler assigns a name to every module in a project. That name is taken directly from the filename of the module file, without the file extension. So if a project consists of the following files:

   onett.ccs
   twoson.ccs
   threed.ccs
   commands.ccs

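...then the project will contain modules named onett, twoson, threed, and commands. Note that this places some restrictions on the filenames of modules used with the CCScript compiler. In particular, because a module name is written as an identifier in a module.identifier reference (see Identifier expressions), a file whose name is not a valid identifier cannot be referred to in this way.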

2. The Language

2.1. Lexical structure

Whitespace

In CCScript, whitespace (tab characters and spaces) is only used to separate lexemes; it has no other purpose. Thus, programmers have a great deal of flexibility in formatting their code. Multiple statements may be placed on the same line, or single statements may have their lexemes separated over multiple lines.

While the language itself allows any style of whitespace formatting, as a general guideline it is best to use whitespace to improve the readability of code. Simple techniques such as indenting nested blocks and using blank lines to separate sections of code can go a long way towards making programs easier to read.

Identifiers

An identifier in CCScript consists of any number of letters, digits, or underscore characters, and must begin with a letter or an underscore. Identifiers in CCScript are case-sensitive, so foobar, Foobar, and FOOBAR are all different identifiers.

identifier := [a-zA-Z_]([a-zA-Z_0-9])*

Keywords

The following keywords have special meanings in CCScript, and cannot be used as identifiers:

and byte command default define else flag if long menu not or ROM ROMTBL short
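
Taken together, the identifier pattern and the keyword list can be checked mechanically. The Python sketch below is one way to do that; the names IDENTIFIER, KEYWORDS, and is_valid_identifier are made up for this example, and the sketch assumes that keywords are matched case-sensitively, as identifiers are.

```python
import re

# Direct transcription of: identifier := [a-zA-Z_]([a-zA-Z_0-9])*
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z_0-9]*")

# The reserved words listed in the Keywords section.
KEYWORDS = {
    "and", "byte", "command", "default", "define", "else", "flag", "if",
    "long", "menu", "not", "or", "ROM", "ROMTBL", "short",
}


def is_valid_identifier(name: str) -> bool:
    """True if `name` matches the identifier pattern and is not a keyword."""
    return IDENTIFIER.fullmatch(name) is not None and name not in KEYWORDS
```

For instance, is_valid_identifier("_door2") is true, while "2door" (starts with a digit) and "menu" (a keyword) are both rejected.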

Operators and other tokens

CCScript currently supports only a few operators, including the boolean logical operators and, or, and not. It also includes the assignment operator =, but currently only as part of a ROM or ROMTBL statement. The following symbols are the non-alphanumeric tokens used in the language:

{ } [ ] ( ) : . , =

Numbers and strings

CCScript accepts both string literals and non-negative integer literals. Integer literals may be in decimal (with no prefix), or in hexadecimal (with the prefix '0x').

125         // decimal integer
0x2F        // hexadecimal integer
"Hello."    // text string

intliteral := [0-9]+ | '0x' [0-9A-Fa-f]+
stringliteral := '"'[.]*'"'

Comments

Comments are sections of text that are ignored by the compiler. They do not affect the output of the program, but they are useful for explaining or otherwise annotating code. CCScript supports two kinds of comments, which will be familiar to anyone who uses languages like C++ and Java.

Single-line comments begin with // (two slash characters). Anything from the start of the two slashes to the end of the current line is part of the comment:

// This is a single-line comment.
hotspot_on(1, 42, target) // This, too!

Multi-line comments begin with /* (slash, asterisk), and end with */ (asterisk, slash). Everything from the first slash to the last slash is part of the comment, and this can include multiple lines of text:

/* This is a multi-line comment.
   Second line of the comment.
   Third line of the comment. */

2.2. Values and Types

CCScript is a fairly primitive language when it comes to types - it has no variables, so the concept of types is of limited use. In fact, every expression in CCScript ultimately has the same type: an array of bytes. Thus all types are directly convertible to one another; wherever one expression is allowed, any expression is allowed. (Though not always useful.)

There are, however, different types of expressions, and each is coerced into its final byte array form in a different way, so it is still useful to mention them here.

Numbers

A number in CCScript is always an unsigned 32-bit (4 byte) integer, representing possible values from 0 to 0xFFFFFFFF (4,294,967,295).

Strings

A string is essentially just an array of bytes, with each byte representing a character. However, strings can also contain special character sequences that are evaluated differently. See String expressions for more information.

Booleans

Conceptually, an expression of the boolean type can have one of two values: "true" or "false." The concept is treated rather loosely in CCScript, however; any expression can be treated like a boolean. The value of a boolean (true or false) is determined by examining the EB text system's result register after the expression has been parsed in text; a value of zero corresponds to "false," and any other value corresponds to "true."

2.3. Statements

Statements are the building blocks of a program. A program in CCScript consists of a series of statements:

program := (statement)*

There are several different types of statements, each of which is described in detail in the following sections.

Block statement

block-stmt := '{' (statement)* '}'

A block statement is simply an arbitrary number of statements contained inside curly braces. Block statements are useful for grouping together multiple statements under an if or menu statement. Unlike in some other languages, blocks in CCScript do not define new scopes. Constant definitions and command definitions are also not allowed inside blocks.

If statement

if-stmt := 'if' expression statement ['else' statement]

The if statement is the primary method of controlling the flow of events in a script. The expression after the if keyword is evaluated, and if the result is "true," the statement following the condition will be executed. Otherwise, the statement following the else keyword will be executed. The else clause is optional.

Examples:

// Example 1
if flag 2
    "@The temporary flag is set."

// Example 2
if hasmoney(5000) {
    "@That's a good bit of cash there." next
    "@Want to buy a genuine Rolex?" end
}
else {
    "@Get outta here, cheapskate!" end
}

Note that if you want multiple statements to be controlled by an if statement, you must put them inside a single block statement. As a general rule, it is best to always use curly braces with if statements; this reduces the possibility of potentially confusing mistakes.

Menu statement

menu-stmt := 'menu' [intliteral] '{' ( ['default'] expression ':' statement)* '}'

The menu statement allows you to provide the player with a selection of options, such that their choice influences the flow of events in the script. Syntactically, a menu consists of the menu keyword, optionally followed by a number, followed by a series of options enclosed in curly braces. Each option consists of an optional default keyword, an expression that defines how the choice will be displayed, a colon character (':'), and a statement that will be executed if that choice is selected.

The optional number after the menu keyword specifies how many columns to display the choices in. The default keyword, when placed before an option, indicates that that option will be selected by default if the player presses "B" to cancel the menu, instead of selecting an option.

Examples:

// Example 1: display a basic yes-no menu
"@Would you like ketchup with that?" next
menu {
    "Yes": "@Sorry, we're all out of ketchup."
    default "No": "@Sorry, we're all out of not-ketchup."
}
end

// Example 2: display a longer menu, in one column
"@What is your favorite ice cream flavor?" next
menu 1 {
    "Vanilla": goto(vanilla)
    "Strawberry": goto(strawberry)
    "Chocolate": goto(chocolate)
    "I don't like ice cream": {
        "@That is unacceptable!" next
        "@You die now!" next
        battle(39)
    }
}

Note that, as with the if statement, if you want an option to control more than one statement, you must collect them together within a block statement (curly braces).

Label definition

label-stmt := identifier ':'

A label is a named location in code that can be referred to elsewhere in code. Defining a label allows you to refer to the address of that location in the script. Labels are primarily used for text jumps, or to link the address of a text block to some part of the ROM.

mylabel: "@The label 'mylabel' points to this text."

The address of a label always points to the first non-definition statement that follows the label definition. Definition statements (labels, commands, constants) merely declare symbols that can be used by the rest of the program, so they do not themselves have any "position" within the code.

Label definitions are not allowed in the body of a command definition. A label defined in one CCScript module can be accessed by another. For more information, see the section on Modules and scopes.

Constant definition

const-def := 'define' identifier '=' expression

A constant definition statement allows you to bind a name to any expression, so that you can refer to it anywhere else in code by name. This is convenient if there are certain values in your script that are used often: for example, an event flag number, or the name of an important non-player character. If you define such values as a constant, and later decide to change the value, you will only have to modify the constant definition, instead of changing every occurrence of the value within your script.

Constant definitions are only allowed at the file scope; they cannot occur inside block statements or in the body of a command definition. Constants defined in one CCScript module can be accessed by another. For more information, see the section on Modules and scopes.

Examples:

define boss_name = "Master Puke"
define boss_beaten = flag 437

// Elsewhere, we can refer to the names instead of repeating the values:
if boss_beaten {
    "@Hey, you're those guys who beat {boss_name}!" next
    "@Good job!" end
}

Command definition

command-def := 'command' identifier ['(' ( identifier ( ',' identifier )* )? ')'] statement

Commands in CCScript are similar to constant definitions, but a command definition can include parameters that define part of its behavior. When using a command, you specify the arguments that give value to the parameters of the command. Each argument is a single expression that is evaluated each time the corresponding parameter is used within the body of the command. In this way, CCScript commands are similar to macros, or to functions with lazy parameter evaluation.
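
The "evaluated each time the parameter is used" behavior can be imitated in Python by passing a callable instead of a value. This is only an analogy for the evaluation rule described above, not a picture of the compiler's internals; the names evaluations, name_argument, and greet are invented for the example.

```python
# Records every evaluation of the argument expression.
evaluations = []


def name_argument() -> str:
    """Stands in for an argument expression; notes each time it is evaluated."""
    evaluations.append("name")
    return "Pokey"


def greet(name) -> str:
    # The body mentions the parameter twice, so the argument expression
    # is evaluated twice, once per use.
    return f"@Hello, {name()}. Nice to meet you, {name()}."
```

After a single call to greet(name_argument), the evaluations list holds two entries. This is why an argument with side effects can behave differently from what a function call in most languages would suggest.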

A command definition consists of the command keyword, followed by an identifier, followed by an optional list of parameter identifiers enclosed in parentheses and separated by commas. This is then followed by the statement that defines the body of the command.

Command definitions are only allowed at file scope; they cannot occur inside block statements or other command definitions. Commands defined in one CCScript module can be accessed by another; see the section on Modules and scopes.

Examples:

// Example 1:
command sayhello(name) {
    "@Hello, my name is {name}." next
    "@It's nice to meet you." next
}

// Example 2:
command sell_offer(item, price) {
    "@Would you like to sell the {itemname(item)} for {money(price)}?" next
    menu {
        "Yes": {
            "@Excellent."
            take(0xff, item)
            givemoney(price)
        }
        "No": {
            "@Well, okay. Keep it then."
        }
    }
}

Again, note that if you want a command's body to contain multiple statements, you must group them together in a block statement. As with if statements, it is generally best to always use block statements (curly braces) in command definitions, to reduce the likelihood of confusing mistakes.

ROM access statement

rom-access := 'ROM' '[' expression ']' '=' expression
            | 'ROMTBL' '[' expression ',' expression ',' expression ']' '=' expression

A ROM access statement provides direct access to the contents of the output ROM file, allowing you to modify bytes at certain locations directly. There are two versions of the ROM access statement: one using the ROM keyword, which simply takes an expression for the location to write to, and an expression for the value to write: ROM[location] = value.

The other version, using the ROMTBL keyword, is intended for accessing certain elements of table-based data. The syntax is ROMTBL[base,size,index] = value. The expression base is the base address of the table to modify, size is the size of each element within the table, index is the index of the element to modify, and value is the data to write at the final location.

ROM access statements are collected and executed during the last pass of the compiler. They are primarily meant for special-purpose use by certain standard library commands, and you should be extremely careful if you decide to make use of them.

Examples:

// Example 1: overwrite the text at 'testlabel' with a pointer to a different label
ROM[testlabel] = goto(otherlabel)

// Example 2: change the name of enemy #30
ROMTBL[0xD5958A, 94, 30] = "Exploding sandwich[00]"
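
The text does not spell out how ROMTBL combines its three operands. The natural reading of "base address," "size of each element," and "index" is base + size * index. The Python sketch below works through the enemy-name example on that assumption; romtbl_address is a hypothetical helper, not part of CCScript.

```python
def romtbl_address(base: int, size: int, index: int) -> int:
    """Address of element `index` in a table of fixed-size entries.

    Assumes the final location is base + size * index, which is the usual
    layout for a table of equally sized records.
    """
    return base + size * index


# ROMTBL[0xD5958A, 94, 30]: entry 30 of a table with 94-byte entries.
enemy_30 = romtbl_address(0xD5958A, 94, 30)
print(hex(enemy_30))  # 0xd5a08e
```

If that reading is right, the ROMTBL statement in Example 2 would write the same bytes as ROM[0xD5A08E] = "Exploding sandwich[00]".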

Expression statement

In addition to the above types of statements, a statement can simply consist of any CCScript expression:

statement := expression

2.4. Expressions

An expression is a combination of values, operators, or identifiers that itself has some value in the context of the program. Just as statements are the primary building blocks of the logic of a CCScript program, expressions are the primary building blocks of the content of those statements. An expression is a representation of some value within the program: a string of text, a number, the result of invoking a command, and so on.

Every expression has a type, which is essentially its categorization within the set of possible expressions, and which determines the way it is evaluated.

String expressions

expression := stringliteral

A string expression simply consists of a string literal. When it is evaluated, the characters in the string are directly translated into a byte array, with certain exceptions:

Control Codes

Sections within a string delimited by matched square brackets ('[' and ']') can contain a sequence of hexadecimal digit pairs, and are interpreted directly as a series of bytes. This is useful for inserting control codes directly into text strings, or for outputting binary data in general.

"@This is a string[10 0F] with control codes.[13][18 04][02]"

Pause characters

The characters '/' (forward slash) and '|' (vertical pipe) have special meaning in CCScript text strings. They are shorthand for brief pauses in text processing, useful for quickly and easily adding appropriate pauses to dialogue without having to resort to more verbose representations, such as the standard library pause command, or the [10 XX] control code. The '/' character produces a pause of 5 frames, and the '|' character produces a pause of 15 frames.

"@Well,/ at least I know how to pause.|.|.| dramatically!"

To use the '/' or '|' characters in text, you can use control code brackets with the corresponding character values: '[5F]' and '[AC]'.
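
To make the string rules above concrete, here is a simplified Python sketch of a string encoder that handles control code brackets and the two pause characters. It rests on three assumptions that the text does not state outright. First, ordinary characters are encoded as their ASCII value plus 0x30, which is consistent with '[5F]' for '/' and '[AC]' for '|'. Second, each pause is emitted as the [10 XX] control code with XX equal to the frame count. Third, expression brackets ('{' and '}') are out of scope and not handled. The names PAUSE_FRAMES and encode_string are invented for the example.

```python
# Frame counts for the two pause shorthand characters.
PAUSE_FRAMES = {"/": 5, "|": 15}


def encode_string(text: str) -> bytes:
    """Encode a string literal body (without the surrounding quotes)."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "[":
            # Control code block: hex digit pairs are copied through as bytes.
            end = text.index("]", i)
            out += bytes.fromhex(text[i + 1:end])
            i = end + 1
        elif ch in PAUSE_FRAMES:
            # Assumed expansion: the [10 XX] pause control code.
            out += bytes([0x10, PAUSE_FRAMES[ch]])
            i += 1
        else:
            # Assumed character mapping: ASCII value plus 0x30.
            out.append(ord(ch) + 0x30)
            i += 1
    return bytes(out)
```

With these assumptions, encode_string("A/") yields the three bytes 71 10 05, and encode_string("[5F]") yields the single byte 5F, i.e. a literal slash character rather than a pause.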

Expression brackets

It is also possible to embed other CCScript expressions inside strings, by enclosing them in curly braces ('{' and '}'). Each brace pair can contain only one expression: any additional characters after the first expression has been parsed will be ignored. This is convenient for including printable expressions directly in text without having to break the surrounding text up into multiple string literals.

"@{name(1)}, would you like to buy a {itemname(86)} for the low, low price of ${atmbalance}?"

It is also possible to embed expressions inside control code blocks:

command goto(address) { "[0A {long address}]" }

Numeric expressions

expression := intliteral

A numeric expression is simply the use of an integer literal. It is always evaluated as an unsigned 32-bit integer, and transformed into a byte array in little-endian fashion (least significant byte first).

1           // Equivalent to "[01 00 00 00]"
0xC58000    // Equivalent to "[00 80 C5 00]"
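
The same conversion can be reproduced in Python to check a value by hand. The helper name encode_int_literal is invented for this sketch, and rejecting values above 0xFFFFFFFF is an assumption; the text does not say how the compiler treats oversized literals.

```python
def encode_int_literal(literal: str) -> bytes:
    """Convert a decimal or 0x-prefixed literal to 4 little-endian bytes."""
    value = int(literal, 16) if literal.startswith("0x") else int(literal, 10)
    if not 0 <= value <= 0xFFFFFFFF:
        # Assumption: anything outside the unsigned 32-bit range is an error.
        raise ValueError(f"literal out of 32-bit range: {literal}")
    return value.to_bytes(4, "little")


print(encode_int_literal("1").hex(" "))         # 01 00 00 00
print(encode_int_literal("0xC58000").hex(" "))  # 00 80 c5 00
```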

Boolean operators

expression := expression 'and' expression | expression 'or' expression | 'not' expression

A boolean expression is formed by using the logical operators and, or, and not.

The different boolean operators can be used together in a single expression, as the recursive grammar above allows.

The binary boolean operators and and or are short-circuiting. That is, their second operand will be skipped if the first operand completely determines the value of the expression: if the first operand to and returns "false", the second operand will not be executed, because we already know that "false and X" is false, no matter what X is. Similarly, if the first operand to or returns "true", the second operand will not be executed, because "true or X" is always true.

This is mainly an issue when an expression used as an operand to a boolean operator has a side effect, that is, it does something besides just return a value. For example, consider the following code snippet:

flag 265 or take_money(100)

When this is executed, if event flag #265 is turned on, the or operator will skip the take_money part of the expression, so the player will not lose 100 dollars. However, if event flag #265 is turned off (i.e., flag 265 returns false), then take_money will be invoked and the player will lose 100 dollars.

NOTE: short-circuiting does not apply to "compile-time" side effects, like ROM write statements. If an expression has compile-time side effects, they will be performed regardless of the future evaluation of the boolean expression.
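
Python's or operator short-circuits in the same way, so the flag 265 or take_money(100) example can be imitated directly. The sketch below models only the run-time behavior; the starting balance of 500 and the helper names are invented for illustration.

```python
def run_script(flag_265: bool) -> int:
    """Return the player's money after evaluating `flag 265 or take_money(100)`."""
    wallet = {"money": 500}  # hypothetical starting balance

    def take_money(amount: int) -> bool:
        # Side effect: the player loses money whenever this is evaluated.
        wallet["money"] -= amount
        return True

    # If flag_265 is true, the right-hand operand is never evaluated.
    flag_265 or take_money(100)
    return wallet["money"]


print(run_script(True))   # 500: take_money was skipped
print(run_script(False))  # 400: take_money was evaluated
```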

Flag operator

expression := 'flag' intliteral

A flag expression is formed by using the flag operator with a numeric literal. A flag expression is used to denote an event flag in EarthBound. In most contexts, a flag expression is simply evaluated as a 16-bit number; however, when evaluated as a boolean (i.e., as part of a boolean expression or as the condition of an if statement), it is converted into a code sequence that will load the current state of the event flag into the result register, thus enabling it to act as a boolean.

Identifier expressions

expression := identifier ['.' identifier]

An identifier expression is simply the case of an identifier (the name of a label or defined value) being used where an expression is expected.

define myconstant = 625
mylabel:

// The following two expressions are identifier expressions:
myconstant
mylabel

The value of an identifier expression depends on the object to which the identifier is bound. When the identifier refers to a label, the value of the expression is the integer address of the label. When the identifier refers to a constant, the value of the expression is the expression bound to the identifier. If the identifier is the name of a command, the expression is a command invocation expression.

An identifier expression can also refer to an identifier defined in another module. In this case, the expression has the form module.identifier; that is, the name of the module file to access is followed by a period character, which is then followed by the identifier. For more information, see the section on Modules and scopes.

twoson.mylabel // Refers to "mylabel" in the module "twoson.ccs"

Command invocations

expression := identifier ['.' identifier] ['(' ( expression ( ',' expression )* )? ')']

A command invocation is a special case of the identifier expression, in which the identifier refers to a command. The identifier in a command invocation can be followed by a list of argument expressions, contained within parentheses and separated by commas.

command mycmd(arg1, arg2, arg3) {
    "{arg1} {arg2} {arg3}"
}

// A command invocation expression:
mycmd(42, "foobar", somelabel)

When a command invocation is evaluated, the provided argument expressions are bound to the corresponding parameter names in the command's definition. The value of the command invocation expression is the value of the body of the command definition, evaluated in the environment formed by adding the parameter bindings to the command's scope of definition.

As with references to labels and constants, a command invocation can refer to a command defined in another module. The syntax is the same as for non-command identifiers: use the name of the module, followed by a period, followed by the name of the command in that module to invoke.

bakery.dostuff("pie", 42) // Invokes "dostuff" in the module "bakery.ccs"

Array access expressions

expression := ('byte' | 'short' | 'long') [ '[' intliteral ']' ] expression

An array access expression directly accesses the byte array form of another expression. It allows certain bytes to be "selected" from an expression, so that only those bytes will be part of the value of the final expression. This is useful for situations where you need to break apart the bytes of an expression, or when you want to ensure that only a certain number of bytes are inserted.

An array access expression begins with a size keyword (byte, short, or long). If byte is used, the expression will be treated as an array of bytes (8-bit integers). If short is used, the expression will be treated as an array of 16-bit integers. If long is used, it will be treated as an array of 32-bit integers.

The next part of the expression is the index in the array to be accessed. This is just an integer literal enclosed in brackets. The index is optional; if it is omitted, the index is assumed to be zero. The final part of the array access expression is the expression whose array representation will be accessed.

Examples:

// Take only the first character of the string "I like pie."
byte [0] "I like pie."    // equal to "I"

short [1] 0x11223344      // Takes only the high-order halfword, equal to [22 11]
short [0] 0x11223344      // Takes the low-order halfword, equal to [44 33]

// Array accessors are used in the CCScript standard library to ensure that only
// the correct number of bytes are inserted into a control code parameter:
command goto(target) { "[0A {long target}]" }
command set(num) { "[04 {short num}]" }
command pause(len) { "[10 {byte len}]" }
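
The selection rule can be modelled in Python as slicing a byte array in units of 1, 2, or 4 bytes. The names WIDTHS and array_access are invented for this sketch. It works on raw bytes, so the string case shows the selection only and not the game's character encoding, and it makes no claim about what the compiler does when the index runs past the end of the data.

```python
# Element width in bytes for each size keyword.
WIDTHS = {"byte": 1, "short": 2, "long": 4}


def array_access(size_keyword: str, index: int, data: bytes) -> bytes:
    """Select element `index` from `data`, viewed as an array of fixed-width items."""
    width = WIDTHS[size_keyword]
    start = index * width
    return data[start:start + width]


# 0x11223344 in its little-endian byte array form: 44 33 22 11
number = (0x11223344).to_bytes(4, "little")
print(array_access("short", 1, number).hex(" "))  # 22 11
print(array_access("short", 0, number).hex(" "))  # 44 33
```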

3. Future revisions

The current version of CCScript is 1.0.

The next release is tentatively numbered 1.1, and will most likely focus primarily on bugfixes. However, some additional features are being considered for inclusion as well. Several of these are listed here:

If you have any suggestions for CCScript features (or if you would like to report a bug), drop by the #pkhax channel on DynastyNet (irc.dynastynet.net). Mr. Accident is in there often enough, so just give him a poke if you want to gripe about CCScript.

4. Standard library

CCScript includes a large collection of built-in commands for commonly used control codes. This collection of commands forms the CCScript standard library, and is documented in full in the Command Reference. The standard library is not comprehensive; there are numerous control codes that are not yet included. In many cases this is because a particular control code is not yet fully understood, but in still more cases it's because I'm lazy, and because the command reference is already about 70 pages long.

CCScript is intended to be extensible through commands, so if you find functionality missing from the standard library, don't hesitate to write up a module containing some useful commands for things that were omitted from the set of built-ins. The standard library will probably be extended in the future also, as the purposes of more control codes are discovered.





CCScript Language Reference, version 1.0 -- Contact: [email protected]