syntax highlighting

MZ952 (created ~6 years ago)

I've been asking a lot of questions lately... I'm building a text editor and I'd like to know if someone out there has already created some kind of syntax highlighter before I embark on creating one of my own. I think I know how to do it, kinda, but my idea would take approximately O(n²) time per line and, well, that's not ideal.

Basically, uh, for each char in the line of code, add that char to a stack and search all phrasewords for a match. If at least one phraseword partially matches, add the next char and repeat, until there's a perfect match or no match at all. If nothing matches, reset the stack and move on, until every char in the line has been checked. (I'd write pseudocode but it's uh tricky.) It's O(n²) because it fits into this nested loop:

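FOR I=0 TO LEN(CODE$)-1
 TEST$="" 'reset the candidate at each start position
 FOR J=I TO LEN(CODE$)-1
  INC TEST$,CODE$[J] 'append the next char
  CHECKFORMATCHES 'compare TEST$ against the phrasewords
  IF IMPARTIAL THEN ... 'partial match: keep extending
  IF NO THEN ... 'no match: give up on this start position
 NEXT J
NEXT I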

Or if you've done it before and have pointers or whatever, I'd love to hear them. Thanks!
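For comparison, here is a minimal sketch of a single-pass scan that avoids the nested loop: read each word whole, then do one set lookup instead of growing a candidate char by char. It is written in Python rather than SmileBASIC, and the `KEYWORDS` sample, the token kinds, and the `tokenize` name are all assumptions made for illustration, not anyone's actual highlighter.

```python
# Hypothetical sample; a real SmileBASIC keyword list is much longer.
KEYWORDS = {"FOR", "TO", "NEXT", "IF", "THEN", "INC", "DEF", "END"}

def tokenize(line):
    """Split one line into (kind, text, pos) tokens in one left-to-right pass."""
    tokens = []
    i, n = 0, len(line)
    while i < n:
        start = i
        ch = line[i]
        if ch == "'":                       # comment runs to the end of the line
            i = n
            kind = "comment"
        elif ch == '"':                     # string runs to the closing quote or end of line
            i += 1
            while i < n and line[i] != '"':
                i += 1
            i = min(i + 1, n)
            kind = "string"
        elif ch.isalpha() or ch == "_":     # word: read it whole, then one set lookup
            while i < n and (line[i].isalnum() or line[i] == "_"):
                i += 1
            if i < n and line[i] in "$%#":  # optional type suffix
                i += 1
            kind = "keyword" if line[start:i].upper() in KEYWORDS else "name"
        elif ch.isdigit():
            while i < n and (line[i].isdigit() or line[i] == "."):
                i += 1
            kind = "number"
        elif ch.isspace():
            while i < n and line[i].isspace():
                i += 1
            kind = "space"
        else:                               # any other single character
            i += 1
            kind = "symbol"
        tokens.append((kind, line[start:i], start))
    return tokens
```

Every character is consumed exactly once, so the work grows linearly with the line length (the set lookup is constant time on average).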

~6 years ago (edited ~6 years ago by MZ952)

MZ952 #2

Actually, I already tried to make one that runs in like linear time, but I ran into a few snags. Now that I think about it, all I'm really doing is parsing SB code and coloring it accordingly. I'm not sure if reverse-engineering a program that also does that would be any quicker for me than just making one myself. But then again, I'm not sure how to efficiently make one anyway. Sigghhh

~6 years ago (edited ~6 years ago by MZ952)

HTV04 #3

This seems kind of confusing. Forgive me if I'm wrong, but if someone made a syntax highlighter, I'm pretty sure they would also have a text editor to go along with it. You might need to make your own to fit your own text editor. I can try to help, if you want. What do you have so far?

~6 years ago (edited ~6 years ago by HTV04)

Yolkai #4

https://lumage.smilebasicsource.com/forum?ftid=187

this code comes from something that does more than just change colors depending on the token, so there are a lot of unnecessary parts. it's also an older version, and the style leaves something to be desired. but it does what you want.

DEF _tokenize(program$)

turn a string program$ representing some SB3 code into an array of token$s

DEF _read_token_obj token$ OUT tokentype%, value$, pos%, length%

get fields from a token$:
tokentype% is the value of one of the TOK_* constants defined at the top
value$ is the literal string represented
pos% is the position in the input that the token occurs at
length% is the length of the string, probably

DEF _new_token(tokentype%, value$, pos%, length%)

create a packed token$ with the specified attributes

~6 years ago (edited ~6 years ago by Yolkai)

MZ952 #5

Thanks Yttria. Scanning through that code real quick, I think that's more or less just what I was going to do for mine. More or less. I'll probably just base it off of that and move on.

HTV04 wrote: "This seems kind of confusing. Forgive me if I'm wrong, but if someone made a syntax highlighter, I'm pretty sure they would also have a text editor to go along with it. You might need to make your own to fit your own text editor. I can try to help, if you want. What do you have so far?"

I vaguely recall seeing some Japanese (I think) program which did custom syntax highlighting. I'll really have to dig through my folders for that, though (if it even exists). I don't think it was even a text editor, it just displayed custom syntax highlighting. I'll probably release the early version of my work like uh soonish, later this month. I guess you can decide then whether or not it's worth any help lol

~6 years ago

MZ952 #6

Finished the highlighter. It was actually easier than I first thought, thanks for the help guys. But uh, it seems a bit slow to render at every frame (way too slow). I'll have to come up with some clever way to store and dynamically affect the highlight data.

Edit: aaannd shoot. It's not gonna happen. Even preliminary testing shows me that some kind of syntax highlight data storage isn't going to cut it. Not for the performance I want out of the editor itself. Terrible performance is not worth the cosmetics. I'll post the highlighter code somewhere though, so I don't feel it went to total waste lol.

~6 years ago (edited ~6 years ago by MZ952)

snail_ #7

Hack together some cheap "multitasking" in your main loop to spread the parsing work across multiple frames, and only reparse/redraw what is visible.
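One way to read this suggestion, sketched in Python rather than SmileBASIC: make the highlighting a resumable job over the visible lines only, and let the main loop advance it for a small time budget each frame. The names `highlight_job`, `run_for_budget`, and the `colorize` callback are assumptions made for illustration.

```python
import time

def highlight_job(lines, first, last, colorize):
    """Resumable job: highlight lines[first:last] one line per step."""
    for number in range(first, min(last, len(lines))):
        yield number, colorize(lines[number])

def run_for_budget(job, results, budget_seconds):
    """Advance the job until the time budget is spent.

    Stores finished lines in `results` and returns True once the job is done.
    At least one line is processed per call, so the job always makes progress.
    """
    deadline = time.perf_counter() + budget_seconds
    while True:
        try:
            number, colors = next(job)
        except StopIteration:
            return True
        results[number] = colors
        if time.perf_counter() >= deadline:
            return False
```

The main loop would call `run_for_budget` once per frame with whatever time is left after drawing, and start a fresh job whenever the visible window scrolls or a visible line is edited.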

~6 years ago

Yolkai #8

you're not reparsing the entire file every time, right? should just need to parse what's on screen +/- a line boundary, maybe even less.

the code i posted had many obvious optimizations painfully not taken, too.

MZ952 wrote: "I'll post the highlighter code somewhere though, so I don't feel it went to total waste lol."

please do.

~6 years ago (edited ~6 years ago by Yolkai)

MZ952 #9

snail_ wrote: "Hack together some cheap "multitasking" in your main loop to spread the parsing work across multiple frames, and only reparse/redraw what is visible."

It's... Doable. *Sigh*. Parse the whole file upon initial loading, store the highlight data, and dynamically update that data (work partitioned across multiple frames) when lines of code are edited. Luckily, ints take up way less freemem than strings, so the burden on that isn't so high (especially considering my text editor here has 10 code slots lol). By my calculations, the average line of code would require 10 frames to seamlessly parse. (Longer lines could take like whole seconds to update.) Part of my issue is that my text GUI takes a good chunk of a frame, and so does parsing SB code. (Asymptotic worst case is O(n²).) I'd like everything to run seamlessly at 60 fps, but I'm just wondering to myself if this is all worth the effort lol.

Yolkai wrote: "you're not reparsing the entire file every time, right? should just need to parse what's on screen +/- a line boundary, maybe even less. the code i posted had many obvious optimizations painfully not taken, too."
MZ952 wrote: "I'll post the highlighter code somewhere though, so I don't feel it went to total waste lol."
Yolkai wrote: "please do."

Lol fuck no (not to the post code part, I'll do it uh probably tomorrow idk). First I tried parsing every visible line of code every frame, just for giggles. Then I did some testing to get an average computing time for storing and reparsing average lines of code, and measured it against the free unused milliseconds in my main loop. Wasn't too good. I didn't think of snail's idea of spreading the work out over multiple frames (because honestly seeing the results was a bit disheartening), but that at least is plausible. For now, I'm going to focus on functionality, hopefully getting it "fully-functional" and ready for a first release. (I'm hoping people will download and use it and discover all the bugs I must be missing, so I can comfortably proceed further in development without, like, basing critical functionality on broken code.) It would be super cool to get highlighting like working and cosmetically beautiful. If so, there's nothing really stopping me from like implementing like custom syntax highlighting stuff, like, highlighting bars and outlining def blocks and so on. (like like like, like, like)
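The store-and-update idea from this post could look roughly like the sketch below: keep one list of colour ints per line, throw away only the entry for a line that was edited, and fill entries in when that line is actually on screen. It is Python rather than SmileBASIC, and it fills the cache lazily instead of parsing the whole file on load. `HighlightCache` and the `colorize` callback are hypothetical names.

```python
class HighlightCache:
    """Keeps one list of colour ints per line; re-highlights only edited lines."""

    def __init__(self, lines, colorize):
        self.lines = list(lines)
        self.colorize = colorize                 # hypothetical: text -> list of ints
        self.colors = [None] * len(self.lines)   # None means "not highlighted yet"

    def edit_line(self, number, new_text):
        self.lines[number] = new_text
        self.colors[number] = None               # invalidate only this line

    def insert_line(self, number, text):
        self.lines.insert(number, text)
        self.colors.insert(number, None)

    def delete_line(self, number):
        del self.lines[number]
        del self.colors[number]

    def visible(self, first, count):
        """Return colour data for the visible window, filling in missing lines."""
        out = []
        for number in range(first, min(first + count, len(self.lines))):
            if self.colors[number] is None:
                self.colors[number] = self.colorize(self.lines[number])
            out.append(self.colors[number])
        return out
```

Scrolling through unedited code then costs only list lookups, and an edit costs one line's worth of highlighting the next time that line is drawn.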

~6 years ago (edited ~6 years ago by MZ952)

MZ952 #10

Hmm, thought of a better way to store and edit the data (at the expense of something else, of course v:v). I feel torn on this, but, like, the numbers don't lie. I should be able to do it. However, it could potentially be cosmetically jarring. Like, suppose you backspace on a line. Depending on the length of that line, it could take anywhere from a second to like 30 seconds to like a minute to update. Now, as that line is processing, you scroll somewhere else and fumble a few more lines. Oops, accidentally like idk added and then removed a char on a line like 1000+ chars long? Say goodbye to highlighting for a few minutes lol. Actually, there are ways around that of course. Unless I'm mistaken, I can have the thing doing the parsing check only the changed tokens. But still, it *could* be jarring. Ex: copy-pasting, adding a " or ' somewhere. Anything that changes the highlight state of large numbers of tokens. (Unless I treat those cases separately, like, with a simple flag or something. For the seamlessness. Gahh, damnit. I'm going to stop for the night.)
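One possible way to "check only the changed tokens", sketched in Python rather than SmileBASIC: compare the line's old and new token lists from both ends and recolour only the middle part that differs. Tokens are assumed here to be (kind, text) pairs without positions, since positions shift after an edit, and `changed_span` is a hypothetical helper. The line still has to be tokenized again; only the recolouring is narrowed.

```python
def changed_span(old_tokens, new_tokens):
    """Return (start, old_end, new_end).

    Tokens before `start` and tokens from old_end / new_end onward are
    identical, so only new_tokens[start:new_end] needs recolouring.
    """
    limit = min(len(old_tokens), len(new_tokens))
    start = 0
    while start < limit and old_tokens[start] == new_tokens[start]:
        start += 1                      # skip the common prefix
    old_end, new_end = len(old_tokens), len(new_tokens)
    while (old_end > start and new_end > start
           and old_tokens[old_end - 1] == new_tokens[new_end - 1]):
        old_end -= 1                    # skip the common suffix
        new_end -= 1
    return start, old_end, new_end
```

Typing one char inside a number gives a span of a single token, while adding a " turns the rest of the line into one string token, so the span runs to the end of the line. That second case is the jarring one described above.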

~6 years ago (edited ~6 years ago by MZ952)