Unlocking JavaScript secrets: reverse engineering code with ChatGPT

Discover how ChatGPT can unravel the secrets of minified JavaScript and transform it into a human-readable format, making it accessible for web scraping, security research, and reverse engineering obfuscated scripts.


Hi, we’re Apify, and our mission is to make the web more programmable. A big part of that in 2023 is getting better data for AI, so that's what we're doing. Check us out.

Since its release, OpenAI's GPT-3 has emerged as a powerful tool with a wide range of applications, including generating and debugging code, creating content, translating languages, and even creative writing. I'm personally very interested in how it can help me delve into minified JavaScript and use AI code understanding to unlock its secrets for web scraping.

This process goes beyond web scraping. It can also be a valuable asset in security research, helping you identify malicious scripts or npm modules. Remember the infamous flatmap-stream security breach in the JavaScript ecosystem, where a malicious dependency stole crypto? That package was not just minified but also obfuscated. I'll talk more about obfuscation later in this post.

Reverse engineering websites for automation and web scraping

For web automation developers, reverse engineering a website has become a routine part of optimizing their solutions. Calling the website's internal functions directly from code is often faster than waiting for UI events, and an internal function such as fetchPosts is less likely to change than the UI structure or selectors. However, websites often minify their JavaScript to reduce file size and improve loading times, which makes the code nearly indecipherable to human eyes.

Real-world example of making sense of minified JavaScript

Now, let's dive into a real-world example of how ChatGPT can come to the rescue on a popular social networking website. Imagine you want to fetch comments from this website, and after digging through the dev tools, you find a function that does the job. But there's a catch – the function body is anything but human-friendly, thanks to minification. This is where the power of ChatGPT's code understanding can make the seemingly incomprehensible code more accessible to us. Let's see how we can leverage this technology to demystify the minified JavaScript and extract valuable insights for web scraping.

fetchComment$(e) {
                return (0,
                m.of)(e).pipe((0,
                _.M)(this.state$, this.home.state$), (0,
                g.U)((([e,t])=>({
                    query: e,
                    item: t
                }))), (0,
                y.b)((({item: e})=>{}
                )), (0,
                v.z)((({query: {aweme_id: e, insert_ids: t}, item: o})=>{
                    var n;
                    if (!(null == o ? void 0 : o.hasMore))
                        return p.E;
                    const i = null !== (n = null == o ? void 0 : o.cursor) && void 0 !== n ? n : "0";
                    return this.service.getCommentList({
                        insert_ids: "0" === i ? t : void 0,
                        aweme_id: e,
                        cursor: i
                    }).pipe((0,
                    y.b)((e=>{
                        (0,
                        k.k)(e.status_code, [0])
                    }
                    )), (0,
                    h.z)((t=>{
                        var n, i, a;
                        if (0 === t.status_code && (null === (n = t.comments) || void 0 === n ? void 0 : n.length)) {
                            const {users: n, comments: a, commentItems: r} = this.processCommentList(t.comments)
                              , s = null !== (i = null == o ? void 0 : o.comments) && void 0 !== i ? i : []
                              , d = c()(s.concat(a), "cid")
                              , l = [this.user.getActions().multiSetUser(n), this.commentItem.getActions().multiSetCommentItem(r), this.getActions().setCommentItem({
                                comments: d,
                                hasMore: Boolean(t.has_more),
                                cursor: t.cursor,
                                loading: !1
                            })];
                            return t.total && l.push(this.item.getActions().setCommentCount({
                                id: e,
                                commentCount: Number(t.total)
                            })),
                            (0,
                            m.of)(...l)
                        }
                        return 0 !== t.status_code || (null === (a = t.comments) || void 0 === a ? void 0 : a.length) || t.has_more ? (0,
                        m.of)(this.noop()) : (0,
                        m.of)(this.getActions().setCommentItem({
                            hasMore: Boolean(t.has_more),
                            loading: !1
                        }))
                    }
                    )), (0,
                    f.O)(this.getActions().setCommentItem({
                        awemeId: e,
                        loading: !0
                    })), (0,
                    b.l)(this.getActions().setCommentItem({
                        loading: !1
                    })), (0,
                    I.n)({
                        shouldShowMsg: !1
                    }), (0,
                    C.R)(this.dispose$))
                }
                )))
            }
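A recurring pattern in the snippet above is `(0, m.of)(e)`. Bundlers commonly emit this form so that an imported function is called without the module object as its `this` value; when reading the code, you can treat it as a plain `of(e)` call. Here is a minimal sketch of the difference, using a hypothetical `m` object and a made-up `describeReceiver` method rather than anything from the real bundle:

```javascript
// Hypothetical stand-in for a bundled module namespace such as `m` in the snippet above.
const m = {
  label: 'module namespace',
  describeReceiver() {
    'use strict'; // bundled ES modules are always strict; made explicit here
    return this === undefined ? 'no receiver' : `receiver: ${this.label}`;
  },
};

// Ordinary method call: `this` inside the function is `m`.
const direct = m.describeReceiver();

// The comma expression evaluates to the bare function, so the call has no receiver.
const indirect = (0, m.describeReceiver)();

console.log(direct);   // receiver: module namespace
console.log(indirect); // no receiver
```

Once you know this, the `(0, ...)` wrappers become visual noise you can skip over, and the single-letter names (`m.of`, `g.U`, `y.b`) are the only part left to decode.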

Now comes the exciting part: we have two options with ChatGPT. We can either ask it to explain the minified code or ask it to rewrite the code in a more human-friendly format. Let's try both options to see the magic of ChatGPT in action! Simply pasting the minified code into ChatGPT returns a clear, plain-language explanation of what the code does.

So far, so good, but I want to dig deeper into this process and unlock the true potential of ChatGPT in unraveling complex code structures.

This code defines a function in TypeScript called "fetchComment$". It appears to be using the RxJS library to create an observable stream and apply various operators to it.

The input to this function is an event "e". The function returns an observable created from the input event and applies various operators to it:

"M" operator: combines the state$ and home.state$ observables
"U" operator: maps the combined observables to an object with properties "query" and "item"
"b" operator: maps the input object to an empty object, discarding the original input
...

Indeed, while getting an explanation of the code is helpful, sometimes we might want a more tangible transformation of the minified JavaScript into a human-readable format. ChatGPT makes this possible with just a simple prompt.

By using a prompt like "Can you rewrite the code to be readable using the clean code style?", we unleash the true potential of ChatGPT in transforming the code into something meaningful and comprehensible.

fetchComment$(event) {
  return of(event)
    .pipe(
      combineLatest(this.state$, this.home.state$),
      map(([query, item]) => ({ query, item })),
      tap(({ item }) => {}),
      switchMap(({ query: { aweme_id, insert_ids }, item }) => {
        if (!item?.hasMore) {
          return empty();
        }

        const cursor = item?.cursor || '0';
        return this.service.getCommentList({
          insert_ids: cursor === '0' ? insert_ids : undefined,
          aweme_id,
          cursor,
        }).pipe(
          tap(response => {
            checkStatusCode(response.status_code, [0]);
          }),
          switchMap(response => {
            if (response.status_code === 0 && response.comments?.length) {
              const { users, comments, commentItems } = this.processCommentList(response.comments);
              const existingComments = item?.comments || [];
              const allComments = uniqBy([...existingComments, ...comments], 'cid');
              const actions = [
                this.user.getActions().multiSetUser(users),
                this.commentItem.getActions().multiSetCommentItem(commentItems),
                this.getActions().setCommentItem({
                  comments: allComments,
                  hasMore: Boolean(response.has_more),
                  cursor: response.cursor,
                  loading: false,
                }),
              ];

              if (response.total) {
                actions.push(this.item.getActions().setCommentCount({
                  id: aweme_id,
                  commentCount: Number(response.total),
                }));
              }

              return of(...actions);
            } else if (response.status_code !== 0 || response.comments?.length || response.has_more) {
              return of(this.noop());
            }

            return of(this.getActions().setCommentItem({
              hasMore: Boolean(response.has_more),
              loading: false,
            }));
          }),
          startWith(this.getActions().setCommentItem({
            awemeId: aweme_id,
            loading: true,
          })),
          endWith(this.getActions().setCommentItem({
            loading: false,
          })),
          catchErrorWithMessage({ shouldShowMsg: false }),
          takeUntil(this.dispose$),
        );
      }),
    );
}
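It is worth spot-checking a rewrite like this against the minified source, one expression at a time. For example, the minified cursor expression has the shape that transpilers typically emit for `item?.cursor ?? '0'`, while the rewrite uses `||`. The two agree for missing values but differ for falsy ones such as an empty string or `0`. The sketch below compares them; the sample inputs are assumptions for illustration, and whether the real API ever returns such cursors is unknown:

```javascript
// Expression as it appears in the minified source.
function cursorFromMinified(o) {
  var n;
  return null !== (n = null == o ? void 0 : o.cursor) && void 0 !== n ? n : "0";
}

// Expression as written in the rewritten version.
function cursorFromRewrite(item) {
  return item?.cursor || '0';
}

// Modern syntax with the same behavior as the minified expression.
function cursorWithNullish(item) {
  return item?.cursor ?? '0';
}

// Hypothetical inputs: no item, a normal cursor, and two falsy cursors.
const samples = [undefined, { cursor: '20' }, { cursor: '' }, { cursor: 0 }];
for (const sample of samples) {
  console.log(
    JSON.stringify(sample),
    cursorFromMinified(sample),
    cursorFromRewrite(sample),
    cursorWithNullish(sample),
  );
}
```

For understanding the code, the difference rarely matters, but it is the kind of detail to verify before reusing the rewritten logic in a scraper.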

The transformation of the minified code into a readable format using ChatGPT is truly a game-changer. While the rewritten code may not be directly runnable, it helps us understand the logic far better than the cryptic minified format does. But the magic of ChatGPT doesn't end there! We can even ask for a code example showing how to use the fetchComment$ function. This gives us valuable insight into how to feed data to the function and use it in our web scraping.

instance.fetchComment$({
  aweme_id: 'some_id',
  insert_ids: ['some_insert_id_1', 'some_insert_id_2']
});

And there you have it – a glimpse into the incredible capabilities of ChatGPT in helping you understand code better. Whether it's decoding minified JavaScript, transforming it into readable format, or obtaining code examples, I've found ChatGPT to be a great tool. And I believe that's true for developers, web automation experts, and security researchers alike.

Further research into AI code understanding

Another interesting application of ChatGPT's capabilities in code understanding is in the field of reverse engineering of obfuscated scripts used for web scraping protections. Obfuscation is a technique used to make code unreadable not only for humans, but also for machines, and it involves encoding strings and adding dead pieces of code to make the code structure more complex. Deobfuscating such code manually can be tedious and time-consuming. It requires the identification of the dictionary array containing encoded string values, the decoding function, and then replacing all the function calls with their return values using code manipulation tools.

Here's an example of obfuscated code:

const _0x38a2db = ['\x54\x6f\x74a\x6c', '\x6c\x6f\x67', '\x3a\x20'];
const _0x9b58d9 = function(_0x39ddb7) {
    return _0x38a2db[_0x39ddb7 + (-0x6d5 + 0x58 + 0x11 * 0x62)];
}, _0x498b9b = function(_0x48d808, _0x14da1e) {
    return _0x9b58d9(_0x48d808);
}, _0x34c7bc = function(_0x16af1d, _0x27a29e) {
    return _0x498b9b(_0x16af1d);
}, _0x23a1 = _0x34c7bc;
let total = 0x2 * 0x109e + -0xc * -0x16a + -0x3234;
for (let i = 0x1196 + 0x97b * 0x3 + -0x2e07; i < -0x95 * -0x38 + -0x1a75 + -0x619; i++) {
    total += i;
}
console[_0x34c7bc(-(0x1e7c + -0x1 * -0x1367 + 0x2ef * -0x11))](_0x498b9b(-(0x1020 + 0x253 + 0x7 * -0x2a2)) + _0x23a1(-(0x12c5 + -0x1887 + -0x1 * -0x5c5)) + total);

And here's its deobfuscated version:

let total = 0;
for (let i = 0; i < 10; i++) {
  total += i;
}
console.log("Total: " + total);
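The manual steps described earlier (find the dictionary array, find the decoding function, replace each call with its return value) can be sketched in a few lines. This is only a rough illustration modeled on the sample above: the helper names `foldConstant` and `inlineDecoderCalls` are made up for this post, and the regular expression only handles calls shaped like the ones in this sample. A real tool would more likely work on a parsed syntax tree.

```javascript
// Dictionary array and decoder, modeled on the obfuscated sample.
const dictionary = ['\x54\x6f\x74a\x6c', '\x6c\x6f\x67', '\x3a\x20'];
const decode = (index) => dictionary[index + (-0x6d5 + 0x58 + 0x11 * 0x62)];

// Evaluate an arithmetic expression made only of numeric literals, + - * and parentheses.
function foldConstant(expression) {
  const withoutLiterals = expression.replace(/0x[0-9a-f]+|\d+/gi, '');
  if (/[^\s+\-*()]/.test(withoutLiterals)) {
    throw new Error(`Not a constant expression: ${expression}`);
  }
  return new Function(`"use strict"; return (${expression});`)();
}

// Replace each call to a known decoder alias with the string literal it returns.
function inlineDecoderCalls(source, aliases) {
  const pattern = new RegExp(`(?:${aliases.join('|')})\\((-\\([^()]*\\))\\)`, 'g');
  return source.replace(pattern, (match, argument) =>
    JSON.stringify(decode(foldConstant(argument))));
}

const obfuscatedLine =
  'console[_0x34c7bc(-(0x1e7c + -0x1 * -0x1367 + 0x2ef * -0x11))]' +
  '(_0x498b9b(-(0x1020 + 0x253 + 0x7 * -0x2a2)) + ' +
  '_0x23a1(-(0x12c5 + -0x1887 + -0x1 * -0x5c5)) + total);';

const readable = inlineDecoderCalls(obfuscatedLine, ['_0x34c7bc', '_0x498b9b', '_0x23a1']);
console.log(readable);
// console["log"]("Total" + ": " + total);
```

The same `foldConstant` helper also collapses the numeric noise: the initial value of `total` folds to 0 and the loop bound folds to 10, which is how the readable version arrives at `i < 10`.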

With the help of ChatGPT, it could be possible to automate this process. ChatGPT could be used to identify the dictionary array and decoding function by analyzing the code, and then utilize code manipulation tools to replace the encoded values with their decoded counterparts. Additionally, ChatGPT could assist in removing dead code and providing a simplified explanation of what the code does.

If this automated approach could be realized, it would greatly benefit the security checking of NPM modules and the JavaScript ecosystem as a whole. By leveraging ChatGPT's capabilities in code understanding, the process of reverse engineering obfuscated scripts could be streamlined and made more efficient, saving time and effort in analyzing web scraping protections. Further research in this area could potentially unlock new possibilities in code analysis and security assessment, contributing to the advancement of web automation and JavaScript ecosystem security.

Petr Pátek
Full-stack developer at Apify.
